i965/fs: use a better heuristic for SIMD16

Previously, we presumed that any SIMD16 program that spilled wasn't worth it, and failed to compile SIMD16 if it spilled at all. But spilling alone isn't a real measure of whether SIMD16 helps over SIMD8. SIMD16 helps because, among a few other things, it helps reduce pipeline stalls and hide latency. In other words, most of the time, we'll be able to run a shader for twice the number of pixels in less than twice the number of cycles. Now that we have an ok-ish estimate of the number of cycles a program takes, we can simply compare the cycle counts directly to see if that is indeed the case.

There are two reasons why SIMD16 can be worse than SIMD8:

1. Increased register pressure means that we aren't as able to get a good schedule, so we wind up with lots of false dependencies and not much latency hiding.
2. We wind up spilling, which creates latency that we can't hide, so we get huge stalls.

Previously, we didn't consider (1) at all, and we only half-considered (2): we would give up as soon as the shader spilled, even if we were able to hide the latency of scratch reads/writes well. But by using the cycle count estimate, we can consider both (1) and (2) directly, since false dependencies and poor scheduling of textures and spills will both result in an increased cycle count.

Now, we bail out if the SIMD16 program is estimated to take more than twice the cycles of the SIMD8 program. This is a fairly conservative threshold, since at that point the SIMD16 and SIMD8 programs should take about the same time per pixel. That way, we won't miss any opportunity for SIMD16 programs to help us, even if occasionally we use ones that don't help that much.
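The threshold described above can be sketched in isolation. This is a minimal illustration, not the driver code: `simd16_worth_keeping` is a hypothetical helper name and the cycle numbers are made up. In the patch itself the comparison is done inline in fs_visitor::allocate_registers(), using cfg->cycle_count and the new simd8_cycles field.

```cpp
// Hypothetical stand-alone helper (not part of Mesa) that mirrors the
// comparison added by this patch.
//
// A SIMD16 program shades twice as many pixels per invocation as a SIMD8
// program, so it breaks even when it takes exactly twice the estimated
// cycles. Like the patch, reject only when it is strictly worse than that.
constexpr bool simd16_worth_keeping(unsigned simd8_cycles,
                                    unsigned simd16_cycles)
{
   return simd16_cycles <= 2 * simd8_cycles;
}

// Made-up cycle estimates, for illustration only.
static_assert(simd16_worth_keeping(100, 150),
              "1.5x the cycles for 2x the pixels is a win");
static_assert(simd16_worth_keeping(100, 200),
              "exact break-even is still accepted");
static_assert(!simd16_worth_keeping(100, 201),
              "more than 2x the cycles is rejected");
```

Read per pixel, the first case costs 150 / 16 cycles against 100 / 8 for SIMD8, which is why a SIMD16 program that spills can still be kept as long as the scheduler hides enough of the scratch latency.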
author: Connor Abbott <cwabbott0@gmail.com> 2015-06-11 20:36:07 -0700
committer: Connor Abbott <cwabbott0@gmail.com> 2015-10-03 14:32:17 -0400
commit: d5a12f15d5b27c130dd8f1e4fe049aef2f348e0b
tree: 728249030e434f3995e416be76eeb47660129b04
parent: e147b90a716c503c5c19b32a486bd745b7af01fb
2 files changed, 12 insertions, 14 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8e1b2d6eb3..8f14c52267 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4953,20 +4953,11 @@ fs_visitor::allocate_registers()
    } while(!allocated_without_spills && num_missed < 8);
 
    if (!allocated_without_spills) {
-      /* We assume that any spilling is worse than just dropping back to
-       * SIMD8.  There's probably actually some intermediate point where
-       * SIMD16 with a couple of spills is still better.
-       */
-      if (dispatch_width == 16) {
-         fail("Failure to register allocate.  Reduce number of "
-              "live scalar values to avoid this.");
-      } else {
-         compiler->shader_perf_log(log_data,
-                                   "%s shader triggered register spilling.  "
-                                   "Try reducing the number of live scalar "
-                                   "values to improve performance.\n",
-                                   stage_name);
-      }
+      compiler->shader_perf_log(log_data,
+                                "%s shader triggered register spilling.  "
+                                "Try reducing the number of live scalar "
+                                "values to improve performance.\n",
+                                stage_name);
 
       /* Since we're out of heuristics, just go spill registers until we
        * get an allocation.
@@ -4988,6 +4979,10 @@ fs_visitor::allocate_registers()
 
    schedule_instructions(0, SCHEDULE_POST);
 
+   if (dispatch_width == 16 && cfg->cycle_count > 2 * simd8_cycles) {
+      fail("Failure to schedule SIMD16 advantageously");
+   }
+
    if (last_scratch > 0)
       prog_data->total_scratch = brw_get_scratch_size(last_scratch);
 }
@@ -5212,6 +5207,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       if (!v.simd16_unsupported) {
          /* Try a SIMD16 compile */
          v2.import_uniforms(&v);
+         v2.simd8_cycles = v.cfg->cycle_count;
          if (!v2.run_fs(brw->use_rep_send)) {
             perf_debug("SIMD16 shader failed to compile: %s", v2.fail_msg);
          } else {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index e088b517c5..4828b0af20 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -364,6 +364,8 @@ public:
    bool simd16_unsupported;
    char *no16_msg;
 
+   unsigned simd8_cycles;
+
    /* Result of last visit() method. Still used by emit_texture() */
    fs_reg result;