author | Connor Abbott <cwabbott0@gmail.com> | 2015-06-11 20:36:07 -0700 |
---|---|---|
committer | Matt Turner <mattst88@gmail.com> | 2016-10-20 15:32:46 -0700 |
commit | c8d0e85e9a25d317d84d353e776c3a7cef785372 (patch) | |
tree | 2d11801e19c953f8159977ca134c905dcecd570b | |
parent | 621c6f6a755d9316fc34563ed3dcb1f54bd605ba (diff) |
i965/fs: use a better heuristic for SIMD16
Previously, we presumed that any SIMD16 program that spilled wasn't
worth it, and failed to compile SIMD16 if it spilled at all. But
that isn't a real measure of why SIMD16 helps over SIMD8 at all.
SIMD16 helps because, among a few other things, it helps reduce pipeline stalls
and hide latency. In other words, most of the time, we'll be able to run
a shader for twice the number of pixels in less than twice the number of
cycles. Now that we have an ok-ish estimate of the number of cycles a
program takes, we can simply compare the cycle counts directly to see if
that is indeed the case. There are two reasons why SIMD16 can be worse
than SIMD8:
1. Increased register pressure means that we aren't as able to get a
good schedule, so we wind up with lots of false dependencies and not
much latency hiding.
2. We wind up spilling, which creates latency that we can't hide, so we
get huge stalls.
Previously, we didn't consider (1) at all, and we only half-considered
(2) -- we would give up as soon as the shader spilled, even if we were
able to hide the latency of scratch reads/writes well. But by using the
cycle count estimate, we can consider both (1) and (2) directly, since
false dependencies and poor scheduling of textures and spills will both
result in an increased cycle count. Now, we bail out if the SIMD16
program contains more than twice the cycles of the SIMD8 program. This
is a fairly conservative threshold, since by that point, the SIMD16 and
SIMD8 programs should take about the same time. That way, we won't miss
any opportunity for SIMD16 programs to help us, even if occasionally we
use ones that don't help that much.
LOST: 484
GAINED: 6
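The threshold described above can be sketched in isolation. This is a minimal illustration, not the driver code: `simd16_exceeds_threshold` and `scaled_cycles_per_pixel` are hypothetical helpers, and the cycle counts stand in for the scheduler's estimate (`cfg->cycle_count` in the patch). A SIMD16 thread covers twice as many pixels as a SIMD8 thread, so the two variants break even per pixel when the SIMD16 estimate is exactly twice the SIMD8 estimate.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Mesa): mirrors the comparison the patch
// adds to fs_visitor::allocate_registers(). Returns true when the SIMD16
// variant should be rejected, i.e. when its estimated cycle count is more
// than twice the SIMD8 estimate. 64-bit arithmetic is an assumption made
// here to keep the multiplication from overflowing; the patch itself uses
// `unsigned`.
bool simd16_exceeds_threshold(uint64_t simd8_cycles, uint64_t simd16_cycles)
{
   return simd16_cycles > 2 * simd8_cycles;
}

// Hypothetical helper: estimated cycles per pixel, scaled by 16 so the
// result stays an integer for dispatch widths of 8 and 16.
uint64_t scaled_cycles_per_pixel(uint64_t cycles, unsigned dispatch_width)
{
   return cycles * 16 / dispatch_width;
}
```

For example, with a SIMD8 estimate of 1000 cycles, a SIMD16 estimate of 1800 cycles is kept (lower cost per pixel), 2000 cycles is still kept (break-even), and 2001 cycles is rejected.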
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_fs.cpp | 6 |
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_fs.h | 2 |
2 files changed, 7 insertions, 1 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 408295e4fe98..3524af231c74 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -5924,9 +5924,12 @@ fs_visitor::allocate_registers(bool allow_spilling)
 
    schedule_instructions(SCHEDULE_POST);
 
+   if (dispatch_width == 16 && cfg->cycle_count > 2 * simd8_cycles) {
+      fail("Failure to schedule SIMD16 advantageously");
+   }
+
    if (last_scratch > 0) {
       unsigned max_scratch_size = 2 * 1024 * 1024;
-
       prog_data->total_scratch = brw_get_scratch_size(last_scratch);
 
       if (stage == MESA_SHADER_COMPUTE) {
@@ -6591,6 +6594,7 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
                      &prog_data->base, prog, shader, 16,
                      shader_time_index16);
       v16.import_uniforms(&v8);
+      v16.simd8_cycles = v8.cfg->cycle_count;
       if (!v16.run_fs(allow_spilling, use_rep_send)) {
          compiler->shader_perf_log(log_data,
                                    "SIMD16 shader failed to compile: %s",
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 2fd4a237bcc5..18b6795ecc4c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -337,6 +337,8 @@ public:
    bool failed;
    char *fail_msg;
 
+   unsigned simd8_cycles;
+
    /** Register numbers for thread payload fields. */
    struct thread_payload {
       uint8_t source_depth_reg;