Age, Commit message, Author, Files, Lines
2015-10-16i965/sched: use liveness analysis for computing register pressurei965-sched-conservative-v2Connor Abbott1-56/+244
Previously, we were using some heuristics to try and detect when a write was about to begin a live range, or when a read was about to end a live range. We never used the liveness analysis information used by the register allocator, though, which meant that the scheduler's and the allocator's ideas of when a live range began and ended were different. Not only did this make our estimate of the register pressure benefit of scheduling an instruction wrong in some cases, but it was preventing us from knowing the actual register pressure when scheduling each instruction, which we want to have in order to switch to register pressure scheduling only when the register pressure is too high.

This commit rewrites the register pressure tracking code to use the same model as our register allocator currently uses. We use the results of liveness analysis, as well as the compute_payload_ranges() function that we split out in the last commit. This means that we compute live ranges twice on each round through the register allocator, although we could speed it up by only recomputing the ranges and not the live in/live out sets after scheduling, since we only shuffle around instructions within a single basic block when we schedule.

Shader-db results on bdw:
total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1
total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873
LOST: 8
GAINED: 0
2015-10-16i965/fs: split out calculation of payload live rangesConnor Abbott2-22/+31
We'll need this for the scheduler too, since it wants to know when the live ranges of payload registers end in order to model them in our register pressure calculations.
2015-10-16i965: dump scheduling cycle estimatesConnor Abbott4-9/+35
The heuristic we're using is rather lame, since it assumes everything is non-uniform and loops execute 10 times, but it should be enough for measuring improvements in the scheduler that don't result in a change in the number of instructions.
v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.
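As a rough illustration of the "loops execute 10 times" heuristic, the sketch below weights a basic block's cycle count by a factor of 10 per enclosing loop. The function name and the choice to multiply the factor for nested loops are assumptions made for this example, not code taken from the commit.

```c
#include <stdint.h>

/* Hypothetical helper (not from the commit): scale a block's estimated
 * cycles by an assumed 10 iterations for every loop that encloses it. */
static uint64_t
weighted_block_cycles(uint64_t block_cycles, unsigned loop_depth)
{
   uint64_t weight = 1;

   /* Assumption: nested loops multiply, so depth 2 means 10 * 10. */
   for (unsigned i = 0; i < loop_depth; i++)
      weight *= 10;

   return block_cycles * weight;
}
```

Summing this value over all blocks would give a single static estimate that can be compared before and after a scheduler change, even when the instruction count stays the same.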
2015-10-16i965: always run the post-RA schedulerConnor Abbott1-2/+1
Before, we would only do scheduling after register allocation if we spilled, despite the fact that the pre-RA scheduler was only supposed to be for register pressure and set the latencies of every instruction to 1. This meant that unless we spilled, which we rarely do, we never considered instruction latencies at all, and we usually never bothered to try to hide texture fetch latency. Although a later commit stops setting the latencies to 1, we still want to always run the post-RA scheduler since it's able to take the false dependencies that the register allocator creates into account, and it can be more aggressive than the pre-RA scheduler since it doesn't have to worry about register pressure at all. XXX perf data
2015-10-16i965/sched: write-after-read dependencies are freeConnor Abbott1-4/+4
Although write-after-write dependencies have the same latency as read-after-write dependencies due to how the register scoreboard works, write-after-read dependencies aren't checked by the EU at all, so they're purely a constraint on how the scheduler can order the instructions.
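The rule described above can be sketched as a small latency lookup. The enum, the function name, and the idea of passing the producer's latency as a parameter are all hypothetical and only serve to illustrate the three cases; the real scheduler's data structures are not shown in this log.

```c
/* Hypothetical dependency kinds for this sketch. */
enum dep_kind {
   DEP_READ_AFTER_WRITE,
   DEP_WRITE_AFTER_WRITE,
   DEP_WRITE_AFTER_READ,
};

/* Latency charged on a dependency edge. RAW and WAW both wait for the
 * earlier instruction's result because of the register scoreboard; WAR
 * only constrains ordering, so it costs nothing. */
static unsigned
dep_latency(enum dep_kind kind, unsigned producer_latency)
{
   switch (kind) {
   case DEP_READ_AFTER_WRITE:
   case DEP_WRITE_AFTER_WRITE:
      return producer_latency;
   case DEP_WRITE_AFTER_READ:
      return 0;
   }
   return 0;
}
```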
2015-10-16i965: fix cycle estimates when there's a pipeline stallConnor Abbott1-7/+8
The issue time for an instruction is how many cycles it takes to actually put it into the pipeline. If there's a pipeline stall that causes the instruction to be delayed, we should first take that into account to figure out when the instruction would start executing and *then* add the issue time. The old code had it backwards, and so we would underestimate the total time whenever we thought there would be a pipeline stall by up to the issue time of the instruction.
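A minimal sketch of the ordering fix, assuming the estimator keeps a running clock and knows the cycle at which an instruction's operands become ready. Both function names are hypothetical, and the "old" variant is a reconstruction from the description above rather than the removed code.

```c
static unsigned
max_u(unsigned a, unsigned b)
{
   return a > b ? a : b;
}

/* Reconstructed old behaviour (assumption): add the issue time first,
 * then account for the stall. */
static unsigned
advance_clock_old(unsigned clock, unsigned ready_time, unsigned issue_time)
{
   return max_u(clock + issue_time, ready_time);
}

/* Fixed behaviour: wait out the stall first, then add the issue time. */
static unsigned
advance_clock(unsigned clock, unsigned ready_time, unsigned issue_time)
{
   return max_u(clock, ready_time) + issue_time;
}
```

With a clock of 10, operands ready at cycle 20 and an issue time of 2, the old ordering yields 20 while the fixed ordering yields 22, so the old estimate is short by the issue time. When there is no stall the two agree.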
2015-10-15nir/glsl: Use shader_prog->Name for naming the NIR shaderJason Ekstrand1-1/+1
This is the better name to use. Apparently, sh->Name is usually 0. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-15nir: Add helpers for creating variables and adding them to listsJason Ekstrand4-46/+99
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-15nir/prog: Use nir_foreach_variableJason Ekstrand1-1/+1
Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-15mesa: wrap a ridiculously long line in es1_conversion.cBrian Paul1-1/+19
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: add num_buffers() helper in blend.cBrian Paul1-15/+22
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: optimize _UsesDualSrc blend flag settingBrian Paul1-1/+6
For glBlendFunc and glBlendFuncSeparate(), the _UsesDualSrc flag will be the same for all buffers, so no need to compute it N times. Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: fix incorrect error string in _mesa_BlendEquationiARB()Brian Paul1-1/+1
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: move validate_blend_factors() call after no-change checkBrian Paul1-6/+6
A redundant call to glBlendFuncSeparateiARB() is more likely than getting invalid values, so do the no-op check first. Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: optimize no-change check in _mesa_BlendEquationSeparate()Brian Paul1-15/+26
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: optimize no-change check in _mesa_BlendEquation()Brian Paul1-12/+23
Same story as preceding change to _mesa_BlendFuncSeparate(). Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: optimize no-change check in _mesa_BlendFuncSeparate()Brian Paul1-15/+28
Streamline the checking for no state change in _mesa_BlendFuncSeparate() (and _mesa_BlendFunc()). If _BlendFuncPerBuffer is false, we only need to check the 0th buffer state. Move argument validation after the no-op check. I'm looking at an app that issues about 1000 redundant glBlendFunc() calls per frame! Reviewed-by: Eric Anholt <eric@anholt.net>
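The no-op check can be sketched as follows. The struct layout, field names, and buffer limit here are hypothetical stand-ins rather than Mesa's actual context structures; the point is only that when per-buffer blending is off, comparing against buffer 0 is enough.

```c
#include <stdbool.h>

#define MAX_DRAW_BUFFERS 8   /* hypothetical limit for this sketch */

struct blend_buffer {
   unsigned src_rgb, dst_rgb, src_a, dst_a;
};

struct blend_state {
   bool per_buffer;           /* stand-in for _BlendFuncPerBuffer */
   unsigned num_buffers;
   struct blend_buffer buf[MAX_DRAW_BUFFERS];
};

/* Returns true when the requested factors match the current state, so
 * the caller can return early, before validating the arguments. */
static bool
blend_func_unchanged(const struct blend_state *s,
                     unsigned src_rgb, unsigned dst_rgb,
                     unsigned src_a, unsigned dst_a)
{
   /* Without per-buffer state, all buffers mirror buffer 0. */
   const unsigned n = s->per_buffer ? s->num_buffers : 1;

   for (unsigned i = 0; i < n; i++) {
      const struct blend_buffer *b = &s->buf[i];
      if (b->src_rgb != src_rgb || b->dst_rgb != dst_rgb ||
          b->src_a != src_a || b->dst_a != dst_a)
         return false;
   }
   return true;
}
```

For an application issuing many redundant calls per frame, the common case then costs one four-field comparison.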
2015-10-15mesa: short-cut new_state == _NEW_LINE in _mesa_update_state_locked()Brian Paul1-1/+5
We can skip to the end of _mesa_update_state_locked() if only the _NEW_LINE flag is set since none of the derived state depends on it (just like _NEW_CURRENT_ATTRIB). Note that we still call the ctx->Driver.UpdateState() function, of course. v2: use bitmask-based test, per Eric. Reviewed-by: Eric Anholt <eric@anholt.net>
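A bitmask-based version of this short-cut might look like the sketch below. The flag values and the function name are invented for illustration (the real _NEW_* bits are defined in Mesa's headers and are not reproduced here); only the shape of the test is the point.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch. */
#define NEW_CURRENT_ATTRIB (1u << 0)
#define NEW_LINE           (1u << 1)
#define NEW_COLOR          (1u << 2)

/* True when at least one dirty bit could affect derived state. Masking
 * out the "harmless" bits handles NEW_LINE alone, NEW_CURRENT_ATTRIB
 * alone, or both together with a single comparison. */
static bool
needs_derived_state_update(unsigned new_state)
{
   const unsigned no_derived_state = NEW_CURRENT_ATTRIB | NEW_LINE;

   return (new_state & ~no_derived_state) != 0;
}
```

When this returns false, the caller would jump straight to the driver's UpdateState hook, which still runs in every case.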
2015-10-15mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()Brian Paul1-1/+0
Changing the matrix mode alone has no effect on rendering and does not need to trigger a flush or state validation. Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15mesa: android: Fix the incorrect path of sse_minmax.cChih-Wei Huang1-1/+1
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org> Fixes: 669cfc267a1 (android: mesa: fix the path of the SSE4_1 optimisations) Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-15i965: android: add the i965_compile_FILES sources to the driverMauro Rossi1-0/+1
i965_compile_FILES are needed otherwise we'll error out as below:
target SharedLib: i915_dri (out/target/product/x86/obj/SHARED_LIBRARIES/i915_dri_intermediates/LINKED/i915_dri.so)
external/mesa/src/mesa/drivers/dri/i965/brw_ir_fs.h:181: error: undefined reference to 'fs_inst::~fs_inst()'
...
...
external/mesa/src/mesa/drivers/dri/i965/intel_screen.c:1484: error: undefined reference to 'brw_compiler_create'
collect2: error: ld returned 1 exit status
build/core/shared_library.mk:81: recipe for target 'out/target/product/x86/obj/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so' failed
make: *** [out/target/product/x86/obj/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so] Error 1
[Emil Velikov: tweak commit message]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-15program: convert _mesa_init_gl_program() to take struct gl_program *Emil Velikov10-67/+68
Rather than accepting a void pointer, only to down and up cast around it, convert the function to take the base (struct gl_program) pointer. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-15nir: include nir_instr_set.h in the tarballEmil Velikov1-0/+1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-15glsl: Allow arrays of arrays in GLSL ES 3.10 and GLSL 4.30Timothy Arceri3-18/+20
V3: use a check_*_allowed style function for requirements checking rather than has_* which doesn't encapsulate the error message V2: add missing 's' to the extension name in error messages and add decimal place in version string Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-15glsl: allow for AoA in calculating offset to ubo start regionTimothy Arceri1-2/+1
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: build ubo name and indexing offset for AoATimothy Arceri1-30/+86
V2: split out unrelated change as suggested by Samuel Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: link uniform block arrays of arraysTimothy Arceri3-112/+229
This adds support for setting up the UniformBlock structures for AoA and also adds support for resizing AoA blocks with a packed layout. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: Add AoA support when checking for non-const indexTimothy Arceri1-1/+1
When checking for non-const indexing of interfaces, take into account arrays of arrays. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: Add support for lowering interface block arrays of arraysTimothy Arceri1-14/+38
V2: make array processing functions static Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: add AoA support for an inteface with unsized array membersTimothy Arceri1-4/+12
Add support for setting the max access of an unsized member of an interface array of arrays. For example ifc[j][k].foo[i] where foo is unsized. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: add AoA support for linking interface blocks with unsized membersTimothy Arceri2-6/+7
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: avoid hitting assert for arrays of arraysTimothy Arceri1-0/+6
Also add TODO comment about adding proper support Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: add AoA support for atomic countersTimothy Arceri1-23/+54
This marks all counters in an AoA as active. For AoA all but the innermost array are treated as separate counters/uniforms. The Nvidia binary goes further and also finds inactive counters in the AoA. In future we should do this too, but this gets things working for the time being. This change also removes the use of UniformHash for atomic counters, which avoids having to generate name strings used as hash keys. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: add std140 layout support for AoATimothy Arceri1-7/+8
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15i965: add arrays of arrays support for varyingsTimothy Arceri2-5/+3
V2: get the correct vector elements value for outputs Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: calculate AoA uniform offset correctly for structsTimothy Arceri1-1/+16
This allows the correct offset to be calculated for use in indirect indexing of samplers. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: remove dead code in a single passTimothy Arceri4-17/+57
Currently only one ir assignment is removed for each var in a single dead code optimisation pass. This means that if a var has more than one assignment, all the glsl optimisations have to be run again for each additional assignment to be removed. Another pass is also required to remove the variable itself. With this change all assignments and the variable are removed in a single pass.

Some of the arrays of arrays conformance tests that were looping through 8 dimensions ended up with a var with hundreds of assignments. This change helps:
ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1 went from around 3 min 20 sec to 2 min
ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 went from around 9 min 20 sec to 7 min 30 sec

I had difficulty getting the public shader-db to give a consistent result with or without this change but the results seemed unchanged at between 15-20 seconds. Thomas Helland measured a change with shader-db on his machine from approx 117 secs to 112 secs.
V3: Simplify freeing of list as suggested by Ian, and spelling fixes.
V2: Add assert to be sure references are counted before assignments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-By: Thomas Helland <thomashelland90@gmail.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: dont allow gl_PerVertex to be redeclared as an array of arraysTimothy Arceri2-1/+8
V3: move patch after fixes to ast for AoA and add const to helper as suggested by Ian V2: move single dimensional array detection into a helper Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: check that only the outermost array is unsizedTimothy Arceri1-0/+22
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: allow AoA to be sized by initializer or constructorTimothy Arceri5-41/+82
V2: Split out unsized array validation to its own patch as suggested by Samuel. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15glsl: add support for initialising sampler AoATimothy Arceri1-34/+49
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15glsl: Add support for linking uniform arrays of arraysTimothy Arceri2-6/+14
V3: Fix setting of data.location for struct AoA UBO members
V2: Handle arrays of arrays in the same way structures are handled

The ARB_arrays_of_arrays spec doesn't give very many details on how AoA uniforms are intended to be implemented. However in the ARB_program_interface_query spec there are details that show AoA are intended to be handled in a similar way to structs.

Issue 7 from the ARB_program_interface_query spec:

We define rules consistent with our enumeration rules for other complex types. For existing one-dimensional arrays, we enumerate a single entry if the array is an array of basic types, or separate entries for each array element if the array is an array of structures. We follow similar rules here. For a uniform array such as:

    uniform vec4 a[5][4][3];

we enumerate twenty different entries ("a[0][0][0]" through "a[4][3][0]"), each of which is treated as an array with three elements. This is morally equivalent to what you'd get if you worked around the limitation in current GLSL via:

    struct ArrayBottom { vec4 c[3]; };
    struct ArrayMid { ArrayBottom b[3]; };
    uniform ArrayMid a[5];

which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]".

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
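The enumeration rule quoted from the spec can be checked with a little arithmetic: the number of entries is the product of every dimension except the innermost, and each entry is named with index 0 in the innermost position. The sketch below is only an illustration of that rule; the helper names are hypothetical, and it assumes between 1 and 9 dimensions and a caller-supplied buffer large enough for the name.

```c
#include <stdio.h>

/* Number of enumerated entries: product of all but the innermost size. */
static unsigned
count_aoa_entries(const unsigned *dims, unsigned num_dims)
{
   unsigned count = 1;

   for (unsigned i = 0; i + 1 < num_dims; i++)
      count *= dims[i];
   return count;
}

/* Write the name of the given entry, e.g. "a[4][3][0]" for the last
 * entry of a[5][4][3]. The last outer dimension varies fastest. */
static void
format_aoa_entry(char *buf, size_t size, const char *name,
                 const unsigned *dims, unsigned num_dims, unsigned entry)
{
   unsigned idx[8] = { 0 };   /* outer indices; assumes num_dims <= 9 */
   int len;

   /* Split the flat entry number into one index per outer dimension. */
   for (unsigned i = num_dims - 1; i-- > 0; ) {
      idx[i] = entry % dims[i];
      entry /= dims[i];
   }

   len = snprintf(buf, size, "%s", name);
   for (unsigned i = 0; i + 1 < num_dims; i++)
      len += snprintf(buf + len, size - len, "[%u]", idx[i]);
   snprintf(buf + len, size - len, "[0]");
}
```

For dimensions {5, 4, 3} this gives 5 * 4 = 20 entries, with the first named "a[0][0][0]" and the last "a[4][3][0]", matching the spec text.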
2015-10-14i965: Don't hardcode FS in "validation failed!" message.Kenneth Graunke1-1/+1
Instead, print "Scalar VS" or "Scalar FS". Otherwise it's really confusing which stage is broken. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14glsl: Support uint index in lower_vector_insertJordan Justen1-1/+5
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates an unsigned index by using gl_LocalInvocationID.x and gl_LocalInvocationID.y as array indices. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-14glsl: Support uint index in do_vec_index_to_cond_assignJordan Justen1-1/+3
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates an unsigned index by using gl_LocalInvocationID.x and gl_LocalInvocationID.y as array indices. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-14i965/fs: Ignore compute shaders in brw_nir_lower_inputsJordan Justen1-0/+4
The commit shown below caused compute shaders to hit the unreachable in the default of the switch block. Since compute shaders don't have any inputs, we can make brw_nir_lower_inputs a no-op for CS.

commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Fri Aug 14 15:15:11 2015 -0700

    i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14i965/fs: Simplify FS in brw_nir_lower_inputs to only support scalar modeJordan Justen1-1/+2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14mesa: remove unused functions in program.cBrian Paul1-51/+0
replace_registers() and adjust_param_indexes() were unused. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-14mesa: minor indentation fix in _mesa_BindTextureUnit()Brian Paul1-1/+1
2015-10-14mesa: remove unused texUnit local in _mesa_BindTextureUnit()Brian Paul1-7/+0
The texture unit is error-checked before this and the texUnit var is unused, so remove it. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>