~nh/mesa - nh's Mesa repository; mostly radeonsi related development

Age	Commit message	Author	Files	Lines
2017-01-10	ac/nir: use ac_emit_fdiv throughout	Nicolai Hähnle	1	-22/+6
	... and eliminate emit_fdiv and nir_to_llvm_context::fpmath_md_*, which are now unused.
2017-01-10	ac/nir: use ac_build_gather_values[_extended] throughout	Nicolai Hähnle	1	-65/+24
	... and eliminate the non-ac copies. Mostly straight-forward search & replace.
2017-01-10	ac/nir: use ac_emit_llvm_intrinsic throughout	Nicolai Hähnle	1	-79/+41
	... by straight-forward search & replace, and eliminate emit_llvm_intrinsic.
2017-01-10	radeonsi: remove unused si_prepare_cube_coords	Nicolai Hähnle	2	-200/+0

2017-01-10	amd/common: unify cube map coordinate handling between radeonsi and radv	Nicolai Hähnle	6	-197/+440
	Code is taken from a combination of radv (for the more basic functions, to avoid gallivm dependencies) and radeonsi (for the new and improved derivative calculations). v2: add 0.5 offset to tex coords only after derivative calculation
2017-01-10	radeonsi: only touch first three coordinates in si_prepare_cube_coords	Nicolai Hähnle	1	-12/+1
	Sourcing coords_arg[4] is actually never correct, since bias is handled differently in tex_fetch_args anyway.
2017-01-10	radeonsi: remove unused si_llvm_cube_to_2d_coords	Nicolai Hähnle	1	-28/+0

2017-01-10	radeonsi: restrict cube map derivative computations to the correct plane	Nicolai Hähnle	1	-23/+107
	As remarked by the comment in the original code, the old algorithm fails when (tc + deriv) points at a different cube face. Instead, simply project the derivative directly to the plane of the selected cube face. The new code is based on exactly differentiating (using the chain rule) the projection onto a plane corresponding to a fixed cube map face (which is still selected in the usual way based on the texture coordinate itself). The computations end up fairly involved, but we do save two reciprocal computations. Fixes GL45-CTS.texture_cube_map_array.sampling. v2: add 0.5 offset to tex coords only after derivative calculation
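As a reading aid, the differentiation described above can be written out for the s coordinate. This assumes the face-local mapping from the OpenGL cube map selection rules and treats the selected face as fixed; the symbols s_c and m_a follow the spec's naming. It is a sketch of the math, not a transcription of the code in the commit.

```latex
s = \frac{1}{2}\left(\frac{s_c}{|m_a|} + 1\right)
\quad\Longrightarrow\quad
\frac{\partial s}{\partial x}
  = \frac{1}{2}\,
    \frac{\dfrac{\partial s_c}{\partial x}\,|m_a|
          - s_c\,\operatorname{sgn}(m_a)\,\dfrac{\partial m_a}{\partial x}}
         {m_a^{2}}
```

The constant 0.5 offset drops out of the derivative, and the same expression holds for t with t_c in place of s_c.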
2017-01-10	radeonsi: communicate cube map coordinates more explicitly	Nicolai Hähnle	1	-33/+43

2017-01-10	radeonsi: fix the offset in cube map coordinate conversion	Nicolai Hähnle	1	-1/+1
	The correct offset is really 0.5, both intuitively and according to the formulas in Section 8.13 (Cube Map Texture Selection) of the OpenGL spec. This mistake probably never hurt because wrap-around is constrained to individual cube faces.
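For reference, the face selection and the 0.5 offset can be sketched in plain C as below. The table of (sc, tc, ma) per face follows the cube map selection rules in the OpenGL spec; the struct, the function names, and the tie-breaking order for equal magnitudes are assumptions made for this illustration and are not taken from radeonsi.

```c
#include <assert.h>

/* Hypothetical result type for this sketch. */
struct cube_coord {
    int face;   /* 0..5 = +X, -X, +Y, -Y, +Z, -Z */
    float s, t; /* face-local coordinates in [0, 1] */
};

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Map a direction vector to a cube face and face-local (s, t). */
static struct cube_coord cube_select(float rx, float ry, float rz)
{
    float ax = abs_f(rx), ay = abs_f(ry), az = abs_f(rz);
    float sc, tc, ma;
    struct cube_coord out;

    /* The zero vector has no major axis. */
    assert(ax > 0.0f || ay > 0.0f || az > 0.0f);

    if (ax >= ay && ax >= az) {          /* major axis X */
        out.face = rx >= 0.0f ? 0 : 1;
        sc = rx >= 0.0f ? -rz : rz;
        tc = -ry;
        ma = ax;
    } else if (ay >= az) {               /* major axis Y */
        out.face = ry >= 0.0f ? 2 : 3;
        sc = rx;
        tc = ry >= 0.0f ? rz : -rz;
        ma = ay;
    } else {                             /* major axis Z */
        out.face = rz >= 0.0f ? 4 : 5;
        sc = rz >= 0.0f ? rx : -rx;
        tc = -ry;
        ma = az;
    }

    /* sc/ma and tc/ma lie in [-1, 1]; scale by 0.5 and offset by 0.5. */
    out.s = 0.5f * (sc / ma + 1.0f);
    out.t = 0.5f * (tc / ma + 1.0f);
    return out;
}
```

A vector pointing straight along an axis, such as (1, 0, 0), lands in the center of its face at (0.5, 0.5).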
2017-01-09	radv: drop unused fields in physical device.	Dave Airlie	1	-6/+0
	Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-09	i965: call intel_prepare_render always when reading pixels	Tapani Pälli	1	-6/+6
	Currently we do this only in the fallback code (when tiled memcpy version failed) but it needs to be done always so that we have correct read and write buffer in place. No regressions seen in CI. Fixes: dEQP-EGL.functional.buffer_age.* Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98330 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-01-09	st/mesa: pass gl_program to st_bind_ubos()	Timothy Arceri	1	-18/+18
	We no longer need anything from gl_linked_shader. Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-09	st/mesa: pass gl_program to st_bind_images()	Timothy Arceri	1	-24/+22
	We no longer need anything from gl_linked_shader. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09	st/mesa: stop passing gl_linked_shader to set_affected_state_flags()	Timothy Arceri	1	-7/+6
	We now get everything we need from the gl_program param. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09	st/mesa/glsl: set num_images directly in shader_info	Timothy Arceri	6	-20/+13
	This change also removes the now duplicate NumImages field. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09	st/mesa: pass gl_program to st_bind_ssbos()	Timothy Arceri	1	-21/+21
	We no longer need to pass gl_shader_program. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09	nir: add another comparison simplification	Timothy Arceri	1	-0/+2
	On BDW: total instructions in shared programs: 13061877 -> 13060965 (-0.01%) instructions in affected programs: 133569 -> 132657 (-0.68%) helped: 566 HURT: 0 total cycles in shared programs: 256611784 -> 256599536 (-0.00%) cycles in affected programs: 861016 -> 848768 (-1.42%) helped: 379 HURT: 73 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09	nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.	Kenneth Graunke	1	-0/+4
	On BDW: total instructions in shared programs: 13074882 -> 13068703 (-0.05%) instructions in affected programs: 1823116 -> 1816937 (-0.34%) helped: 4187 HURT: 537 total cycles in shared programs: 256622718 -> 256425382 (-0.08%) cycles in affected programs: 123790120 -> 123592784 (-0.16%) helped: 3823 HURT: 2037 total spills in shared programs: 15276 -> 14929 (-2.27%) spills in affected programs: 9446 -> 9099 (-3.67%) helped: 352 HURT: 1 total fills in shared programs: 20496 -> 20144 (-1.72%) fills in affected programs: 13040 -> 12688 (-2.70%) helped: 352 HURT: 1 LOST: 2 GAINED: 21 v2: Rely on 'a' being a well-formed boolean (Connor, Eric). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
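The idea behind this rewrite can be checked with a small C model. It assumes booleans are 32-bit values that are either 0 or all ones (the "well-formed boolean" mentioned in the v2 note); the `model_*` functions are stand-ins written for this sketch, not NIR API, and only two of the possible operand orderings are shown.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TRUE  (~0u) /* assumed boolean encoding: all bits set */
#define MODEL_FALSE 0u

/* Select x when the condition is true, y otherwise. */
static float model_bcsel(uint32_t cond, float x, float y)
{
    return cond ? x : y;
}

/* Boolean to float: true becomes 1.0, false becomes 0.0. */
static float model_b2f(uint32_t b)
{
    assert(b == MODEL_TRUE || b == MODEL_FALSE); /* well-formed only */
    return b ? 1.0f : 0.0f;
}

/* Bitwise not, which flips a well-formed boolean. */
static uint32_t model_inot(uint32_t b)
{
    return ~b;
}

/* Returns 1 if the select and its b2f replacement agree for input a. */
static int bcsel_matches_b2f(uint32_t a)
{
    return model_bcsel(a, 1.0f, 0.0f) == model_b2f(a) &&
           model_bcsel(a, 0.0f, 1.0f) == model_b2f(model_inot(a)) &&
           model_bcsel(a, -1.0f, -0.0f) == -model_b2f(a);
}
```

Replacing a select with a conversion removes the need to materialize the two float constants, which is consistent with the drop in instruction counts reported above.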
2017-01-09	nir: Convert ineg(b2i(a)) to a if it's a boolean.	Kenneth Graunke	1	-0/+2
	On BDW: total instructions in shared programs: 13071119 -> 13070371 (-0.01%) instructions in affected programs: 83424 -> 82676 (-0.90%) helped: 505 HURT: 45 (all TCS, all hurt by a single instruction) total cycles in shared programs: 256601322 -> 256588932 (-0.00%) cycles in affected programs: 819410 -> 807020 (-1.51%) helped: 450 HURT: 57 total loops in shared programs: 2950 -> 2942 (-0.27%) loops in affected programs: 8 -> 0 helped: 7 HURT: 0 v2: Drop unnecessary 'a@bool' annotation (Connor, Eric). Add a comment explaining the rule (Ian). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1] Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
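Why the rule holds can be seen in a few lines of C, again assuming booleans are encoded as 32-bit 0 or all ones. The helper names are hypothetical models of the opcodes, not NIR functions.

```c
#include <assert.h>
#include <stdint.h>

#define BOOL_TRUE  (~0u) /* assumed boolean encoding: all bits set */
#define BOOL_FALSE 0u

/* Boolean to integer: true becomes 1, false becomes 0. */
static uint32_t sketch_b2i(uint32_t b)
{
    assert(b == BOOL_TRUE || b == BOOL_FALSE); /* well-formed only */
    return b ? 1u : 0u;
}

/* Two's complement negation on a 32-bit value. */
static uint32_t sketch_ineg(uint32_t v)
{
    return 0u - v;
}

/* Negating 1 gives all ones and negating 0 gives 0, so the
 * composition returns exactly the boolean it started from. */
static uint32_t sketch_ineg_b2i(uint32_t a)
{
    return sketch_ineg(sketch_b2i(a));
}
```

Because the result is bit-identical to the input, both instructions can be dropped and the boolean used directly.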
2017-01-07	i965: Move TES input VUE map calculation out a layer.	Kenneth Graunke	3	-9/+11
	In Vulkan, we'll compile the TCS and TES at the same time, so I can just pass the TCS output VUE map to brw_compile_tes as the TES input VUE map. So, we only need to do this in GL. Move it to the GL-specific layer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07	i965: Pass NULL for gl_program when compiling TES.	Kenneth Graunke	1	-1/+1
	This isn't needed, and Vulkan doesn't have one. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07	i965: Move TES spacing/domain/topology setup to brw_compile_tes().	Kenneth Graunke	2	-33/+34
	Moving this down a layer lets us share code between Vulkan and GL. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07	i965: Access TES shader info via NIR.	Kenneth Graunke	1	-6/+6
	NIR exists in both GL and Vulkan, but gl_program is GL specific. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07	mesa: Introduce a compiler enum for tessellation spacing.	Kenneth Graunke	11	-47/+54
	It feels weird using GL_* enums in a Vulkan driver. v2: Fix the TESS_SPACING -> PIPE_TESS_SPACING conversion. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07	compiler: Change shader_info->tes.vertex_order into a ccw boolean.	Kenneth Graunke	4	-13/+7
	The vertex order is either clockwise or counterclockwise. We can just store a "ccw" boolean rather than GLenum values. I don't want to use GLenums in a Vulkan driver, and even in GL a simple boolean works fine. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07	anv/pipeline: Call NIR passes using NIR_PASS_V	Jason Ekstrand	1	-31/+15
	This lets us get validation without having to do it manually. Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-07	anv/pipeline: Only call remove_dead_variables once	Jason Ekstrand	1	-3/+3
	It can handle multiple modes at a time now so there's no reason to call it repeatedly. Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-07	Revert recent GLSL slot counting fiasco.	Kenneth Graunke	5	-62/+14
	I apparently broke mark_whole_variable in ir_set_program_inouts. It was passing a type that wasn't var->type, so the wrapper didn't work out. It's all broken, revert it and start over. Fixes all kinds of things on other drivers. Revert "glsl: Make is_fixed_function_array actually check for varyings." This reverts commit 42699e12711668a142b7acf11c168cf4301c1295. Revert "glsl: Mark whole variable used for ClipDistance and TessLevel." This reverts commit 5c580e64cc206ab160e1767c42e4d6c81f67da4d. Revert "glsl: Override the # of varying slots for ClipDistance and TessLevel." This reverts commit 8b5749f65ac434961308ccb579fb8a816e4f29d5. Revert "glsl: Create and use a new ir_variable::count_attribute_slots() wrapper." This reverts commit 6aa5cb34d03765b7be8611aa516bc201bd337f73.
2017-01-07	glsl: Make is_fixed_function_array actually check for varyings.	Kenneth Graunke	1	-0/+4
	We can't check VARYING_SLOT_* locations until we've determined that the variable is actually a varying. Fixes assert failures in drivers which actually use this path, such as radeonsi and i915. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99314 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-07	drirc: Allow extension midshader for Divinity: Original Sin (EE)	Kai Wasserbäch	1	-0/+4
	See also <https://bugs.freedesktop.org/show_bug.cgi?id=93551#c27> where this was first observed as a requirement. Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-07	glsl: fix opt_minmax redundancy checks against baserange	Timothy Arceri	1	-2/+2
	Marking operations as redundant if they are equal to the base range is fine when the tree structure is something like max(max(3, max(3, a)), b). But the opt falls apart with a tree like max(max(3, a), max(b, 3)). The problem is that both branches are treated the same: descending in the left branch will prune the constant, and then descending the right branch will prune the constant there as well, because limits[0] wasn't updated to take the change on the left branch into account, and so we still get [3,\infty) as baserange. In order to fix the bug we just disable the marking of redundant expressions when they match the baserange. NIR algebraic opt will clean up the first tree for us anyway, hopefully other backends are smart enough to do this also. Cc: "13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
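A small counterexample makes the failure on the second tree concrete. The functions below are written for this illustration only: they evaluate the original expression, the result of pruning the constant in both branches, and the result of pruning it in one branch.

```c
#include <assert.h>

static int max_i(int x, int y) { return x > y ? x : y; }

/* The second tree: max(max(3, a), max(b, 3)). */
static int original_expr(int a, int b)
{
    return max_i(max_i(3, a), max_i(b, 3));
}

/* Constant pruned in both branches: the lower bound of 3 is lost. */
static int pruned_both(int a, int b)
{
    return max_i(a, b);
}

/* Constant pruned in one branch only: still equivalent. */
static int pruned_one(int a, int b)
{
    return max_i(max_i(3, a), b);
}

/* With a = 1 and b = 2 the original yields 3, but pruning both
 * constants yields 2. */
void check_minmax_pruning(void)
{
    assert(original_expr(1, 2) == 3);
    assert(pruned_both(1, 2) == 2);
    assert(pruned_one(1, 2) == 3);
}
```

Pruning is only safe while at least one copy of the constant survives, which is what the stale baserange failed to guarantee.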
2017-01-06	i965/compiler: Use the new nir_opt_copy_prop_vars pass	Jason Ekstrand	1	-0/+1
	We run this after nir_lower_vars_to_ssa so that as many load/store_var intrinsics as possible are removed before copy_prop_vars executes. This is because the pass isn't particularly efficient (it does a lot of linear walks of a linked list) so we'd like as much of the work as possible to be done before copy_prop_vars runs. Shader DB results on Sky Lake: total instructions in shared programs: 12020290 -> 12013627 (-0.06%) instructions in affected programs: 26033 -> 19370 (-25.59%) helped: 16 HURT: 13 total cycles in shared programs: 137772848 -> 137549012 (-0.16%) cycles in affected programs: 6955660 -> 6731824 (-3.22%) helped: 217 HURT: 237 total loops in shared programs: 3208 -> 3208 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 4112 -> 4057 (-1.34%) spills in affected programs: 483 -> 428 (-11.39%) helped: 2 HURT: 0 total fills in shared programs: 5519 -> 5102 (-7.56%) fills in affected programs: 993 -> 576 (-41.99%) helped: 2 HURT: 0 LOST: 0 GAINED: 0 Broadwell had similar results. On older hardware, the impact isn't as large because they don't advertise GL 4.5. Of the hurt programs, all but one are hurt by a single instruction and the one is hurt by 3 instructions. All of the helped programs, on the other hand, are helped by at least 3 instructions and one Kerbal Space Program shader is helped by 44.59%. The real star of the show, however, is the Gl43CSDof synmark2 benchmark which has two shaders which are cut by 28% and 40% and the overall runtime performance of the benchmark on my Sky Lake laptop is improved by around 25-30% (it's a bit hard to be exact due to thermal throttling). Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	nir: Add a local variable-based copy propagation pass	Jason Ekstrand	3	-0/+816
	Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	nir/builder: Add a helper for getting the most recently added instruction	Jason Ekstrand	1	-0/+7
	Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	nir/builder: Add a load_deref_var helper	Jason Ekstrand	1	-0/+16
	Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	nir/dead_variables: Remove shader-local variables that are only written	Jason Ekstrand	1	-9/+60
	Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	nir/dead_variables: Removed shared variables when requested	Jason Ekstrand	1	-0/+3
	Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	anv/formats: Use the real format for B4G4R4A4_UNORM_PACK16 on gen8	Jason Ekstrand	1	-2/+2
	Because border color is handled pre-swizzle, when we move the alpha channel around in the format, the OPAQUE_BLACK border colors don't work correctly on B4G4R4A4_UNORM_PACK16 with the hack. This fixes the following Vulkan CTS tests on Broadwell: dEQP-VK.pipeline.sampler.view_type.2d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.1d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.2d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.1d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.3d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2017-01-06	isl: Mark A4B4G4R4_UNORM as supported on gen8	Jason Ekstrand	1	-1/+4
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "13.0" <mesa-dev@lists.freedesktop.org>
2017-01-07	radv: fix depth transitions with layerCount = VK_REMAINING_ARRAY_LAYERS	Pierre-Loup A. Griffais	1	-1/+1
	Interpreting layerCount literally would try to create billions of image views in radv_process_depth_image_inplace(). Signed-off-by: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
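The sentinel handling can be sketched as follows. The constant has the same value as `VK_REMAINING_ARRAY_LAYERS` in the Vulkan headers but is redefined here so the snippet compiles on its own; the function name and parameters are hypothetical and are not radv's.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as VK_REMAINING_ARRAY_LAYERS; redefined for this sketch. */
#define SKETCH_REMAINING_ARRAY_LAYERS (~0u)

/* Turn a subresource range's layerCount into a real layer count.
 * array_size is the image's total number of array layers. */
static uint32_t resolve_layer_count(uint32_t array_size,
                                    uint32_t base_layer,
                                    uint32_t layer_count)
{
    assert(base_layer < array_size);

    /* The sentinel means "every layer from base_layer to the end". */
    if (layer_count == SKETCH_REMAINING_ARRAY_LAYERS)
        return array_size - base_layer;

    return layer_count;
}
```

Without the check, a loop over `layer_count` would run roughly four billion times, which matches the behavior described above.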
2017-01-06	i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.	Kenneth Graunke	10	-364/+92
	Treating everything as scalar arrays allows us to drop a bunch of special case input/output munging all throughout the backend. Instead, we just need to remap the TessLevel components to the appropriate patch URB header locations in remap_patch_urb_offsets(). We also switch to treating the TES input versions of these as ordinary shader inputs rather than system values, as remap_patch_urb_offsets() just makes everything work out without special handling. This regresses one Piglit test: arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_SHADER-array-at-limit The compiler starts promoting the constant arrays assigned to gl_TessLevel* to uniform arrays. Since the shader also has a uniform array that uses the maximum number of uniform components, this puts it over the uniform component limit enforced by the linker. This is arguably a bug in the constant array promotion code (it should avoid pushing us over limits), but is unlikely to penalize any real application. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06	i965: Inline store_output helper in quads workaround code.	Kenneth Graunke	1	-14/+10
	It's only used in one place, it ignores the offset parameter currently, and I want to add more parameters...at which point, passing in a bunch of integers seems less obvious than writing it out. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06	nir: Make glsl_to_nir compact scalar TessLevel arrays.	Kenneth Graunke	1	-1/+12
	Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06	i965: Make unify_interfaces not spread VARYING_BIT_TESS_LEVEL_*.	Kenneth Graunke	1	-2/+5
	This is harmless today because gl_TessLevelInner/Outer in the TES are currently treated as system values. However, when we move to treating them as inputs, this would cause a bug: with no TCS present, it would propagate TES reads of VARYING_SLOT_TESS_LEVEL into the VS output VUE map slots. This is totally bogus - those don't even exist in the VS. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06	glsl: Support gl_TessLevelInner/Outer[] as TES input variables.	Kenneth Graunke	2	-4/+17
	Upcoming reworks in i965 are going to make it easy to handle this like any other input. Having it as a system value will just require additional code for no benefit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	glsl: Mark whole variable used for ClipDistance and TessLevel*.	Kenneth Graunke	1	-3/+23
	There's no point in trying to mark partial array access for gl_ClipDistance, gl_TessLevelOuter, or gl_TessLevelInner - they're special built-in variables that control fixed function hardware, and will likely be used in an all-or-nothing fashion. Since these arrays only occupy 1-2 varying slots, we have to avoid our normal processing which increments the slot value by the array index. (I wrote this code before i965 switched from ir_set_program_inouts to nir_shader_gather_info. It's not used by anyone today, and I'm not sure how valuable it is...the alternative to GLSL IR lowering is NIR compact arrays, at which point you should use nir_gather_info.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	glsl: Override the # of varying slots for ClipDistance and TessLevel*.	Kenneth Graunke	1	-0/+18
	Right now, this shouldn't have any effect, as all drivers use LowerClipDist and LowerTessFactors to turn the float[] arrays into vectors. However, it should help make it possible for drivers to avoid that lowering. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	glsl: Create and use a new ir_variable::count_attribute_slots() wrapper.	Kenneth Graunke	5	-11/+17
	This wraps glsl_type::count_attribute_slots(), but will soon contain overrides for a couple of GLSL built-in variables. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06	gallium/radeon: use the internal clear_buffer callback to fix r600g	Marek Olšák	1	-1/+3
	r600g doesn't set pipe_context::clear_buffer. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99303 Reviewed-by: Alex Deucher <alexander.deucher@amd.com>