~nh/mesa - nh's Mesa repository; mostly radeonsi related development

Age	Commit message	Author	Files	Lines
2017-09-16	radeonsi: emit DLDEXP and DFRACEXP TGSI opcodes	Nicolai Hähnle	2	-1/+26
	Note: this causes spurious regressions in some current piglit tests, because the tests incorrectly assume that there is no denorm support for doubles. I'm going to send out a fix for those tests as well.
2017-09-16	radeonsi: emit LDEXP opcode	Nicolai Hähnle	2	-1/+3
	The LLVM intrinsic has existed for a long time. The current name was established in LLVM 3.9.
2017-09-16	st/glsl_to_tgsi: use LDEXP when available	Nicolai Hähnle	1	-3/+7

2017-09-16	gallium: add LDEXP TGSI instruction and corresponding cap	Nicolai Hähnle	20	-3/+50

2017-09-16	tgsi: infer that dst[1] of DFRACEXP is an integer	Nicolai Hähnle	5	-6/+9

2017-09-16	gallivm: add support for TGSI instructions with two outputs	Nicolai Hähnle	3	-1/+31

2017-09-16	gallivm: add dst register index to lp_build_tgsi_context::emit_store	Nicolai Hähnle	6	-20/+27

2017-09-16	tgsi: clarify the semantics of DFRACEXP	Nicolai Hähnle	4	-22/+20
	The status quo is quite the mess:
	1. tgsi_exec will do a per-channel computation, and store the dst[0] result (significand) correctly for each channel. The dst[1] result (exponent) will be written to the first bit set in the writemask. So per-component calculation only works partially.
	2. r600 will only do a single computation. It will replicate the exponent but not the significand.
	3. The docs pretend that there's per-component calculation, but even get dst[0] and dst[1] confused.
	4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions, and kind-of assumes that everything is replicated, generating this for the dvec4 case:
	DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
	DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
	DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
	DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw
	Settle on the simplest behavior, which is single-component calculation with replication, document it, and adjust tgsi_exec and r600.
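The "single-component calculation with replication" behavior described above can be sketched in C roughly as follows. This is an illustration only, not the tgsi_exec or r600 code: the struct names, the function name and the writemask encoding (one bit per double channel for dst[0], one bit per integer channel for dst[1]) are assumptions made for the example, and the C library's frexp() stands in for the actual computation.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-ins for a TGSI double destination (two double
 * channels, xy and zw) and an integer destination (four channels). */
struct dvec2_dst { double chan[2]; };
struct ivec4_dst { int chan[4]; };

/* One frexp() on the first source double (after swizzling). The
 * significand is replicated to every enabled dst[0] channel and the
 * exponent to every enabled dst[1] channel. */
static void
dfracexp_replicated(struct dvec2_dst *dst0, unsigned mask0,
                    struct ivec4_dst *dst1, unsigned mask1,
                    double src_x)
{
   int exponent;
   double significand = frexp(src_x, &exponent);

   /* mask0 has one bit per double channel, mask1 one bit per int channel. */
   assert(mask0 <= 0x3 && mask1 <= 0xf);

   for (unsigned i = 0; i < 2; i++) {
      if (mask0 & (1u << i))
         dst0->chan[i] = significand;
   }

   for (unsigned i = 0; i < 4; i++) {
      if (mask1 & (1u << i))
         dst1->chan[i] = exponent;
   }
}
```

With this model, the second instruction of the dvec4 sequence above corresponds to a call with mask0 = 0x2 (the zw pair), mask1 = 0x2 (the y channel) and the first double of the swizzled source.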
2017-09-16	tgsi: fix the documentation of DLDEXP	Nicolai Hähnle	1	-1/+1
	Sourcing the exponent for the zw destination pair from Z is consistent with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always generates per-channel instructions anyway.
2017-09-16	tgsi: infer that DLDEXP's second source has an integer type	Nicolai Hähnle	4	-7/+11

2017-09-16	glsl/lower_instruction: handle denorms and overflow in ldexp correctly	Nicolai Hähnle	1	-64/+107
	GLSL ES requires both, and while GLSL explicitly doesn't require correct overflow handling, it does appear to require handling input inf/denorms correctly. Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.* Cc: mesa-stable@lists.freedesktop.org
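The edge cases named in this message can be illustrated with the C library's ldexpf(). The sketch below assumes that ldexpf() produces the results a lowered GLSL ldexp() is expected to match in these cases; the function name is made up for the example, and nothing here reflects how lower_instruction actually implements the lowering.

```c
#include <assert.h>
#include <math.h>

/* Checks reference results for denormal inputs, denormal results,
 * overflow and infinite inputs, using single-precision floats. */
static void
check_ldexp_reference_cases(void)
{
   const float min_denorm = 0x1p-149f; /* smallest positive float */

   /* Denormal input scaled up into the normal range. */
   assert(ldexpf(min_denorm, 149) == 1.0f);

   /* Normal input scaled down to a denormal, not flushed to zero. */
   assert(ldexpf(1.0f, -149) == min_denorm);

   /* A result beyond the largest finite float overflows to infinity. */
   assert(isinf(ldexpf(1.0f, 128)));

   /* An infinite input stays infinite whatever the exponent is. */
   assert(isinf(ldexpf(INFINITY, -10)));
}
```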
2017-09-13	st/glsl_to_tgsi: remove unused code in temprename	Nicolai Hähnle	1	-15/+1
	Reviewed-By: Gert Wollny <gw.fossdev@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-13	st/glsl_to_tgsi: be precise about merging scopes	Nicolai Hähnle	1	-2/+2
	enclosing_scope already contains enclosing_scope_first_read. What we really want to check here -- not for correctness, but for speed -- is whether last_read_scope already contains enclosing_scope. Reviewed-By: Gert Wollny <gw.fossdev@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-13	ac/surface: match Z and stencil tile config	Nicolai Hähnle	1	-7/+42
	Fixes various piglit tests on Stoney, see the comment. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	ac/surface: sanity-check that we got a TC-compatible HTILE if requested	Nicolai Hähnle	1	-0/+6
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	ac/addrlib: enable assertions in debug builds	Nicolai Hähnle	1	-9/+17
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	ac/addrlib: relax an assertion	Nicolai Hähnle	1	-1/+2
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	ac/addrlib: relax an assertion	Nicolai Hähnle	1	-1/+1
	This assertion is triggered on Stoney in Piglit ./bin/framebuffer-blit-levels {draw,read} stencil -auto -fbo and similar tests. It should be harmless -- just relax it until we can get internal clarification. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers	Nicolai Hähnle	3	-1/+33
	The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fall back to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>
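The alternative that this commit mentions but does not adopt (a fixed table of 16 sample positions, indexed with the sample ID ANDed with 0xf) might look roughly like the sketch below. All names are hypothetical and the positions are plain floats rather than whatever layout the driver uploads.

```c
#include <assert.h>

#define MAX_SAMPLE_SLOTS 16 /* hypothetical fixed table size */

struct sample_pos { float x, y; };

/* Fill all 16 slots, repeating the real positions when the actual
 * number of samples is smaller. */
static void
fill_sample_table(struct sample_pos table[MAX_SAMPLE_SLOTS],
                  const struct sample_pos *positions, unsigned num_samples)
{
   assert(num_samples >= 1 && num_samples <= MAX_SAMPLE_SLOTS);

   for (unsigned i = 0; i < MAX_SAMPLE_SLOTS; i++)
      table[i] = positions[i % num_samples];
}

/* Masking with 0xf keeps any sample ID inside the table. */
static struct sample_pos
lookup_sample_pos(const struct sample_pos table[MAX_SAMPLE_SLOTS],
                  unsigned sample_id)
{
   return table[sample_id & 0xf];
}
```

Without multisample buffers the table would hold the pixel center in every slot, so any sample ID resolves to the center. The cost is the extra AND and the larger upload on every use, which is the penalty on well-behaved users that the message refers to.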
2017-09-13	radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog	Nicolai Hähnle	3	-5/+76
	gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
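For the simplest case, full per-sample shading with one invocation per sample, the masking can be sketched as below. This is an assumption-laden illustration, not the PS prolog code: the function name is made up, and other sample shading rates, where one invocation covers several samples, would need a wider mask.

```c
#include <assert.h>
#include <stdint.h>

/* pixel_coverage: all covered samples of the pixel, as described for
 * the hardware-initialized value. Returns only the bit that belongs to
 * the sample handled by the current invocation. */
static uint32_t
sample_mask_in_per_sample(uint32_t pixel_coverage, unsigned sample_id)
{
   assert(sample_id < 32);
   return pixel_coverage & (UINT32_C(1) << sample_id);
}
```

For example, with a pixel coverage of 0xb (samples 0, 1 and 3 covered), the invocation for sample 1 sees 0x2 and the invocation for sample 2 sees 0.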
2017-09-13	radeonsi: remove SET_PREDICATION workaround on newer firmware	Nicolai Hähnle	1	-2/+4
	We need to keep the workaround for older firmware, though. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	amd/common: get ME/PFP/CE firmware feature versions as well	Nicolai Hähnle	3	-4/+12
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	radeonsi: rename variable to clarify its meaning	Nicolai Hähnle	1	-10/+10
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	radeonsi: make si_init_shader_selector_async static	Nicolai Hähnle	2	-2/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	radeonsi: fix segfault in descriptor dumping	Nicolai Hähnle	1	-0/+18
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	ddebug: write out final driver log messages with GALLIUM_DDEBUG=always	Nicolai Hähnle	3	-2/+15
	If the last operation happens to be a non-draw, such as a transfer_map that triggers a decompress blit, there may be interesting messages left in the driver log. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	swr/rast: Fetch compile state changes	Tim Rowley	3	-6/+15
	Add InstanceStrideEnable field and rename InstanceDataStepRate to InstanceAdvancementState in INPUT_ELEMENT_DESC structure. Add stubs for handling InstanceStrideEnable in FetchJit::JitLoadVertices() and FetchJit::JitGatherVertices() and assert if they are triggered. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: adjust linux cpu topology identification code	Tim Rowley	1	-43/+38
	Make it more robust so that it handles strange configurations, such as a VMware-exported 4-way NUMA x 1-core configuration. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: Missed conversion to SIMD_T	Tim Rowley	1	-1/+1
	Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: whitespace changes	Tim Rowley	1	-0/+2
	Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: add graph write to jit debug output	Tim Rowley	1	-3/+3
	Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: Migrate memory pointers to gfxptr_t type	Tim Rowley	9	-36/+36
	Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: Remove hardcoded clip/cull slot from clipper	Tim Rowley	1	-14/+21
	Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: Start to remove hardcoded clipcull_dist vertex attrib slot	Tim Rowley	3	-8/+15
	Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the start of the clip/cull section of the vertex header. Removed use of hardcoded slot from binner. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: Move clip/cull enables in API	Tim Rowley	9	-40/+40
	Moved from SWR_RASTSTATE to SWR_BACKEND_STATE. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	swr/rast: Add new API SwrStallBE	Tim Rowley	2	-0/+17
	SwrStallBE stalls the backend threads until all work submitted before the stall has finished. The frontend threads can continue to make forward progress. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13	glsl: compile unused function out	Eric Engestrom	1	-0/+2
	The function is only called from one place, which is hidden behind the same `#ifdef DEBUG`. Fixes: ca73c3358c91434e68ab "glsl: Mark functions static" Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-09-13	radv: compile out unused code	Eric Engestrom	1	-0/+2
	Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-13	radv: clear push_constant_stages when resetting a command buffer	Samuel Pitoiset	1	-0/+1
	Per the spec: "Resetting a command buffer is an operation that discards any previously recorded commands and puts a command buffer in the initial state." As far as I'm concerned, that flag can be changed by calling VkCmdPushConstants() (or any other function that updates it), so it should be cleared as well. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-13	radv: add more radv_emit_XXX() helpers for the dynamic state	Samuel Pitoiset	1	-40/+77
	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-13	radv: remove useless 'cmd_buffer' param from radv_buffer_view_init()	Samuel Pitoiset	4	-7/+5
	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-13	radv/gfx9: fix image resource handling.	Dave Airlie	1	-8/+19
	GFX9 changes how images are laid out, so this needs updating. Fixes: dEQP-VK.query_pool.statistics_query.* Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13	radv/ac: bump params array for image atomic comp swap	Dave Airlie	1	-1/+1
	For the comp_swap case this was overflowing and crashing sometimes. Fixes: dEQP-VK.image.atomic_operations.compare_exchange.* Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13	radv/gfx9: set mip0-depth correctly for 2d arrays/3d images	Dave Airlie	1	-2/+2
	This field covers the whole resource. Fixes: dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.3d.format.* dEQP-VK.texture.filtering.3d.combinations.* Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13	radv: handle GFX9 1D textures	Dave Airlie	2	-14/+76
	As GFX9 can't handle 1D depth textures, radeonsi and apparently pro just update all 1D textures to 2D, and work around it. This ports the workarounds from radeonsi. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13	radv: don't use iview for meta image width/height.	Dave Airlie	2	-13/+21
	Work out the width/height from the level manually, as on GFX9 we won't minify the iview width/height. This fixes: dEQP-VK.api.image_clearing.core.clear_color_image* on gfx9 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-12	intel/eu/validate: Look up types on demand in execution_type()	Jason Ekstrand	1	-4/+2
	We are looking up the execution type prior to checking how many sources we have. This leads to looking for a type for src1 on MOV instructions which is bogus. On BDW+, the src1 register type overlaps with the 64-bit immediate and causes us problems. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
2017-09-12	Revert "winsys/amdgpu: disable local BOs on Raven"	Marek Olšák	1	-2/+1
	This reverts commit 1cda9a2fee05effd9c64bd773bc6005281593662. It works now.
2017-09-12	radv: Don't allocate CMASK for linear images.	Bas Nieuwenhuizen	1	-1/+3
	We can't use it anyway in fast clears, and on GFX9 it seems to actually hang the card if we specify it. Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
2017-09-12	radv: Disable multilayer & multilevel DCC.	Bas Nieuwenhuizen	1	-0/+1
	The current DCC init routine doesn't account for initializing a single layer or level. Multilayer seems hard for small textures on pre-GFX9, as the metadata for the layers can be interleaved. For GFX9, multilevel textures are a problem for similar reasons. So just disable this for now, until we handle the texture modes correctly. Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"