Age | Commit message | Author | Files | Lines
2017-09-16 | radeonsi: emit DLDEXP and DFRACEXP TGSI opcodes [ldexp] | Nicolai Hähnle | 2 | -1/+26
Note: this causes spurious regressions in some current piglit tests, because the tests incorrectly assume that there is no denorm support for doubles. I'm going to send out a fix for those tests as well.
2017-09-16 | radeonsi: emit LDEXP opcode | Nicolai Hähnle | 2 | -1/+3
The LLVM intrinsic has existed for a long time. The current name was established in LLVM 3.9.
2017-09-16 | st/glsl_to_tgsi: use LDEXP when available | Nicolai Hähnle | 1 | -3/+7
2017-09-16 | gallium: add LDEXP TGSI instruction and corresponding cap | Nicolai Hähnle | 20 | -3/+50
2017-09-16 | tgsi: infer that dst[1] of DFRACEXP is an integer | Nicolai Hähnle | 5 | -6/+9
2017-09-16 | gallivm: add support for TGSI instructions with two outputs | Nicolai Hähnle | 3 | -1/+31
2017-09-16 | gallivm: add dst register index to lp_build_tgsi_context::emit_store | Nicolai Hähnle | 6 | -20/+27
2017-09-16 | tgsi: clarify the semantics of DFRACEXP | Nicolai Hähnle | 4 | -22/+20
The status quo is quite the mess:

1. tgsi_exec will do a per-channel computation, and store the dst[0] result (significand) correctly for each channel. The dst[1] result (exponent) will be written to the first bit set in the writemask. So per-component calculation only works partially.
2. r600 will only do a single computation. It will replicate the exponent but not the significand.
3. The docs pretend that there's per-component calculation, but even get dst[0] and dst[1] confused.
4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions, and kind-of assumes that everything is replicated, generating this for the dvec4 case:

    DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
    DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
    DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
    DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw

Settle on the simplest behavior, which is single-component calculation with replication, document it, and adjust tgsi_exec and r600.
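The "single-component calculation with replication" behavior can be illustrated with a small C sketch. This is not the tgsi_exec or r600 code; the struct and function names are hypothetical, and the writemask (which selects the channels that are actually stored) is left out for brevity. It assumes the significand/exponent split matches C's frexp().

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical model of the two DFRACEXP destinations. */
struct dfracexp_result {
    double  significand[2]; /* dst[0]: the xy and zw double slots */
    int32_t exponent[4];    /* dst[1]: the x, y, z, w integer channels */
};

/* One computation on a single double source, replicated to every channel. */
static struct dfracexp_result dfracexp_replicated(double src)
{
    struct dfracexp_result r;
    int exp;
    double sig = frexp(src, &exp); /* src == sig * 2^exp */

    for (int i = 0; i < 2; i++)
        r.significand[i] = sig;
    for (int i = 0; i < 4; i++)
        r.exponent[i] = exp;
    return r;
}
```

With this model, each of the four instructions in the dvec4 example performs one such computation and relies on the writemask to pick the destination channels.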
2017-09-16 | tgsi: fix the documentation of DLDEXP | Nicolai Hähnle | 1 | -1/+1
Sourcing the exponent for the zw destination pair from Z is consistent with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always generates per-channel instructions anyway.
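A rough C model of the documented channel selection, assuming the xy pair takes its exponent from X in the same way the zw pair takes it from Z. The types and the function name are hypothetical and only serve the illustration.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical model: a register holding two doubles (xy and zw pairs). */
struct dvec2 {
    double xy;
    double zw;
};

/* src1 holds four integer channels: x, y, z, w. */
static struct dvec2 dldexp_pairs(struct dvec2 src0, const int32_t src1[4])
{
    struct dvec2 dst;
    dst.xy = ldexp(src0.xy, src1[0]); /* exponent from X (assumed) */
    dst.zw = ldexp(src0.zw, src1[2]); /* exponent from Z, as documented */
    return dst;
}
```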
2017-09-16 | tgsi: infer that DLDEXP's second source has an integer type | Nicolai Hähnle | 4 | -7/+11
2017-09-16 | glsl/lower_instruction: handle denorms and overflow in ldexp correctly | Nicolai Hähnle | 1 | -64/+107
GLSL ES requires both, and while GLSL explicitly doesn't require correct overflow handling, it does appear to require handling input inf/denorms correctly. Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.* Cc: mesa-stable@lists.freedesktop.org
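The edge cases in question can be pinned down using C's ldexpf() as a reference. This is only a sketch of the expected results, not the lowering pass itself; the helper names are made up for the example.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True if x is a nonzero denormal (subnormal) float. */
static bool is_denorm(float x)
{
    float mag = x < 0.0f ? -x : x;
    return mag != 0.0f && mag < FLT_MIN;
}

/* Result falls below the normal range: 2^-130 is a denormal. */
static float denorm_result(void)   { return ldexpf(1.0f, -130); }

/* Denormal input (smallest positive float) scaled back up to 1.0. */
static float denorm_input(void)    { return ldexpf(0x1p-149f, 149); }

/* Result exceeds the float range and overflows to infinity. */
static float overflow_result(void) { return ldexpf(1.0f, 200); }

/* Infinite input stays infinite regardless of the exponent. */
static float inf_input(void)       { return ldexpf(INFINITY, -10); }
```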
2017-09-13 | st/glsl_to_tgsi: remove unused code in temprename | Nicolai Hähnle | 1 | -15/+1
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-13 | st/glsl_to_tgsi: be precise about merging scopes | Nicolai Hähnle | 1 | -2/+2
enclosing_scope already contains enclosing_scope_first_read. What we really want to check here -- not for correctness, but for speed -- is whether last_read_scope already contains enclosing_scope. Reviewed-By: Gert Wollny <gw.fossdev@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-13 | ac/surface: match Z and stencil tile config | Nicolai Hähnle | 1 | -7/+42
Fixes various piglit tests on Stoney, see the comment. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | ac/surface: sanity-check that we got a TC-compatible HTILE if requested | Nicolai Hähnle | 1 | -0/+6
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | ac/addrlib: enable assertions in debug builds | Nicolai Hähnle | 1 | -9/+17
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | ac/addrlib: relax an assertion | Nicolai Hähnle | 1 | -1/+2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | ac/addrlib: relax an assertion | Nicolai Hähnle | 1 | -1/+1
This assertion is triggered on Stoney in Piglit ./bin/framebuffer-blit-levels {draw,read} stencil -auto -fbo and similar tests. It should be harmless -- just relax it until we can get internal clarification. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers | Nicolai Hähnle | 3 | -1/+33
The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fall back to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>
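For reference, the alternative that the commit message describes and rejects could look roughly like the following. This is a sketch only: the struct, the function names, and the assumption that the sample count divides 16 are all illustrative and not taken from radeonsi.

```c
/* Hypothetical sample position within a pixel, in [0, 1). */
struct sample_pos {
    float x;
    float y;
};

/* Fill a fixed 16-entry table by repeating the n actual positions.
 * Assumes n is 1, 2, 4, 8 or 16. */
static void fill_sample_table(struct sample_pos table[16],
                              const struct sample_pos *pos, unsigned n)
{
    for (unsigned i = 0; i < 16; i++)
        table[i] = pos[i % n];
}

/* Any sample ID maps to a valid entry after masking with 0xf. */
static struct sample_pos lookup_sample(const struct sample_pos table[16],
                                       unsigned sample_id)
{
    return table[sample_id & 0xf];
}
```

Without multisampling, n is 1 and the single entry is the pixel center, so every sample ID resolves to the center. The cost is the extra AND and the larger upload for every user of interpolateAtSample, which is the drawback the commit message cites.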
2017-09-13 | radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog | Nicolai Hähnle | 3 | -5/+76
gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
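In the simplest case, where each fragment shader invocation covers exactly one sample, the masking amounts to the following. This is a minimal sketch under that assumption; the function name is hypothetical and the actual prolog code may handle invocations that cover several samples.

```c
#include <stdint.h>

/* Reduce the per-pixel coverage loaded by the hardware to the single
 * sample this invocation is responsible for (sample_id < 32 assumed). */
static uint32_t mask_sample_mask_in(uint32_t pixel_coverage,
                                    unsigned sample_id)
{
    return pixel_coverage & (UINT32_C(1) << sample_id);
}
```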
2017-09-13 | radeonsi: remove SET_PREDICATION workaround on newer firmware | Nicolai Hähnle | 1 | -2/+4
We need to keep the workaround for older firmware, though. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | amd/common: get ME/PFP/CE firmware feature versions as well | Nicolai Hähnle | 3 | -4/+12
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | radeonsi: rename variable to clarify its meaning | Nicolai Hähnle | 1 | -10/+10
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | radeonsi: make si_init_shader_selector_async static | Nicolai Hähnle | 2 | -2/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | radeonsi: fix segfault in descriptor dumping | Nicolai Hähnle | 1 | -0/+18
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | ddebug: write out final driver log messages with GALLIUM_DDEBUG=always | Nicolai Hähnle | 3 | -2/+15
If the last operation happens to be a non-draw, such as a transfer_map that triggers a decompress blit, there may be interesting messages left in the driver log. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | swr/rast: Fetch compile state changes | Tim Rowley | 3 | -6/+15
Add InstanceStrideEnable field and rename InstanceDataStepRate to InstanceAdvancementState in INPUT_ELEMENT_DESC structure. Add stubs for handling InstanceStrideEnable in FetchJit::JitLoadVertices() and FetchJit::JitGatherVertices() and assert if they are triggered. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: adjust linux cpu topology identification code | Tim Rowley | 1 | -43/+38
Make more robust to handle strange configurations like a vmware exported 4-way numa X 1-core configuration. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: Missed conversion to SIMD_T | Tim Rowley | 1 | -1/+1
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: whitespace changes | Tim Rowley | 1 | -0/+2
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: add graph write to jit debug output | Tim Rowley | 1 | -3/+3
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: Migrate memory pointers to gfxptr_t type | Tim Rowley | 9 | -36/+36
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: Remove hardcoded clip/cull slot from clipper | Tim Rowley | 1 | -14/+21
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: Start to remove hardcoded clipcull_dist vertex attrib slot | Tim Rowley | 3 | -8/+15
Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the start of the clip/cull section of the vertex header. Removed use of hardcoded slot from binner. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: Move clip/cull enables in API | Tim Rowley | 9 | -40/+40
Moved from SWR_RASTSTATE to SWR_BACKEND_STATE. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | swr/rast: Add new API SwrStallBE | Tim Rowley | 2 | -0/+17
SwrStallBE stalls the backend threads until all work submitted before the stall has finished. The frontend threads can continue to make forward progress. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-13 | glsl: compile unused function out | Eric Engestrom | 1 | -0/+2
The function is only called from one place, which is hidden behind the same `#ifdef DEBUG`. Fixes: ca73c3358c91434e68ab "glsl: Mark functions static" Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-09-13 | radv: compile out unused code | Eric Engestrom | 1 | -0/+2
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-13 | radv: clear push_constant_stages when resetting a command buffer | Samuel Pitoiset | 1 | -0/+1
Per the spec: "Resetting a command buffer is an operation that discards any previously recorded commands and puts a command buffer in the initial state." As far as I'm concerned, that flag can be changed by calling vkCmdPushConstants() (or any other functions which update it), so it should be cleared as well. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-13 | radv: add more radv_emit_XXX() helpers for the dynamic state | Samuel Pitoiset | 1 | -40/+77
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-13 | radv: remove useless 'cmd_buffer' param from radv_buffer_view_init() | Samuel Pitoiset | 4 | -7/+5
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-13 | radv/gfx9: fix image resource handling. | Dave Airlie | 1 | -8/+19
GFX9 changes how images are laid out, so this needs updating. Fixes: dEQP-VK.query_pool.statistics_query.* Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13 | radv/ac: bump params array for image atomic comp swap | Dave Airlie | 1 | -1/+1
For the comp_swap case this was overflowing and crashing sometimes. Fixes: dEQP-VK.image.atomic_operations.compare_exchange.* Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13 | radv/gfx9: set mip0-depth correctly for 2d arrays/3d images | Dave Airlie | 1 | -2/+2
This field covers the whole resource. Fixes: dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.3d.format.* dEQP-VK.texture.filtering.3d.combinations.* Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13 | radv: handle GFX9 1D textures | Dave Airlie | 2 | -14/+76
As GFX9 can't handle 1D depth textures, radeonsi and apparently pro just update all 1D textures to 2D, and work around it. This ports the workarounds from radeonsi. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13 | radv: don't use iview for meta image width/height. | Dave Airlie | 2 | -13/+21
Work out the width/height from the level manually, as on GFX9 we won't minify the iview width/height. This fixes: dEQP-VK.api.image_clearing.core.clear_color_image* on gfx9 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
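Working out a mip level's size "manually" usually means shifting the base dimension right by the level and clamping at 1. A minimal sketch of that arithmetic follows; the function name is hypothetical and not necessarily the helper radv uses.

```c
/* Size of one dimension at a given mip level, never smaller than 1.
 * Assumes level is smaller than the bit width of unsigned. */
static unsigned minify_dim(unsigned base, unsigned level)
{
    unsigned v = base >> level;
    return v > 0 ? v : 1;
}
```

For a 256-wide image this gives 32 at level 3, and the clamp keeps small images at 1 once the shift would otherwise reach 0.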
2017-09-12 | intel/eu/validate: Look up types on demand in execution_type() | Jason Ekstrand | 1 | -4/+2
We are looking up the execution type prior to checking how many sources we have. This leads to looking for a type for src1 on MOV instructions which is bogus. On BDW+, the src1 register type overlaps with the 64-bit immediate and causes us problems. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
2017-09-12 | Revert "winsys/amdgpu: disable local BOs on Raven" | Marek Olšák | 1 | -2/+1
This reverts commit 1cda9a2fee05effd9c64bd773bc6005281593662. It works now.
2017-09-12 | radv: Don't allocate CMASK for linear images. | Bas Nieuwenhuizen | 1 | -1/+3
We can't use it anyway in fast clears, and on GFX9 it seems to actually hang the card if we specify it. Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
2017-09-12 | radv: Disable multilayer & multilevel DCC. | Bas Nieuwenhuizen | 1 | -0/+1
The current DCC init routine doesn't account for initializing a single layer or level. Multilayer seems hard for small textures on pre-GFX9 as the metadata for the layers can be interleaved. For GFX9 multilevel textures are a problem for similar reasons. So just disable this for now, until we handle the texture modes correctly. Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"