path: root/src/gallium/drivers/radeonsi
Age | Commit message | Author | Files | Lines
2017-09-16 | gallium: add LDEXP TGSI instruction and corresponding cap | Nicolai Hähnle | 1 | -0/+1
2017-09-16 | tgsi: infer that dst[1] of DFRACEXP is an integer | Nicolai Hähnle | 1 | -1/+1
2017-09-16 | gallivm: add dst register index to lp_build_tgsi_context::emit_store | Nicolai Hähnle | 3 | -11/+18
2017-09-13 | radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers | Nicolai Hähnle | 3 | -1/+33
The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fall back to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>
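The alternative that this commit message describes but does not adopt can be pictured with a small CPU-side sketch. Everything in it is illustrative: `struct vec2`, `fill_sample_positions` and `lookup_sample_position` are hypothetical names rather than radeonsi functions, and a real driver would perform the lookup in shader code rather than in C.

```c
#include <assert.h>

/* Hypothetical 2D sample position; not a radeonsi type. */
struct vec2 {
    float x, y;
};

#define MAX_SAMPLES 16

/* Always fill all 16 slots, repeating the real positions when fewer
 * than 16 samples exist. */
static void fill_sample_positions(struct vec2 table[MAX_SAMPLES],
                                  const struct vec2 *positions,
                                  unsigned num_samples)
{
    assert(num_samples >= 1 && num_samples <= MAX_SAMPLES);
    for (unsigned i = 0; i < MAX_SAMPLES; i++)
        table[i] = positions[i % num_samples];
}

/* ANDing with 0xf keeps any sample ID inside the 16-entry table. */
static struct vec2 lookup_sample_position(const struct vec2 table[MAX_SAMPLES],
                                          unsigned sample_id)
{
    return table[sample_id & 0xf];
}
```

With a single position (the pixel center), every sample ID resolves to the center, which matches the GLSL rule for the non-multisample case. The cost is the extra AND and the larger upload for every user of interpolateAtSample, which is the drawback the commit message points out.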
2017-09-13 | radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog | Nicolai Hähnle | 3 | -5/+76
gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
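The masking idea can be modelled in a few lines of C. This is a sketch under a simplifying assumption: full per-sample shading, where each invocation covers exactly one sample. The function name is hypothetical, and the actual prolog is generated shader code that may also have to handle invocations covering several samples, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* coverage:  bits for all samples covered at this pixel, as the hardware
 *            loads them into the VGPR.
 * sample_id: the sample this invocation runs for (assumed < 32).
 * Returns only the bit that belongs to this invocation. */
static uint32_t sample_mask_in_for_invocation(uint32_t coverage,
                                              unsigned sample_id)
{
    assert(sample_id < 32);
    return coverage & (UINT32_C(1) << sample_id);
}
```

For a pixel with coverage 0xb (samples 0, 1 and 3), the invocation for sample 1 would see 0x2, and an invocation for the uncovered sample 2 would see 0.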
2017-09-13 | radeonsi: rename variable to clarify its meaning | Nicolai Hähnle | 1 | -10/+10
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | radeonsi: make si_init_shader_selector_async static | Nicolai Hähnle | 2 | -2/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 | radeonsi: fix segfault in descriptor dumping | Nicolai Hähnle | 1 | -0/+18
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-11 | radeonsi: optimize TCS epilog when invocation 0 writes tess factors | Marek Olšák | 4 | -28/+89
This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-08 | radeonsi: move the guts of ARB_shader_group_vote emission to ac | Connor Abbott | 1 | -21/+3
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 | radeonsi: move si_emit_ballot() to ac | Connor Abbott | 1 | -32/+6
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 | radeonsi: move emit_optimization_barrier() to ac | Connor Abbott | 1 | -43/+2
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 | radeonsi: move llvm_get_type_size() to ac | Connor Abbott | 1 | -34/+9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-07 | ac/surface: add radeon_surf::has_stencil for convenience | Marek Olšák | 3 | -6/+6
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPR | Marek Olšák | 1 | -6/+14
Same as before, writing TCS outputs to LDS is rare. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPR | Marek Olšák | 3 | -6/+20
TCS outputs are usually not written to LDS, so no stats here. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS | Marek Olšák | 2 | -1/+11
-44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi: don't read the LS output vertex stride from an SGPR in LS | Marek Olšák | 1 | -4/+21
Now it's able to generate ds_write2_b64 instead of ds_write2_b32. -20 bytes in one shader binary. (having only 1 output) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi: don't read the number of TCS out vertices from an SGPR in TCS | Marek Olšák | 1 | -2/+15
-16 bytes in one shader binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi: don't always apply the PrimID instancing bug workaround on SI | Marek Olšák | 1 | -1/+1
It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should have fixed the perf regression didn't really change much if anything. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07 | radeonsi: remove 2 callbacks from si_shader_context | Marek Olšák | 3 | -17/+13
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-06 | radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug | Nicolai Hähnle | 5 | -24/+85
When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Work around this by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround with one that should be better.
v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
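The register shift can be modelled on the CPU as moving values two slots up in an array. This is only a sketch: `ls_fix_vgprs`, `NUM_VGPRS` and the flag `hs_wave_empty` are hypothetical names, and the driver performs the equivalent operation in the generated LS prolog rather than in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VGPRS      8  /* illustrative register file size */
#define LS_VGPR_OFFSET 2  /* LS inputs are expected to start at v2 */

/* If the HS wave is empty, the LS inputs landed at v0.. instead of v2..,
 * so move them up by two registers. Copy from the highest register down
 * so that no source is overwritten before it has been read. */
static void ls_fix_vgprs(uint32_t vgpr[NUM_VGPRS], unsigned num_ls_vgprs,
                         bool hs_wave_empty)
{
    assert(num_ls_vgprs + LS_VGPR_OFFSET <= NUM_VGPRS);
    if (!hs_wave_empty)
        return; /* registers are already where the shader expects them */

    for (unsigned i = num_ls_vgprs; i-- > 0;)
        vgpr[i + LS_VGPR_OFFSET] = vgpr[i];
}
```

The copy direction matters because the source and destination ranges overlap; a low-to-high copy would clobber v2 and v3 before they were moved.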
2017-09-06 | amd/common: pass chip_class to ac_dump_reg | Nicolai Hähnle | 1 | -15/+30
Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06 | radeonsi/gfx9: always flush DB metadata on framebuffer changes | Nicolai Hähnle | 3 | -4/+14
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-05 | radeonsi/gfx9: implement primitive binning | Marek Olšák | 8 | -7/+485
This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 | radeonsi: add more state flags into si_state_dsa | Marek Olšák | 2 | -1/+23
3 flags for primitive binning, 2 flags for out-of-order rasterization (but that will be done some other time) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05 | radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabled | Marek Olšák | 2 | -3/+4
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04 | radeonsi: eliminate PS color outputs when colormask kills them | Marek Olšák | 3 | -0/+6
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04 | gallium/radeon: sort DBG shader flags according to pipe_shader_type | Marek Olšák | 1 | -3/+2
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04 | radeonsi: ensure cache flushes happen before SET_PREDICATION packets | Nicolai Hähnle | 1 | -5/+10
The data is read when the render_cond_atom is emitted, so we must delay emitting the atom until after the flush. Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04 | radeonsi: fix ARB_transform_feedback_overflow_query on <= VI | Nicolai Hähnle | 1 | -1/+3
The result written by the shader workaround needs to be written back, or the CP may read stale data. Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04 | radeonsi: fix compute shader state dumping | Nicolai Hähnle | 1 | -6/+11
Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04 | radeonsi: add an assertion that only two-dimensional constant references are used | Nicolai Hähnle | 1 | -2/+3
v2: remove some redundant checks Acked-by: Roland Scheidegger <sroland@vmware.com> (v1) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-09-01 | radeonsi: move si_vm_fault_occured() to AMD common code | Samuel Pitoiset | 1 | -102/+4
For radv, in order to report VM faults when detected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-08-30 | radeonsi: update dirty_level_mask before dispatching | Samuel Pitoiset | 1 | -0/+5
This fixes a rendering issue with Hitman when bindless textures are enabled. Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-29 | ac/debug: Support multiple trace ids for nested IBs. | Bas Nieuwenhuizen | 1 | -9/+10
Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-08-29 | radeonsi: stop leaking nir | Timothy Arceri | 1 | -0/+1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 | radeonsi: rewrite late alloc VS limit computation | Marek Olšák | 1 | -12/+25
This is still very simple, but it's better than before. Loosely ported from Vulkan.
2017-08-28 | radeonsi: correct maximum wave count per SIMD | Marek Olšák | 1 | -1/+12
v2: don't special-case Tonga and Iceland. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-27 | Revert "radeonsi: get the raster config from AMDGPU on SI" | Marek Olšák | 1 | -17/+0
This reverts commit fc99cb3c9edee3af773700cf7ebdc60dc02fcaba. "The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to 25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25% performance decrease." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429 It looks like we can't use the raster config values from the kernel.
2017-08-25 | radeonsi: set IF_THRESHOLD to 4 | Timothy Arceri | 1 | -1/+1
In 74e39de9324d it was set to 3 and it was reported that 4 caused tesseract to start spilling VGPRs. This no longer seems to be the case.
Totals:
SGPRS: 2787844 -> 2787764 (-0.00 %)
VGPRS: 1713121 -> 1712717 (-0.02 %)
Spilled SGPRs: 7532 -> 7532 (0.00 %)
Spilled VGPRs: 49 -> 33 (-32.65 %)
Private memory VGPRs: 2060 -> 2060 (0.00 %)
Scratch size: 2200 -> 2180 (-0.91 %) dwords per thread
Code Size: 79265520 -> 79248360 (-0.02 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 670535 -> 670608 (0.01 %)
Wait states: 0 -> 0 (0.00 %)
Before:
VGPR SPILLING APPS      Shaders  SpillVGPR  PrivVGPR  ScratchSize
EffectsCaveDemo             301          0       256          264
ReflectionsSubwayDemo       264          0       256          264
VehicleGame                 295          0       128          132
bioshock-infinite          1140          0       448          516
dirt-showdown               453         33         0           28
gang-beasts                 364          0       500          496
kerbal-space-program       1228          0       472          480
tomb-raider-ultra          1199         16         0           20
After:
VGPR SPILLING APPS      Shaders  SpillVGPR  PrivVGPR  ScratchSize
EffectsCaveDemo             301          0       256          264
ReflectionsSubwayDemo       264          0       256          264
VehicleGame                 295          0       128          132
bioshock-infinite          1140          0       448          516
dirt-showdown               453         33         0           28
gang-beasts                 364          0       500          496
kerbal-space-program       1228          0       472          480
The only change in VGPR spills is the elimination of all spills in Tomb Raider at Ultra settings. Closer examination shows that the shaders go over the limit because they contain three expressions: a mul, an rcp and a UBO load. The UBO load is actually used elsewhere and is therefore already stored in a temp in IRs such as TGSI, but GLSL IR counts it against the if cost.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Acked-by: Marek Olšák <marek.olsak@amd.com>
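The percentages in the totals are plain relative changes, (after - before) / before * 100. A minimal sketch for checking them, with `percent_change` as a hypothetical helper name rather than anything taken from the tool that produced the report:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * `before` must be non-zero, otherwise the change is undefined. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the spilled VGPRs, going from 49 to 33 gives about -32.65 %, and the scratch size going from 2200 to 2180 gives about -0.91 %, matching the reported figures. A row whose "before" value is 0, such as the wait states, cannot be computed this way and needs separate handling.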
2017-08-25 | glsl: pass shader source keys to the disk cache | Timothy Arceri | 1 | -1/+1
We don't actually write them to disk here. That will happen in the following commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24 | radeonsi: get the raster config from AMDGPU on SI | Marek Olšák | 1 | -0/+17
Not sure yet if we wanna do this on CIK and VI too. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24 | radeonsi: clean up setting GRBM_GFX_INDEX | Marek Olšák | 1 | -19/+22
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24 | radeonsi: move PA_SC_RASTER_CONFIG emission into a separate function | Marek Olšák | 1 | -70/+73
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-23 | radeonsi: fix wrong assertion in si_init_bindless_descriptors() | Samuel Pitoiset | 1 | -1/+1
Bad mistake, sorry. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-08-23 | radeonsi: update comment describing indices into sctx->descriptors | Nicolai Hähnle | 1 | -6/+5
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 | radeonsi: do not assert when reserving bindless slot 0 | Samuel Pitoiset | 1 | -1/+4
When assertions were disabled, the compiler removed the call to util_idalloc_alloc() and the first allocated bindless slot was 0 which is invalid per the spec. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
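The pitfall described here is general C behaviour: when `NDEBUG` is defined, `assert(expr)` expands to nothing, so any side effect inside `expr` disappears too. The sketch below reproduces the pattern with a toy allocator; `toy_idalloc` and both `reserve_slot0_*` functions are hypothetical stand-ins, not the real `util_idalloc` API or the actual driver code.

```c
#include <assert.h>

/* Toy ID allocator that hands out 0, 1, 2, ... in order. */
struct toy_idalloc {
    unsigned next;
};

static unsigned toy_idalloc_alloc(struct toy_idalloc *a)
{
    return a->next++;
}

/* Buggy pattern: the allocation lives inside assert(). In a build with
 * NDEBUG the call is compiled out and slot 0 is never reserved. */
static void reserve_slot0_buggy(struct toy_idalloc *a)
{
    assert(toy_idalloc_alloc(a) == 0);
}

/* Fixed pattern: always perform the allocation, assert only on the result. */
static void reserve_slot0_fixed(struct toy_idalloc *a)
{
    unsigned slot = toy_idalloc_alloc(a);
    assert(slot == 0);
    (void)slot; /* avoid an unused-variable warning when NDEBUG is set */
}
```

After `reserve_slot0_fixed`, the next allocation returns 1 in every build type. After `reserve_slot0_buggy`, a release build would hand out 0 as the first real slot, which is the invalid handle the commit message mentions.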
2017-08-23 | radeonsi: rename some bindless-related helper functions | Samuel Pitoiset | 1 | -21/+21
I think it makes more sense. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 | radeonsi: minor cleanups in si_make_{texture,image}_handle_resident() | Samuel Pitoiset | 1 | -12/+12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>