Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
|
|
buffers
The GLSL rules for interpolateAtSample are unfortunate:
"Returns the value of the input interpolant variable at
the location of sample number sample. If
multisample buffers are not available, the input
variable will be evaluated at the center of the pixel.
If sample sample does not exist, the position used to
interpolate the input variable is undefined."
This fix will fallback to monolithic shader compilation when
interpolateAtSample is used without multisampling.
One alternative would be to always upload 16 sample positions,
filling the buffer up with repetition when the actual number of
samples is less, and then ANDing the sample ID with 0xf. However,
that punishes all well-behaving users of interpolateAtSample,
when in reality, only conformance tests should be affected by
the issue.
Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.
Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.
In one shader, it removes 17 * 4 bytes from the shader binary.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Same as before, writing TCS outputs to LDS is rare.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
TCS outputs are usually not written to LDS, so no stats here.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
-44 bytes in a monolithic LS-HS binary.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Now it's able to generate ds_write2_b64 instead of ds_write2_b32.
-20 bytes in one shader binary. (having only 1 output)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
-16 bytes in one shader binary.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should
have fixed the perf regression didn't really change much if anything.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Workaround by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.
According to the hardware team, this will be fixed in future chips,
so take that into account already.
Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.
v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Acked-by: Marek Olšák <marek.olsak@amd.com>
|
|
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform, hopefully not worse.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
3 flags for primitive binning, 2 flags for out-of-order rasterization
(but that will be done some other time)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
The data is read when the render_cond_atom is emitted, so we must
delay emitting the atom until after the flush.
Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
The result written by the shader workaround needs to be written back, or
the CP may read stale data.
Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
used
v2: remove some redundant checks
Acked-by: Roland Scheidegger <sroland@vmware.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
|
|
For radv, in order to report VM faults when detected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
|
|
This fixes a rendering issue with Hitman when bindless textures
are enabled.
Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
This is still very simple, but it's better than before.
Loosely ported from Vulkan.
|
|
v2: don't special-case Tonga and Iceland.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
This reverts commit fc99cb3c9edee3af773700cf7ebdc60dc02fcaba.
"The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to
25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25%
performance decrease."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429
It looks like we can't use the raster config values from the kernel.
|
|
In 74e39de9324d it was set to 3 and it was reported that 4 caused
tesseract to start spilling VGPRs. This no longer seems to be the
case.
Totals:
SGPRS: 2787844 -> 2787764 (-0.00 %)
VGPRS: 1713121 -> 1712717 (-0.02 %)
Spilled SGPRs: 7532 -> 7532 (0.00 %)
Spilled VGPRs: 49 -> 33 (-32.65 %)
Private memory VGPRs: 2060 -> 2060 (0.00 %)
Scratch size: 2200 -> 2180 (-0.91 %) dwords per thread
Code Size: 79265520 -> 79248360 (-0.02 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 670535 -> 670608 (0.01 %)
Wait states: 0 -> 0 (0.00 %)
Before:
VGPR SPILLING APPS Shaders SpillVGPR PrivVGPR ScratchSize
EffectsCaveDemo 301 0 256 264
ReflectionsSubwayDemo 264 0 256 264
VehicleGame 295 0 128 132
bioshock-infinite 1140 0 448 516
dirt-showdown 453 33 0 28
gang-beasts 364 0 500 496
kerbal-space-program 1228 0 472 480
tomb-raider-ultra 1199 16 0 20
After:
VGPR SPILLING APPS Shaders SpillVGPR PrivVGPR ScratchSize
EffectsCaveDemo 301 0 256 264
ReflectionsSubwayDemo 264 0 256 264
VehicleGame 295 0 128 132
bioshock-infinite 1140 0 448 516
dirt-showdown 453 33 0 28
gang-beasts 364 0 500 496
kerbal-space-program 1228 0 472 480
The only change in VGPR spills is the elimination of all spills
in Tomb Raider at Ultra settings. Closer examination shows that
the shaders go over the limit because they contain three
expressions a mul, rcp and ubo load. The ubo load is actually
used elsewhere and is therefore stored in a temp already in IR
such as tgsi but glsl ir counts it agaist the if cost.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
|
|
We don't actually write them to disk here. That will happen in the
following commit.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Not sure yet if we wanna do this on CIK and VI too.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
Bad mistake, sorry.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
When assertions were disabled, the compiler removed
the call to util_idalloc_alloc() and the first allocated
bindless slot was 0 which is invalid per the spec.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
|
|
I think it makes more sense.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|