Age | Commit message (Collapse) | Author | Files | Lines |
|
Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached. I originally chose WT for render targets because it works in
all cases. However, we really want to use write-back caching where
possible, as it is more efficient.
Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well. So
in most cases WB will work. However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.
This matches our MOCS choice on Haswell.
Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5
in a microbenchmark (spotted by Eero Tamminen). Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
|
|
Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.
This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
|
|
Otherwise some parts of tiled slices can be missed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.
[0] Add an assertion for that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
The VE format of edge flag pointers was changed in
780ce576bb1781f027797039693b98253ee4813e.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
|
|
Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a
potential issue with URB entry allocation for VS and move the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
|
|
To replace the hacky zs_align_surface().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
|
|
The size is always 24 bytes. We can upload them to the dynamic buffer.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
|
|
Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx
textured video. Fix this by implementing scissors also in the yuv draw path.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
|
|
Unused; it was replaced by include/pci_ids/i965_pci_ids.h long ago.
Acked-by: Matt Turner <mattst88@gmail.com>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
The code was kind of mixed up what buffers were getting stored in the case
that a resolve bit was unset (which are set based on the GL state at draw
time) and the buffer wasn't actually bound. In particular, depth-only
rendering would store the color buffer contents, which happen to be
pointing at the depth buffer.
Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.
Fixes 42 piglit tests.
|
|
|
|
Notice the mistaken (but harmless) argument swapping in brw_math_invert().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Tested-by: Mark Janes <mark.a.janes@intel.com>
|
|
My attempts to clarify the code with _compacted/_uncompacted prefixed
variables apparently failed. Hopefully this is clearer.
In any case, the previous code wasn't clear enough to gcc to let it
optimize division by a power of two into a shift. No problems now.
Also, the previous code (in the ADD case) didn't work on 32-bit x86, due
to complicated set of interactions best summed up as unsigned division
and compiler optimizations.
Tested-by: Mark Janes <mark.a.janes@intel.com>
|
|
We need to keep track if a state change other than frag/vert shader
state will trigger us to need a different shader variant, and if
necessary mark the appropriate shader state as dirty. Otherwise we will
forget to re-emit the shader state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
This is for hw that needs to emulate some texture wrap modes (like
CLAMP) with some help from the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Keep the existing function as a common helper. But this lets us move an
a2xx specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP
emulation will require an a3xx specific hack. So rather than piling on
hacks, split this out.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
We just clamp the incoming texture coordinates. This breaks the lambda
calculation, but it gets the piglit tests to pass. This is the same
behavior as in i965.
|
|
One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).
|
|
We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.
|
|
Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test,
which I was looking at because it's failing to register allocate.
|
|
Definitely helps when trying to understand and optimize a program.
|
|
This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.
|
|
It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.
|
|
In the other related fields, "0" refers to the size of the first miplevel,
while this is a field in a slice. The other implicit slices we have
(cubemap layers) don't vary in size compared to the first one.
|
|
Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.
I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.
|
|
commit 4ed23fd broke creation of pbuffer surfaces, patch fixes
the failure, noticed when running chrome with '--use-gl=egl'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
An 'else' is missing in the disassembler.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
|
|
According to GLES (i.e. 1.0 and above) spec textureCubeLod and
texture2DProjLod are built in functions. We seem to disable support
for these functions with GLES. This patch enables the support.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355
|
|
We need 2.4.57 for fd_bo_dmabuf() / fd_bo_from_dmabuf().
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
pow(x, y) is equivalent to exp(log(x) * y).
instructions in affected programs: 578 -> 458 (-20.76%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!
total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs: 75634 -> 74878 (-1.00%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.
This change enables the next patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Get rid of the 'default' case (as suggestied by imirkin) so compiler
warns us about missing caps. Also add some caps that were missing until
now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
4155d1c7 'st/mesa: drop dependence on API profile in st_init_extensions'
broke freedreno because somehow 'PIPE_CAP_MAX_VIEWPORTS' fell through
the cracks. Resulting that we reported zero viewports. So the state
tracker never bothered to give us any valid viewport!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Among other things, fixes a bug for fixed point registers/bitfields.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
At least on a3xx, we cannot do it without some emulation in shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|