Age | Commit message | Author | Files | Lines |
|
Compressed textures are very slow because decoding is rather complex
(and because there's no jit code to decode them either, for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this only handles the s3tc formats, although it could be extended
to work with other formats rather trivially as long as the decoded result fits
into 32 bits per texel (ideally, rgtc would actually decode to more than 8 bits
per channel, but even then making it work shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example, or if there are otherwise not
many cache hits; the cache is only direct mapped, which isn't great).
Also, the actual decode of a block relies on util code, thus even though full
blocks are always decoded, it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
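A minimal sketch of the direct-mapped cache idea described in this message, not
the actual implementation: the cache size, the struct and function names, the
hash on the block address and the stand-in decoder are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_ENTRIES 128   /* hypothetical size, must be a power of two */
#define BLOCK_TEXELS  16    /* one 4x4 compressed block */

struct block_cache_entry {
   uintptr_t tag;                  /* address of the compressed block, 0 = empty */
   uint32_t texels[BLOCK_TEXELS];  /* decoded result, 32 bits per texel */
};

struct block_cache {
   struct block_cache_entry entries[CACHE_ENTRIES];
   unsigned hits, misses;
};

/* Hypothetical decoder callback: fills 16 texels from one compressed block. */
typedef void (*decode_block_fn)(const void *block, uint32_t *texels);

/* Stand-in decoder for demonstration only; this is not real s3tc decoding. */
static void
fake_decode(const void *block, uint32_t *texels)
{
   const uint8_t *bytes = block;
   for (unsigned t = 0; t < BLOCK_TEXELS; t++)
      texels[t] = bytes[0] + t;
}

/* Fetch texel (i, j) of a block, decoding the full block only on a miss. */
static uint32_t
cache_fetch_texel(struct block_cache *cache, const void *block,
                  unsigned i, unsigned j, decode_block_fn decode)
{
   uintptr_t tag = (uintptr_t)block;
   /* Direct mapped: every block address maps to exactly one slot.
    * Blocks are assumed to be at least 8 bytes, so drop the low 3 bits. */
   unsigned slot = (unsigned)((tag >> 3) & (CACHE_ENTRIES - 1));
   struct block_cache_entry *e = &cache->entries[slot];

   assert(i < 4 && j < 4);
   if (e->tag != tag) {
      decode(block, e->texels);   /* miss: replace whatever was in the slot */
      e->tag = tag;
      cache->misses++;
   } else {
      cache->hits++;
   }
   return e->texels[j * 4 + i];
}
```

With a direct-mapped layout, two blocks whose addresses hash to the same slot
evict each other on every access, which is one way the cache could end up
slower than no cache at all.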
|
|
When the draw module splits long line loops, the sections are emitted
as line strips. But the primitive type wasn't set correctly so each
section was being drawn as a loop, introducing extra line segments.
To fix this, we pass a new DRAW_LINE_LOOP_AS_STRIP flag to the run()
function. The linear/elt_run() functions have to check for this flag
and set their primitive type accordingly.
No piglit regressions. Fixes piglit's lineloop with -count 4097 or
higher.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
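The effect can be modelled with a small sketch. The splitting scheme here
(sections that share one vertex, plus a single closing segment) and all names
are assumptions for illustration; the real draw code may split differently.

```c
#include <assert.h>

enum prim { PRIM_LINE_STRIP, PRIM_LINE_LOOP };

/* Number of line segments one primitive with 'count' vertices produces. */
static unsigned
segments(enum prim prim, unsigned count)
{
   if (count < 2)
      return 0;
   return prim == PRIM_LINE_LOOP ? count : count - 1;
}

/* Split a line loop of 'count' vertices into sections of at most 'max'
 * vertices, draw each section as 'section_prim', and return the total
 * number of segments including the one that closes the loop. */
static unsigned
split_loop_segments(unsigned count, unsigned max, enum prim section_prim)
{
   unsigned total = 0, start = 0;

   assert(count >= 3 && max >= 3);
   while (start + 1 < count) {
      unsigned n = count - start < max ? count - start : max;
      total += segments(section_prim, n);
      start += n - 1;   /* the next section re-uses the last vertex */
   }
   return total + 1;    /* closing segment from the last vertex to the first */
}
```

In this model a 10-vertex loop split into sections of 4 yields 10 segments when
the sections are strips, but 13 when each section is itself treated as a loop.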
|
|
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
|
|
This extends the draw code to add support for invocations.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This is just for softpipe, llvmpipe won't work without
some changes.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
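The constraint can be expressed as a check on two declaration bitmasks. This is
only a sketch: the function name and the bitmask representation are
hypothetical, and it assumes that extra SVIEWs without a SAMP are acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i set in 'samplers' means SAMP[i] is declared, likewise for 'sviews'. */
static bool
sview_decls_valid(uint32_t samplers, uint32_t sviews)
{
   if (sviews == 0)
      return true;                       /* shader has no SVIEWs at all */
   /* Otherwise every declared SAMP needs an SVIEW with the same index. */
   return (samplers & ~sviews) == 0;
}

/* Small usage example. */
static void
sview_decls_examples(void)
{
   assert(sview_decls_valid(0x3, 0x0));   /* SAMP[0..1], no SVIEWs */
   assert(sview_decls_valid(0x3, 0x3));   /* one SVIEW per SAMP */
   assert(!sview_decls_valid(0x3, 0x1));  /* SAMP[1] has no SVIEW */
}
```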
|
|
This probably got broken when the samplers were converted to be indexed
by shader type.
Seen when looking at bug 89819 though I'm not sure if that really was what
the bug was about...
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Was off-by-one. llvm says inserting an element with an index higher than the
number of elements yields undefined results. Previously such inserts were
ignored but as of llvm revision 235854 the vector gets replaced with undef,
causing failures.
This fixes piglit gl-3.2-layered-rendering-gl-layer, as mentioned in
https://llvm.org/bugs/show_bug.cgi?id=23424.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
|
|
We were resetting the prim id count for each run of the prim assembler,
hence this only worked when the draw calls were very small (the exact limit
depending on the vertex size), since larger draw calls get split up.
So, do the same as we already do if there's a gs: reset it to zero explicitly
for every new instance (this possibly could use the same variable but that
isn't doable without some heavy refactoring and I'm not sure it makes sense).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90130.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
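A minimal sketch of the counting scheme, with hypothetical struct and function
names: the counter survives across runs of the assembler (one run per split
section of a large draw) and is only reset when a new instance starts.

```c
#include <assert.h>

struct prim_assembler {
   unsigned next_prim_id;   /* persists across runs within one instance */
};

/* Called once per instance, not once per run. */
static void
assembler_new_instance(struct prim_assembler *a)
{
   a->next_prim_id = 0;
}

/* One run handles one section of a (possibly split) draw call. */
static void
assembler_run(struct prim_assembler *a, unsigned num_prims, unsigned *ids)
{
   assert(ids);
   for (unsigned i = 0; i < num_prims; i++)
      ids[i] = a->next_prim_id++;
}
```

Resetting the counter inside the per-run function instead would make the second
section of a split draw start at prim id 0 again.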
|
|
Neither the shader nor the key change when doing elts or linear variant, so
this was just annoying (probably mildly useful at some point when we printed
the IR per function too).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb116d0449d63f7b94efbfa8b205d826f. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)
v2: use static vars for the fake bufs, minor code cleanups
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
whether shadow sampling is done is now determined explicitly by the texture
function (either sample_c, or the gl-style tex funcs inherit this from the
target) instead of the static texture state. These two should always match,
however.
Otherwise, it should generate all the same code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
The callbacks used for getting the dynamic texture/sampler state were using
the jit_context from the generated jit function. This works just fine, however
that way it's impossible to generate separate functions for texture sampling,
as will be done in the next commit. Hence, pass this pointer through all
interfaces so it can be passed to a separate function (technically, it would
probably be possible to extract this pointer from the current function instead,
but this feels hacky and would probably require some more hacks if we'd use
real functions instead of inlining all shader functions at some point).
There should be no difference in the generated code for now.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89372
Reviewed-by: Dave Airlie <airlied@redhat.com>
|
|
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
v2: move initialization of llvm_gs to declaration.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
|
|
This reverts db3dfcfe90a3d27e6020e0d3642f8ab0330e57be.
The commit was correct but we've got some precision problems later in
llvmpipe (or possibly in draw clip) due to the vertices coming in in
different order, causing some internal test failures. So revert for now.
(Will only affect drivers which actually support constant-interpolated
attributes and not just flatshading.)
|
|
This fixes 4 vertexid related piglit tests with llvmpipe due to switching
behavior of vertexid to the one gl expects.
(Won't fix non-llvm draw path since we don't get the basevertex currently.)
|
|
Because all topologies are reduced to basic primitives (i.e. no strips, fans)
and the vertices involved are all copied, there's no need for any elaborate
decisions where to insert the prim id. The logic employed was correct for
first provoking vertex, but didn't account at all for the last provoking
vertex case. And since we now will get the right constant value even if the
primitive type is later changed (for unfilled etc.) this is no longer
required to pass certain tests (which were checking for prim_id == some
const interpolated value so passing because both were wrong in the end).
This is a bit overkill (3x4 values assigned in total even though it's really
one scalar per prim...) but the code is now much easier and I don't need to
add more cases for last provoking vertex.
This fixes piglit primitive-id-no-gs-strip test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Previously the first provoking vertex convention would only be used if
flatshading were enabled. No matter how I look at it that cannot possibly be
correct. Maybe the code getting used was somewhat simpler that way at a time
where there weren't constant interpolated attributes, only flatshading...
(Note that all other places including the decomposition macros already do
the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
This stage only worked for traditional old-school flatshading: it ignored
constant interpolated values and only handled colors. The code probably
predates the use of constant interpolated values in gallium. So fix this - the
clip stage apparently did this a long time ago already.
Unfortunately this also means the stage needs to be invoked when flatshading
isn't enabled but some other prim changing stages are - for instance with
fill mode line each of the 3 lines in a tri should get the same attribute
value from the leading vertex in the original tri if interpolation is constant,
which did not happen before.
Due to that, the stage is now run in more cases, even unnecessary ones. Could
in theory skip it completely if there aren't any constant interpolated
attributes (and rast->flatshade isn't set), but not sure it's worth bothering,
as it looks kinda complicated getting this information in advance.
No piglit change (doesn't really cover this directly).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Just like we do for tris (det shouldn't matter at this point, however
can have flags for things like line stipple reset).
No piglit change, it would fail line stippling tests if the flatshade
stage were run, which will happen with the next commit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Required by Nine. Tested with util_run_tests.
It's added to softpipe, llvmpipe, and r300g/swtcl.
Tested-by: David Heidelberg <david@ixit.cz>
|
|
|
|
The prim assembler may change the prim type when injecting prim ids now,
which isn't reflected by what's stored in emit.
This looks brittle and potentially dangerous (it is not obvious if such prim
type changes are really supported by pt emit, the prim type is actually also
set in prepare which would then be different).
This fixes piglit primitive-id-no-gs-first-vertex.shader_test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
The decomposition done in the prim assembler will turn tri fans into tris,
but this wasn't reflected in the output prim type. Meaning with a tri fan
with 6 verts input, the output was a tri fan with 12 vertices instead of a
tri list with 12 vertices (not as bad as it sounds, since the additional tris
created would all be degenerate since they'd all have two times vertex zero
but still bogus).
This is because the prim assembler is used if either the input topology is
something with adjacency, or if prim id needs to be injected, and for the
latter case topologies without adjacency can be converted to basic ones.
Unfortunately decomposition here for inserting prim ids is necessary, at
least for the indexed case where we can't just insert the prim id at the
right place depending on provoking vertex.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
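The 6-vertex example can be checked with a short sketch. The function names and
the vertex order within each emitted triangle are assumptions, not the prim
assembler's actual code.

```c
#include <assert.h>

/* Decompose a triangle fan of 'count' vertices into a triangle list.
 * Writes 3 indices per triangle to 'out' and returns the index count. */
static unsigned
fan_to_list(unsigned count, unsigned *out)
{
   unsigned n = 0;
   for (unsigned i = 2; i < count; i++) {
      out[n++] = 0;
      out[n++] = i - 1;
      out[n++] = i;
   }
   return n;
}

/* Count the degenerate triangles produced if 'idx' is (wrongly)
 * interpreted as a triangle fan instead of a triangle list. */
static unsigned
degenerate_as_fan(const unsigned *idx, unsigned count)
{
   unsigned degenerate = 0;
   for (unsigned i = 2; i < count; i++) {
      unsigned a = idx[0], b = idx[i - 1], c = idx[i];
      if (a == b || b == c || a == c)
         degenerate++;
   }
   return degenerate;
}
```

For a 6-vertex fan this emits 12 indices (4 triangles). Read back as a fan,
those 12 indices form 10 triangles, of which the 6 extra ones are degenerate
because they contain vertex zero twice.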
|
|
The default macros when the adjacency macros aren't defined will already
exactly do that (that is, drop the adjacent vertices and call the non-adjacent
macro).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Addresses MSVC warnings "result of 32-bit shift implicitly converted to
64 bits (was 64-bit shift intended?)", which can often be symptom of
bugs, but in these cases were all benign.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
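For illustration, this is the kind of pattern the warning is about, next to a
form that widens the operand before shifting. The function names are
hypothetical and this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* The shift is done in 32 bits and only the result is widened. This is
 * benign as long as n stays small, but it is what the warning flags. */
static uint64_t
bit64_narrow(unsigned n)
{
   assert(n < 31);
   return 1 << n;
}

/* Widening the operand first makes it a real 64-bit shift. */
static uint64_t
bit64(unsigned n)
{
   assert(n < 64);
   return (uint64_t)1 << n;
}
```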
|
|
This patch removes a workaround for a bug in LLVM < 3.2.
The original bug was closed as fixed in 2011.
At this moment gallium requires LLVM 3.3 (2013).
LLVM has been tested without SSE2 support in commit
ca70de9bd20bc4a11b2d2d368e0cc1f49527a947 and removed after requiring
LLVM 3.3 in commit 013ff2fae13da41c2f5619c4698b0a7b5aa6a06d.
Original LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=6960
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
|
|
Mostly add a couple cases so we don't just check gs for this.
There's only one gotcha, the built-in vp transform in the llvm vs can't
handle it (this would be fixable though non-trivial due to vp index being
non-constant for the SoA outputs, but we don't use it if there's a gs
either - the whole clip/vp transform integration there is suboptimal).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
We cannot guarantee that vertex buffers have the necessary alignment for
fetching all AoS members at once (for instance 4x32bit XYZW data). We can
however guarantee that for textures. This did not cause errors for older
llvm versions but it now matters and will cause segfaults if the data
happens to not be aligned. Thus we need to set alignment manually.
(Note that we can't actually really guarantee data to be even element aligned
due to offsets in vertex buffers being bytes and OpenGL allowing this, but
it does not matter for x86 as alignment is only required for sse vectors -
not sure what happens on other archs, however.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=85467.
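As a plain C analogue only (the fix itself concerns the alignment of generated
loads, which is not shown here), fetching four floats from an arbitrary byte
offset without assuming vector alignment can be sketched like this. The struct
and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct vec4 {
   float v[4];
};

/* Read 4x32bit XYZW data from a byte offset that may not be aligned.
 * memcpy makes no alignment assumption about the source address. */
static struct vec4
fetch_xyzw_unaligned(const uint8_t *buffer, size_t offset)
{
   struct vec4 r;

   assert(buffer);
   memcpy(r.v, buffer + offset, sizeof r.v);
   return r;
}
```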
|
|
structures.
No change in behavior.
|
|
scale[3].
Unfortunately no LLVM type was generated for pipe_viewport_state (it
was being treated as a single floating point array), so llvmpipe (and
any driver that relies on draw/llvm) got totally busted.
|
|
Almost all drivers ignore them.
|
|
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
v2: fix svga too
|
|
Use an array of properties indexed by TGSI_PROPERTY_* definitions.
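A minimal sketch of the idea, using hypothetical enum and struct names as
stand-ins for the TGSI_PROPERTY_* definitions and the shader info structure:

```c
#include <assert.h>

/* Hypothetical stand-ins for a few TGSI_PROPERTY_* values. */
enum shader_property {
   PROP_GS_INPUT_PRIM,
   PROP_GS_OUTPUT_PRIM,
   PROP_GS_MAX_OUTPUT_VERTICES,
   PROP_COUNT
};

struct shader_info {
   unsigned properties[PROP_COUNT];   /* indexed by property name */
};

/* Storing a property needs no per-property field or switch statement. */
static void
set_property(struct shader_info *info, enum shader_property name,
             unsigned value)
{
   assert(name < PROP_COUNT);
   info->properties[name] = value;
}
```

Adding a new property then only requires a new enum value; the storage and the
accessors stay unchanged.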
|
|
Reuse the LLVMContext already allocated in llvmpipe_context
for draw_llvm if possible. This should decrease the memory
footprint of an llvmpipe context.
v2: Fix compile with llvm disabled.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
|
|
This is one step to make llvmpipe thread safe as mandated by the OpenGL
standard. Using the global LLVMContext is obviously a problem for
that kind of use pattern. The patch introduces two LLVMContext
instances that are private to an OpenGL context and used for all
compiles. One is put into struct draw_llvm and the other
one into struct llvmpipe_context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
|
|
The draw module would still try to use gallivm, causing many piglit tests
to fail with an assertion failure. llvmpipe might have been similarly
affected.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
|
|
I only made IS_NEGATIVE(x) use signbit in commit 0f3ba405 in an attempt
to fix 54805, but it didn't help. We didn't use signbit on some
platforms and instead defined it to x < 0.0f.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
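The two definitions are not equivalent, which a short sketch makes visible. The
function names are hypothetical; only signbit() from <math.h> and the
comparison mentioned in the message are used.

```c
#include <assert.h>
#include <math.h>

/* Tests the sign bit itself, so negative zero counts as negative. */
static int
is_negative_signbit(float x)
{
   return signbit(x) != 0;
}

/* Plain comparison: negative zero compares equal to zero. */
static int
is_negative_compare(float x)
{
   return x < 0.0f;
}

/* Small usage example showing where the two disagree. */
static void
is_negative_examples(void)
{
   assert(is_negative_signbit(-1.0f) && is_negative_compare(-1.0f));
   assert(!is_negative_signbit(1.0f) && !is_negative_compare(1.0f));
   assert(is_negative_signbit(-0.0f) && !is_negative_compare(-0.0f));
}
```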
|
|
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
This simplifies the code and makes it a little easier to understand.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|