Age | Commit message (Collapse) | Author | Files | Lines |
|
Only fully-qualified direct derefs, collected in direct_deref_nodes,
are checked for aliasing, so it is already known up front that they
have only array derefs of type direct.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
The return value was needed to make use of the old nir_foreach_block
helper, but not needed anymore with the macro version. Then go one
step further and move the foreach directly into the register variable
uses function.
v2: Move foreach to register_variable_uses(). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
|
|
This replaces some "if (...} { }" with "if (...) continue;" to reduce
nesting depth and makes nir_metadata_preserve conditional on progress
for the given impl.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
|
|
However, it only fails when running out of memory. Now, if we
are about to check that, we should be consistent and check
the allocation of the worklist as well.
CID: 1433512
Fixes: edb18564c7 nir: Initial implementation of a nir_instr_worklist
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
Earlier commit enforced that we'll bail out if the number of terminators
is different than 2. With that in mind, the assert() will never trigger.
Fixes: 56b867395de ("glsl: fix infinite loop caused by bug in loop
unrolling pass")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
|
|
Fixes: 042ee4bea26 "(spirv: Move SPIR-V building to Makefile.spirv.am and
spirv/meson.build")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
|
|
We neeed to skip the var if its not a uniform here as well as checking
the bindless flag since UBOs can contain bindless samplers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
|
|
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
|
|
So it is more clear about when to use nir_instr_rewrite_src()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
Since 31d91f019b58ca362c05db1fd0c75fedd169cd7b, the makefile tries to
find the file SConstript.spirv instead of SConscript.spirv which breaks
the make dist command.
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
In the emit_copies() function, the use of "newv" and "temp" names made
sense when only copies from temporaries to the new variables were
being done. But now there are other calls to copy with other pairings,
and "temp" doesn't always refer to a temporary created in this
pass. Use the names "dest" and "src" instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
From the SPIR-V spec, OpTypeImage:
"Depth is whether or not this image is a depth image. (Note that
whether or not depth comparisons are actually done is a property of
the sampling opcode, not of this type declaration.)"
The sampling opcodes that specify depth comparisons are
OpImageSample{Proj}Dref{Explicit,Implicit}Lod, so we should set
is_shadow only for these (we were using the deph property of the
image until now).
v2:
- Do the same for OpImageDrefGather.
- Set is_shadow to false if the sampling opcode is not one of these (Jason)
- Reuse an existing switch statement instead of adding a new one (Jason)
Fixes crashes in:
dEQP-VK.spirv_assembly.instruction.graphics.image_sampler.depth_property.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
|
|
Otherwise we may end up trying to coalesce in a case such as
ssa_1 = fadd r1, r2
r3.x = fneg(r2);
r3 = vec4(ssa_1, ssa_1.y, ...)
and that would cause us to move the writes to r3 from the vec to the
fadd which would re-order them with respect to the write from the fneg.
In order to solve this, we just don't coalesce if the destination of the
vec is not SSA. We could try to get clever and still coalesce if there
are no writes to the destination of the vec between the vec and the ALU
source. However, since registers only come from phi webs and indirects,
the chances of having a vec with a register destination that is actually
coalescable into its source is very slim.
Shader-db results on Haswell:
total instructions in shared programs: 13657906 -> 13659101 (<.01%)
instructions in affected programs: 149291 -> 150486 (0.80%)
helped: 0
HURT: 592
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105440
Fixes: 2458ea95c56 "nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible"
Reported-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Tested-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
This fixes a bug in radeonsi where LLVM cannot handle the case where
a break exists but its not the last instruction in the block.
LLVM would fail with:
Terminator found in the middle of a basic block!
LLVM ERROR: Broken function found, compilation aborted!
Fixes: 96fe8834f539 "glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105317
|
|
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
We were validating this for locals but nothing else.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This fixes the fs-interpolateAtCentroid-block-array piglit test on i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
|
|
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
|
|
Because nir_instr_remove is an inline wrapper around nir_instr_remove_v,
the compiler should be able to tell that the return value is unused and
not emit the extra code in most cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
We already have these for bit_size
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Reviewed-by: Neil Roberts <nroberts@igalia.com>
|
|
The MSVC compiler warns when the function parameter types don't
exactly match with respect to enum vs. uint32_t. Use SpvOp everywhere.
Alternately, uint32_t could be used everywhere. There doesn't seem
to be an advantage to one over the other.
Reviewed-by: Neil Roberts <nroberts@igalia.com>
|
|
Reviewed-by: Neil Roberts <nroberts@igalia.com>
|
|
This needs to before the function, not after, to compile with MSVC.
This works with gcc too.
Reviewed-by: Neil Roberts <nroberts@igalia.com>
|
|
Fixes warning that "negation of an unsigned value results in an
unsigned value".
Reviewed-by: Neil Roberts <nroberts@igalia.com>
|
|
The SCons build broke with commit ba975140d3c9 because a SPIR-V
function is called from Mesa main. This adds a convenience library for
SPIR-V and adds it to everything that was including nir. It also adds
both nir and spirv to drivers/x11/SConscript.
Also add nir/spirv modules to osmesa and libgl-gdi targets. (Brian Paul)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105817
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
|
|
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
|
|
ARB_gl_spirv adds the ability to use SPIR-V binaries, and a new
method, glSpecializeShader. Here we add a new function to do the
validation for this function:
From OpenGL 4.6 spec, section 7.2.1"
"Shader Specialization", error table:
INVALID_VALUE is generated if <pEntryPoint> does not name a valid
entry point for <shader>.
INVALID_VALUE is generated if any element of <pConstantIndex>
refers to a specialization constant that does not exist in the
shader module contained in <shader>.""
v2: rebase update (spirv_to_nir options added, changes on the warning
logging, and others)
v3: include passing options on common initialization, doesn't call
setjmp on common_initialization
v4: (after Jason comments):
* Rename common_initialization to vtn_builder_create
* Move validation method and their helpers to own source file.
* Create own handle_constant_decoration_cb instead of reuse existing one
v5: put vtn_build_create refactoring to their own patch (Jason)
v6: update after vtn_builder_create method renamed, add explanatory
comment, tweak existing comment and commit message (Timothy)
|
|
Refactored from spirv_to_nir, in order to be reused later.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: renamed method (from vtn_builder_create), add explanatory comment
(Timothy)
|
|
Future changes will add generated files used only from
src/compiler/glsl. These can't be built from Makefile.nir.am, and we
can't move all the rules from Makefile.nir.am to Makefile.spirv.am (and
it would be silly anyway).
v2: Do it for meson too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (the meson bits)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (the automake bits)
|
|
This slightly simplifies later changes that add more Makefile.*.am
files.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
|
|
Previously bitset.h would include u_math.h to get bitscan.h. u_math.h
lives in src/gallium/auxiliary/util while both bitset.h and bitscan.h
live in src/util. Having the one file directly include another file
that lives in the same directory makes much more sense.
As a side-effect, several files need to directly include standard header
files that were previously indirectly included.
v2: Fix build break in src/amd/common/ac_nir_to_llvm.c.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
|
|
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
|
|
Co-authored-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
These are needed for SPV_AMD_shader_trinary_minmax,
the AMD HW supports these.
Co-authored-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
Fixes: 76dfed8ae2d5 ("nir: mako all the intrinsics")
Signed-off-by: Stefan Schake <stschake@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
|
|
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
I have no idea why but having dest_components == -1 was causing a memory
leak somewhere. Without this, you can't get through a full shader-db
run without running out of memory.
Reviewed-by: Rob Clark <robdclark@gmail.com>
|
|
When an if nesting inside anouther if is optimised away we can
end up with a loop terminator and following block that looks like
this:
if ssa_596 {
block block_5:
/* preds: block_4 */
vec1 32 ssa_601 = load_const (0xffffffff /* -nan */)
break
/* succs: block_8 */
} else {
block block_6:
/* preds: block_4 */
/* succs: block_7 */
}
block block_7:
/* preds: block_6 */
vec1 32 ssa_602 = phi block_6: ssa_552
vec1 32 ssa_603 = phi block_6: ssa_553
vec1 32 ssa_604 = iadd ssa_551, ssa_66
The problem is the phis. Loop unrolling expects the last block in
the loop to be empty once we splice the instructions in the last
block into the continue branch. The problem is we cant move phis
so here we lower the phis to regs when preparing the loop for
unrolling. As it could be possible to have multiple additional
blocks/ifs following the terminator we just convert all phis at
the top level of the loop body for simplicity.
We also add some comments to loop_prepare_for_unroll() while we
are here.
Fixes: 51daccb289eb "nir: add a loop unrolling pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670
|
|
Apparently it is not happy about things like: .foo = {}
So skip over initializers for empty lists.
Fixes: 76dfed8ae2d5c6c509eb2661389be3c6a25077df
Reported-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
I threatened to do this a long time ago.. I probably *should* have done
it a long time ago when there where many fewer intrinsics. But the
system of macro/#include magic for dealing with intrinsics is a bit
annoying, and python has the nice property of optional fxn params,
making it possible to define new intrinsics while ignoring parameters
that are not applicable (and naming optional params). And not having to
specify various array lengths explicitly is nice too.
I think the end result makes it easier to add new intrinsics.
v2: couple small fixes found with a test program to compare the old and
new tables
v3: misc comments, don't rely on capture=true for meson.build, get rid
of system_values table to avoid return value of intrinsic() and
*mostly* remove side-effects, add autotools build support
v4: scons build
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
This is supposed to have both BASE and COMPONENT but num_indices was
inadvertantly set to 1.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
The VECN() macro was taking advantage of a GCC specific feature that is
not available on lesser compilers, mostly for the purposes of avoiding a
macro that encoded a return statement.
But as suggested by Ian, we could just have the macro produce the entire
method body and avoid the need for this. So let's do that instead.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105740
Fixes: f407edf3407396379e16b0be74b8d3b85d2ad7f0
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
Just checking for 2 jumps is not enough to be sure we can do a
complex loop unroll. We need to make sure we also have also found
2 loop terminators.
Without this we were attempting to unroll a loop where the second
jump was nested inside multiple ifs which loop analysis is unable
to detect as a terminator. We ended up splicing out the first
terminator but failed to actually unroll the loop, this resulted
in the creation of a possible infinite loop.
Fixes: 646621c66da9 "glsl: make loop unrolling more like the nir unrolling path"
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670
|
|
Now that i965 recognizes that a-b generates the same conditions as 'a <
b', there is no reason to condition this transformation on 'is not used
by conditional.'
Since this was the only user of the is_not_used_by_conditional function,
delete it.
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14400775 -> 14400595 (<.01%)
instructions in affected programs: 36712 -> 36532 (-0.49%)
helped: 182
HURT: 26
helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.15% max: 1.82% x̄: 0.70% x̃: 0.62%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.24% max: 1.02% x̄: 0.82% x̃: 0.90%
95% mean confidence interval for instructions value: -0.97 -0.76
95% mean confidence interval for instructions %-change: -0.59% -0.43%
Instructions are helped.
total cycles in shared programs: 532929592 -> 532926345 (<.01%)
cycles in affected programs: 478660 -> 475413 (-0.68%)
helped: 187
HURT: 22
helped stats (abs) min: 2 max: 200 x̄: 20.99 x̃: 18
helped stats (rel) min: 0.23% max: 24.10% x̄: 1.48% x̃: 1.03%
HURT stats (abs) min: 1 max: 214 x̄: 30.86 x̃: 11
HURT stats (rel) min: 0.01% max: 23.06% x̄: 3.12% x̃: 0.86%
95% mean confidence interval for cycles value: -19.50 -11.57
95% mean confidence interval for cycles %-change: -1.42% -0.58%
Cycles are helped.
GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 177851578 -> 177851810 (<.01%)
cycles in affected programs: 24408 -> 24640 (0.95%)
helped: 2
HURT: 4
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.42% max: 0.47% x̄: 0.44% x̃: 0.44%
HURT stats (abs) min: 24 max: 108 x̄: 60.00 x̃: 54
HURT stats (rel) min: 0.52% max: 1.62% x̄: 1.04% x̃: 1.02%
95% mean confidence interval for cycles value: -7.75 85.08
95% mean confidence interval for cycles %-change: -0.39% 1.49%
Inconclusive result (value mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Not used in GL but 8 and 16 component vectors exist in OpenCL.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
|