Age | Commit message | Author | Files | Lines |
|
|
|
|
|
|
|
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
OMG. There are functions in this class that are absolutely insane.
|
|
I have verified that, after being run through indent, the two output
files are the same except for whitespace and a few variations to
comments.
|
|
|
|
|
|
This replaces a Printer class with a mako template.
|
|
The string module isn't used very much, since the str class (which is a
builtin) has most of the same functionality.
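
As a small illustration of the point above (the input list is made up, and the commented-out lines show only the general shape of the old Python 2 idiom, not the actual code being replaced):

```python
# Python 2-only style; string.join() and string.upper() no longer exist
# as module-level functions in Python 3:
#   import string
#   name = string.upper(string.join(parts, "_"))

# Equivalent using str methods, valid on both Python 2 and Python 3.
parts = ["gl", "vertex", "attrib"]  # hypothetical input
name = "_".join(parts).upper()
print(name)
```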
|
|
|
|
This version makes use of the fact that strings are sequences and can be
looped over. It also simplifies the handling of the 'ARB' special case.
|
|
This patch fixes a number of style issues in the file. A short list of
them is as follows:
- Don't shadow python builtins
- Reformat docstrings
- don't use return at the end of every function (a blank return in
python returns None, which is the same as not using a return)
- fix spacing around lists and function arguments
- delete unused variables
- use _ to hold unused values when exploding containers
- correct spacing around assignments
- compare to None using 'is' rather than '=='
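
A minimal sketch of a few of these points in one place. The function name, the tuple layout and the table contents are all hypothetical and are not taken from the patched file:

```python
def find_offset(functions, name):
    """Return the offset recorded for name, or None if it is absent."""
    # '_' holds the tuple field this function does not need.
    for func_name, _, offset in functions:
        if func_name == name:
            return offset
    # No trailing bare 'return': falling off the end already returns None.


table = [("glBegin", "void", 7), ("glEnd", "void", 43)]  # made-up data

result = find_offset(table, "glFlush")
if result is None:  # identity comparison rather than '== None'
    print("not found")
```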
|
|
Wrap long lines, remove superfluous semicolons, and replace a really stupid
loop with something that is not a loop.
This produces exactly the same code as the previous patch.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
This produces largely the same output (once it's piped through indent).
There are still some whitespace differences, and some comments are
changed, including the formatting of the copyright header. This is due
to shared code between the template generators that wasn't present
before.
None of these differences should result in changes once the code is
compiled.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
This makes the code much easier to read and work with.
|
|
has_key is deprecated, and since part of the goal of this series is to
be able to use either python2 or python3 we shouldn't use very old
python2 only methods.
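
For reference, the portable spelling is the `in` operator. The dictionary below is a made-up example:

```python
functions = {"glBegin": 7}  # hypothetical dictionary

# Python 2 only; dict.has_key() was removed in Python 3:
#   if functions.has_key("glBegin"): ...

# Works on both Python 2 and Python 3.
if "glBegin" in functions:
    print("found")
```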
|
|
|
|
Python has sensible, builtin bool types. Use them.
|
|
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
This generates whitespace equivalent mako.
This adds some python infrastructure that might be useful in previous
patches for simplification.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
This uses mako instead of the original print wall. After running this
code through indent with the same flags as INDENT_FLAGS in configure.ac,
this results in a few extra lines of whitespace, which can't be removed
without massively reducing the readability of the template (which is
already a little convoluted).
I'm not really happy with this particular generator, but I'd like to
punt being smart until after rewriting the underlying xml parsers.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
This produces what is effectively the same code as the previous
generator (minus whitespace changes). The difference is the way hex is
printed, and the capitalization of the hex string.
The details are that the original code implemented '0x' as a string, and
then joined the resulting hex to it. The new code uses a hex formatter
string. Due to a limitation in python, the letters in a formatted hex string
are either all caps or all lower.
So, this happened:
0xBEEFBEEF -> 0xbeefbeef
The other option would be to have:
0xBEEFBEEF -> 0XBEEFBEEF
When I polled people around the office they preferred all lower to all
upper, but I don't care either way.
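
The three spellings discussed above can be reproduced in a few lines. The `old_style` expression is only a guess at the shape of the original code, not a copy of it:

```python
value = 0xBEEFBEEF

# Prefix written by hand, digits formatted separately (assumed old approach).
old_style = "0x" + "%X" % value

# The '#' flag emits the prefix itself, in the same case as the digits.
lower = "{:#x}".format(value)
upper = "{:#X}".format(value)

print(old_style)  # 0xBEEFBEEF
print(lower)      # 0xbeefbeef
print(upper)      # 0XBEEFBEEF
```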
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
|
|
|
|
This patch converts another function to use mako instead of print.
It comes with the same warnings as the previous makoizing patches: there
are some small differences that only a human will notice.
There is one major comment omission that is worth mentioning: the
comments in the MESA_alt_functions struct identifying which function is
aliased are no longer generated. These could be added back in, but that
would require turning some lazy code into eager code, for the purpose of
printing a comment. I felt that the advantages of using the lazy
operators were more valuable than the comments, but if others object I
will change it.
|
|
This patch converts the other function of gl_table.py to use mako
instead of print to generate the dispatch table header. This results in
code that is easier to read and understand, even if the final line count
increases. (This is due to some static content like copyright
headers being duplicated in and between templates.)
This produces the same content with a couple of trivial human readable
differences:
1) The copyright header has been reformatted slightly, line breaks have
changed and 'IBM' has been replaced with 'THE AUTHORS OR COPYRIGHT
HOLDERS', a more standard MIT license header
2) The "do not edit" message is reworded slightly
3) A few comments have been removed
|
|
This uses a very simple mako template and a couple of generators to
create the gl_table.h file, as opposed to a complex class of print
statements. The result is easier to read, and easier to modify.
There are a few minor differences between the file generated after this
patch and before.
1) The formatting of the copyright is slightly different: lines are
wrapped in slightly different places, and the explicit names of the
authors and/or copyright holders in the final clause are replaced
with "THE AUTHORS OR COPYRIGHT HOLDERS"
2) The "DO NOT EDIT" comment is slightly reworded
3) Some ifdef changes. Each level of preprocessor macro now has one
additional space between the # and the first letter of the word, and
some comments were removed because the brevity of the template makes
it obvious which ifdefs and endifs go together.
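
To illustrate the "simple template plus a couple of generators" shape described above, here is a rough sketch. It uses the standard library's `string.Template` purely as a stand-in for mako so that it runs without extra dependencies; the struct name, the entry format and the function list are invented for the example and do not reflect the real gl_table.h.

```python
from string import Template

# Stand-in for a mako template: static text with one substitution point.
HEADER = Template("""\
/* DO NOT EDIT: this file is generated */
struct example_table {
$entries
};
""")


def entries(functions):
    """Yield one struct member line per (name, offset) pair."""
    for name, offset in functions:
        yield "   void (*{})(void); /* {} */".format(name, offset)


functions = [("Begin", 7), ("End", 43)]  # made-up input
output = HEADER.substitute(entries="\n".join(entries(functions)))
print(output)
```

The static text lives in one place and the per-function logic lives in a generator, which is the separation the commit message credits for the improved readability.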
|
|
These are basically just moves, so they should be safe as well.
When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:
    if ssa_21 {
        block block_1:
        /* preds: block_0 */
        vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
        /* succs: block_3 */
    } else {
        block block_2:
        /* preds: block_0 */
        /* succs: block_3 */
    }
    block block_3:
    /* preds: block_1 block_2 */
    vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2
Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass. But with
the vec4 operation, they were not. We want to keep getting selects.
Normal i965 on Broadwell:
instructions in affected programs: 200 -> 176 (-12.00%)
helped: 4
With brw_fs_channel_expressions() disabled:
instructions in affected programs: 1832 -> 1646 (-10.15%)
helped: 30
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
|
|
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
This was originally only used by the vertex shader, but it's now used by
the geometry shader as well, and will also eventually be used for
tessellation control and evaluation shaders.
I suspect it will be easier to find in a file named after the concept.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
This implements a workaround (exact excerpt as a comment in the code). The docs
specify [clearly, after you struggle for a while] that the offset isn't relative
to state base. This actually makes sense. This fixes hangs on SKL.
Buffer #0 is meant to be used for normal uniforms.
Buffer #1 is typically used for gather constants when using RS.
Buffers #1-#3 could be used to push a bunch of UBO data which would just be
somewhere in memory, and not relative to the dynamic state.
NOTE: I've moved away from the ternary operator for the new gen9 conditions.
Admittedly it's probably not great to do this, but I really want to fix this all
up in the subsequent patch and doing it here makes that diff a lot nicer. I want
to split out the gen8/9 code to make the function a bit more readable, but to
keep this easily cherry-pickable I am doing this fix first. If we decide not to
merge the cleanup patch then I can revisit this.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Valtteri Rantala <Valtteri.rantala@intel.com>
|
|
Print GL_FLOAT, etc. instead of hex value.
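
The idea, sketched in Python rather than the driver's C: look the value up in a name table and fall back to hex only when the value is unknown. The table and helper name are hypothetical; the two enum values are the standard ones from the GL headers.

```python
# Tiny excerpt of an enum-to-name table; values match the GL headers.
GL_ENUM_NAMES = {
    0x1400: "GL_BYTE",
    0x1406: "GL_FLOAT",
}


def enum_name(value):
    """Return the symbolic name for value, or its hex form if unknown."""
    return GL_ENUM_NAMES.get(value, hex(value))


print(enum_name(0x1406))
print(enum_name(0x9999))
```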
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
It allows us to remove ilo_ib_state::draw_start_offset and
ILO_PRIM_RECTANGLES. gen6_3d_translate_pipe_prim() is also replaced by
ilo_translate_draw_mode().
|
|
With ilo_format.[ch] moved out of core, the aligning of vertex buffers does
not belong to core anymore.
|
|
They provide PIPE_FORMAT_x to GEN6_FORMAT_x translation as well as some
convenient helpers. Move them out of core.
|
|
Check if a surface format can be used for the specified access type.
|
|
Check if a surface format can be used as a VE format.
|
|
Use the newly-introduced NV_VRAM_DOMAIN() macro to support alternative
VRAM domains for chips that do not have dedicated video memory.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
|
|
Some GPUs (e.g. GK20A, GM20B) do not embed VRAM of their own and use
the system memory as a backend instead. For such systems, allocating
objects in VRAM results in errors since the kernel will not allow
VRAM object allocations.
This patch adds a vram_domain member to struct nouveau_screen that can
optionally be initialized to an alternative domain to use for VRAM
allocations. If left untouched, NOUVEAU_BO_VRAM will be used for
systems that embed VRAM, and NOUVEAU_BO_GART will be used for VRAM-less
systems.
Code that uses GPU objects is then expected to use the NV_VRAM_DOMAIN()
macro in place of NOUVEAU_BO_VRAM to ensure correct behavior on
VRAM-less chips.
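
A Python model of the selection logic described above, for illustration only. The real code is a C struct member and macro; the `Screen` class, the `has_vram` parameter and the flag values below are placeholders chosen for this sketch.

```python
# Placeholder flag values for this sketch, not taken from the driver headers.
NOUVEAU_BO_VRAM = 0x1
NOUVEAU_BO_GART = 0x2


class Screen:
    """Stand-in for struct nouveau_screen with its vram_domain member."""

    def __init__(self, has_vram=True):
        # Chips with dedicated memory keep VRAM; VRAM-less chips use GART.
        self.vram_domain = NOUVEAU_BO_VRAM if has_vram else NOUVEAU_BO_GART


def nv_vram_domain(screen):
    """Python stand-in for the NV_VRAM_DOMAIN() macro."""
    return screen.vram_domain


discrete = Screen()
vramless = Screen(has_vram=False)  # e.g. a GK20A-like chip
print(nv_vram_domain(discrete), nv_vram_domain(vramless))
```

Callers ask the screen for the domain instead of hard-coding NOUVEAU_BO_VRAM, so the same allocation code works on both kinds of chip.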
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
|
|
Replace gen6_idrt_data with ilo_state_compute, which has a bunch of
validations and is now preferred.
|
|
This fixes a regression in that r600 stopped working when
sampler views were pushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
For query_levels, we generate a getinfo with writemask of (z), which RA
will consider as size==3. But we were still generating four fanouts,
which meant that RA would see it as two different register classes,
depending on the path to the definer. I.e. on the getinfo instruction itself
it would see size==3, but when chasing back through the fanouts it would
see size==4.
Easiest way to solve that is to just generate the chain of neighboring
fanouts to have the correct size in the first place.
Note: we may eventually want split_dest() to take start/end or wrmask
instead, since really we only need size==1. But RA is not clever enough
for that, query_levels is not that common, and the other two registers
that get allocated are never used so those register slots can be
immediately re-used. So bunch of work for probably no real gain.
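
The size mismatch can be seen with a little arithmetic. This sketch assumes the size is the position of the highest set writemask bit plus one, which matches the size==3 figure quoted for (z); the helper name is hypothetical and this is not the driver's code.

```python
def size_from_wrmask(wrmask):
    """Components needed to cover everything up to the highest set bit."""
    return wrmask.bit_length()


WRMASK_Z = 0b0100     # only the third component (z) is written
WRMASK_XYZW = 0b1111  # all four components

print(size_from_wrmask(WRMASK_Z))     # 3
print(size_from_wrmask(WRMASK_XYZW))  # 4
```

Generating `size_from_wrmask(wrmask)` fanouts rather than a fixed four keeps both paths to the definer in agreement.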
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Seems like a4xx gets this right.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
We get this information from NIR (which gets it from sview decl in tgsi
when translating from tgsi), so no need to maintain shader variants for
this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
This shuffles things around to allow the shader to have multiple basic
blocks. We drop the entire CFG structure from nir and just preserve the
blocks. At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors. (Dropping jumps to the following instruction, etc.)
One slight complication is that variables (load_var/store_var, i.e.
arrays) are not in SSA form, so we have to figure out where to put the
phis ourselves. For this, we use the predecessor set information from
nir_block. (We could perhaps use NIR's dominance frontier information
to help with this?)
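
A rough model of the end-of-block rule mentioned above (conditional branch vs. unconditional jump, chosen by successor count, with jumps to the following block dropped). Block indices, the opcode names and the ordering of the two successors are assumptions made for this sketch, not the driver's actual data structures.

```python
def block_terminator(block_index, successors):
    """Pick the instructions that end a block.

    successors holds 0, 1 or 2 block indices. For two entries the order is
    assumed to be [taken target, other target].
    """
    if len(successors) == 2:
        taken, other = successors
        ops = [("br", taken)]  # conditional branch
        if other != block_index + 1:
            ops.append(("jump", other))  # the other path is not a fallthrough
        return ops
    if len(successors) == 1:
        target = successors[0]
        # A jump to the immediately following block is redundant.
        return [] if target == block_index + 1 else [("jump", target)]
    return [("end", None)]


print(block_terminator(0, [2, 1]))
print(block_terminator(1, [3]))
print(block_terminator(2, [3]))
```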
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Without this, negative branch/jump offsets look like very large positive
offsets.
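
The effect is easy to reproduce: a negative offset stored in a fixed-width field reads back as a large unsigned number unless it is sign-extended. The 16-bit width below is an assumption for the example, not the width used by the actual instruction encoding.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


raw = 0xFFFE  # a jump of -2 stored in a 16-bit field (assumed width)
print(raw)                   # 65534
print(sign_extend(raw, 16))  # -2
```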
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|