~nh/mesa - nh's Mesa repository; mostly radeonsi related development

Age	Commit message	Author	Files	Lines
2018-01-16	ac_surface: don't apply the 256-byte alignment to staging surfaces [user_stride]	Nicolai Hähnle	1	-1/+4
	Having the over-alignment on staging surfaces breaks the user_stride mechanism. This whole thing is a hack. We should really have a generic mechanism for specifying minimum stride alignments. In the meantime, I'm not sure if this breaks radv with GFX6/GFX9 hybrid graphics (e.g., pre-gfx9 on Raven). Cc: Dave Airlie <airlied@redhat.com>
2018-01-16	radeonsi: implement transfer_map with user_stride	Nicolai Hähnle	1	-5/+28
	The stride ends up being aligned by AddrLib in ways that are inconvenient to express clearly, but basically, a stride that is aligned to both 64 pixels and 256 bytes will go through unchanged in practice.
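The alignment rule described above can be sketched as a small check. This is only an illustration of the "64 pixels and 256 bytes" condition stated in the commit message; the helper names `align_pot` and `stride_passes_unchanged` are hypothetical and are not taken from radeonsi or AddrLib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static uint32_t align_pot(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: returns true if a row stride (in bytes) is a
 * whole number of pixels and is aligned to both 64 pixels and 256 bytes,
 * i.e. the case the commit message says goes through unchanged. */
static bool stride_passes_unchanged(uint32_t stride_bytes,
                                    uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel != 0);
    if (stride_bytes % bytes_per_pixel != 0)
        return false;

    uint32_t stride_pixels = stride_bytes / bytes_per_pixel;
    return stride_pixels % 64 == 0 && stride_bytes % 256 == 0;
}
```

For a 4-byte-per-pixel format, the 256-byte condition is implied by the 64-pixel one; for a 1-byte-per-pixel format the 256-byte condition is the stricter of the two.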
2018-01-16	radeonsi: fix failure paths of r600_texture_transfer_map	Nicolai Hähnle	1	-13/+12
	trans is zero-initialized, but trans->resource is set up immediately, so it needs to be unreferenced on the failure paths.
2018-01-16	st/dri: implement __DRI_IMAGE_TRANSFER_MAP_USER_STRIDE	Nicolai Hähnle	1	-6/+11

2018-01-16	gallium: add user_stride parameter to pipe_context::transfer_map	Nicolai Hähnle	33	-12/+53
	Allow callers to prescribe a desired stride for a transfer. Drivers are free to ignore this new parameter. There is no new capability because it's unclear how strict requirements on this feature should be expressed.
2018-01-16	gallium: use pipe_transfer_map_box inline helper	Nicolai Hähnle	28	-46/+62
	We will change pipe_context::transfer_map in a subsequent commit. Wrapping it in an inline function makes that subsequent change less noisy.
2018-01-16	dri_interface: add __DRI_IMAGE_TRANSFER_USER_STRIDE	Nicolai Hähnle	1	-3/+13
	Allow the caller to specify the row stride (in bytes) with which an image should be mapped. Note that completely ignoring USER_STRIDE is a valid implementation of mapImage. This is horrible API design. Unfortunately, cros_gralloc does indeed have a horrible API design -- in that arbitrary images should be allowed to be mapped with the stride that a linear image of the same width would have. There is no separate capability bit because it's unclear how stricter requirements should be defined.
2018-01-16	dri_interface: document error behavior of mapImage	Nicolai Hähnle	1	-0/+2
	This function is meant to return NULL on error, unlike some other APIs (such as mmap()), which return MAP_FAILED.
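The difference between the two error conventions can be illustrated with a small adapter. This is a sketch, not code from the DRI interface: `null_on_map_failed` is a hypothetical helper that converts an mmap()-style result into the NULL-on-error convention documented for mapImage.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: translate an mmap()-style return value, which
 * signals failure with MAP_FAILED, into a pointer that is NULL on error. */
static void *null_on_map_failed(void *ptr)
{
    /* MAP_FAILED is not NULL, so a plain NULL check on an mmap()
     * result would miss the failure case. */
    assert(MAP_FAILED != NULL);
    return ptr == MAP_FAILED ? NULL : ptr;
}
```

A caller of a NULL-returning API can then test `if (!ptr)`, whereas a raw mmap() result has to be compared against MAP_FAILED.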
2018-01-15	Revert "ac/shader: gather If TES reads TESSINNER or TESSOUTER"	Samuel Pitoiset	5	-8/+4
	This can't work for two reasons: (1) TESSINNER/TESSOUTER are shader input values, so they are never translated to the intrinsic ops; (2) the shader info pass scans the current stage, but we want to know in TCS whether TES reads the tess factors. This fixes 6 regressions related to deqp-vk/tessellation/shader_input_output/tess_level_{inner,outer}_XXX_tes This reverts commit 5ba1a61648e2dea96f621a5886ad8b937a471ab4.
2018-01-15	amd/common: fix loading InstanceID for tess on < GFX9	Samuel Pitoiset	1	-2/+1
	InstanceID is in VGPR2, not 1. One more failure that CTS didn't catch... Reported-by: Alex Smith <asmith@feralinteractive.com> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-15	ac/shader: gather If TES reads TESSINNER or TESSOUTER	Samuel Pitoiset	5	-4/+8
	This shouldn't be scanned in the pipeline. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-01-15	ac: remove ac_shader_variant_info::fs::output_mask	Samuel Pitoiset	2	-3/+0
	Unused. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-01-15	r600/shader: Initialize max_driver_temp_used correctly for the first time	Gert Wollny	1	-0/+1
	Without this initialization, the temp registers used in tgsi_declaration may use random indices, and this may result in failing translation from TGSI with the error message "GPR limit exceeded", because the random index is greater than the allowed limit, implying that the shader uses more temporary registers than are available. Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-14	freedreno/ir3: "soft" depth scheduling for SFU instructions	Rob Clark	1	-9/+21
	First try with a "soft" depth, to try to schedule sfu instructions further from their consumers, but fall back to hard depth (which might result in stalling) if nothing else is available to schedule. Previously the consumer of a sfu instruction could end up scheduled immediately after (since "hard" depth from sfu to consumer would be 0). This works because the legalize pass would insert a (ss) sync bit, but it is sub-optimal since it would cause a stall. Instead prioritize other instructions for 4 cycles if they would not cause a nop to be inserted. This minimizes the stalling. There is a slight penalty in general to overall # of instructions in shader (since we could end up needing nop's later due to scheduling the "deeper" sfu consumer later), but ends up being a wash on register pressure. Overall this seems to be worth a 10+% gain in fps. Increasing the "soft" depth of sfu consumer beyond 4 helps a bit in some cases, but 4 seems to be a good trade-off between getting 99% of the gain and not increasing instruction count of shaders too much. It's possible a similar approach could help for tex/mem instructions, but the (sy) sync bit seems to trigger a switch to a different thread group to hide memory latency (possibly with some limits depending on number of registers used?). Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-14	freedreno/a5xx: work around SWAP vs TILE_MODE constraint	Rob Clark	1	-0/+20
	If the blit isn't changing format, but is changing tiling, just lie and call things ARGB (since the exact component order doesn't matter for a tiling blit). Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-14	freedreno/a5xx: texture tiling	Rob Clark	16	-25/+339
	Overall a nice 5-10% gain for most games. And more for things like glmark2 texture benchmark. There are some rough edges. In particular, the hardware seems to only support tiling or component swap. (I.e. from hw PoV, ARGB/ABGR/RGBA/BGRA are all the same format but with different component swap.) For tiled formats, only ARGB is possible. This isn't a big problem for sampling since we also have swizzle state there (and since util_format_compose_swizzles() already takes into account the component order, we didn't use COLOR_SWAP for sampling). But it is a problem if you try to render to a tiled BGRA (for example) surface. The next patch introduces a workaround for blitter, so we can generate tiled textures in ABGR/RGBA/BGRA, but that doesn't help the render-target case. To handle that, I think we'd need to keep track that the tiled format is different from the linear format, which seems like it would get extra fun with sampler views/etc. So for now, disabled by default, enable with FD_MESA_DEBUG=ttile. In practice it works fine for all the games I've tried, but makes piglit grumpy. Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-14	freedreno: update generated headers	Rob Clark	6	-26/+35
	Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-14	freedreno: add screen->setup_slices() for tex layout	Rob Clark	3	-19/+43
	The rules are sufficiently different for a5xx with tiled textures, so split this out into something that can be implemented per-generation. The a5xx specific implementation will come in a later patch. Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-14	r300g: remove double assignment	Grazvydas Ignotas	1	-1/+0
	Trivial. Found by Coccinelle.
2018-01-14	util: use faster zlib's CRC32 implementation	Grazvydas Ignotas	1	-0/+13
	zlib provides a faster slice-by-4 CRC32 implementation than the traditional single-byte lookup one used by mesa. As most supported platforms now link zlib unconditionally, we can easily use it. Improvement for a 1MB buffer (avg MB/s, n=100, zlib 1.2.8): i5-6600K: mesa 443, zlib 1443 (225% +/- 2.1%); C2D E4500: mesa 403, zlib 1175 (191% +/- 0.9%). It has been verified that the calculation results stay the same after this change. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
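For reference, the "traditional single byte lookup" approach mentioned above can be sketched as follows. This is a generic table-driven CRC-32 using the standard reflected polynomial 0xEDB88320 with the usual initial value and final inversion; it is not Mesa's actual code, the function names are hypothetical, and Mesa's own initial/final conventions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];
static int crc32_table_ready;

/* Build the 256-entry lookup table, one entry per possible byte value. */
static void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc32_table[i] = c;
    }
    crc32_table_ready = 1;
}

/* Byte-at-a-time CRC-32: one table lookup per input byte.
 * A slice-by-4 implementation consumes four bytes per step instead. */
static uint32_t crc32_bytewise(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || size == 0);
    if (!crc32_table_ready)
        crc32_init_table();

    for (size_t i = 0; i < size; i++)
        crc = crc32_table[(crc ^ p[i]) & 0xFF] ^ (crc >> 8);

    return crc ^ 0xFFFFFFFFu;
}
```

With these conventions the result for the ASCII string "123456789" is the well-known CRC-32 check value 0xCBF43926, which is also what zlib's crc32() returns for that input.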
2018-01-14	android,configure,meson: define HAVE_ZLIB	Grazvydas Ignotas	3	-0/+3
	The next change wants to use some optional zlib functionality; however, not all platforms currently use zlib. Based on Jordan Justen's earlier patches and their review feedback. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2018-01-14	util/crc32: don't drop the const qualifier	Grazvydas Ignotas	1	-1/+1
	Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-01-14	ac: add doubles support to isign	Timothy Arceri	1	-7/+18
	Fixes a number of int64 piglit tests, for example: generated_tests/spec/arb_gpu_shader_int64/execution/built-in-functions/fs-sign-i64vec2.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-14	ac: add i64_0 and i64_1 to llvm build context	Timothy Arceri	2	-0/+4
	These will be used in the following patch. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-14	ac/nir: fix translation of nir_op_b2i for doubles	Timothy Arceri	1	-3/+9
	V2: just zero-extend the 32-bit value. Fixes a number of int64 piglit tests, for example: generated_tests/spec/arb_gpu_shader_int64/execution/conversion/frag-conversion-explicit-bool-int64_t.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-13	ac: fix build error in si_shader	Mauro Rossi	1	-1/+1
	assert() is replaced by unreachable() to avoid the following build error: external/mesa/src/gallium/drivers/radeonsi/si_shader.c:1967:1: error: control may reach end of non-void function [-Werror,-Wreturn-type] } ^ 1 error generated. Fixes: c797cd6 ("ac: add load_patch_vertices_in() to the abi") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
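The class of error fixed here can be sketched with a small example. When assert() is compiled out (NDEBUG), a non-void function whose last statement is an assert() can fall off the end, which triggers -Wreturn-type. The `UNREACHABLE` macro below is a hypothetical stand-in written for this illustration, assuming a GCC or Clang compiler that provides `__builtin_unreachable()`; the enum and function are invented for the example and are not taken from si_shader.c.

```c
#include <assert.h>

/* Hypothetical stand-in for an unreachable() macro: asserts in debug
 * builds and tells the compiler that control never gets past this point. */
#define UNREACHABLE(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)

enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

static int vertices_per_prim(enum prim p)
{
    switch (p) {
    case PRIM_POINTS:
        return 1;
    case PRIM_LINES:
        return 2;
    case PRIM_TRIANGLES:
        return 3;
    }
    /* With a bare assert() here, an NDEBUG build would leave this path
     * without a return value. */
    UNREACHABLE("invalid primitive type");
}
```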
2018-01-13	radv/radeonsi/nir: lower 64bit flrp	Timothy Arceri	2	-0/+2
	Fixes a bunch of arb_gpu_shader_fp64 piglit tests, for example: generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-mix-double-double-double.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-12	broadcom/vc5: Use MSF to ignore discards/non-dispatched channels in loops.	Eric Anholt	1	-1/+5
	Prevents potential infinite loops when a non-dispatched or discarded channel never triggers the loop break condition.
2018-01-12	broadcom/vc5: Use XOR instead of SUB for execute flags comparisons.	Eric Anholt	1	-3/+3
	I think this should be equivalent other than power, and it's the kind of comparison we use for nir_op_ieq.
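The equivalence relied on here can be shown in plain C: for fixed-width integers, both `a - b` and `a ^ b` are zero exactly when `a == b`. This sketch only illustrates the arithmetic identity; it says nothing about how the V3D compiler emits the instructions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Equality test expressed through subtraction: zero iff a == b. */
static bool equal_via_sub(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) == 0;
}

/* Equality test expressed through XOR: zero iff every bit matches. */
static bool equal_via_xor(uint32_t a, uint32_t b)
{
    return (a ^ b) == 0;
}

/* Debug helper: both formulations must agree for any pair of inputs. */
static void check_agree(uint32_t a, uint32_t b)
{
    assert(equal_via_sub(a, b) == equal_via_xor(a, b));
}
```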
2018-01-12	broadcom/vc5: Also check the update flags for avoiding DCE.	Eric Anholt	1	-1/+5
	I was trying to do a NULL-destination UF, and it got removed.
2018-01-12	broadcom/vc5: Fix up channel swizzling for textures on 4.x.	Eric Anholt	1	-2/+5
	I had 3.x putting swizzling in the texture state only for 16-bit texture returns, and in the shader for 32-bit. This may be due to having mixed up the return channel setup on 3.x back before I had moved it into the compiler. On 4.x, the non-border-color texwrap tests are passing nicely with both 16 and 32-bit returns with swizzling in the texture state.
2018-01-12	broadcom/vc5: Port the draw-time state emission to V3D 4.1.	Eric Anholt	7	-27/+76

2018-01-12	broadcom/vc5: Rename V3D 3.x Flat Shade Action to match v4.x naming.	Eric Anholt	2	-8/+8
	Now that the actions are reused for centroid and nonperspective, give them a more generic name.
2018-01-12	broadcom/vc5: Update pixel center setup for V3D 4.x.	Eric Anholt	1	-2/+12
	The fxcd/fycd instructions now return half-integer pixel centers when not doing sample-rate shading.
2018-01-12	broadcom/vc5: Print the buffer name in simulator overflow checks.	Eric Anholt	1	-2/+4
	Revealed that I was writing past the TSDA, not the Z buffer as I expected.
2018-01-12	broadcom/vc5: Add support for loading varyings in V3D 4.1.	Eric Anholt	6	-17/+13
	The LDVARY signal now writes an arbitrary register, so I took out the magic src register file and replaced it with an instruction with LDVARY set so we have somewhere to hang a QFILE_TEMP destination for register allocation.
2018-01-12	broadcom/vc5: Update state setup for V3D 4.1.	Eric Anholt	7	-14/+206

2018-01-12	broadcom/vc5: Add compiler support for V3D 4.x texturing.	Eric Anholt	7	-6/+283

2018-01-12	broadcom/vc5: Add the new TMU write addresses for V3D 4.x (and r5rep).	Eric Anholt	2	-10/+37
	The V3D 3.x series of TMU writes with meaning depending on the texture type is replaced with writes to specific registers for each texture argument semantic.
2018-01-12	broadcom/vc5: Move V3D 3.3 texturing to a separate file.	Eric Anholt	5	-229/+267
	V3D 4.x texturing changes enough that #ifdefs would just make a mess of it.
2018-01-12	broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file.	Eric Anholt	5	-34/+82
	For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to stop including V3.3 XML.
2018-01-12	broadcom/vc5: Set up depth formats for V3D 4.x.	Eric Anholt	1	-1/+12
	We no longer have the small depth-specific output format enum, and instead depth is just at the end of the output image format enum.
2018-01-12	broadcom/vc5: Always use the RGBA8 formats for RGBX8.	Eric Anholt	1	-3/+7
	The RGBX8 formats were dropped from V3D 4.x, but we don't really need them anyway (we already handle other non-alpha formats by forcing A to 1).
2018-01-12	broadcom/vc5: Move the formats table to per-V3D-version compile.	Eric Anholt	12	-337/+451

2018-01-12	broadcom/vc5: Add support for V3D 4.1 CLIF dumping.	Eric Anholt	5	-17/+57

2018-01-12	broadcom/vc5: Move the body of CLIF dumping to a per-version file.	Eric Anholt	6	-155/+255
	I want the library's entrypoints to still be unversioned, but the actual packet dumping needs to be per-version.
2018-01-12	broadcom/vc5: Use THRSW to enable multi-threaded shaders.	Eric Anholt	9	-81/+311
	This is a major performance boost on all of V3D, but is required on V3D 4.x where shaders are always either 2- or 4-threaded.
2018-01-12	broadcom/vc5: Properly schedule the thread-end THRSW.	Eric Anholt	2	-39/+137
	This fills in the delay slots of thread end as much as we can (other than being cautious about potential TLBZ writes). In the process, I moved the thread end THRSW instruction creation to the scheduler. Once we start emitting THRSWs in the shader, we need to schedule the thread-end one differently from other THRSWs, so having it in there makes that easy.
2018-01-12	broadcom/vc5: Implement GFXH-1684 workaround.	Eric Anholt	4	-0/+20
	Apparently the VPM writes need to be flushed out before we end the shader.
2018-01-12	broadcom/vc5: Port drawing commands to V3D 4.x.	Eric Anholt	9	-20/+93
	This required extending the CL submit ioctl, because the tile alloc/state buffer setup has moved from the BCL to register writes.