~nh/mesa - nh's Mesa repository; mostly radeonsi related development

Age	Commit message	Author	Files	Lines
2017-09-16	gallium: add LDEXP TGSI instruction and corresponding cap	Nicolai Hähnle	1	-0/+1

2017-09-16	tgsi: infer that dst[1] of DFRACEXP is an integer	Nicolai Hähnle	1	-1/+1

2017-09-16	gallivm: add dst register index to lp_build_tgsi_context::emit_store	Nicolai Hähnle	3	-11/+18

2017-09-13	radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers	Nicolai Hähnle	3	-1/+33
	The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fall back to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <marek.olsak@amd.com>
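The alternative that this commit message describes and rejects (pad the sample position buffer to 16 entries by repetition, then mask the sample ID with 0xf) can be sketched roughly as follows. This is only an illustration, not code from the driver: `struct sample_pos`, `fill_padded_positions` and `lookup_position` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PADDED_SAMPLES 16 /* power of two, so "& 0xf" stays in range */

/* Hypothetical sample position in pixel-relative coordinates. */
struct sample_pos {
    float x, y;
};

/* Fill all 16 slots, repeating the real positions when there are fewer. */
void fill_padded_positions(const struct sample_pos *src, size_t num_samples,
                           struct sample_pos dst[PADDED_SAMPLES])
{
    assert(num_samples > 0 && num_samples <= PADDED_SAMPLES);
    for (size_t i = 0; i < PADDED_SAMPLES; i++)
        dst[i] = src[i % num_samples];
}

/* Any sample ID now maps to a valid slot after masking. */
struct sample_pos lookup_position(const struct sample_pos dst[PADDED_SAMPLES],
                                  unsigned sample_id)
{
    return dst[sample_id & 0xf];
}
```

With a single position at the pixel center, every sample ID resolves to the center, which is what GLSL requires when no multisample buffers are available. The cost is that every user of interpolateAtSample pays for the larger upload and the extra AND, which is the drawback the commit message points out.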
2017-09-13	radeonsi: apply a mask to gl_SampleMaskIn in the PS prolog	Nicolai Hähnle	3	-5/+76
	gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
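As a rough illustration of the masking idea, consider the simplest case, where the fragment shader runs once per sample. That case is an assumption made here for clarity; the actual prolog has to handle other shading rates too, and `sample_mask_in_for_invocation` is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * pixel_coverage: bits for every covered sample of the pixel, as loaded
 *                 by the hardware.
 * sample_id:      the sample this invocation is shading.
 *
 * Assumes one invocation per sample, so only this sample's bit survives.
 */
uint32_t sample_mask_in_for_invocation(uint32_t pixel_coverage,
                                       unsigned sample_id)
{
    assert(sample_id < 32);
    return pixel_coverage & (UINT32_C(1) << sample_id);
}
```

For example, with a pixel coverage of 0xB (samples 0, 1 and 3 covered), the invocation for sample 1 would see 0x2, and the invocation for sample 2 would see 0.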
2017-09-13	radeonsi: rename variable to clarify its meaning	Nicolai Hähnle	1	-10/+10
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	radeonsi: make si_init_shader_selector_async static	Nicolai Hähnle	2	-2/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13	radeonsi: fix segfault in descriptor dumping	Nicolai Hähnle	1	-0/+18
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-11	radeonsi: optimize TCS epilog when invocation 0 writes tess factors	Marek Olšák	4	-28/+89
	This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-08	radeonsi: move the guts of ARB_shader_group_vote emission to ac	Connor Abbott	1	-21/+3
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08	radeonsi: move si_emit_ballot() to ac	Connor Abbott	1	-32/+6
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08	radeonsi: move emit_optimization_barrier() to ac	Connor Abbott	1	-43/+2
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08	radeonsi: move llvm_get_type_size() to ac	Connor Abbott	1	-34/+9
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-07	ac/surface: add radeon_surf::has_stencil for convenience	Marek Olšák	3	-6/+6
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPR	Marek Olšák	1	-6/+14
	Same as before, writing TCS outputs to LDS is rare. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPR	Marek Olšák	3	-6/+20
	TCS outputs are usually not written to LDS, so no stats here. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS	Marek Olšák	2	-1/+11
	-44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi: don't read the LS output vertex stride from an SGPR in LS	Marek Olšák	1	-4/+21
	Now it's able to generate ds_write2_b64 instead of ds_write2_b32. -20 bytes in one shader binary. (having only 1 output) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi: don't read the number of TCS out vertices from an SGPR in TCS	Marek Olšák	1	-2/+15
	-16 bytes in one shader binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi: don't always apply the PrimID instancing bug workaround on SI	Marek Olšák	1	-1/+1
	It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should have fixed the perf regression didn't really change much if anything. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-07	radeonsi: remove 2 callbacks from si_shader_context	Marek Olšák	3	-17/+13
	Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-06	radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug	Nicolai Hähnle	5	-24/+85
	When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06	amd/common: pass chip_class to ac_dump_reg	Nicolai Hähnle	1	-15/+30
	Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-09-06	radeonsi/gfx9: always flush DB metadata on framebuffer changes	Nicolai Hähnle	3	-4/+14
	This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-05	radeonsi/gfx9: implement primitive binning	Marek Olšák	8	-7/+485
	This increases performance, but it was tuned for Raven, not Vega. We don't know yet how Vega will perform, hopefully not worse. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05	radeonsi: add more state flags into si_state_dsa	Marek Olšák	2	-1/+23
	3 flags for primitive binning, 2 flags for out-of-order rasterization (but that will be done some other time) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-05	radeonsi/gfx9: don't use BREAK_BATCH and FLUSH_DFSM if DFSM is disabled	Marek Olšák	2	-3/+4
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04	radeonsi: eliminate PS color outputs when colormask kills them	Marek Olšák	3	-0/+6
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04	gallium/radeon: sort DBG shader flags according to pipe_shader_type	Marek Olšák	1	-3/+2
	Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-04	radeonsi: ensure cache flushes happen before SET_PREDICATION packets	Nicolai Hähnle	1	-5/+10
	The data is read when the render_cond_atom is emitted, so we must delay emitting the atom until after the flush. Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04	radeonsi: fix ARB_transform_feedback_overflow_query on <= VI	Nicolai Hähnle	1	-1/+3
	The result written by the shader workaround needs to be written back, or the CP may read stale data. Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04	radeonsi: fix compute shader state dumping	Nicolai Hähnle	1	-6/+11
	Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-04	radeonsi: add an assertion that only two-dimensional constant references are used	Nicolai Hähnle	1	-2/+3
	v2: remove some redundant checks Acked-by: Roland Scheidegger <sroland@vmware.com> (v1) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-09-01	radeonsi: move si_vm_fault_occured() to AMD common code	Samuel Pitoiset	1	-102/+4
	For radv, in order to report VM faults when detected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-08-30	radeonsi: update dirty_level_mask before dispatching	Samuel Pitoiset	1	-0/+5
	This fixes a rendering issue with Hitman when bindless textures are enabled. Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-29	ac/debug: Support multiple trace ids for nested IBs.	Bas Nieuwenhuizen	1	-9/+10
	Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-08-29	radeonsi: stop leaking nir	Timothy Arceri	1	-0/+1
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28	radeonsi: rewrite late alloc VS limit computation	Marek Olšák	1	-12/+25
	This is still very simple, but it's better than before. Loosely ported from Vulkan.
2017-08-28	radeonsi: correct maximum wave count per SIMD	Marek Olšák	1	-1/+12
	v2: don't special-case Tonga and Iceland. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-27	Revert "radeonsi: get the raster config from AMDGPU on SI"	Marek Olšák	1	-17/+0
	This reverts commit fc99cb3c9edee3af773700cf7ebdc60dc02fcaba. "The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to 25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25% performance decrease." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429 It looks like we can't use the raster config values from the kernel.
2017-08-25	radeonsi: set IF_THRESHOLD to 4	Timothy Arceri	1	-1/+1
	In 74e39de9324d it was set to 3 and it was reported that 4 caused tesseract to start spilling VGPRs. This no longer seems to be the case.
	Totals:
	SGPRS: 2787844 -> 2787764 (-0.00 %)
	VGPRS: 1713121 -> 1712717 (-0.02 %)
	Spilled SGPRs: 7532 -> 7532 (0.00 %)
	Spilled VGPRs: 49 -> 33 (-32.65 %)
	Private memory VGPRs: 2060 -> 2060 (0.00 %)
	Scratch size: 2200 -> 2180 (-0.91 %) dwords per thread
	Code Size: 79265520 -> 79248360 (-0.02 %) bytes
	LDS: 436 -> 436 (0.00 %) blocks
	Max Waves: 670535 -> 670608 (0.01 %)
	Wait states: 0 -> 0 (0.00 %)
	Before:
	VGPR SPILLING APPS      Shaders  SpillVGPR  PrivVGPR  ScratchSize
	EffectsCaveDemo             301          0       256          264
	ReflectionsSubwayDemo       264          0       256          264
	VehicleGame                 295          0       128          132
	bioshock-infinite          1140          0       448          516
	dirt-showdown               453         33         0           28
	gang-beasts                 364          0       500          496
	kerbal-space-program       1228          0       472          480
	tomb-raider-ultra          1199         16         0           20
	After:
	VGPR SPILLING APPS      Shaders  SpillVGPR  PrivVGPR  ScratchSize
	EffectsCaveDemo             301          0       256          264
	ReflectionsSubwayDemo       264          0       256          264
	VehicleGame                 295          0       128          132
	bioshock-infinite          1140          0       448          516
	dirt-showdown               453         33         0           28
	gang-beasts                 364          0       500          496
	kerbal-space-program       1228          0       472          480
	The only change in VGPR spills is the elimination of all spills in Tomb Raider at Ultra settings. Closer examination shows that the shaders go over the limit because they contain three expressions: a mul, an rcp and a UBO load. The UBO load is actually used elsewhere and is therefore already stored in a temp in IRs such as TGSI, but GLSL IR counts it against the if cost.
	Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Acked-by: Marek Olšák <marek.olsak@amd.com>
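The percentages in the totals above are consistent with a plain relative change, (after - before) / before. The helper below is a minimal sketch for checking them by hand; `percent_change` is a hypothetical name and is not part of the tooling that produced the report.

```c
#include <assert.h>

/* Relative change from `before` to `after`, expressed in percent. */
double percent_change(double before, double after)
{
    assert(before != 0.0); /* undefined for a zero baseline */
    return (after - before) / before * 100.0;
}
```

Applied to the spilled VGPR totals, `percent_change(49, 33)` gives roughly -32.65, and `percent_change(2200, 2180)` gives roughly -0.91 for the scratch size, matching the figures quoted in the commit message.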
2017-08-25	glsl: pass shader source keys to the disk cache	Timothy Arceri	1	-1/+1
	We don't actually write them to disk here. That will happen in the following commit. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24	radeonsi: get the raster config from AMDGPU on SI	Marek Olšák	1	-0/+17
	Not sure yet if we wanna do this on CIK and VI too. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24	radeonsi: clean up setting GRBM_GFX_INDEX	Marek Olšák	1	-19/+22
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24	radeonsi: move PA_SC_RASTER_CONFIG emission into a separate function	Marek Olšák	1	-70/+73
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-23	radeonsi: fix wrong assertion in si_init_bindless_descriptors()	Samuel Pitoiset	1	-1/+1
	Bad mistake, sorry. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-08-23	radeonsi: update comment describing indices into sctx->descriptors	Nicolai Hähnle	1	-6/+5
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23	radeonsi: do not assert when reserving bindless slot 0	Samuel Pitoiset	1	-1/+4
	When assertions were disabled, the compiler removed the call to util_idalloc_alloc() and the first allocated bindless slot was 0 which is invalid per the spec. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
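The pitfall described here is a general C one: when NDEBUG is defined, assert() expands to nothing, so any side effect inside its argument disappears as well. A minimal sketch follows, using a toy allocator as a stand-in; `toy_idalloc` and both `reserve_slot0_*` functions are hypothetical and do not reproduce the real util_idalloc API.

```c
#include <assert.h>

/* Toy ID allocator that hands out 0, 1, 2, ... (illustrative only). */
struct toy_idalloc {
    unsigned next;
};

unsigned toy_idalloc_alloc(struct toy_idalloc *a)
{
    return a->next++;
}

/*
 * Problematic pattern: the allocation is the assert() argument. In a
 * build with -DNDEBUG the call is compiled out, slot 0 is never
 * reserved, and the next caller receives the invalid slot 0.
 */
void reserve_slot0_buggy(struct toy_idalloc *a)
{
    assert(toy_idalloc_alloc(a) == 0);
}

/*
 * Safer pattern: perform the call unconditionally and assert only on
 * the stored result.
 */
void reserve_slot0_fixed(struct toy_idalloc *a)
{
    unsigned slot = toy_idalloc_alloc(a);

    assert(slot == 0);
    (void)slot; /* avoid an unused-variable warning under NDEBUG */
}
```

After `reserve_slot0_fixed()`, the next allocation returns 1 regardless of whether assertions are enabled. With `reserve_slot0_buggy()` that only holds in builds where assertions are active.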
2017-08-23	radeonsi: rename some bindless-related helper functions	Samuel Pitoiset	1	-21/+21
	I think it makes more sense. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23	radeonsi: minor cleanups in si_make_{texture,image}_handle_resident()	Samuel Pitoiset	1	-12/+12
	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>