~nh/mesa - nh's Mesa repository; mostly radeonsi related development

Age	Commit message	Author	Files	Lines
2017-07-31	radeonsi: add enable_sisched driconf option	Nicolai Hähnle	1	-0/+4
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31	gallium: add pipe_screen_config to screen_create functions	Nicolai Hähnle	1	-2/+2
	This allows a more generic mechanism for passing user configurations into drivers by accessing the dri options directly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31	radeonsi: enable R600_DEBUG=nir for vertex and fragment shaders	Nicolai Hähnle	1	-0/+6
	Also, disable geometry and tessellation shaders. Mixing and matching NIR and TGSI shaders should work (and I've tested it for the VS/PS interface), but geometry and tessellation require VS-as-ES/LS, which isn't implemented yet for NIR. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31	radeonsi: implement pipe_screen::get_compiler_options for NIR	Nicolai Hähnle	1	-0/+33
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31	gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREF	Nicolai Hähnle	1	-0/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-26	radeonsi: decrease the number of compiler threads	Marek Olšák	1	-1/+1
	Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26	radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI	Nicolai Hähnle	1	-2/+2
	The firmware version numbers for SI were wrong. The new numbers are probably too conservative (we don't have a definitive answer from the firmware team), but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on Tahiti (by Gustaw) and on Verde (by myself).
	While this is technically adding a feature, it's a feature we thought we had for a long time. The change is small enough and we're early enough in the 17.2 release cycle that it should still go in.
	Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
	Cc: 17.2 <mesa-stable@lists.freedesktop.org>
	Acked-by: Alex Deucher <alexander.deucher@amd.com>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-18	radeonsi: add back the USE_MININUM_PRIORITY flag to the low-prio compiler queue	Marek Olšák	1	-1/+2
	Accidentally removed in 9f320e0a387a1009c5218daf130b3b754a3c2800. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17	radeonsi: automatically resize shader compiler thread queues when they are full	Marek Olšák	1	-8/+4
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17	radeonsi: expose ARB_timer_query unconditionally	Marek Olšák	1	-5/+2
	clock_crystal_freq is always non-zero now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17	radeonsi: prevent a crash with DBG_CHECK_VM and u_threaded_context	Marek Olšák	1	-4/+6
	by setting PIPE_CONTEXT_DEBUG in the caller Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17	radeonsi/gfx9: add workarounds to avoid VGPR indexing completely	Marek Olšák	1	-6/+19
	For inputs and outputs, indirect indexing is lowered by the GLSL compiler. For temporaries, use alloca and disable the "promote-alloca" pass. In the future, we could switch all codepaths to alloca permanently and just rely on the "promote-alloca" pass. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17	radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target	Marek Olšák	1	-1/+1
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27	radeonsi: move instance divisors into a constant buffer	Marek Olšák	1	-0/+2
	Shader key size: 107 -> 47
	Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer.
	The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2?
	VS prolog disassembly:
	    s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080
	    s_nop 0 ; BF800000
	    s_waitcnt lgkmcnt(0) ; BF8C007F
	    s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
	    s_waitcnt lgkmcnt(0) ; BF8C007F
	    v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E
	    v_rcp_iflag_f32_e32 v4, v4 ; 7E084704
	    v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000
	    v_cvt_u32_f32_e32 v4, v4 ; 7E080F04
	    v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04
	    v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04
	    v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80
	    v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80
	    v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
	    v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905
	    v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905
	    v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905
	    v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
	    v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304
	    v_add_i32_e32 v4, vcc, s8, v0 ; 32080008
	    v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05
	    v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81
	    v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01
	    v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01
	    v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E
	    v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280
	    v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280
	    v_and_b32_e32 v6, v8, v6 ; 260C0D08
	    v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80
	    v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07
	    v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1
	    v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080
	    v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06
	    v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09
	v2: set prefer_mono for fetched instance divisors
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
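	The three divisor cases in this commit message can be modelled roughly as follows. This is a minimal Python sketch, not radeonsi code: the function name and its parameters are hypothetical, and it assumes the usual OpenGL rule that an instanced attribute is fetched at floor(instance / divisor) plus the base instance. Divisors 0 and 1 stand for the cases encoded in the shader key; larger values stand for a divisor fetched from the constant buffer.

```python
def attribute_fetch_index(divisor, vertex_id, instance_id,
                          base_vertex=0, start_instance=0):
    """Return the index used to fetch one vertex attribute (hypothetical model)."""
    if divisor == 0:
        # Not instanced: the attribute advances once per vertex.
        return base_vertex + vertex_id
    if divisor == 1:
        # Advances once per instance; no division is needed.
        return start_instance + instance_id
    # Divisor >= 2: the only case that needs an integer division,
    # which is what the long instruction sequence in the prolog computes.
    return start_instance + instance_id // divisor
```

	Only the last branch pays for a division, which is consistent with the message's question of whether any application actually uses divisors of 2 or more.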
2017-06-23	gallium/radeon: pass create_screen flags to r600_common_screen_init	Marek Olšák	1	-2/+3
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22	radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence	Marek Olšák	1	-0/+18
	instead of using a monotonic suballocator
	v2: initialize the memory at context creation
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22	radeonsi/gfx9: enable the constant engine	Marek Olšák	1	-4/+1
	I think this kernel commit fixes it: drm/amdgpu:use FRAME_CNTL for new GFX ucode Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22	radeonsi/gfx9: indirect buffers and all CP packets use TC L2	Marek Olšák	1	-2/+4
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19	radeonsi/gfx9: disable sparse buffers	Marek Olšák	1	-0/+3
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-18	radeonsi: reduce overhead for resident textures which need color decompression	Samuel Pitoiset	1	-0/+4
	This is done by introducing a separate list. si_decompress_textures() is now 5x faster. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
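	The "separate list" idea can be sketched as below. This is an illustrative Python model under stated assumptions, not the driver's data structures: the class, its attributes and the callback are all hypothetical names. The point is only that the decompression pass walks a short list of textures known to need work instead of testing every resident texture.

```python
class ResidentTextures:
    """Hypothetical model of resident-texture tracking."""

    def __init__(self):
        self.resident = []                # every resident texture
        self.needs_color_decompress = []  # only the ones that need work

    def make_resident(self, tex, needs_decompress):
        self.resident.append(tex)
        if needs_decompress:
            self.needs_color_decompress.append(tex)

    def decompress_textures(self, decompress):
        # Visit only the short list; textures that never need
        # decompression are not touched at all.
        for tex in self.needs_color_decompress:
            decompress(tex)
```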
2017-06-18	radeonsi: reduce overhead for resident textures which need depth decompression	Samuel Pitoiset	1	-0/+2
	This is done by introducing a separate list. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14	radeonsi: enable ARB_bindless_texture	Samuel Pitoiset	1	-1/+3
	This has only been tested on RX480. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14	radeonsi: implement ARB_bindless_texture	Samuel Pitoiset	1	-0/+15
	This implements the Gallium interface. Decompression of resident textures/images will follow in the next patches. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14	radeonsi: add a slab allocator for bindless descriptors	Samuel Pitoiset	1	-0/+12
	For each texture/image handle, we need to allocate a new buffer for the bindless descriptor. But when the number of buffers added to the current CS becomes high, the overhead in the winsys (and in the kernel) becomes significant. To reduce this bottleneck, the idea is to suballocate the bindless descriptors using a slab similar to the one used in the winsys.
	Currently, a buffer can hold 1024 bindless descriptors, but this limit is arbitrary and could be changed in the future. Once a slab is allocated, the "base" buffer is added to a per-context list.
	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
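	A slab suballocator of the kind described here might look like the following. This is a hedged Python sketch, not the radeonsi implementation: the class and method names are hypothetical, buffers are represented by plain integers, and only the figure of 1024 descriptors per buffer comes from the commit message.

```python
DESCRIPTORS_PER_SLAB = 1024  # limit quoted in the commit message


class BindlessSlabAllocator:
    """Hypothetical model: hands out (slab_index, slot) pairs."""

    def __init__(self):
        self.num_slabs = 0    # stands in for the per-context list of "base" buffers
        self.free_slots = []  # slots available for (re)use

    def alloc(self):
        if not self.free_slots:
            # No free slot left: create one new "base" buffer and
            # carve it into fixed-size descriptor slots.
            slab = self.num_slabs
            self.num_slabs += 1
            self.free_slots.extend(
                (slab, slot) for slot in reversed(range(DESCRIPTORS_PER_SLAB)))
        return self.free_slots.pop()

    def free(self, entry):
        # Return the slot so a later handle can reuse it.
        self.free_slots.append(entry)
```

	With this layout, 1024 handles share one buffer, so the number of buffers that has to be added to a command submission grows with the number of slabs rather than the number of handles.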
2017-06-14	gallium: add PIPE_CAP_BINDLESS_TEXTURE	Samuel Pitoiset	1	-0/+1
	Whether bindless texture operations are supported by the underlying driver. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07	radeonsi: clean up decompress blend state names	Marek Olšák	1	-4/+4
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07	radeonsi: use a compiler queue with a low priority for optimized shaders	Marek Olšák	1	-4/+27
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07	util/u_queue: add an option to set the minimum thread priority	Marek Olšák	1	-1/+1
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07	radeonsi: decrease the number of compiler threads to num CPUs - 1	Marek Olšák	1	-1/+4
	Reserve one core for other things (like draw calls). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
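	The sizing rule in this commit can be written down in a couple of lines. This is a minimal Python sketch with a hypothetical function name; clamping the result to at least one thread is an assumption added here so that single-core systems still get a compiler thread.

```python
import os


def compiler_thread_count(num_cpus=None):
    """Number of shader compiler threads: all CPUs but one (hypothetical helper)."""
    if num_cpus is None:
        # os.cpu_count() may return None when the count is unknown.
        num_cpus = os.cpu_count() or 1
    # Leave one core for other work such as draw calls; never go below 1.
    return max(1, num_cpus - 1)
```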
2017-06-02	gallium: Add a cap to check if the driver supports ARB_post_depth_coverage	Lyude	1	-0/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-29	radeonsi: fix a crash in si_destroy_context if we fail early	Marek Olšák	1	-1/+2
	Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-25	radeon: rename has_uvd info to has_hw_decode	Leo Liu	1	-1/+1
	Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-22	radeonsi/gfx9: compile shaders with +xnack	Marek Olšák	1	-6/+7
	so that LLVM doesn't allocate SGPRs where XNACK is. Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18	radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble	Marek Olšák	1	-0/+1
	A later commit will only upload descriptors used by shaders, so we won't do full dumps anymore, so the only way to have a complete mirror of CE RAM in memory is to do a separate dump after the last draw call. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17	gallium: add PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION	Marek Olšák	1	-0/+1
	for skipping mapped-buffer checking in every GL draw call Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-15	radeonsi: enable threaded_context	Marek Olšák	1	-3/+34
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15	gallium/radeon: unwrap a context if we get a wrapped one	Marek Olšák	1	-1/+1
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15	radeonsi/gfx9: add support for Raven	Marek Olšák	1	-2/+5
	Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10	gallium: add PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX	Marek Olšák	1	-0/+1
	The next patch will use it. This is really for svga and GL2-level drivers. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-05	radeonsi: drop support for LLVM 3.8	Marek Olšák	1	-14/+7
	LLVM 3.8:
	- had broken indirect resource indexing
	- didn't have scratch coalescing
	- was the last user of problematic v16i8
	- only supported OpenGL 4.1
	This leaves us with LLVM 3.9 and LLVM 4.0 support for Mesa 17.2.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28	radeonsi: remove VS epilog code, compile VS with PrimID export on demand	Marek Olšák	1	-1/+0
	The use of PrimID in the pixel shader is too rare to deserve such a sizable support code. The initial idea of the VS epilog was to move the clipping code there and remove it based on states, but optimized variants are now used to do that and are easier to support, so the VS epilog has turned out to be not so useful. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28	radeonsi/gfx9: enable OpenGL 4.5	Marek Olšák	1	-5/+0
	Tentatively enable it, expecting the scratch buffer support to be done before the next Mesa release. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26	radeonsi: disable the TGSI merge registers pass	Samuel Pitoiset	1	-1/+1
	47109 shaders in 29632 tests
	Totals:
	SGPRS: 1917364 -> 1916620 (-0.04 %)
	VGPRS: 1165802 -> 1165202 (-0.05 %)
	Spilled SGPRs: 1880 -> 1843 (-1.97 %)
	Spilled VGPRs: 70 -> 65 (-7.14 %)
	Private memory VGPRs: 1184 -> 1184 (0.00 %)
	Scratch size: 1312 -> 1308 (-0.30 %) dwords per thread
	Code Size: 60211356 -> 60192268 (-0.03 %) bytes
	LDS: 1077 -> 1077 (0.00 %) blocks
	Max Waves: 428597 -> 428674 (0.02 %)
	Wait states: 0 -> 0 (0.00 %)
	Totals from affected shaders:
	SGPRS: 238173 -> 237429 (-0.31 %)
	VGPRS: 149556 -> 148956 (-0.40 %)
	Spilled SGPRs: 1263 -> 1226 (-2.93 %)
	Spilled VGPRs: 25 -> 20 (-20.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 20 -> 16 (-20.00 %) dwords per thread
	Code Size: 10457904 -> 10438816 (-0.18 %) bytes
	LDS: 50 -> 50 (0.00 %) blocks
	Max Waves: 41283 -> 41360 (0.19 %)
	Wait states: 0 -> 0 (0.00 %)
	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
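	The percentages in these statistics appear to be plain relative changes between the "before" and "after" counts. A small Python sketch of that arithmetic follows; the function name is hypothetical, and the formula is inferred from the figures rather than taken from the tool that produced them.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (before must be non-zero)."""
    return (after - before) / before * 100.0
```

	For example, the "Spilled VGPRs: 70 -> 65" row works out to (65 - 70) / 70 * 100, which rounds to the reported -7.14 %.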
2017-04-26	gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERS	Samuel Pitoiset	1	-0/+1
	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-14	radeonsi: enable ARB_shader_viewport_layer_array	Nicolai Hähnle	1	-1/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14	gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT	Nicolai Hähnle	1	-0/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-05	radeonsi: enable ARB_shader_ballot	Nicolai Hähnle	1	-1/+3
	Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into putting the lane select of llvm.amdgcn.readlane into a VGPR and then fails to continue to compile. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05	gallium: add PIPE_CAP_TGSI_BALLOT	Nicolai Hähnle	1	-0/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05	radeonsi: enable ARB_sparse_buffer	Nicolai Hähnle	1	-1/+10
	v2:
	- fill in DRM version requirement
	- disable on SI due to CP DMA faults
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05	gallium: add sparse buffer interface and capability	Nicolai Hähnle	1	-0/+1
	v2:
	- explain the resource_commit interface in more detail
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>