path: root/src/gallium/drivers/radeonsi/si_pipe.c
Age | Commit message | Author | Files | Lines
2017-07-31 | radeonsi: add enable_sisched driconf option [driconf] | Nicolai Hähnle | 1 | -0/+4
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 | gallium: add pipe_screen_config to screen_create functions | Nicolai Hähnle | 1 | -2/+2
This allows a more generic mechanism for passing user configurations into drivers by accessing the dri options directly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 | radeonsi: enable R600_DEBUG=nir for vertex and fragment shaders | Nicolai Hähnle | 1 | -0/+6
Also, disable geometry and tessellation shaders. Mixing and matching NIR and TGSI shaders should work (and I've tested it for the VS/PS interface), but geometry and tessellation require VS-as-ES/LS, which isn't implemented yet for NIR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 | radeonsi: implement pipe_screen::get_compiler_options for NIR | Nicolai Hähnle | 1 | -0/+33
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 | gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREF | Nicolai Hähnle | 1 | -0/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-26 | radeonsi: decrease the number of compiler threads | Marek Olšák | 1 | -1/+1
Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 | radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI | Nicolai Hähnle | 1 | -2/+2
The firmware version numbers for SI were wrong. The new numbers are probably too conservative (we don't have a definitive answer from the firmware team), but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on Tahiti (by Gustaw) and on Verde (by myself).
While this is technically adding a feature, it's a feature we thought we had for a long time. The change is small enough and we're early enough in the 17.2 release cycle that it should still go in.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-18 | radeonsi: add back the USE_MININUM_PRIORITY flag to the low-prio compiler queue | Marek Olšák | 1 | -1/+2
Accidentally removed in 9f320e0a387a1009c5218daf130b3b754a3c2800. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 | radeonsi: automatically resize shader compiler thread queues when they are full | Marek Olšák | 1 | -8/+4
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
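The idea of growing a job queue instead of blocking when it is full can be sketched as a ring buffer that reallocates on overflow. This is only an illustration: the names `sketch_queue`, `sketch_queue_push` and `sketch_queue_pop` are hypothetical, the doubling policy is an assumption, and the real queue in Mesa also involves threads and locking that are left out here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-threaded job queue (not Mesa's util_queue). */
struct sketch_queue {
    int *jobs;       /* ring buffer storage */
    unsigned cap;    /* allocated slots */
    unsigned head;   /* index of the oldest job */
    unsigned count;  /* number of queued jobs */
};

/* Push a job; when the ring is full, grow it instead of waiting. */
static bool sketch_queue_push(struct sketch_queue *q, int job)
{
    if (q->count == q->cap) {
        /* Assumed growth policy: double the capacity (start at 4). */
        unsigned new_cap = q->cap ? q->cap * 2 : 4;
        int *grown = malloc(new_cap * sizeof(*grown));
        if (!grown)
            return false;
        /* Copy the jobs in FIFO order so the new ring starts at index 0. */
        for (unsigned i = 0; i < q->count; i++)
            grown[i] = q->jobs[(q->head + i) % q->cap];
        free(q->jobs);
        q->jobs = grown;
        q->cap = new_cap;
        q->head = 0;
    }
    q->jobs[(q->head + q->count) % q->cap] = job;
    q->count++;
    return true;
}

/* Pop the oldest job; returns false when the queue is empty. */
static bool sketch_queue_pop(struct sketch_queue *q, int *job)
{
    if (q->count == 0)
        return false;
    *job = q->jobs[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return true;
}
```

A zero-initialized `struct sketch_queue` is a valid empty queue; the caller frees `jobs` when done.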
2017-07-17 | radeonsi: expose ARB_timer_query unconditionally | Marek Olšák | 1 | -5/+2
clock_crystal_freq is always non-zero now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 | radeonsi: prevent a crash with DBG_CHECK_VM and u_threaded_context | Marek Olšák | 1 | -4/+6
by setting PIPE_CONTEXT_DEBUG in the caller
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 | radeonsi/gfx9: add workarounds to avoid VGPR indexing completely | Marek Olšák | 1 | -6/+19
For inputs and outputs, indirect indexing is lowered by the GLSL compiler. For temporaries, use alloca and disable the "promote-alloca" pass. In the future, we could switch all codepaths to alloca permanently and just rely on the "promote-alloca" pass. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 | radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target | Marek Olšák | 1 | -1/+1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-27 | radeonsi: move instance divisors into a constant buffer | Marek Olšák | 1 | -0/+2
Shader key size: 107 -> 47
Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer.
The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2?
VS prolog disassembly:
    s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080
    s_nop 0 ; BF800000
    s_waitcnt lgkmcnt(0) ; BF8C007F
    s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
    s_waitcnt lgkmcnt(0) ; BF8C007F
    v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E
    v_rcp_iflag_f32_e32 v4, v4 ; 7E084704
    v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000
    v_cvt_u32_f32_e32 v4, v4 ; 7E080F04
    v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04
    v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04
    v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80
    v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80
    v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
    v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905
    v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905
    v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905
    v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
    v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304
    v_add_i32_e32 v4, vcc, s8, v0 ; 32080008
    v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05
    v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81
    v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01
    v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01
    v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E
    v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280
    v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280
    v_and_b32_e32 v6, v8, v6 ; 260C0D08
    v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80
    v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07
    v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1
    v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080
    v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06
    v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09
v2: set prefer_mono for fetched instance divisors
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
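What the prolog computes can be sketched in C as follows. This is an illustrative helper, not code from Mesa: the function name and parameter names are hypothetical, and the mapping of the three divisor cases follows the usual OpenGL instancing rules (divisor 0 means a per-vertex attribute). The long instruction sequence above corresponds to the single unsigned division in the last case.

```c
#include <stdint.h>

/* Hypothetical helper: index used to fetch one vertex attribute.
 *  - divisor 0: per-vertex attribute
 *  - divisor 1: advances once per instance
 *  - divisor >= 2: advances once every `divisor` instances; this is the
 *    case where the divisor would come from a constant buffer and the
 *    shader has to perform an unsigned division. */
static uint32_t sketch_attrib_fetch_index(uint32_t divisor,
                                          uint32_t base_vertex,
                                          uint32_t vertex_id,
                                          uint32_t start_instance,
                                          uint32_t instance_id)
{
    if (divisor == 0)
        return base_vertex + vertex_id;
    if (divisor == 1)
        return start_instance + instance_id;
    return start_instance + instance_id / divisor;
}
```

For example, with `start_instance = 100` and `instance_id = 7`, a divisor of 3 selects element 102.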
2017-06-23 | gallium/radeon: pass create_screen flags to r600_common_screen_init | Marek Olšák | 1 | -2/+3
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 | radeonsi/gfx9: keep reusing the same buffer/address for the gfx9 flush fence | Marek Olšák | 1 | -0/+18
instead of using a monotonic suballocator
v2: initialize the memory at context creation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 | radeonsi/gfx9: enable the constant engine | Marek Olšák | 1 | -4/+1
I think this kernel commit fixes it:
    drm/amdgpu:use FRAME_CNTL for new GFX ucode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-22 | radeonsi/gfx9: indirect buffers and all CP packets use TC L2 | Marek Olšák | 1 | -2/+4
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-19 | radeonsi/gfx9: disable sparse buffers | Marek Olšák | 1 | -0/+3
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-18 | radeonsi: reduce overhead for resident textures which need color decompression | Samuel Pitoiset | 1 | -0/+4
This is done by introducing a separate list. si_decompress_textures() is now 5x faster. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
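The separate-list idea can be illustrated roughly as below. Everything in this sketch is hypothetical (the struct names, the fixed-size arrays, the counter standing in for the actual decompression work); it only shows why the per-draw pass gets cheaper when it walks a short list of textures that need decompression rather than every resident texture.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_RESIDENT 64 /* arbitrary bound for this sketch */

/* Hypothetical texture record. */
struct sketch_texture {
    bool needs_color_decompress;
    unsigned decompress_count; /* stands in for the real decompress work */
};

/* Hypothetical context with two lists: all resident textures, and only
 * those that need color decompression. */
struct sketch_context {
    struct sketch_texture *resident[SKETCH_MAX_RESIDENT];
    size_t num_resident;
    struct sketch_texture *needs_decompress[SKETCH_MAX_RESIDENT];
    size_t num_needs_decompress;
};

/* Register a texture as resident; also track it in the short list if
 * it needs decompression. */
static bool sketch_make_resident(struct sketch_context *ctx,
                                 struct sketch_texture *tex)
{
    if (ctx->num_resident == SKETCH_MAX_RESIDENT)
        return false;
    ctx->resident[ctx->num_resident++] = tex;
    if (tex->needs_color_decompress)
        ctx->needs_decompress[ctx->num_needs_decompress++] = tex;
    return true;
}

/* Per-draw pass: visits only the short list. Returns the number of
 * textures visited. */
static size_t sketch_decompress_resident(struct sketch_context *ctx)
{
    size_t visited = 0;
    for (size_t i = 0; i < ctx->num_needs_decompress; i++) {
        ctx->needs_decompress[i]->decompress_count++;
        visited++;
    }
    return visited;
}
```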
2017-06-18 | radeonsi: reduce overhead for resident textures which need depth decompression | Samuel Pitoiset | 1 | -0/+2
This is done by introducing a separate list. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 | radeonsi: enable ARB_bindless_texture | Samuel Pitoiset | 1 | -1/+3
This has only been tested on RX480. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 | radeonsi: implement ARB_bindless_texture | Samuel Pitoiset | 1 | -0/+15
This implements the Gallium interface. Decompression of resident textures/images will follow in the next patches. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 | radeonsi: add a slab allocator for bindless descriptors | Samuel Pitoiset | 1 | -0/+12
For each texture/image handle, we need to allocate a new buffer for the bindless descriptor. But when the number of buffers added to the current CS becomes high, the overhead in the winsys (and in the kernel) is significant. To reduce this bottleneck, the idea is to suballocate the bindless descriptors using a slab similar to the one used in the winsys.
Currently, a buffer can hold 1024 bindless descriptors, but this limit is arbitrary and could be changed in the future. Once a slab is allocated, the "base" buffer is added to a per-context list.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
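A minimal sketch of this kind of suballocation, assuming a bitmap of free slots per slab and a singly linked per-context list. All names here are hypothetical and the slabs are plain heap memory rather than GPU buffers; only the 1024-slots-per-buffer figure comes from the commit message.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DESCS_PER_SLAB 1024 /* figure quoted in the commit message */

/* Hypothetical slab: one "base" buffer holding many descriptor slots. */
struct sketch_slab {
    struct sketch_slab *next;
    uint32_t used[SKETCH_DESCS_PER_SLAB / 32]; /* one bit per slot */
    unsigned num_used;
};

/* Hypothetical per-context list of slabs. */
struct sketch_desc_ctx {
    struct sketch_slab *slabs;
};

struct sketch_desc_handle {
    struct sketch_slab *slab;
    unsigned slot;
};

/* Allocate one descriptor slot; a new slab is created only when every
 * existing slab is full. Returns 0 on success, -1 on failure. */
static int sketch_desc_alloc(struct sketch_desc_ctx *ctx,
                             struct sketch_desc_handle *out)
{
    struct sketch_slab *s;

    for (s = ctx->slabs; s; s = s->next)
        if (s->num_used < SKETCH_DESCS_PER_SLAB)
            break;

    if (!s) {
        s = calloc(1, sizeof(*s));
        if (!s)
            return -1;
        s->next = ctx->slabs; /* add the new slab to the context list */
        ctx->slabs = s;
    }

    for (unsigned i = 0; i < SKETCH_DESCS_PER_SLAB; i++) {
        uint32_t bit = 1u << (i % 32);
        if (!(s->used[i / 32] & bit)) {
            s->used[i / 32] |= bit;
            s->num_used++;
            out->slab = s;
            out->slot = i;
            return 0;
        }
    }
    return -1; /* not reached: the chosen slab has a free slot */
}

/* Return a slot to its slab. */
static void sketch_desc_free(struct sketch_desc_handle *h)
{
    h->slab->used[h->slot / 32] &= ~(1u << (h->slot % 32));
    h->slab->num_used--;
}

/* Release every slab owned by the context. */
static void sketch_desc_ctx_destroy(struct sketch_desc_ctx *ctx)
{
    while (ctx->slabs) {
        struct sketch_slab *next = ctx->slabs->next;
        free(ctx->slabs);
        ctx->slabs = next;
    }
}
```

With this layout, many descriptors share one underlying buffer, so the number of buffers that would have to be added to a command stream grows with the number of slabs instead of the number of handles.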
2017-06-14 | gallium: add PIPE_CAP_BINDLESS_TEXTURE | Samuel Pitoiset | 1 | -0/+1
Whether bindless texture operations are supported by the underlying driver. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 | radeonsi: clean up decompress blend state names | Marek Olšák | 1 | -4/+4
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 | radeonsi: use a compiler queue with a low priority for optimized shaders | Marek Olšák | 1 | -4/+27
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 | util/u_queue: add an option to set the minimum thread priority | Marek Olšák | 1 | -1/+1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-07 | radeonsi: decrease the number of compiler threads to num CPUs - 1 | Marek Olšák | 1 | -1/+4
Reserve one core for other things (like draw calls). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
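The "num CPUs - 1" rule might look like the following in C. This is a sketch under assumptions: the function names are hypothetical, the clamp to at least one thread is an assumption so that single-core machines still get a compiler thread, and `sysconf(_SC_NPROCESSORS_ONLN)` is used as one common way to query the online CPU count on Linux.

```c
#include <unistd.h>

/* Hypothetical helper: leave one core free for other work (such as
 * draw calls), but never go below one compiler thread. Also covers a
 * failed CPU query, which reports -1. */
static long sketch_compiler_thread_count(long num_cpus)
{
    long threads = num_cpus - 1;
    return threads < 1 ? 1 : threads;
}

/* Hypothetical wrapper that queries the number of online CPUs. */
static long sketch_default_compiler_threads(void)
{
    return sketch_compiler_thread_count(sysconf(_SC_NPROCESSORS_ONLN));
}
```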
2017-06-02 | gallium: Add a cap to check if the driver supports ARB_post_depth_coverage | Lyude | 1 | -0/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-05-29 | radeonsi: fix a crash in si_destroy_context if we fail early | Marek Olšák | 1 | -1/+2
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-25 | radeon: rename has_uvd info to has_hw_decode | Leo Liu | 1 | -1/+1
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2017-05-22 | radeonsi/gfx9: compile shaders with +xnack | Marek Olšák | 1 | -6/+7
so that LLVM doesn't allocate SGPRs where XNACK is.
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-18 | radeonsi: do only 1 big CE dump at end of IBs and one reload in the preamble | Marek Olšák | 1 | -0/+1
A later commit will only upload descriptors used by shaders, so we won't do full dumps anymore, so the only way to have a complete mirror of CE RAM in memory is to do a separate dump after the last draw call. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-17 | gallium: add PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION | Marek Olšák | 1 | -0/+1
for skipping mapped-buffer checking in every GL draw call
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-15 | radeonsi: enable threaded_context | Marek Olšák | 1 | -3/+34
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 | gallium/radeon: unwrap a context if we get a wrapped one | Marek Olšák | 1 | -1/+1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-05-15 | radeonsi/gfx9: add support for Raven | Marek Olšák | 1 | -2/+5
Cc: 17.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-05-10 | gallium: add PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX | Marek Olšák | 1 | -0/+1
The next patch will use it. This is really for svga and GL2-level drivers. Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
2017-05-05 | radeonsi: drop support for LLVM 3.8 | Marek Olšák | 1 | -14/+7
LLVM 3.8:
- had broken indirect resource indexing
- didn't have scratch coalescing
- was the last user of problematic v16i8
- only supported OpenGL 4.1
This leaves us with LLVM 3.9 and LLVM 4.0 support for Mesa 17.2.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 | radeonsi: remove VS epilog code, compile VS with PrimID export on demand | Marek Olšák | 1 | -1/+0
The use of PrimID in the pixel shader is too rare to deserve such a sizable support code. The initial idea of the VS epilog was to move the clipping code there and remove it based on states, but optimized variants are now used to do that and are easier to support, so the VS epilog has turned out to be not so useful. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-28 | radeonsi/gfx9: enable OpenGL 4.5 | Marek Olšák | 1 | -5/+0
Tentatively enable it, expecting the scratch buffer support to be done before the next Mesa release. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-26 | radeonsi: disable the TGSI merge registers pass | Samuel Pitoiset | 1 | -1/+1
47109 shaders in 29632 tests
Totals:
    SGPRS: 1917364 -> 1916620 (-0.04 %)
    VGPRS: 1165802 -> 1165202 (-0.05 %)
    Spilled SGPRs: 1880 -> 1843 (-1.97 %)
    Spilled VGPRs: 70 -> 65 (-7.14 %)
    Private memory VGPRs: 1184 -> 1184 (0.00 %)
    Scratch size: 1312 -> 1308 (-0.30 %) dwords per thread
    Code Size: 60211356 -> 60192268 (-0.03 %) bytes
    LDS: 1077 -> 1077 (0.00 %) blocks
    Max Waves: 428597 -> 428674 (0.02 %)
    Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
    SGPRS: 238173 -> 237429 (-0.31 %)
    VGPRS: 149556 -> 148956 (-0.40 %)
    Spilled SGPRs: 1263 -> 1226 (-2.93 %)
    Spilled VGPRs: 25 -> 20 (-20.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 20 -> 16 (-20.00 %) dwords per thread
    Code Size: 10457904 -> 10438816 (-0.18 %) bytes
    LDS: 50 -> 50 (0.00 %) blocks
    Max Waves: 41283 -> 41360 (0.19 %)
    Wait states: 0 -> 0 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-26 | gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERS | Samuel Pitoiset | 1 | -0/+1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-14 | radeonsi: enable ARB_shader_viewport_layer_array | Nicolai Hähnle | 1 | -1/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14 | gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT | Nicolai Hähnle | 1 | -0/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-05 | radeonsi: enable ARB_shader_ballot | Nicolai Hähnle | 1 | -1/+3
Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into putting the lane select of llvm.amdgcn.readlane into a VGPR and then fails to continue to compile. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
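Gating a capability on the LLVM version could be sketched as below. The function name and the version encoding (major in the high byte, minor in the low byte, so 5.0 is `0x0500`) are assumptions made for this illustration, not a statement of how the driver stores the version.

```c
#include <stdbool.h>

/* Hypothetical encoding for this sketch: (major << 8) | minor. */
#define SKETCH_LLVM_VERSION(major, minor) (((major) << 8) | (minor))

/* Report shader ballot support only when built against LLVM 5.0 or
 * later, since the commit message says LLVM 4.0 can fail to compile
 * llvm.amdgcn.readlane with the lane select in a VGPR. */
static bool sketch_supports_shader_ballot(unsigned llvm_version)
{
    return llvm_version >= SKETCH_LLVM_VERSION(5, 0);
}
```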
2017-04-05 | gallium: add PIPE_CAP_TGSI_BALLOT | Nicolai Hähnle | 1 | -0/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 | radeonsi: enable ARB_sparse_buffer | Nicolai Hähnle | 1 | -1/+10
v2:
- fill in DRM version requirement
- disable on SI due to CP DMA faults
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 | gallium: add sparse buffer interface and capability | Nicolai Hähnle | 1 | -0/+1
v2:
- explain the resource_commit interface in more detail
Reviewed-by: Marek Olšák <marek.olsak@amd.com>