~currojerez/mesa

Age	Commit message	Author	Files	Lines
2019-08-12	anv/gen9: Optimize slice and subslice load balancing behavior. [jenkins-vk]	Francisco Jerez	4	-0/+109
	See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. According to Jason, improves Aztec Ruins performance by 2.7%. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) v2: Undo CPU performance micro-optimization done in i965 and iris due to lack of data justifying it on anv. Use cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe control command directly. (Jason)
2019-08-12	iris/gen9: Optimize slice and subslice load balancing behavior.	Francisco Jerez	5	-0/+110
	See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12	intel/genxml: Add GT_MODE hashing defs for Gen9.	Francisco Jerez	1	-0/+17
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12	i965/gen9: Optimize slice and subslice load balancing behavior.	Francisco Jerez	5	-6/+109
	The default pixel hashing mode settings used for slice and subslice load balancing are far from optimal under certain conditions (see the comments below for the gory details). The top-of-the-line GT4 parts currently suffer from a particularly severe performance problem due to a subslice load balancing issue. Fixing this seems to improve graphics performance across the board for most of the benchmarks in my test set, up to ~20% in some cases, e.g. from SKL GT4:
	  unigine/valley: 3.44% ±0.11%
	  gfxbench/gl_manhattan31: 3.99% ±0.13%
	  gputest/pixmark_piano: 7.95% ±0.33%
	  synmark/OglTexFilterAniso: 15.22% ±0.07%
	  synmark/OglTexMem128: 22.26% ±0.06%
	Lower-end platforms are also affected by some subslice load imbalance to a lesser degree, especially during CCS resolve and fast clear operations, which are handled specially here due to rasterization occurring in reduced CCS coordinates, which changes the semantics of the pixel hashing mode settings.
	No regressions seen during my tests on some SKL, KBL and BXT configurations. Additional benchmark reports welcome on any Gen9 platforms (that includes anything with Skylake, Broxton, Kabylake, Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your renderer string).
	P.S.: A similar problem is likely to be present on other non-Gen9 platforms, especially for CCS resolve and fast clear operations. Will follow up with additional patches fixing the hashing mode for those once I have enough performance data to justify it.
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12	pan/midgard: Handle 64-bit address in mir_mask_of_read_components	Alyssa Rosenzweig	1	-1/+36
	This is a bit of a hack, but it'll hold us over until we have 64-bit support wired through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Allocate separate spill indices for lowered moves	Alyssa Rosenzweig	1	-6/+4
	This helps RA be slightly more reasonable. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Extend liveness analysis to trinary ops	Alyssa Rosenzweig	1	-6/+2
	Fixes RA fails with multiple indirect SSBO writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Fix load/store pairing	Alyssa Rosenzweig	1	-9/+6
	This used a delicate hack to try to find indirect inputs and skip them as candidates for pairing. Let's use a better criterion -- no sources -- and pair based on that. We could do better, but that would require more complex data flow analysis than we're interested in doing here. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Implement nir_intrinsic_load_num_work_groups	Alyssa Rosenzweig	5	-0/+21
	Just a sysval to route through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Implement some compute builtins	Alyssa Rosenzweig	1	-0/+28
	We implement gl_WorkGroupID and gl_LocalInvocationID, which map to ld_compute_id with special sources. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Rename ld_global_id -> ld_compute_id	Alyssa Rosenzweig	2	-3/+3
	It's used for more general loads within a compute shader. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Handle partial writes in liveness analysis	Alyssa Rosenzweig	1	-9/+5
	This allows liveness analysis within a loop to be more fine grained, fixing RA failures with partial spilled movs within a loop, as well as enabling a slight reduction of register pressure more generally:
	  total registers in shared programs: 350 -> 347 (-0.86%)
	  registers in affected programs: 12 -> 9 (-25.00%)
	  helped: 3
	  HURT: 0
	  helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
	  helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
	Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
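The idea of tracking partial writes in liveness analysis can be sketched as a backward pass over per-component masks. This is only an illustration under stated assumptions: `struct instr`, `NUM_REGS` and `liveness_backward` are hypothetical names, not the Midgard compiler's actual data structures. A write kills only the components in its write mask, so components that were not overwritten stay live above the instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 8 /* hypothetical register file size */

/* Hypothetical instruction: one destination with a component write mask and
 * up to three sources, each with a component read mask. */
struct instr {
   int dest;             /* register index, or -1 if there is no destination */
   uint8_t write_mask;   /* bit i set => component i is written */
   int src[3];           /* register indices, or -1 if the slot is unused */
   uint8_t read_mask[3]; /* bit i set => component i is read */
};

/* Walk a straight-line block backwards. live[] holds the live-out component
 * masks on entry and the live-in component masks on return. */
static void
liveness_backward(const struct instr *ins, size_t count,
                  uint8_t live[NUM_REGS])
{
   for (size_t i = count; i-- > 0;) {
      const struct instr *I = &ins[i];

      /* Kill only the components that are actually written. */
      if (I->dest >= 0)
         live[I->dest] &= (uint8_t) ~I->write_mask;

      /* Every component that is read becomes live. */
      for (int s = 0; s < 3; ++s) {
         if (I->src[s] >= 0)
            live[I->src[s]] |= I->read_mask[s];
      }
   }
}
```

With a whole-register kill, a write to only the x component would mark the entire register dead above the write; with masks, the remaining components stay live, which is the finer granularity the commit message describes.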
2019-08-12	pan/midgard: Dump "no spill"?	Alyssa Rosenzweig	1	-0/+3
	Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Absorb nonexistent sources	Alyssa Rosenzweig	1	-0/+5
	Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Pretty-print destinations	Alyssa Rosenzweig	1	-5/+6
	They're not "sources" but they follow the same conventions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Pretty-print units	Alyssa Rosenzweig	1	-1/+24
	Since we are seeing some use of MIR post-scheduling, let's get this printed right. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Print mask in dumped MIR	Alyssa Rosenzweig	1	-1/+19
	Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Add no_spill flag	Alyssa Rosenzweig	2	-6/+15
	Hint for the RA to avoid infinite spilling loops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Generalize mir_mask_of_read_components	Alyssa Rosenzweig	1	-11/+24
	This now works for load/store and texture instructions as well as ALU. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Implement SSBO access	Alyssa Rosenzweig	2	-11/+115
	Just laying the groundwork. Reads and writes should be supported (both direct and indirect, either int or float, vec1/2/3/4), but no bounds checking is done at the moment. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Pipe uniform mask through when spilling	Alyssa Rosenzweig	2	-2/+30
	This is a corner case that happens a lot with SSBOs. Basically, if we only read a few components of a uniform, we need to only spill a few components or otherwise we try to spill what we spilled and RA hangs. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Clamp sysval component count	Alyssa Rosenzweig	2	-5/+9
	We don't want to load a 128-bit sysval when 64 bits will do. Fixes RA failures with SSBO indirect writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Pass uploaded midgard_instruction through	Alyssa Rosenzweig	2	-5/+7
	We want to edit it after emission in some cases. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	pan/midgard: Allow sysval destination override	Alyssa Rosenzweig	2	-4/+10
	Sometimes a sysval is used to facilitate an instruction but is not the instruction itself. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	panfrost: Force flush every compute job	Alyssa Rosenzweig	1	-0/+2
	This is of course suboptimal for performance, forcing each glDispatchCompute call to be submitted separately to the kernel and finish to completion. However, for the initial bring-up of compute jobs, this simplifies quite a bit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12	panfrost: Add SSBO system value	Alyssa Rosenzweig	3	-0/+38
	For each SSBO index we get from Gallium/NIR, we need two pieces of information in the shader:
	1. The address of the SSBO in GPU memory. Within the shader, we'll be accessing it with raw memory load/store, so we need the actual address, not just an index.
	2. The size of the SSBO. This is not strictly necessary, but at some point, we may like to do bounds checking on SSBO accesses.
	Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
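As a rough sketch of how the two pieces of information could travel together, the following packs an address and a size into one 128-bit slot of four 32-bit words. The layout (address in words 0 and 1, size in word 2) and the names `ssbo_binding`, `pack_ssbo_sysval` and `ssbo_access_in_bounds` are assumptions made for illustration; the commit listing does not show the actual encoding.

```c
#include <stdint.h>

/* Hypothetical driver-side description of a bound SSBO. */
struct ssbo_binding {
   uint64_t gpu_address; /* where the buffer lives in GPU memory */
   uint32_t size_bytes;  /* size of the buffer in bytes */
};

/* Pack the binding into four 32-bit words (assumed layout):
 * words 0-1 hold the 64-bit address, word 2 the size, word 3 is padding. */
static void
pack_ssbo_sysval(const struct ssbo_binding *ssbo, uint32_t out[4])
{
   out[0] = (uint32_t) (ssbo->gpu_address & 0xFFFFFFFFu);
   out[1] = (uint32_t) (ssbo->gpu_address >> 32);
   out[2] = ssbo->size_bytes;
   out[3] = 0;
}

/* The kind of bounds check the size would enable later: returns nonzero if
 * the range [offset, offset + bytes) fits inside the buffer. Written so that
 * offset + bytes cannot overflow. */
static int
ssbo_access_in_bounds(const uint32_t sysval[4], uint32_t offset,
                      uint32_t bytes)
{
   uint32_t size = sysval[2];
   return bytes <= size && offset <= size - bytes;
}
```

Under this assumed layout, a shader that only needs the address would read just the first 64 bits of the slot.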
2019-08-12	gallium/util: Add u_stream_outputs_for_vertices helper	Alyssa Rosenzweig	1	-0/+19
	This u_prim.h helper determines the number of outputs for stream output, given a particular primitive type and a vertex count. This is useful for statically calculating sizes of stream output buffers (i.e. when there is no geometry/tessellation shader in use). This helper will be used in Panfrost's transform feedback implementation, as you can probably guess since why else would I be submitting it.... See also dEQP's getTransformFeedbackOutputCount routine. v2: Simplify definition using new helpers, which also extends to non-ES2 primitive types (Eric). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>
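The calculation described above can be sketched as follows. This is a standalone illustration, not the code added to u_prim.h: the enum `prim_type` and the function `stream_outputs_for_vertices` are hypothetical stand-ins, and only a subset of primitive types is covered. Each primitive is decomposed into points, lines or triangles, and every decomposed primitive contributes one output per vertex.

```c
/* Hypothetical subset of primitive types, for illustration only. */
enum prim_type {
   PRIM_POINTS,
   PRIM_LINES,
   PRIM_LINE_STRIP,
   PRIM_LINE_LOOP,
   PRIM_TRIANGLES,
   PRIM_TRIANGLE_STRIP,
   PRIM_TRIANGLE_FAN,
};

/* Number of vertices written to a stream output buffer when nr input
 * vertices are drawn with the given primitive type and no geometry or
 * tessellation shader is in use. Incomplete trailing primitives are
 * dropped. */
static unsigned
stream_outputs_for_vertices(enum prim_type prim, unsigned nr)
{
   switch (prim) {
   case PRIM_POINTS:
      return nr;
   case PRIM_LINES:
      return (nr / 2) * 2;
   case PRIM_LINE_STRIP:
      return nr >= 2 ? (nr - 1) * 2 : 0; /* nr - 1 lines, 2 vertices each */
   case PRIM_LINE_LOOP:
      return nr >= 2 ? nr * 2 : 0;       /* the closing segment adds a line */
   case PRIM_TRIANGLES:
      return (nr / 3) * 3;
   case PRIM_TRIANGLE_STRIP:
   case PRIM_TRIANGLE_FAN:
      return nr >= 3 ? (nr - 2) * 3 : 0; /* nr - 2 triangles, 3 vertices each */
   }
   return 0;
}
```

For example, a triangle strip of 5 vertices decomposes into 3 triangles, so a buffer sized for 9 output vertices would be needed.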
2019-08-12	radeonsi: remove the always_nir option	Marek Olšák	4	-6/+2
	tgsi_to_nir is no longer optional if NIR is enabled.
2019-08-12	radeonsi/nir: implement default tess level system values	Marek Olšák	3	-18/+45
	Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	compiler: add SYSTEM_VALUE_TESS_LEVEL_OUTER/INNER_DEFAULT	Marek Olšák	4	-0/+20
	TCS system values for internal passthru TCS, needed by radeonsi NIR support Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	gallium: add TGSI_SEMANTIC_DEFAULT_OUTER/INNER_LEVEL	Marek Olšák	5	-12/+19
	for radeonsi NIR support.
2019-08-12	tgsi_to_nir: handle tess level inner/outer varyings	Marek Olšák	1	-0/+7
	for internal radeonsi shaders Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	tgsi_to_nir: add support for the stencil FS output	Marek Olšák	1	-5/+12
	Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	tgsi_to_nir: add support for TEX_LZ	Marek Olšák	1	-2/+9
	Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12	compiler: add SYSTEM_VALUE_USER_DATA_AMD	Marek Olšák	7	-0/+23
	for internal radeonsi shaders
2019-08-12	compiler: add shader_info.cs.user_data_components_amd	Marek Olšák	3	-0/+5

2019-08-12	tgsi_to_nir: add basic compute shader support	Marek Olšák	1	-0/+23
	Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	tgsi_to_nir: add support for LOAD & STORE with SSBOs and images	Marek Olšák	1	-2/+310
	Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	tgsi_to_nir: make setup_texture_info reusable	Marek Olšák	1	-36/+48
	Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12	tgsi_to_nir: add support for TXF_LZ	Marek Olšák	1	-4/+13
	Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	compiler: add shader_info.vs.blit_sgprs_amd	Marek Olšák	3	-0/+12
	for internal radeonsi shaders
2019-08-12	tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)	Marek Olšák	1	-1/+48
	v2: squash with Timur Kristof's commit Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	tgsi/scan: don't set GS_INVOCATIONS for all shader stages	Marek Olšák	1	-1/+3
	Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	compiler: add ACCESS_STREAM_CACHE_POLICY	Marek Olšák	2	-0/+6
	radeonsi will use this. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12	gallium: add AMD-specific compute TGSI enums	Marek Olšák	6	-11/+9
	for tgsi_to_nir
2019-08-12	gallium: add TGSI_PROPERTY_VS_BLIT_SGPRS_AMD for tgsi_to_nir	Marek Olšák	7	-15/+14
	needed by radeonsi NIR support
2019-08-12	st/mesa: don't allocate mipmapped texture for NEAREST_MIPMAP_LINEAR	Marek Olšák	1	-0/+12
	Reviewed-by: Brian Paul <brianp@vmware.com>
2019-08-12	glsl: Optimize the SoftFP64 shader when first creating it.	Kenneth Graunke	1	-0/+13
	By optimizing the shader before inlining, we avoid having to redo this work for each inlined copy of a function. It should also reduce the memory consumption a bit. This cuts the KHR-GL46.arrays_of_arrays_gl.SubroutineFunctionCalls2 runtime by 25% on my Icelake. That test compiles many shaders, which contain large types (dmat4) and division (expensive operations). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-12	etnaviv: fix compile warnings in release build	Christian Gmeiner	2	-2/+2
	[27/31] Compiling C object 'src/gallium/drivers/etnaviv/df32d18@@etnaviv@sta/etnaviv_compiler_nir.c.o'.
	In file included from ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c:552:
	../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir_emit.h: In function 'ra_assign':
	../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir_emit.h:903:9: warning: unused variable 'ok' [-Wunused-variable]
	    bool ok = ra_allocate(g);
	         ^~
	../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c: In function 'etna_compile_shader_nir':
	../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c:663:9: warning: unused variable 'ok' [-Wunused-variable]
	    bool ok = emit_shader(c->nir, &options, &v->num_temps, &num_consts);
	         ^~
	Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
	Reviewed-by: Eric Engestrom <eric@engestrom.ch>
	Reviewed-by: Jonathan Marek <jonathan@marek.ca>
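A warning of this shape typically appears when a variable is only consumed by `assert()`, which expands to nothing when `NDEBUG` is defined, as is common in release builds. The sketch below reproduces the pattern with hypothetical functions (`emit_value`, `compile_example`) and shows one common way to silence it, a `(void)` cast. The exact change made in the commit is not shown in this listing, so this should not be read as the actual fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a function whose result is only checked in
 * debug builds. */
static bool
emit_value(int *out)
{
   *out = 42;
   return true;
}

int
compile_example(void)
{
   int value = 0;
   bool ok = emit_value(&value);

   /* With -DNDEBUG this expands to nothing, leaving 'ok' otherwise unused. */
   assert(ok);

   /* Explicitly mark the variable as used so -Wunused-variable stays quiet
    * in release builds. */
   (void) ok;

   return value;
}
```

An unused-variable attribute on the declaration is another option that some codebases prefer, since it keeps the intent next to the variable itself.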
2019-08-12	radv: Do not setup attachments without a framebuffer.	Bas Nieuwenhuizen	1	-3/+5
	Test that found this: dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer Fixes: 49e6c2fb78c "radv: Store color/depth surface info in attachment info instead of framebuffer." Reviewed-by: Dave Airlie <airlied@redhat.com>