summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-01-30meta: Don't write depth when decompressing tex-imagesno_frag_depthTopi Pohjolainen1-1/+1
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-30meta: Don't write depth when generating miptreesTopi Pohjolainen1-1/+1
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-30meta/blit: Compile programs with and without depthTopi Pohjolainen2-5/+11
When color buffers alone are concerned the depth is not needed. No regression on BDW where meta blit is used instead of blorp. I also disabled blorp temporarily for fbo-blits on IVB and saw no regressions there either. I also compared several graphics benchmarks on BDW and saw neither regressions or improvements. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-30meta/blit: Write depth only when asked forTopi Pohjolainen1-2/+3
Implementing an idea from Ken, on i965 the shader program for 2D blits becomes significantly simpler. Before: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q }; mov(8) g123<1>F g2<8,8,1>F { align1 1Q compacted }; mov(8) g124<1>F g3<8,8,1>F { align1 1Q compacted }; mov(8) g125<1>F g4<8,8,1>F { align1 1Q compacted }; mov(8) g126<1>F g5<8,8,1>F { align1 1Q compacted }; mov(8) g127<1>F g2<8,8,1>F { align1 1Q compacted }; nop ; sendc(8) null g123<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT }; After: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; send(8) g124<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q }; sendc(8) null g124<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT }; v2 (Matt): Removed unintended white-space change Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-30meta/blit: Add plumbing for shaders without depthTopi Pohjolainen4-3/+5
Currently all blit programs are unconditionally compiled with gl_FragDepth. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-29nir/opt_algebraic: Add some constant bcsel reductionsJason Ekstrand1-2/+28
total instructions in shared programs: 5998190 -> 5997603 (-0.01%) instructions in affected programs: 54276 -> 53689 (-1.08%) helped: 293 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29nir/opt_algebraic: Add some boolean simplificationsJason Ekstrand1-4/+5
total instructions in shared programs: 5998321 -> 5998287 (-0.00%) instructions in affected programs: 4520 -> 4486 (-0.75%) helped: 8 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29nir/algebraic: Support specifying variable as constant or by typeJason Ekstrand2-6/+26
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29nir/algebraic: Fail to compile of a variable is used in a replace but not ↵Jason Ekstrand1-0/+7
the search Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29nir/search: Allow for matching variables based on typesJason Ekstrand2-0/+23
This allows you to match on an unknown value but only if it is of a given type. 90% of the uses of this are for matching only booleans, but adding the generality of arbitrary types is no more complex. nir_algebraic.py doesn't handle this yet but that's ok because the C language will ensure that the default type on all variables is void. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29nir/search: Add support for matching unknown constantsJason Ekstrand2-0/+13
There are some algebraic transformations that we want to do but only if certain things are constants. For instance, we may want to replace a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant. While this generates more instructions, some of it will get constant folded. nir_algebraic.py doesn't handle this yet, but that's ok because the C language will make sure that false is the default for now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29nir: Add an invalid typeJason Ekstrand1-0/+1
This allows us to indicate a concept of an invalid type. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29gallium/docs: fix docs wrt ARL/ARR/FLRRoland Scheidegger1-10/+8
since the address reg holds integer values, ARL/ARR do an implicit float-to-int conversion, so clarify that. Thus it is also incorrect to say that FLR really does the same as ARL. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-01-29nir: Add variants of some of the comparison simplifications.Eric Anholt1-0/+4
We end up with these from TGSI-to-NIR because the pass generating the comparisons doesn't know if the arg is actually a bool input or not. vc4 results: total instructions in shared programs: 41801 -> 41508 (-0.70%) instructions in affected programs: 4253 -> 3960 (-6.89%) Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-29vc4: Fix point size handling when it's the first output.Eric Anholt1-1/+1
2015-01-29nir: Don't try to to-SSA ALU instructions that are already SSA.Eric Anholt1-0/+3
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29nir: Fix a bit of broken indentation.Eric Anholt1-1/+1
Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29nir: Add a couple of helpers for glsl types.Eric Anholt2-1/+16
This will be used by tgsi_to_nir, which needs to get vec4 types for declaring shader input/output variables. v2: Add a missing space. Reviewed-by: Matt Turner <mattst88@gmail.com> (v2) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29docs: fix mesa 10.4.3 release dateEmil Velikov1-1/+1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-01-29Mesa: Advertise GL_OES_texture_*float* extensions support with i965.Kalyan Kondapally1-0/+5
This patch advertises support for GL_OES_texture_*float* extensions when using i965 drivers. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-29Mesa: Add support for HALF_FLOAT_OES type.Kalyan Kondapally4-6/+52
This patch adds needed support for accepting HALF_FLOAT_OES as valid type for TexImage*D and TexSubImage*D when Texture FLoat extensions are supported. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-29Mesa: Add support for GL_OES_texture_*float* extensions.Kalyan Kondapally4-0/+132
This patch series adds support for following GLES2 Texture Float extensions: 1)GL_OES_texture_float, 2)GL_OES_texture_half_float, 3)GL_OES_texture_float_linear, 4)GL_OES_texture_half_float_linear. This patch adds basic infrastructure and needed boolean flags to advertise support for these extensions, by default the support is disabled. Next patch in the series introduces support for HALF_FLOAT_OES token. v4: take assert away and make valid_filter_for_float conditional (Tapani), fix the alphabetical order (Emil) Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-28nir: Make vec-to-movs handle src/dest aliasing.Eric Anholt1-10/+72
It now emits vector MOVs instead of a series of individual MOVs, which should be useful to any vector backends. This pushes the problem of src/dest aliasing of channels on a scalar chip to the backend, but if there are any vector operations in your shader then you needed to be handling this already. Fixes fs-swap-problem with my scalarizing patches. v2: Rename to insert_mov(), and add a comment about what it does. v3: Rewrite the comment. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
2015-01-28gallium: Replace u_simple_list.h with util/simple_list.hEric Anholt25-228/+23
The code was exactly the same, except util/ has c++ guards and a struct simple_node declaration. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28mesa: Port a variant of 68afbe89c72d085dcbbf2b264f0201ab73fe339e to util/Eric Anholt1-0/+1
The idea is that after a remove_from_list(), you might want to be able to do a remove_from_list() on it again or an is_empty_list(). This is apparently relied on by r300g. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28mesa: Move simple_list.h to src/util.Eric Anholt31-28/+29
We have two copies of it in the tree, I'm going to delete one. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28radeonsi: Enable VGPR spilling for all shader types v5Tom Stellard8-52/+217
v2: - Only emit write SPI_TMPRING_SIZE once per packet. - Use context global scratch buffer. v3: - Patch shaders using WRITE_DATA packet instead of map/unmap. - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and VS_PARTIAL_FLUSH when patching shaders. v4: - Code cleanups. - Remove unnecessary multiplies. v5: - Patch shaders in system memory and re-upload to vram. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28radeonsi/compute: Allocate the scratch buffer during state creationTom Stellard2-24/+62
This moves scratch buffer allocation from si_launch_grid() to si_create_compute_state(). This helps to reduce the overhead of launching a kernel and also fixes a bug in the code that would cause the scratch buffer to be too small if a kernel with smaller scratch size was launched before a kernel with a larger scratch size. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28radeonsi: Add radeon_shader_binary member to struct si_shaderTom Stellard2-6/+6
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28radeonsi/compute: Rename si_compute::program to si_compute::shaderTom Stellard1-5/+5
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28radeonsi: Avoid leaking memory when rebuilding shader statesMarek Olšák3-4/+13
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28nir/opcodes: Use a return type of tfloat for ldexpJason Ekstrand1-1/+1
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-28Revert "util: Move the alternate fpclassify implementation to util"Jason Ekstrand2-63/+50
This reverts commits d6eb572905e39c36168b8f5da240af961f9dde0a and 58e8468d113c7d3d4a59ea4a8d70fd45b78e85e6. This is no longer necessary as we aren't using it in NIR anymore. Also, it broke the build on some strange systems so let's put it back in querymatrix where it came from. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852 Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"Jason Ekstrand1-1/+1
This reverts commit d7d340fb2f68c46bd5a0008ecf53c6693e29c916. We have an isnormal() implementation available, the only problem was that we had the wrong return type (fixed in a later patch). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28util: Predicate the fpclassify fallback on !defined(__cplusplus)Jason Ekstrand1-2/+12
The problem is that the fallbacks we have at the moment don't work in C++. While we could theoretically fix the fallbacks it would also raise the issue of correctly detecting the fpclassify function. So, for now, we'll just disable it until we actually have a C++ user. Reported-by: Tom Stellard <thomas.stellard@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com> Tested-by: EdB <edb+mesa@sigluy.net>
2015-01-28drirc: set allow_glsl_extension_directive_midshader for Dead Island.Sven Arvidsson1-0/+4
Signed-off-by: Sven Arvidsson <sa@whiz.se> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076 Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28nir/opcodes: Use fpclassify() instead of isnormal() for ldexpJason Ekstrand1-1/+1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28util: Move the alternate fpclassify implementation to utilJason Ekstrand2-50/+53
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28i965/tex: Don't create read-write textures with non-renderable formatsJason Ekstrand1-0/+5
I haven't actually seen this bug in the wild, but it's possible that someone could ask to do a S3TC PBO download or something. This protects us from accidentally creating a render target with a compressed or otherwise non-renderable format. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-28i965/gen8: Include the buffer offset when emitting renderbuffer relocsJason Ekstrand1-1/+1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-28mesa: improve error messaging for format CSV parserTapani Pälli2-2/+7
Patch adds 2 error messages that point user directly to fix mispelled or impossible swizzle field for a format. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-28clover/llvm: Dump the OpenCL C code earlier.EdB1-3/+3
[ Francisco Jerez: As discussed on the mailing list, this is intended to produce more useful debug output in cases where the compilation terminates unexpectedly. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-28clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.EdB1-13/+20
[ Francisco Jerez: As we're at it make debug_options[] local to its only user and remove temporary. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-28r600g: add support for primitive id without geom shader (v2)Dave Airlie6-1/+51
GLSL 1.50 specifies a fragment shader may have a primitive id input without a geometry shader present. On r600 hw there is a special GS scenario for this, you have to enable GS_SCENARIO_A and pass the primitive id through the vertex shader which operates in GS_A mode. This is a first pass attempt at this, and passes the piglit tests that test for this. v1.1: clean up debug print + no need to assign key value to setup output. v2: add r600 support Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-28r600g: move selecting the pixel shader earlier.Dave Airlie1-3/+4
In order to detect that a pixel shader has a prim id input when we have no geometry shader we need to reorder the shader selection so the pixel shader is selected first, then the vertex shader key can take into account the primitive id input requirement and lack of geom shader. Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-27st/clover: Pass target instead of target.begin() to std::string()Michel Dänzer1-3/+3
Fixes reading beyond allocated memory: ==1936== Invalid read of size 1 ==1936== at 0x4C2C1B4: strlen (vg_replace_strmem.c:412) ==1936== by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20) ==1936== by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698) ==1936== by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==1936== by 0x5B20152: clBuildProgram (program.cpp:182) ==1936== by 0x400F41: main (hello_world.c:109) ==1936== Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd ==1936== at 0x4C28C20: malloc (vg_replace_malloc.c:296) ==1936== by 0x5B398F0: alloc (compat.hpp:59) ==1936== by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98) ==1936== by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327) ==1936== by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==1936== by 0x5B20152: clBuildProgram (program.cpp:182) ==1936== by 0x400F41: main (hello_world.c:109) Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-27r600g,radeonsi: Fix calculation of IR target cap string buffer sizeMichel Dänzer1-2/+2
Fixes writing beyond the allocated buffer: ==31855== Invalid write of size 1 ==31855== at 0x50AB2A9: vsprintf (iovsprintf.c:43) ==31855== by 0x508F6F6: sprintf (sprintf.c:32) ==31855== by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526) ==31855== by 0x5B2B7DE: get_compute_param<char> (device.cpp:37) ==31855== by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201) ==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==31855== by 0x5B20152: clBuildProgram (program.cpp:182) ==31855== by 0x400F41: main (hello_world.c:109) ==31855== Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd ==31855== at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324) ==31855== by 0x5B2B7C2: allocate (new_allocator.h:104) ==31855== by 0x5B2B7C2: allocate (alloc_traits.h:357) ==31855== by 0x5B2B7C2: _M_allocate (stl_vector.h:170) ==31855== by 0x5B2B7C2: _M_create_storage (stl_vector.h:185) ==31855== by 0x5B2B7C2: _Vector_base (stl_vector.h:136) ==31855== by 0x5B2B7C2: vector (stl_vector.h:278) ==31855== by 0x5B2B7C2: get_compute_param<char> (device.cpp:35) ==31855== by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201) ==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==31855== by 0x5B20152: clBuildProgram (program.cpp:182) ==31855== by 0x400F41: main (hello_world.c:109) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-01-26nir: fix a bug with constant folding non-per-component instructionsConnor Abbott1-1/+2
Before, we were only copying the first N channels, where N is the size of the SSA destination, which is fine for per-component instructions, but non-per-component instructions like fdot3 can have more source components than destination components. Fix this using the helper function introduced in the last patch. v2: use new helper name Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26nir: add a helper function for getting the number of source componentsConnor Abbott1-0/+15
Unlike with non-SSA ALU instructions, where if they're per-component you have to look at the writemask to know which source channels are being used, SSA ALU instructions always have all the possible channels enabled so we can just look at the number of components in the SSA definition for per-component instructions to say how many source components are being used. v2: use new name nir_ssa_alu_instr_src_components() Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26i965: Implemente a tiled fast-path for glReadPixels and glGetTexImageSisinty Sasmita Patra3-1/+271
Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy functions. These are the fast paths for glReadPixels and glGetTexImage. On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts amount of time spent in glReadPixels by more than half and reduces the time of the entire test by 10%. v2: Jason Ekstrand <jason.ekstrand@intel.com> - Refactor to make the functions look more like the old intel_tex_subimage_tiled_memcpy - Don't export the readpixels_tiled_memcpy function - Fix some pointer arithmatic bugs in partial image downloads (using ReadPixels with a non-zero x or y offset) - Fix a bug when ReadPixels is performed on an FBO wrapping a texture miplevel other than zero. v3: Jason Ekstrand <jason.ekstrand@intel.com> - Better documentation fot the *_tiled_memcpy functions - Add target restrictions for renderbuffers wrapping textures v4: Jason Ekstrand <jason.ekstrand@intel.com> - Only check the return value of brw_bo_map for error and not bo->virtual v5: Jason Ekstrand <jason.ekstrand@intel.com> - Don't unnecessarily repeat a comment Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>