path: root/src/gallium
Age  Commit message  Author  Files  Lines
2013-07-18nv50: H.264/MPEG2 decoding support via VP2, available on NV84-NV96, NVA0Ilia Mirkin11-3/+1815
Adds H.264 and MPEG2 codec support via VP2, using firmware from the blob. Acceleration is supported at the bitstream level for H.264 and IDCT level for MPEG2.
Known issues:
- H.264 interlaced doesn't render properly
- H.264 shows very occasional artifacts on a small fraction of videos
- MPEG2 + VDPAU shows frequent but small artifacts, which aren't there when using XvMC on the same videos
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2013-07-17gallivm: (trivial) simplify lp_build_cos/lp_build_sin a tiny bitRoland Scheidegger1-7/+6
Use "or" instead of "add" (this is a classic select sequence, which at least newer llvm versions can actually recognize (3.2+?), and the "add" might prevent that - and we really don't want an add instead of an or with avx if it isn't recognized (even without avx logic ops might be cheaper)). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
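The equivalence this change relies on can be sketched in scalar C. This is an illustration only, not gallivm code, and the function names are hypothetical. Because the two masked terms never share a set bit, "or" and "add" produce the same value, but only the "or" form is the classic select pattern a compiler can recognize.

```c
#include <stdint.h>

/* Hypothetical helper: bitwise select, mask is all ones or all zeros. */
uint32_t select_or(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Same value, because (a & mask) and (b & ~mask) have no common set bits,
 * but expressed with an add instead of an or. */
uint32_t select_add(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) + (b & ~mask);
}
```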
2013-07-17util/u_format_s3tc: handle srgb formats correctly.Roland Scheidegger2-185/+254
Instead of just ignoring the srgb/linear conversions, simply call the corresponding conversion functions, for all of pack/unpack/fetch, both for float and unorm8 versions (though some don't make a whole lot of sense, i.e. unorm8/unorm8 srgb/linear combinations). Refactored some functions a bit so don't have to duplicate all the code (there's a slight change for packing dxt1_rgb, as there will now be always 4 components initialized and sent to the external compression function so the same code can be used for all, the quite horrid and ad-hoc interface (by now) should always have worked with that). Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-17r600g/sb: improve alu packing on caymanVadim Girlin2-15/+89
Scheduler/register allocator in r600-sb was developed and optimized on evergreen (VLIW-5) hardware, so currently it's not optimal for VLIW-4 chips. This patch should improve performance on cayman gpus due to better alu packing, but also it tends to increase register usage, so overall positive effect on performance has to be proven by real benchmarks yet.
Some results with bfgminer kernel on cayman:
source bytecode: 60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch: 55 gprs, 3474 alu groups.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17r600g/sb: fix handling of new multislot instructions on caymanVadim Girlin3-5/+6
Ex-scalar instructions that became multislot on cayman do replicate result to all channels - handle them similar to DOT4. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17r600g/sb: fix debug dump code in schedulerVadim Girlin1-4/+5
Update the stale debug code for other changes related to debug output. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17r600g/sb: fix initial register allocationVadim Girlin1-0/+1
Mark values that are members of the 'same register' constraint as preallocated in ra_init pass, this will prevent incorrect reallocation in scheduler in some cases. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713 Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17r600g/sb: move chip & class name functions to sb_contextVadim Girlin4-53/+55
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17r600g/sb: fix handling of PS in source bytecode on caymanVadim Girlin1-0/+5
Actually PS doesn't make sense for cayman and isn't even mentioned in cayman docs, but llvm backend currently uses it in bytecode and, assuming that hw seems to be mostly ok with it, this will allow sb to parse such source bytecode correctly. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-17r600g/sb: Initialize ra_checker member variables.Vinson Lee1-1/+1
Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-07-17gallium/util: use explicitly sized types for {un, }pack_rgba_{s, u}intEmil Velikov2-8/+8
Every function but the above four uses explicitly sized types for their src and dst arguments. Even fetch_rgba_{s,u}int follows the convention. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Marek Olšák <maraeo@gmail.com>
2013-07-17llvmpipe: use MCJIT on ARM and AArch64Kyle McMartin2-2/+9
MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular JIT has bit-rotted badly on ARM and doesn't exist on AArch64.) Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Dave Airlie <airlied@gmail.com>
2013-07-16llvmpipe: support sRGB framebuffersRoland Scheidegger4-18/+111
Just use the new conversion functions to do the work. The way it's plugged into the blend code is quite hacktastic but follows all the same hacks as used by packed float format already. Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never worked anyway in the blend code and are thus disabled, and I don't think anyone is interested in L8/L8A8. Would need even more hacks otherwise. Unless I'm missing something, this is the last feature except MSAA needed for OpenGL 3.0, and for OpenGL 3.1 as well I believe. v2: prettify a bit, use separate function for packing. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-15Revert "r300g: allow HiZ with a 16-bit zbuffer"Marek Olšák1-0/+1
This reverts commit 631c631cbf5b7e84e42a7cfffa1c206d63143370. https://bugs.freedesktop.org/show_bug.cgi?id=66921 Cc: mesa-stable@lists.freedesktop.org
2013-07-15r300g/swtcl: fix a lockup in MSAA resolveMarek Olšák1-0/+7
Cc: mesa-stable@lists.freedesktop.org
2013-07-15r300g/swtcl: fix geometry corruption by uploading indices to a bufferMarek Olšák3-45/+31
The splitting of a draw call into several draw commands was broken, because the split sometimes took place in the middle of a primitive. The splitting was supposed to be dealing with the case when there are more indices than the maximum size of a CS. This commit throws that code away and uses a real index buffer instead. https://bugs.freedesktop.org/show_bug.cgi?id=66558 Cc: mesa-stable@lists.freedesktop.org
2013-07-14gallivm: (trivial) use constant instead of exp2f() functionRoland Scheidegger1-2/+3
Some lame compilers can't do exp2f(), and as far as I can tell they can't do exp2() (with doubles) either. Instead of providing some workaround for that (it wouldn't actually be too bad, just replace with pow), and since it is used with a constant only, just use the precalculated constant.
2013-07-14ilo: skip 3DSTATE_INDEX_BUFFER when possibleChia-I Wu4-59/+77
When only the offset to the index buffer is changed, we can skip the 3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add (offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
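The offset folding described above can be sketched as follows. This is not the ilo code: the helper name and parameters are hypothetical, and it assumes the byte offset is a multiple of the index size.

```c
#include <stdint.h>

/* Hypothetical helper, not the ilo code: with the index buffer always bound
 * at offset 0, a byte offset becomes a count of indices to skip.
 * Assumes ib_offset is a multiple of index_size. */
uint32_t start_vertex_location(uint32_t start, uint32_t ib_offset,
                               uint32_t index_size)
{
    return start + ib_offset / index_size;
}
```

For example, a 64-byte offset into a buffer of 2-byte indices moves the start location forward by 32 indices, so no new index buffer state is needed when only the offset changes.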
2013-07-13gallivm: handle srgb-to-linear and linear-to-srgb conversionsRoland Scheidegger6-7/+332
srgb-to-linear is using 3rd degree polynomial for now which should be _just_ good enough. Reverse is using some rational polynomials and is quite accurate, though not hooked into llvmpipe's blend code yet and hence unused (untested). Using a table might also be an option (for srgb-to-linear especially). This does not enable any new features yet because EXT_texture_srgb was already supported via util_format fallbacks, but performance was lacking probably due to the external function call (the table used by the util_format_srgb code may not be all that much slower on its own).
Some performance figures (taken from modified gloss, replaced both base and sphere texture to use GL_SRGB instead of GL_RGB, measured on 1GHz Sandy Bridge, the numbers aren't terribly accurate):
normal gloss, aos, 8-wide: 47 fps
normal gloss, aos, 4-wide: 48 fps
normal gloss, forced to soa, 8-wide: 48 fps
normal gloss, forced to soa, 4-wide: 47 fps
patched gloss, old code, soa, 8-wide: 21 fps
patched gloss, old code, soa, 4-wide: 24 fps
patched gloss, new code, soa, 8-wide: 41 fps
patched gloss, new code, soa, 4-wide: 38 fps
So there's a performance hit but it seems acceptable, certainly better than using the fallback. Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will continue to use the old util_format fallback, because I can't be bothered to write code for formats no one uses anyway (as decoding is done as part of lp_build_unpack_rgba_soa which can only handle block type width of 32). Compressed srgb formats should get their own path though eventually (it is going to be expensive in any case, first decompress, then convert). No piglit regressions.
v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also since keeping both linear to srgb functions for now make sure both are compiled (since they share quite some code just integrate into the same function).
v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb path.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
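For reference, the exact sRGB transfer functions that such polynomial approximations stand in for are given below. This is the standard definition, not something taken from the patch, with both the encoded value s and the linear value l in [0, 1].

```latex
l =
\begin{cases}
  s / 12.92 & s \le 0.04045 \\
  \left( \dfrac{s + 0.055}{1.055} \right)^{2.4} & s > 0.04045
\end{cases}
\qquad
s =
\begin{cases}
  12.92\, l & l \le 0.0031308 \\
  1.055\, l^{1/2.4} - 0.055 & l > 0.0031308
\end{cases}
```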
2013-07-13gallivm: better support for fast rsqrtRoland Scheidegger2-16/+63
We had to disable fast rsqrt before because it wasn't precise enough etc. However in situations when we know we're not going to need more precision we can still use a fast rsqrt (which can be several times faster than the quite expensive sqrt). Hence introduce a new helper which does exactly that - it is probably not useful calling it in some situations if there's no fast rsqrt available so make it queryable if it's available too. v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation, let rsqrt use fast_rsqrt. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
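The precision trade-off behind a fast rsqrt can be sketched in plain C. This is not what gallivm emits: the well-known integer bit trick is used here purely as a portable stand-in for a hardware estimate, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rough estimate of 1/sqrt(x) for positive, normal x, using the well-known
 * integer bit trick. Stand-in for a hardware estimate instruction. */
float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step: y' = y * (1.5 - 0.5 * x * y * y). */
float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}
```

For x = 4 the raw estimate is off by a few percent, and one refinement step brings it to within about 0.2% of 0.5. Callers that can tolerate the raw estimate skip the refinement cost entirely.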
2013-07-13r600g/sb: Initialize ra_constraint::cost.Vinson Lee1-1/+1
Fixes "Uninitialized scalar field" reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-07-13winsys/radeon: allow a NULL cs pointer in radeon_bo_map to fix a segfaultMarek Olšák1-9/+11
The original idea was that cs=NULL should be allowed here, but we never used NULL until 862f69fbe1e54e0e9a3c439450a14f. This fixes a segfault in CoreBreach.
2013-07-13ilo: move a sanity check into its assert()Chia-I Wu1-5/+2
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and can be eliminated in a release build in gen6_pipeline_end(). Move the call into the assert().
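The pattern can be sketched with stand-in functions. The names below are hypothetical and the size estimate is a dummy; only the placement of the call relative to assert() matters.

```c
#include <assert.h>

static int estimate_calls; /* counts calls, for illustration only */

/* Hypothetical stand-in for a size estimate the compiler cannot prove pure. */
int estimate_size(void)
{
    estimate_calls++;
    return 16;
}

/* Before: the call is made even when NDEBUG compiles the assert away. */
void pipeline_end_before(int used)
{
    const int estimate = estimate_size();

    assert(used <= estimate);
    (void) estimate; /* avoid an unused-variable warning with NDEBUG */
}

/* After: the call lives inside assert(), so it vanishes with NDEBUG. */
void pipeline_end_after(int used)
{
    assert(used <= estimate_size());
}
```

This only works because the call has no side effects the release build depends on; anything that must still happen with NDEBUG defined cannot go inside assert().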
2013-07-13ilo: mark some states dirty when they are really changedChia-I Wu1-0/+16
The checks may seem redundant because cso_context handles them, but util_blitter does not have access to cso_context.
2013-07-13ilo: clean up ilo_blitter_pipe_begin()Chia-I Wu3-27/+39
Document why certain states need to be saved, and fix a bug when blitting with scissor enabled.
2013-07-12r600g: don't use the CB/DB CP COHER logic on r6xxAlex Deucher1-2/+10
There are hw bugs. Flush and inv event is sufficient. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66837 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-12nv30: fix KILL_IF breakageBrian Paul1-1/+1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66858
2013-07-11gallium: fixup definitions of the rsq and sqrtZack Rusin3-16/+11
GLSL spec says that rsq is undefined for src<=0, but the D3D10 spec says it needs to be a NaN, so let's stop taking an absolute value of the source which completely breaks that behavior. For the gl program we can simply insert an extra abs instruction which produces the desired behavior there. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>
2013-07-12util/u_format: Comment out half float denormal test case.José Fonseca1-0/+5
So that lp_test_format doesn't fail until we decide what should be done.
2013-07-12gallivm: Eliminate redundant lp_build_select calls.José Fonseca1-12/+2
lp_build_cmp already returns 0 / ~0, so the lp_build_select call is unnecessary. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
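A scalar model shows why the select is redundant. These helpers are hypothetical and only mimic the 0 / ~0 convention; they are not the gallivm builders.

```c
#include <stdint.h>

/* Hypothetical scalar model of a comparison that yields a full mask. */
uint32_t cmp_less(int32_t a, int32_t b)
{
    return a < b ? ~0u : 0u;
}

/* Bitwise select on an all-ones / all-zeros mask. */
uint32_t select_bits(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}
```

Selecting between ~0 and 0 with such a mask, as in select_bits(cmp_less(a, b), ~0u, 0u), returns the mask unchanged, so the comparison result can be used directly.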
2013-07-12tgsi: rename the TGSI fragment kill opcodesBrian Paul30-105/+103
TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The latter was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI.
This patch renames both opcodes:
TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
TGSI_OPCODE_KILP -> KILL (unconditional kill)
Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere. I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
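The semantics of the two renamed opcodes, as described in this message, can be modelled per fragment in C. The function names are hypothetical; a return value of true means the fragment is discarded.

```c
#include <stdbool.h>

/* KILL_IF (formerly KIL): discard if any source component is negative. */
bool fragment_kill_if(float x, float y, float z, float w)
{
    return x < 0.0f || y < 0.0f || z < 0.0f || w < 0.0f;
}

/* KILL (formerly KILP): discard unconditionally. */
bool fragment_kill(void)
{
    return true;
}
```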
2013-07-12tgsi: fix-up KILP commentsBrian Paul3-10/+8
KILP is really unconditional fragment kill. We've had KIL and KILP transposed forever. I'll fix that next. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12tgsi: exec TGSI_OPCODE_SQRT as a scalar instruction, not vectorBrian Paul1-1/+1
To align with the docs and the state tracker. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12tgsi: use X component of the second operand in exec_scalar_binary()Brian Paul1-1/+1
The code happened to work in the past since the (scalar) src args effectively always have a swizzle of .xxxx, .yyyy, .zzzz, or .wwww so whether you grab the X or Y component doesn't really matter. Just fixing the code to make it look right. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-12os: add os_get_process_name() functionBrian Paul3-0/+133
v2: explicitly test for BSD/APPLE, #warning for unexpected environments.
2013-07-12softpipe: silence some MSVC warningsBrian Paul2-14/+14
2013-07-12hud: silence some MSVC warningsBrian Paul1-8/+8
2013-07-12util: add casts to silence MSVC warnings in u_blit.cBrian Paul1-14/+14
2013-07-12tgsi: s/unsigned/int/ to silence MSVC warningBrian Paul1-1/+1
2013-07-12radeon/uvd: fall back to shader based decoding for MPEG2 on UVD 2.x v2Christian König2-5/+19
UVD 2.x doesn't support hardware decoding of MPEG2, just use shader based decoding for those chipsets. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66450 v2: fix interlacing as well Signed-off-by: Christian König <christian.koenig@amd.com>
2013-07-11r600g: x/y coordinates must be divided by block dim in dma blitChristoph Bumiller2-4/+16
Note: this is a candidate for the 9.1 branch. Reviewed-by: Marek Olšák <maraeo@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
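The coordinate conversion named in the subject line can be sketched as below. The helper is hypothetical, not the r600g code, and it assumes the coordinate is aligned to the block size.

```c
/* Hypothetical helper: convert a pixel coordinate to a block coordinate for a
 * block-compressed format. Assumes the coordinate is block-aligned. */
unsigned pixels_to_blocks(unsigned coord, unsigned block_dim)
{
    return coord / block_dim;
}
```

For a format with 4x4 blocks, such as the S3TC family, pixel position (8, 12) maps to block position (2, 3). Uncompressed formats have a block dimension of 1, so their coordinates pass through unchanged.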
2013-07-12r600g/sb: Fix Android build v2Chih-Wei Huang4-7/+8
Add the sb CXX files to the Android Makefile and also stop using some c++11 features. v2 (Vadim Girlin): use &bc[0] instead of bc.begin()
2013-07-11r600g/sb: improve math optimizations v2Vadim Girlin11-47/+435
This patch adds support for some math optimizations that are generally considered unsafe, that's why they are currently disabled for compute shaders. GL requirements are less strict, so they are enabled for GL shaders by default. In case of any issues with applications that rely on higher precision than guaranteed by GL, the 'sbsafemath' option in R600_DEBUG allows disabling them.
v2:
- always set proper src vector size for transformed instructions
- check for clamp modifier in the expr_handler::fold_assoc
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-11st/xvmc/tests: avoid non portable error.h functionsJonathan Gray1-5/+8
Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Christian König <christian.koenig@amd.com>
2013-07-11winsys/intel: build with VISIBILITY_CFLAGSChia-I Wu1-1/+2
There is no public symbol in this winsys.
2013-07-11ilo: reduce PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS to 12Chia-I Wu1-2/+3
So that there are at most (2^22 * 6) texels, lower than the 2^26 limit.
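The arithmetic can be checked with a small helper. This assumes the 2^22 figure refers to one face of the base level, which is 2^(levels-1) texels on a side; the function name is hypothetical.

```c
#include <stdint.h>

/* Texels in the base level of a cube map with the given number of mip levels:
 * the largest face is 2^(levels-1) texels on a side, and there are 6 faces. */
uint64_t cube_base_texels(unsigned levels)
{
    const uint64_t side = UINT64_C(1) << (levels - 1);

    return side * side * 6;
}
```

With 12 levels the base face is 2048x2048, giving 2^22 * 6 = 25,165,824 texels, which is below 2^26 = 67,108,864. With 13 levels the total would be 2^24 * 6, which exceeds that limit.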
2013-07-11ilo: correctly initialize undefined registers in fsChia-I Wu1-5/+15
Initialize all 4 channels of undefined registers (that is, TEMPs that are used before being assigned) in FS.
2013-07-10radeonsi: Handle TGSI_OPCODE_DDX/Y using local memoryMichel Dänzer4-2/+103
16 more little piglits. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-07-10radeonsi: Handle TGSI_OPCODE_TXDMichel Dänzer1-2/+25
One more little piglit. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-07-10util/u_math: Use xmmintrin.h whenever possible.José Fonseca1-9/+17
It seems __builtin_ia32_ldmxcsr is only available on gcc and only when -msse is used. xmmintrin.h/pmmintrin.h provide portable intrinsics, but these too are only available with gcc when -msse/-msse3 are set. scons build always sets -msse on x86 builds, but autotools doesn't seem to. We could try to get this working on gcc x86 without -msse by emitting assembly, but I believe that in this day and age we really should be building Mesa with -msse and -msse2.