path: root/src/gallium/drivers/swr
Age | Commit message | Author | Files | Lines
2018-01-10 | swr: Handle indirect indices in GS | George Kyriazis | 1 | -8/+39
BuilderSWR::swr_gs_llvm_fetch_input() (and consequently swr_gs_llvm_fetch_input()) did not handle the case where is_vindex_indirect or is_aindex_indirect is set. Implement it, using the code in draw_llvm.c as a guideline. Fixes the following piglit tests: dynamic_input_array_index (crash), gs-input-array-vec4-index-rd, vs-output-array-vec4-index-wr-before-gs. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-10 | swr/rast: switch win32 jit format to COFF | Tim Rowley | 1 | -2/+2
Allows for call-stack and exception handling for jitted functions. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-10 | swr/rast: don't use 32-bit gathers for elements < 32-bits in size | Tim Rowley | 1 | -1/+60
Using a gather for elements less than 32 bits in size can cause page faults when loading the last elements of a buffer whose size is page-aligned. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
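The hazard in the commit above comes down to simple arithmetic: a 32-bit load issued at the last narrow element reads bytes beyond the end of the buffer. The following is a minimal sketch of that arithmetic, not SWR code; the helper name `overrun_bytes` and its parameters are hypothetical and assume a tightly packed buffer with at least one element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of SWR: number of bytes that a load of
 * `load_size` bytes, issued at the last element of a tightly packed buffer,
 * reads past the end of that buffer. Assumes count > 0. */
static size_t overrun_bytes(size_t count, size_t elem_size, size_t load_size)
{
    size_t buffer_size = count * elem_size;
    size_t load_end = (count - 1) * elem_size + load_size;
    return load_end > buffer_size ? load_end - buffer_size : 0;
}
```

For a 4096-byte buffer of 1-byte elements, `overrun_bytes(4096, 1, 4)` gives 3: the last 32-bit load touches three bytes of the following page, which may be unmapped. With 4-byte elements the overrun is zero.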
2018-01-10 | swr/rast: autogenerate named structs instead of literal structs | Tim Rowley | 1 | -8/+15
Results in far smaller and more useful IR output. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-10 | swr/rast: SIMD16 fetch shader jitter cleanup | Tim Rowley | 1 | -720/+368
Bake in USE_SIMD16_BUILDER code paths (for USE_SIMD16_SHADER defined), remove USE_SIMD16_BUILDER define, remove deprecated pseudo-SIMD16 code paths. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-10 | swr/rast: shuffle header files for msvc pre-compiled header usage | Tim Rowley | 10 | -88/+143
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-10 | swr/rast: SIMD16 builder - cleanup naming (simd2 -> simd16) | Tim Rowley | 5 | -233/+239
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-08 | meson: Build SWR driver | Dylan Baker | 2 | -0/+447
This enables the SWR driver, but doesn't actually hook it up to any of the targets yet. I felt like this patch was big and complicated enough without adding that.
v2: - Fix typo 'delemeited' -> 'delimited' (Eric E)
    - Fix typo 'errror' -> 'error' (Eric E)
    - Use variables to hold files instead of looking above the current meson build (Eric E)
    - Use foreach loops to reduce the number of unique generators
    - Add comment about why some generators have names and some are just added to a list
v3: - Remove trailing whitespace
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-04 | swr/rast: fix invalid sign masks in avx512 simdlib code | Tim Rowley | 3 | -3/+3
Should be 0x80000000 instead of 0x8000000. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
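The two constants in the commit above differ by one hex digit: 0x80000000 is bit 31 (the sign bit of a 32-bit float), while 0x8000000 is bit 27, which falls inside the exponent field. The sketch below illustrates the effect on scalar floats; it is not the simdlib code, the names `SIGN_MASK_OK`, `SIGN_MASK_BAD`, `float_bits`, and `has_bit` are hypothetical, and it assumes `float` is IEEE-754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Correct single-precision sign mask: bit 31. */
#define SIGN_MASK_OK  0x80000000u
/* Mistyped mask with seven hex digits: bit 27, an exponent bit. */
#define SIGN_MASK_BAD 0x8000000u

/* Reinterpret a float as its raw 32-bit pattern (assumes IEEE-754 binary32). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns nonzero when any bit selected by `mask` is set in `f`. */
static int has_bit(float f, uint32_t mask)
{
    return (float_bits(f) & mask) != 0;
}
```

With the mistyped mask, `-4.0f` (0xC0800000) is not detected as negative, while `1.0f` (0x3F800000) is wrongly flagged because its exponent happens to have bit 27 set.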
2018-01-03 | swr/rast: fix MemoryBuffer build break for llvm-6 | Tim Rowley | 1 | -0/+4
LLVM API change. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104381 Tested-by: Laurent Carlier <lordheavym@gmail.com> Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-19 | gallium: plumb context priority through to driver | Rob Clark | 1 | -0/+1
Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-12-18 | swr: Account for index_bias in offsets | George Kyriazis | 1 | -3/+3
When calculating buffer offsets for client buffers, account for info.index_bias. Fixes the following piglit tests: arb_draw_elements_base_vertex-drawelements-user_varrays, arb_draw_elements_base_vertex-negative-index-user_varrays. Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
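As a rough illustration of what accounting for a base-vertex bias means, the sketch below computes the byte offset of a vertex in a client array when a signed bias is added to the element index. This is an assumption about the general technique, not the actual change in the driver; the function name `vertex_byte_offset` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not the SWR implementation: byte offset of the vertex
 * selected by element index `index` once a signed `index_bias` (base vertex)
 * is applied, for a vertex buffer with `stride` bytes per vertex.
 * 64-bit math keeps negative biases and large buffers from wrapping. */
static int64_t vertex_byte_offset(uint32_t index, int32_t index_bias,
                                  uint32_t stride)
{
    return ((int64_t)index + (int64_t)index_bias) * (int64_t)stride;
}
```

For example, index 10 with a bias of -4 and a 16-byte stride yields an offset of 96 bytes, i.e. vertex 6. Ignoring the bias would read vertex 10 instead.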
2017-12-15 | swr/rast: Move more RTAI handling out of binner | Tim Rowley | 2 | -12/+2
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle | Tim Rowley | 3 | -61/+32
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Fix cache of API thread event manager | Tim Rowley | 1 | -1/+1
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Replace VPSRL with LSHR | Tim Rowley | 4 | -41/+4
Replace use of x86 intrinsic with general llvm IR instruction. Generates the same final assembly. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Rework thread binding parameters for machine partitioning | Tim Rowley | 7 | -88/+322
Add BASE_NUMA_NODE, BASE_CORE, BASE_THREAD parameters to SwrCreateContext. Add optional SWR_API_THREADING_INFO parameter to SwrCreateContext to control reservation of API threads. Add SwrBindApiThread() function to allow binding of API threads to reserved HW threads. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Pull RTAI gather & offset out of clip/bin code | Tim Rowley | 7 | -146/+203
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Remove no-op VBROADCAST of vID | Tim Rowley | 1 | -2/+2
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components | Tim Rowley | 4 | -17/+109
Also widen the 16-bit and 8-bit integer vertex component gathers to SIMD16. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle | Tim Rowley | 3 | -105/+30
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components | Tim Rowley | 1 | -7/+48
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components | Tim Rowley | 4 | -32/+194
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Pass prim to ClipSimd | Tim Rowley | 1 | -5/+5
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Pull most of the VPAI manipulation out of the binner/clipper | Tim Rowley | 7 | -158/+177
Move out of binner/clipper; hand them down from the frontend code instead. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Move GatherScissors to header | Tim Rowley | 2 | -127/+127
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Rewrite Shuffle8bpcGatherd using shuffle | Tim Rowley | 1 | -182/+62
Ease future code maintenance, prepare for folding simd8 and simd16 versions. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Convert gather masks to Nx1bit | Tim Rowley | 2 | -40/+14
Simplifies calling code, gets gather function interface closer to llvm's masked_gather. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: WIP - Widen fetch shader to SIMD16 | Tim Rowley | 1 | -27/+689
Widen vertex gather/storage to SIMD16 for all component types. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Corrections to multi-scissor handling | Tim Rowley | 1 | -88/+88
binner's GatherScissors() will be turned into a real gather in the not too distant future. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Binner fixes for viewport index offset handling | Tim Rowley | 2 | -2/+12
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 | swr/rast: Remove unneeded copy of gather mask | Tim Rowley | 2 | -79/+23
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-13 | swr: Correct texture allocation and limit max size to 2GB | Bruce Cherniak | 2 | -4/+10
This patch fixes piglit tex3d-maxsize by correcting 4 things: The total_size calculation was using 32-bit math, therefore a >4GB allocation request overflowed and was not returning false (unsupported). Changed AlignedMalloc arguments from "unsigned int" to size_t, to handle >4GB allocations. Added error checking on texture allocations to fail gracefully. Finally, temporarily decreased supported max texture size from 4GB to 2GB. The gallivm texture-sampler needs some additional work to correctly handle larger than 2GB textures (offsets to LLVMBuildGEP are signed). I'm working on a follow-on patch to allow up to 4GB textures, as this is useful in HPC visualization applications. Fixes piglit tex3d-maxsize. v2: Updated patch description to clarify ">4GB". Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
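The first problem described above, a size calculation done in 32-bit math, can be reproduced in a few lines. The sketch below is illustrative only and is not the driver's code: the function names and parameters are hypothetical, and the 2GB limit is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit size calculation: the product wraps modulo 2^32,
 * so a request larger than 4GB can look small or even zero. */
static uint32_t total_size_32(uint32_t w, uint32_t h, uint32_t d, uint32_t bpp)
{
    return w * h * d * bpp;
}

/* Same calculation widened to 64 bits before multiplying. */
static uint64_t total_size_64(uint32_t w, uint32_t h, uint32_t d, uint32_t bpp)
{
    return (uint64_t)w * h * d * bpp;
}

/* Returns nonzero when the allocation fits under `limit` bytes. */
static int size_supported(uint64_t total, uint64_t limit)
{
    return total <= limit;
}
```

A 2048 x 2048 x 2048 texture at 4 bytes per texel needs 2^35 bytes. In 32-bit math that wraps to 0 and passes a 2GB limit check; in 64-bit math the request is correctly rejected.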
2017-12-13 | swr: Fix KNOB_MAX_WORKER_THREADS thread creation override. | Bruce Cherniak | 1 | -2/+1
Environment variable KNOB_MAX_WORKER_THREADS allows the user to override default thread creation and thread binding. A previous commit that adjusted the Linux CPU topology caused setting this knob to bind all threads to a single core. This patch restores the correct behavior of the override. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-12-06 | swr/scons: Fix another intermittent build failure | George Kyriazis | 1 | -0/+1
gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp. Fix missing dependency. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-01 | swr/scons: Fix intermittent build failure | George Kyriazis | 1 | -0/+1
gen_rasterizer*.cpp depends on gen_ar_eventhandler.hpp. Account for new dependency. Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-11-20 | swr/rast: Repair simd8 frontend code rot | Tim Rowley | 1 | -1/+1
Keep non-default simd8 frontend code running for comparison purposes. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader | Tim Rowley | 4 | -29/+220
Disabled for now. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Simplify GATHER* jit builder api | Tim Rowley | 4 | -48/+48
General cleanup, and prep work for possibly moving to llvm masked gather intrinsic. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Add alignment to transpose targets | Tim Rowley | 1 | -8/+8
Needed to ensure alignment for avx512. Fixes address sanitizer crash. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Cache eventmanager | Tim Rowley | 3 | -0/+9
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Enable AVX-512 targets in the jitter | Tim Rowley | 2 | -10/+0
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Points with clipdistance can't go through simplepoints path | Tim Rowley | 1 | -1/+2
Fixes piglit glsl-1.20:vs-clip-vertex-primitives and glsl-1.30:vs-clip-distance-primitives. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Code style change (NFC) | Tim Rowley | 1 | -2/+7
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Widen fetch shader to SIMD16 | Tim Rowley | 5 | -3/+151
Widen fetch shader to SIMD16, enable SIMD16 types in the jitter, and provide EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-20 | swr/rast: Support flexible vertex layout for DS output | Tim Rowley | 2 | -0/+3
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-11-14 | swr/rast: Faster emulated simd16 permute | Tim Rowley | 1 | -23/+11
Speed up simd16 frontend (default) on avx/avx2 platforms; fixes performance regression caused by switch to simdlib. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Cc: mesa-stable@lists.freedesktop.org
2017-11-14 | swr/rast: Use gather instruction for i32gather_ps on simd16/avx512 | Tim Rowley | 1 | -11/+1
Speed up avx512 platforms; fixes performance regression caused by switch to simdlib. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Cc: mesa-stable@lists.freedesktop.org
2017-11-10 | swr: Fixed an uncommon freed-memory access during state validation | Bruce Cherniak | 2 | -17/+25
State validation is performed during clear and draw calls. Validation during clear was still accessing vertex buffer state. When the currently set vertex buffers are client arrays, this could lead to accessing freed memory. Such is the case with the VMD application. Previously, vertex buffer validation depended on a dirty bit or the draw info indicating an indexed draw. This required special handling for clears. But, vertex buffer validation still occurred which was unnecessary and wrong. Now, only minimal validation is performed during clear, deferring the remainder to the next draw. And, by setting the dirty bit in swr_draw_vbo for indexed draws, vertex buffer validation is only dependent upon a single dirty bit. This fixes a bug exposed by the VMD application when changing models. Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2017-11-09 | util: move os_time.[ch] to src/util | Nicolai Hähnle | 2 | -2/+2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>