Age  Commit message  Author  Files  Lines
2014-12-30i965: Reorganize miptree mappingreorg-array-optv2Ben Widawsky2-35/+234
The goal was to make all the decisions for which type of mapping very obvious. There are a couple of tricky conditions which aren't obvious in the existing if ladder. It wasn't my direct intention to drastically alter any of the decisions, however, read below. NOTE: The code could have been fixed in two ways - either add comments and create some const variables to clarify things (ie. tiled = tiling != NONE), or the more convoluted way that I did it. Initially I went with the former, and abandoned it for the latter, which seemed to be both more elegant, and clear, until the code was written. The way this patch reorganizes things allows a clear split between what is necessary versus what is optimal. I really don't like how the patch came out. I wasn't planning to send it, but then I discovered it actually improves performance on LLC platforms. I initially developed the patch to help debug a perf regression on LLC platforms with my earlier blit optimizations, and accidentally found this. Honestly, I'm therefore not really sure what to do. I'm sick enough of looking at this that I do not wish to figure out exactly which path is improving things. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-30i965: Use map->mode instead of modeBen Widawsky1-2/+2
We get mode embedded in the structure when we attach the map. This patch is simply to prep for the next patch. No intentional functional change/trivial Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-30i915/i965: Rename map_gttBen Widawsky2-22/+24
It doesn't actually do a "gtt" mapping, and it can get extremely confusing to name it otherwise. Use the common __ idiom for the helper function instead. To keep the symmetry unmap is also changed even though it could be entirely removed and replaced with unmap_raw. Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-27i965: Allow intel_try_pbo_upload for 3D and array texturestexture-array-optv2Neil Roberts3-39/+88
I just realised I made regular cube map textures stop working via the blit path with this patch. Here is a v2 which just adds GL_TEXTURE_CUBE_MAP to the switch in intel_try_pbo_upload. I've tested that it still works with a hacky tweak to the piglit test case. ------- >8 --------------- (use git am --scissors to automatically chop here) intel_try_pbo_upload now iterates over each slice of the uploaded data and does a separate blit for each image. This copies in some fiddly details from store_texsubimage in order to handle the image stride correctly for 1D array textures. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2014-12-27i965: Allow GL_UNPACK_SKIP_ROWS/PIXELS in intel_try_pbo_uploadNeil Roberts1-1/+6
This should just be a simple case of adding the skip values to the src offset so we can trivially implement it. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
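A minimal sketch of the offset adjustment described above, assuming a row stride in bytes and a fixed number of bytes per pixel. The struct and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the GL pixel-unpack state. */
struct unpack_state {
   int skip_pixels;   /* GL_UNPACK_SKIP_PIXELS */
   int skip_rows;     /* GL_UNPACK_SKIP_ROWS */
};

/* Return the byte offset of the first texel to upload. base_offset is the
 * offset of the image data within the PBO, row_stride is in bytes and cpp
 * is the number of bytes per pixel. */
ptrdiff_t
pbo_src_offset(ptrdiff_t base_offset, ptrdiff_t row_stride, int cpp,
               const struct unpack_state *unpack)
{
   assert(cpp > 0 && row_stride >= 0);
   return base_offset
        + (ptrdiff_t) unpack->skip_rows * row_stride
        + (ptrdiff_t) unpack->skip_pixels * cpp;
}
```

With a 256-byte stride and 4 bytes per pixel, skipping 2 rows and 3 pixels moves the source offset forward by 2 * 256 + 3 * 4 = 524 bytes.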
2014-12-27i965: Use try_pbo_upload for glTexSubImage* as wellNeil Roberts3-14/+41
There is an existing function to attempt to upload texture data from a PBO via the blit pipeline called try_pbo_upload. However it was only used for the glTexImage* functions. This patch renames it to intel_try_pbo_upload and adds parameters to specify the x/y offsets and size and makes intelTexSubImage use it as well. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2014-12-24i965: Allow Y-tiled allocations for large surfacesBen Widawsky1-15/+56
This patch will use a new calculation to determine if a surface can be blitted from or to. Previously, the "total_height" member was used. Total_height in the case of 2d, 3d, and cube map arrays is the height of each slice/layer/face. Since the GL map APIs only ever deal with a slice at a time however, the determining factor is really the height of one slice. This patch also has a side effect of not needing to set potentially large texture objects to the CPU domain, which implies we do not need to clflush the entire objects. (See references below for a kernel patch to achieve the same thing) With both the Y-tiled surfaces, and the removal of excessive clflushes, this improves the terrain benchmark on Cherryview (data collected by Jordan) Difference at 95.0% confidence 17.9236 +/- 0.252116 153.005% +/- 2.1522% (Student's t, pooled s = 0.205889) Jordan was extremely helpful in creating this patch. Consider him co-author. References: http://patchwork.freedesktop.org/patch/38909/ Cc: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-24i965: Attempt to blit for larger texturesBen Widawsky1-2/+104
The blit engine is limited to 32Kx32K transfer. In cases where we have to fall back to the blitter, and when trying to blit a slice of a 2d texture array, or face of a cube map, we don't need to transfer the entire texture. I doubt this patch will get exercised at this point since we'll always allocate a linear BO for huge buffers. The next patch changes that. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
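The difference between checking the whole texture and checking a single slice can be sketched as below. This is an illustration only: the struct is hypothetical, the total height is simplified to slice height times layer count (the real layout also includes alignment padding), and treating 32768 as an exclusive bound is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed blitter limit per dimension (treated as exclusive here). */
#define BLIT_MAX_DIM 32768u

/* Hypothetical, simplified description of a 2D array texture. */
struct tex_array {
   uint32_t width;
   uint32_t slice_height;
   uint32_t layers;
};

bool
fits_blitter(uint32_t width, uint32_t height)
{
   return width < BLIT_MAX_DIM && height < BLIT_MAX_DIM;
}

/* Old-style check: uses the height of all layers stacked together. */
bool
whole_texture_fits(const struct tex_array *t)
{
   assert(t->layers > 0);
   return fits_blitter(t->width, t->slice_height * t->layers);
}

/* Per-slice check: only one layer is transferred at a time. */
bool
slice_fits(const struct tex_array *t)
{
   return fits_blitter(t->width, t->slice_height);
}
```

A 4096x4096 array with 16 layers fails the whole-texture check, since 4096 * 16 = 65536, but each slice is well within the limit.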
2014-12-23i965: Add more stringent blitter assertionsBen Widawsky1-0/+3
Blits to or from a y-tiled surface must always be a multiple of the tile size. From page 16 of the HSW PRM (https://01.org/linuxgraphics/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-memory-views.pdf#16) "The pitch of a tiled enclosing region must be an integral number of tile widths" Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
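The quoted rule can be expressed as a simple check. The sketch below assumes the commonly documented tile widths of 512 bytes for X-tiling and 128 bytes for Y-tiling; the enum and function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tiling enum for illustration. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Tile width in bytes; 0 means "no tile constraint" for linear surfaces. */
uint32_t
tile_width_bytes(enum tiling t)
{
   switch (t) {
   case TILING_X: return 512;   /* assumed X-tile width */
   case TILING_Y: return 128;   /* assumed Y-tile width */
   default:       return 0;
   }
}

/* True when the pitch is an integral number of tile widths. */
bool
pitch_is_tile_aligned(uint32_t pitch, enum tiling t)
{
   uint32_t w = tile_width_bytes(t);
   return w == 0 || pitch % w == 0;
}
```

In a blit path, a check like this would typically be wrapped in an assertion, for example `assert(pitch_is_tile_aligned(pitch, tiling));`.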
2014-12-23i965: Consolidate some of the intel_blit logicBen Widawsky1-20/+8
An upcoming patch is going to introduce some code here, and having this code organized as the patch does makes it a bit easier to read later. There should be no functional change here. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-23i965/hsw: Limit max WM threads to physical limitBen Widawsky1-1/+1
2014-12-23i965/mipmap: disable MCS for sint MSAA buffers on GEN8+Ben Widawsky1-1/+1
Seems this is needed on all GENs. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-23i965: Allow other tiling formats for 128 bpp on gen7+Ben Widawsky1-2/+2
The most recent docs I can find say this workaround is needed for Sandybridge only. It says GEN6+ support linear, X, and Y, while GEN6 must be X or Y. commit c189840b21e176d87cbb382e64e848061b8c7b06 Author: Kenneth Graunke <kenneth@whitecape.org> Date: Tue Aug 13 15:03:12 2013 -0700 i965: Force X-tiling for 128 bpp formats on Sandybridge. The above commit has a mailing list discussion about this where Ken said the issue was reintroduced on later GENs. I can't find the evidence to support this anymore, so let's turn it on and hope for the best. Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-23glsl: check if implicitly sized arrays match explicitly sized arrays across the same stageTimothy Arceri1-1/+20
V2: Improve error message. Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-22i965: Use safer pointer arithmetic in gather_oa_results()Chad Versace1-1/+1
This patch reduces the likelihood of pointer arithmetic overflow bugs in gather_oa_results(), like the one fixed by b69c7c5dac. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I get nervous when I see code patterns like this: (void*) + (int) * (int) I smell 32-bit overflow all over this code. This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any potential overflow. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
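The pattern the message worries about can be illustrated as follows. This is a sketch, not the code from gather_oa_results(); the function names are hypothetical. When both factors are `int`, the product is computed in 32 bits before it is added to the pointer. Making one factor a `ptrdiff_t` promotes the multiplication to pointer width.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of snapshot number 'index'. Because snapshot_size is a
 * ptrdiff_t, 'index' is converted to ptrdiff_t before the multiply, so
 * the product cannot wrap at 32 bits on a 64-bit platform. */
ptrdiff_t
snapshot_offset(int index, ptrdiff_t snapshot_size)
{
   assert(index >= 0 && snapshot_size >= 0);
   return index * snapshot_size;
}

/* Address of snapshot number 'index' within a raw buffer. The cast to
 * char * avoids relying on arithmetic on void *, which is a GNU extension. */
void *
snapshot_at(void *base, int index, ptrdiff_t snapshot_size)
{
   return (char *) base + snapshot_offset(index, snapshot_size);
}
```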
2014-12-22i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()Chad Versace1-3/+4
This patch reduces the likelihood of pointer arithmetic overflow bugs in intel_texsubimage_tiled_memcpy(), like the one fixed by b69c7c5dac. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow bug in a line of code that looks very similar to pointer arithmetic in this function. This patch conceptually applies the same fix as in b69c7c5dac. Instead of retyping the variables, though, this patch adds some casts. (I tried to retype the variables as ptrdiff_t, but it quickly got very messy. The casts are cleaner). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-22i965: Fix intel_miptree_map() signature to be more 64-bit safeChad Versace5-10/+24
This patch should diminish the likelihood of pointer arithmetic overflow bugs, like the one fixed by b69c7c5dac. Change the type of parameter 'out_stride' from int to ptrdiff_t. The logic is that if you call intel_miptree_map() and use the value of 'out_stride', then you must be doing pointer arithmetic on 'out_ptr'. Using ptrdiff_t instead of int should make it a little bit harder to hit overflow bugs. As a side-effect, some function-scope variables needed to be retyped to avoid compilation errors. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-22i965: Remove spurious casts in copy_image_with_memcpy()Chad Versace1-4/+4
If a pointer points to raw, untyped memory and is never dereferenced, then declare it as 'void*' instead of casting it to 'void*'. Signed-off-by: Chad Versace <chad.versace@linux.intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-21radeonsi: force NaNs to 0Marek Olšák1-4/+8
This fixes incorrect rendering in Unreal Engine demos. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-12-21st/nine: fix DBG typo (trivial)David Heidelberg1-1/+1
Signed-off-by: David Heidelberg <david@ixit.cz> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-12-21r300g: implement ARR opcodeDavid Heidelberg4-4/+16
Same as ARL, just has extra rounding. Useful for st/nine. Tested-by: Pavel Ondračka <pavel.ondracka@email.cz> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: David Heidelberg <david@ixit.cz> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-12-20freedreno/a4xx: blend-colorRob Clark1-0/+13
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-20freedreno/a4xx: alpha-testRob Clark1-0/+2
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-20freedreno: update generated headersRob Clark6-61/+151
2014-12-20freedreno/ir3: trans_kill cleanupRob Clark1-12/+7
trans_kill() only handles the single opcode. Drop the remnant of a time when both KILL and KILL_IF were handled by the same fxn. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-20freedreno/ir3: hack for standalone compilerRob Clark1-1/+5
Standalone compiler doesn't have screen or context. We need to come up with a better way to control the target arch (ie. something that we can control from cmdline w/ standalone compiler) but for now this hack keeps it from segfault'ing. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-19i965/fs: Add missing const qualifier.Matt Turner1-1/+1
2014-12-18vc4: Coalesce MOVs into VPM with the instructions generating the values.Eric Anholt4-15/+143
total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)
2014-12-17vc4: Redefine VPM writes as a (destination) QIR register file.Eric Anholt3-7/+19
This will let me coalesce the VPM writes into the instructions generating the values.
2014-12-18docs: note change in minimum GCC version to 4.2.0Timothy Arceri1-1/+1
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
2014-12-18gallium: remove support for GCC older than 4.2.0Timothy Arceri1-1/+1
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-18mesa: bump required GCC version to 4.2.0Timothy Arceri1-3/+3
It turns out Mesa hasn't compiled on less than 4.2 for a while so update conf to reflect this. Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-17vc4: Add support for turning constant uniforms into small immediates.Eric Anholt13-46/+283
Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
2014-12-17vc4: Move follow_movs() to common QIR code.Eric Anholt3-11/+12
I want this from other passes.
2014-12-17vc4: Fix missing newline for load immediate instruction disasm.Eric Anholt1-4/+4
2014-12-17mesa: Remove unnecessary -f from $(RM).Matt Turner4-8/+8
$(RM) includes -f.
2014-12-17mesa: Remove tarballs/checksum rules.Matt Turner1-75/+0
2014-12-17gallium: Add egl and gbm to distribution.Matt Turner1-0/+4
2014-12-17mesa: Set DISTCHECK_CONFIGURE_FLAGS.Matt Turner1-0/+13
Enable some non-default options that distros are likely to use.
2014-12-17targets/xvmc: Add uninstall hooks to handle megadriver hardlinks.Matt Turner1-0/+5
2014-12-17targets/vdpau: Add uninstall hooks to handle megadriver hardlinks.Matt Turner1-0/+5
2014-12-17targets/vdpau: Add clean-local rule to remove .lib links.Matt Turner1-0/+6
2014-12-17vc4: Add a userspace BO cache.Eric Anholt4-4/+175
Since our kernel BOs require CMA allocation, and the use of them requires new mmaps, it's pretty expensive and we should avoid it if possible. Copying my original design for Intel, make a userspace cache that reuses BOs that haven't been shared to other processes but frees BOs that have sat in the cache for over a second. Improves glxgears framerate on RPi by around 30%.
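The caching policy described above (reuse unshared BOs, drop entries that have been idle for more than a second) might be sketched roughly as follows. This is not the vc4 implementation: every name is hypothetical, `calloc`/`free` stand in for the expensive kernel allocation, and time is passed in as whole seconds to keep the example deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct bo {
   size_t size;
   int shared;        /* exported to another process: never cached */
   long free_time;    /* second at which the BO entered the cache */
   struct bo *next;
};

struct bo_cache {
   struct bo *head;   /* oldest entry */
   struct bo *tail;   /* newest entry */
   int count;
};

/* Stand-in for the expensive kernel allocation. */
struct bo *bo_alloc(size_t size)
{
   struct bo *bo = calloc(1, sizeof(*bo));
   assert(bo != NULL);
   bo->size = size;
   return bo;
}

/* Free every cached BO that has been idle for more than one second.
 * Entries are kept in insertion order, so only the head needs checking. */
void bo_cache_expire(struct bo_cache *cache, long now)
{
   while (cache->head && now - cache->head->free_time > 1) {
      struct bo *old = cache->head;
      cache->head = old->next;
      if (!cache->head)
         cache->tail = NULL;
      cache->count--;
      free(old);
   }
}

/* Called where the BO would otherwise be freed. */
void bo_cache_put(struct bo_cache *cache, struct bo *bo, long now)
{
   bo_cache_expire(cache, now);
   if (bo->shared) {
      free(bo);
      return;
   }
   bo->free_time = now;
   bo->next = NULL;
   if (cache->tail)
      cache->tail->next = bo;
   else
      cache->head = bo;
   cache->tail = bo;
   cache->count++;
}

/* Reuse a cached BO of the requested size, or allocate a new one. */
struct bo *bo_cache_get(struct bo_cache *cache, size_t size)
{
   struct bo *prev = NULL;
   for (struct bo *bo = cache->head; bo; prev = bo, bo = bo->next) {
      if (bo->size != size)
         continue;
      if (prev)
         prev->next = bo->next;
      else
         cache->head = bo->next;
      if (cache->tail == bo)
         cache->tail = prev;
      cache->count--;
      bo->next = NULL;
      return bo;
   }
   return bo_alloc(size);
}
```

Matching on exact size keeps the sketch short; a real cache would more likely bucket entries by size so that lookups do not walk the whole list.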
2014-12-17vc4: Add dmabuf support.Eric Anholt4-24/+78
This gets DRI3 working on modesetting with glamor. It's not enabled under simulation, because it looks like handing our dumb-allocated buffers off to the server doesn't actually work for the server's rendering.
2014-12-17vc4: Drop a weird argument in the BOs-from-handles API.Eric Anholt3-7/+5
2014-12-17draw: revert using correct order for prim decomposition.Roland Scheidegger1-1/+3
This reverts db3dfcfe90a3d27e6020e0d3642f8ab0330e57be. The commit was correct but we've got some precision problems later in llvmpipe (or possibly in draw clip) due to the vertices coming in in different order, causing some internal test failures. So revert for now. (Will only affect drivers which actually support constant-interpolated attributes and not just flatshading.)
2014-12-17util: Silence signed-unsigned comparison warningsJan Vesely1-6/+6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-12-16i965: Require pixel alignment for GPU copy blitCody Northrop2-4/+6
The blitter will start at a pixel's natural alignment. For PBOs, if the provided offset is not aligned, bits will get dropped. This change adds an offset alignment check for src and dst, kicking back if the requirements are not met. The change is based on following verbiage from BSPEC: Color pixel sizes supported are 8, 16, and 32 bits per pixel (bpp). All pixels are naturally aligned. Found in the following locations: page 35 of intel-gfx-prm-osrc-hsw-blitter.pdf page 29 of ivb_ihd_os_vol1_part4.pdf page 29 of snb_ihd_os_vol1_part5.pdf This behavior was observed with Steam Big Picture rendering incorrect icon colors. The fix has been tested on Ubuntu and SteamOS on Haswell. Signed-off-by: Cody Northrop <cody@lunarg.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83908 Reviewed-by: Neil Roberts <neil@linux.intel.com>
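The alignment requirement amounts to checking that both byte offsets are multiples of the pixel size. A minimal sketch, with a hypothetical function name, assuming `cpp` is the number of bytes per pixel for the 8, 16 and 32 bpp formats quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when both offsets sit on a pixel's natural alignment. A caller
 * would fall back to a non-blit path when this returns false. */
bool
blit_offsets_aligned(uint32_t src_offset, uint32_t dst_offset, uint32_t cpp)
{
   assert(cpp == 1 || cpp == 2 || cpp == 4);
   return src_offset % cpp == 0 && dst_offset % cpp == 0;
}
```

For example, a source offset of 6 bytes is acceptable for a 16 bpp format but not for a 32 bpp one.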
2014-12-16i965: remove includes of sampler.h from extern "C" blocksMark Janes4-5/+4
C linkage was removed from functions in program/sampler.cpp. However, some cpp files include program/sampler.h within extern "C" blocks, causing link errors for test_vec4_copy_propagation. Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-16i965/query: Cache whether the batch references the query BO.Kenneth Graunke2-4/+26
Chris Wilson noted that repeated calls to CheckQuery() would call drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation, which is expensive. Once we've flushed, we know that future batches won't reference query->bo, so there's no point in asking more than once. This patch adds a brw_query_object::flushed flag, which is a conservative estimate of whether the batch has been flushed. On the first call to CheckQuery() or WaitQuery(), we check if the batch references query->bo. If not, it must have been flushed for some reason (such as being full). We record that it was flushed. If it does reference query->bo, we explicitly flush, and record that we did so. Any subsequent checks will simply see that query->flushed is set, and skip the drm_intel_bo_references() call. Inspired by a patch from Chris Wilson. According to Eero, this does not affect the performance of Witcher 2 on Haswell, but approximately halves the userspace CPU usage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
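The caching scheme described above can be sketched as below. The structs and helpers are simplified stand-ins and not the driver's types: `batch_references()` plays the role of the expensive drm_intel_bo_references() call, and the batch is modelled as referencing at most one BO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified batch. */
struct batch {
   const void *referenced_bo;
   int reference_checks;   /* how many times the expensive check ran */
   int flushes;
};

/* Hypothetical query object with the cached 'flushed' flag. */
struct query {
   const void *bo;
   bool flushed;
};

/* Stand-in for the expensive reference lookup. */
bool batch_references(struct batch *batch, const void *bo)
{
   batch->reference_checks++;
   return batch->referenced_bo == bo;
}

void batch_flush(struct batch *batch)
{
   batch->referenced_bo = NULL;
   batch->flushes++;
}

/* Called at the start of every check or wait on the query. Only the
 * first call pays for the reference lookup; later calls return early. */
void flush_batch_if_needed(struct batch *batch, struct query *query)
{
   assert(batch != NULL && query != NULL);
   if (query->flushed)
      return;
   if (batch_references(batch, query->bo))
      batch_flush(batch);
   /* Either the batch was already flushed or we just flushed it. */
   query->flushed = true;
}
```

The flag is conservative in the sense the message describes: it is only set once the batch is known not to reference the query BO, so a stale `false` costs one extra lookup but never skips a needed flush.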