author EricAnholt <EricAnholt@web> 2014-12-16 10:23:30 -0800
committer dri <iki-dri@freedesktop.org> 2014-12-16 10:23:30 -0800
commit 421320c219fa06e49f294b50cdf6c82627120bc3
tree 18772ed37cc8a72278edf956711526721667b035 /VC4.mdwn
parent eb5db2a45942107169a92ceae6d2f18c51144297
Diffstat (limited to 'VC4.mdwn')
-rw-r--r-- VC4.mdwn 32
1 files changed, 24 insertions, 8 deletions
diff --git a/VC4.mdwn b/VC4.mdwn
index ec7dc97..f29c4fa 100644
--- a/VC4.mdwn
+++ b/VC4.mdwn
@@ -4,8 +4,6 @@ The kernel driver can be found at: <https://github.com/anholt/linux/tree/vc4>
The kernel driver initializes KMS using the firmware blob to set a 1680x1050 video mode using the firmware's CMA area, then smashes the scanout to point at KMS's allocated framebuffer.
-The kernel driver executes GPU jobs synchronously, without inter-thread dispatch and spins waiting for threads to complete.
-
The shader validator currently has a root hole in that you could modify the shaders between validation and execution by the GPU, since we aren't zapping the user's mappings of the shader BO.
## 3D driver status
@@ -19,7 +17,7 @@ The 3D driver can be found at: <http://cgit.freedesktop.org/mesa/mesa/tree/src/g
* No control flow support
-This is going to require having actual phi nodes in the SSA and doing an out-of-SSA pass.
+This is going to require having actual phi nodes in the SSA and doing an out-of-SSA pass. See vc4-nir-5 for WIP.
* No 3D texture support
@@ -37,19 +35,31 @@ We can't export stencil because only the first dword of the value written to TLB
We want to be exposing 0, so that testcases know they can't actually get results from OQs.
+* No support for DDX/DDY
+
+This might be doable with the MUL output rotation support.
+
###Significant performance projects:
-* Build the pairing instruction scheduler.
+* Add a pre-register-allocation scheduler.
+
+Especially if we order texture results collection after other texture setup, this could be a big win (which would be hard to achieve post-register-allocation, when there are more dependencies). See vc4-qir-sched.
+
+* Improve the pairing instruction scheduler.
-You can do independent add and mul operations in a single cycle on this hardware, and we're not packing those together.
+We could pair more often if we converted ADD-based MOVs to MUL-based MOVs. We could also pair some cases with the PM bit set in only one of the instructions.
-* Support other vertex attribute formats.
+* Copy propagate VPM reads into their usage if only used once.
-Right now we're mostly just handling 32-bit float inputs, but there's other interesting support to do.
+* Register coalesce VPM writes into the instruction generating them.
+
+* Have the register allocator ask us for our preference in register file choice.
+
+If we minimize raddr a/b conflicts when possible, we can get way fewer instructions generated.
* Need to cache kernel BO allocations/mappings
-Right now we're freeing the objects immediately when we're done, which costs a lot of CPU time. We want to cache in userspace like normal drivers do, but we'll need a memory pressure handler for sure when we do since we're so memory constrained.
+Right now we're freeing the objects immediately when we're done, which costs a lot of CPU time. We want to cache in userspace like normal drivers do, but we'll need a memory pressure handler for sure when we do since we're so memory constrained. See vc4-bo-cache.
* Defer FBO flushes across binds.
@@ -59,6 +69,10 @@ If you bind a new FBO as a render target, we'd rather not flush our current rend
Right now we store color and depth at the end of every render job. But if we're flushing for the sake of glXSwapBuffers, we don't need real contents in the depth buffer and could drop those stores. We could also listen for the EGL framebuffer invalidate thing and clear our resolve bits based on that.
+* Don't flush on incremental clears.
+
+Sometimes apps will do a glClear(COLOR) then a glClear(DEPTH), without rendering in between, and we shouldn't do a scene store/load for that.
+
* ARB_blend_func_extended
This could massively improve component-alpha text rendering performance in X.
@@ -89,6 +103,8 @@ Unfortunately gallium drivers use their own custom EGL backend to support the de
The gallium drivers also require a gbm backend, though this part is more sane.
+You have to add mask_gpu_interrupt0=0x400 to /boot/config.txt
+
## Building the X Server
sudo apt-get build-dep xserver-xorg