Process four pixels at a time; some new instructions

author: Søren Sandmann <sandmann@redhat.com> 2007-12-08 09:52:39 -0500
committer: Søren Sandmann <sandmann@redhat.com> 2007-12-08 09:52:39 -0500
commit: 3ef6f915540df92ce20921e696b7ab8110dd4806 (patch)
tree: 2d9aff2e6504746bf8fd1722d91f4bc9a5c3adba /TODO
parent: d6391c34f3d5e339da176908d3d6e165875a8723 (diff)
1 files changed, 26 insertions, 24 deletions
diff --git a/TODO b/TODO
index 8a27080..b62b901 100644
--- a/TODO
+++ b/TODO
@@ -75,35 +75,20 @@
 
   A simple, but probably pretty good scheme:
 
-  - Generate two versions of each op, one where everything is aligned, and one where
-    alignment is detected on the fly.
+  - Generate two versions of each op, one where everything is aligned,
+    and one where alignment is detected on the fly.
 
     In both cases n_pixels is computed, the number of pixels to handle
-    per iteration. In the aligned case, we then just read in that
-    many pixels as efficiently as possible. For the first iteration,
-    that probably means 2 pixels in many cases, but eventually it
-    would be nice to unroll once to get to four pixels.
+    per iteration. In the aligned case, we then just read in that many
+    pixels as efficiently as possible. For the first iteration, that
+    probably means 2 pixels in many cases, but eventually it would be
+    nice to unroll once to get to four pixels.
 
     So both versions have a preamble that reads the source, mask and
     destinations into sse registers. Then afterwards, the computations
     are the same, then finally two different versions generate the
     final write to the destination.
 
-- Backwards vs. forwards
-
-  The code currently in testjit iterates backwards over each line. It
-  may be a little better to go forward. This could be done by
-
-      - initializing the line to (line + w * bpp)
-      - initializing w to -width
-      - not having a displacement:
-
-      	    movq (line, w, bpp), xmm0
-
-	and 
-
-	    add 2, width
-
 - It is important that the register allocator is not too dumb
 
      - EAX is the only register we can use for multiplication
@@ -188,9 +173,6 @@
     smaller by compressing fields into uint8_t's. This would cause gdb
     to not show registers in enums though.
 
-- Pixman CPU detection should be generated dynamically. That will get
-  rid of the annoying #ifdefs and getisax() stuff.
-
 - Public API:
 
   pixman-sse-jit.h:
@@ -220,6 +202,26 @@
 
 DONE:
 
+- Pixman CPU detection should be generated dynamically. That will get
+  rid of the annoying #ifdefs and getisax() stuff.
+
+  - Backwards vs. forwards
+
+    The code currently in testjit iterates backwards over each line. It
+    may be a little better to go forward. This could be done by
+
+      - initializing the line to (line + w * bpp)
+      - initializing w to -width
+      - not having a displacement:
+
+      	    movq (line, w, bpp), xmm0
+
+	and 
+
+	    add 2, width
+
+    Current code iterates forwards.
+
   - The memindex/membase should take ops, not reg numbers.
 
   - There should only REG and MEM in the ops. emit_memindex() can