author | Søren Sandmann <sandmann@redhat.com> | 2007-12-08 09:52:39 -0500 |
---|---|---|
committer | Søren Sandmann <sandmann@redhat.com> | 2007-12-08 09:52:39 -0500 |
commit | 3ef6f915540df92ce20921e696b7ab8110dd4806 (patch) | |
tree | 2d9aff2e6504746bf8fd1722d91f4bc9a5c3adba /TODO | |
parent | d6391c34f3d5e339da176908d3d6e165875a8723 (diff) |
Process four pixels at a time; some new instructions
Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 50 |
1 files changed, 26 insertions, 24 deletions
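The "Backwards vs. forwards" note that this commit moves under DONE describes a forward loop driven by a negative index: point at the end of the line, start w at -width, and count up towards zero so that the address needs no displacement. A minimal C sketch of that idea follows, assuming 32-bit pixels and a plain copy as the operation. The function name `copy_line_forward` and its signature are hypothetical and are not taken from testjit or pixman.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only: copy one line of 32-bit pixels,
 * iterating forwards with a negative index that counts up to zero. */
static void copy_line_forward(uint32_t *dst, const uint32_t *src, size_t width)
{
    /* "initializing the line to (line + w * bpp)": point at the line end. */
    const uint32_t *src_end = src + width;
    uint32_t *dst_end = dst + width;

    /* "initializing w to -width" */
    ptrdiff_t w = -(ptrdiff_t)width;

    /* Two pixels (8 bytes) per iteration, loosely mirroring a movq from
     * (line, w, bpp) followed by "add 2". */
    for (; w <= -2; w += 2)
        memcpy(dst_end + w, src_end + w, 2 * sizeof(uint32_t));

    /* An odd width leaves one trailing pixel. */
    if (w < 0)
        dst_end[w] = src_end[w];
}
```

Because the index itself reaches zero at the end of the line, the loop condition is just a sign test on w, which is presumably part of the appeal for generated code.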
@@ -75,35 +75,20 @@
 
 A simple, but probably pretty good scheme:
 
-        - Generate two versions of each op, one where everything is aligned, and one where
-          alignment is detected on the fly.
+        - Generate two versions of each op, one where everything is aligned,
+          and one where alignment is detected on the fly.
 
           In both cases n_pixels is computed, the number of pixels to handle
-          per iteration. In the aligned case, we then just read in that
-          many pixels as efficiently as possible. For the first iteration,
-          that probably means 2 pixels in many cases, but eventually it
-          would be nice to unroll once to get to four pixels.
+          per iteration. In the aligned case, we then just read in that many
+          pixels as efficiently as possible. For the first iteration, that
+          probably means 2 pixels in many cases, but eventually it would be
+          nice to unroll once to get to four pixels.
 
           So both versions have a preamble that reads the source, mask and
           destinations into sse registers. Then afterwards, the computations
           are the same, then finally two different versions generate the
           final write to the destination.
 
-- Backwards vs. forwards
-
-  The code currently in testjit iterates backwards over each line. It
-  may be a little better to go forward. This could be done by
-
-        - initializing the line to (line + w * bpp)
-        - initializing w to -width
-        - not having a displacement:
-
-                movq (line, w, bpp), xmm0
-
-          and
-
-                add 2, width
-
 - It is important that the register allocator is not too dumb
 
 - EAX is the only register we can use for multiplication
@@ -188,9 +173,6 @@
   smaller by compressing fields into uint8_t's. This would cause gdb
   to not show registers in enums though.
 
-- Pixman CPU detection should be generated dynamically. That will get
-  rid of the annoying #ifdefs and getisax() stuff.
-
 - Public API:
 
         pixman-sse-jit.h:
@@ -220,6 +202,26 @@
 
 DONE:
 
+- Pixman CPU detection should be generated dynamically. That will get
+  rid of the annoying #ifdefs and getisax() stuff.
+
+ - Backwards vs. forwards
+
+  The code currently in testjit iterates backwards over each line. It
+  may be a little better to go forward. This could be done by
+
+        - initializing the line to (line + w * bpp)
+        - initializing w to -width
+        - not having a displacement:
+
+                movq (line, w, bpp), xmm0
+
+          and
+
+                add 2, width
+
+  Current code iterates forwards.
+
 - The memindex/membase should take ops, not reg numbers.
 
 - There should only REG and MEM in the ops. emit_memindex() can