summaryrefslogtreecommitdiff
path: root/I965ScalarSSA.mdwn
diff options
context:
space:
mode:
authorJason Ekstrand <jason.ekstrand@intel.com>2015-04-24 13:44:38 -0700
committerJason Ekstrand <jason.ekstrand@intel.com>2015-04-24 13:44:38 -0700
commita63d361a41892280d43236681396a702ff795648 (patch)
tree2376785bd940cbe8d5cc9326c8ef4e1523f51a47 /I965ScalarSSA.mdwn
parent10553e4319941d59ae070331445a99ab404870a6 (diff)
Move I915ScalarSSA to I965ScalarSSA
Diffstat (limited to 'I965ScalarSSA.mdwn')
-rw-r--r--I965ScalarSSA.mdwn213
1 files changed, 213 insertions, 0 deletions
diff --git a/I965ScalarSSA.mdwn b/I965ScalarSSA.mdwn
new file mode 100644
index 0000000..afbe61b
--- /dev/null
+++ b/I965ScalarSSA.mdwn
@@ -0,0 +1,213 @@
+This page is a collection of thoughts on how to best do SSA form in the
+scalar backend IR on i965. There are a variety of different issues
+discussed here. I've tried my best to put them into a reasonable order.
+
+# General Considerations
+
+There are a number of general things that will probably need to change in
+order to nicely handle SSA form.
+
+ 1. Register numbers. Right now, we "allocate" an integer name for each
+ register and treat each value as if it comes out of a magical
+ variable-length register file. For SSA values, we will probably want
+ to do something similar to NIR where we just point back to the
+ instruction that generates the value.
+
+# Validation
+
+One of the nicest things in all of NIR is the validator. A validation
+layer cannot check that the shader is correct, but it can check that it is
+in the right form. Having a good validator has been a massive help in NIR
+and we really should repeat it in the backend.
+
+# Registers
+
+Most of the interesting bits of SSA have to do with how you handle
+registers/values.
+
+## Register sizes
+
+Every register has a concept of size. Hardware registers (physical or
+otherwise) and messages will be specified in terms of a number of physical
+registers. Other registers have a three-dimensional size; a number of
+bits, a width, and a length. The number of bits is fairly obvious: 32 for
+float/int, 16 for hf, 64 for double, etc. The width is the number of SIMD
+channels stored in the register.
+
+## Execution masks and interference
+
+Most values, in particular anything in a GLSL variable, exist in a
+per-SIMD-channel way. These registers can be treated with the normal
+SSA-based interference rules. Consider the following example:
+
+ float a = 1.0;
+ if (foo) {
+ b = 0.5;
+ use(b);
+ } else {
+ use(a);
+ }
+
+If `foo` has different values for different SIMD channels, then the only
+channels of `b` that are set to 0.5 and used will be the channels where
+`foo` is true. Similarly, the only channels of `a` that will be used are
+the channels where `foo` is false. Therefore, we can freely assign `a` and
+`b` to the same register without any harm.
+
+The situation is quite different if the assignment to `b` happens to all
+channels regardless of the execution mask. In this case, assigning `a` and
+`b` to the same register will cause the values we want in `a` to be
+squashed by the assignment to be so they have to be in separate registers.
+
+For the sake of clearer discussion, will use the term **SIMD-local** to
+refer to a register that is only used in a per-SIMD-channel way and
+**SIMD-global** to refer to anything else.
+
+### Bit sizes and exec offsets.
+
+These issues become even more interesting when you throw in quarter control
+and non-32-bit types. With these added complexities, the execution masks
+do not always apply to the same 32-bit slice of any given register.
+Instead, it may apply to 16-bit or 64-bit slices. For alternate quarter
+controls, you can have the second half of the mask apply to the given set
+of 32-bit slices instead of the first.
+
+## Register files
+
+In the current backend IR's, we have a variety of different register files
+that indicate different things. This has worked pretty well, and we
+probably want to do the same type of thing in a new IR. I think we want
+the following register files:
+
+ * **ATTR**: Push attribute registers. These registers will be pushed by
+ the vertex fetching hardware and are considered to be "defined" at the
+ start of the program. These are per-SIMD-channel and as such, have a
+ width equal to the dispatch width.
+
+ * **UNIFORM**: Push uniform registers. These registers store uniform push
+ constants that are loaded by the hardware for use and are considered to
+ be "defined" at the start of the program. These are uniform values and
+ as such always have a width of 1.
+
+ * **HW_REG**: An actual physical hardware register. These are needed for
+ a variety of setup tasks. Usually, we only use g0 and g1, but there are
+ other times when we need a particular physical register. Hardware
+ registers don't really have a width per se but we do need some way of
+ specifying what hardware registers get touched. Hardware registers will
+ allow full register regioning. Reads/writes to physcal hardware
+ registers cannot be reordered.
+
+ * **Allocated HW_REG**: An allocated hardware register is similar to a
+ hardware register in that it allows multiple writes, partial writes,
+ full register regioning, and are SIMD-global. The primary difference
+ between allocated and physical hardware registers is that allocated
+ hardware registers get their actual register number assigned by the
+ allocator. In order to allow for SSA-based allocation, any given
+ allocated hardware register may only be used in a single basic block.
+ (It will be considered to be defined at the first use/def and all other
+ use/defs will be considered uses.) We may implement this by simply
+ using a register number of -1 to indicate that it should be allocated.
+
+ * **MRF**: Message registers. These are only ever defined by a
+ LOAD_PAYLOAD type operation and used by send messages. On SNB-, these
+ will be allocated out of a special pool of 16 registers (24 on SNB?).
+ On IVB+, these will be allocated out of the same pool as the rest of the
+ registers. Messages are special in that they are partially SIMD-local
+ and partially SIMD-global. How will we handle this in the allocator? I
+ have no idea.
+
+ * **IMM**: Immediate values. We could do what NIR does and handle
+ immediates by using a constant load instruction. This may be a good
+ idea, but it doesn't map very well to the hardware. It's probably
+ better to use a special register file/type for these.
+
+ * **FLAG**: Boolean flag values. These store boolean-only SSA values and
+ get allocated into one of the available flag registers. Since we have
+ very few flag registers, we may have to "spill" a flag value to a
+ regular SSA value by inserting a move-to-flag shortly before the uses of
+ the flag value. However, we do want the ability to track flag values
+ explicitly rather than the current "writes flag" boolean. There is a
+ tricky bit here because the CMP instruction *always* writes the flag
+ whether you want it to or not so we can't quite use SSA interference for
+ this.
+
+ * **SSA**: These are the regular values. They have a number of bits,
+ a width (1, 8, 16, or 32), a quarter (first half, second quarter, etc.)
+ and a length. In general, the width will be 1 for values that are
+ uniform and 8, 16, or 32 for non-uniform values. Non-uniform value
+ with a width less than the dispatch width will also have a specified
+ quarter to denote which SIMD channels it corresponds to. Unlike
+ hardware registers, partial writes and interesting regions are *not*
+ allowed on SSA values. Also, SSA values have exactly one definition and
+ this definition must dominate its entire live-range.
+
+One more we *may* want, but I'm not sure what to do with it:
+
+ * **ACCUM**: We also have an accumulator register. Some instructions just
+ write to this register by default. If we use the accumulator value, we
+ need to be very careful with re-ordering of instructions. I don't know
+ if we want to track this as an SSA value or not, but we do need to track
+ it.
+
+# Register Allocation
+
+One of the big advantages to SSA in the backend IR is the ability to use
+SSA for doing register allocation. More than the allocation itself
+SSA-based allocation schemes gives you the ability to know the exact
+register pressure at any point in the program and make decisions based on
+it. The benefits of this apply to many areas of the backend compiler such
+as value numbering, scheduling, spilling, and many more. If we can, we
+really want an SSA-based scheme.
+
+## SSA-based allocation
+
+The problem here is that most SSA-based register allocation schemes are
+fairly thoroughly CPU-focused. In our setting, there are some additional
+constraints:
+
+ 1. Register sizes. In the usual CPU setting, each register is a single
+ atomic entity. In the GPU setting, however, we have to deal with
+ allocating blocks of contiguous registers. This means that we have to
+ solve a packing problem.
+
+ 2. SIMD masking. In the usual CPU setting, if two registers don't
+ interfere in the SSA sense of interference, you can assign them to the
+ same register. In the SIMD setting, SIMD-global registers interfere
+ with more than just what the dominance tree tells you. Also, registers
+ with different bit-sizes and different quarters can interfere even if
+ they don't interfere in the usual SSA sense.
+
+Connor and I (Jason) have discussed this quite a bit and have come up with
+two different schemes that seem to at least start to address these
+problems. However, the problem is very far from being solved.
+
+## In the mean time...
+
+Before we actually solve the SSA-based register allocation problem, we can
+probably do better than we are today by using a hybrid approach. This
+would be to combine both SSA-based interference with the interval-based
+interference we use today. One way to do this would be as follows:
+
+ bool
+ regs_interfere(reg *a, reg *b)
+ {
+ if (regs_ssa_interfere(a, b))
+ return true;
+
+ if (!regs_interval_interfere(a, b))
+ return false;
+
+ return a->width != b->width ||
+ a->bits != b->bits ||
+ a->quarter != b->quarter ||
+ a->simd_global || b->simd_global;
+ }
+
+This would get us an interference graph with fewer edges than the purely
+interval-based approach but would also allow us to avoid the sharp edges
+given by working on a SIMD architecture. We would then use the same
+register allocator that we have today.
+
+The problem with this approach is, of course, that we wouldn't get the nice
+register pressure properties of *real* SSA-based register allocation.
+However, it may save us a few registers and gain us some SIMD16 programs.