
Instructions
============

The basic unit of computation in NIR is the *instruction*. An instruction can
be one of the various types listed below. Each instruction type is a derived
class of ``nir_instr``. Instructions occur in *basic blocks*; each basic block
consists of a list of instructions which is executed from beginning to end.

ALU Instructions
----------------

ALU instructions represent simple operations, such as addition,
multiplication, comparison, etc., that take a certain number of arguments and
return a result that only depends on the arguments. A good rule of thumb is
that only things which can be constant folded should be ALU operations. If it
can't be constant folded, then it should probably be an intrinsic instead.

ALU operations are *typeless*, meaning that they're only defined to convert a
certain bit pattern input to another bit pattern output. ``intBitsToFloat()``
and friends are implicit. Boolean true is defined to be ~0 (NIR_TRUE) and
false is defined to be 0 (NIR_FALSE).
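
As a rough illustration of what "typeless" means in practice, the following
self-contained sketch treats every value as a raw 32-bit pattern. All
``sketch_``/``SKETCH_`` names are hypothetical stand-ins rather than NIR API,
and the sketch assumes IEEE 754 single-precision floats:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Stand-ins for NIR_TRUE and NIR_FALSE: all ones and all zeros. */
    #define SKETCH_NIR_TRUE  (~0u)
    #define SKETCH_NIR_FALSE 0u

    /* Reinterpret the bits of a float as an integer; no value conversion. */
    static uint32_t
    sketch_float_bits(float f)
    {
       uint32_t bits;
       memcpy(&bits, &f, sizeof(bits));
       return bits;
    }

    /* A float comparison yields a boolean bit pattern, not a C bool. */
    static uint32_t
    sketch_flt(uint32_t a_bits, uint32_t b_bits)
    {
       float a, b;
       memcpy(&a, &a_bits, sizeof(a));
       memcpy(&b, &b_bits, sizeof(b));
       return a < b ? SKETCH_NIR_TRUE : SKETCH_NIR_FALSE;
    }

    /* Because true is all ones, a select reduces to bit masking. */
    static uint32_t
    sketch_bcsel(uint32_t cond, uint32_t if_true, uint32_t if_false)
    {
       assert(cond == SKETCH_NIR_TRUE || cond == SKETCH_NIR_FALSE);
       return (cond & if_true) | (~cond & if_false);
    }

Choosing ~0 for true is what makes the masking form of the select possible;
with a 0/1 encoding the condition would first have to be widened.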

Each ALU instruction has an *opcode*, which is a member of an enum
(``nir_op``) that describes what it does as well as how many arguments it
takes. Associated with each opcode is an info structure (``nir_op_info``),
which shows how many arguments the opcode takes as well as information such as
whether the opcode is commutative (``op a b == op b a``) or associative (``(op
(op a b) c) == (op a (op b c))``). The info structure for each opcode may be
accessed through a global array called ``nir_op_infos`` that's indexed by the
opcode.
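
The lookup pattern described above can be sketched with a small stand-in
table. The types, the three opcodes, and the flag values below are
hypothetical and only mirror the shape of ``nir_op``, ``nir_op_info``, and
``nir_op_infos``; they are not the real definitions:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for the nir_op enum. */
    typedef enum {
       sketch_op_fadd,
       sketch_op_fsub,
       sketch_op_fmul,
       sketch_num_opcodes,
    } sketch_op;

    /* Hypothetical flag values for the algebraic properties. */
    enum {
       SKETCH_OP_IS_COMMUTATIVE = 1 << 0,
       SKETCH_OP_IS_ASSOCIATIVE = 1 << 1,
    };

    /* Hypothetical, trimmed-down stand-in for nir_op_info. */
    typedef struct {
       const char *name;
       unsigned num_inputs;
       unsigned algebraic_properties;
    } sketch_op_info;

    /* One entry per opcode, indexed by the opcode itself. */
    static const sketch_op_info sketch_op_infos[sketch_num_opcodes] = {
       [sketch_op_fadd] = { "fadd", 2,
          SKETCH_OP_IS_COMMUTATIVE | SKETCH_OP_IS_ASSOCIATIVE },
       [sketch_op_fsub] = { "fsub", 2, 0 },
       [sketch_op_fmul] = { "fmul", 2,
          SKETCH_OP_IS_COMMUTATIVE | SKETCH_OP_IS_ASSOCIATIVE },
    };

    static bool
    sketch_op_is_commutative(sketch_op op)
    {
       assert(op < sketch_num_opcodes);
       return (sketch_op_infos[op].algebraic_properties &
               SKETCH_OP_IS_COMMUTATIVE) != 0;
    }

A pass that wants to canonicalize operand order could consult such a helper
before swapping sources.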

Even though ALU operations are typeless, each opcode also has an "ALU type"
which can be floating-point, boolean, integer, or unsigned integer. The ALU
type mainly helps backends which use the absolute value, negate, and saturate
modifiers (normally not used by core NIR) -- there's some generic
infrastructure in NIR which will fold iabs and ineg operations into integer
sources, as well as fabs and fneg for floating-point sources, although most
core NIR optimizations will assume that they are kept separate. In addition,
if an operation takes a boolean argument, then the argument may be assumed to
be either NIR_TRUE or NIR_FALSE, and if an operation's result has a boolean
type, then it may produce only NIR_TRUE or NIR_FALSE.

ALU opcodes also have the notion of *size*, or the number of components. ALU
opcodes are either *non-per-component*, in which case the destination as well
as each of the arguments are explicitly sized, or *per-component*.
Per-component opcodes have the destination size as well as at least one of
the argument sizes set to 0. The sources with their size set to 0 are known
as the *per-component sources*. Conceptually, for per-component instructions,
the destination is computed by looping over each component and computing some
function which depends only on the matching component of the per-component
sources as well as possibly all the components of the non-per-component
sources. In pseudocode:

::

    for each component "comp":
        dest.comp = some_func(per_comp_src1.comp, per_comp_src2.comp, ...,
                              non_per_comp_src)
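
In C, the same loop might look like the sketch below. The operation itself is
made up for illustration: it has two per-component sources and one
two-component source that is read in full for every destination component.
None of these names exist in NIR:

.. code-block:: c

    #include <assert.h>

    #define SKETCH_MAX_COMPONENTS 4

    /* Hypothetical some_func: multiplies the matching components of a and b
     * and scales the product by the sum of both components of "scale". */
    static float
    sketch_some_func(float a, float b, const float scale[2])
    {
       return a * b * (scale[0] + scale[1]);
    }

    /* src0 and src1 are per-component sources; scale is explicitly sized. */
    static void
    sketch_eval_per_component(unsigned num_components,
                              const float *src0, const float *src1,
                              const float scale[2], float *dest)
    {
       assert(num_components >= 1 &&
              num_components <= SKETCH_MAX_COMPONENTS);

       for (unsigned comp = 0; comp < num_components; comp++)
          dest[comp] = sketch_some_func(src0[comp], src1[comp], scale);
    }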


Both the info table entry and the enum values are generated from a Python
script called nir_opcodes.py which, when imported, creates an ``opcodes``
list which contains objects of the ``Opcode`` class. Inside nir_opcodes.py,
opcodes are created using the ``opcode`` function, which constructs the
object and adds it to the list, as well as various helper functions which call
``opcode``. For example, the following line in nir_opcodes.py:

.. code-block:: python

    binop("fmul", tfloat, commutative + associative, "src0 * src1")

creates a declaration of a nir_op_fmul member of the ``nir_op`` enum, which is
defined in the generated file nir_opcodes.h, as well as the following entry in
the nir_op_infos array (defined in nir_opcodes.c):

.. code-block:: c

    {
       .name = "fmul",
       .num_inputs = 2,
       .output_size = 0,
       .output_type = nir_type_float,
       .input_sizes = {
          0, 0
       },
       .input_types = {
          nir_type_float, nir_type_float
       },
       .algebraic_properties =
          NIR_OP_IS_COMMUTATIVE | NIR_OP_IS_ASSOCIATIVE
    },

The ``src0 * src1`` part of the definition isn't just documentation; it's
actually used to generate code that can constant fold the operation.
Currently, every ALU operation must have a description of how it should be
constant-folded, which makes documenting the operation (including any corner
cases) much simpler in most cases, as well as obviating the need to deal with
per-component and non-per-component subtleties -- the pseudocode above is
implemented for you, and all you have to do is write the ``some_func``. In
this case, the definition of ``fmul`` also creates the following code in
nir_constant_expressions.c:

.. code-block:: c

    static nir_const_value
    evaluate_fmul(unsigned num_components, nir_const_value *_src)
    {
       nir_const_value _dst_val = { { {0, 0, 0, 0} } };

       for (unsigned _i = 0; _i < num_components; _i++) {
          float src0 = _src[0].f[_i];
          float src1 = _src[1].f[_i];

          float dst = src0 * src1;

          _dst_val.f[_i] = dst;
       }

       return _dst_val;
    }

as well as the following case in ``nir_eval_const_opcode``:

.. code-block:: c

   case nir_op_fmul: {
      return evaluate_fmul(num_components, src);
      break;
   }

For more information on the format of the constant expression strings, see
the documentation for the ``Opcode`` class in nir_opcodes.py.

Intrinsic Instructions
----------------------

Intrinsics are like the stateful sidekicks to ALU instructions; they mainly
include various kinds of loads/stores, as well as execution
barriers. Similar to ALU instructions, there is an enum of opcodes
(``nir_intrinsic_op``) as well as a table containing information for each
opcode (``nir_intrinsic_infos``). Intrinsics may or may not have a
destination, and they may also include 1 or more constant indices (integers).
Also similar to ALU instructions, both destinations and sources include a
size that's part of the opcode, and both may be made per-component by setting
their size to 0, in which case the size is obtained from the
``num_components`` field of the instruction. Finally, intrinsics may include
one or more variable dereferences, although these are usually lowered away
before they reach the driver.

Unlike ALU instructions, which can be freely reordered and deleted as long as
they still produce the same result and satisfy the constraints imposed by SSA
form, intrinsics have a few rules regarding how they may be reordered.
Currently, they're rather conservative, but it's expected that they'll get
more refined in the future.  There are two flags that are part of
``nir_intrinsic_infos``: ``NIR_INTRINSIC_CAN_REORDER`` and
``NIR_INTRINSIC_CAN_DELETE``. If an intrinsic can be reordered, then it can be
reordered with respect to *any* other instruction; to prevent two intrinsics
from being reordered with respect to each other, both must not have "can
reorder." If an intrinsic can be deleted, then its only dependencies are on
whatever uses its result, and if it's unused then it can be deleted. For
example, if two intrinsic opcodes are for reading and writing to a common
resource, then the store opcode should have neither flag set, and the load
instruction should have only the "can delete" flag set. Note that load
instructions can't be reordered with respect to each other, and neither loads
nor stores can be reordered with respect to other loads/stores, even those to
resources which don't alias with the resource you're reading/writing; this is
a deficiency of the model, which is expected to change when more
sophisticated analyses are implemented.
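
The two rules can be written down as a pair of predicates. This is only a
sketch of the semantics described above: the ``SKETCH_`` flags stand in for
``NIR_INTRINSIC_CAN_REORDER`` and ``NIR_INTRINSIC_CAN_DELETE``, their numeric
values are assumptions, and the helper functions are hypothetical:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Stand-ins for the two semantic flags; the values are assumed. */
    enum {
       SKETCH_CAN_REORDER = 1 << 0,
       SKETCH_CAN_DELETE  = 1 << 1,
       SKETCH_ALL_FLAGS   = SKETCH_CAN_REORDER | SKETCH_CAN_DELETE,
    };

    /* Flags for the load/store pair from the example in the text. */
    static const unsigned sketch_store_flags = 0;
    static const unsigned sketch_load_flags = SKETCH_CAN_DELETE;

    /* Two intrinsics may swap places if at least one is reorderable. */
    static bool
    sketch_may_swap(unsigned flags_a, unsigned flags_b)
    {
       assert(((flags_a | flags_b) & ~(unsigned)SKETCH_ALL_FLAGS) == 0);
       return ((flags_a | flags_b) & SKETCH_CAN_REORDER) != 0;
    }

    /* An intrinsic may be removed only if it is deletable and unused. */
    static bool
    sketch_may_delete(unsigned flags, unsigned num_uses)
    {
       assert((flags & ~(unsigned)SKETCH_ALL_FLAGS) == 0);
       return (flags & SKETCH_CAN_DELETE) != 0 && num_uses == 0;
    }

With these definitions, two loads of the common resource cannot be swapped
with each other, which is exactly the conservatism noted above.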

Two especially important intrinsics are ``load_var`` and ``store_var``,
through which all loads and stores to variables occur. Most accesses (besides
accesses to textures and buffers) to variables happen through these
instructions in core NIR, although they can be lowered to loads/stores to
registers, inputs, outputs, etc. with actual indices before they reach the
backend.

Unlike ALU instructions, intrinsics haven't yet been converted to the new
Python way of specifying opcodes. Instead, intrinsic opcodes are defined in a
header file, nir_intrinsics.h, which expands to a series of ``INTRINSIC``
macros.  nir_intrinsics.h is included twice, once in nir.h to create the
``nir_intrinsic_op`` enum, and another time in ``nir_intrinsics.c`` to create the
``nir_intrinsic_infos`` array. For example, here's the definition of the
``store_var`` intrinsic:

.. code-block:: c

    INTRINSIC(store_var, 1, ARR(0), false, 0, 1, 0, 0)

This says that ``store_var`` has one source of size 0 (and thus is
per-component), has no destination, one variable, no indices, and no semantic
flags (it can't be reordered and can't be deleted). It creates the
nir_intrinsic_store_var enum member, as well as the corresponding entry in
``nir_intrinsic_infos``.
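
The "define once, expand twice" technique can be sketched in a single file.
To stay self-contained, the sketch replaces the double ``#include`` with a
list macro that is expanded twice; the field order of the info struct is
inferred from the description above, ``example_load`` is a made-up entry, and
every ``sketch_``/``SKETCH_`` name (as well as this definition of ``ARR``) is
an assumption rather than the real NIR code:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    #define ARR(...) { __VA_ARGS__ }
    #define SKETCH_CAN_DELETE (1 << 1) /* assumed flag value */

    /* Stand-in for the contents of nir_intrinsics.h. */
    #define SKETCH_INTRINSIC_LIST(INTRINSIC)                              \
       INTRINSIC(store_var, 1, ARR(0), false, 0, 1, 0, 0)                 \
       INTRINSIC(example_load, 1, ARR(1), true, 0, 0, 1, SKETCH_CAN_DELETE)

    /* First expansion: only the name is used, to build the enum. */
    #define SKETCH_ENUM(name, ...) sketch_intrinsic_##name,
    typedef enum {
       SKETCH_INTRINSIC_LIST(SKETCH_ENUM)
       sketch_num_intrinsics,
    } sketch_intrinsic_op;

    typedef struct {
       const char *name;
       unsigned num_srcs;
       unsigned src_components[4];
       bool has_dest;
       unsigned dest_components;
       unsigned num_variables;
       unsigned num_indices;
       unsigned flags;
    } sketch_intrinsic_info;

    /* Second expansion: every argument is used, to build the info table. */
    #define SKETCH_INFO(name, nsrcs, srcs, has_dest, dest, nvars, nidx, flags) \
       { #name, nsrcs, srcs, has_dest, dest, nvars, nidx, flags },
    static const sketch_intrinsic_info sketch_intrinsic_infos[] = {
       SKETCH_INTRINSIC_LIST(SKETCH_INFO)
    };

    static const sketch_intrinsic_info *
    sketch_get_info(sketch_intrinsic_op op)
    {
       assert(op < sketch_num_intrinsics);
       return &sketch_intrinsic_infos[op];
    }

Because both the enum and the table come from the same list, they cannot
drift out of sync when an entry is added or removed.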

Call Instructions
-----------------

Call instructions in NIR are pretty simple. They contain a pointer to the
overload that they reference. Arguments are passed through dereferences, which
may be copied from, copied to, or both depending on whether the matching
parameter in the overload is an input, an output, or both. In addition,
there's a return dereference (NULL for functions with void return type) which
gets overwritten with the return value of the function.

Jump Instructions
-----------------

A jump instruction in NIR is a break, a continue, or a return. Returns don't
include a value; functions that return a value instead fill out a
specially-designated variable which is the return variable. For more
information, see :doc:`Control Flow <control_flow>`.

Texture Instructions
--------------------

Even though texture instructions *could* be supported as intrinsics, the
vast number of combinations means that doing so is practically impossible.
Instead, NIR has a dedicated texture instruction. There's still an array of
sources, except that each source also has a *type* associated with it. There
are various source types, each corresponding to a piece of information that
the different texture operations require. There can be at most one source of
each type. In addition, there are several texture operations:


* ``nir_texop_tex``: normal texture lookup.
* ``nir_texop_txb``: texture lookup with LOD bias.
* ``nir_texop_txl``: texture lookup with explicit LOD.
* ``nir_texop_txd``: texture lookup with partial derivatives.
* ``nir_texop_txf``: texel fetch with explicit LOD.
* ``nir_texop_txf_ms``: multisample texture fetch.
* ``nir_texop_txs``: query texture size.
* ``nir_texop_lod``: texture lod query.
* ``nir_texop_tg4``: texture gather.
* ``nir_texop_query_levels``: texture levels query.

It's assumed that frontends will only insert the source types that are needed
given the sampler type and the operation.
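
The "at most one source of each type" rule lends itself to a simple validity
check. The sketch below uses a hypothetical source-type enum and source
struct, since the real ones are not listed in this section:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical source types; the names are illustrative only. */
    typedef enum {
       sketch_tex_src_coord,
       sketch_tex_src_bias,
       sketch_tex_src_lod,
       sketch_num_tex_src_types,
    } sketch_tex_src_type;

    typedef struct {
       sketch_tex_src_type type;
       unsigned value_index; /* stand-in for the actual source value */
    } sketch_tex_src;

    /* Returns true if no source type appears more than once. */
    static bool
    sketch_tex_srcs_valid(const sketch_tex_src *srcs, unsigned num_srcs)
    {
       unsigned seen = 0;

       for (unsigned i = 0; i < num_srcs; i++) {
          assert(srcs[i].type < sketch_num_tex_src_types);
          unsigned bit = 1u << srcs[i].type;
          if (seen & bit)
             return false;
          seen |= bit;
       }

       return true;
    }

A lookup with LOD bias would then carry, for example, one coordinate source
and one bias source, and nothing else.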

Like a lot of other resources, there are two ways to represent a sampler in
NIR: either using a variable dereference, or as an index in a single flat
array. When using an index, there is various information stored in the
texture instruction itself so that backends which need to know the type of
the sampler, whether it's a cube or array sampler, etc. can have that
information even in the lowered form.

Constant-Load Instructions
--------------------------

This instruction creates a constant SSA value. Note that writing to a
register isn't supported; instead, you can use a constant load instruction
plus a move to a register.

Undef Instructions
------------------

Creates an undefined SSA value. At each use of the value, each of the bits
can be assumed to be whatever the implementation or optimization passes deem
convenient. Similar in semantics to a register that's read before it's written.

Phi Instructions
----------------

From Instructions.h in LLVM:

::

    // PHINode - The PHINode class is used to represent the magical mystical PHI
    // node, that can not exist in nature, but can be synthesized in a computer
    // scientist's overactive imagination.

Phi nodes contain a list of sources matched to predecessor blocks, where
there must be one source for each predecessor block. Conceptually, when a
certain predecessor block branches to the block with the phi node, the
source corresponding to the predecessor block is copied to the destination of
the phi node. If there's more than one phi node in a block, then this
process happens in parallel. Phi nodes must be at the beginning of a block,
i.e. each block must consist of any phi instructions followed by any non-phi
instructions.
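
Why the copies have to happen in parallel is easiest to see with two phi
nodes that exchange values on a loop back edge. The following sketch models
SSA values as slots in a flat array; the struct and function are hypothetical
and limited to two predecessors for brevity:

.. code-block:: c

    #include <assert.h>

    #define SKETCH_MAX_PHIS 8

    /* Hypothetical phi: srcs[p] is the value read when control arrives
     * from predecessor p. All fields are indices into a value array. */
    typedef struct {
       unsigned dest;
       unsigned srcs[2];
    } sketch_phi;

    static void
    sketch_eval_phis(const sketch_phi *phis, unsigned num_phis,
                     unsigned pred, int *values)
    {
       int tmp[SKETCH_MAX_PHIS];

       assert(num_phis <= SKETCH_MAX_PHIS && pred < 2);

       /* Read every source first ... */
       for (unsigned i = 0; i < num_phis; i++)
          tmp[i] = values[phis[i].srcs[pred]];

       /* ... and only then write the destinations. */
       for (unsigned i = 0; i < num_phis; i++)
          values[phis[i].dest] = tmp[i];
    }

If the destinations were written one at a time instead, the second phi in a
swap would read the value the first one had just overwritten.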

Parallel Copy Instructions
--------------------------

Copies a list of registers or SSA values to another list of registers or SSA
values in parallel. Only used internally by the from-SSA pass.