1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
|
Beignet Compiler
================
This code base contains the compiler part of the Beignet OpenCL stack. The
compiler is responsible to take a OpenCL language string and to compile it into
a binary that can be executed on Intel integrated GPUs.
Status
------
After two years development, beignet is mature now. It now supports all the
OpenCL 1.2 mandatory features. Beignet get almost 100% pass rate with both
OpenCV 3.0 test suite and the piglit opencl test suite. There are some
performance tuning related items remained, see [[here|Backend/TODO]] for a
(incomplete) lists of things to do.
Interface with the run-time
---------------------------
Even if the compiler makes a very liberal use of C++ (templates, variadic
templates, macros), we really tried hard to make a very simple interface with
the run-time. The interface is therefore a pure C99 interface and it is defined
in `src/backend/program.h`.
The goal is to hide the complexity of the inner data structures and to enable
simple run-time implementation using straightforward C99.
Note that the data structures are fully opaque: this allows us to use both the
C++ simulator or the real Gen program in a relatively non-intrusive way.
Various environment variables
-----------------------------
Environment variables are used all over the code. Most important ones are:
- `OCL_STRICT_CONFORMANCE` `(0 or 1)`. Gen does not provide native high
precision math instructions compliant with OpenCL Spec. So we provide a
software version to meet the high precision requirement. Obviously the
software version's performance is not as good as native version supported by
GEN hardware. What's more, most graphics application don't need this high
precision, so we choose 0 as the default value. So OpenCL apps do not suffer
the performance penalty for using high precision math functions.
- `OCL_SIMD_WIDTH` `(8 or 16)`. Select the number of lanes per hardware thread,
Normally, you don't need to set it, we will select suitable simd width for
a given kernel. Default value is 16.
- `OCL_OUTPUT_KENERL_SOURCE` `(0 or 1)`. Output the building or compiling kernel's
source code.
- `OCL_OUTPUT_GEN_IR` `(0 or 1)`. Output Gen IR (scalar intermediate
representation) code
- `OCL_OUTPUT_LLVM_BEFORE_LINK` `(0 or 1)`. Output LLVM code before llvm link
- `OCL_OUTPUT_LLVM_AFTER_LINK` `(0 or 1)`. Output LLVM code after llvm link
- `OCL_OUTPUT_LLVM_AFTER_GEN` `(0 or 1)`. Output LLVM code after the lowering
passes, Gen IR is generated based on it.
- `OCL_OUTPUT_ASM` `(0 or 1)`. Output Gen ISA
- `OCL_OUTPUT_REG_ALLOC` `(0 or 1)`. Output Gen register allocations, including
virtual register to physical register mapping, live ranges.
- `OCL_OUTPUT_BUILD_LOG` `(0 or 1)`. Output error messages if there is any
during CL kernel compiling and linking.
- `OCL_OUTPUT_CFG` `(0 or 1)`. Output control flow graph in .dot file.
- `OCL_OUTPUT_CFG_ONLY` `(0 or 1)`. Output control flow graph in .dot file,
but without instructions in each BasicBlock.
- `OCL_PRE_ALLOC_INSN_SCHEDULE` `(0 or 1)`. The instruction scheduler in
beignet are currently splitted into two passes: before and after register
allocation. The pre-alloc scheduler tend to decrease register pressure.
This variable is used to disable/enable pre-alloc scheduler. This pass is
disabled now for some bugs.
- `OCL_POST_ALLOC_INSN_SCHEDULE` `(0 or 1)`. Disable/enable post-alloc
instruction scheduler. The post-alloc scheduler tend to reduce instruction
latency. By default, this is enabled now.
- `OCL_SIMD16_SPILL_THRESHOLD` `(0 to 256)`. Tune how much registers can be
spilled under SIMD16. Default value is 16. We find spill too much register
under SIMD16 is not as good as fall back to SIMD8 mode. So we set the
variable to control spilled register number under SIMD16.
- `OCL_USE_PCH` `(0 or 1)`. The default value is 1. If it is enabled, we use
a pre compiled header file which include all basic ocl headers. This would
reduce the compile time.
Implementation details
----------------------
Several key decisions may use the hardware in an usual way. See the following
documents for the technical details about the compiler implementation:
- [[Mixed buffer pointer)|mixed_buffer_pointer]]
- [[Unstructured branches|unstructured_branches]]
- [[Scalar intermediate representation|gen_ir]]
- [[Clean backend implementation|compiler_backend]]
Ben Segovia.
|