TODO
====

The compiler is far from complete. Even if the skeleton is now done and should
be solid, there are a _lot_ of things to do, from trivial to complex.

OpenCL standard library
-----------------------

Today we define the OpenCL API in the header file `src/ocl_stdlib.h`. This file
is far from complete.

By the way, one question remains: do we want to implement the high-precision
functions as _inline_ functions or as external functions to call? Inlining all
functions may lead to severe code bloat, while calling functions requires
implementing a proper ABI. We certainly want to do both, actually.

LLVM front-end
--------------

The code is defined in `src/llvm`.  We used the PTX ABI and the OpenCL profile
to compile the code. Therefore, a good part of the job is already done. However,
many things must be implemented:

- Lowering of various intrinsics such as `llvm.memcpy`

- Better resolution of PHI functions. Today, we always generate MOV
  instructions at the end of each basic block. These can easily be optimized.

- Starting with LLVM 3.3, we use SPIR IR. We need to use the compiler-defined
  types to represent `sampler_t`/`image2d_t`/`image1d_t`/....

- Adding support for long (int64).
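To illustrate the naive PHI lowering described above, here is a minimal C++ sketch over a toy IR. `Block`, `Phi`, `Move` and `lowerPhis` are hypothetical names made up for this example; they are not the types used in `src/llvm` or `src/ir`. The sketch turns every PHI into one MOV at the end of each predecessor. The `skipSelfCopies` flag shows one easy optimization: it drops copies such as `x = MOV x`, which come from loop-invariant PHIs. The MOVs are emitted sequentially, so the sketch ignores the classic swap/lost-copy problem. A real implementation must treat each predecessor's MOVs as a parallel copy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR, for illustration only (not Beignet's IR).
struct Move { int dst; int src; };             // dst = MOV src
struct PhiSource { int predBlock; int reg; };  // value flowing in from predBlock
struct Phi { int dst; std::vector<PhiSource> sources; };

struct Block {
  std::vector<Phi> phis;       // PHIs at the head of the block
  std::vector<Move> tailMovs;  // MOVs placed before the block's terminator
};

// Naive PHI elimination: for each PHI, append "dst = MOV src" to every
// predecessor. If skipSelfCopies is set, copies whose source already is the
// destination register are dropped. MOVs are appended in order, which is
// only safe when no PHI reads another PHI of the same block.
void lowerPhis(std::vector<Block> &blocks, bool skipSelfCopies) {
  for (Block &bb : blocks) {
    for (const Phi &phi : bb.phis) {
      for (const PhiSource &s : phi.sources) {
        assert(s.predBlock >= 0 && s.predBlock < (int)blocks.size());
        if (skipSelfCopies && s.reg == phi.dst) continue;
        blocks[s.predBlock].tailMovs.push_back({phi.dst, s.reg});
      }
    }
    bb.phis.clear();  // the PHIs are now fully expressed as MOVs
  }
}
```

For a loop header with `r10 = phi [r5, entry], [r10, latch]`, the sketch emits a single MOV in the entry block and none in the latch.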

Gen IR
------

The code is defined in `src/ir`. Main things to do are:

- Finishing the handling of function arguments (see the [[IR
  description|gen_ir]] for more details)

- Adding support for linking IR units together. OpenCL indeed allows programs
  to be created from several sources.

- Uniform analysis. This is a major performance improvement. A "uniform" value
  is basically a value that, regardless of the control flow, is identical in
  all the activated lanes. Trivial examples are immediate values and function
  arguments. Also, operations on uniform values produce uniform values, and so
  on.

- Merging of independent uniform loads (and samples). This is a major
  performance improvement once the uniform analysis is done. Basically, several
  uniform loads may be collapsed into one load if no writes happen in between.
  This will obviously impact both instruction selection and register
  allocation.

- Adding support for long (int64).
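The uniform analysis described above can be sketched as a small data-flow pass. The toy IR (`Op`, `Insn`, `uniformAnalysis`) is hypothetical and exists only for this example. The sketch makes two assumptions: that `Op::LocalId` stands in for any per-lane value, and that a load is uniform when its address is. It starts optimistically from "everything is uniform" and demotes instructions until nothing changes, so it also converges when operands come from later instructions, as they do with loops. It ignores divergent control flow, which a real analysis must take into account.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR, for illustration only (not Beignet's IR).
enum class Op { Imm, Arg, LocalId, Add, Mul, Load };

struct Insn {
  Op op;
  std::vector<int> srcs;  // indices of the instructions producing the operands
};

// Returns, for each instruction, whether its result is the same in every
// active lane. Immediates and kernel arguments are uniform, per-lane IDs are
// not, and any other operation is uniform iff all of its operands are.
std::vector<bool> uniformAnalysis(const std::vector<Insn> &insns) {
  std::vector<bool> uniform(insns.size(), true);  // optimistic start
  bool changed = true;
  while (changed) {  // iterate to a fixed point
    changed = false;
    for (std::size_t i = 0; i < insns.size(); ++i) {
      if (!uniform[i]) continue;  // values only ever go uniform -> varying
      bool u;
      if (insns[i].op == Op::Imm || insns[i].op == Op::Arg) {
        u = true;
      } else if (insns[i].op == Op::LocalId) {
        u = false;
      } else {
        u = true;
        for (int s : insns[i].srcs) {
          assert(s >= 0 && static_cast<std::size_t>(s) < insns.size());
          u = u && uniform[s];
        }
      }
      if (!u) {
        uniform[i] = false;
        changed = true;
      }
    }
  }
  return uniform;
}
```

In this model, a load whose address is computed only from arguments and immediates stays uniform. Such loads are the candidates for the load-merging item above.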

Backend
-------

The code is defined in `src/backend`. Main things to do are:

- Int64 support?

- Implementing register spilling (see the [[compiler backend
  description|compiler_backend]] for more details)

- Implementing proper instruction selection. A "simple" tree matching algorithm
  should provide good results for Gen

- Improving the instruction scheduling pass
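As a rough idea of what "simple" tree matching could look like, here is a C++ sketch of greedy maximal-munch tiling over a toy expression tree. `Node`, `mkReg`, `mkBin` and `selectTree` are hypothetical names, and the printed assembly is a made-up textual form. At each node the largest pattern is tried first. Here that pattern is `add(mul(a, b), c)` in either operand order, which becomes a single `MAD`, assumed to stand in for Gen's three-source multiply-add; the operand order shown is purely illustrative. Otherwise the sketch falls back to one instruction per node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree, for illustration only (not Beignet's IR).
struct Node {
  std::string op;  // "reg", "add" or "mul"
  int reg = -1;    // register number when op == "reg"
  std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> mkReg(int r) {
  auto n = std::make_unique<Node>();
  n->op = "reg";
  n->reg = r;
  return n;
}

std::unique_ptr<Node> mkBin(const std::string &op, std::unique_ptr<Node> a,
                            std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->op = op;
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// Maximal munch: try the biggest tile (MAD) first, else one tile per node.
// Appends instructions to out and returns the register holding the result.
int selectTree(const Node &n, std::vector<std::string> &out, int &nextReg) {
  if (n.op == "reg") return n.reg;
  assert(n.lhs && n.rhs);
  auto name = [](int r) { return "r" + std::to_string(r); };
  if (n.op == "add") {
    const Node *mul = nullptr, *addend = nullptr;
    if (n.lhs->op == "mul") { mul = n.lhs.get(); addend = n.rhs.get(); }
    else if (n.rhs->op == "mul") { mul = n.rhs.get(); addend = n.lhs.get(); }
    if (mul) {
      int a = selectTree(*mul->lhs, out, nextReg);
      int b = selectTree(*mul->rhs, out, nextReg);
      int c = selectTree(*addend, out, nextReg);
      int d = nextReg++;
      out.push_back("MAD " + name(d) + " = " + name(a) + " * " + name(b) +
                    " + " + name(c));
      return d;
    }
  }
  int a = selectTree(*n.lhs, out, nextReg);
  int b = selectTree(*n.rhs, out, nextReg);
  int d = nextReg++;
  out.push_back((n.op == "add" ? "ADD " : "MUL ") + name(d) + " = " +
                name(a) + ", " + name(b));
  return d;
}
```

Greedy tiling like this is not optimal in general. Dynamic-programming matchers (BURS-style) choose the cheapest cover instead, at the cost of more machinery.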

General plumbing
----------------

I tried to keep the code clean, or at least as clean as C++ allows. Some
header clean-up is still required, though, in particular in the backend code.

The context used in the IR code generation (see `src/ir/context.*pp`) should be
split up and cleaned up too.

I also simply copied and pasted the Gen ISA disassembler from Mesa. This leads
to code duplication. Also, some messages used by OpenCL (untyped reads and
writes) are not properly decoded yet.

All the code that should be improved or cleaned up is tracked with "XXX"
comments in the code.

Parts of the code leak memory when exceptions are thrown. There are some
pointers to track down and replace with `std::unique_ptr`. Note that we also
have a custom memory debugger that nicely complements Valgrind (it is fast).
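The following small C++ example shows why raw owning pointers leak when an exception is thrown and why `std::unique_ptr` fixes this. `Tracked`, `mayThrow`, `leaky` and `safe` are made-up names used only for this illustration; they do not come from the Beignet sources. In the raw-pointer version, the `delete` is skipped when the exception propagates. In the `std::unique_ptr` version, the destructor runs during stack unwinding.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts live objects so that a leak is observable in this example.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() {
    assert(live > 0);
    --live;
  }
};
int Tracked::live = 0;

// Stand-in for any compiler step that may throw.
void mayThrow(bool fail) {
  if (fail) throw std::runtime_error("compilation failed");
}

// Raw owning pointer: if mayThrow() throws, the delete is never reached.
void leaky(bool fail) {
  Tracked *t = new Tracked;
  mayThrow(fail);
  delete t;
}

// std::unique_ptr: the object is destroyed during stack unwinding.
void safe(bool fail) {
  auto t = std::make_unique<Tracked>();
  mayThrow(fail);
}
```

If `safe(true)` is called inside a `try`/`catch`, `Tracked::live` returns to zero. The same call pattern with `leaky(true)` leaves one object alive.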