summaryrefslogtreecommitdiff
path: root/docs/Beignet/Backend/TODO.mdwn
diff options
context:
space:
mode:
Diffstat (limited to 'docs/Beignet/Backend/TODO.mdwn')
-rw-r--r--docs/Beignet/Backend/TODO.mdwn109
1 files changed, 109 insertions, 0 deletions
diff --git a/docs/Beignet/Backend/TODO.mdwn b/docs/Beignet/Backend/TODO.mdwn
new file mode 100644
index 00000000..3f1ccb4c
--- /dev/null
+++ b/docs/Beignet/Backend/TODO.mdwn
@@ -0,0 +1,109 @@
+TODO
+====
+
+The compiler is far from complete. Even if the skeleton is now done and should
+be solid, There are a _lot_ of things to do from trivial to complex.
+
+OpenCL standard library
+-----------------------
+
+Today we define the OpenCL API in header file `src/ocl_stdlib.h`. This file is
+from being complete.
+
+By the way, one question remains: do we want to implement
+the high-precision functions as _inline_ functions or as external functions to
+call? Indeed, inlining all functions may lead to severe code bloats while
+calling functions will require to implement a proper ABI. We certainly want to
+do both actually.
+
+LLVM front-end
+--------------
+
+The code is defined in `src/llvm`. We used the PTX ABI and the OpenCL profile
+to compile the code. Therefore, a good part of the job is already done. However,
+many things must be implemented:
+
+- Lowering down of various intrinsics like `llvm.memcpy`
+
+- Conformance test for all OpenCL built-ins (`native_cos`, `native_sin`,
+ `mad`, atomic operations, barriers...).
+
+- Lowering down of int16 / int8 / float16 / char16 / char8 / char4 loads and
+ stores into the supported loads and stores
+
+- Support for local declaration of local array (the OpenCL profile will properly
+ declare them as global arrays)
+
+- Support for doubles
+
+- Support atomic extensions.
+
+- Better resolving of the PHI functions. Today, we always generate MOV
+ instructions at the end of each basic block . They can be easily optimized.
+
+Gen IR
+------
+
+The code is defined in `src/ir`. Main things to do are:
+
+- Bringing support for doubles
+
+- Adding support for atomic extensions.
+
+- Finishing the handling of function arguments (see the [[IR
+ description|gen_ir]] for more details)
+
+- Adding support for linking IR units together. OpenCL indeed allows to create
+ programs from several sources
+
+- Uniform analysys. This is a major performance improvement. A "uniform" value
+ is basically a value where regardless the control flow, all the activated
+ lanes will be identical. Trivial examples are immediate values, function
+ arguments. Also, operations on uniform will produce uniform values and so
+ on...
+
+- Merging of independent uniform loads (and samples). This is a major
+ performance improvement once the uniform analysis is done. Basically, several
+ uniform loads may be collapsed into one load if no writes happens in-between.
+ This will obviously impact both instruction selection and the register
+ allocation.
+
+Backend
+-------
+
+The code is defined in `src/backend`. Main things to do are:
+
+- Implementing support for doubles
+
+- Implementing atomic extensions.
+
+- Implementing register spilling (see the [[compiler backend
+ description|compiler_backend]] for more details)
+
+- Implementing proper instruction selection. A "simple" tree matching algorithm
+ should provide good results for Gen
+
+- Improving the instruction scheduling pass
+
+General plumbing
+----------------
+
+I tried to keep the code clean, well, as far as C++ can be really clean. There
+are some header cleaning steps required though, in particular in the backend
+code.
+
+The context used in the IR code generation (see `src/ir/context.*pp`) should be
+split up and cleaned up too.
+
+I also purely and simply copied and pasted the Gen ISA disassembler from Mesa.
+This leads to code duplication. Also some messages used by OpenCL (untyped reads
+and writes) are not properly decoded yet.
+
+There are some quick and dirty hacks also like the use of function call `system`
+(...). This should be cleanly replaced by popen and stuff. I also directly
+called the LLVM compiler executable instead of using Clang library. All of this
+should be improved and cleaned up. Track "XXX" comments in the code.
+
+Parts of the code leaks memory when exceptions are used. There are some pointers
+to track and replace with std::unique_ptr. Note that we also add a custom memory
+debugger that nicely complements (i.e. it is fast) Valgrind.