author | Zhigang Gong <zhigang.gong@intel.com> | 2014-11-12 13:01:02 +0800 |
---|---|---|
committer | Zhigang Gong <zhigang.gong@intel.com> | 2014-11-12 13:01:02 +0800 |
commit | bd0e19aadc481fddc02edb8b48e2000ca2f5ae96 (patch) | |
tree | 29e9d9dbdcc9670c05734c13ae2e92458373a424 /docs | |
parent | 47ba7dd6736b83e93327b4403aa06fb0dcf1c3cd (diff) |
update some documents.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/Beignet.mdwn | 32 |
-rw-r--r-- | docs/Beignet/Backend/TODO.mdwn | 29 |
-rw-r--r-- | docs/Beignet/Backend/compiler_backend.mdwn | 8 |
3 files changed, 29 insertions, 40 deletions
diff --git a/docs/Beignet.mdwn b/docs/Beignet.mdwn
index 83f1eb82..2a977eca 100644
--- a/docs/Beignet.mdwn
+++ b/docs/Beignet.mdwn
@@ -126,8 +126,8 @@ Supported Targets
 
  * 3rd Generation Intel Core Processors
  * Intel “Bay Trail” platforms with Intel HD Graphics
- * 4th Generation Intel Core Processors, need kernel patch currently, see below
- for details:
+ * 4th Generation Intel Core Processors, need kernel patch currently, see the "Known Issues" section.
+ * 5th Generation Intel Core Processors "Broadwell".
 
 Known Issues
 ------------
@@ -154,8 +154,8 @@ Known Issues
  `# echo 0 > /sys/module/i915/parameters/enable_cmd_parser`
 
 * Some unit test cases, maybe 20 to 30, fail on 4th Generation (HSW) platform.
- The 4th Generation Intel Core Processors's support requires some Linux kernel
- modification. You need to apply the patch at:
+ _The 4th Generation Intel Core Processors's support requires some Linux kernel
+ modification_. You need to apply the patch at:
  [https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support](https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support)
 
 * Precision issue.
@@ -179,12 +179,12 @@ is also good which is about 99%. There are still some remains work items listed
 most of them are extension support and performance related.
 
 - Performance tuning. There are some major optimizations need to be done,
- Peephole optimization, convert to structured BBs and leverage Gen's structured
- instructions, and optimize the extreme slow software based sin/cos/... math
- functions due to the native math instruction lack of necessary precision.
- And all the code is inlined which will increase the icache miss rate
- significantly. And many other things which are specified partially in
- [[here|Beignet/Backend/TODO]].
+ Peephole optimization, futher tuning the structurized BB transformation to
+ support more pattern such as self loop/while loop. And optimize the slow
+ software based sin/cos/... math functions due to the native math instruction
+ lack of necessary precision. And all the code is inlined which will increase
+ the icache miss rate significantly. And many other things which are specified
+ partially in [[here|Beignet/Backend/TODO]].
 
 - Complete cl\_khr\_gl\_sharing support. We lack of some APIs implementation such
  as clCreateFromGLBuffer,clCreateFromGLRenderbuffer,clGetGLObjectInfo... Currently,
@@ -198,9 +198,6 @@ most of them are extension support and performance related.
  (i.e. for each NDRangeKernels). This is really inefficient since some expensive
  pipe controls are issued for each batch buffer.
 
-- Valgrind reports some leaks in libdrm. It sounds like a false positive but it
- has to be checked. Idem for LLVM. There is one leak here to check.
-
 More generally, everything in the run-time that triggers the "FATAL" macro means
 that something that must be supported is not implemented properly (either it
 does not comply with the standard or it is just missing)
@@ -208,7 +205,7 @@ does not comply with the standard or it is just missing)
 Project repository
 ------------------
 Right now, we host our project on fdo at:
-[http://cgit.freedesktop.org/beignet/](http://cgit.freedesktop.org/beignet/).
+[http://cgit.freedesktop.org/beignet/](http://cgit.freedesktop.org/beignet/).
 And the intel 01.org:
 [https://01.org/beignet](https://01.org/beignet)
 
@@ -223,7 +220,12 @@ How to contribute
 You are always welcome to contribute to this project, just need to subscribe
 to the beignet mail list and send patches to it for review.
 The official mail list is as below:
-[http://lists.freedesktop.org/mailman/listinfo/beignet](http://lists.freedesktop.org/mailman/listinfo/beignet)
+[http://lists.freedesktop.org/mailman/listinfo/beignet](http://lists.freedesktop.org/mailman/listinfo/beignet)
+The official bugzilla is at:
+[https://bugs.freedesktop.org/enter_bug.cgi?product=Beignet](https://bugs.freedesktop.org/enter_bug.cgi?product=Beignet)
+You are welcome to submit beignet bug. Please be noted, please specify the exact platform
+information, such as BYT/IVB/HSW/BDW, and GT1/GT2/GT3. You can easily get this information
+by running the beignet's unit test.
 
 Documents for OpenCL application developers
 -------------------------------------------
diff --git a/docs/Beignet/Backend/TODO.mdwn b/docs/Beignet/Backend/TODO.mdwn
index 501c5082..4dc8593a 100644
--- a/docs/Beignet/Backend/TODO.mdwn
+++ b/docs/Beignet/Backend/TODO.mdwn
@@ -24,9 +24,6 @@ The code is defined in `src/llvm`. We used the SPIR and the OpenCL profile
 to compile the code. Therefore, a good part of the job is already done. However,
 many things must be implemented:
 
-- Better resolving of the PHI functions. Today, we always generate MOV
- instructions at the end of each basic block . They can be easily optimized.
-
 - From LLVM 3.3, we use SPIR IR. We need to use the compiler defined type to
  represent sampler\_t/image2d\_t/image1d\_t/....
 
@@ -34,25 +31,14 @@ many things must be implemented:
  compatible for different clang versions. And may contribute what we have done
  in the ocl\_stdlib.h to libclc if possible.
 
-- Optimize math functions. If the native math instructions don't compy with the
- OCL spec, we use pure software style to implement those math instructions which
- is extremely slow, for example. The cos and sin for HD4000 platform are very slow.
- For some applications which may not need such a high accurate results. We may
- provide a mechanism to use native\_xxx functions instead of the extremely slow
- version.
+- Optimize math functions.
 
 Gen IR
 ------
 
 The code is defined in `src/ir`. Main things to do are:
 
-- Convert unstructured BBs to structured format, and leverage Gen's structured
- instruction such as if/else/endif to encoding those BBs. Then we can save many
- instructions which are used to maintain software pcips and predications.
-
-- Implement those llvm.memset/llvm.memcpy more efficiently. Currently, we lower
- them as normal memcpy at llvm module level and not considering the intrinsics
- all have a constant data length.
+- Support structurized while loop and self loop BBs.
 
 - Finishing the handling of function arguments (see the [[IR description|gen_ir]]
  for more details)
@@ -66,7 +52,8 @@ The code is defined in `src/ir`. Main things to do are:
 - Implement fast path for small local variables. When the kernel only defines
  a small local array/variable, there will be a good chance to allocate the local
  array/variable in register space rather than system memory. This will reduce a
- lot of memory load/stroe from the system memory.
+ lot of memory load/stroe from the system memory. After custom loop unrolling,
+ this optimization is not very important for most cases now.
 
 Backend
 -------
@@ -84,10 +71,10 @@ The code is defined in `src/backend`. Main things to do are:
 - Reduce the macro instructions in gen\_context. The macro instructions added in
  gen\_context will not get a chance to do post register allocation scheduling.
 
-- leverage the structured if/endif for branching processing.
-
 - Peephole optimization. There are many chances to do further peephole optimization.
 
+- Implement a better framework to do backend instructions optimizations.
+
 General plumbing
 ----------------
 
@@ -104,7 +91,3 @@ and writes) are not properly decoded yet.
 
 All of those code should be improved and cleaned up are tracked with "XXX"
 comments in the code.
-
-Parts of the code leaks memory when exceptions are used. There are some pointers
-to track and replace with std::unique\_ptr. Note that we also add a custom memory
-debugger that nicely complements (i.e. it is fast) Valgrind.
diff --git a/docs/Beignet/Backend/compiler_backend.mdwn b/docs/Beignet/Backend/compiler_backend.mdwn
index 3c489b2f..30b2aba8 100644
--- a/docs/Beignet/Backend/compiler_backend.mdwn
+++ b/docs/Beignet/Backend/compiler_backend.mdwn
@@ -100,8 +100,12 @@ do smarter scratch memory allocation to reduce scratch memory requirement.
 
 Instruction scheduling
 ----------------------
 
-Intra-basic block instruction scheduling is relatively simple. It is implemented
-but has known bug, we need further effort to fix it.
+Pre register allocation instruction scheduling is not implemented. Although it
+may reduce register pressure but it may also increase register dependencies. We
+need to think about a trade-off mechanism to do this optimization.
+Post register allocation scheduling has been implemented, and could get about
+8% performance improvement. But those cycles data are based on experiments not
+accurate. We may need to tweak it when we get more information.
 
 Instruction encoding
 --------------------