author | Yang Rong <rong.r.yang@intel.com> | 2014-06-24 00:28:51 +0800 |
---|---|---|
committer | Zhigang Gong <zhigang.gong@intel.com> | 2014-06-26 09:13:20 +0800 |
commit | fe8bd8197a93bd5a04422a9b9ce7f7a33ca190aa (patch) | |
tree | d83fb2c0536c32f1502c184eaa010fd7674a611a /docs | |
parent | 873a74f1ae9fea80c0df90df4106f20143f77707 (diff) |
Add optimization guide.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/optimization-guide.mdwn | 28 |
1 files changed, 28 insertions, 0 deletions
diff --git a/docs/optimization-guide.mdwn b/docs/optimization-guide.mdwn
new file mode 100644
index 00000000..70ed72e4
--- /dev/null
+++ b/docs/optimization-guide.mdwn
@@ -0,0 +1,28 @@
+Optimization Guide
+====================
+
+All the SIMD optimization principle also apply to Beignet optimization.
+Furthermore, there are some special tips for Beignet optimization.
+
+1. It is recommended to choose multiple of 16 work group size. Too much SLM usage may reduce parallelism at group level.
+   If kernel uses large amount SLM, it's better to choose large work group size. Please refer the following table for recommendations
+   with some SLM usage.
+| Amount of SLM | 0 | 4K | 8K | 16K | 32K |
+| WorkGroup size| 16 | 64 | 128 | 256 | 512 |
+
+2. GEN7's read/write on global memory with DWORD and DWORD4 are significantly faster than read/write on BYTE/WORD.
+   Use DWORD or DWORD4 to access data in global memory if possible. If you cannot avoid the byte/word access, try to do it on SLM.
+
+3. Use float data type as much as possible.
+
+4. Avoid using long. GEN7's performance for long integer is poor.
+
+5. If there is a small constant buffer, define it in the kernel instead of using the constant buffer argument if possible.
+   The compiler may optimize it if the buffer is defined inside kernel.
+
+6. Avoid unnecessary synchronizations, both in the runtime and in the kernel. For examples, clFinish and clWaitForEvents in runtime
+   and barrier() in the kernel.
+
+7. Consider native version of math built-ins, such as native\_sin, native\_cos, if your kernel is not precision sensitive.
+
+8. Try to eliminate branching as much as possible. For example using min, max, clamp or select built-ins instead of if/else if possible.
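
Tip 1 of the added guide pairs SLM usage with a recommended work-group size. As a minimal sketch, host code could encode that table as a lookup. The helper name `recommended_work_group_size` is hypothetical and not part of Beignet. The guide lists only five data points, so two behaviours here are assumptions made for illustration: rounding an in-between SLM amount up to the next table row, and returning 0 for usage above 32K.

```c
#include <stddef.h>

/* One row of the guide's table: SLM bytes used per work group and the
 * work-group size the guide recommends for that amount. */
struct slm_hint {
    size_t slm_bytes;
    size_t work_group_size;
};

/* Values copied from the table in tip 1 (4K = 4 * 1024 bytes). */
static const struct slm_hint slm_hints[] = {
    {0,         16},
    {4 * 1024,  64},
    {8 * 1024,  128},
    {16 * 1024, 256},
    {32 * 1024, 512},
};

/* Hypothetical helper, not a Beignet API.
 * Assumption: an SLM amount that falls between two table rows is rounded
 * up to the next row. Returns 0 when the usage exceeds the largest row,
 * because the guide gives no recommendation beyond 32K. */
size_t recommended_work_group_size(size_t slm_bytes)
{
    size_t n = sizeof(slm_hints) / sizeof(slm_hints[0]);
    for (size_t i = 0; i < n; i++) {
        if (slm_bytes <= slm_hints[i].slm_bytes)
            return slm_hints[i].work_group_size;
    }
    return 0;
}
```

Every size in the table is a multiple of 16, so a value returned by this lookup also satisfies the first half of the tip. The result would typically be passed as the local work size when enqueueing the kernel, subject to the device's reported maximum work-group size.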