path: root/backend
Age | Commit message | Author | Files | Lines
2015-02-11 | Correct the bit fields error for indirect address of Gen8 | Junyan He | 1 | -2/+2
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-06 | Backend: Fix one bug of printf because of ir reorder. | Junyan He | 3 | -13/+33
LLVM may generate IR in which the if.else block comes before the if.then block. We parse the printf statements before llvm_to_gen, and the later if-else analysis reorders the if-else blocks. As a result, when we print the output, we get the message from a different printf statement. Add the printf index to the index buffer to record which statement each result belongs to, which fixes this bug. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
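The idea of tagging each result with its printf index can be sketched roughly as below. This is only an illustration under assumptions: `PrintfRecord`, `decodeRecords`, and the single integer argument are hypothetical names and simplifications, not the actual beignet data structures.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record layout: one entry per executed printf call.
struct PrintfRecord {
  uint32_t statementIndex; // which printf statement in the kernel produced it
  int32_t value;           // a single integer argument, for illustration only
};

// Host side: pick the format string by the recorded index rather than by the
// order in which the statements were parsed, so block reordering is harmless.
inline std::vector<std::string> decodeRecords(
    const std::vector<std::string> &formats,
    const std::vector<PrintfRecord> &records) {
  std::vector<std::string> out;
  for (const PrintfRecord &r : records) {
    if (r.statementIndex >= formats.size()) continue; // skip invalid entries
    out.push_back(formats[r.statementIndex] + std::to_string(r.value));
  }
  return out;
}
```

With the index stored next to the data, a record written by the second statement is still decoded with the second format string even if it appears first in the buffer.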
2015-01-28 | check the predication in case of endless loop. | Luo | 1 | -0/+5
v2: Add a comment from Ruiling: for a dead loop, which has an unconditional branch at its end, simply not treating it as a loop is also acceptable. I ran into this problem when executing ./opencv_test_imgproc --gtest_filter=OCL_Imgproc/HoughLines.RealImage/0, and this patch fixes it. Signed-off-by: Luo <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-01-26 | GBE: add GEN_TYPE_HF to getTypeSize. | Zhigang Gong | 1 | -0/+1
Gen8 uses GEN_TYPE_HF, so getTypeSize needs to support it. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: Zhu Bingbing <bingbingx.zhu@intel.com>
2015-01-23 | Add the check for src and dst span different registers. | Junyan He | 1 | -2/+41
On IVB and HSW, when dst spans two registers, the source MUST span two registers. So the following instructions:
    mov (16) r104.0<2>:uw r126.0<8;8,1>:uw { Align1, H1 }
    mov (16) r104.1<2>:uw r111.0<8;8,1>:uw { Align1, H1 }
    mov (16) r106.0<2>:uw r110.0<8;8,1>:uw { Align1, H1 }
    mov (16) r106.1<2>:uw r109.0<8;8,1>:uw { Align1, H1 }
are illegal. Add a check that splits such an instruction into 2 SIMD8 instructions. TODO: These instructions are allowed on BDW, so this needs to be improved.
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
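The span rule above can be illustrated with a small sketch. It assumes a 32-byte GRF register and models each operand as a simple strided region, ignoring the vertical stride and width of the full region description. `Region`, `registersSpanned`, and `needsSimd8Split` are hypothetical names, not the functions added by the patch.

```cpp
#include <cstdint>

constexpr uint32_t kGrfBytes = 32; // one GRF register is 256 bits wide

// Simplified operand description (assumption: a plain strided region).
struct Region {
  uint32_t subOffsetBytes; // byte offset inside the first register
  uint32_t strideElems;    // horizontal stride, in elements
  uint32_t typeBytes;      // element size, e.g. 2 for :uw
};

// Number of registers touched by execSize elements of the region.
inline uint32_t registersSpanned(const Region &r, uint32_t execSize) {
  if (execSize == 0) return 0;
  const uint32_t lastByte = r.subOffsetBytes +
                            (execSize - 1) * r.strideElems * r.typeBytes +
                            r.typeBytes - 1;
  return lastByte / kGrfBytes + 1;
}

// True when dst spans two registers but src does not, i.e. the case the
// commit message describes as illegal on IVB and HSW.
inline bool needsSimd8Split(const Region &dst, const Region &src,
                            uint32_t execSize) {
  return registersSpanned(dst, execSize) == 2 &&
         registersSpanned(src, execSize) != 2;
}
```

For `mov (16) r104.0<2>:uw r126.0<8;8,1>:uw`, the destination covers bytes 0 to 61 (two registers) while the source covers bytes 0 to 31 (one register), so the sketch reports that a split is needed; each SIMD8 half stays within one register on both sides.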
2015-01-22 | GBE: fix popcount bugs. | Zhigang Gong | 4 | -10/+20
We need to pass the correct popcount source type to the backend. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
2015-01-21 | GBE: fix an ACC register related instruction scheduling bug | Zhigang Gong | 3 | -2/+18
Some instructions modify the ACC register in the gen_context stage, which is not recognized by the current instruction scheduling algorithm. This patch fixes the bug by checking all the possible SEL_OPs which may change the ACC implicitly. The corresponding bugzilla link is: https://bugs.freedesktop.org/show_bug.cgi?id=88587 Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-16 | fix the wrong implementation of popcount. | Luo Xionghu | 2 | -7/+4
add disassembly for cbit. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-13 | Fix the printf buffer size bug. | Junyan He | 5 | -12/+17
We cannot know the exact size of the printf buffer before running the kernel. Sometimes, especially when the global work size is huge, the output buffer is not large enough and the message printing logic causes a segmentation fault. We increase the printf buffer to at most 16M and add an out-of-range check to avoid the crash. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
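A capped buffer with an out-of-range check might look roughly like the sketch below. The 16M cap comes from the commit message; the function names and the byte-offset interface are assumptions made for illustration.

```cpp
#include <cstddef>

// Upper bound on the printf buffer, taken from the commit message (16M).
constexpr std::size_t kMaxPrintfBufferBytes = 16u * 1024u * 1024u;

// Clamp the requested buffer size to the cap.
inline std::size_t clampPrintfBufferSize(std::size_t requested) {
  return requested > kMaxPrintfBufferBytes ? kMaxPrintfBufferBytes : requested;
}

// True when a record of recordBytes starting at offset lies fully inside the
// buffer. Written to avoid overflow in offset + recordBytes.
inline bool recordInRange(std::size_t offset, std::size_t recordBytes,
                          std::size_t bufferBytes) {
  return offset <= bufferBytes && recordBytes <= bufferBytes - offset;
}
```

The reader of the buffer would call `recordInRange` before touching each record and stop printing once it returns false, instead of reading past the end.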
2015-01-12 | GBE: Fix a disassembly bug. | Ruiling Song | 1 | -2/+2
It looks like a typo, which wrongly interprets the bti/msg_type field. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-12 | GBE: disable spill register under simd16 mode. | Zhigang Gong | 1 | -3/+2
Register spilling always costs much more than falling back to simd8, which can avoid register spilling or at least reduce the number of spilled registers. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-12 | add the reduced self loop node detection. | Luo Xionghu | 1 | -11/+26
If a self loop node has been reduced, the LLVM loop info cannot detect that kind of self loop. Handle it by checking whether the compacted node has a successor that points to itself. v2: differentiate the compacted node from the basic node to make the logic clearer; comment out the while node, as it is not enabled now. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
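The successor check can be sketched as follows. `Node` and `isReducedSelfLoop` are hypothetical stand-ins for the structural-analysis types in the backend, shown only to make the condition concrete.

```cpp
#include <set>

// Hypothetical control-flow node: either a basic node or a compacted node
// produced by merging several nodes during structural analysis.
struct Node {
  bool compacted;
  std::set<const Node *> successors;
};

// A reduced self loop is a compacted node that lists itself as a successor.
inline bool isReducedSelfLoop(const Node &n) {
  return n.compacted && n.successors.count(&n) != 0;
}
```

Requiring the `compacted` flag mirrors the v2 note about keeping compacted nodes distinct from basic nodes.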
2015-01-12 | reuse the loop info from llvm. | Luo Xionghu | 2 | -36/+21
The original loop detection algorithm caused a 10x regression in luxmark build performance. This patch reuses the loop info from LLVM to handle SelfLoopNode. The trimmed path cannot recognize nested while structures (if nodes inside a while caused the performance regression). Also, the simple while loop node is still not handled. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-12 | add CMake option USE_STANDALONE_GBE_COMPILER and STANDALONE_GBE_COMPILER_DIR | Guo Yejun | 2 | -8/+32
On some platforms with an old C/C++ environment, C++11 features are not supported, so the gbe compiler part, which depends on LLVM/clang and uses C++11 features, fails to build. The solution is to build a standalone gbe compiler on another suitable system, and then build beignet with the already-built standalone gbe compiler by setting USE_STANDALONE_GBE_COMPILER=true. The path of the standalone compiler defaults to /usr/local/lib/beignet, or can be specified with STANDALONE_GBE_COMPILER_DIR. Once USE_STANDALONE_GBE_COMPILER is given, none of the gbe compiler related code is built any longer; only libcl.so and libgbeinterp.so are built. libcl.so is specific to GEN_PCI_ID, which is queried from the build machine or can be specified as a CMake option. v2: separate the CMake option name; update the commit comments; add back the script for the gen pci id, and build the driver with it. v3: add the file FindStandaloneGbeCompiler.cmake to keep the main CMake file clean. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-12 | add option BUILD_STANDALONE_GBE_COMPILER to build static compiler | Guo Yejun | 1 | -10/+29
The standalone compiler (gbe_bin_generater) depends on LLVM/clang and can only be built with C++11 features. To make it usable in an old C/C++ environment, add a CMake option to link against all libraries statically, and pack the compiler and the necessary files into a tarball. v2: change the option name to BUILD_STANDALONE_GBE_COMPILER; pack the necessary files into a tarball. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | Fix loop condition of PrintfSet constructor. | Yan Wang | 1 | -1/+1
Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | remove useless dependency libocl | Guo Yejun | 1 | -2/+0
libocl is the name of a subdirectory and of the project in that subdirectory; it is not something that other targets can depend on. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | refine gbe_bin_generater usage to add -t option | Guo Yejun | 1 | -1/+1
The -t option specifies the gen target pci id, which tells gbe_bin_generater the target platform to compile for. If this option is not given, the compile result is an LLVM-level binary. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: Reimplement trigonometric functions. | Ruiling Song | 1 | -378/+172
The previous version was ported from msun, which is derived from fdlibm. It is good for CPUs, with lots of if-condition checks that try to optimize for different input data, but it is really bad for GPUs. So I reimplemented these functions based on the well-known Payne & Hanek algorithm. Compared with the previous version, this reduces the static ASM instruction count of sin/cos from about 1700 to 400. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: remove useless code. | Ruiling Song | 1 | -57/+0
This kind of logic is already handled by atan2(). Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | do not use C++11 features inside libgbeinterp | Guo Yejun | 12 | -87/+111
Some embedded systems have not upgraded their C/C++ environment, which leads to the request to remove the C++11 features. This is possible for CL_EMBEDDED_PROFILE with some more changes (to be done later). This change replaces uses of the keywords auto and nullptr. By the way, the new C++ features are a must for libgbe (the OpenCL compiler), which depends on LLVM/clang. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | change Immediate::operator= from private to public | Guo Yejun | 1 | -1/+2
Change the access of "Immediate & operator= (const Immediate &)" in class Immediate from private to public. Otherwise, a compile error appears when building with old gcc versions for the following code in function.hpp:
    INLINE ImmediateIndex newImmediate(const Immediate &imm) {
      const ImmediateIndex index(this->immediateNum());
      this->immediates.push_back(imm);
      return index;
    }
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | do not include llvm/clang headers for libgbeinterp | Guo Yejun | 2 | -1/+12
libgbeinterp does not depend on llvm/clang, so remove these header files to clean up the code. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | Fix PrintfState copying. | Yan Wang | 1 | -4/+29
PrintfState includes a std::string object and shouldn't be copied with malloc/memcpy. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: He Junyan <junyan.he@inbox.com>
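The reason is that a std::string owns heap storage, so a byte-wise copy leaves two objects pointing at the same buffer. A minimal sketch of copying through the copy constructor instead is shown below; `FormatSlot` is a hypothetical stand-in for PrintfState, and `cloneSlots` is not the function added by the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a state object that owns a std::string.
struct FormatSlot {
  int conversion;
  std::string text;
};

// Copy element by element through the copy constructor, so every copied
// std::string gets its own storage (unlike malloc followed by memcpy).
inline std::vector<FormatSlot> cloneSlots(const std::vector<FormatSlot> &src) {
  return std::vector<FormatSlot>(src.begin(), src.end());
}
```

After cloning, changing the text in the original does not affect the clone, and both can be destroyed safely.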
2015-01-09 | replace hash_map with map | Guo Yejun | 5 | -91/+5
There is no strong evidence that hash_map gives better performance for beignet, and hash_map requires std::hash, which is not supported by some old g++ versions, so replace hash_map with map. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-09 | add collectImageArgs to handle image count limitations. | Luo Xionghu | 1 | -0/+28
The number of read-only images in a kernel must be less than or equal to MAX_READ_IMAGE_ARS; the number of write-only images in a kernel must be less than or equal to MAX_WRITE_IMAGE_ARS. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
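The two limits amount to counting image arguments by access qualifier and comparing each count against its maximum. A rough sketch follows; the types and the function name are hypothetical, and the limits are passed in as parameters because their actual values are not stated here.

```cpp
#include <cstddef>
#include <vector>

enum class ImageAccess { ReadOnly, WriteOnly };

// Limits supplied by the caller (assumption: the real values come from the
// backend's constants).
struct ImageLimits {
  std::size_t maxRead;
  std::size_t maxWrite;
};

// Count image arguments by access qualifier and compare with the limits.
inline bool imageArgsWithinLimits(const std::vector<ImageAccess> &args,
                                  const ImageLimits &limits) {
  std::size_t reads = 0;
  std::size_t writes = 0;
  for (ImageAccess a : args) {
    if (a == ImageAccess::ReadOnly) ++reads;
    else ++writes;
  }
  return reads <= limits.maxRead && writes <= limits.maxWrite;
}
```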
2015-01-09 | fix min_max_read_image_args and min_max_parameter_size issue. | Luo Xionghu | 2 | -4/+6
This patch reverts fb4bced99b7c08d0d43386abf33448860fb7fc41, as the spec defines the minimum value of min_max_parameter_size as 1024. BTI_MAX_NUM and btiBase could be 130 because of 128 images with 1 const surface and 1 private surface. v2: add BTI_MAX_READ_IMAGE_ARGS and BTI_MAX_WRITE_IMAGE_ARGS in the backend; change BTI_MAX_ID to 253. The image counts will be calculated and checked against the limits in a later patch. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: implement high precision pown() | Ruiling Song | 1 | -5/+232
This version is based on the pow() implementation ported from msun. I just modified it to support raising a floating-point value to an integer power. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | GBE: remove software maintained SLM offset related code. | Zhigang Gong | 7 | -37/+1
v2: also remove allocSLMOffsetCurbe(). Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-09 | GBE: support const private array initialization. | Ruiling Song | 2 | -45/+54
Developers are allowed to declare an initialized private array like below:
    void func() { const int arr[] = {1, 2, 3, 4}; }
The implementation simply puts such arrays into the __constant memory space.
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | GBE: use sr0.1's SLM Offset to eliminate the software SLM offset for HSW. | Zhigang Gong | 2 | -4/+10
sr0.1 has an SLM Offset bit field which can be used to set the SLM offset (in 4K units), so we just need to initialize it at the beginning of the kernel and don't need to maintain the software SLM offset. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-09 | GBE: fix an image regression. | Zhigang Gong | 2 | -29/+30
This patch fixes a regression in the image processing path. For every image that needs no workaround and whose image offset is 0, we should always use float type coordinates. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2015-01-09 | libocl: flush denorm to zero in remquo() | Ruiling Song | 1 | -0/+2
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: Correctly handle -inf in exp10. | Ruiling Song | 1 | -3/+3
exp10(-inf) should return 0.0f Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: flush denorm into zero in ldexp() | Ruiling Song | 1 | -1/+1
The inf and denorm logic in internal_ldexp() is useless, as inf and denorm are already handled in __gen_ocl_scalbnf() and the wrapper function. It is better to flush denorm to zero in the wrapper function, so we don't have to change the internal implementation. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: Flush denorm input into zero in rootn() | Ruiling Song | 1 | -0/+8
Gen does not support denorm. We have to flush input to zero. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
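Flushing a denormal input to zero can be sketched on the host side as below, assuming IEEE 754 single precision. The function name is hypothetical and this is not the libocl implementation, which is written in OpenCL C.

```cpp
#include <cstdint>
#include <cstring>

// Return x unchanged unless it is a denormal (zero exponent field, non-zero
// mantissa), in which case return a zero with the same sign.
inline float flushDenormToZero(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const std::uint32_t exponent = bits & 0x7f800000u;
  const std::uint32_t mantissa = bits & 0x007fffffu;
  if (exponent == 0 && mantissa != 0) bits &= 0x80000000u; // keep only the sign
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}
```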
2015-01-09 | libocl: Improve precision of exp() | Ruiling Song | 1 | -9/+27
This patch reverts most of the logic in 500843d36ab6631d71570130c0c08048f9b8f3fe. It seems native_exp loses some precision, which can make it fail to satisfy the OpenCL spec. These cases often come from other functions that invoke internal_exp(), like sinh/cosh. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | libocl: Improve precision of pow/powr. | Ruiling Song | 1 | -14/+67
pow: When splitting a float into two floats, we normally use 0xfffff000 as the mask, which leaves a 12-bit effective mantissa in the high part. After some calculation, it seems some of the high bits are lost, so I changed the mask to 0xffffe000, which leaves only an 11-bit mantissa in the high part. The precision then meets the OpenCL spec requirement. powr: powr() has different edge-case behavior, defined in OpenCL Spec 7.5. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
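The masking step can be illustrated with a short host-side sketch, assuming IEEE 754 single precision. `splitFloat` is a hypothetical name; the real code lives in the OpenCL C math library.

```cpp
#include <cstdint>
#include <cstring>

// Split x into hi + lo, where hi keeps only the upper mantissa bits.
// The mask 0xffffe000 clears the low 13 bits of the bit pattern.
inline void splitFloat(float x, float &hi, float &lo) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xffffe000u;
  std::memcpy(&hi, &bits, sizeof bits);
  lo = x - hi; // remainder carried by the cleared low bits
}
```

Because `hi` has fewer significant bits, products involving it need fewer bits to be represented, which is the usual motivation for this kind of split in extended-precision arithmetic.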
2015-01-09 | add half math function support. | Luo Xionghu | 1 | -1/+15
Simply define the half_xxx functions as xxx. v2: the functions need to be defined as native_xxx, since they pass under non-strict conformance mode, except for sin/cos/powr/tan. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | GBE: code cleanup. | Zhigang Gong | 3 | -12/+2
Remove some useless comments according to Matt's suggestion. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-09 | GBE/CL: use 2D image to implement large image1D_buffer. | Zhigang Gong | 1 | -3/+17
Per the OpenCL spec, the minimum CL_DEVICE_IMAGE_MAX_BUFFER_SIZE is 65536, which is too large for a 1D surface on Gen platforms, so we have to use a 2D surface to implement it. As the OpenCL spec only allows the image1d_t to be accessed via the default sampler, this is doable: it will never use float coordinates and never use linear (non-nearest) filters. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
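Since only integer coordinates are involved, a 1D index can be folded into a 2D coordinate with a division and a remainder. The sketch below assumes a fixed row width chosen by the driver; `Coord2D`, `bufferIndexTo2D`, and the row width used in the example are hypothetical, not values taken from the patch.

```cpp
#include <cstdint>

struct Coord2D {
  std::uint32_t x;
  std::uint32_t y;
};

// Map a 1D element index to a 2D coordinate. rowWidth must be non-zero and
// is the number of elements stored per row of the 2D surface.
inline Coord2D bufferIndexTo2D(std::uint32_t index, std::uint32_t rowWidth) {
  return Coord2D{index % rowWidth, index / rowWidth};
}
```

For example, with a row width of 8192 elements, index 70000 lands in row 8 at column 4464.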
2015-01-09 | GBE: remove some image1d_buffer related builtin functions. | Zhigang Gong | 2 | -9/+9
Per the OpenCL spec, image1d buffer only supports sampler-less access. Remove the unsupported functions. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-09 | GBE: switch to CLANG native sampler_t. | Zhigang Gong | 9 | -18/+179
CLANG has supported sampler_t since LLVM 3.3, so let's switch to that type rather than the old hacky way. One major problem is static sampler checking. Gen platforms have some hardware restrictions, and if the sampler value is a constant defined on the kernel side, we need to use the value to optimize the code path. Since sampler_t is now an opaque type, CLANG doesn't support any arithmetic operations on it, so we have to introduce a new pass to do this optimization. v2: fix comments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-09 | GBE: switch to use CLANG native image types. | Zhigang Gong | 9 | -424/+175
CLANG has all native image types since 3.3. There is no need to keep the original hacky implementation now. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-09 | Refactor all image builtin functions. | Zhigang Gong | 4 | -416/+618
Refactor almost all the image builtin related functions to simplify the code and get rid of most of the awful macros. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-09 | GBE: don't always treat a multiple destination instruction as root. | Zhigang Gong | 1 | -3/+2
It is not clear why this type of instruction was set as root; it doesn't make sense. For example, if we have a read_imagei() that reads some data into an int4 value and then never use these 4 values, we definitely don't need to generate this instruction. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2014-12-16 | GBE: Add some missing constant expression cases. | Zhigang Gong | 4 | -11/+135
Mainly for two types of constant expression cases: 1. The destination is a vector. 2. Some missing operators. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-12-16 | GBE: Add constant pointer in the memcpy intrinsic. | Zhigang Gong | 3 | -1/+187
Blender may generate this type of intrinsic, so handle it now. Also fix an earlier typo which prevented an assert from firing when it should. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-12-16 | GBE: eliminate duplicate GEP handling logic. | Zhigang Gong | 3 | -61/+50
Part of the GEP lowering logic in constant expressions is the same as in the normal GEP instruction lowering pass. This patch extracts the common logic and reduces the redundant code. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-12-16 | GBE: remove useless code. | Zhigang Gong | 1 | -23/+4
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>