path: root/backend/src/ir
Age | Commit message | Author | Files | Lines
2014-09-19 | draft fix. [loop_opt] | Zhigang Gong | 3 | -9/+49
./opencv_test_imgproc --gtest_filter=OCL_Filter/LaplacianTest.Accuracy/60 hangs. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-09-19 | GBE: fix a loop header file including bug. | Zhigang Gong | 1 | -1/+0
function.hpp doesn't need to include the structural_analysis.hpp. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-09-18 | add handleSelfLoopNode to insert while instruction on Gen IR level. | Luo Xionghu | 3 | -10/+34
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
2014-09-18 | Add Gen IR WHILE. | Luo Xionghu | 3 | -1/+9
Add Gen IR WHILE to mark the structured region. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
2014-09-18 | GBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp. | Ruiling Song | 5 | -0/+132
Gen provides the tm0 register for intra-kernel profiling. This patch adds an API, __gen_ocl_get_timestamp(), that returns the timestamp in TM. The return type is defined as: struct time_stamp { ulong tick; uint event; }; 'tick' is a 64-bit time tick. 'event' records whether a tmEvent has occurred (non-zero) or not (0) since tm0 was last read. A tmEvent is a time-impacting event such as a context switch or a frequency change. A sample is added in kernels/compiler_time_stamp.cl to show how to use the API.
V2: Introduce ir::ARFRegister to avoid direct use of nr/subnr in Gen IR. Rename __gen_ocl_extract_reg to __gen_ocl_region. Rename beignet_get_time_stamp to __gen_ocl_get_timestamp.
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
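The two fields of the time stamp can be combined when measuring an interval. The following is a minimal host-side sketch, not code from the patch: the TimeStamp mirror struct, the Elapsed type and elapsedTicks() are hypothetical names, and treating a non-zero 'event' in the later sample as "interval unreliable" is an assumption drawn from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mirror of the struct quoted in the commit message
// (ulong -> uint64_t, uint -> uint32_t).
struct TimeStamp {
  std::uint64_t tick;   // 64-bit time tick
  std::uint32_t event;  // non-zero if a tmEvent occurred since the last read
};

struct Elapsed {
  bool reliable;        // false when the later sample reports a tmEvent
  std::uint64_t ticks;  // tick difference, computed modulo 2^64
};

// Compare two samples taken in order: 'begin' first, then 'end'.
inline Elapsed elapsedTicks(const TimeStamp &begin, const TimeStamp &end) {
  Elapsed e;
  e.reliable = (end.event == 0);
  e.ticks = end.tick - begin.tick;  // unsigned subtraction wraps safely
  return e;
}
```

A caller would discard or repeat a measurement whose 'reliable' flag is false, since a context switch or frequency change may have distorted the tick count.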
2014-09-18 | Add long support for printf | Junyan He | 1 | -5/+20
V2: Replace all the long and ulong with int64_t. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-09-12 | fix piglit get kernel info FUNCTION ATTRIBUTE fail. | Luo | 1 | -0/+5
The backend needs to return the kernel FUNCTION ATTRIBUTE message to clGetKernelInfo. There are three kinds of function attribute so far; the vec_type_hint parameter cannot be returned because LLVM lacks that information. Signed-off-by: Luo <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-08-27 | fix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression. | Luo Xionghu | 4 | -7/+33
This regression is caused by structural analysis when checking the if-then node. There are actually four types of if-then node, distinguished by topology and fallthrough information. This patch adds the fallthrough check. v2: add inversePredicate member and function for BranchInstruction; print the exact meaning of the IF instruction in GEN_IR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-08-19 | Fix compile warnings for CLANG compiler | Lv Meng | 1 | -15/+13
1. Fix data structure redefinition warnings.
2. Fix the 'data' with variable sized type 'union<*>' not at the end of a class warning (in immediate.hpp).
3. Fix the implicit conversion warning.
4. Fix the explicitly assigning a variable type warning.
5. Fix the comparison of unsigned expression < 0 is always false warning (in cl_api.c).
Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-08-19 | Fix compile warnings for ICC compiler | Lv Meng | 6 | -22/+22
1. The changes to the "const" functions fix the ICC compile warning "type qualifier on return type is meaningless".
2. Each "operator new" should have a corresponding "operator delete" function.
3. In C++0x, std::auto_ptr is deprecated in favor of std::unique_ptr.
Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-08-12 | Fix compile errors for CLANG compiler | Lv Meng | 1 | -5/+0
Use vector to fix "variable length array of non-POD element type" compiler error. The /beignet/backend/src/./ir/context.hpp "fn->immediates[index] = imm" would call a private func 'operator=' which would trigger error, and it is not being used. The undefined reference to `check_copy_overlap' would occur in the following calling. Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-08-12 | GBE: initialize BTI structure to zero. | Ruiling Song | 1 | -0/+4
Clear to zero to avoid garbage data, as we do not assign it later for local/constant memory access. v2: move initialization code into constructor. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-07-31 | GBE: complete constant expression processing. | Zhigang Gong | 7 | -35/+384
The target is to process all possible complex nested constant expressions, as below:
  const = type0 OP0 (const0)
  const0 = type1 OP1 (const1, const2)
  const1 = ...
The supported OPs are as below: BITCAST, ADD, SUB, MUL, DIV, REM, SHL, ASHR, LSHR, AND, OR, XOR.
We also add support for array/vector types of immediate. Some possible examples are as below:
  float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128) to i32) to float)
  float bitcast (i32 trunc (i128 lshr (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128), i128 32) to i32) to float)
v2: separate all private method implementations into immediate.cpp.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
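To make the two example expressions concrete, here is a small sketch that evaluates them by hand. It is not the patch's implementation: the Words alias and the helper names are hypothetical, the undef element is replaced by 0, and element 0 is assumed to occupy the least significant 32 bits of the i128 (little-endian layout).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A <4 x i32> constant viewed as an i128; element 0 is assumed to hold the
// least significant 32 bits (little-endian target).
using Words = std::array<std::uint32_t, 4>;

// i32 trunc (i128 lshr (value, shift)): logical shift right, keep low 32 bits.
inline std::uint32_t lshrTrunc(const Words &value, unsigned shift) {
  assert(shift < 128);
  const unsigned word = shift / 32;
  const unsigned bit = shift % 32;
  std::uint32_t low = value[word] >> bit;
  if (bit != 0 && word + 1 < value.size())
    low |= value[word + 1] << (32 - bit);  // bits pulled in from the next word
  return low;
}

// float bitcast (i32 bits): reinterpret the bit pattern, no numeric conversion.
inline float bitcastToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

With the vector from the commit message, a shift of 0 selects element 0 (bit pattern 0x3F6E147B, roughly 0.93f) and a shift of 32 selects element 1 (0x3F70A3D7, roughly 0.94f).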
2014-07-31 | GBE: simplify processConstant. | Zhigang Gong | 1 | -1/+1
Preparation to support generic constant expression. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-07-31 | GBE: refactor the immediate class to support vector data type. | Zhigang Gong | 4 | -39/+163
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-07-30 | GBE: Handle bti allocation for internal buffer used by printf. | Ruiling Song | 4 | -0/+23
1. Move the bti/Register map from gbe::Context to ir::Function. 2. use GlobalVariable instead of 'call' to get internal buffer (used for printf) base address. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-30 | GBE: Refine bti usage in backend & runtime. | Ruiling Song | 7 | -26/+61
Previously, we simply mapped a 2G surface for memory access. This has an obvious security issue: a user can easily read/write graphics memory that does not belong to them. To prevent this kind of behaviour, we bind each surface to a dedicated bti, and the HW provides automatic bounds checking: an out-of-bound write is ignored, and an out-of-bound read simply returns zero. For a load/store instruction, the patch searches through the LLVM use-def chain until it finds where the address comes from. The bti is then saved in ir::Instruction and used in the later code generation. In the mixed pointer case, a load/store will access more than one bti. To simplify some code, '0' is reserved for the constant address space and '1' is reserved for the private address space. Other btis are assigned automatically by the backend. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
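The bounds-check behaviour described above can be modelled in software. This is only an illustrative sketch: BoundedSurface and its methods are hypothetical names, and the real check is performed by the hardware per bti, not by code like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software model of one surface bound to its own bti. Only the
// out-of-bounds behaviour follows the commit message.
class BoundedSurface {
public:
  explicit BoundedSurface(std::size_t dwords) : data_(dwords, 0) {}

  // An out-of-bounds read returns zero.
  std::uint32_t load(std::size_t index) const {
    return index < data_.size() ? data_[index] : 0u;
  }

  // An out-of-bounds write is ignored.
  void store(std::size_t index, std::uint32_t value) {
    if (index < data_.size()) data_[index] = value;
  }

private:
  std::vector<std::uint32_t> data_;
};
```

Because every access is confined to the surface it was issued against, a stray index can no longer reach memory that belongs to another buffer.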
2014-07-28 | GBE: align the fields in union ImageInfoKey. | Ruiling Song | 2 | -2/+2
To avoid possible garbage data. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-08 | Use instruction if else and endif manipulate structures | Yongjia Zhang | 1 | -0/+2
Use the if, else and endif instructions to manipulate the control flow of identified if-then and if-else structures in the backend. This is not enabled yet; the patch only adds the necessary code to the backend. Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-08 | Add structure identification on ir level | Yongjia Zhang | 5 | -9/+1462
Add tool structures and functions for identifying if-then and if-else structures on Gen IR level. Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-08 | Add Gen IR IF, ELSE and ENDIF | Yongjia Zhang | 3 | -4/+27
Add Gen IR IF, ELSE and ENDIF to mark the structured region. Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-24 | Implement the %p in the printf | Junyan He | 2 | -1/+5
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-24 | Add the support for vector type in printf. | Junyan He | 2 | -71/+84
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-23 | Add the format and flag support for printf. | Junyan He | 2 | -9/+108
Formats and flags such as -+# and precision requests are now applied to the output. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-23 | Add the support for %s in printf | Junyan He | 2 | -27/+23
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-06-23 | Add %f and %c support for printf. | Junyan He | 2 | -36/+38
Add the %c and %f support for printf. Also add the int to float and int to char conversion. Some minor errors such as wrong index flags have been fixed. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-06-23 | GBE: fix some get kernel arg info bugs. | Zhigang Gong | 1 | -0/+1
Still can't handle the sampler_t which is not used actually. Access qualifier seems broken with llvm 3.3. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-06-20 | GBE/runtime: fixup broken 1d array image support. | Zhigang Gong | 1 | -1/+1
As the sample LD message doesn't support an array index, we have to create a 2D array surface with the same buffer object. Thus one 1D array image has two surfaces bound to it: one at index and the second at 128 + index. On the kernel side, we access the corresponding 2D array surface when the LD message is required, and the original 1D array surface otherwise. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: He Junyan <junyan.he@inbox.com>
2014-06-19 | Add a lock in the place of printf output | Junyan He | 2 | -10/+37
If multiple threads run the kernel simultaneously, their output may interleave. Add a lock to avoid this. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
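The idea of serializing output with a lock can be sketched as follows. LockedOutput is a hypothetical class written for illustration; the patch's actual locking code is not reproduced here.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical sink that collects printf output from several kernel runs.
class LockedOutput {
public:
  // Append one complete chunk while holding the lock, so chunks written by
  // different threads cannot interleave with each other.
  void write(const std::string &chunk) {
    std::lock_guard<std::mutex> guard(lock_);
    text_ += chunk;
  }

  // Return a copy of everything written so far.
  std::string str() {
    std::lock_guard<std::mutex> guard(lock_);
    return text_;
  }

private:
  std::mutex lock_;
  std::string text_;
};
```

The important design point is that the lock covers a whole chunk of output, not a single character, so each kernel's text stays contiguous.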
2014-06-13 | Add the llvm info to the function for later usage. | Junyan He | 3 | -5/+18
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-13 | Add the support for 1D image in backend | Junyan He | 5 | -24/+13
1. Delete the is3D member in the instruction class, because we need more than 1 bit to represent 1D, 2D and 3D. We now add an invalid register in the ir profile and compare the coords against it to determine the dimension.
2. Rename all the xxx_image to xxx_image2D to make the meaning clear.
3. Update the corresponding Sampler and Typed_Write instructions in selection and Gen IR generation.
v2: fix the use of InvalidRegister. Use ir::ocl::invalid only.
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-06-11 | Add the PrintfParser llvm parser into the llvm backend. | Junyan He | 2 | -0/+5
The PrintfParser runs before the LLVM Gen backend and filters out all printf function calls. When a printf call is found, it analyses the format string and its % placeholders, and replaces the call with a STORE or CONV+STORE instruction if needed. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
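Analysing the format string means finding each % placeholder and its conversion character. The sketch below shows one simple way to do that; conversions() is a hypothetical helper, not the PrintfParser code, and it does not validate flags, width or precision.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect the conversion character of every placeholder in a printf format
// string. "%%" is a literal percent sign and is skipped.
inline std::vector<char> conversions(const std::string &fmt) {
  static const std::string kConversions = "diouxXeEfFgGaAcsp";
  std::vector<char> result;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '%') continue;
    if (i + 1 < fmt.size() && fmt[i + 1] == '%') { ++i; continue; }
    // Skip flags, width, precision and length modifiers up to the conversion.
    const std::size_t j = fmt.find_first_of(kConversions, i + 1);
    if (j == std::string::npos) break;  // malformed trailing placeholder
    result.push_back(fmt[j]);
    i = j;
  }
  return result;
}
```

Each collected conversion tells the caller how the matching argument must be stored (and whether a conversion is needed first), which corresponds to the STORE versus CONV+STORE choice mentioned above.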
2014-06-11 | Add the PrintfSet class into the ir | Junyan He | 2 | -0/+316
The PrintfSet is used to collect all the information in the kernel. After the kernel has executed, it is used to generate the corresponding printf output. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-11 | Add two special register for printf output buffer usage | Junyan He | 2 | -2/+7
printfiptr for printf index buffer pointer in curbe and printfbptr for printf output buffer pointer in curbe. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-11 | GBE: support SLM bool load and store. | Zhigang Gong | 1 | -1/+0
The OCL spec does allow the use of an i1/BOOL SLM variable, so we have to support loading and storing it. To keep things simple, I chose to use S16 to represent an i1 value. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-06-04 | GBE: Optmize phi elimination | Ruiling Song | 1 | -0/+11
During phi elimination, we simply insert 3 MOVs for one phi instruction to avoid the lost-copy problem. In fact, only two of them are needed most of the time. This patch checks whether the move from phiCopy to phi can be avoided: if the live ranges of phiCopy and phi do not interfere, they can be coalesced, which removes one instruction. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
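The interference test behind this coalescing decision can be illustrated with a deliberately simplified model. Here a live range is a single half-open interval of instruction positions, which is an assumption made for the sketch; the patch works on the backend's own liveness information, and LiveRange, interfere() and canCoalesce() are hypothetical names.

```cpp
#include <cassert>

// Simplified live range: live from 'start' (definition) up to, but not
// including, 'end' (position of the last use).
struct LiveRange {
  int start;
  int end;
};

// Two half-open intervals interfere when they overlap in at least one point.
inline bool interfere(const LiveRange &a, const LiveRange &b) {
  return a.start < b.end && b.start < a.end;
}

// phiCopy and phi may share one register when their ranges never overlap,
// which makes the MOV from phiCopy to phi unnecessary.
inline bool canCoalesce(const LiveRange &phiCopy, const LiveRange &phi) {
  return !interfere(phiCopy, phi);
}
```

With half-open intervals, a phiCopy whose last use is the very copy that defines phi does not count as interfering, which is the case the optimization targets.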
2014-06-04 | Revert "GBE: No need to compute liveout again in value.cpp." | Ruiling Song | 1 | -0/+33
We need to transfer ValueDef from predecessors to their successors. Consider a register defined in BB0 and used in BB3: we need to iterate over liveout to pass the def in BB0 to BB3, so that the use in BB3 gets the correct def. Otherwise, the UD/DU graph is incomplete. This reverts commit 89b490b5a17cfda2d9816dc1c246ce5bbff12648. Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-05-28 | separate runtime(libcl.so) and compiler(libgbe.so) | Guo Yejun | 2 | -30/+33
On embedded/handheld devices, storage and memory are scarce, so it is necessary to provide only a small OpenCL runtime library; only executable binary kernels are supported on such devices.
At the beginning of the process (before function main), the OpenCL runtime (libcl.so) tries to load the compiler (libgbe.so). If loading succeeds, the system behaves as before. Otherwise, the runtime assumes there is no OpenCL compiler in the system: the device info changes to CL_DEVICE_COMPILER_AVAILABLE=false and CL_DEVICE_PROFILE="EMBEDDED_PROFILE", and clBuildProgram returns CL_COMPILER_NOT_AVAILABLE if the program was created with clCreateProgramWithSource, following the OpenCL spec.
To simulate the case without an OpenCL compiler, just delete the file libgbe.so, or export OCL_NON_COMPILER=1.
Some explanation of the binary kernel interpreter (libinterp.a): libinterp.a is used to interpret the binary kernel inside the runtime, and the runtime library libcl.so is built against libinterp.a. Since the code that interprets binary kernels is tightly integrated into the compiler, a new file gbe_bin_interpreter.cpp is created to include some other .cpp files and so avoid code duplication. To keep libinterp.a small (and thereby libcl.so small), the macro GBE_COMPILER_AVAILABLE is used to make only the needed code active when building libinterp.a.
V2: the code base changed to call function gbe_set_image_base_index in gbe_bin_generater, while this patch changes that function to gbe_set_image_base_index_compiler; fix it accordingly.
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
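The runtime's decision logic described above can be summarized in a few lines. This is a model written for illustration, not Beignet code: DeviceInfo, describeDevice() and buildProgram() are hypothetical, and reporting "FULL_PROFILE" when the compiler is present is an assumption. The two numeric constants are the values of CL_SUCCESS and CL_COMPILER_NOT_AVAILABLE in the OpenCL headers.

```cpp
#include <cassert>
#include <string>

const int kClSuccess = 0;                // CL_SUCCESS
const int kClCompilerNotAvailable = -3;  // CL_COMPILER_NOT_AVAILABLE

struct DeviceInfo {
  bool compilerAvailable;  // models CL_DEVICE_COMPILER_AVAILABLE
  std::string profile;     // models CL_DEVICE_PROFILE
};

// 'compilerLoaded' stands for "libgbe.so was loaded successfully at startup".
inline DeviceInfo describeDevice(bool compilerLoaded) {
  DeviceInfo info;
  info.compilerAvailable = compilerLoaded;
  // Assumption: the full profile is reported when the compiler is present.
  info.profile = compilerLoaded ? "FULL_PROFILE" : "EMBEDDED_PROFILE";
  return info;
}

// Building from source needs the compiler; binary programs do not.
inline int buildProgram(bool compilerLoaded, bool createdFromSource) {
  if (createdFromSource && !compilerLoaded) return kClCompilerNotAvailable;
  return kClSuccess;
}
```

An application can therefore query the device info first and fall back to prebuilt binary kernels when no compiler is available.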
2014-05-23 | GBE: fix a uniform analysis bug. | Zhigang Gong | 2 | -20/+46
If a value is defined in a loop and used outside the loop, that value cannot be a uniform (scalar) value. The reason is that the value may be assigned a different scalar value on different lanes when the loop is re-entered with different lanes active. Thanks to Yang Rong for reporting this bug. Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2014-05-19 | HSW: Workaround the slm address issue. | Yang Rong | 2 | -2/+4
Each work group has its own slm offset, and when dispatching threads the TSG handles it automatically on IVB. But this fails on HSW: after checking, all work groups' slm offsets are 0, even though the slm index is correct in R0.0. So calculate the slm offset from the slm index and add it to the slm address. TODO: need to find the root cause. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Junyan He <junyan.he@inbox.com>
2014-05-14 | GBE: fix one regression caused by uniform analysis. | Zhigang Gong | 1 | -1/+7
Some instructions handle simd1 incorrectly. Disable them currently. v2: add addsat into the unsupported list. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-05-13 | GBE: enable uniform analysis for bool data type. | Zhigang Gong | 1 | -2/+1
v2: refine the flag allocation implementation. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-05-13 | GBE: enable uniform for load instruction. | Zhigang Gong | 1 | -2/+3
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-05-13 | GBE: implement uniform analysis. | Zhigang Gong | 3 | -0/+20
We have many uniform (scalar) input values, including the kernel input arguments and some special registers. All variables derived only from uniform values are also uniform. This patch identifies such registers at the liveness analysis stage and changes their type to scalar, so that they need less register space later. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
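The propagation rule, "derived only from uniform values implies uniform", can be sketched as a single forward pass. This is a simplified illustration and not the patch's algorithm: Instr and uniformRegs() are hypothetical, registers are plain integers, the code is assumed to be straight-line, and each register is assumed to be defined once.

```cpp
#include <cassert>
#include <set>
#include <vector>

// One instruction: a destination register computed from source registers.
struct Instr {
  int dst;
  std::vector<int> srcs;
};

// Start from the known uniform inputs (kernel arguments, special registers)
// and mark a destination uniform when every one of its sources is uniform.
inline std::set<int> uniformRegs(const std::vector<Instr> &code,
                                 std::set<int> uniform) {
  for (const Instr &insn : code) {
    bool allUniform = true;
    for (int src : insn.srcs) {
      if (uniform.count(src) == 0) {
        allUniform = false;
        break;
      }
    }
    if (allUniform) uniform.insert(insn.dst);
  }
  return uniform;
}
```

Control flow complicates the real analysis, since a value defined under divergent control flow is not necessarily uniform even if its sources are.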
2014-05-13 | GBE: No need to compute liveout again in value.cpp. | Zhigang Gong | 1 | -33/+0
We already did a complete liveness analysis at the liveness.cpp. Don't need to do that again. Save about 10% of the compile time. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-05-09 | do not serialize zero image/sampler info into binary | Guo Yejun | 2 | -0/+5
if there is no image/sampler used in kernel source, it is not necessary to serialize the zero image/sampler info into kernel binary. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-04-23 | GBE: fixed the undefined phi value's liveness analysis. | Zhigang Gong | 4 | -4/+9
If a phi component is undef from one of the predecessors, we should not pass it as one of that predecessor's liveout registers. Otherwise, the phi register's liveness may be extended to basic block zero, which is not good. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-04-22 | support __gen_ocl_simd_any and __gen_ocl_simd_all | Guo Yejun | 2 | -0/+6
short __gen_ocl_simd_any(short x): if x is non-zero in any of the active threads in the same SIMD, the return value for all these threads is non-zero; otherwise zero is returned.
short __gen_ocl_simd_all(short x): only if x is non-zero in all of the active threads in the same SIMD is the return value for all these threads non-zero; otherwise zero is returned.
For example, to check whether a special value exists in a global buffer, use one SIMD to search in parallel; the whole SIMD can stop the task once the value is found. The key kernel code looks like:
  for(; ; ) {
    ...
    if (__gen_ocl_simd_any(...))
      break; //the whole SIMD stop the searching
  }
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
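The semantics of the two built-ins can be modelled on the host by treating one SIMD as an array of lanes with an activity mask. This is only a reference model: simdAny() and simdAll() are hypothetical names, and returning exactly 1 for "non-zero" is an assumption, since the text only promises a non-zero value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x[i] is the argument seen by lane i; active[i] says whether lane i is
// currently enabled. Inactive lanes do not take part in the vote.
inline short simdAny(const std::vector<short> &x,
                     const std::vector<bool> &active) {
  assert(x.size() == active.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    if (active[i] && x[i] != 0) return 1;
  return 0;
}

inline short simdAll(const std::vector<short> &x,
                     const std::vector<bool> &active) {
  assert(x.size() == active.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    if (active[i] && x[i] == 0) return 0;
  return 1;
}
```

In the search loop above, every active lane receives the same result from the "any" vote, so all of them leave the loop together once one lane has found the value.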
2014-04-08 | GBE: Add two helper scalar registers to hold 0 and all 1s. | Zhigang Gong | 2 | -4/+8
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-04-08 | GBE: Don't need the emask/notemask/barriermask any more. | Zhigang Gong | 2 | -9/+3
As we change to use if/endif and change the implementation of the barrier, we don't need to maintain emask/notmask/barriermask any more. Just remove them. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>