~gongzg/beignet

Age	Commit message	Author	Files	Lines
2014-09-19	draft fix.loop_opt	Zhigang Gong	3	-9/+49
	./opencv_test_imgproc --gtest_filter=OCL_Filter/LaplacianTest.Accuracy/60 hangs. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-09-19	GBE: fix a loop header file including bug.	Zhigang Gong	1	-1/+0
	function.hpp does not need to include structural_analysis.hpp. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-09-18	add handleSelfLoopNode to insert while instruction on Gen IR level.	Luo Xionghu	3	-10/+34
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
2014-09-18	Add Gen IR WHILE.	Luo Xionghu	3	-1/+9
	Add Gen IR WHILE to mark the structured region. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
2014-09-18	GBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp.	Ruiling Song	5	-0/+132
	Gen provides the tm0 register for intra-kernel profiling. This patch provides an API, __gen_ocl_get_timestamp(), that returns the timestamp in TM. The return type is defined as: struct time_stamp { ulong tick; uint event; }; 'tick' is a 64-bit time tick. 'event' indicates whether a tmEvent has occurred (non-zero) or not (0). tmEvents include time-impacting events, such as a context switch or a frequency change, that happened since tm0 was last read. A sample is added in kernels/compiler_time_stamp.cl to show how to use it. V2: Introduce ir::ARFRegister to avoid direct use of nr/subnr in Gen IR. Rename __gen_ocl_extract_reg to __gen_ocl_region. Rename beignet_get_time_stamp to __gen_ocl_get_timestamp. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
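A host-side sketch of how the returned struct might be consumed. The fixed-width field types assume OpenCL `ulong`/`uint` are 64/32 bits, and `elapsed_ticks` is a hypothetical helper (not part of beignet or libocl) that treats an interval as unreliable when the second read reports a tmEvent:

```cpp
#include <cstdint>
#include <optional>

// Host-side mirror of the struct returned by __gen_ocl_get_timestamp().
// Assumption: OpenCL ulong -> uint64_t, uint -> uint32_t.
struct time_stamp {
  uint64_t tick;   // 64-bit time tick read from tm0
  uint32_t event;  // non-zero if a tmEvent occurred since tm0 was last read
};

// Hypothetical helper (not part of beignet): returns the tick delta only
// when the second read reports no time-impacting event (context switch,
// frequency change); otherwise the interval is treated as unreliable.
std::optional<uint64_t> elapsed_ticks(const time_stamp& start,
                                      const time_stamp& end) {
  if (end.event != 0) return std::nullopt;
  return end.tick - start.tick;  // modular unsigned arithmetic
}
```

Discarding flagged intervals rather than correcting them is a choice of this sketch; the commit does not say how callers should handle a tmEvent.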
2014-09-18	Add long support for printf	Junyan He	1	-5/+20
	V2: Replace all long and ulong with int64_t. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-09-12	fix piglit get kernel info FUNCTION ATTRIBUTE fail.	Luo	1	-0/+5
	The backend needs to return the kernel FUNCTION ATTRIBUTE information to clGetKernelInfo. There are three kinds of function attributes so far; the vec_type_hint parameter cannot be returned because LLVM lacks that information. Signed-off-by: Luo <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-08-27	fix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression.	Luo Xionghu	4	-7/+33
	This regression is caused by structural analysis when checking the if-then node: there are actually four types of if-then node, depending on the topology and fallthrough information. This patch adds the fallthrough check. v2: add an inversePredicate member and function to BranchInstruction; print the exact meaning of the IF instruction in GEN_IR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-08-19	Fix compile warnings for CLANG compiler	Lv Meng	1	-15/+13
	1. Fix data structure redefinition warnings. 2. Fix the "'data' with variable sized type 'union<*>' not at the end of a class" warning (in immediate.hpp). 3. Fix implicit conversion warnings. 4. Fix the "explicitly assigning a variable type" warning. 5. Fix the "comparison of unsigned expression < 0 is always false" warning (in cl_api.c). Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-08-19	Fix compile warnings for ICC compiler	Lv Meng	6	-22/+22
	1. The changes to "const"-qualified function return types fix the ICC warning "type qualifier on return type is meaningless". 2. Each "operator new" should have a corresponding "operator delete" function. 3. In C++0x, std::auto_ptr is deprecated in favor of std::unique_ptr. Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-08-12	Fix compile errors for CLANG compiler	Lv Meng	1	-5/+0
	Use a vector to fix the "variable length array of non-POD element type" compiler error. In /beignet/backend/src/./ir/context.hpp, "fn->immediates[index] = imm" would call a private 'operator=', which triggers an error, and it is not used anyway. An undefined reference to `check_copy_overlap' would also occur in the following call. Signed-off-by: Lv Meng <meng.lv@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-08-12	GBE: initialize BTI structure to zero.	Ruiling Song	1	-0/+4
	Clear to zero to avoid garbage data, as we do not assign it later for local/constant memory access. v2: move initialization code into constructor. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-07-31	GBE: complete constant expression processing.	Zhigang Gong	7	-35/+384
	The goal is to process all possible complex nested constant expressions, such as: const = type0 OP0 (const0), const0 = type1 OP1 (const1, const2), const1 = ... The supported OPs are: BITCAST, ADD, SUB, MUL, DIV, REM, SHL, ASHR, LSHR, AND, OR, XOR. Support for array/vector immediates is also added. Some possible examples: float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128) to i32) to float); float bitcast (i32 trunc (i128 lshr (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128), i128 32) to i32) to float). v2: move all private method implementations into immediate.cpp. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
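A minimal sketch of the recursive folding idea, assuming a toy expression tree rather than beignet's Immediate class or LLVM's ConstantExpr: each node is either a constant or a supported binary OP applied to two sub-expressions, and `fold` evaluates the tree bottom-up in 64-bit unsigned arithmetic. Type handling (BITCAST, TRUNC, i128, vectors, floats) and the signed OPs (DIV, REM, ASHR) are deliberately left out; `Expr`, `cst`, `bin` and `fold` are hypothetical names.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>

// Subset of the integer OPs listed above; type-changing OPs are omitted.
enum class Op { Const, Add, Sub, Mul, Shl, LShr, And, Or, Xor };

struct Expr {
  Op op;
  uint64_t value;                   // used only when op == Op::Const
  std::shared_ptr<const Expr> lhs;  // operands, used for binary OPs
  std::shared_ptr<const Expr> rhs;
};
using ExprPtr = std::shared_ptr<const Expr>;

ExprPtr cst(uint64_t v) {
  return std::make_shared<Expr>(Expr{Op::Const, v, nullptr, nullptr});
}
ExprPtr bin(Op op, ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{op, 0, std::move(a), std::move(b)});
}

// Recursively folds a nested constant expression into one 64-bit value.
uint64_t fold(const Expr& e) {
  if (e.op == Op::Const) return e.value;
  const uint64_t a = fold(*e.lhs), b = fold(*e.rhs);
  switch (e.op) {
    case Op::Add:  return a + b;
    case Op::Sub:  return a - b;
    case Op::Mul:  return a * b;
    case Op::Shl:  return b >= 64 ? 0 : a << b;  // avoid undefined over-wide shifts
    case Op::LShr: return b >= 64 ? 0 : a >> b;
    case Op::And:  return a & b;
    case Op::Or:   return a | b;
    case Op::Xor:  return a ^ b;
    default:       throw std::logic_error("unsupported op");
  }
}
```

For example, folding `bin(Op::LShr, bin(Op::Or, bin(Op::Shl, cst(7), cst(32)), cst(5)), cst(32))` yields 7, which has the same shape as the `lshr ... i128 32` example above reduced to 64 bits.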
2014-07-31	GBE: simplify processConstant.	Zhigang Gong	1	-1/+1
	Preparation to support generic constant expression. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-07-31	GBE: refactor the immediate class to support vector data type.	Zhigang Gong	4	-39/+163
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-07-30	GBE: Handle bti allocation for internal buffer used by printf.	Ruiling Song	4	-0/+23
	1. Move the bti/Register map from gbe::Context to ir::Function. 2. Use a GlobalVariable instead of a 'call' to get the base address of the internal buffer used by printf. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-30	GBE: Refine bti usage in backend & runtime.	Ruiling Song	7	-26/+61
	Previously, we simply mapped a 2G surface for memory access, which is an obvious security issue: a user could easily read or write graphics memory that does not belong to them. To prevent this, each surface is now bound to a dedicated bti, and the HW provides automatic bounds checking: out-of-bounds writes are ignored, and out-of-bounds reads simply return zero. For each load/store instruction, the patch searches through the LLVM use-def chain until it finds where the address comes from. The bti is then saved in ir::Instruction and used for later code generation. In the mixed-pointer case, a load/store may access more than one bti. To simplify the code, '0' is reserved for the constant address space and '1' for the private address space; other btis are assigned automatically by the backend. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
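The bounds-check behaviour described above can be modelled in software. This is a minimal sketch, not driver code: `BoundSurface` imitates the per-surface hardware check, and `BtiAllocator` is a hypothetical allocator that hands out BTIs after the two reserved values (0 for constant, 1 for private).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reserved BTIs as described above; others are assigned by the backend.
constexpr uint32_t kConstantBTI = 0;
constexpr uint32_t kPrivateBTI = 1;

// Hypothetical allocator: assigns BTIs sequentially after the reserved ones.
class BtiAllocator {
 public:
  uint32_t allocate() { return next_++; }

 private:
  uint32_t next_ = kPrivateBTI + 1;
};

// Software model of a surface bound to a dedicated BTI, imitating the
// hardware bounds check: out-of-bounds writes are dropped and
// out-of-bounds reads return zero.
class BoundSurface {
 public:
  explicit BoundSurface(std::size_t size) : data_(size, 0) {}

  void write(std::size_t offset, uint8_t v) {
    if (offset < data_.size()) data_[offset] = v;
  }

  uint8_t read(std::size_t offset) const {
    return offset < data_.size() ? data_[offset] : 0;
  }

 private:
  std::vector<uint8_t> data_;
};
```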
2014-07-28	GBE: align the fields in union ImageInfoKey.	Ruiling Song	2	-2/+2
	To avoid possible garbage data. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-08	Use instruction if else and endif manipulate structures	Yongjia Zhang	1	-0/+2
	Use the if, else and endif instructions to manipulate the control flow of identified if-then and if-else structures in the backend. This is not enabled yet; the patch only adds the necessary code to the backend. Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-08	Add structure identification on ir level	Yongjia Zhang	5	-9/+1462
	Add tool structures and functions for identifying if-then and if-else structures on Gen IR level. Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-07-08	Add Gen IR IF, ELSE and ENDIF	Yongjia Zhang	3	-4/+27
	Add Gen IR IF, ELSE and ENDIF to mark the structured region. Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-24	Implement the %p in the printf	Junyan He	2	-1/+5
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-24	Add the support for vector type in printf.	Junyan He	2	-71/+84
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-23	Add the format and flag support for printf.	Junyan He	2	-9/+108
	Format flags such as -+# and precision requests are now applied to the output. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-23	Add the support for %s in printf	Junyan He	2	-27/+23
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-06-23	Add %f and %c support for printf.	Junyan He	2	-36/+38
	Add %c and %f support for printf, along with int-to-float and int-to-char conversion. Some minor errors, such as wrong index flags, are also fixed. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-06-23	GBE: fix some get kernel arg info bugs.	Zhigang Gong	1	-0/+1
	We still can't handle sampler_t, which is not actually used. Access qualifiers seem broken with LLVM 3.3. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2014-06-20	GBE/runtime: fixup broken 1d array image support.	Zhigang Gong	1	-1/+1
	As the sampler LD message doesn't support an array index, we have to create a 2D array surface backed by the same buffer object. Thus each 1D array image has two surfaces bound to it: one at index and the other at 128 + index. On the kernel side, we access the corresponding 2D array surface when the LD message is required, and the original 1D array surface otherwise. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: He Junyan <junyan.he@inbox.com>
2014-06-19	Add a lock in the place of printf output	Junyan He	2	-10/+37
	If multiple threads run the kernel simultaneously, their output may interleave. Add a lock to avoid this. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
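A minimal sketch of the locking idea, assuming the output of one kernel run is already formatted into lines; `g_printf_output_mutex` and `emit_printf_output` are illustrative names, not beignet's:

```cpp
#include <mutex>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical process-wide lock guarding printf output.
std::mutex g_printf_output_mutex;

// Emits all printf lines produced by one kernel run as a single unit, so
// output from kernels finishing concurrently on other host threads cannot
// interleave with it.
void emit_printf_output(std::ostream& out, const std::vector<std::string>& lines) {
  std::lock_guard<std::mutex> guard(g_printf_output_mutex);
  for (const std::string& line : lines) out << line << '\n';
  out.flush();
}
```

Holding the lock for a whole kernel's output, rather than per line, keeps each kernel's printf block contiguous.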
2014-06-13	Add the llvm info to the function for later usage.	Junyan He	3	-5/+18
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-13	Add the support for 1D image in backend	Junyan He	5	-24/+13
	1. Delete the is3D member in the instruction class, because more than one bit is needed to represent 1D, 2D and 3D. An invalid register is now added to the IR profile, and the coordinates are compared against it to determine the dimension. 2. Rename all xxx_image to xxx_image2D to make their meaning clear. 3. Update the corresponding Sampler and Typed_Write instructions in instruction selection and Gen IR generation. v2: fix the use of InvalidRegister; use ir::ocl::invalid only. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-06-11	Add the PrintfParser llvm parser into the llvm backend.	Junyan He	2	-0/+5
	The PrintfParser runs before the LLVM Gen backend and filters out all printf function calls. When a printf call is found, it analyses the print format and its % placeholders, then replaces the printf call with STORE or CONV+STORE instructions as needed. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
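The placeholder analysis step can be sketched as a small scanner. This is a hypothetical standalone parser, not the PrintfParser pass itself (which works on LLVM IR calls). It records flags, width, precision, an OpenCL `vn` vector specifier and length modifiers, and assumes the ordering `%[flags][width][.precision][vn][length]conversion`:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One % placeholder found in a printf format string.
struct Placeholder {
  std::string flags;    // any of "-+ #0"
  int width = -1;       // -1 when absent
  int precision = -1;   // -1 when absent
  int vectorSize = 0;   // n from a "vn" specifier, 0 for scalars
  std::string length;   // length modifier such as "h", "hh", "l" or "hl"
  char conversion = 0;  // d, i, o, u, x, X, f, e, g, a, c, s, p, ...
};

// Collects the placeholders of a format string; "%%" is treated as a literal.
std::vector<Placeholder> parse_format(const std::string& fmt) {
  std::vector<Placeholder> result;
  const std::string flagChars = "-+ #0";
  auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
  std::size_t i = 0;
  while (i < fmt.size()) {
    if (fmt[i] != '%') { ++i; continue; }
    std::size_t j = i + 1;
    if (j < fmt.size() && fmt[j] == '%') { i = j + 1; continue; }  // literal "%%"
    Placeholder p;
    while (j < fmt.size() && flagChars.find(fmt[j]) != std::string::npos) p.flags += fmt[j++];
    if (j < fmt.size() && isDigit(fmt[j])) {
      p.width = 0;
      while (j < fmt.size() && isDigit(fmt[j])) p.width = p.width * 10 + (fmt[j++] - '0');
    }
    if (j < fmt.size() && fmt[j] == '.') {
      ++j;
      p.precision = 0;
      while (j < fmt.size() && isDigit(fmt[j])) p.precision = p.precision * 10 + (fmt[j++] - '0');
    }
    if (j < fmt.size() && fmt[j] == 'v') {
      ++j;
      while (j < fmt.size() && isDigit(fmt[j])) p.vectorSize = p.vectorSize * 10 + (fmt[j++] - '0');
    }
    while (j < fmt.size() && (fmt[j] == 'h' || fmt[j] == 'l')) p.length += fmt[j++];
    if (j >= fmt.size()) break;  // malformed: format ends inside a placeholder
    p.conversion = fmt[j];
    result.push_back(p);
    i = j + 1;
  }
  return result;
}
```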
2014-06-11	Add the PrintfSet class into the ir	Junyan He	2	-0/+316
	The PrintfSet collects all the printf information in the kernel. After the kernel has executed, it is used to generate the corresponding printf output. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-11	Add two special register for printf output buffer usage	Junyan He	2	-2/+7
	printfiptr holds the printf index buffer pointer in the curbe, and printfbptr holds the printf output buffer pointer in the curbe. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-06-11	GBE: support SLM bool load and store.	Zhigang Gong	1	-1/+0
	The OCL spec does allow the use of an i1/BOOL SLM variable, so we have to support loading and storing it. To keep things simple, S16 is used to represent an i1 value. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-06-04	GBE: Optmize phi elimination	Ruiling Song	1	-0/+11
	During phi elimination, we simply insert 3 MOVs for each phi instruction to avoid the lost-copy problem. In fact, only two of them are needed most of the time. This patch checks whether the move from phiCopy to phi can be avoided: if phiCopy and phi have no live-range interference, they can be coalesced, eliminating one instruction. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
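The coalescing decision can be illustrated with a simplified interval model. This sketch assumes straight-line live ranges (half-open [start, end) instruction indices) instead of per-block liveness, and assumes the MOVs are one copy into phiCopy per incoming value plus the final phiCopy -> phi MOV, which gives three for a two-input phi. `LiveRange`, `interferes` and `movs_for_phi` are hypothetical names:

```cpp
// Simplified live range: half-open interval [start, end) of instruction indices.
struct LiveRange {
  int start;
  int end;
};

// Two live ranges interfere if their intervals overlap.
bool interferes(const LiveRange& a, const LiveRange& b) {
  return a.start < b.end && b.start < a.end;
}

// One MOV into phiCopy per incoming value, plus the phiCopy -> phi MOV,
// which can be dropped when the two registers can share storage.
int movs_for_phi(int incomingValues, const LiveRange& phiCopy, const LiveRange& phi) {
  return incomingValues + (interferes(phiCopy, phi) ? 1 : 0);
}
```

With two incoming values this gives 3 MOVs when the ranges overlap and 2 when they can be coalesced.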
2014-06-04	Revert "GBE: No need to compute liveout again in value.cpp."	Ruiling Song	1	-0/+33
	We need to transfer ValueDef from predecessors to their successors. Consider a register defined in BB0 and used in BB3: we need to iterate over liveout to pass the def in BB0 to BB3, so that the use in BB3 gets the correct def. Otherwise, the UD/DU graph is incomplete. This reverts commit 89b490b5a17cfda2d9816dc1c246ce5bbff12648. Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
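The propagation can be sketched as a fixed-point iteration over the CFG. This is a hypothetical model, not value.cpp: definitions are strings, each block records its last definition per register, and only registers in a block's live-out set are passed on to successors, which is what lets the def in BB0 reach the use in BB3:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Reg = int;
using Def = std::string;  // e.g. "BB0:r1", identifies one definition
using DefMap = std::map<Reg, std::set<Def>>;

struct Block {
  std::vector<std::size_t> preds;  // predecessor block indices
  std::map<Reg, Def> lastDef;      // last definition of each register in the block
  std::set<Reg> liveOut;           // registers live at block exit
};

// Iterates to a fixed point, passing each live-out register's definitions
// from a block to its successors. Returns the defs reaching each block entry.
std::vector<DefMap> reaching_defs_at_entry(const std::vector<Block>& blocks) {
  std::vector<DefMap> in(blocks.size()), out(blocks.size());
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < blocks.size(); ++b) {
      DefMap newIn;
      for (std::size_t p : blocks[b].preds)
        for (const auto& [reg, defs] : out[p]) newIn[reg].insert(defs.begin(), defs.end());
      DefMap newOut;
      for (Reg reg : blocks[b].liveOut) {
        auto local = blocks[b].lastDef.find(reg);
        if (local != blocks[b].lastDef.end()) newOut[reg] = {local->second};
        else if (auto it = newIn.find(reg); it != newIn.end()) newOut[reg] = it->second;
      }
      if (newIn != in[b] || newOut != out[b]) {
        in[b] = std::move(newIn);
        out[b] = std::move(newOut);
        changed = true;
      }
    }
  }
  return in;
}
```

Dropping a register from any intermediate block's live-out set cuts the chain, which mirrors the incomplete UD/DU graph the revert fixes.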
2014-05-28	separate runtime(libcl.so) and compiler(libgbe.so)	Guo Yejun	2	-30/+33
	On embedded/handheld devices, storage and memory are scarce, so it is necessary to provide the OpenCL runtime library on its own at a small size; only executable binary kernels are supported on such devices. At process start (before main), the OpenCL runtime (libcl.so) tries to load the compiler (libgbe.so). If loading succeeds, the system behaves as before. Otherwise, the runtime assumes there is no OpenCL compiler in the system and changes the device info to CL_DEVICE_COMPILER_AVAILABLE=false and CL_DEVICE_PROFILE="EMBEDDED_PROFILE"; following the OpenCL spec, clBuildProgram returns CL_COMPILER_NOT_AVAILABLE if the program was created with clCreateProgramWithSource. To simulate the case without an OpenCL compiler, just delete libgbe.so or export OCL_NON_COMPILER=1. Some explanation of the binary kernel interpreter (libinterp.a): libinterp.a interprets binary kernels inside the runtime, and the runtime library libcl.so is built against it. Since the code that interprets binary kernels is tightly integrated with the compiler, a new file, gbe_bin_interpreter.cpp, includes some other .cpp files to avoid code duplication. To keep libinterp.a (and therefore libcl.so) small, the macro GBE_COMPILER_AVAILABLE ensures that only the needed code is active when building libinterp.a. V2: the code base was changed to call gbe_set_image_base_index in gbe_bin_generater, while this patch changes that function into gbe_set_image_base_index_compiler; fix the call accordingly. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
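The device-info decision described above can be sketched as plain logic. In this hypothetical model, `compilerLoaded` stands for the result of trying to load libgbe.so, `nonCompilerEnv` for the value of OCL_NON_COMPILER (treating only "1" as "disable" is an assumption), and the non-embedded profile string is assumed to be "FULL_PROFILE". The error-code values match the Khronos cl.h header, but the constant names are local to the sketch:

```cpp
#include <cstring>

// Same values as CL_SUCCESS and CL_COMPILER_NOT_AVAILABLE in CL/cl.h.
constexpr int kClSuccess = 0;
constexpr int kClCompilerNotAvailable = -3;

struct CompilerInfo {
  bool compilerAvailable;  // reported as CL_DEVICE_COMPILER_AVAILABLE
  const char* profile;     // reported as CL_DEVICE_PROFILE
};

// Hypothetical decision logic run once at startup.
CompilerInfo select_compiler_info(bool compilerLoaded, const char* nonCompilerEnv) {
  const bool forcedOff =
      nonCompilerEnv != nullptr && std::strcmp(nonCompilerEnv, "1") == 0;
  if (compilerLoaded && !forcedOff) return {true, "FULL_PROFILE"};
  return {false, "EMBEDDED_PROFILE"};
}

// Models only the compiler-availability check of clBuildProgram for a
// program created with clCreateProgramWithSource.
int build_from_source(const CompilerInfo& info) {
  return info.compilerAvailable ? kClSuccess : kClCompilerNotAvailable;
}
```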
2014-05-23	GBE: fix a uniform analysis bug.	Zhigang Gong	2	-20/+46
	If a value is defined in a loop and used outside the loop, it cannot be a uniform (scalar) value, because it may be assigned different scalar values on different lanes when the loop is re-entered with different lanes active. Thanks to Yang Rong for reporting this bug. Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
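The rule can be sketched as a standalone check, assuming blocks are identified by integers and the loop's block set is already known; `must_stay_varying` is a hypothetical name:

```cpp
#include <set>
#include <vector>

// A value defined inside the loop and used outside it may hold different
// values per lane (the loop can be re-entered with a different set of
// active lanes), so it must not be treated as uniform.
bool must_stay_varying(int defBlock, const std::vector<int>& useBlocks,
                       const std::set<int>& loopBlocks) {
  if (loopBlocks.count(defBlock) == 0) return false;
  for (int use : useBlocks)
    if (loopBlocks.count(use) == 0) return true;
  return false;
}
```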
2014-05-19	HSW: Workaround the slm address issue.	Yang Rong	2	-2/+4
	Each work group has its own SLM offset, and on IVB the TSG handles it automatically when dispatching threads. This fails on HSW: every work group's SLM offset is 0, even though the SLM index in R0.0 is correct. So calculate the SLM offset from the SLM index and add it to the SLM address. TODO: find the root cause. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Junyan He <junyan.he@inbox.com>
2014-05-14	GBE: fix one regression caused by uniform analysis.	Zhigang Gong	1	-1/+7
	Some instructions handle simd1 incorrectly. Disable them for now. v2: add addsat to the unsupported list. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-05-13	GBE: enable uniform analysis for bool data type.	Zhigang Gong	1	-2/+1
	v2: refine the flag allocation implementation. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-05-13	GBE: enable uniform for load instruction.	Zhigang Gong	1	-2/+3
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-05-13	GBE: implement uniform analysis.	Zhigang Gong	3	-0/+20
	We have many uniform (scalar) input values, including the kernel input arguments and some special registers, and every variable derived only from uniform values is also uniform. This patch identifies such registers at the liveness analysis stage and changes their type to a scalar type, so that they need less register space later. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
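A minimal sketch of the propagation, assuming a flat instruction list where each instruction defines one register from its sources. It works pessimistically from the registers known to vary per lane (e.g. local IDs); every other register, including kernel arguments, ends up uniform. It is not the liveness-stage implementation and ignores control flow, such as the loop case fixed on 2014-05-23. `Insn` and `varying_registers` are hypothetical names:

```cpp
#include <set>
#include <vector>

// One IR instruction: a destination register computed from source registers.
struct Insn {
  int dst;
  std::vector<int> srcs;
};

// Marks every register defined from a varying source as varying, until a
// fixed point. Registers not in the result are uniform.
std::set<int> varying_registers(const std::vector<Insn>& insns, std::set<int> varying) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Insn& insn : insns) {
      if (varying.count(insn.dst) != 0) continue;
      for (int src : insn.srcs) {
        if (varying.count(src) != 0) {
          varying.insert(insn.dst);
          changed = true;
          break;
        }
      }
    }
  }
  return varying;
}
```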
2014-05-13	GBE: No need to compute liveout again in value.cpp.	Zhigang Gong	1	-33/+0
	We already do a complete liveness analysis in liveness.cpp, so there is no need to do it again. This saves about 10% of the compile time. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2014-05-09	do not serialize zero image/sampler info into binary	Guo Yejun	2	-0/+5
	If no image/sampler is used in the kernel source, there is no need to serialize zero image/sampler info into the kernel binary. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-04-23	GBE: fixed the undefined phi value's liveness analysis.	Zhigang Gong	4	-4/+9
	If a phi component is undef from one of the predecessors, we should not pass it as one of that predecessor's liveout registers. Otherwise, the phi register's liveness may be extended to basic block zero, which is undesirable. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-04-22	support __gen_ocl_simd_any and __gen_ocl_simd_all	Guo Yejun	2	-0/+6
	short __gen_ocl_simd_any(short x): if x is non-zero in any of the active threads of the same SIMD, the return value for all these threads is non-zero; otherwise, zero is returned. short __gen_ocl_simd_all(short x): only if x is non-zero in all of the active threads of the same SIMD is the return value for all these threads non-zero; otherwise, zero is returned. For example, to check whether a specific value exists in a global buffer, one SIMD can search in parallel, and the whole SIMD can stop as soon as the value is found. The key kernel code looks like: for(; ; ) { ... if (__gen_ocl_simd_any(...)) break; //the whole SIMD stops searching } Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
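A host-side emulation of the two built-ins' semantics for one SIMD group, useful for reasoning about the search loop above. The lane vectors are a modelling assumption, and returning 1 for "not zero" is a choice of this sketch, since only a non-zero value is promised; `simd_any` and `simd_all` are hypothetical host functions, not the kernel built-ins:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// x[i] is lane i's argument; active[i] says whether lane i is enabled.
// Every active lane would receive the same result.
int16_t simd_any(const std::vector<int16_t>& x, const std::vector<bool>& active) {
  for (std::size_t i = 0; i < x.size(); ++i)
    if (active[i] && x[i] != 0) return 1;
  return 0;
}

int16_t simd_all(const std::vector<int16_t>& x, const std::vector<bool>& active) {
  for (std::size_t i = 0; i < x.size(); ++i)
    if (active[i] && x[i] == 0) return 0;
  return 1;  // also 1 when no lane is active (vacuous truth in this model)
}
```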
2014-04-08	GBE: Add two helper scalar registers to hold 0 and all 1s.	Zhigang Gong	2	-4/+8
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2014-04-08	GBE: Don't need the emask/notemask/barriermask any more.	Zhigang Gong	2	-9/+3
	As we change to use if/endif and change the implementation of the barrier, we don't need to maintain emask/notmask/barriermask any more. Just remove them. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>