~gongzg/beignet - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2015-12-21	Backend: Fix a memory leak for structurizer.HEAD master	Junyan He	1	-5/+4
	In structurizer, the useless instruction is just be erased from block. The iintrusive_list::erase() just unlink the instruction, but not free its resource. We should use remove() to deallocate the instruction object. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-21	add support for build option -cl-fast-relaxed-math	Guo Yejun	6	-13/+35
	current code handles this option at clang level, actually, it is also necessary at LLVM -> GEN stage. V2: check if options is NULL Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	fix debug instruction welform assert.	Luo Xionghu	1	-1/+7
	handle llvm.fabs.f16 instruction in case compiler_half_isnan. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	fix workgroup_broadcast instruction debug mode assert.	Luo Xionghu	1	-1/+1
	the workgoup related parameter of the workgroup_broadcast function is FAMILY_QWORD. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Utest: Add a bitonic sort test for non-constant extractelement	Pan Xiuli	3	-1/+94
	This test case is for the new added non-constand index extractelement path in llvm_scalarize pass. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Implement the non-constant extractelement scalarize	Pan Xiuli	1	-7/+39
	The maybe non-constant index for the extractelement inst, it was not implemented in the llvm_scalarize. Now provide an implemention by allocating a new vector and storing all the component in it. Then can get the needed component by GEP inst. V2: Remove debuginfo and fix map insert bug Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Utests: Add test cases for reduce add.	Junyan He	2	-0/+81
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Add reduce add to gen_context.	Junyan He	1	-2/+22
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Utests: Add test cases for workgroup reduce max/min.	Junyan He	3	-0/+203
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Runtime: Add the threadid calculation for curbe.	Junyan He	1	-1/+11
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Implement reduce min and max in gen_context	Junyan He	1	-5/+279
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Add state register into schedule consideration.	Junyan He	1	-4/+12
	Because the workgroup OP has forwarding msg and wait functions, it needs all the threads to sync with each other. It has very similar behavior as BARRIER, so we add it into schedule consideration accordingly. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Add WORKGROUP_OP instruction selection.	Junyan He	6	-0/+41
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Add forward message function for gen encoder.	Junyan He	2	-0/+14
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Establishing the thread/TID-EUID map.	Junyan He	2	-8/+151
	We need to use forward message to send data and sync threads within the same work group. The HW lack the feature to get the TID and EUID of other threads. So we need to establish a map for this usage. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	libocl: Refine the workgroup functions, add signed info.	Junyan He	1	-57/+57
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Add threadid as a curbe register.	Junyan He	4	-3/+7
	We need to build a map between logical workgroup thread and real hw thread. This curbe will hold the logical workgroup thread ID when it is executed on some HW thread. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Add tidMapSLM and wgBroadcastSLM to each function.	Junyan He	3	-9/+20
	We will use SLM to store the value to broadcast and the map between real hw thread and logical workgroup thread. These two values give the offset in the SLM. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Backend: Add sr0 reg helper function.	Junyan He	2	-7/+11
	sr0 is used to specify the state reigster where we can get the state of each EU thread. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	add Broxton support	Guo Yejun	12	-12/+213
	special versions of linux kernel and libdrm are needed. utest and conformance test PASSED. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2015-12-10	fix LLVM 3.5 fail.	Luo Xionghu	1	-0/+14
	aslo call llvm::WriteBitcodeToFile for -dump-spir-binary option under LLVM 3.5. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	utests: add an utest for mix	Pan Xiuli	3	-1/+56
	Add a testcase for compiler mix. Since mix will have error, we take err limit as 1e-3 and print the max err. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	Backend: refine mix with hardware lrp function	Pan Xiuli	13	-1/+37
	EU support lrp function that simillar to mix, but only with float, so refine only float related mix with lrp. There will be little errors whit mix now with lrp. V2: Rebase the patch Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	utest: add utest to generate spir binary from beignet.	Luo Xionghu	1	-0/+99
	include case compile_spir_binary and build_spir_binary. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	backend: enable option -dump-spir-binary to generate SPIR binary from beignet.	Luo Xionghu	1	-5/+22
	allow user to pass "-dump-spir-binary=[file_name]" to generate the spir binary to the file. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	runtime: add macro DEBUGP() to handle debug printf.	Ruiling Song	2	-3/+9
	It will output debug message under debug mode, and will not output under release mode. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	Utest: Add test for get_global/local_linear_id	Pan Xiuli	5	-1/+183
	Add two utest case test for OCL2.0 new work-item built-in functions. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	utest: write to dst buffer to fix utest failure	Ruiling Song	1	-0/+2
	the utest will check on this. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10	libocl: Add three work-item built-in function	Pan Xiuli	2	-0/+33
	Add get global/local linear id by calculate with global/local id, size and offset. The get_queue_local_size() and get_loal_size() should be different when the global work group size is not uniform, but now they are the same. We will refine these functions when we support non-uniform work-group size. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2015-12-09	SKL: Use kernel-defined MOCS values instead of assuming hardware defaults.	Francisco Jerez	1	-2/+2
	Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	fix gcc build error.	Luo Xionghu	2	-2/+2
	this link fail appears on gcc 5.2.1. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	GBE: implement pre-register-allocation instruction scheduling.	Zhigang Gong	1	-21/+116
	To find out an instruction scheduling policy to achieve the theoretical minimum registers required in a basic block is a NP problem. We have to use some heuristic factor to simplify the algorithm. There are many researchs which indicate a bottom-up list scheduling is much better than the top-down method in turns of register pressure. I choose one of such research paper as our target. The paper is as below: "Register-Sensitive Selection, Duplication, and Sequencing of Instructions" It use the bottom-up list scheduling with a Sethi-Ullman label as an heuristic number. As we will do cycle awareness scheduling after the register allocation, we don't need to bother with cycle related heuristic number here. I just skipped the EST computing and usage part in the algorithm. It turns out this algorithm works well. It could reduce the register spilling in clBlas's sgemmBlock kernel from 83+ to only 20. Although this scheduling method seems to be lowering the ILP(instruction level parallism). It's not a big issue, because we will allocate as much as possible different registers in the following register allocation stage, and we will do a after allocation instruction scheduling which will try to get as much ILP as possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-09	change the sampler type value to keep same with spir spec.	Luo Xionghu	1	-17/+16
	these value are defined in 2.1.3.1 @ spir_spec-1.2.pdf. v2: include new patch. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	gbe/libocl: define the gentype half_xxx math function instead of using MACRO.	Luo Xionghu	3	-29/+70
	these function need to be overloadable for link. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	gbe: add vec_type_hint's type into functionAttributes.	Luo Xionghu	3	-7/+104
	for SPIR kernel, user may call clGetKernelInfo with CL_KERNEL_ATTRIBUTES to query the functionAttributes. v2: remove debug code. v3: fix llvm 3.5 build fail. v4: only llvm-3.6 or later has kernel_arg_base_type in metadata. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: add missing supported format image_1d_buffer.	Luo Xionghu	1	-0/+1
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: fix clLinkProgram bug.	Luo Xionghu	1	-0/+10
	clLinkProgram need check the existence of "-cl-kernel-arg-info" build_option of all the input_programs. User may link two SPIR program and call clGetKernelArgInfo to query kernel args. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: fix clCompileProgram bug.	Luo Xionghu	3	-13/+6
	forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is BINARY_TYPE_INTERMIDIATE if create from SPIR binary. v2: refine the source_type logic: source_type is already set in clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set in clBuildProgram or clCompileProgram. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: initialize the memory content to 0.	Luo Xionghu	1	-0/+1
	v2: move the memset inside the pointer check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	gbe/libocl: change xxx_fence function to OVERLOADABLE.	Luo Xionghu	2	-6/+6
	the mem_fence, read_mem_fence and write_mem_fence functions need be OVERLOADABLE for link. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	gbe: use kernel_arg_base_type to recognize image arguments.	Luo Xionghu	2	-7/+21
	we use string compare function to recognize the image type, but image types could be defined to other type, so use kernel_arg_base_type to recognize image arguments. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	gbe/libocl: define the vloada_xxx function instead of using MACRO.	Luo Xionghu	2	-8/+26
	these function need to be overloadable for link. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	add utest to demo how to run CM kernerl via OpenCL APIs	Guo Yejun	3	-0/+280
	In this test case, the CM kernel is in VISA binary format, not in GenX Binary format, it means that the CM jitter is needed to compile the CM kernel from VISA format to GenX format, please refer to cmrt_package_path/jitter/readme.txt to prepare the jitter. v2: add comments about the CM jitter Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	make Beignet as intermedia layer of CMRT	Guo Yejun	15	-14/+497
	CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-02	GBE/DebugInfo: Print line and column NO. with ASM	Bai Yannan	1	-0/+7
	Print line and column numbers with ASM, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02	GBE/DebugInfo: Pass debug info : SEL IR => GenInsn	Bai Yannan	4	-1/+23
	1. Add a structure DBGInfo in GenEncoder class, storing debug infomation for subsequentlt passing to GenInsn. 2. Add a vector<DebugInfo> in GenEncoder class, storing debug information corresponding to vector<GenInstruction>. 3. Pass debug information from SEL IR firstly to Gen Encoder, then add to vector when emitting, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02	GBE/DebugInfo: Pass debug info : GEN IR => SEL IR	Bai Yannan	2	-0/+14
	1. Add a DebugInfo type structure DBGInfo in Opaqueue class, storing debug infomation in selection for subsequentlt passing to selection IR. 2. Add a DebugInfo type structure DBGInfo in SelectionInstruction class, storing debug infomation. 3. Pass debug information from GEN IR firstly to selection queue, then pass to selection IR when emitting, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02	GBE/DebugInfo: Pass debug info :llvm IR => GEN IR	Bai Yannan	4	-1/+25
	1. Add a DebugInfo type structure DBGInfo into context class, storing debug information. 2. Add a DebugInfo type structure DBGInfo in Instruction class, storing debug infomation. 3. Pass debug information firstly from llvm IR to Context, then to GEN IR when emiting, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Signed-off-by: Meng Lv <meng.lv@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02	GBE/DebugInfo: Enable new feature	Bai Yannan	2	-0/+7
	1. Add a bool env OCL_DEBUGINFO to enable generate debug infomation while compiling cl source, set it false as default value. 2. Define a structure type DebugInfo. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-26	check image from buffer's base address alignment.	Luo Xionghu	1	-0/+9
	per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>