~ldeks/beignet - Personal branch of beignet.

Age	Commit message (Collapse)	Author	Files	Lines
2015-05-21	Minor fixes for Fedora. Don't upstream.HEAD master	Laura Ekstrand	2	-0/+6

2015-05-19	Remove some LGPL incompatible code.	Zhigang Gong	4	-1449/+0
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-18	Docs: update/clarify Haswell issues	Rebecca N. Palmer	2	-19/+35
	Reflect recent beignet and Linux changes. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18	Add a sanity test in clGetDeviceIDs	Rebecca N. Palmer	2	-0/+84
	Run a small __local-using kernel in clGetDeviceIDs; if this returns the wrong result, return CL_DEVICE_NOT_FOUND. As far as I can see, there's no way to tell in advance (except unreliably with a global version check) whether __local-using batches will be accepted...so the easiest solution is probably to just try running one and see what result we get. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18	Fix a indirect register bug.	Yang Rong	1	-1/+2
	Must init the fields indirect register used. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-15	CHV: Fix a chv long convert bug.	Yang Rong	1	-4/+4
	When convert byte/short/int to long, the temp regiser type signed or not is decided by src type, not dst type. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15	GBE: remove unnecessary assert	Ruiling Song	2	-2/+0
	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15	GBE: Fix the immediate data type	Ruiling Song	2	-7/+7
	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15	GBE: make all memory operation share same bti dependency.	Ruiling Song	1	-22/+17
	As we are going to support dynamic bti, it is impossible to add the bti dependency. so just use one bti dependency, that is to say, we don't change the memory instruction sequence in instruction scheduler. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15	correct the src output of alu3 when OCL_OUTPUT_ASM=1	Guo Yejun	1	-3/+12
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-05-15	add environment variable OCL_OUTPUT_KERNEL_SOURCE.	Luo Xionghu	2	-0/+12
	export the variable to 1 to view the building or compiling kernel's source code. By default, it is false and GBE will not print any code. v2: also output the build options if not empty. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-05-15	Add stuct argument indirect load test.	Yang Rong	5	-5/+63
	1. Enable compiler_argument_structure_indirect. 2. Add compiler_argument_structure_indirect, which has select address and load argument instruction. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15	Add Indirect struct argument read support.	Yang Rong	8	-36/+257
	The steps to handle Indirect argument read: 1. Find out all indirect loads and its address caculation. 2. Add INDIRECT_MOV IR instruction, replace load to INDIRECT_MOV. 3. Replace the bass address and offset ADD instruction to offset MOV instruction. Could optimize. V2: use a tmp uw register to calc offset for indirect move. V3: tmp can't be uniform, because exec width is not 1 when uniform. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-13	add utest for intel_sub_group_shuffle	Guo Yejun	3	-1/+65
	v2: correct kernel to be suitable for simd_width both 8 and 16 Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-12	rename __gen_ocl_simd_any/all to sub_group_any/all	Guo Yejun	12	-47/+43
	it is defined in https://www.khronos.org/registry/cl/extensions/intel/cl_intel_subgroups.txt Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12	add sub group functions intel_sub_group_shuffle	Guo Yejun	9	-3/+81
	floatN intel_sub_group_shuffle(floatN x, uint c); intN intel_sub_group_shuffle(intN x, uint c); uintN intel_sub_group_shuffle(uintN x, uint c); the value of x of the c-th channel of the SIMD is returned, for all SIMD channels, the behavior is undefined if c is larger than simdsize - 1 Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12	rename __gen_ocl_get_simd_id/size to get_sub_group_id/size	Guo Yejun	10	-26/+31
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-12	GBE: fix LOD initialization for typed write instruction.	Zhigang Gong	1	-1/+1
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-12	add introduction to build Beignet with yocto	Guo Yejun	2	-4/+73
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12	Fix two argument lowering bug.	Yang Rong	1	-4/+7
	1. When there are some mismatch AddImm/Load, may be a indirect load, should return false. 2. Can't remove load in REMOVE_INSN, because load's use is not empty. Force remove it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-04	Revert "CL/Driver: enable atomics in L3 for HSW."	Zhigang Gong	2	-14/+1
	This issue should be fixed in KMD in the future. Let's revert it here. This reverts commit ef7127c03bd533277afc443b335c37a69927250a.
2015-05-04	add benckmark for copy data from buffer to image.	Luo Xionghu	3	-1/+68
	v2: use random input data; update comments. v3: change the image attribute to __write_only. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04	Optimization of clEnqueueCopyBufferToImage for 16 aligned case.	Luo Xionghu	4	-9/+56
	We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04	Chv: Add chv backend support.	Yang Rong	5	-10/+193
	The chv's backend is almost same as bdw. But some long register restrictions: 1. ARF registers must never be used with 64b datatype. 2. Source and Destination horizontal stride must be aligned to the same qword. 3. Source and Destination offset must be the same, except the case of scalar source. Add ChvContent in gen8_context.cpp to handle it. The chv's encoder is same as Gen8Encoder. V2: Fix sz a typo in function ChvContext::setA0Content when rebase. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-04	CHV: Add cherryview support in the runtime.	Meng Mengmeng	7	-7/+71
	Cherryview's EU configurations is not decided by pciid, must get from kernel by libdrm. Thanks for Jeff adding this support in the kernel and libdrm. V2: Add the warning when can't get configurations. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-30	utests: fix test case builtin_tgamma.	Rebecca N. Palmer	1	-3/+16
	Compare with tgamma instead of tgammaf for better accuracy. Include negative inputs, and handle the resulting denormals. Print maximum error found. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-30	Allow building with Python 3	Rebecca N. Palmer	2	-12/+14
	Make the build scripts work in both Python 2 and Python 3. (CMake prefers Python 2 if both are available, but will use Python 3 if only it is installed.) Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-29	Make tgamma meet the accuracy standard.	Rebecca N. Palmer	1	-7/+89
	The old tgamma=exp(lgamma) implementation had high rounding error on large outputs, exceeding the 16ulp specification for approx. x>8 (hence the test failure in strict conformance mode). Replace this with an implementation based on glibc's http://sources.debian.net/src/glibc/2.19-17/sysdeps/ieee754/flt-32/e_gammaf_r.c/ Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-29	utest_pow: don't fail on declared lack of denormals.	Rebecca N. Palmer	1	-2/+8
	0.01**20.5 is denormal; at least Ivy Bridge does not support denormals and hence returns 0. As this is allowed by the OpenCL standard, it shouldn't fail the test. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-24	add utest for __gen_ocl_get_simd_id	Guo Yejun	3	-1/+43
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-24	add simd level function __gen_ocl_get_simd_id	Guo Yejun	13	-3/+46
	uint __gen_ocl_get_simd_id(); return value ranges from 0 to simdsize - 1 V2: use function sel.selReg to refine code V3: correct the uniform condition in liveness.cpp Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-24	add utest for __gen_ocl_get_simd_size	Guo Yejun	3	-1/+39
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-24	add simd level function __gen_ocl_get_simd_size	Guo Yejun	12	-1/+140
	uint __gen_ocl_get_simd_size(); returns 8 if SIMD8, returns 16 if SIMD16 V2: add missing files remove some unnecessary functions V3: correct the dst register setting, it is possible not uniform V4: remove unnecessary function Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-15	Doc: update known issue for the store/load pointer issue.	Zhigang Gong	1	-0/+11
	We will defer the fix of this known issue to 1.1.0. Let's document it before that. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-04-14	Kill the A0 cache in GenContext.	Junyan He	3	-82/+24
	The a0 value cache in Gencontext can just hold the value in compiling time, which may be different with the true offset value in run time when the code generates the backward jump. So just kill the cache of a0 and we will use load vector instruction to optimize it lader. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-14	Doc: update cmd parser issue for HSW platforms.	Zhigang Gong	1	-5/+8
	For HSW platform, due to the atomic in L3 related registers' usage, we always need to disable the cmd parser. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-04-14	GBE: should initialize useDWLabel to false by default.	Zhigang Gong	1	-1/+1
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-14	Doc: add a command to install dependencies.	Zhigang Gong	1	-0/+6
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-04-14	utests: don't continue to run any case when fail to initialize device.	Zhigang Gong	1	-1/+5
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-14	runtime: don't try to open nonexistent render nodes or device files.	Zhigang Gong	1	-1/+4
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	BDW: Refine unpacked_ud in the gen8_context.cpp.	Yang Rong	1	-16/+19
	Add a function unpacked_ud to handle unpacked_ud from long. Also fix a in calculateFullU64MUL when uniform register and offset!=0. V2: Refined the git log. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-13	GBE: fix a bug in byte scatter write.	Zhigang Gong	1	-5/+11
	In uniform mode, we should set simd width to 1 and set noMask bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: fix an potential assertion in constant expanding pass.	Zhigang Gong	1	-1/+1
	Using the inserPos is good enough. If using --insertPos, there is one potential issue when the insertPos is the head of a list then it will trigger an assertion. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-13	GBE: correct the instruction replacement logic in scalarize pass.	Zhigang Gong	1	-9/+34
	When we want to delete an old instruction and replace it with the new one, we only call the LLVM IR's replace function which is not sufficient for the scalarize pass, as we also keep some local reference int eh vecVals map. We need to replace all of those local reference also. Otherwise, the deleted values may be used in the subsequent instructions which causes fatal error latter. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: refine error handling for private libva buffer sharing extension.	Zhigang Gong	2	-0/+11
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: Use actual bti information to determine a pointer's addressspace.	Zhigang Gong	5	-8/+23
	Due to the private constant buffer support, it introduces private address space mixed with constant address space some time. And more generic, one constant address space may be located in private address space in LLVM IR layer. Such as the following code: __kernel ... { const int2 foo[] = {{0, 1}, {2, 3}}; int2 data = foo[get_global_id(0) % 2]; } The foo is in private address space but we finally will use __constant bti to access it in Gen backend. The the above code will cause a assertion fail in gen insturcion selection stage, because it generate a vector loading instruction on a __constant buffer. So we should use the actual BTI data to determine one pointer's address space rather than get it from the LLVM IR layer. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: Extend front label ip to 32 bit on demand.	Zhigang Gong	10	-29/+128
	If the front end label ip exceed 0xffff, then the backend will use real DW to represent each block's IP address. This is a dynamic behaviour according to the actual front end's label number. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: don't type cast register/labelindex to integer.	Zhigang Gong	3	-33/+32
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: extend backend label to 32 bit.	Zhigang Gong	3	-15/+15
	The front end label is still 16 bit. But the auxiliary label could be larger than that. This is the preparation to support 32 bit label for both front end and backend. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: extend registers/tuples/immediates to 32bit wide.	Zhigang Gong	7	-29/+30
	For some extremly large kernel, these values may be larger than 0xFFFF, we have to extend them to 32 bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>