summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-05-21Minor fixes for Fedora. Don't upstream.HEADmasterLaura Ekstrand2-0/+6
2015-05-19Remove some LGPL incompatible code.Zhigang Gong4-1449/+0
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-18Docs: update/clarify Haswell issuesRebecca N. Palmer2-19/+35
Reflect recent beignet and Linux changes. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18Add a sanity test in clGetDeviceIDsRebecca N. Palmer2-0/+84
Run a small __local-using kernel in clGetDeviceIDs; if this returns the wrong result, return CL_DEVICE_NOT_FOUND. As far as I can see, there's no way to tell in advance (except unreliably with a global version check) whether __local-using batches will be accepted...so the easiest solution is probably to just try running one and see what result we get. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18Fix a indirect register bug.Yang Rong1-1/+2
Must init the fields indirect register used. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-15CHV: Fix a chv long convert bug.Yang Rong1-4/+4
When convert byte/short/int to long, the temp regiser type signed or not is decided by src type, not dst type. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15GBE: remove unnecessary assertRuiling Song2-2/+0
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15GBE: Fix the immediate data typeRuiling Song2-7/+7
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15GBE: make all memory operation share same bti dependency.Ruiling Song1-22/+17
As we are going to support dynamic bti, it is impossible to add the bti dependency. so just use one bti dependency, that is to say, we don't change the memory instruction sequence in instruction scheduler. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15correct the src output of alu3 when OCL_OUTPUT_ASM=1Guo Yejun1-3/+12
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-05-15add environment variable OCL_OUTPUT_KERNEL_SOURCE.Luo Xionghu2-0/+12
export the variable to 1 to view the building or compiling kernel's source code. By default, it is false and GBE will not print any code. v2: also output the build options if not empty. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-05-15Add stuct argument indirect load test.Yang Rong5-5/+63
1. Enable compiler_argument_structure_indirect. 2. Add compiler_argument_structure_indirect, which has select address and load argument instruction. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-15Add Indirect struct argument read support.Yang Rong8-36/+257
The steps to handle Indirect argument read: 1. Find out all indirect loads and its address caculation. 2. Add INDIRECT_MOV IR instruction, replace load to INDIRECT_MOV. 3. Replace the bass address and offset ADD instruction to offset MOV instruction. Could optimize. V2: use a tmp uw register to calc offset for indirect move. V3: tmp can't be uniform, because exec width is not 1 when uniform. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-13add utest for intel_sub_group_shuffleGuo Yejun3-1/+65
v2: correct kernel to be suitable for simd_width both 8 and 16 Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-12rename __gen_ocl_simd_any/all to sub_group_any/allGuo Yejun12-47/+43
it is defined in https://www.khronos.org/registry/cl/extensions/intel/cl_intel_subgroups.txt Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12add sub group functions intel_sub_group_shuffleGuo Yejun9-3/+81
floatN intel_sub_group_shuffle(floatN x, uint c); intN intel_sub_group_shuffle(intN x, uint c); uintN intel_sub_group_shuffle(uintN x, uint c); the value of x of the c-th channel of the SIMD is returned, for all SIMD channels, the behavior is undefined if c is larger than simdsize - 1 Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12rename __gen_ocl_get_simd_id/size to get_sub_group_id/sizeGuo Yejun10-26/+31
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-12GBE: fix LOD initialization for typed write instruction.Zhigang Gong1-1/+1
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-12add introduction to build Beignet with yoctoGuo Yejun2-4/+73
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12Fix two argument lowering bug.Yang Rong1-4/+7
1. When there are some mismatch AddImm/Load, may be a indirect load, should return false. 2. Can't remove load in REMOVE_INSN, because load's use is not empty. Force remove it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-04Revert "CL/Driver: enable atomics in L3 for HSW."Zhigang Gong2-14/+1
This issue should be fixed in KMD in the future. Let's revert it here. This reverts commit ef7127c03bd533277afc443b335c37a69927250a.
2015-05-04add benckmark for copy data from buffer to image.Luo Xionghu3-1/+68
v2: use random input data; update comments. v3: change the image attribute to __write_only. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04Optimization of clEnqueueCopyBufferToImage for 16 aligned case.Luo Xionghu4-9/+56
We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04Chv: Add chv backend support.Yang Rong5-10/+193
The chv's backend is almost same as bdw. But some long register restrictions: 1. ARF registers must never be used with 64b datatype. 2. Source and Destination horizontal stride must be aligned to the same qword. 3. Source and Destination offset must be the same, except the case of scalar source. Add ChvContent in gen8_context.cpp to handle it. The chv's encoder is same as Gen8Encoder. V2: Fix sz a typo in function ChvContext::setA0Content when rebase. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-04CHV: Add cherryview support in the runtime.Meng Mengmeng7-7/+71
Cherryview's EU configurations is not decided by pciid, must get from kernel by libdrm. Thanks for Jeff adding this support in the kernel and libdrm. V2: Add the warning when can't get configurations. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-30utests: fix test case builtin_tgamma.Rebecca N. Palmer1-3/+16
Compare with tgamma instead of tgammaf for better accuracy. Include negative inputs, and handle the resulting denormals. Print maximum error found. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-30Allow building with Python 3Rebecca N. Palmer2-12/+14
Make the build scripts work in both Python 2 and Python 3. (CMake prefers Python 2 if both are available, but will use Python 3 if only it is installed.) Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-29Make tgamma meet the accuracy standard.Rebecca N. Palmer1-7/+89
The old tgamma=exp(lgamma) implementation had high rounding error on large outputs, exceeding the 16ulp specification for approx. x>8 (hence the test failure in strict conformance mode). Replace this with an implementation based on glibc's http://sources.debian.net/src/glibc/2.19-17/sysdeps/ieee754/flt-32/e_gammaf_r.c/ Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-29utest_pow: don't fail on declared lack of denormals.Rebecca N. Palmer1-2/+8
0.01**20.5 is denormal; at least Ivy Bridge does not support denormals and hence returns 0. As this is allowed by the OpenCL standard, it shouldn't fail the test. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-24add utest for __gen_ocl_get_simd_idGuo Yejun3-1/+43
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-24add simd level function __gen_ocl_get_simd_idGuo Yejun13-3/+46
uint __gen_ocl_get_simd_id(); return value ranges from 0 to simdsize - 1 V2: use function sel.selReg to refine code V3: correct the uniform condition in liveness.cpp Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-24add utest for __gen_ocl_get_simd_sizeGuo Yejun3-1/+39
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-24add simd level function __gen_ocl_get_simd_sizeGuo Yejun12-1/+140
uint __gen_ocl_get_simd_size(); returns 8 if SIMD8, returns 16 if SIMD16 V2: add missing files remove some unnecessary functions V3: correct the dst register setting, it is possible not uniform V4: remove unnecessary function Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-15Doc: update known issue for the store/load pointer issue.Zhigang Gong1-0/+11
We will defer the fix of this known issue to 1.1.0. Let's document it before that. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-04-14Kill the A0 cache in GenContext.Junyan He3-82/+24
The a0 value cache in Gencontext can just hold the value in compiling time, which may be different with the true offset value in run time when the code generates the backward jump. So just kill the cache of a0 and we will use load vector instruction to optimize it lader. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-14Doc: update cmd parser issue for HSW platforms.Zhigang Gong1-5/+8
For HSW platform, due to the atomic in L3 related registers' usage, we always need to disable the cmd parser. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-04-14GBE: should initialize useDWLabel to false by default.Zhigang Gong1-1/+1
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-14Doc: add a command to install dependencies.Zhigang Gong1-0/+6
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-04-14utests: don't continue to run any case when fail to initialize device.Zhigang Gong1-1/+5
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-14runtime: don't try to open nonexistent render nodes or device files.Zhigang Gong1-1/+4
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13BDW: Refine unpacked_ud in the gen8_context.cpp.Yang Rong1-16/+19
Add a function unpacked_ud to handle unpacked_ud from long. Also fix a in calculateFullU64MUL when uniform register and offset!=0. V2: Refined the git log. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-13GBE: fix a bug in byte scatter write.Zhigang Gong1-5/+11
In uniform mode, we should set simd width to 1 and set noMask bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: fix an potential assertion in constant expanding pass.Zhigang Gong1-1/+1
Using the inserPos is good enough. If using --insertPos, there is one potential issue when the insertPos is the head of a list then it will trigger an assertion. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-04-13GBE: correct the instruction replacement logic in scalarize pass.Zhigang Gong1-9/+34
When we want to delete an old instruction and replace it with the new one, we only call the LLVM IR's replace function which is not sufficient for the scalarize pass, as we also keep some local reference int eh vecVals map. We need to replace all of those local reference also. Otherwise, the deleted values may be used in the subsequent instructions which causes fatal error latter. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: refine error handling for private libva buffer sharing extension.Zhigang Gong2-0/+11
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: Use actual bti information to determine a pointer's addressspace.Zhigang Gong5-8/+23
Due to the private constant buffer support, it introduces private address space mixed with constant address space some time. And more generic, one constant address space may be located in private address space in LLVM IR layer. Such as the following code: __kernel ... { const int2 foo[] = {{0, 1}, {2, 3}}; int2 data = foo[get_global_id(0) % 2]; } The foo is in private address space but we finally will use __constant bti to access it in Gen backend. The the above code will cause a assertion fail in gen insturcion selection stage, because it generate a vector loading instruction on a __constant buffer. So we should use the actual BTI data to determine one pointer's address space rather than get it from the LLVM IR layer. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: Extend front label ip to 32 bit on demand.Zhigang Gong10-29/+128
If the front end label ip exceed 0xffff, then the backend will use real DW to represent each block's IP address. This is a dynamic behaviour according to the actual front end's label number. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: don't type cast register/labelindex to integer.Zhigang Gong3-33/+32
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: extend backend label to 32 bit.Zhigang Gong3-15/+15
The front end label is still 16 bit. But the auxiliary label could be larger than that. This is the preparation to support 32 bit label for both front end and backend. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: extend registers/tuples/immediates to 32bit wide.Zhigang Gong7-29/+30
For some extremly large kernel, these values may be larger than 0xFFFF, we have to extend them to 32 bit. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>