Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Reflect recent beignet and Linux changes.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Run a small __local-using kernel in clGetDeviceIDs; if this returns
the wrong result, return CL_DEVICE_NOT_FOUND.
As far as I can see, there's no way to tell in advance (except
unreliably with a global version check) whether __local-using batches
will be accepted...so the easiest solution is probably to just try
running one and see what result we get.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Must init the fields indirect register used.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
When convert byte/short/int to long, the temp regiser type signed
or not is decided by src type, not dst type.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
As we are going to support dynamic bti, it is impossible
to add the bti dependency. so just use one bti dependency,
that is to say, we don't change the memory instruction sequence
in instruction scheduler.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
export the variable to 1 to view the building or compiling kernel's
source code. By default, it is false and GBE will not print any code.
v2: also output the build options if not empty.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
1. Enable compiler_argument_structure_indirect.
2. Add compiler_argument_structure_indirect, which has select address and load argument instruction.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
The steps to handle Indirect argument read:
1. Find out all indirect loads and its address caculation.
2. Add INDIRECT_MOV IR instruction, replace load to INDIRECT_MOV.
3. Replace the bass address and offset ADD instruction to offset MOV instruction. Could optimize.
V2: use a tmp uw register to calc offset for indirect move.
V3: tmp can't be uniform, because exec width is not 1 when uniform.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
v2: correct kernel to be suitable for simd_width both 8 and 16
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
it is defined in https://www.khronos.org/registry/cl/extensions/intel/cl_intel_subgroups.txt
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
floatN intel_sub_group_shuffle(floatN x, uint c);
intN intel_sub_group_shuffle(intN x, uint c);
uintN intel_sub_group_shuffle(uintN x, uint c);
the value of x of the c-th channel of the SIMD is returned, for all SIMD channels,
the behavior is undefined if c is larger than simdsize - 1
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
1. When there are some mismatch AddImm/Load, may be a indirect load, should return false.
2. Can't remove load in REMOVE_INSN, because load's use is not empty. Force remove it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This issue should be fixed in KMD in the future. Let's revert it
here.
This reverts commit ef7127c03bd533277afc443b335c37a69927250a.
|
|
v2: use random input data; update comments.
v3: change the image attribute to __write_only.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
|
|
We can change the image_channel_order to CL_RGBA and
image_channel_data_type to CL_UNSIGNED_INT32 for some special
case, thus 16 bytes can be read by one work item. Bandwidth is
fully used.
v2: merge patch 3 of initializing region0;
remove k dimension in kernel for 2d image.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
|
|
The chv's backend is almost same as bdw. But some long register restrictions:
1. ARF registers must never be used with 64b datatype.
2. Source and Destination horizontal stride must be aligned to the same qword.
3. Source and Destination offset must be the same, except the case of scalar source.
Add ChvContent in gen8_context.cpp to handle it. The chv's encoder is same as Gen8Encoder.
V2: Fix sz a typo in function ChvContext::setA0Content when rebase.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Cherryview's EU configurations is not decided by pciid, must get from kernel by libdrm.
Thanks for Jeff adding this support in the kernel and libdrm.
V2: Add the warning when can't get configurations.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Compare with tgamma instead of tgammaf for better accuracy.
Include negative inputs, and handle the resulting denormals.
Print maximum error found.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Make the build scripts work in both Python 2 and Python 3.
(CMake prefers Python 2 if both are available, but will use
Python 3 if only it is installed.)
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The old tgamma=exp(lgamma) implementation had high rounding error on
large outputs, exceeding the 16ulp specification for approx. x>8
(hence the test failure in strict conformance mode).
Replace this with an implementation based on glibc's
http://sources.debian.net/src/glibc/2.19-17/sysdeps/ieee754/flt-32/e_gammaf_r.c/
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
0.01**20.5 is denormal; at least Ivy Bridge does not support
denormals and hence returns 0. As this is allowed by the
OpenCL standard, it shouldn't fail the test.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
uint __gen_ocl_get_simd_id();
return value ranges from 0 to simdsize - 1
V2: use function sel.selReg to refine code
V3: correct the uniform condition in liveness.cpp
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
uint __gen_ocl_get_simd_size();
returns 8 if SIMD8, returns 16 if SIMD16
V2: add missing files
remove some unnecessary functions
V3: correct the dst register setting, it is possible not uniform
V4: remove unnecessary function
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
We will defer the fix of this known issue to 1.1.0. Let's document it
before that.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
The a0 value cache in Gencontext can just hold the value
in compiling time, which may be different with the true
offset value in run time when the code generates the
backward jump. So just kill the cache of a0 and we will
use load vector instruction to optimize it lader.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
For HSW platform, due to the atomic in L3 related registers' usage,
we always need to disable the cmd parser.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Add a function unpacked_ud to handle unpacked_ud from long.
Also fix a in calculateFullU64MUL when uniform register and offset!=0.
V2: Refined the git log.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
In uniform mode, we should set simd width to 1 and set noMask bit.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Using the inserPos is good enough. If using --insertPos, there
is one potential issue when the insertPos is the head of a list
then it will trigger an assertion.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
When we want to delete an old instruction and replace it with the new one,
we only call the LLVM IR's replace function which is not sufficient for
the scalarize pass, as we also keep some local reference int eh vecVals
map. We need to replace all of those local reference also.
Otherwise, the deleted values may be used in the subsequent instructions
which causes fatal error latter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Due to the private constant buffer support, it introduces private address
space mixed with constant address space some time. And more generic, one
constant address space may be located in private address space in LLVM IR
layer. Such as the following code:
__kernel ...
{
const int2 foo[] = {{0, 1}, {2, 3}};
int2 data = foo[get_global_id(0) % 2];
}
The foo is in private address space but we finally will use __constant bti
to access it in Gen backend. The the above code will cause a assertion fail
in gen insturcion selection stage, because it generate a vector loading
instruction on a __constant buffer.
So we should use the actual BTI data to determine one pointer's address space
rather than get it from the LLVM IR layer.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
If the front end label ip exceed 0xffff, then the backend will
use real DW to represent each block's IP address. This is
a dynamic behaviour according to the actual front end's label
number.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
The front end label is still 16 bit. But the auxiliary
label could be larger than that. This is the preparation
to support 32 bit label for both front end and backend.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
For some extremly large kernel, these values may be larger than
0xFFFF, we have to extend them to 32 bit.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|