Age | Commit message (Collapse) | Author | Files | Lines |
|
src modifier is not supported by some instructions.
so return false when it exists. This fix piglit %
scalar-arithmetic-int failed
V2: (1)add hadd rhadd
(2)confirmed math functions support midifer except IDIV/Mod
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Check the device supported subgroup sizes, and use
intel_reqd_sub_group_size to build kernels in these size. Then check if
there is spill for each kernel.
V2: Fix memory leak
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add
CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add
CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for clGetKernelSubGroupInfo.
We only have this extension for LLVM 40+ for frontend support.
V2: Add opencl-c define
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
If we get intel_reqd_sub_group_size attribute from frontend then set it
to backend.
V2: Refine the codeGenNum with runtime caclculate and fail the build if
the size from frontend is illegal.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
for the following GEN IR, %41 is kernel argument (struct)
the first LOAD will be mov, and the second LOAD will be indirect move
(see lowerFunctionArguments). It hurts performance,
and even impacts the correctness of reg liveness of indriect mov
LOADI.uint64 %1114 72
ADD.int64 %78 %41 %1114
LOAD.int64.private.aligned {%79} %78 bti:255
LOADI.int64 %1115 8
ADD.int64 %1116 %78 %1115
LOAD.int64.private.aligned {%80} %1116 bti:255
this function folds the constants of 72 and 8 together,
and so it will be direct mov.
the GEN IR looks like:
LOADI.int64 %1115 80
ADD.int64 %1116 %41 %1115
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is similar with 2D image for avoiding extended image width truncated.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It will test aligned4 and aligned16 kernel for 3D image.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
1. Only copy the data by origin and region defined.
2. Add clFinish to guarantee the kernel copying is finished when blocking writing.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
1. Support wrrting data by mapping/unmapping mode.
2. Add mapping record logic.
3. Add clFinish to guarantee the kernel copying is finished.
4. Fix the error of calling clEnqueueMapImageByKernel.
blocking_map and map_flags need be switched.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
large image.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
image by kernel copying.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
USE_HOST_PTR mode.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is used to reproduce the bug of clCopyImage/clFillImage of conformance test.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is used to reproduce the bug of allocations of conformance test.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
the negative Add is like:
exp -a
llvm transfer it to:
add x -a, 0
exp x
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
LLVM transform Mad(a, -b, c) to
Add b, -b, 0
Mad val, a, b, c
pow(a,-b) and other buildin math function to the same instruction sequence like above
for Gen support negtive modifier, mad(a, -b, c) is native suppoted.
Do it just like a: mov b, -b, so it is a Mov operation like LocalCopyPropagation
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
there some patterns like:
sqrt r1, r2;
load r4, 1.0; ===> rqrt r3, r2
div r3, r4, r1;
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Found a missing macro that need change to support LLVM40+.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
remove some corner cases check for these path can not be
reached.And refine branch code to select. These improvements
get 20% performance. and the performance of OCL_ExpFixture_Exp
in opencv can match up to other Gen driver
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
the test OCL_Magnitude of opencv is slow on beignet because
of hypot. refine the hypot, change algorithm and remove
unnecessary code to get 30% up
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
"imagedim_non_pow_2" cases of basic modudle of confrmance shows
regression after use TILE_Y mode for large image by previous patch.
This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage
and clEnqueueCopyImageToBuffer.
It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test
to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying.
So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum
width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits.
So use align4 copy kernel to avoid this bug.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
|
|
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
|
|
There is a race condition between building .bc and header files and
generating code from .cl targets. Fix the race by adding the
dependency to generated files.
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
|
|
when the return value is ARG_INDIRECT_READ, there is still possible
that some IRs read it directly, and will be handled in buildConstantPush()
so we need to refresh the dag afer function buildConstantPush
another method is to update DAG accordingly, but i don't think it
is easy compared with the refresh method, so i do not choose it.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We only output MATH function before, now we can know which math
function is it.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
It will fail to copy data from host ptr to TILE_Y large image by memcpy.
Use clEnqueueCopyBufferToImage to do this on GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It will fail to copy data from TILE_Y large image to buffer by memcpy.
Use clEnqueueCopyImageToBuffer to do this on GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It will fail to copy data from TILE_Y large image to buffer by memcpy.
Use clEnqueueCopyImageToBuffer to do this.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It may failed to copy data from host ptr to TILE_Y large image.
So use clCopyBufferToImage to do this on GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is for testing large image with TILE_Y mode.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is for testing large image with TILE_Y mode.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is for testing large image with TILE_Y mode.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It is for testing large image with TILE_Y mode.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
After the refine we can not know if a sampler is a constant initialized
or not. Then the compiler optimization for constant sampler will break
and we will runtime decide which SAMPLE instruction will use.
Now fix the sampler refine for LLVM40 to enable the constant check.
V2: Fix a typo of function __gen_ocl_sampler_to_int type.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
In llvm literal structs have no name, so check it first.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
refine the algorithm to remove unnecessary operations
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
remove private array and convert if to select
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
refine algorithm to remove branch
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
remove redundent operation to get more performance
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
get it from crlibm and refine it for gen
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
do it like sin function
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
(1)refine the NAN check
(2)using sqrt to get cos
(3)remove small range check
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
using a simple algorithm to get it
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
remove private array and some unnecessary if check.
convert some if to select. improve about 50% performance
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
mov all the common math function to match_common.cl. it is
easy to maitain
Signed-off-by: rander.wang <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Geminilake is almost same as bxt, except intel_gpgpu_read_ts_reg
function.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Geminilake's backend is same as bxt.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|