Age | Commit message (Collapse) | Author | Files | Lines |
|
reordering.
When we want to reorder the BBs and move the unstructured BB out-of the
structured block, we need to add a BRA to the block. If the exit of the
structured block is a loop, we need to append a unconditional BRA right
after the predicated BRA. Otherwise, we may lost the correct successor
if an unstructured BB is moved next to this BB.
After this patch, with loop optimization enabled, there is no regression
on both utests and piglit. But there are still a few regressions in opencv
test suite:
[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (40041 ms total)
[ PASSED ] 2 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] OCL_Photo/FastNlMeansDenoising.Mat/2, where GetParam() = (Channels(2), false)
[ FAILED ] OCL_Photo/FastNlMeansDenoising.Mat/3, where GetParam() = (Channels(2), true)
[ FAILED ] OCL_Photo/FastNlMeansDenoisingColored.Mat/0, where GetParam() = (Channels(3), false)
[ FAILED ] OCL_Photo/FastNlMeansDenoisingColored.Mat/1, where GetParam() = (Channels(3), true)
[ FAILED ] OCL_Photo/FastNlMeansDenoisingColored.Mat/2, where GetParam() = (Channels(4), false)
[ FAILED ] OCL_Photo/FastNlMeansDenoisingColored.Mat/3, where GetParam() = (Channels(4), true)
So let's keep this optimizaion disabled. Will enable it when I fixed all
the known issues.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Luo <xionghu.luo@intel.com>
|
|
function.hpp doesn't need to include the structural_analysis.hpp.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Luo <xionghu.luo@intel.com>
|
|
1. WHILE instruction should be non-schedulable.
2. if this WHILE instruction jumps to an ELSE instruction, the distance
need add 2.
v2:
We also need to take care of HSW for while instruction.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
v2:
disable loop optimization by default due to still buggy.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Add Gen IR WHILE to mark the strucutred region.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Gen provide tm0 register for intra-kernel profiling.
Here we provide an API __gen_ocl_get_timestamp() to return
the timestamp in TM.
The return type is defined as:
struct time_stamp {
ulong tick;
uint event;
};
'tick' is a 64bit time tick. 'event' stores a value which means
whether a tmEvent has occured (non-zero) or not (0). tmEvent includes
time-impacting event such as context switch or frequency change
since last time tm0 was read.
I add a sample in the kernels/compiler_time_stamp.cl. Hope it
would help you understand how to use it.
V2:
Introduce ir::ARFRegister to avoid directly use of nr/subnr in Gen IR.
Rename __gen_ocl_extract_reg to __gen_ocl_region.
Rename beignet_get_time_stamp to __gen_ocl_get_timestamp.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
V2:
Replace all the long and ulong to int64_t
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This patch still has to be pending to fix the wide integer issue completely.
Although we have a fallback mechanism which will try to build the module again
by ignoring some passes to avoid the wide integer issue, it's broken now on
master branch. As now all the builtin functions have been built statically,
and those bitcode may already have i128/i512 etc.
This reverts commit 565d1eb00d9a5219c2848b3674e40ac07cb48b89.
|
|
this patch was lost during the libocl merge. resubmit it to improve the
vector function performance.
please refer to e2db890596eea0a6eb741e11e576a38952f1ed1e for detail.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
no need to set the LinkOnceAnyLinkage for global variables and functions
to avoid redefinition.
v2:
also enable the VerifierPass.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
The LLVM_LFLAGS is used before finding the LLVM package,
which causes the CMake fails to set correct -L flags and
cause linkage error.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
Because we delete the validation of the PCH file, sometimes
the PCH in the system dir is not compatible with the clang
and cause crash.
Anytime, we need to use internal PCH when compiling.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
It seems that hw return wrong result when y is equal to 0x80000000
in sub_sat(int x, int y). So we re-write it as:
add_sat(add_sat(0x7fffffff, x), 1)
Also enable corresponding utest.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
work_group_size_hint should define another variable.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
the 'COMPILER' is to choose the detail compiler,the default is GCC.
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
If we want to link multiple files together, and one kernel
function need refer other kernel functions in other files,
we must not set those functions as linked once attribute.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
the backend need return the kernel FUNCTION ATTRIBUTE message to the
clGetKernelInfo.
there are 3 kind of function attribute so far, vec_type_hint parameter
is not available to return due to llvm lack of such info.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
According to the spec:
The build status is to
Returns the build, compile or link status,
whichever was performed last on program for
device.
The previous implementation only consider the clProgramBuild and
doesn't consider the compile. Now fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
If the binary is a executable type, the first byte is zero and
we need to set the binary type correctly to CL_PROGRAM_BINARY_TYPE_EXECUTABLE.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
LunarGLASS have update his copyright, so update the copyright in llvm_scalarize.cpp.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Actually, we don't support double completely currently.
Let's disable it now. This bring a little incompatible point with the 1.2 spec
which doesn't require the kernel to use the following pragma to enable fp64.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
If the application wants to try the partially supported double with beignet
under opencl 1.2, the application will still need to add the above pragma.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
And when we fail to compile a module, the fileName may be NULL, we can't
access it unconditionally.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
When compile a empty string, we may get an empty module which is not
an error.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
the memory object should be checked whether valid in context buffers before being set as kernel arguments.
v2: rename the function from mem_in_buffers to is_valid_mem, move the
magic header check into it.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
add CL_KERNEL_GLOBAL_WORK_SIZE option for clGetKernelWorkGroupInfo.
v2: should return the max global work size instead of current work size.
This funtion need return CL_INVALID_VALUE if the device is not a custom
device or kernel is not a built-in kernel.
we have 3 kind of built-in kernels for 1d/2d/3d memories, the max global
work size are decided by the dimension and memory type.
the piglit fail is caused by calling NON built-in kernels, so need send
patch to piglit later.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Actually, CLANG does take this option and we should not
filter it out. We also change the default option to create
PCH file to -cl-std=CL1.2. And if the user pass in a CL1.1
we will have to disable PCH.
Another change is that if we are CL1.2, then we should enable
the cl_khr_fp64 by default. As from CL1.2, this extension should
be enabled by default.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Modify the __ocl_math_fastpath_flag init value in the
backend link stage to switch between fast path and
conformance path.
V2:
Rename the function prototype parameter name.
V3:
Modify the parameter to boolean and correct some comment words.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
The -cl-std= will specify the least version to compile
the source code providing to our API. So we need to
check it early, and return failure if our platform's
version can not meet the request. In the backend, we
just ignore this cmd line option.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
It seems that this function is required by latest PyOpenCL.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Yichao Yu <yyc1992@gmail.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
add pointer check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Integer type wider than 64 bit is hard to handle on Gen.
Let's try to prevent ScalarReplAggregates pass to generate
such type of integer.
v2:
fix compilation error with LLVM 3.3.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|