Age | Commit message (Collapse) | Author | Files | Lines |
|
./opencv_test_imgproc --gtest_filter=OCL_Filter/LaplacianTest.Accuracy/60
hang.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
function.hpp doesn't need to include the structural_analysis.hpp.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
Add Gen IR WHILE to mark the strucutred region.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
Gen provide tm0 register for intra-kernel profiling.
Here we provide an API __gen_ocl_get_timestamp() to return
the timestamp in TM.
The return type is defined as:
struct time_stamp {
ulong tick;
uint event;
};
'tick' is a 64bit time tick. 'event' stores a value which means
whether a tmEvent has occured (non-zero) or not (0). tmEvent includes
time-impacting event such as context switch or frequency change
since last time tm0 was read.
I add a sample in the kernels/compiler_time_stamp.cl. Hope it
would help you understand how to use it.
V2:
Introduce ir::ARFRegister to avoid directly use of nr/subnr in Gen IR.
Rename __gen_ocl_extract_reg to __gen_ocl_region.
Rename beignet_get_time_stamp to __gen_ocl_get_timestamp.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
V2:
Replace all the long and ulong to int64_t
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
the backend need return the kernel FUNCTION ATTRIBUTE message to the
clGetKernelInfo.
there are 3 kind of function attribute so far, vec_type_hint parameter
is not available to return due to llvm lack of such info.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This regression is caused by structural analysis when check the if-then
node, acturally there are four types of if-then node according to the
topology and fallthrough information. fallthrough check is added in this
patch.
v2: add inversePredicate member and function for BranchInstruction;
print the exact meanning of IF instruction in GEN_IR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
1.fix data structure redefine warnings.
2.fix 'data' with variable sized type 'union<*>' not at the end of a class warning(in immediate.hpp).
3.fix implicitly conversion warning.
4.fix explicitly assigning a variable type warning.
5.fix comparison of unsigned expression < 0 is always false warning(in cl_api.c).
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
1.the "const" associated functions' modification is to fix "type qualifier on return type is meaningless" for ICC compile warning.
2.the "operator new" shoud have the corresponding "operator delete" function.
3.In C++0x std::auto_ptr will be deprecated in favor of std::unique_ptr.
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Use vector to fix "variable length array of non-POD element type" compiler error.
The /beignet/backend/src/./ir/context.hpp "fn->immediates[index] = imm" would call a private func
'operator=' which would trigger error, and it is not being used.
The undefined reference to `check_copy_overlap' would occur in the following calling.
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Clear to zero to avoid garbage data, as we do not
assign it later for local/constant memory access.
v2:
move initialization code into constructor.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The target is to process all possible complex nested constant expression as below:
const = type0 OP0 (const0)
const0 = type1 OP1 (const1, const2)
const1 = ...
The supported OPs are as below:
BITCAST,
ADD,
SUB,
MUL,
DIV,
REM,
SHL,
ASHR,
LSHR,
AND,
OR,
XOR
We also add support for array/vector type of immediate. Some possible examples are as below:
float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128) to i32) to float)
float bitcast (i32 trunc (i128 lshr (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128), i128 32) to i32) to float)
v2:
separate all private method implementations to immediate.cpp.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Preparation to support generic constant expression.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
1. Move the bti/Register map from gbe::Context to ir::Function.
2. use GlobalVariable instead of 'call' to get internal buffer (used for printf) base address.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Previously, we simply map 2G surface for memory access,
which has obvious security issue, user can easily read/write graphics
memory that does not belong to him. To prevent such kind of behaviour,
We bind each surface to a dedicated bti. HW provides automatic
bounds check. For out-of-bound write, it will be ignored. And for read
out-of-bound, hardware will simply return zero value.
The idea behind the patch is for a load/store instruction, it will search
through the LLVM use-def chain until finding out where the address
comes from. Then the bti is saved in ir::Instruction and used for
the later code generation. And for mixed pointer case, a load/store
will access more than one bti.
To simplify some code, '0' is reserved for constant address space,
'1' is reserved for private address space. Other btis are assigned
automatically by backend.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
To avoid possible garbage data.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Use instruction if, else and endif manipulate the control flow of
identified if-then and if-else structures at backend. but this
is not enabled, just add the necessary code to backend.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Add tool structures and functions for identifying if-then and
if-else structures on Gen IR level.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Add Gen IR IF, ELSE and ENDIF to mark the strucutred region.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The format and flag such as -+# and precision request has
been added into the output.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add the %c and %f support for printf.
Also add the int to float and int to char conversion.
Some minor errors such as wrong index flags have been fixed.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Still can't handle the sampler_t which is not used actually.
Access qualifier seems broken with llvm 3.3.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
As sample LD message doesn't support array index, we have
to create a 2D array surface with the same buffer object.
Thus one 1D array image will have two surfaces binded to it
one is the index and the second is 128 + index.
And then at kernel side, we will access the corresponding
2D array surface when the LD message is required otherwise
will access the origin 1D array surface.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
If multi-thread run the kernel simultaneously, the output
may interlace with each other. Add a lock to avoid this.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
1. Delete the is3D member in instruction class. Because we need more
than 1 bit to represent 1D 2D and 3D. We now add an invalid register
in ir profile, and comparing the coords to it to judge the dimension.
2. Rename all the xxx_image to xxx_image2D to make its meaning clear.
3. Update the according Sampler and Typed_Write instruction in selection
and Gen IR generation.
v2:
fix the use of InvalidRegister. Use ir::ocl::invalid only.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
The PrintfParser will work before the llvm gen backend.
It will filter out all the printf function call. When
the printf call found, we will analyse the print format
and % place holder here. Replace the print call with
STORE or CONV+STORE instruction if needed.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The PrintfSet will be used to collect all the infomation in
the kernel. After the kernel executed, it will be used
to generate the according printf output.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
printfiptr for printf index buffer pointer in curbe
and printfbptr for printf output buffer pointer in curbe.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The OCL spec does allow the use of a i1/BOOL SLM
variable, so we have to support the load and store of
it. To make things simple, I choose to use S16 to represent
i1 value.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
During phi elimination, we simply insert 3 MOVs for one phi instruction
to avoid lost copy issue. But in fact, only two of them are needed for
most of time. This patch tries to see whether the move from phiCopy
to phi can be avoided.
The patch basically checks whether the phiCopy and phi have live range
interference. If no, then they can be coalesced, thus one instruction
can be optimized.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
We need to transfer ValueDef from predecessors to their successors.
Consider a register defined in BB0, and used in BB3. we need to
iterate over liveout to pass the def in BB0 to BB3, so the use
in BB3 could get that correct def. Otherwise, the UD/DU graph is incomplete.
This reverts commit 89b490b5a17cfda2d9816dc1c246ce5bbff12648.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
On embedded/handheld devices, storage and memory are scarce, it is
necessary to provide only the OpenCL runtime library with small size,
and only the executable binary kernel will be supported on such device.
At the beginning of process (before function main), OpenCL runtime
(libcl.so) will try to load the compiler (libgbe.so), the system's
behavior is the same as before if successfully loaded, otherwise,
the runtime assumes no OpenCL compiler in the system, and the device
info will be changed as CL_DEVICE_COMPILER_AVAILABLE=false and
CL_DEVICE_PROFILE="EMBEDDED_PROFILE", the clBuildProgram returns
CL_COMPILER_NOT_AVAILABLE if the program is created with
clCreateProgramWithSource, following the OpenCL spec.
To simulate the case without OpenCL compiler, just delete the file
libgbe.so, or export OCL_NON_COMPILER=1.
Some explanation of the binary kernel interpreter (libinterp.a):
libinterp.a is used to interpret the binary kernel inside runtime,
and the runtime library libcl.so is built against libinterp.a.
Since the code to interpret binary kernel is tightly integrated inside
the compiler, to avoid code duplicate, a new file gbe_bin_interpreter.cpp
is created to include some other .cpp files; to make libinterp.a small
(the purpose to make libcl.so small), the macro GBE_COMPILER_AVAILABLE
is used to make only the needed code active when build for libinterp.a.
V2: code base is changed to call function gbe_set_image_base_index in
gbe_bin_generater, while this function is modified in this patch as
gbe_set_image_base_index_compiler, fix it accordingly.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
If a value is defined in a loop and is used out-of the
loop. That value could not be a uniform(scalar) value.
The reason is that value may be assigned different
scalar value on different lanes when it reenters with
different lanes actived.
Thanks for yang rong reporting this bug.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Each work group has it's own slm offset, and when dispatch threads,
TSG will handle it automatic in IVB. But it will fail in HSW.
After check, all work group's slm offset are 0, even the slm index is
correct in R0.0. So calc the slm offset for slm index, and add it
to the slm address.
TODO: need to find the root casue.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Junyan He <junyan.he@inbox.com>
|
|
Some instructions handle simd1 incorrectly. Disable them
currently.
v2:
add addsat into the unsupported list.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
v2:
refine the flag allocation implementation.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
We have many uniform (scalar) input values which include
the kernel input argument and some special registers.
And all those variables derived by all uniform values are
also uniform values. This patch analysis this type of register
at liveness analysis stage, and change uniform register's
type to scalar type. Then latter, these registers need
less register space.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
We already did a complete liveness analysis at the liveness.cpp.
Don't need to do that again. Save about 10% of the compile time.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
if there is no image/sampler used in kernel source, it is not
necessary to serialize the zero image/sampler info into kernel binary.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
If a phi component is undef from one of the predecessors,
we should not pass it as the predecessor's liveout registers.
Otherwise, that phi register's liveness may be extent to
the basic block zero which is not good.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
short __gen_ocl_simd_any(short x):
if x in any of the active threads in the same SIMD is not zero,
the return value for all these threads is not zero, otherwise, zero returned.
short __gen_ocl_simd_all(short x):
only if x in all of the active threads in the same SIMD is not zero,
the return value for all these threads is not zero, otherwise, zero returned.
for example:
to check if a special value exists in a global buffer, use one SIMD
to do the searching parallelly, the whole SIMD can stop the task
once the value is found. The key kernel code looks like:
for(; ; ) {
...
if (__gen_ocl_simd_any(...))
break; //the whole SIMD stop the searching
}
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
As we change to use if/endif and change the implementation of
the barrier, we don't need to maintain emask/notmask/barriermask
any more. Just remove them.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|