Age | Commit message (Collapse) | Author | Files | Lines |
|
In structurizer, the useless instruction is just be
erased from block. The iintrusive_list::erase() just
unlink the instruction, but not free its resource.
We should use remove() to deallocate the instruction
object.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
current code handles this option at clang level, actually, it is
also necessary at LLVM -> GEN stage.
V2: check if options is NULL
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
handle llvm.fabs.f16 instruction in case compiler_half_isnan.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
the workgoup related parameter of the workgroup_broadcast function is
FAMILY_QWORD.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
This test case is for the new added non-constand index extractelement
path in llvm_scalarize pass.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The maybe non-constant index for the extractelement inst, it was not
implemented in the llvm_scalarize. Now provide an implemention by
allocating a new vector and storing all the component in it. Then
can get the needed component by GEP inst.
V2:
Remove debuginfo and fix map insert bug
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Because the workgroup OP has forwarding msg and wait functions,
it needs all the threads to sync with each other. It has very
similar behavior as BARRIER, so we add it into schedule consideration
accordingly.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We need to use forward message to send data and sync
threads within the same work group. The HW lack the
feature to get the TID and EUID of other threads. So
we need to establish a map for this usage.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We need to build a map between logical workgroup thread and
real hw thread. This curbe will hold the logical workgroup thread
ID when it is executed on some HW thread.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We will use SLM to store the value to broadcast and the map
between real hw thread and logical workgroup thread.
These two values give the offset in the SLM.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
sr0 is used to specify the state reigster where we can get the
state of each EU thread.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
special versions of linux kernel and libdrm are needed.
utest and conformance test PASSED.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
aslo call llvm::WriteBitcodeToFile for -dump-spir-binary option
under LLVM 3.5.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add a testcase for compiler mix. Since mix will have
error, we take err limit as 1e-3 and print the max err.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
EU support lrp function that simillar to mix, but only
with float, so refine only float related mix with lrp.
There will be little errors whit mix now with lrp.
V2:
Rebase the patch
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
include case compile_spir_binary and build_spir_binary.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
allow user to pass "-dump-spir-binary=[file_name]" to generate the spir
binary to the file.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It will output debug message under debug mode, and will
not output under release mode.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add two utest case test for OCL2.0 new work-item built-in
functions.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
the utest will check on this.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add get global/local linear id by calculate with global/local
id, size and offset. The get_queue_local_size() and get_loal_size()
should be different when the global work group size is not uniform,
but now they are the same. We will refine these functions when we
support non-uniform work-group size.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and
Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
this link fail appears on gcc 5.2.1.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
To find out an instruction scheduling policy to achieve the theoretical minimum
registers required in a basic block is a NP problem. We have to use some heuristic
factor to simplify the algorithm. There are many researchs which indicate a
bottom-up list scheduling is much better than the top-down method in turns of
register pressure. I choose one of such research paper as our target. The paper
is as below:
"Register-Sensitive Selection, Duplication, and Sequencing of Instructions"
It use the bottom-up list scheduling with a Sethi-Ullman label as an
heuristic number. As we will do cycle awareness scheduling after the register
allocation, we don't need to bother with cycle related heuristic number here.
I just skipped the EST computing and usage part in the algorithm.
It turns out this algorithm works well. It could reduce the register spilling
in clBlas's sgemmBlock kernel from 83+ to only 20.
Although this scheduling method seems to be lowering the ILP(instruction level parallism).
It's not a big issue, because we will allocate as much as possible different registers
in the following register allocation stage, and we will do a after allocation
instruction scheduling which will try to get as much ILP as possible.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
these value are defined in 2.1.3.1 @ spir_spec-1.2.pdf.
v2: include new patch.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
these function need to be overloadable for link.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
for SPIR kernel, user may call clGetKernelInfo with CL_KERNEL_ATTRIBUTES
to query the functionAttributes.
v2: remove debug code.
v3: fix llvm 3.5 build fail.
v4: only llvm-3.6 or later has kernel_arg_base_type in metadata.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
clLinkProgram need check the existence of "-cl-kernel-arg-info"
build_option of all the input_programs. User may link two SPIR
program and call clGetKernelArgInfo to query kernel args.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is
BINARY_TYPE_INTERMIDIATE if create from SPIR binary.
v2: refine the source_type logic: source_type is already set in
clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set
in clBuildProgram or clCompileProgram.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
v2: move the memset inside the pointer check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
the mem_fence, read_mem_fence and write_mem_fence functions need
be OVERLOADABLE for link.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
we use string compare function to recognize the image type, but
image types could be defined to other type, so use kernel_arg_base_type
to recognize image arguments.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
these function need to be overloadable for link.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
In this test case, the CM kernel is in VISA binary format, not in
GenX Binary format, it means that the CM jitter is needed to compile
the CM kernel from VISA format to GenX format, please refer to
cmrt_package_path/jitter/readme.txt to prepare the jitter.
v2: add comments about the CM jitter
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt.
There is a request to make Beignet as intermedia layer of CMRT, in
other words, application programer write OpenCL APIs to execute the
CM kernel on GPU, the following shows the key code, and please refer
to the next patch of unit test for detail.
prog = clCreateProgramWithBinary("cm kernel");
clBuildProgram(prog);
kernel = clCreateKernel(prog, "kernel name");
image = clCreateImage();
clSetKernelArg(kernel, image);
clEnqueueNDRangeKernel(kernel);
Inside Beignet, once cm kernel is invoked, the following relative APIs
will be directly passed to CMRT library (libcmrt.so) which is loaded
via dlopen only when necessary. Since we use this simple method to
keep the code clean, OpenCL spec is not strictly followed, and cl_event
is not supported for this case.
v2: add comments about the cm queue in fuction cmrt_enqueue
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Print line and column numbers with ASM, if OCL_DEBUGINFO is true.
Signed-off-by: Yannan Bai <yannan.bai@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
1. Add a structure DBGInfo in GenEncoder class, storing debug infomation for subsequentlt passing to GenInsn.
2. Add a vector<DebugInfo> in GenEncoder class, storing debug information corresponding to vector<GenInstruction>.
3. Pass debug information from SEL IR firstly to Gen Encoder, then add to vector when emitting, if OCL_DEBUGINFO is true.
Signed-off-by: Yannan Bai <yannan.bai@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
1. Add a DebugInfo type structure DBGInfo in Opaqueue class, storing debug infomation in selection for subsequentlt passing to selection IR.
2. Add a DebugInfo type structure DBGInfo in SelectionInstruction class, storing debug infomation.
3. Pass debug information from GEN IR firstly to selection queue, then pass to selection IR when emitting, if OCL_DEBUGINFO is true.
Signed-off-by: Yannan Bai <yannan.bai@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
1. Add a DebugInfo type structure DBGInfo into context class, storing debug information.
2. Add a DebugInfo type structure DBGInfo in Instruction class, storing debug infomation.
3. Pass debug information firstly from llvm IR to Context, then to GEN IR when emiting, if OCL_DEBUGINFO is true.
Signed-off-by: Yannan Bai <yannan.bai@intel.com>
Signed-off-by: Meng Lv <meng.lv@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
1. Add a bool env OCL_DEBUGINFO to enable generate debug infomation while compiling cl source, set it false as default value.
2. Define a structure type DebugInfo.
Signed-off-by: Yannan Bai <yannan.bai@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
per spec, if create image from USE_HOST_PTR buffer, the buffer's base
address need be aligned.
v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|