summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-12-21Backend: Fix a memory leak for structurizer.HEADmasterJunyan He1-5/+4
In structurizer, the useless instruction is just be erased from block. The iintrusive_list::erase() just unlink the instruction, but not free its resource. We should use remove() to deallocate the instruction object. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-21add support for build option -cl-fast-relaxed-mathGuo Yejun6-13/+35
current code handles this option at clang level, actually, it is also necessary at LLVM -> GEN stage. V2: check if options is NULL Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14fix debug instruction welform assert.Luo Xionghu1-1/+7
handle llvm.fabs.f16 instruction in case compiler_half_isnan. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14fix workgroup_broadcast instruction debug mode assert.Luo Xionghu1-1/+1
the workgoup related parameter of the workgroup_broadcast function is FAMILY_QWORD. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Utest: Add a bitonic sort test for non-constant extractelementPan Xiuli3-1/+94
This test case is for the new added non-constand index extractelement path in llvm_scalarize pass. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Implement the non-constant extractelement scalarizePan Xiuli1-7/+39
The maybe non-constant index for the extractelement inst, it was not implemented in the llvm_scalarize. Now provide an implemention by allocating a new vector and storing all the component in it. Then can get the needed component by GEP inst. V2: Remove debuginfo and fix map insert bug Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Utests: Add test cases for reduce add.Junyan He2-0/+81
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Add reduce add to gen_context.Junyan He1-2/+22
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Utests: Add test cases for workgroup reduce max/min.Junyan He3-0/+203
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Runtime: Add the threadid calculation for curbe.Junyan He1-1/+11
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Implement reduce min and max in gen_contextJunyan He1-5/+279
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Add state register into schedule consideration.Junyan He1-4/+12
Because the workgroup OP has forwarding msg and wait functions, it needs all the threads to sync with each other. It has very similar behavior as BARRIER, so we add it into schedule consideration accordingly. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Add WORKGROUP_OP instruction selection.Junyan He6-0/+41
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Add forward message function for gen encoder.Junyan He2-0/+14
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Establishing the thread/TID-EUID map.Junyan He2-8/+151
We need to use forward message to send data and sync threads within the same work group. The HW lack the feature to get the TID and EUID of other threads. So we need to establish a map for this usage. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14libocl: Refine the workgroup functions, add signed info.Junyan He1-57/+57
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Add threadid as a curbe register.Junyan He4-3/+7
We need to build a map between logical workgroup thread and real hw thread. This curbe will hold the logical workgroup thread ID when it is executed on some HW thread. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Add tidMapSLM and wgBroadcastSLM to each function.Junyan He3-9/+20
We will use SLM to store the value to broadcast and the map between real hw thread and logical workgroup thread. These two values give the offset in the SLM. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Backend: Add sr0 reg helper function.Junyan He2-7/+11
sr0 is used to specify the state reigster where we can get the state of each EU thread. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14add Broxton supportGuo Yejun12-12/+213
special versions of linux kernel and libdrm are needed. utest and conformance test PASSED. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2015-12-10fix LLVM 3.5 fail.Luo Xionghu1-0/+14
aslo call llvm::WriteBitcodeToFile for -dump-spir-binary option under LLVM 3.5. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10utests: add an utest for mixPan Xiuli3-1/+56
Add a testcase for compiler mix. Since mix will have error, we take err limit as 1e-3 and print the max err. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10Backend: refine mix with hardware lrp functionPan Xiuli13-1/+37
EU support lrp function that simillar to mix, but only with float, so refine only float related mix with lrp. There will be little errors whit mix now with lrp. V2: Rebase the patch Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10utest: add utest to generate spir binary from beignet.Luo Xionghu1-0/+99
include case compile_spir_binary and build_spir_binary. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10backend: enable option -dump-spir-binary to generate SPIR binary from beignet.Luo Xionghu1-5/+22
allow user to pass "-dump-spir-binary=[file_name]" to generate the spir binary to the file. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10runtime: add macro DEBUGP() to handle debug printf.Ruiling Song2-3/+9
It will output debug message under debug mode, and will not output under release mode. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10Utest: Add test for get_global/local_linear_idPan Xiuli5-1/+183
Add two utest case test for OCL2.0 new work-item built-in functions. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10utest: write to dst buffer to fix utest failureRuiling Song1-0/+2
the utest will check on this. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-10libocl: Add three work-item built-in functionPan Xiuli2-0/+33
Add get global/local linear id by calculate with global/local id, size and offset. The get_queue_local_size() and get_loal_size() should be different when the global work group size is not uniform, but now they are the same. We will refine these functions when we support non-uniform work-group size. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2015-12-09SKL: Use kernel-defined MOCS values instead of assuming hardware defaults.Francisco Jerez1-2/+2
Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09fix gcc build error.Luo Xionghu2-2/+2
this link fail appears on gcc 5.2.1. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09GBE: implement pre-register-allocation instruction scheduling.Zhigang Gong1-21/+116
To find out an instruction scheduling policy to achieve the theoretical minimum registers required in a basic block is a NP problem. We have to use some heuristic factor to simplify the algorithm. There are many researchs which indicate a bottom-up list scheduling is much better than the top-down method in turns of register pressure. I choose one of such research paper as our target. The paper is as below: "Register-Sensitive Selection, Duplication, and Sequencing of Instructions" It use the bottom-up list scheduling with a Sethi-Ullman label as an heuristic number. As we will do cycle awareness scheduling after the register allocation, we don't need to bother with cycle related heuristic number here. I just skipped the EST computing and usage part in the algorithm. It turns out this algorithm works well. It could reduce the register spilling in clBlas's sgemmBlock kernel from 83+ to only 20. Although this scheduling method seems to be lowering the ILP(instruction level parallism). It's not a big issue, because we will allocate as much as possible different registers in the following register allocation stage, and we will do a after allocation instruction scheduling which will try to get as much ILP as possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-09change the sampler type value to keep same with spir spec.Luo Xionghu1-17/+16
these value are defined in 2.1.3.1 @ spir_spec-1.2.pdf. v2: include new patch. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09gbe/libocl: define the gentype half_xxx math function instead of using MACRO.Luo Xionghu3-29/+70
these function need to be overloadable for link. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09gbe: add vec_type_hint's type into functionAttributes.Luo Xionghu3-7/+104
for SPIR kernel, user may call clGetKernelInfo with CL_KERNEL_ATTRIBUTES to query the functionAttributes. v2: remove debug code. v3: fix llvm 3.5 build fail. v4: only llvm-3.6 or later has kernel_arg_base_type in metadata. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: add missing supported format image_1d_buffer.Luo Xionghu1-0/+1
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: fix clLinkProgram bug.Luo Xionghu1-0/+10
clLinkProgram need check the existence of "-cl-kernel-arg-info" build_option of all the input_programs. User may link two SPIR program and call clGetKernelArgInfo to query kernel args. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: fix clCompileProgram bug.Luo Xionghu3-13/+6
forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is BINARY_TYPE_INTERMIDIATE if create from SPIR binary. v2: refine the source_type logic: source_type is already set in clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set in clBuildProgram or clCompileProgram. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: initialize the memory content to 0.Luo Xionghu1-0/+1
v2: move the memset inside the pointer check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09gbe/libocl: change xxx_fence function to OVERLOADABLE.Luo Xionghu2-6/+6
the mem_fence, read_mem_fence and write_mem_fence functions need be OVERLOADABLE for link. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09gbe: use kernel_arg_base_type to recognize image arguments.Luo Xionghu2-7/+21
we use string compare function to recognize the image type, but image types could be defined to other type, so use kernel_arg_base_type to recognize image arguments. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09gbe/libocl: define the vloada_xxx function instead of using MACRO.Luo Xionghu2-8/+26
these function need to be overloadable for link. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09add utest to demo how to run CM kernerl via OpenCL APIsGuo Yejun3-0/+280
In this test case, the CM kernel is in VISA binary format, not in GenX Binary format, it means that the CM jitter is needed to compile the CM kernel from VISA format to GenX format, please refer to cmrt_package_path/jitter/readme.txt to prepare the jitter. v2: add comments about the CM jitter Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09make Beignet as intermedia layer of CMRTGuo Yejun15-14/+497
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-02GBE/DebugInfo: Print line and column NO. with ASMBai Yannan1-0/+7
Print line and column numbers with ASM, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02GBE/DebugInfo: Pass debug info : SEL IR => GenInsnBai Yannan4-1/+23
1. Add a structure DBGInfo in GenEncoder class, storing debug infomation for subsequentlt passing to GenInsn. 2. Add a vector<DebugInfo> in GenEncoder class, storing debug information corresponding to vector<GenInstruction>. 3. Pass debug information from SEL IR firstly to Gen Encoder, then add to vector when emitting, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02GBE/DebugInfo: Pass debug info : GEN IR => SEL IRBai Yannan2-0/+14
1. Add a DebugInfo type structure DBGInfo in Opaqueue class, storing debug infomation in selection for subsequentlt passing to selection IR. 2. Add a DebugInfo type structure DBGInfo in SelectionInstruction class, storing debug infomation. 3. Pass debug information from GEN IR firstly to selection queue, then pass to selection IR when emitting, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02GBE/DebugInfo: Pass debug info :llvm IR => GEN IRBai Yannan4-1/+25
1. Add a DebugInfo type structure DBGInfo into context class, storing debug information. 2. Add a DebugInfo type structure DBGInfo in Instruction class, storing debug infomation. 3. Pass debug information firstly from llvm IR to Context, then to GEN IR when emiting, if OCL_DEBUGINFO is true. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Signed-off-by: Meng Lv <meng.lv@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-12-02GBE/DebugInfo: Enable new featureBai Yannan2-0/+7
1. Add a bool env OCL_DEBUGINFO to enable generate debug infomation while compiling cl source, set it false as default value. 2. Define a structure type DebugInfo. Signed-off-by: Yannan Bai <yannan.bai@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-26check image from buffer's base address alignment.Luo Xionghu1-0/+9
per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>