Age | Commit message | Author | Files | Lines
2015-11-25 | Add utest for workgroup_broadcast. | Junyan He | 3 | -0/+57
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25 | Handle the WorkGroup_Broadcast logic in insn_selection. | Junyan He | 1 | -0/+87
We use SLM (shared local memory) to store the value that is broadcast to the whole work group. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
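As a rough illustration of the idea described in this commit message (not Beignet's generated code), the sketch below simulates a work group in which one work item stores its value into a shared slot and every item reads it back after a barrier. The function name and the one-element list standing in for SLM are hypothetical.

```python
def work_group_broadcast(values, source_local_id):
    """Simulate broadcasting one work item's value to the whole work group.

    values[i] is the private value held by work item i. A one-element list
    stands in for shared local memory (SLM); this is an illustrative model,
    not what the backend emits.
    """
    slm = [None]  # hypothetical one-slot SLM region

    # Phase 1: only the source work item stores its value to SLM.
    for local_id, value in enumerate(values):
        if local_id == source_local_id:
            slm[0] = value

    # On real hardware a work-group barrier separates the two phases.

    # Phase 2: every work item loads the broadcast value from SLM.
    return [slm[0] for _ in values]
```

For example, `work_group_broadcast([10, 20, 30, 40], 2)` gives every simulated work item the value held by local id 2.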
2015-11-25 | Add WorkGroup functions to Gen IR logic in llvm_gen_backend. | Junyan He | 2 | -1/+96
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25 | Add the WorkGroupInstruction as a new type of instruction. | Junyan He | 3 | -0/+190
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25 | libocl: Add the module for work_group functions. | Junyan He | 4 | -1/+246
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25 | Add a benchmark which test do 3*3 median filter in image. | Meng Mengmeng | 2 | -7/+47
It's a basic image test for uchar, ushort and uint. v2: convert uint to float before doing the median filter and use an intermediate variable in the if statement. Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | Add a benchmark which test do 3*3 median filter in buffer. | Meng Mengmeng | 2 | -9/+75
It's a basic buffer test for uchar, ushort and uint. Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
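For readers unfamiliar with the operation these two benchmarks exercise, here is a minimal CPU-side sketch of a 3*3 median filter. It is not the benchmark kernel itself; the function name is hypothetical, and leaving border pixels unchanged is an assumption made only to keep the example short.

```python
from statistics import median_low


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of integers.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Gather the 9 samples of the 3x3 neighbourhood.
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)]
            # With 9 samples the median is the 5th smallest value.
            out[y][x] = median_low(window)
    return out
```

A single outlier in the centre of a 3*3 image is replaced by the middle value of its neighbourhood, which is the behaviour a reference check for such a kernel would compare against.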
2015-11-25 | Refine the benchmark tests: copy buffer and image. | Meng Mengmeng | 2 | -6/+6
Report FPS for the two benchmarks instead of GB/s. v2: Operate on 1000 frames instead of 100. Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | Add a option which could set the benchmark unit properly. | Meng Mengmeng | 10 | -15/+15
Benchmark units vary, e.g. GB/s, FPS, score and so on, so we need to choose a unit for every benchmark. Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | Backend: Refine printfs into ir unit | Pan Xiuli | 8 | -29/+25
Move the printfs of PrintfParser into the ir::Unit to make the gbe thread safe. The old static printfs would be cleared by another thread when running multithreaded. v2: Rebase the patch. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25 | runtime: set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to kernel's SIMD_WIDTH. | Zhigang Gong | 5 | -6/+13
It makes sense to set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the corresponding SIMD size. This gives Intel OCL applications a way to get the SIMD width at runtime and makes some SIMD-width-dependent optimizations possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | GBE: decrease the loop unrolling threshold to 640. | Zhigang Gong | 1 | -1/+1
1024 is somewhat too large for some kernels and may cause them to fail to build for lack of scratch space. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | GBE: remove useless assertions code. | Zhigang Gong | 1 | -9/+5
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | GBE: don't assert even if we fail to compile kernel at the backend stage. | Zhigang Gong | 5 | -17/+31
We should not assert even if the application triggers an internal limitation such as lack of scratch space. We should return an error to the application and let it decide what to do next. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-25 | GBE: extent register allocator size/offset to 32bit. | Zhigang Gong | 2 | -29/+29
Because the scratch size can exceed the maximum value of int16_t, we have to extend these elements to 32 bits. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
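To make the overflow concrete, the sketch below models what happens when a size larger than 32767 is stored in a 16-bit signed field versus a 32-bit one. The helper names and the sample value 40000 are hypothetical and not taken from the patch; `ctypes` integer types wrap silently, which is what makes them a convenient stand-in for fixed-width C fields.

```python
import ctypes


def store_as_int16(value):
    """Return the value read back after storing it in a 16-bit signed field."""
    return ctypes.c_int16(value).value


def store_as_int32(value):
    """Return the value read back after storing it in a 32-bit signed field."""
    return ctypes.c_int32(value).value


# A hypothetical scratch size that no longer fits in int16_t.
scratch_size = 40000
wrapped = store_as_int16(scratch_size)    # wraps to a negative number
preserved = store_as_int32(scratch_size)  # round-trips unchanged
```

A negative size or offset produced this way is the kind of silent corruption that widening the fields avoids.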
2015-11-25 | Utests: Fix the failure for half math tests. | Junyan He | 1 | -17/+24
We do not have native half type support on x86 platforms. The half math functions on the CPU side are used only in utests, so we do not want to import software emulation code or add a dependency on a math library for half. We just use float to calculate the reference value, which causes differences between CPU and GPU results. We used a random function to generate the src value, but when this value is very close to pi or pi/2, the truncation error introduced by float -> half is magnified a lot in the results of some math functions, e.g. sin, cos and tan. We now use a fixed float table as src to fix this. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
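The magnification effect described here can be reproduced on the CPU with a small sketch. It uses the `e` (IEEE 754 binary16) format of Python's `struct` module to round a value to half precision; the helper names and the sample inputs are assumptions chosen for illustration, not values from the utest.

```python
import math
import struct


def round_to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def tan_error_from_half_rounding(x):
    """Absolute difference in tan() caused only by rounding x to half."""
    return abs(math.tan(x) - math.tan(round_to_half(x)))


# Near pi/2 a rounding step of under 0.0005 changes tan() by thousands.
near_half_pi = 1.5707                      # hypothetical sample input
rounded = round_to_half(near_half_pi)      # nearest half value is 1.5703125
large_error = tan_error_from_half_rounding(near_half_pi)

# Far from the pole the same kind of rounding is harmless.
small_error = tan_error_from_half_rounding(0.3)
```

This is why a table of well-behaved inputs gives stable comparisons where random inputs occasionally do not.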
2015-11-25 | Backend: Add gen9 barrier prediction setting | Pan Xiuli | 1 | -0/+1
Gen9 has a different context for emitting BarrierInst, which contains a wait instruction, and the wait instruction must not be predicated. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-24 | Backend: add debugwait function | Pan Xiuli | 13 | -5/+91
Use the wait function to provide a debug function: void debugwait(void). This function hangs the GPU until a GPU reset occurs or the host sends something to release it. EXTREMELY DANGEROUS on machines with hangcheck turned off. v2: Fix some bugs, add predicate and execwidth settings, and modify some instruction scheduling. v3: Add push and pop in instruction selection, and set nomask with execwidth. v4: Fix barrier predicate setting bugs, and rebase the patch. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-24 | Backend: enable to choose notification register | Pan Xiuli | 3 | -5/+5
There are 3 notification registers that can be used by wait, so we should be able to choose which one to use. The 3 registers are n0.0, n0.1 and n0.2, so the function name is changed as well. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-20 | GBE: CreateCall2 is removed in llvm 3.7. | Ruiling Song | 1 | -4/+7
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-19 | Runtime: return the correct error code in cl_event_check_waitlist. | Yang Rong | 1 | -2/+4
Return CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
2015-11-19 | Fix sizing error for bitfield | Giuseppe Bilotta | 1 | -1/+1
The mergeable field was defined as a uint32_t, but MAX_SRC_NUM is now 40, so we need at least a uint64_t. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
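The sizing problem can be shown with a small model of a fixed-width bitfield. This is only a sketch: the helper is hypothetical, and truncating to the field width stands in for what a too-narrow C type can no longer represent (in C, shifting a 32-bit value by 32 or more is undefined behaviour rather than a clean truncation).

```python
MAX_SRC_NUM = 40  # value stated in the commit message


def set_source_bit(mask, index, width):
    """Set bit `index` in a bitfield `width` bits wide, dropping overflow."""
    return (mask | (1 << index)) & ((1 << width) - 1)


highest_source = MAX_SRC_NUM - 1

# With a 32-bit field, the flag for source 39 cannot be stored at all.
lost = set_source_bit(0, highest_source, 32)

# With a 64-bit field, every source up to MAX_SRC_NUM - 1 has its own bit.
kept = set_source_bit(0, highest_source, 64)
```

Any source index from 32 to 39 is affected in the same way, which is why the field has to be at least 64 bits wide once MAX_SRC_NUM exceeds 32.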
2015-11-17 | Backend: Append the reg interval for registers need for profiling. | Junyan He | 1 | -0/+47
The work dim information related registers and timestamp registers are always needed in curbe. We need to set the correct life interval for them. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Implement StoreProfilingInstruction in GenContext. | Junyan He | 1 | -0/+167
The offset 0 of the profiling buffer contains the log number. We will use atomic instruction to inc it every time a log is generated. We will generate one log for each HW gpu thread. The log contains the XYZ range of global work items which are executed on this thread, the EU id, the Sub Slice id, thread number, and 20 points' timestamp which we are interested in. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Implement emitCalcTimestampInstruction in GenContext. | Junyan He | 1 | -2/+109
We maintain a real clock to record the real execution time of the original code. We do not want the added profiling instructions to introduce overhead, so every time we enter a profiling instruction block we calculate the real-time clock value and update the real clock, and when we leave the profiling instruction block we record the timestamp of that leave point. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add ADD_ and SUB_ timestamps help functions. | Junyan He | 4 | -7/+67
The timestamps are calculated using the Long type. Before BDW, there is no Long type support and we use i32 operations to implement them. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
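The general technique of building 64-bit addition and subtraction out of 32-bit operations can be sketched as follows. This models only the arithmetic (carry and borrow propagation between a low and a high 32-bit word); the function names are hypothetical and the actual ADD_/SUB_ helpers emit GPU instructions rather than Python.

```python
MASK32 = 0xFFFFFFFF


def add_u64_as_u32_pairs(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as (low, high) 32-bit words."""
    lo = (a_lo + b_lo) & MASK32
    # The low word wrapped around exactly when the result is below an operand.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


def sub_u64_as_u32_pairs(a_lo, a_hi, b_lo, b_hi):
    """Subtract b from a, both given as (low, high) 32-bit words."""
    lo = (a_lo - b_lo) & MASK32
    # A borrow is needed when the low word of a is smaller than that of b.
    borrow = 1 if a_lo < b_lo else 0
    hi = (a_hi - b_hi - borrow) & MASK32
    return lo, hi
```

For instance, adding 1 to the pair `(0xFFFFFFFF, 0)` carries into the high word, and subtracting 1 from `(0, 1)` borrows from it, which is the case a timestamp difference hits whenever the low word of the counter wraps between two samples.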
2015-11-17 | Backend: Avoid CALC_TIMESTAMP and STORE_PROFILING being scheduled. | Junyan He | 1 | -1/+3
We do not want CALC_TIMESTAMP and STORE_PROFILING to be scheduled with other instructions, because it will get the wrong timestamps. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Fix two bugs about curbe related pointer. | Junyan He | 2 | -6/+8
1. Rename __gen_ocl_timestamp_buf to __gen_ocl_profiling_buf. 2. printfbptr, printfiptr and profilingbptr should be 64 bits on BDW and later platforms, so just set them to QWORD. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Runtime: Bind the profiling buffer when profiling enabled. | Junyan He | 6 | -1/+126
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Add profiling info APIs to runtime. | Junyan He | 6 | -1/+80
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add profilingProlog function for GenContext. | Junyan He | 2 | -0/+135
The profilingProlog will collect useful information for profiling, including XYZ global range and prolog timestamp. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add tm0 function for arf timestamp register. | Junyan He | 1 | -0/+10
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add a auxiliary function to convert GenReg to uniform. | Junyan He | 1 | -0/+9
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add CalcTimestamp and StoreProfiling to insn selection. | Junyan He | 6 | -0/+163
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add IVAR OCL_PROFILING_LOG to control profiling log. | Junyan He | 9 | -10/+32
We add OCL_PROFILING_LOG as an int type, because there may be different types of profiling format in the future. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add CalcTimestamp and StoreProfiling. | Junyan He | 2 | -0/+47
When profiling, the profiling inserter function inserts calc_timestamp at each point we are interested in. At the end of the kernel, just before the return, we insert a store_profiling function call. The function holds a reference to the global variable profiling_buf and prevents it from being released when optimization passes run. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Insert store_profiling before lowed return. | Junyan He | 1 | -0/+7
After the lowering return pass, a new block which just has one RET instruction will be generated, and all RET INSTs in the middle will be replaced by BRA INST. We want our store_profiling instruction to be inserted just before that return instruction and out of any condition blocks. So we postpone the STORE_PROFILING here. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add ProfilingInfo to Unit. | Junyan He | 2 | -1/+15
The Unit will hold profiling information. The profiling information may be needed throughout the whole backend processing, so it is suitable to add it to unit. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add profiling registers to curbe. | Junyan He | 3 | -2/+24
Add five timestamp registers and one pointer register to curbe. The five timestamp registers hold all the profiling timestamp information, including 20 uint timestamps (one per point), 1 ulong prolog holding the start time and 1 ulong epilog holding the end time of the kernel. The pointer register holds the log buffer address. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add ProfilingInserter and a new function pass. | Junyan He | 2 | -0/+210
When the user enables the profiling feature, we need to insert extra instructions to record and store the timestamps. For now, the function pass just inserts the required instructions at the head of the first 20 blocks. Later, we will support inserting timestamps at any point in the code. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Bai Yannan <yannan.bai@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add StoreProfiling and CalcTimestamp instructions | Junyan He | 3 | -2/+123
Add two instructions for profiling usage. CalcTimestamp will calculate the timestamps and update the timestamp in the corresponding slot. StoreProfiling will store the information to the buffer and generate logs. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | Backend: Add ProfilingInfo class to ir. | Junyan He | 3 | -0/+208
ProfilingInfo plays an important role in outputting the profiling log. It records the profiling information and generates the logs after clFinish. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17 | First reference beignet's CL header to build | Zhenyu Wang | 1 | -1/+3
This fixes a build error when a new Intel extension is added to beignet. The current cmake rule uses old system-installed CL headers instead of beignet's own, which leads to a compile failure because the new extension definitions are not found. So this simply always prefers beignet's CL headers in build/install. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-17 | CMake: Add -lrt to the link command of libcl.so | Junyan He | 1 | -0/+1
clock_gettime causes a link error with some versions of GCC, so we need to add -lrt at the end of the link command line. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-17 | Full support of cl_intel_motion_estimation extension. | Chuanbo Weng | 2 | -50/+175
The following items are supported in this commit: 1. Return residuals. 2. All types of mb_block_type, subpixel_mode, sad_adjust_mode in cl_motion_estimation_desc_intel. After this commit, cl_intel_motion_estimation is fully supported. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-11 | gbe: fix uitofp instruction issue. | Luo Xionghu | 1 | -1/+11
llvm 3.7 may generate cast instructions such as "%13 = uitofp i1 %12 to float". When the dst type is float or double, we should call the corresponding newXXXimmediate function. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-11 | runtime: extension size not enough. | Luo Xionghu | 3 | -3/+10
Define a MACRO to hold the value. v2: use the same MACRO in cl_extensions.h; add header file protection for cl_extension.h. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10 | Add document of video motion estimation support. | Chuanbo Weng | 2 | -0/+80
v3: Fix two typos. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-10 | Add basic utest for block_motion_estimate_intel. | Chuanbo Weng | 3 | -0/+111
If the CL device does not support this builtin kernel, the test returns PASS. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-10 | Add extensions intel_accelerator and basic intel_motion_estimation. | Chuanbo Weng | 22 | -33/+1017
v2: 1. Just upload the first vme_state. 2. Remove duplicated code in check_opt1_extension. 3. Check image format before cl_gpgpu_bind_image_for_vme. 4. Fix error of getting mv. Because we assume this kernel runs in SIMD16 mode, dword 0 of grf 1 should be __gen_ocl_region(8,vme_result.s0), not __gen_ocl_region(0,vme_result.s1). v3: Return CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format is not the required one. v4: Fix two conflicts after code rebase and work around a curbe related bug. v6: Treat simd8 and simd16 differently when getting mv. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>