Age Commit message Author Files Lines
2015-09-25GBE: implement pre-register-allocation instruction scheduling.optimize2Zhigang Gong1-21/+116
Finding an instruction schedule that achieves the theoretical minimum number of registers required in a basic block is an NP problem, so we have to use heuristics to simplify the algorithm. Much research indicates that bottom-up list scheduling is much better than the top-down method in terms of register pressure. I chose one such research paper as our target: "Register-Sensitive Selection, Duplication, and Sequencing of Instructions". It uses bottom-up list scheduling with a Sethi-Ullman label as the heuristic number. As we will do cycle-aware scheduling after register allocation, we don't need to bother with cycle-related heuristics here, so I skipped the EST computation and usage part of the algorithm. It turns out this algorithm works well: it reduces register spilling in clBlas's sgemmBlock kernel from 83+ to only 20. Although this scheduling method seems to lower the ILP (instruction-level parallelism), that is not a big issue, because the following register allocation stage will allocate as many different registers as possible, and a post-allocation instruction scheduling pass will then try to get as much ILP as possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
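As a rough illustration of the heuristic mentioned above, the following sketch computes a simplified Sethi-Ullman label on a binary expression tree. The `Node` type and the helper names are hypothetical and are not beignet code; the scheduler in this patch works on instruction DAGs, which need more care than the tree case shown here.

```cpp
#include <algorithm>
#include <memory>
#include <utility>

// Hypothetical expression-tree node; not a beignet data structure.
struct Node {
  std::unique_ptr<Node> left, right;  // both null for a leaf
};

// Simplified Sethi-Ullman label: the number of registers needed to evaluate
// the subtree without spilling, assuming every leaf is first loaded into a
// register.
int sethiUllmanLabel(const Node *n) {
  if (n == nullptr) return 0;
  if (!n->left && !n->right) return 1;  // leaf
  const int l = sethiUllmanLabel(n->left.get());
  const int r = sethiUllmanLabel(n->right.get());
  // Equal needs: one extra register holds the first result while the second
  // subtree is evaluated. Otherwise the more demanding subtree goes first and
  // its requirement dominates.
  return (l == r) ? l + 1 : std::max(l, r);
}

// Helpers for building small trees in examples.
std::unique_ptr<Node> leaf() { return std::make_unique<Node>(); }

std::unique_ptr<Node> op(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->left = std::move(a);
  n->right = std::move(b);
  return n;
}
```

A list scheduler can use such a label as a priority: among the ready instructions, the one whose operand subtree needs more registers is handled first, which tends to keep fewer values live at once.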
2015-09-25GBE: fix a zero/one's liveness bug.Zhigang Gong1-0/+29
This is a long-standing bug, exposed by my latest register allocation refinement patchset. ir::ocl::zero and ir::ocl::one are global registers, so we have to compute their liveness information carefully, not just get a local interval ID. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25GBE: we no longer need to allocate register from two directions.Zhigang Gong2-2/+2
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25GBE: don't always allocate ir::ocl::one/zeroZhigang Gong5-13/+17
Using liveness information, we can allocate them only on demand. They can also be treated as non-curbe-payload registers. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25GBE: don't treat btiUtil as a curbe payload register.Zhigang Gong8-99/+128
btiUtil should be just a normal temporary register, alive only for those specific load/store instructions that use mixed BTI. Although btiUtil takes only one DW of register space, in practice it may waste an entire 32-byte register because it has a very long live range. This patch fixes the issue completely. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25GBE: refine longjmp checking.Zhigang Gong2-2/+26
v2: simplify the logic in function.hpp. Let the user prepare the correct start and end points. Fix the incorrect start/end point for the case of one forward jump and one backward jump. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25GBE: refactor curbe register allocation.Zhigang Gong17-226/+266
The major motivation is to normalize the curbe payload's allocation and to prepare for using liveness information to avoid unnecessary payload register allocation and to avoid fragmentation when allocating curbe registers. For example, many one-dimensional kernels don't need GBE_CURBE_LOCAL_ID_Y/Z. But the previous curbe allocation occurred before the liveness interval computation, so it allocated that curbe anyway. Although the register expires soon, we still had to prepare that payload on the host side. After this patch, this type of overhead is eliminated. Another purpose is to eliminate the ugly curbe patch list handling in the backend. After this patch, the curbe register handling is much cleaner than before. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-24GBE: avoid vector registers when there is high register pressure.Zhigang Gong1-3/+1
If reservedSpillRegs is not zero, it indicates we are under very high register pressure. Using a register vector will likely increase that pressure and cause a significant performance problem, which is much worse than using a short-lived temporary vector register with several additional MOVs. So let's simply avoid using vector registers and just use a temporary vector with a short live interval. v2: remove out-of-date comments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-24GBE: enable post phi copy optimization function.Zhigang Gong1-1/+1
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-24GBE: Don't try to remove instructions when liveness is in dynamic update phase.Zhigang Gong1-14/+7
As we want to avoid updating liveness all the time, we maintain the liveness information dynamically during the phi mov optimization. Removing instructions (self-copies) brings unnecessary complexity here. Let's avoid doing that here and remove the self-copies later in removeMOVs(). v2: forgot to remove the incorrect liveness checking for special registers. Now remove it. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23GBE: continue to refine interfering check.Zhigang Gong2-23/+123
More aggressive interference check: even if both registers are in the livein set or the liveout set, they may still not interfere with each other. v2: The liveout interference check needs to take care of those BBs that have only one of the registers defined. For example: BBn: ... MOV %r1, %src ... Both %r1 and %r2 are in BBn's liveout set, but %r2 is not defined or used in BBn. The previous implementation ignored this BB, which is incorrect. As %r1 was modified to a different value, %r1 cannot be replaced with %r2 in this case. v3: Add comments and assertions to restrict the usage of the interference check functions of the DAG class. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23GBE: implement further phi mov optimization based on intra-BB interfering analysis.Zhigang Gong1-6/+130
The previous phi mov optimization tries to reduce the phi copy source register and the phi copy register to a single register if the phi copy source register is a normal SSA value. But in some cases, many phi copy source registers are themselves phi copy values with multiple definitions. They can all be reduced to one phi copy register if there is no interference in any BB. This patch, together with the previous patches, reduces the total number of spilled registers from 200+ to only 70 for an SGEMM kernel, and performance improves by about 10 times. v2: Add a FIXME tag to indicate one more optimization opportunity missed by the current implementation. It could be solved in the future. v3: Disable the postPhi mov optimization for now, as there is a liveness bug that needs to be fixed. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23GBE: add some dag helper routines to check registers' interfering.Zhigang Gong2-0/+113
These helper functions will be used in further phi mov optimization. v2: remove the useless debug message code. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23GBE: add two helper routines for liveness partially update.Zhigang Gong2-0/+44
We don't need to recompute the entire liveness information in all cases. This is a preparation patch for further phi copy optimization. v2: also need to update the varKill set. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
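To make the idea of a partial liveness update concrete, here is a minimal sketch under stated assumptions. The `BlockInfo` layout, the set names other than varKill, and the `renameInBlock` helper are hypothetical and only illustrate the standard per-block dataflow sets; they are not the helper routines added by this patch.

```cpp
#include <initializer_list>
#include <set>

using Reg = int;  // hypothetical register identifier

// Hypothetical per-block liveness record.
struct BlockInfo {
  std::set<Reg> upwardUsed;  // read in the block before any write
  std::set<Reg> varKill;     // written in the block
  std::set<Reg> liveOut;     // live on exit from the block
};

// Standard dataflow equation: liveIn = upwardUsed U (liveOut \ varKill).
std::set<Reg> liveIn(const BlockInfo &b) {
  std::set<Reg> in = b.upwardUsed;
  for (Reg r : b.liveOut)
    if (b.varKill.count(r) == 0) in.insert(r);
  return in;
}

// Local update after register `from` has been replaced by `to` inside one
// block: patch the three sets in place instead of rerunning the analysis.
// varKill has to be patched too, otherwise liveIn would be computed wrongly.
void renameInBlock(BlockInfo &b, Reg from, Reg to) {
  for (std::set<Reg> *s : {&b.upwardUsed, &b.varKill, &b.liveOut})
    if (s->erase(from) > 0) s->insert(to);
}
```

The point of such a helper is cost: a rename touches a handful of sets in the affected blocks, whereas a full recomputation iterates over the whole function until a fixed point is reached.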
2015-09-23GBE: refine liveness analysis.Zhigang Gong3-9/+12
Only in the Gen backend stage do we need to take care of the special extra liveout and uniform analysis. In the IR stage, we don't need to handle them. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23GBE: refine Phi copy interfering check.Zhigang Gong1-0/+2
If the instruction that defines the PHI source register uses the phi register, it is not an interference. For example: MOV %phi, %phicopy ... ADD %phiSrcDef, %phi, tmp ... MOV %phicopy, %phiSrcDef ... Here %phi and %phiSrcDef do not interfere with each other. Simply advancing the start of the check to the next instruction is enough to get a better result. For some special cases, this patch gives a significant performance boost. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23Driver: fix the annoying "Failed to release userptr..." error messagePan Xiuli1-2/+4
It is a drm-related bug. As the drm driver moved the freeing of its test userptr to bufmgr destroy (30921483c70c6939f017476eac13da6aa26b3b3c), we need another order for releasing our driver to make sure the test userptr can be freed with a valid fd. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-23Calculate appropriate timestamps for cl profileMidhun Kodiyath3-4/+71
Fix to calculate the current CPU monotonic raw timestamp in nanoseconds for the enqueued, submitted, start and finished events, and send it to the application based on the queried parameter. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
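A minimal sketch of reading a monotonic raw timestamp in nanoseconds on Linux is shown below. The function names are hypothetical and this is not the code from the patch; it only assumes the POSIX `clock_gettime` call with the Linux-specific `CLOCK_MONOTONIC_RAW` clock.

```cpp
#include <cstdint>
#include <time.h>

// Convert a timespec (seconds plus nanoseconds) to a single nanosecond count.
std::uint64_t timespecToNs(const timespec &ts) {
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<std::uint64_t>(ts.tv_nsec);
}

// Current CPU monotonic raw time in nanoseconds.
// CLOCK_MONOTONIC_RAW is Linux-specific and is not adjusted by NTP slewing.
std::uint64_t monotonicRawNowNs() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
  return timespecToNs(ts);
}
```

Sampling such a clock at each state change of a command gives values that never go backwards, which is what an application expects when it subtracts two profiling timestamps.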
2015-09-22add bswap64 in utest.Luo Xionghu2-5/+72
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22add bswap64 for gen7/gen75 and gen8 separately.Luo Xionghu2-0/+174
As the long type data layout is not contiguous on gen7/gen75, the indirect address access pattern is a bit different from gen8. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22fix bswap bug.Luo Xionghu2-6/+12
If the source is uniform and the dst is non-uniform, there is no need to add the indirect address index. v2: add a missing uniform check in the gen8 context UD bswap. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22add utest for creating 2d image from buffer.Luo Xionghu2-0/+83
v2: check cl_khr_image2d_from_buffer support first; use CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT to allocate memory. v3: fix clGetDeviceInfo use. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-22enable create image 2d from buffer in clCreateImage.Luo Xionghu7-29/+99
This patch allows creating a 2D image from a CL buffer with zero copy. v2: should use reference counting to manage the release of the buffer and image. After creation, the buffer reference count is 2 and the image reference count is 1. If the image is released first, decrease both the image and buffer reference counts, and release the bo when the buffer is finally released; if the buffer is released first, decrease only the buffer reference count, and release the buffer when the image is released. Add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT to cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if the image size is larger than the buffer. v4: set pitch alignment to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
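The release ordering described in v2 can be modelled with a small sketch. The `Buffer` and `Image` structs and the function names below are hypothetical simplifications, not the runtime's `_cl_mem` types; the `boReleased` flag stands in for actually freeing the underlying buffer object.

```cpp
// Hypothetical, simplified model of a buffer that owns a bo.
struct Buffer {
  int refCount = 1;         // one reference held by the application
  bool boReleased = false;  // set when the underlying bo would be freed
};

// Hypothetical, simplified model of an image created from a buffer.
struct Image {
  int refCount = 1;
  Buffer *buffer = nullptr;
};

// Drop one buffer reference; free the bo when the last reference goes away.
void releaseBuffer(Buffer &b) {
  if (--b.refCount == 0) b.boReleased = true;
}

// Creating the image takes an extra reference on the buffer (count becomes 2).
Image createImageFromBuffer(Buffer &b) {
  ++b.refCount;
  Image img;
  img.buffer = &b;
  return img;
}

// Drop one image reference; the last one also drops the image's reference on
// the buffer it was created from.
void releaseImage(Image &img) {
  if (--img.refCount == 0 && img.buffer != nullptr) releaseBuffer(*img.buffer);
}
```

With this model either release order ends with exactly one bo release: image then buffer takes the count 2, 1, 0, and buffer then image does the same, only with the final decrement triggered by the image.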
2015-09-22return 32 could gain 0.2% performance on opencv optical flow case.Luo Xionghu1-1/+1
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-21should check the return value of cl_program_new.Luo Xionghu1-0/+18
catch the error: out of host memory. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21GBE: Minor refine uw1grf(nr, subnr).Ruiling Song1-1/+7
let's just keep things simple. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21GBE: fix ub1grf(nr, subnr) issue.Ruiling Song1-1/+7
suboffset() will not set .subnr correctly, as vec1() will get a horizontal stride 0 register. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21Fix clLinkProgram error.Yang Rong2-16/+29
Either all programs or none of the programs specified by input_programs may contain a compiled binary or library for the device; otherwise return CL_INVALID_OPERATION. Correct this condition check. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
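The all-or-none rule can be sketched as a small classification function. The enum and function names are hypothetical and only illustrate the condition; they are not the runtime's implementation, and the mapping of the invalid case to CL_INVALID_OPERATION is left to the caller.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result of checking input_programs for one device.
enum LinkCheck {
  LINK_ALL_HAVE_BINARY,   // every program has a compiled binary or library
  LINK_NONE_HAVE_BINARY,  // no program has one
  LINK_INVALID_OPERATION  // mixed: the caller should return CL_INVALID_OPERATION
};

// hasBinaryForDevice[i] tells whether input program i contains a compiled
// binary or library for the device. An empty list is assumed to be rejected
// by the caller before this check runs.
LinkCheck checkInputPrograms(const std::vector<bool> &hasBinaryForDevice) {
  std::size_t withBinary = 0;
  for (bool has : hasBinaryForDevice)
    if (has) ++withBinary;
  if (withBinary == hasBinaryForDevice.size()) return LINK_ALL_HAVE_BINARY;
  if (withBinary == 0) return LINK_NONE_HAVE_BINARY;
  return LINK_INVALID_OPERATION;
}
```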
2015-09-18Don't use cl_buffer_get_subdata in clEnqueueReadBuffer.Yang Rong1-1/+4
cl_buffer_get_subdata is sometimes extremely slow in the Linux kernel on skl and chv, and the slowdown is random. So temporarily disable it and use map/copy/unmap to read. It should be re-enabled after the root cause is found. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-18Fix piglit clLinkProgram fail.Yang Rong7-9/+80
1. Return CL_INVALID_LINKER_OPTIONS for invalid options, using clang to check the options. 2. Return CL_INVALID_OPERATION when the binary types are not the same. 3. When linking failed, CL_LINK_PROGRAM_FAILURE was not returned; fix it. 4. Should not delete the program in genProgramBuildFromLLVM; the program is created and deleted by the runtime. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-09GBE: fix build error with LLVM 3.5 and previous version.Zhigang Gong1-1/+6
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-08GBE: add check dumpASMFileName.empty()Ruiling Song1-5/+8
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-08GBE: Use addRemappedFile to avoid creating temporary cl source file.Zhigang Gong1-30/+10
LLVM provides a powerful string-remapping feature that can map a string to an input file name, so we no longer need to create a temporary cl source file. This patch not only makes things much clearer and avoids the unnecessary file creation, it also fixes some weird directory-related problems. Because beignet created the temporary file in the /tmp directory, clang searched for include files in that directory by default, but the developer expects it to search the working directory first. This caused two weird things: 1. If a .cl file includes a .h file in the current directory, beignet will not find it. 2. Even if the program adds a "-I." option manually, beignet will search /tmp first, and if there is a .h file in /tmp/ with the exact same file name, beignet will use the file located in /tmp. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-08utests: Added unit tests to test LLVM and ASM dump generation.Sirisha Gandikota1-0/+107
This patch adds 2 new tests to the unit tests. It uses the existing framework and data structures and tests the llvm/asm dump generation when these flags (-dump-opt-llvm, -dump-opt-asm) are passed as build options along with the dump file names. Methods added: 1) get_build_llvm_info() tests LLVM dump generation 2) get_build_asm_info() tests ASM dump generation Signed-off-by: Sirisha Gandikota <sirisha.gandikota@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-09-07Utest: Add -cl-kernel-arg-info to the utest test_get_arg_infoJunyan He1-1/+1
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-07Runtime: Add NULL pointer check in clGetKernelArgInfoJunyan He1-1/+2
There is no NULL pointer check for kernel->program->build_opts, which causes the utest test_get_arg_info to crash. In fact, we add the -cl-kernel-arg-info flag for every compilation, so the arg info is always available. But some test cases deliberately unset this flag and expect the error return value, so we really need a check here. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
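The kind of guard this commit describes can be sketched as follows. The function name is hypothetical and the option string is taken from the commit text; this is an illustration of checking the pointer before searching it, not the runtime's actual code.

```cpp
#include <cstring>

// Returns true only if a build option string exists and contains the
// -cl-kernel-arg-info flag. Passing NULL to strstr is undefined behaviour,
// so the pointer is tested first.
bool hasKernelArgInfo(const char *buildOpts) {
  return buildOpts != nullptr &&
         std::strstr(buildOpts, "-cl-kernel-arg-info") != nullptr;
}
```

A caller could then return an error code when the function yields false instead of dereferencing a missing option string.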
2015-09-02Fix clGetKernelArgInfo fail on piglitPan Xiuli2-9/+13
1. Change the code for a null param_value. 2. Add the return value check for the build option "-cl-kernel-arg-info". 3. Correct one return value typo. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27GBE: a potential bug in instruction scheduling.Zhigang Gong1-1/+5
ENDIF should be treated as a barrier-like instruction in instruction scheduling. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-08-27GBE: one minor bug in OP_SIMD_XXX.Zhigang Gong1-1/+7
Need to take care of the uniform cases. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27utests: refine image 1d buffer test case.Zhigang Gong2-53/+32
We need to test reading and writing of large image 1d buffers. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27GBE: fix the broken image_1d_buffer write.Zhigang Gong1-1/+13
We should treat it as a 2D image, as an image 1d buffer may exceed the 1D image size restriction. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27correct simd width when dst of simd_shuffle is scalarGuo Yejun1-0/+5
Originally, the dst of simd_shuffle is not uniform, but if it is optimized to a scalar, just use simd_width=1 to generate sel_op/asm. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27remove GBE_CURBE_STACK_POINTER in payloadGuo Yejun9-30/+60
initialize the data inside kernel with packed integer vector V2: call functions from ctx, instead of ctx.registerAllocator Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-24backend/src/backend: Handle -dump-opt-llvm=[PATH] in clCompileProgram and clBuildProgram OpenCL APIManasi Navare1-30/+31
This is a resubmission of the patch with support for LLVM 3.4. Allows the user to request a dump of the LLVM-generated IR to the file specified in [PATH] through clCompileProgram options. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-08-20GBE/PRINTF: store variable instead of pointer in "slots".Luo Xionghu3-13/+27
This fixes the bug https://bugs.freedesktop.org/show_bug.cgi?id=90472 v2: the vector "slots" stores pointers to PrintfSlot elements of the vector "fmts", but a push_back on "fmts" causes a reallocation if the capacity is insufficient, calling the copy constructor and destructor of each PrintfSlot and leaving illegal pointers in "slots". So this patch stores the variable instead of the pointer. Update the destructor of PrintfSlot according to the SLOT_TYPE. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Junyan He <junyan.he@inbox.com>
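The pattern behind this fix can be illustrated with a simplified sketch. The `PrintfSlot` struct here is a hypothetical stand-in with a single string member, and `SlotCollector` is an invented wrapper; the real class and its SLOT_TYPE handling are more involved.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the real PrintfSlot.
struct PrintfSlot {
  std::string text;
};

// Collects slots by value. Had `slots` held PrintfSlot* pointing into `fmts`,
// any push_back that reallocates `fmts` would leave those pointers dangling.
// Copies stay valid no matter how often `fmts` grows.
struct SlotCollector {
  std::vector<PrintfSlot> fmts;
  std::vector<PrintfSlot> slots;  // values, not pointers into fmts

  void add(const std::string &text) {
    fmts.push_back(PrintfSlot{text});
    slots.push_back(fmts.back());  // copy the element, do not keep its address
  }
};
```

Storing indices into `fmts` would be another way to avoid dangling pointers; copying values, as the commit does, trades some memory for simpler ownership.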
2015-08-14fix issue when build against llvm3.3Guo Yejun1-1/+7
llvm 3.3 has a different constructor for llvm::raw_fd_ostream. V2: refine the code. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-08-13backend: Turn on ASM dump.Manasi Navare2-0/+11
Open the file specified for the ASM dump and write the assembly to it. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13backend: Add ASM file name to GenContext object.Laura Ekstrand3-0/+9
Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13backend: Add ASM file name to GenProgram object.Laura Ekstrand2-2/+4
Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13backend, src: Add ASM file name to gbe_program_new_from_llvmLaura Ekstrand4-2/+4
Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>