~gongzg/beignet

Age	Commit message	Author	Files	Lines
2015-09-25	GBE: implement pre-register-allocation instruction scheduling. [optimize2]	Zhigang Gong	1	-21/+116
	Finding an instruction scheduling policy that achieves the theoretical minimum number of registers required in a basic block is an NP-hard problem, so we have to use a heuristic to simplify the algorithm. Many studies indicate that bottom-up list scheduling is much better than the top-down method in terms of register pressure. I chose one such paper as our reference: "Register-Sensitive Selection, Duplication, and Sequencing of Instructions". It uses bottom-up list scheduling with a Sethi-Ullman label as the heuristic. As we will do cycle-aware scheduling after register allocation, we don't need to bother with cycle-related heuristics here, so I skipped the EST computation and usage parts of the algorithm. It turns out this algorithm works well: it reduces register spilling in clBlas's sgemmBlock kernel from 83+ to only 20. Although this scheduling method seems to lower the ILP (instruction-level parallelism), that is not a big issue, because the following register allocation stage allocates as many different registers as possible, and a post-allocation instruction scheduling pass then tries to recover as much ILP as possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
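As a rough illustration of the Sethi-Ullman label mentioned in the commit above, the sketch below computes the textbook label on a binary expression tree and emits an evaluation order that visits the child with the larger label first. It is not the GBE scheduler: `Node`, `su_label`, and `schedule` are hypothetical names, the tree form (every leaf needs one register) is a simplification of the DAG-based heuristic in the paper, and no EST or cycle information is modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node of a binary expression tree (hypothetical helper type)."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def su_label(node: Node) -> int:
    """Registers needed to evaluate the subtree without spilling."""
    children = [c for c in (node.left, node.right) if c is not None]
    if not children:
        return 1  # a leaf must be loaded into one register
    if len(children) == 1:
        return su_label(children[0])
    l, r = su_label(children[0]), su_label(children[1])
    # Different labels: the larger one dominates. Equal labels: one extra
    # register is needed to hold the first result while computing the second.
    return max(l, r) if l != r else l + 1


def schedule(node: Node, order: Optional[List[str]] = None) -> List[str]:
    """Evaluation order that visits the child with the larger label first."""
    if order is None:
        order = []
    children = [c for c in (node.left, node.right) if c is not None]
    children.sort(key=su_label, reverse=True)  # stable: ties keep left first
    for child in children:
        schedule(child, order)
    order.append(node.name)
    return order


a, b, c, d = (Node(n) for n in "abcd")
balanced = Node("mul", Node("add1", a, b), Node("add2", c, d))
chain = Node("add1", a, Node("add2", b, Node("add3", c, d)))
```

Here `su_label(balanced)` is 3 and `su_label(chain)` is 2, and `schedule(chain)` evaluates the deep subtree before the lone leaves, which is the ordering that keeps register pressure low.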
2015-09-25	GBE: fix a zero/one's liveness bug.	Zhigang Gong	1	-0/+29
	This is a long-standing bug that was exposed by my latest register allocation refinement patchset. ir::ocl::zero and ir::ocl::one are global registers, so we have to compute their liveness information carefully rather than just take a local interval ID. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25	GBE: we no longer need to allocate register from two directions.	Zhigang Gong	2	-2/+2
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25	GBE: don't always allocate ir::ocl::one/zero	Zhigang Gong	5	-13/+17
	Using liveness information, we can allocate them only on demand. They can also be treated as non-curbe-payload registers. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25	GBE: don't treat btiUtil as a curbe payload register.	Zhigang Gong	8	-99/+128
	btiUtil should be just a normal temporary register that is only alive for the specific load/store instructions that use mixed BTI. Although btiUtil takes only one DW of register space, in practice it may waste an entire 32-byte register because it has a very long live range. This patch fixes the issue completely. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25	GBE: refine longjmp checking.	Zhigang Gong	2	-2/+26
	v2: simplify the logic in function.hpp and let the caller prepare the correct start and end points. Fix the incorrect start/end points for the case of one forward jump and one backward jump. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-25	GBE: refactor curbe register allocation.	Zhigang Gong	17	-226/+266
	The major motivation is to normalize the curbe payload's allocation and to prepare for using liveness information to avoid unnecessary payload register allocation and to avoid fragmentation when allocating curbe registers. For example, many one-dimensional kernels don't need GBE_CURBE_LOCAL_ID_Y/Z. But the previous curbe allocation occurred before the liveness interval computation, so it allocated those curbe registers anyway. Although they expire soon, we still had to prepare that payload on the host side. After this patch, this type of overhead is eliminated. Another purpose is to eliminate the ugly curbe patch list handling in the backend. After this patch, the curbe register handling is much cleaner than before. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-24	GBE: avoid vector registers when there is high register pressure.	Zhigang Gong	1	-3/+1
	If reservedSpillRegs is not zero, we are under very high register pressure. Using a register vector will likely increase that pressure and cause a significant performance problem, which is much worse than using a short-lived temporary vector register with several additional MOVs. So let's simply avoid using vector registers and just use a temporary vector with a short live interval. v2: remove out-of-date comments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-24	GBE: enable post phi copy optimization function.	Zhigang Gong	1	-1/+1
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-24	GBE: Don't try to remove instructions when liveness is in dynamic update phase.	Zhigang Gong	1	-14/+7
	As we want to avoid updating liveness all the time, we maintain the liveness information dynamically during the phi mov optimization. Removing (self-copy) instructions brings unnecessary complexity here. Let's avoid doing that here and remove the self-copies later in removeMOVs(). v2: forgot to remove the incorrect liveness checking for special registers. Now remove it. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	GBE: continue to refine interfering check.	Zhigang Gong	2	-23/+123
	More aggressive interfering check: even if both registers are in the livein set or the liveout set, they may still not interfere with each other. v2: The liveout interfering check needs to take care of those BBs in which only one of the registers is defined. For example: BBn: ... MOV %r1, %src ... Both %r1 and %r2 are in BBn's liveout set, but %r2 is not defined or used in BBn. The previous implementation ignored this BB, which is incorrect. As %r1 was modified to a different value, %r1 cannot be replaced with %r2 in this case. v3: Add comments and assertions to restrict the usage of the interfering check functions of the DAG class. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	GBE: implement further phi mov optimization based on intra-BB interfering analysis.	Zhigang Gong	1	-6/+130
	The previous phi mov optimization tries to merge the phi copy source register with the phi copy register when the phi copy source register is a normal SSA value. But in some cases, many phi copy source registers are themselves phi copy values with multiple definitions, and they could all be reduced to one phi copy register if there is no interference in any BB. This patch, together with the previous patches, reduces the total number of spilled registers from 200+ to only 70 for an SGEMM kernel and improves performance by about 10 times. v2: Add a FIXME tag to indicate one more optimization opportunity that the current implementation misses. It could be addressed in the future. v3: Disable the postPhi mov optimization for now, as there is a liveness bug that needs to be fixed. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	GBE: add some dag helper routines to check registers' interfering.	Zhigang Gong	2	-0/+113
	These helper functions will be used in further phi mov optimization. v2: remove the useless debug message code. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	GBE: add two helper routines for liveness partially update.	Zhigang Gong	2	-0/+44
	We don't need to recompute the entire liveness information in every case. This is a preparation patch for further phi copy optimization. v2: the varKill set also needs to be updated. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	GBE: refine liveness analysis.	Zhigang Gong	3	-9/+12
	Only in gen backend stage, we need to take care of the special extra liveout and uniform analysis. In IR stage, we don't need to handle them. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	GBE: refine Phi copy interfering check.	Zhigang Gong	1	-0/+2
	If the PHI source register's definition instruction uses the phi register, that is not an interference. For example: MOV %phi, %phicopy ... ADD %phiSrcDef, %phi, tmp ... MOV %phicopy, %phiSrcDef ... Here %phi and %phiSrcDef do not interfere with each other. Simply advancing the start of the check to the next instruction is enough to get a better result. In some special cases, this patch gives a significant performance boost. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
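The effect of moving the start of the check can be sketched on the three-instruction example from the commit above. This is a simplified, single-basic-block model and not the GBE code: `phi_interferes` and the `(dst, srcs)` tuple encoding are hypothetical, and liveout sets are ignored.

```python
def phi_interferes(insns, phi, phi_src_def, start_at_next=True):
    """Return True if `phi` is still read after `phi_src_def` is defined.

    insns is a list of (dst, [srcs]) tuples for one basic block.
    With start_at_next=False the defining instruction itself is scanned,
    so its own read of `phi` is wrongly counted as an interference.
    """
    def_idx = next(i for i, (dst, _) in enumerate(insns) if dst == phi_src_def)
    begin = def_idx + 1 if start_at_next else def_idx
    return any(phi in srcs for _, srcs in insns[begin:])


block = [
    ("%phi", ["%phicopy"]),            # MOV %phi, %phicopy
    ("%phiSrcDef", ["%phi", "tmp"]),   # ADD %phiSrcDef, %phi, tmp
    ("%phicopy", ["%phiSrcDef"]),      # MOV %phicopy, %phiSrcDef
]

old_result = phi_interferes(block, "%phi", "%phiSrcDef", start_at_next=False)
new_result = phi_interferes(block, "%phi", "%phiSrcDef", start_at_next=True)
```

Scanning from the defining instruction reports an interference (`old_result` is `True`), while scanning from the next instruction does not (`new_result` is `False`), so the two registers become candidates for merging.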
2015-09-23	Driver: fix the annoying "Failed to release userptr..." error message	Pan Xiuli	1	-2/+4
	It is a drm-related bug. As the drm driver moved the freeing of its test userptr to bufmgr destroy (30921483c70c6939f017476eac13da6aa26b3b3c), we need a different order when releasing our driver to make sure the test userptr can be freed with a valid fd. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-23	Calculate appropriate timestamps for cl profile	Midhun Kodiyath	3	-4/+71
	Fix to calculate the current CPU monotonic raw timestamp in nanoseconds for the enqueued, submitted, start and finished events, and return it to the application according to the queried parameter. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22	add bswap64 in utest.	Luo Xionghu	2	-5/+72
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22	add bswap64 for gen7/gen75 and gen8 separately.	Luo Xionghu	2	-0/+174
	As the long type data layout is not contiguous on the gen7/gen75 platforms, the indirect address access pattern is a bit different from gen8. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
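Independent of how the registers are laid out, a 64-bit byte swap can be expressed on two separate 32-bit halves, which is one way to see why a non-contiguous long layout needs its own access pattern. The sketch below only shows that arithmetic; `bswap32` and `bswap64_from_halves` are hypothetical names, and nothing here models Gen registers or indirect addressing.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bswap64_from_halves(lo: int, hi: int):
    """Byte-swap a 64-bit value stored as separate low and high dwords.

    Each half is byte-swapped and the halves trade places.
    Returns (new_lo, new_hi).
    """
    return bswap32(hi), bswap32(lo)


value = 0x0102030405060708
lo, hi = value & 0xFFFFFFFF, value >> 32
new_lo, new_hi = bswap64_from_halves(lo, hi)
swapped = (new_hi << 32) | new_lo
```

For the value above, `swapped` equals `0x0807060504030201`.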
2015-09-22	fix bswap bug.	Luo Xionghu	2	-6/+12
	if the source is uniform and dst is non-uniform, no need to add the indirect address index. v2: missing a uniform check in gen8 context UD bswap. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22	add utest for creating 2d image from buffer.	Luo Xionghu	2	-0/+83
	v2: check cl_khr_image2d_from_buffer support first; use CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT to allocate memory. v3: fix clGetDeviceInfo use. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-22	enable create image 2d from buffer in clCreateImage.	Luo Xionghu	7	-29/+99
	This patch allows creating a 2D image from a cl buffer with zero copy. v2: use reference counts to manage the release of the buffer and the image. After creation, the buffer reference count is 2 and the image reference count is 1. If the image is released first, decrease both the image and the buffer reference counts, and release the bo when the buffer is finally released; if the buffer is released first, decrease the buffer reference count only, and release the buffer when the image is released. Add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT to cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if the image size is larger than the buffer. v4: set the pitch alignment to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
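The release ordering described in v2 of the commit above can be modeled with two small reference-counted objects. This is only a sketch of the counting rules, not the runtime's data structures: `Buffer`, `ImageFromBuffer`, and `bo_released` are hypothetical names.

```python
class Buffer:
    """A buffer that owns the underlying bo (hypothetical model)."""

    def __init__(self):
        self.refs = 1
        self.bo_released = False

    def retain(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.bo_released = True  # the bo is freed with the last reference


class ImageFromBuffer:
    """A 2D image that shares the buffer's storage and holds one reference."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.buffer.retain()  # buffer count becomes 2
        self.refs = 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.buffer.release()  # drop the image's reference on the buffer


# Image released first: the bo survives until the buffer is released.
buf_a = Buffer()
img_a = ImageFromBuffer(buf_a)
img_a.release()
alive_after_image = not buf_a.bo_released
buf_a.release()

# Buffer released first: the bo survives until the image is released.
buf_b = Buffer()
img_b = ImageFromBuffer(buf_b)
buf_b.release()
alive_after_buffer = not buf_b.bo_released
img_b.release()
```

In both orders the bo is released exactly when the second object goes away, which matches the counts stated in the commit (buffer 2, image 1 after creation).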
2015-09-22	return 32 could gain 0.2% performance on opencv optical flow case.	Luo Xionghu	1	-1/+1
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-21	should check the return value of cl_program_new.	Luo Xionghu	1	-0/+18
	Catch the error: out of host memory. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21	GBE: Minor refine uw1grf(nr, subnr).	Ruiling Song	1	-1/+7
	let's just keep things simple. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21	GBE: fix ub1grf(nr, subnr) issue.	Ruiling Song	1	-1/+7
	suboffset() will not set .subnr correctly, as vec1() will get a horizontal stride 0 register. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21	Fix clLinkProgram error.	Yang Rong	2	-16/+29
	Either all or none of the programs specified by input_programs must contain a compiled binary or library for the device; otherwise return CL_INVALID_OPERATION. Correct this condition check. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
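The all-or-none condition from the commit above is easy to get wrong, so here it is as a small predicate. This is a sketch of the rule only: `check_link_inputs` is a hypothetical helper, and the return values are plain strings standing in for the OpenCL error codes.

```python
def check_link_inputs(has_binary):
    """Validate one device against the all-or-none rule.

    has_binary holds one boolean per input program: True if that program
    contains a compiled binary or library for the device.
    """
    if all(has_binary) or not any(has_binary):
        return "CL_SUCCESS"
    return "CL_INVALID_OPERATION"  # mixed: some programs have it, some don't
```

For example, `check_link_inputs([True, True])` and `check_link_inputs([False, False])` both pass, while `check_link_inputs([True, False])` is rejected.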
2015-09-18	Don't use cl_buffer_get_subdata in clEnqueueReadBuffer.	Yang Rong	1	-1/+4
	cl_buffer_get_subdata is sometimes extremely slow in the Linux kernel on SKL and CHV, and the slowdown is random. So temporarily disable it and use map/copy/unmap to read. It should be re-enabled once the root cause is found. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-18	Fix piglit clLinkProgram fail.	Yang Rong	7	-9/+80
	1. Return CL_INVALID_LINKER_OPTIONS for invalid options, using clang to check the options. 2. Return CL_INVALID_OPERATION when the binary types are not the same. 3. When linking failed, CL_LINK_PROGRAM_FAILURE was not returned; fix it. 4. Do not delete the program in genProgramBuildFromLLVM; the program is allocated and deleted by the runtime. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-09	GBE: fix build error with LLVM 3.5 and previous version.	Zhigang Gong	1	-1/+6
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-08	GBE: add check dumpASMFileName.empty()	Ruiling Song	1	-5/+8
	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-08	GBE: Use addRemappedFile to avoid creating temporary cl source file.	Zhigang Gong	1	-30/+10
	LLVM provides a powerful string-remapping feature that maps a string to an input file name, so we no longer need to create a temporary cl source file. This patch not only makes things much clearer and avoids the unnecessary file creation, it also fixes some weird directory-related problems. Beignet used to create the temporary file in the /tmp directory, so clang searched for include files in that directory by default, while the developer expects it to search the working directory first. This caused two weird things: 1. If a .cl file includes a .h file in the current directory, beignet will not find it. 2. Even if the program adds a "-I." option manually, beignet will search /tmp first, and if there is a .h file in /tmp/ with the exact same file name, beignet will use the file located in /tmp. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
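The two symptoms listed in the commit above follow from the usual rule for quoted includes: the directory of the including file is searched before any -I directory. The sketch below models that lookup over a set of known paths; it is not clang's implementation, and `resolve_include`, `stringInput.cl`, and the example paths are all hypothetical.

```python
import posixpath


def resolve_include(name, including_file, include_dirs, existing_files):
    """Resolve `#include "name"`: including file's directory, then -I dirs."""
    candidates = [posixpath.dirname(including_file)] + list(include_dirs)
    for directory in candidates:
        path = posixpath.normpath(posixpath.join(directory, name))
        if path in existing_files:
            return path
    return None


project = "/home/user/proj"

# Symptom 1: source placed in /tmp, header only in the working directory.
missing = resolve_include(
    "util.h", "/tmp/stringInput.cl", [], {project + "/util.h"})

# Symptom 2: even with -I pointing at the project, /tmp shadows the header.
files = {"/tmp/util.h", project + "/util.h"}
shadowed = resolve_include("util.h", "/tmp/stringInput.cl", [project], files)

# With the source name remapped into the working directory, lookup is as expected.
fixed = resolve_include("util.h", project + "/stringInput.cl", [], files)
```

With the source in /tmp the header is either not found (`missing` is `None`) or taken from /tmp (`shadowed`); once the including file is associated with the working directory, the project's own header wins (`fixed`).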
2015-09-08	utests: Added unit tests to test LLVM and ASM dump generation.	Sirisha Gandikota	1	-0/+107
	This patch adds 2 new tests to the unit tests. It uses the existing framework and data structures and tests the llvm/asm dump generation when these flags (-dump-opt-llvm, -dump-opt-asm) are passed as build options along with the dump file names. Methods added: 1) get_build_llvm_info() tests LLVM dump generation 2) get_build_asm_info() tests ASM dump generation Signed-off-by: Sirisha Gandikota <sirisha.gandikota@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-09-07	Utest: Add -cl-kernel-arg-info to the utest test_get_arg_info	Junyan He	1	-1/+1
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-07	Runtime: Add NULL pointer check in clGetKernelArgInfo	Junyan He	1	-1/+2
	There is no NULL pointer check for kernel->program->build_opts, which causes the utest test_get_arg_info to crash. In fact, we add the -cl-kernel-arg-info flag on every compile, so the arg info is always available. But some test cases deliberately unset this flag and expect the error return value, so we really need a check here. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-02	Fix clGetKernelArgInfo fail on piglit	Pan Xiuli	2	-9/+13
	1. Change the code for null param_value. 2. Add the return value check for build option "-cl-kernel-arg-info". 3. Correct one return value typo. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27	GBE: a potential bug in instruction scheduling.	Zhigang Gong	1	-1/+5
	ENDIF should be treated as barrier-like instruction in instruction scheduling. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-08-27	GBE: one minor bug in OP_SIMD_XXX.	Zhigang Gong	1	-1/+7
	Need to take care of the uniform cases. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27	utests: refine image 1d buffer test case.	Zhigang Gong	2	-53/+32
	We need to test reads and writes of a large image 1d buffer. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27	GBE: fix the broken image_1d_buffer write.	Zhigang Gong	1	-1/+13
	We should treat it as a 2D image, as an image 1d buffer may exceed the 1D image size restriction. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27	correct simd width when dst of simd_shuffle is scalar	Guo Yejun	1	-0/+5
	originally, the dst of simd_shuffle is not uniform, but if it is optimized as scalar, just use simd_width=1 to generate sel_op/asm Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27	remove GBE_CURBE_STACK_POINTER in payload	Guo Yejun	9	-30/+60
	initialize the data inside kernel with packed integer vector V2: call functions from ctx, instead of ctx.registerAllocator Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-24	backend/src/backend: Handle -dump-opt-llvm=[PATH] in clCompileProgram and clBuildProgram OpenCL API	Manasi Navare	1	-30/+31
	This is a resubmission of the patch with support for LLVM 3.4. It allows the user to request a dump of the LLVM-generated IR to the file specified in [PATH] through clCompileProgram options. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-08-20	GBE/PRINTF: store variable instead of pointer in "slots".	Luo Xionghu	3	-13/+27
	This fixes the bug: https://bugs.freedesktop.org/show_bug.cgi?id=90472 v2: the vector "slots" stores pointers to the PrintfSlot elements of the vector "fmts", but a push_back on "fmts" causes a resize when the capacity is insufficient and calls the copy constructor and destructor of each PrintfSlot, leaving an illegal pointer in "slots". So this patch stores the variable instead of the pointer. Update the destructor of PrintfSlot according to the SLOT_TYPE. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Junyan He <junyan.he@inbox.com>
2015-08-14	fix issue when build against llvm3.3	Guo Yejun	1	-1/+7
	LLVM 3.3 has a different constructor for llvm::raw_fd_ostream. V2: refine the code. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-08-13	backend: Turn on ASM dump.	Manasi Navare	2	-0/+11
	Open the file specified for the ASM dump and write the assembly to it. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13	backend: Add ASM file name to GenContext object.	Laura Ekstrand	3	-0/+9
	Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13	backend: Add ASM file name to GenProgram object.	Laura Ekstrand	2	-2/+4
	Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13	backend, src: Add ASM file name to gbe_program_new_from_llvm	Laura Ekstrand	4	-2/+4
	Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>