~hejunyan/beignet - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2015-10-13	Fix gpgpu node related bug	Pan Xiuli	1	-1/+1
	This should be a typo, we should wait for the gpgpu and create node only if the batch buffer is busy. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13	Add -dump-opt-asm support to the clLinkProgram() API	Manasi Navare	1	-3/+1
	This will dump the Gen ASM output to the file specified in the -dump-opt-asm Link option during the Link program step. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-10-12	GBE: fix kernel arguments uploading bug.	Zhigang Gong	2	-5/+9
	After the curbe allocation refactor, not all kernel arguments will be allocated unconditional. If some kernel arguments haven't been used at all, the corresponding arguments will be ignored at backend thus we may get a -1 offset. On the runtime driver side, we need check this situation. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-24	GBE: refactor curbe register allocation.	Zhigang Gong	3	-20/+30
	The major motivation is to normalize the curbe payload's allocation and prepare to use liveness information to avoid unecessary payload register allocation and avoid fragments when allocate curbe registers. For an example, for GBE_CURBE_LOCAL_ID_Y/Z, many one dimention kernels don't need them. But previous curbe allocation occurs before the liveness interval computing, thus it will allocate that curbe anyway. Altough it will be expired soon but it still need us to prepare those payload at host side. After this patch, this type of overhead has been eliminated easily. Another purpose is to eliminate the ugly curbe patch list handling in backend. After this patch, the curbe register handling is much cleaner than before. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	Driver: fix the annoying "Failed to release userptr..." error message	Pan Xiuli	1	-2/+4
	It is a drm related bug. As the drm driver changed the time to free their test userptr to bufmgr destroy(30921483c70c6939f017476eac13da6aa26b3b3c), we need anothr order to release our driver to make sure the test userptr can be freed with a valid fd. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-23	Calculate appropriate timestamps for cl profile	Midhun Kodiyath	3	-4/+71
	Fix to calculate the current cpu monotonic raw timestamp in nanoseconds for enqueued,submitted,start and finshed and send this to application based on the parameter queries. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22	enable create image 2d from buffer in clCreateImage.	Luo Xionghu	7	-29/+99
	this patch allows create 2d image with a cl buffer with zero copy. v2: should use reference to manage the release the buffer and image. After being created, the buffer reference count is 2, and image reference count is 1. if image is released first, decrease the image reference count and buffer reference count both, release the bo when the buffer is released at last; if buffer is released first, decrease the buffer reference count only, release the buffer when the image is released. add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if image size is larger than the buffer. v4: pitchalignment set to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-22	return 32 could gain 0.2% performance on opencv optical flow case.	Luo Xionghu	1	-1/+1
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-21	should check the return value of cl_program_new.	Luo Xionghu	1	-0/+18
	catch the error: out of host memery. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21	Fix clLinkProgram error.	Yang Rong	2	-16/+29
	All programs or none programs specified by input_programs contain a compiled binary or library for the device. Otherwise return CL_INVALID_OPERATION. Correct this condition check. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-18	Don't use cl_buffer_get_subdata in clEnqueueReadBuffer.	Yang Rong	1	-1/+4
	cl_buffer_get_subdata sometime is very very very slow in linux kernel, in skl and chv, and it is random. So temporary disable it, use map/copy/unmap to read. Should re-enable it after find root cause. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-18	Fix piglit clLinkProgram fail.	Yang Rong	4	-3/+24
	1. return CL_INVALID_LINKER_OPTIONS when invalid options, using clang to check the options. 2. return CL_INVALID_OPERATION when the binary type is not same. 3. When link fail, will not return CL_LINK_PROGRAM_FAILURE, fix it. 4. Should not delete program in genProgramBuildFromLLVM, the program is new and delete from runtime. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-07	Runtime: Add NULL pointer check in clGetKernelArgInfo	Junyan He	1	-1/+2
	There is no NULL pointer check for kernel->program->build_opts. This will cause utest test_get_arg_info crash. In fact, we will add -cl-kernel-arg-info flag for compiling ever time, and so the arg info is always avaible. But some test case deliberately unset this flag and expect the ERR return value, so we really need a check here. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-02	Fix clGetKernelArgInfo fail on piglit	Pan Xiuli	2	-9/+13
	1.Change the code for null param_value 2.Add the return value check for build option "-cl-kernel-arg-info" 3.Correct one return value typo Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27	remove GBE_CURBE_STACK_POINTER in payload	Guo Yejun	1	-9/+0
	initialize the data inside kernel with packed integer vector V2: call functions from ctx, instead of ctx.registerAllocator Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-13	backend, src: Add ASM file name to gbe_program_new_from_llvm	Laura Ekstrand	1	-1/+1
	Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13	Set proper Vendor ID	Midhun Kodiyath	6	-27/+30
	Device ID and vendor ID are not same.Set the correct vendor ID. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-08-13	generate sub_group_id inside kernel instead of payload	Guo Yejun	1	-8/+0
	get_sub_group_id ranges at [0, 7] for SIMD8 and [0, 15] for SIMD16, previously we set up the values in kernel payload, now change it to generate the values inside kernel with packed integer vector. v2: encapsulate into a function so that others can get the lane id easily. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-10	runtime: always try to update event status in clGetEventProfilingInfo().	Zhigang Gong	1	-0/+1
	Some applications forgot to call clWaitForEvents() before calling to clGetEventProfilingInfo(). Let's update the event's status here. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-10	Update last event status in clFinish.	Luo Xionghu	1	-0/+3
	The event should have been finished after clFinish, update the event status. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-08-10	runtime: add detailed broadwell device name.	Ruiling Song	1	-6/+6
	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-10	runtime: add Broadwell deviceID 0x162B	Ruiling Song	2	-1/+5
	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17	Runtime: correct event and the wait events compare when check event.	Yang Rong	1	-1/+1
	When the event parament is not NULL, the event will point to a new event, so need to check address of the event and the wait events. V2: check the address of the event and the wait events. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-07-17	Remove deprecated function cl_context_get_static_kernel().	Zhigang Gong	2	-66/+12
	Also fix a spelling bug - s/internel/internal. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17	runtime: fix a builtin-kernel related thread safe bug.	Zhigang Gong	2	-8/+26
	This patch fixed two thread-safe bugs in the builtin-kernel usage code path. 1. The builtin kernel array itself need to be protected. 2. Each caller need to get a dup of the builtin kernel, rather than share the same kernel structure. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17	runtime bug: brw GT3 devices reported to GT2.	Luo Xionghu	1	-5/+5
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17	Reorder GBE_BIN_GENERATER arguments.	Koop Mast	1	-2/+2
	Basically, it boils down to a difference in getopt(3). The getopt(3) on (Free)BSD will exit parsing arguments at the first unknown argument. Signed-off-by: Koop Mast <kwm@rainbow-runner.nl> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-15	Need to check eventWaitList in clEnqueueNDRangeKernel.	Zhigang Gong	1	-0/+1
	Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-15	Fixed a thread safe bug.	Zhigang Gong	6	-18/+56
	last_event and current_event should be thread private data. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-15	runtime: Need to separate atomic in L3 test and SLM test in self_test().	Zhigang Gong	1	-3/+12
	On HSW, if we use default 4.0.x kernel without the i915.enable_ppgtt=2 boot argument, then the atomic in L3 will not work and the SLM will not work neither. We need to test atomic in L3 firstly, if it fails we need to test SLM again. Otherwise, beignet will not give any error/warning information for both atomic in L3 and SLM not working case. v2: shold set the atomic test result before the second round SLM test. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
2015-07-14	Runtime: Add default extension for platforms before BDW.	Junyan He	3	-0/+14
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-06	runtime: Add cl device's standalone extension.	Junyan He	4	-22/+24
	The cl device may have different extensions from the platform. We will add some items based on the platform extensions. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-06	runtime: Use cl_get_platform_default to replace global value.	Junyan He	6	-68/+66
	The init order of the intel_platform and the intel_extension is somehow not clear. When some API such as clGetDeviceIDs can pass NULL as cl_platform_id, we just use the global value intel_platform as the default but do not care about the init state of the extension. The init of the extension may be done when the cl device is created. This is OK if the paltform and the device have the same extensions. But now because of the fp16, they are not always the same. Use cl_get_platform_default to replace the global value to ensure that when default platform is available, the extension is also inited. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-03	enable CL_UNSIGNED_INT8 for CL_RG to fix regression	Guo Yejun	1	-0/+1
	the regression is caused when only enable CL_UNORM_INT8 for CL_RG, the reason is that during the image copy implementation with internal kernel, all formats are considerd as integer format, it becomes unknown since CL_UNSIGNED_INT8 is not enabled yet. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-02	runtime: Add fp16 extension to BDW later platform.	Junyan He	6	-56/+102
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-06-30	use self test to determine enable/or disable atomics in L3 for HSW.	Luo Xionghu	9	-23/+66
	check the selftest kernel return value, if enqueue kernel failed, set the flag to not enable atomics the L3 for HSW. This reverts commit 83f8739b6fc4893fac60145326052ccb5cf653dc. v2: don't use global variable to pass value from runtime to driver. v3: add type SELF_TEST_OTHER_FAIL to differentiate from SELF_TEST_ATOMIC_FAIL; seperate the ATOMIC_FAIL from SLM_FAIL, only SLM_FAIL can be control by env OCL_IGNORE_SELF_TEST. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang, Rong <rong.r.yang@intel.com>
2015-06-19	enable CL_RG + CL_UNORM_INT8 for image	Guo Yejun	1	-0/+5
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang, Rong <rong.r.yang@intel.com>
2015-06-18	fix global variable out of boundary writing in libocl.	Luo Xionghu	1	-1/+1
	need minus one when fill '\0' to sizeof char type array. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18	Docs: update/clarify Haswell issues	Rebecca N. Palmer	1	-5/+9
	Reflect recent beignet and Linux changes. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18	Add a sanity test in clGetDeviceIDs	Rebecca N. Palmer	1	-0/+82
	Run a small __local-using kernel in clGetDeviceIDs; if this returns the wrong result, return CL_DEVICE_NOT_FOUND. As far as I can see, there's no way to tell in advance (except unreliably with a global version check) whether __local-using batches will be accepted...so the easiest solution is probably to just try running one and see what result we get. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12	rename __gen_ocl_get_simd_id/size to get_sub_group_id/size	Guo Yejun	1	-1/+1
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-04	Revert "CL/Driver: enable atomics in L3 for HSW."	Zhigang Gong	2	-14/+1
	This issue should be fixed in KMD in the future. Let's revert it here. This reverts commit ef7127c03bd533277afc443b335c37a69927250a.
2015-05-04	add benckmark for copy data from buffer to image.	Luo Xionghu	1	-1/+1
	v2: use random input data; update comments. v3: change the image attribute to __write_only. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04	Optimization of clEnqueueCopyBufferToImage for 16 aligned case.	Luo Xionghu	4	-9/+56
	We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04	CHV: Add cherryview support in the runtime.	Meng Mengmeng	4	-3/+56
	Cherryview's EU configurations is not decided by pciid, must get from kernel by libdrm. Thanks for Jeff adding this support in the kernel and libdrm. V2: Add the warning when can't get configurations. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-24	add simd level function __gen_ocl_get_simd_id	Guo Yejun	1	-0/+8
	uint __gen_ocl_get_simd_id(); return value ranges from 0 to simdsize - 1 V2: use function sel.selReg to refine code V3: correct the uniform condition in liveness.cpp Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-14	runtime: don't try to open nonexistent render nodes or device files.	Zhigang Gong	1	-1/+4
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: refine error handling for private libva buffer sharing extension.	Zhigang Gong	2	-0/+11
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13	GBE: Extend front label ip to 32 bit on demand.	Zhigang Gong	1	-4/+12
	If the front end label ip exceed 0xffff, then the backend will use real DW to represent each block's IP address. This is a dynamic behaviour according to the actual front end's label number. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-10	runtime: Enhance the error handling when flush gpgpu command queue.	Zhigang Gong	9	-30/+30
	Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to linux drm driver layer. We need to check the return value of that function, as it may fail when the application uses very large array. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>