beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs (mirrored from https://gitlab.freedesktop.org/beignet/beignet)

Age	Commit message	Author	Files	Lines
2017-07-27	Runtime: fix the context ref is not 0 assert when delete.	Yang, Rong R	1	-22/+8
	The CL_ENQUEUE_FILL_BUFFER_ALIGN8_* internal programs are all the same program, so the program's ref is added only once; but when the context is deleted, the internal program count is calculated by adding them individually. This mismatch causes the context to be freed by mistake. Create a different program for each CL_ENQUEUE_FILL_BUFFER_ALIGN8_* entry for clarity. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
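	The mismatch described above can be illustrated with a minimal sketch. The struct and function names below are hypothetical and are not taken from the beignet sources; the sketch only shows why counting one reference per table slot disagrees with taking one reference per distinct program when several slots alias the same program object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an internal program object. */
struct program { int id; };

/* References as they are taken: one per distinct program object,
 * even if several table slots point at the same object. */
size_t count_distinct_programs(struct program *const *slots, size_t n)
{
    size_t count = 0;
    assert(slots != NULL);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        if (slots[i] == NULL)
            continue;
        for (size_t j = 0; j < i; j++) {
            if (slots[j] == slots[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}

/* References as they are counted at delete time: one per non-empty slot. */
size_t count_program_slots(struct program *const *slots, size_t n)
{
    size_t count = 0;
    assert(slots != NULL);
    for (size_t i = 0; i < n; i++) {
        if (slots[i] != NULL)
            count++;
    }
    return count;
}
```

	With three slots aliasing one program, the two counts differ by two, which is the kind of gap that lets a context be treated as unreferenced too early. Giving every slot its own program makes both counts agree.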
2017-07-27	Runtime: fix a cl_gpgpu_bind_image_for_vme NULL SIGSEGV.	Yang, Rong R	1	-1/+2
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-07-12	Implement extension cl_intel_device_side_avc_motion_estimation.	Chuanbo Weng	6	-3/+148
	This patch mainly contains: 1. built-in function __gen_ocl_ime implementation. 2. Lots of built-in functions of cl_intel_device_side_avc_motion_estimation are implemented. 3. This extension is required to run in simd16 mode. v2: move the utests to separate patches one by one; as all the utests have an extension function check, no need to put them in a stand-alone utest; uncomment the self test; fix extension check logic issue, should be && instead of ||. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Signed-off-by: Xionghu Luo <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-07-10	Runtime: remove ctx's useless fields.	Yang, Rong R	3	-43/+5
	built_in_prgs and built_in_kernels seem to be useless; remove them. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-07-10	Runtime: fix a recurrent release context error.	Yang, Rong R	1	-10/+8
	Before releasing internal resources, set them to NULL; otherwise, deleting these resources will call release context again. The ctx->built_in_prgs should be released by the application. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-07-04	Runtime: refine max group size for SKL & KBL	rander	1	-9/+9
	Change the max group size to 256, which is a reasonable size for Gen9. According to performance tests, 256 gives a good improvement in OpenCV with no regression. Signed-off-by: rander.wang <rander.wang@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-23	GBE: clean llvm module's clone and release.	Yang, Rong R	3	-1/+7
	There are some changes: 1. Clone the module before calling LLVMLinkModules2, and remove the other clones of it. 2. Don't delete the module in function llvmToGen. 3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM and buildFromLLVMModule only handle the llvm module. Actually, programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel, and I think it is useless; maybe we could delete it altogether. V2: define errDiag beside #if/#endif. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-06-22	Add missed kernel names into built-in kernel list.	Yan Wang	1	-1/+16
	Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-22	Runtime: Add missing SKL device ID	Pan Xiuli	2	-1/+9
	It seems we missed some newly added device IDs for SKL. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-16	Fix context leak with internal kernels	Patrick Beaulieu	1	-1/+21
	Account for internal program ctx references in cl_context_delete Signed-off-by: Patrick Beaulieu <patrick.beaulieu@avigilon.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-16	Runtime: Add new API enums for cl_intel_required_subgroup_size extension	Pan Xiuli	5	-0/+40
	Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for clGetKernelSubGroupInfo. We only have this extension for LLVM 40+ for frontend support. V2: Add opencl-c define Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-14	Use aligned16 and aligned4 kernel to copy for large 3D image with TILE_Y.	Yan Wang	7	-37/+149
	This is similar to the 2D image case and avoids truncation of the extended image width. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13	Optimize clEnqueueWriteImageByKernel and clEnqueueReadImageByKernel.	Yan Wang	1	-7/+18
	1. Copy only the data defined by origin and region. 2. Add clFinish to guarantee the kernel copying is finished for blocking writes. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13	Fix bug of clEnqueueUnmapMemObjectForKernel and clEnqueueMapImageByKernel.	Yan Wang	1	-34/+113
	1. Support writing data by mapping/unmapping mode. 2. Add mapping record logic. 3. Add clFinish to guarantee the kernel copying is finished. 4. Fix the error of calling clEnqueueMapImageByKernel. blocking_map and map_flags need to be swapped. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13	Add clFinish to guarantee the kernel copying is finished when creating a TILE_Y large image.	Yan Wang	1	-0/+7
	Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13	Add cl_mem_record_map_mem_for_kernel() to record the map address for TILE_Y image by kernel copying.	Yan Wang	2	-26/+88
	Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-09	Runtime: Fix a missing llvm version macro for LLVM40+	Pan Xiuli	1	-1/+1
	Found a missing macro that needs to change to support LLVM40+. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-25	Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.	Yan Wang	5	-28/+89
	"imagedim_non_pow_2" cases of basic modudle of confrmance shows regression after use TILE_Y mode for large image by previous patch. This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying. So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits. So use align4 copy kernel to avoid this bug. Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
2017-05-25	build: fix cmake code generation dependencies.	Ismo Puustinen	1	-2/+2
	There is a race condition between building .bc and header files and generating code from .cl targets. Fix the race by adding the dependency to generated files. Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
2017-05-18	Implement TILE_Y large image in clEnqueueWriteImage.	Yan Wang	1	-0/+46
	It will fail to copy data from host ptr to TILE_Y large image by memcpy. Use clEnqueueCopyBufferToImage to do this on GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-18	Implement TILE_Y large image in clEnqueueReadImage.	Yan Wang	1	-0/+55
	It will fail to copy data from TILE_Y large image to buffer by memcpy. Use clEnqueueCopyImageToBuffer to do this on GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-18	Implement TILE_Y large image in clEnqueueMapImage and clEnqueueUnmapMemObject.	Yan Wang	1	-0/+111
	It will fail to copy data from TILE_Y large image to buffer by memcpy. Use clEnqueueCopyImageToBuffer to do this. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-18	Create image with TILE_Y mode still when image size>128MB for performance.	Yan Wang	4	-6/+111
	It may fail to copy data from host ptr to TILE_Y large image. So use clCopyBufferToImage to do this on GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-15	GLK: add geminilake runtime support.	Yang Rong	2	-2/+47
	Geminilake is almost the same as bxt, except for the intel_gpgpu_read_ts_reg function. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-05-15	GLK: add Geminilake pciids.	Yang Rong	1	-1/+8
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-23	Limit get_program_global_data() calls to OpenCL 2.0	Jan Beich	1	-2/+4
	https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217635 Signed-off-by: Jan Beich <jbeich@freebsd.org> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-17	intel: Check that we can reserve the zero-offset	Yang Rong	1	-11/+20
	commit ff57cee0519d ("ocl20/runtime: take the first 64KB page table entries") tries to allocate a bo at 0 offset, but failed to take into account that something may already be allocated there that it is not allowed to evict (particularly when not using full-ppgtt separation). Failure to do so causes all execution to subsequently fail with "drm_intel_gem_bo_context_exec() failed: Device or resource busy" Reported-by: Kenneth Johansson <ken@kenjo.org> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98647 Contributor: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-13	add extension cl_intel_media_block_io READ related function	Luo Xionghu	1	-0/+1
	v2: add #define intel_media_block_io in libocl; move extension check code to this patch; Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13	add extension intel_planar_yuv.	Luo Xionghu	9	-24/+217
	Create a w * (3/2 h) size bo for the whole CL_NV12_INTEL format surface; the y surface (format CL_R) shares the first w * h part, and the uv surface (format CL_RG) shares the remaining w * 1/2 h part. Set the correct bo offset for the uv surface per different platforms. v2: add extension define in libocl; fix error check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
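	The plane layout described here can be sketched as follows. The struct and function names are hypothetical, and the sketch assumes 8-bit samples in tightly packed rows with no platform-specific alignment, whereas the commit notes that the real uv offset differs per platform.

```c
#include <assert.h>
#include <stddef.h>

/* Byte layout of one NV12 buffer object holding both planes. */
struct nv12_layout {
    size_t y_offset;   /* y plane (CL_R) starts at the beginning of the bo */
    size_t y_size;     /* w * h bytes */
    size_t uv_offset;  /* uv plane (CL_RG) follows the y plane */
    size_t uv_size;    /* w * h / 2 bytes */
    size_t total_size; /* w * 3/2 * h bytes */
};

/* Assumes even dimensions, 1 byte per sample, and no row padding. */
struct nv12_layout nv12_layout_for(size_t w, size_t h)
{
    struct nv12_layout l;
    assert(w % 2 == 0 && h % 2 == 0);
    l.y_offset = 0;
    l.y_size = w * h;
    l.uv_offset = l.y_offset + l.y_size;
    l.uv_size = w * (h / 2);
    l.total_size = l.y_size + l.uv_size;
    return l;
}
```

	For a 1920x1080 surface this gives a 2073600-byte y plane, a 1036800-byte uv plane starting at offset 2073600, and a 3110400-byte bo in total.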
2017-03-07	CMAKE: Refine builtin kernel bin generator	Pan Xiuli	1	-7/+7
	Move the generated builtin str and bin files into the Cmake build directory to avoid chaos when changing LLVM. V2: Fix a bug that the builtin.cl was not written into build dir. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-14	Runtime: add a warning when load gen binary fail.	Yang Rong	1	-0/+1
	Some applications use the program's binary by default. If a gen binary from a former version is loaded, clCreateProgramWithBinary fails because the fields of the gen binary have changed and there is no version checking, which may cause applications to fail silently. Add a warning to hint the user. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-14	Runtime: fix get non support type device bug.	Yang Rong	2	-4/+8
	Only return supported device types (GPU and default) in function cl_get_gt_device. Contributor: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-14	Enable OpenCL 2.0 only where supported	Pan Xiuli	5	-8/+18
	This allows a single beignet binary to both offer 2.0 where available, and still work on older hardware. V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec, and also likely to be faster). V3: Only enable OpenCL 2.0 when llvm version is 39. V4: Only enable OpenCL 2.0 on x64 host. V5: Always return 32 as address bits. Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com> Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
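	The conditions listed in V2 through V4 can be combined into one small decision function, sketched below. The function name, its parameters, and the choice to return 0 when OpenCL 2.0 is requested but unavailable are all assumptions made for illustration; the commit does not say how the runtime reports that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 200 for OpenCL 2.0, 120 for OpenCL 1.2, or 0 when the request
 * cannot be honoured (hypothetical convention for this sketch).
 * cl_std is the value of the -cl-std build option, or NULL if not set. */
int select_cl_version(const char *cl_std, int device_supports_20,
                      int llvm_version, int host_is_x64)
{
    /* V3 and V4: 2.0 needs LLVM version 39 and an x64 host,
     * in addition to hardware support. */
    int can_use_20 = device_supports_20 && llvm_version == 39 && host_is_x64;

    assert(llvm_version > 0);

    /* V2: default to 1.2 when -cl-std is not set. */
    if (cl_std == NULL)
        return 120;
    if (strcmp(cl_std, "CL2.0") == 0)
        return can_use_20 ? 200 : 0;
    return 120;
}
```

	A build without -cl-std gets 1.2 even on 2.0-capable hardware, which matches the default the commit describes.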
2017-02-14	Enable support for two-component 16-bit planes	Mark Thompson	1	-0/+2
	This is needed to support the chroma plane of P010 surfaces being mapped from VAAPI. Signed-off-by: Mark Thompson <sw@jkqxz.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10	Free context devices on context release	Giuseppe Bilotta	1	-0/+1
	The context owns the array of devices passed to cl_context_new, so it's its duty to free it. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10	Fix obvious copy-paste	Giuseppe Bilotta	1	-1/+1
	The conditional was equal to the one before, and would never be hit because internal kernels were reset after release. Instead, since the body is resetting built-in kernels, it appears obvious that the conditional should be on the existence of built-in kernels. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10	API: Fix local memory type to CL_LOCAL	Pan Xiuli	4	-4/+4
	We are using SLM as local memory and we should return CL_LOCAL for CL_DEVICE_LOCAL_MEM_TYPE. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-08	Typo in error message	Giuseppe Bilotta	1	-1/+1
	Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: He Junyan <junyan.he@inbox.com>
2017-02-06	Make CL-GL sharing available via ICD	Rebecca N. Palmer	1	-12/+16
	Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Chuanbo Weng <chuanbo.weng@intel.com>
2017-01-19	Android.mk: update Android.mk for android build.	Yang Rong	1	-3/+15
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-01-11	Add some pointer access check.	Yang Rong	3	-1/+5
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-01-11	Fix two bugs about command queue destroy.	Junyan He	2	-1/+4
	1. Call finish before we destroy the command queue. We should make sure all the commands in the queue are finished before we really destroy the command_queue. Otherwise, this may cause an event status error. We leave the queue's lifetime to the user and do not ref the queue when creating an event. 2. Loosen the assert condition when notifying the queue. There are cases where the ref of the queue is 0 but we still need to notify. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-09	Fail, don't assert, if unable to create context	Rebecca N. Palmer	1	-3/+5
	As the "do we have any usable devices?" check uses this, it needs to not crash even when we don't. Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-06	Runtime: Fix an event bug.	Yang Rong	1	-7/+21
	SVM enqueues need to call cl_event_exec every time. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2017-01-06	Fix an event notify bug.	Junyan He	4	-48/+27
	When an event completes, we need to notify all the command_queues within the same context. But sometimes, some command_queues in the context are already invalid. Modify to ensure all the command_queues to be notified are valid. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-05	Fix two bugs about event.	Junyan He	2	-16/+13
	1. NDrangeKernel needs to call cl_event_exec every time. 2. Enqueue Barrier event needs to be added to the queue every time. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-30	Runtime: fix a profiling fail.	Yang Rong	3	-23/+6
	cl_event_exec is the uniform entry for all event command execution; calling cl_enqueue_handle may miss the time stamp record. Replace all cl_enqueue_handle calls with cl_event_exec. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-30	Runtime: add the header file to avoid implicit declaration of function ‘cl_devices_list_include_check’ warning.	Yang Rong	1	-0/+1
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-30	OCL20: handle device enqueue in runtime.	Yang, Rong R	15	-700/+1006
	There are several steps to handle device enqueue: 1. Allocate the device enqueue bo to store the device enqueue information for the parent kernel, and convert all global buffers to SVM buffers to make sure the child kernels have the same GPU address. 2. When flushing the command, check whether there is a device enqueue. If there is, wait for it to finish and parse the device enqueue info. 3. Start the child ndrange according to the device enqueue info, with the parent's global buffers as the exec info. Because of non-uniform workgroup sizes, one enqueue API call will flush several times, but device enqueue only needs to be handled once, so add a flag to function cl_command_queue_flush to indicate the last flush. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
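	The last-flush flag can be illustrated with a one-dimensional sketch. Everything here is hypothetical (the names, the log struct, and the way the range is split); it only shows the idea that a non-uniform enqueue is flushed in several pieces while the device enqueue handling runs once, on the final flush.

```c
#include <assert.h>
#include <stddef.h>

/* Records what happened, so the behaviour can be inspected. */
struct flush_log {
    int flushes;                /* number of flush calls */
    int device_enqueue_handled; /* times device enqueue info was parsed */
};

/* Stand-in for a flush that takes a "last flush" flag. */
void flush_range(struct flush_log *log, int is_last)
{
    log->flushes++;
    if (is_last)
        log->device_enqueue_handled++;
}

/* Split a 1D range into a uniform part and a remainder part,
 * flushing each and marking only the final flush as last. */
void enqueue_1d(struct flush_log *log, size_t global, size_t local)
{
    size_t remainder, uniform;
    assert(log != NULL && local > 0);
    remainder = global % local;
    uniform = global - remainder;
    if (uniform > 0)
        flush_range(log, remainder == 0);
    if (remainder > 0)
        flush_range(log, 1);
}
```

	With a global size of 100 and a local size of 32, the sketch flushes twice (96 work-items, then 4) and handles the device enqueue once.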
2016-12-30	OCL20: add a cl_kernel pointer to gpgpu.	Yang, Rong R	4	-5/+31
	Because flushing the command queue must check whether the currently flushed command queue has a device enqueue, it needs the cl_kernel. So store the cl_kernel pointer in gpgpu, and add two functions, intel_gpgpu_set_kernel and intel_gpgpu_get_kernel, for it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>