beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs (mirrored from https://gitlab.freedesktop.org/beignet/beignet)

Age	Commit message (Collapse)	Author	Files	Lines
2017-06-14	Use aligned16 and aligne4 kernel to copy for large 3D image with TILE_Y.	Yan Wang	1	-9/+41
	It is similar with 2D image for avoiding extended image width truncated. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13	Add clFinish for guarantee the kernel copying is finished when create TILE_Y ↵	Yan Wang	1	-0/+7
	large image. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13	Add cl_mem_record_map_mem_for_kernel() for record map adress for TILE_Y ↵	Yan Wang	1	-26/+83
	image by kernel copying. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-25	Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.	Yan Wang	1	-28/+50
	"imagedim_non_pow_2" cases of basic modudle of confrmance shows regression after use TILE_Y mode for large image by previous patch. This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying. So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits. So use align4 copy kernel to avoid this bug. Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
2017-05-18	Create image with TILE_Y mode still when image size>128MB for performance.	Yan Wang	1	-5/+102
	It may failed to copy data from host ptr to TILE_Y large image. So use clCopyBufferToImage to do this on GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-13	add extension intel_planar_yuv.	Luo Xionghu	1	-8/+150
	create a w* (3/2h) size bo for the whole CL_NV12_INTEL format surface, and the y surface (format CL_R) share the first w h part, uv surface (format CL_RG) share the left w * 1/2h part; set correct bo offset for uv surface per different platforms. v2: add extension define in libocl; fix error check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-30	OCL20: handle device enqueue in runtime.	Yang, Rong R	1	-10/+20
	There are some step to handle device enqueue: 1. allocate the device enqueue bo to store the device enqueue information for parent kernel. Add must convert all global buffers to SVM buffers to make sure the child kernels have the same GPU address. 2. When flush the command, check whether have device enqueue or not. If has device enqueue, must wait finish and parse the device enqueue info. 3. Start the child ndrange according the device enqueue info, and the parent's global buffers as the exec info. Because of non uniform workgroup size, one enqueue api will flush serveral times, but device enqueue only need handle once, so add a flag to function cl_command_queue_flush to indicate the last flush. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-28	Runtime: fix fill image event assert and some SVM rebase error.	Yang, Rong R	1	-18/+17
	Also remove the useless function cl_context_add_svm. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2016-12-28	Refine list related functions.	Junyan He	1	-5/+5
	Make the list related functions more clear and readable. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16	Refine clSetMemObjectDestructorCallback API.	Yang Rong	1	-10/+33
	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16	Add multi devices support in context.	Junyan He	1	-19/+19
	In future there may be more than one device on the platform, we need to consider multi devices within one context. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16	Modify clGetImageInfo using cl_get_info_helper.	Junyan He	1	-90/+0
	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16	Modify clGetMemObjectInfo using cl_get_info_helper.	Junyan He	1	-79/+1
	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-02	save host_ptr when create sub buffer from CL_MEM_ALLOC_HOST_PTR	Guo, Yejun	1	-1/+1
	it fixes issue at https://bugs.freedesktop.org/show_bug.cgi?id=98490 Signed-off-by: Guo, Yejun <yejun.guo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-11-28	fix build issue when HAS_BO_SET_SOFTPIN is false	Guo, Yejun	1	-1/+1
	Signed-off-by: Guo, Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08	Runtime: Add support for clGetMemObjectInfo	Pan Xiuli	1	-4/+10
	clGetMemObjectInfo with CL_MEM_ASSOCIATED_MEMOBJECT should return the mem in cl_image_desc. As in CL_MEM_OBJECT_IMAGE1D_BUFFER we copy the buffer, add a workaround for it. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08	Runtime: Add suport for sRGB to clEnqueueFillImage	Pan Xiuli	1	-1/+22
	Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08	Runtime: Add suport for sRGB to clEnqueueCopyImage	Pan Xiuli	1	-1/+3
	Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08	Runtime: Add pipe related APIs	Pan Xiuli	1	-0/+100
	Add clCreatePipe and clGetPipeInfo Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08	OCL20: Add svm support.	Yang Rong	1	-3/+108
	Enable CL_DEVICE_SVM_COARSE_GRAIN_BUFFER svm support, use userptr and softpin to implement it. Use userptr to share the page between cpu and gpu, and softpin to unify the cpu and gpu's address. Now it works on i386 system. x86_64 depends on backend support. This patch base on DRM library and DRM kernel driver's softpin patch: http://lists.freedesktop.org/archives/intel-gfx/2015-September/075446.html. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2016-09-28	Modify all event related functions using new event handle.	Junyan He	1	-18/+100
	Rewrite the cl_event, and modify all the event functions using this new event manner. Event will co-operate with command queue's thread together. v2: Fix a logic problem in event create failed. V3: Set enqueue default to do nothing, handle some enqueue has nothing to do. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-23	Delete all the verbose locks and use list to store CL objects.	Junyan He	1	-36/+19
	We use context's lock when we add and delete cl objects. Every cl object should use it's own lock to protect itself. We also add some helper functions to ease the adding and removing operations. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-12	Runtime: re-enable cl_khr_gl_sharing with existing egl extension.	Chuanbo Weng	1	-1/+1
	In order to query low level layout of GL buffer object/texture/render buffer, previous implementation introduced an egl extension and implemented in Beignet side. This way is broken once mesa change its related internal code. In this patch, we use an new egl extension (EGL_MESA_image_dma_buf_export) to query related layout infomations of gl texture. Since this egl extension is already accepted by Khronos, so it's a stable method. This patch just implement GL texture 2d buffer sharing, and we will implement other target type if necessary. v2: Add CMake build option to enable cl_khr_gl_sharing(default off). Clean up related CMake code. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-05	fix w of image when simulate image1dbuffer with image2d	Guo Yejun	1	-1/+0
	and also change the utest to hit the potential case Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-02	Runtime: Apply base object to cl_mem.	Junyan He	1	-10/+9
	Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-08-03	Runtime: fix a userptr bug.	Yang, Rong R	1	-1/+3
	Userptr also require size cache alignment, otherwise, the remained memory may be allocated in CPU side, when gpu flush the last cacheline to memory, will override the value changed by CPU. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
2016-07-20	Runtime: set size member of cl_image created by clCreateImageFromFdINTEL.	Chuanbo Weng	1	-6/+11
	The size need to be set, so clGetMemObjectInfo(.., CL_MEM_SIZE, ..) can return actual size of this cl_image instead of 0. Also some code refinement to make logic more clear. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2016-06-12	runtime: The depth should be 1 for CL_MEM_OBJECT_IMAGE2D in beignet's ↵	Chuanbo Weng	1	-2/+2
	implementation. Reported by Feng Yuan. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-06-12	Runtime: Disable image hostptr for default	Pan Xiuli	1	-1/+8
	Image with hostptr can not use tiling and can be very slow when need access image. Disable image hostptr for default for good profermance. Add an option OCL_IMAGE_HOSTPTR to enable ture image hostptr. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-05-23	runtime: error handling to avoid null pointer dereference.	Luo Xionghu	1	-2/+5
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	make Beignet as intermedia layer of CMRT	Guo Yejun	1	-0/+7
	CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-26	check image from buffer's base address alignment.	Luo Xionghu	1	-0/+9
	per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-10	Add extension clCreateImageFromFdINTEL to create cl image by external fd.	Chuanbo Weng	1	-0/+58
	Before this patch, Beignet can only create cl image from external bo by its handle using clCreateImageFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type. Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10	Add extension clCreateBufferFromFdINTEL to create cl buffer by external ↵	Chuanbo Weng	1	-0/+30
	buffer object's fd. Before this patch, Beignet can only create cl buffer from external bo by its handle using clCreateBufferFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-04	set the pitch of image from buffer to the buffer's pitch.	Luo Xionghu	1	-1/+6
	if image from buffer, the image's pitch should be same with buffer bo's row pitch. v2: correct style. image from buffer need update both aligned_pitch and aligned_h, while image from user ptr only set aligned_pitch, so just keep them independently. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>
2015-11-03	fix regression issue for climage + uesrptr	Guo Yejun	1	-1/+2
	userptr requires the exact same memory layout between cpu and gpu, since the current implementation uses the value of row_pitchh, ignoring the slice_pitch provided by the application. so, enable userptr only if slice_pitch == row_pitchh for image3d, 2darray and 1darray. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-10-20	add conditions of pitch and h to enable userptr for climage_use_host_ptr	Guo Yejun	1	-1/+4
	this regression issue is reported from conformance test, to enable userptr for climage + use_host_ptr, the memory layout between the host_ptr (for CPU) and drm bo (for GPU) must be the same. it means bo's row pitch should be the same as image's row pitch, and h should be the same as aligned h. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-20	alignment of NO TILING surface limitation shouldn't be removed.	Luo Xionghu	1	-3/+4
	it could cause sampler data mismatch on IVB. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-14	enable USE_HOST_PTR for cl image with userptr to avoid extra copying	Guo Yejun	1	-9/+36
	the pointer must be 64 byte aligned, and only when w,h equals to its aligned value, otherwise, roll back to the old method with extra copying. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-14	refine code to separate the usage of data and image2d_from_buffer	Guo Yejun	1	-25/+36
	currently, 'void* data' has two meanings: the pointer from application, and the buffer to create image2d from. It makes the code a little complex when supporting userptr for image. So, add a new function parameter to separate the two meanings. V2: fix when HAS_USERPTR is not enabled Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: xionghu.luo@intel.com
2015-09-22	enable create image 2d from buffer in clCreateImage.	Luo Xionghu	1	-27/+88
	this patch allows create 2d image with a cl buffer with zero copy. v2: should use reference to manage the release the buffer and image. After being created, the buffer reference count is 2, and image reference count is 1. if image is released first, decrease the image reference count and buffer reference count both, release the bo when the buffer is released at last; if buffer is released first, decrease the buffer reference count only, release the buffer when the image is released. add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if image size is larger than the buffer. v4: pitchalignment set to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-07-17	runtime: fix a builtin-kernel related thread safe bug.	Zhigang Gong	1	-1/+11
	This patch fixed two thread-safe bugs in the builtin-kernel usage code path. 1. The builtin kernel array itself need to be protected. 2. Each caller need to get a dup of the builtin kernel, rather than share the same kernel structure. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-05-04	Optimization of clEnqueueCopyBufferToImage for 16 aligned case.	Luo Xionghu	1	-8/+36
	We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-04-13	GBE: refine error handling for private libva buffer sharing extension.	Zhigang Gong	1	-0/+4
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-02-13	Optimization of clEnqueueCopyImageToBuffer for 16 aligned case.	Chuanbo Weng	1	-8/+36
	We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: Now we just optimize for IMAGE2D, so add judgement to not affect other image type's code path. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-09	SKL: fix some 3D and 2D array image fail.	Yang Rong	1	-1/+2
	SKL need use aligned_pitch * aligned_h to calculate slice pitch, so add a new type of cl_buffer_get_tiling_align to get it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-09	runtime: don't free the host_ptr for a subbuffer.	Zhigang Gong	1	-1/+3
	When the buffer has CL_MEM_ALLOC_HOST_PTR, the runtime need to free the host_ptr at destructor. But if the buffer is a subbuffer, then its host ptr is not allocated by itself, we should not free it here. Otherwise, it may cause some weird errors such as: "corrupted double-linked list..". Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
2015-02-06	Implement 1D/2D image array related cl_mem_kernel_copy_image in cl way ↵	Chuanbo Weng	1	-12/+31
	instead of cpu way. Before this patch, cl_mem_kernel_copy_image do cpu memory copy in order to copy image array objects. This is very slow for large image size. This patch implement image array copy in cl way, which dramatically accelerate image array related clEnqueueCopyImage. clCopyImage case in OpenCL conformance test will not be blocked anymore. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-06	Return error, don't crash, on allocation failure	Rebecca N. Palmer	1	-10/+13
	As previously noted, when cl_mem_allocate fails, its error handling then calls cl_mem_delete on the incompletely-set-up buffer, which aborts at assert(mem->ctx). This patch appears to fix the problem, but be warned I don't know this code well enough to know what else it might break. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-30	SKL: Use TILE_Y as default TILING mode in skl.	Yang Rong	1	-1/+2
	3D Image can't use TILE_X in skl so change to default TILING MODE to TILE_Y. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: He Junyan <Junyan.he@inbox.com>