summaryrefslogtreecommitdiff
path: root/src/cl_mem.c
AgeCommit message (Collapse)AuthorFilesLines
2017-06-14Use aligned16 and aligne4 kernel to copy for large 3D image with TILE_Y.Yan Wang1-9/+41
It is similar with 2D image for avoiding extended image width truncated. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13Add clFinish for guarantee the kernel copying is finished when create TILE_Y ↵Yan Wang1-0/+7
large image. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13Add cl_mem_record_map_mem_for_kernel() for record map adress for TILE_Y ↵Yan Wang1-26/+83
image by kernel copying. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-25Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.Yan Wang1-28/+50
"imagedim_non_pow_2" cases of basic modudle of confrmance shows regression after use TILE_Y mode for large image by previous patch. This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying. So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits. So use align4 copy kernel to avoid this bug. Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
2017-05-18Create image with TILE_Y mode still when image size>128MB for performance.Yan Wang1-5/+102
It may failed to copy data from host ptr to TILE_Y large image. So use clCopyBufferToImage to do this on GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-13add extension intel_planar_yuv.Luo Xionghu1-8/+150
create a w* (3/2*h) size bo for the whole CL_NV12_INTEL format surface, and the y surface (format CL_R) share the first w * h part, uv surface (format CL_RG) share the left w * 1/2h part; set correct bo offset for uv surface per different platforms. v2: add extension define in libocl; fix error check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-30OCL20: handle device enqueue in runtime.Yang, Rong R1-10/+20
There are some step to handle device enqueue: 1. allocate the device enqueue bo to store the device enqueue information for parent kernel. Add must convert all global buffers to SVM buffers to make sure the child kernels have the same GPU address. 2. When flush the command, check whether have device enqueue or not. If has device enqueue, must wait finish and parse the device enqueue info. 3. Start the child ndrange according the device enqueue info, and the parent's global buffers as the exec info. Because of non uniform workgroup size, one enqueue api will flush serveral times, but device enqueue only need handle once, so add a flag to function cl_command_queue_flush to indicate the last flush. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-28Runtime: fix fill image event assert and some SVM rebase error.Yang, Rong R1-18/+17
Also remove the useless function cl_context_add_svm. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2016-12-28Refine list related functions.Junyan He1-5/+5
Make the list related functions more clear and readable. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16Refine clSetMemObjectDestructorCallback API.Yang Rong1-10/+33
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16Add multi devices support in context.Junyan He1-19/+19
In future there may be more than one device on the platform, we need to consider multi devices within one context. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16Modify clGetImageInfo using cl_get_info_helper.Junyan He1-90/+0
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16Modify clGetMemObjectInfo using cl_get_info_helper.Junyan He1-79/+1
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-02save host_ptr when create sub buffer from CL_MEM_ALLOC_HOST_PTRGuo, Yejun1-1/+1
it fixes issue at https://bugs.freedesktop.org/show_bug.cgi?id=98490 Signed-off-by: Guo, Yejun <yejun.guo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-11-28fix build issue when HAS_BO_SET_SOFTPIN is falseGuo, Yejun1-1/+1
Signed-off-by: Guo, Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08Runtime: Add support for clGetMemObjectInfoPan Xiuli1-4/+10
clGetMemObjectInfo with CL_MEM_ASSOCIATED_MEMOBJECT should return the mem in cl_image_desc. As in CL_MEM_OBJECT_IMAGE1D_BUFFER we copy the buffer, add a workaround for it. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08Runtime: Add suport for sRGB to clEnqueueFillImagePan Xiuli1-1/+22
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08Runtime: Add suport for sRGB to clEnqueueCopyImagePan Xiuli1-1/+3
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08Runtime: Add pipe related APIsPan Xiuli1-0/+100
Add clCreatePipe and clGetPipeInfo Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08OCL20: Add svm support.Yang Rong1-3/+108
Enable CL_DEVICE_SVM_COARSE_GRAIN_BUFFER svm support, use userptr and softpin to implement it. Use userptr to share the page between cpu and gpu, and softpin to unify the cpu and gpu's address. Now it works on i386 system. x86_64 depends on backend support. This patch base on DRM library and DRM kernel driver's softpin patch: http://lists.freedesktop.org/archives/intel-gfx/2015-September/075446.html. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2016-09-28Modify all event related functions using new event handle.Junyan He1-18/+100
Rewrite the cl_event, and modify all the event functions using this new event manner. Event will co-operate with command queue's thread together. v2: Fix a logic problem in event create failed. V3: Set enqueue default to do nothing, handle some enqueue has nothing to do. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-23Delete all the verbose locks and use list to store CL objects.Junyan He1-36/+19
We use context's lock when we add and delete cl objects. Every cl object should use it's own lock to protect itself. We also add some helper functions to ease the adding and removing operations. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-12Runtime: re-enable cl_khr_gl_sharing with existing egl extension.Chuanbo Weng1-1/+1
In order to query low level layout of GL buffer object/texture/render buffer, previous implementation introduced an egl extension and implemented in Beignet side. This way is broken once mesa change its related internal code. In this patch, we use an new egl extension (EGL_MESA_image_dma_buf_export) to query related layout infomations of gl texture. Since this egl extension is already accepted by Khronos, so it's a stable method. This patch just implement GL texture 2d buffer sharing, and we will implement other target type if necessary. v2: Add CMake build option to enable cl_khr_gl_sharing(default off). Clean up related CMake code. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-05fix w of image when simulate image1dbuffer with image2dGuo Yejun1-1/+0
and also change the utest to hit the potential case Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-02Runtime: Apply base object to cl_mem.Junyan He1-10/+9
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-08-03Runtime: fix a userptr bug.Yang, Rong R1-1/+3
Userptr also require size cache alignment, otherwise, the remained memory may be allocated in CPU side, when gpu flush the last cacheline to memory, will override the value changed by CPU. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
2016-07-20Runtime: set size member of cl_image created by clCreateImageFromFdINTEL.Chuanbo Weng1-6/+11
The size need to be set, so clGetMemObjectInfo(.., CL_MEM_SIZE, ..) can return actual size of this cl_image instead of 0. Also some code refinement to make logic more clear. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2016-06-12runtime: The depth should be 1 for CL_MEM_OBJECT_IMAGE2D in beignet's ↵Chuanbo Weng1-2/+2
implementation. Reported by Feng Yuan. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-06-12Runtime: Disable image hostptr for defaultPan Xiuli1-1/+8
Image with hostptr can not use tiling and can be very slow when need access image. Disable image hostptr for default for good profermance. Add an option OCL_IMAGE_HOSTPTR to enable ture image hostptr. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-05-23runtime: error handling to avoid null pointer dereference.Luo Xionghu1-2/+5
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09make Beignet as intermedia layer of CMRTGuo Yejun1-0/+7
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-26check image from buffer's base address alignment.Luo Xionghu1-0/+9
per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-10Add extension clCreateImageFromFdINTEL to create cl image by external fd.Chuanbo Weng1-0/+58
Before this patch, Beignet can only create cl image from external bo by its handle using clCreateImageFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type. Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10Add extension clCreateBufferFromFdINTEL to create cl buffer by external ↵Chuanbo Weng1-0/+30
buffer object's fd. Before this patch, Beignet can only create cl buffer from external bo by its handle using clCreateBufferFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-04set the pitch of image from buffer to the buffer's pitch.Luo Xionghu1-1/+6
if image from buffer, the image's pitch should be same with buffer bo's row pitch. v2: correct style. image from buffer need update both aligned_pitch and aligned_h, while image from user ptr only set aligned_pitch, so just keep them independently. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>
2015-11-03fix regression issue for climage + uesrptrGuo Yejun1-1/+2
userptr requires the exact same memory layout between cpu and gpu, since the current implementation uses the value of row_pitch*h, ignoring the slice_pitch provided by the application. so, enable userptr only if slice_pitch == row_pitch*h for image3d, 2darray and 1darray. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-10-20add conditions of pitch and h to enable userptr for climage_use_host_ptrGuo Yejun1-1/+4
this regression issue is reported from conformance test, to enable userptr for climage + use_host_ptr, the memory layout between the host_ptr (for CPU) and drm bo (for GPU) must be the same. it means bo's row pitch should be the same as image's row pitch, and h should be the same as aligned h. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-20alignment of NO TILING surface limitation shouldn't be removed.Luo Xionghu1-3/+4
it could cause sampler data mismatch on IVB. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-14enable USE_HOST_PTR for cl image with userptr to avoid extra copyingGuo Yejun1-9/+36
the pointer must be 64 byte aligned, and only when w,h equals to its aligned value, otherwise, roll back to the old method with extra copying. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-14refine code to separate the usage of data and image2d_from_bufferGuo Yejun1-25/+36
currently, 'void* data' has two meanings: the pointer from application, and the buffer to create image2d from. It makes the code a little complex when supporting userptr for image. So, add a new function parameter to separate the two meanings. V2: fix when HAS_USERPTR is not enabled Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: xionghu.luo@intel.com
2015-09-22enable create image 2d from buffer in clCreateImage.Luo Xionghu1-27/+88
this patch allows create 2d image with a cl buffer with zero copy. v2: should use reference to manage the release the buffer and image. After being created, the buffer reference count is 2, and image reference count is 1. if image is released first, decrease the image reference count and buffer reference count both, release the bo when the buffer is released at last; if buffer is released first, decrease the buffer reference count only, release the buffer when the image is released. add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if image size is larger than the buffer. v4: pitchalignment set to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-07-17runtime: fix a builtin-kernel related thread safe bug.Zhigang Gong1-1/+11
This patch fixed two thread-safe bugs in the builtin-kernel usage code path. 1. The builtin kernel array itself need to be protected. 2. Each caller need to get a dup of the builtin kernel, rather than share the same kernel structure. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-05-04Optimization of clEnqueueCopyBufferToImage for 16 aligned case.Luo Xionghu1-8/+36
We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-04-13GBE: refine error handling for private libva buffer sharing extension.Zhigang Gong1-0/+4
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-02-13Optimization of clEnqueueCopyImageToBuffer for 16 aligned case.Chuanbo Weng1-8/+36
We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: Now we just optimize for IMAGE2D, so add judgement to not affect other image type's code path. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-09SKL: fix some 3D and 2D array image fail.Yang Rong1-1/+2
SKL need use aligned_pitch * aligned_h to calculate slice pitch, so add a new type of cl_buffer_get_tiling_align to get it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-09runtime: don't free the host_ptr for a subbuffer.Zhigang Gong1-1/+3
When the buffer has CL_MEM_ALLOC_HOST_PTR, the runtime need to free the host_ptr at destructor. But if the buffer is a subbuffer, then its host ptr is not allocated by itself, we should not free it here. Otherwise, it may cause some weird errors such as: "corrupted double-linked list..". Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
2015-02-06Implement 1D/2D image array related cl_mem_kernel_copy_image in cl way ↵Chuanbo Weng1-12/+31
instead of cpu way. Before this patch, cl_mem_kernel_copy_image do cpu memory copy in order to copy image array objects. This is very slow for large image size. This patch implement image array copy in cl way, which dramatically accelerate image array related clEnqueueCopyImage. clCopyImage case in OpenCL conformance test will not be blocked anymore. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-02-06Return error, don't crash, on allocation failureRebecca N. Palmer1-10/+13
As previously noted, when cl_mem_allocate fails, its error handling then calls cl_mem_delete on the incompletely-set-up buffer, which aborts at assert(mem->ctx). This patch appears to fix the problem, but be warned I don't know this code well enough to know what else it might break. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-30SKL: Use TILE_Y as default TILING mode in skl.Yang Rong1-1/+2
3D Image can't use TILE_X in skl so change to default TILING MODE to TILE_Y. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: He Junyan <Junyan.he@inbox.com>