Age | Commit message (Collapse) | Author | Files | Lines |
|
It is similar with 2D image for avoiding extended image width truncated.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
large image.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
image by kernel copying.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
"imagedim_non_pow_2" cases of basic modudle of confrmance shows
regression after use TILE_Y mode for large image by previous patch.
This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage
and clEnqueueCopyImageToBuffer.
It will force CL_RGBA/CL_UNORM_INT8/8191x8192 image of conformance test
to CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying.
So it makes width as 8191 x 4 = 32764 and its width will exceed the maximum
width (16 x 1024 = 16384) of GEN surface state structure which only has 14 bits.
So use align4 copy kernel to avoid this bug.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
|
|
It may failed to copy data from host ptr to TILE_Y large image.
So use clCopyBufferToImage to do this on GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
create a w* (3/2*h) size bo for the whole CL_NV12_INTEL format
surface, and the y surface (format CL_R) share the first w * h
part, uv surface (format CL_RG) share the left w * 1/2h part; set
correct bo offset for uv surface per different platforms.
v2: add extension define in libocl; fix error check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
There are some step to handle device enqueue:
1. allocate the device enqueue bo to store the device enqueue
information for parent kernel. Add must convert all global buffers to
SVM buffers to make sure the child kernels have the same GPU address.
2. When flush the command, check whether have device enqueue or not. If
has device enqueue, must wait finish and parse the device enqueue info.
3. Start the child ndrange according the device enqueue info, and the
parent's global buffers as the exec info.
Because of non uniform workgroup size, one enqueue api will flush
serveral times, but device enqueue only need handle once, so add a flag
to function cl_command_queue_flush to indicate the last flush.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Also remove the useless function cl_context_add_svm.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Make the list related functions more clear and readable.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
In future there may be more than one device on the platform,
we need to consider multi devices within one context.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
it fixes issue at https://bugs.freedesktop.org/show_bug.cgi?id=98490
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
clGetMemObjectInfo with CL_MEM_ASSOCIATED_MEMOBJECT should return
the mem in cl_image_desc. As in CL_MEM_OBJECT_IMAGE1D_BUFFER we
copy the buffer, add a workaround for it.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add clCreatePipe and clGetPipeInfo
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Enable CL_DEVICE_SVM_COARSE_GRAIN_BUFFER svm support, use userptr and softpin
to implement it. Use userptr to share the page between cpu and gpu, and softpin
to unify the cpu and gpu's address. Now it works on i386 system. x86_64 depends on
backend support.
This patch base on DRM library and DRM kernel driver's softpin patch:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075446.html.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Rewrite the cl_event, and modify all the event functions
using this new event manner. Event will co-operate with
command queue's thread together.
v2:
Fix a logic problem in event create failed.
V3:
Set enqueue default to do nothing, handle some enqueue has nothing
to do.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We use context's lock when we add and delete cl objects.
Every cl object should use it's own lock to protect itself.
We also add some helper functions to ease the adding and
removing operations.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
In order to query low level layout of GL buffer object/texture/render
buffer, previous implementation introduced an egl extension and
implemented in Beignet side. This way is broken once mesa change its
related internal code. In this patch, we use an new egl extension
(EGL_MESA_image_dma_buf_export) to query related layout infomations
of gl texture. Since this egl extension is already accepted by Khronos,
so it's a stable method. This patch just implement GL texture 2d buffer
sharing, and we will implement other target type if necessary.
v2:
Add CMake build option to enable cl_khr_gl_sharing(default off).
Clean up related CMake code.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
and also change the utest to hit the potential case
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Userptr also require size cache alignment, otherwise, the remained
memory may be allocated in CPU side, when gpu flush the last cacheline
to memory, will override the value changed by CPU.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
|
|
The size need to be set, so clGetMemObjectInfo(.., CL_MEM_SIZE, ..) can
return actual size of this cl_image instead of 0.
Also some code refinement to make logic more clear.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
implementation.
Reported by Feng Yuan.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Image with hostptr can not use tiling and can be very slow when need
access image. Disable image hostptr for default for good profermance.
Add an option OCL_IMAGE_HOSTPTR to enable ture image hostptr.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt.
There is a request to make Beignet as intermedia layer of CMRT, in
other words, application programer write OpenCL APIs to execute the
CM kernel on GPU, the following shows the key code, and please refer
to the next patch of unit test for detail.
prog = clCreateProgramWithBinary("cm kernel");
clBuildProgram(prog);
kernel = clCreateKernel(prog, "kernel name");
image = clCreateImage();
clSetKernelArg(kernel, image);
clEnqueueNDRangeKernel(kernel);
Inside Beignet, once cm kernel is invoked, the following relative APIs
will be directly passed to CMRT library (libcmrt.so) which is loaded
via dlopen only when necessary. Since we use this simple method to
keep the code clean, OpenCL spec is not strictly followed, and cl_event
is not supported for this case.
v2: add comments about the cm queue in fuction cmrt_enqueue
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
per spec, if create image from USE_HOST_PTR buffer, the buffer's base
address need be aligned.
v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Before this patch, Beignet can only create cl image from external bo by
its handle using clCreateImageFromLibvaIntel. Render node is the first
choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN
is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode.
So it's necessary to add this extension to support buffer sharing between
different libraries.
v2:
Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL
and clCreateImageFromFdINTEL.
v3:
Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type.
Fix rebase conflict: add a parameter when invoke cl_mem_allocate.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
buffer object's fd.
Before this patch, Beignet can only create cl buffer from external bo by
its handle using clCreateBufferFromLibvaIntel. Render node is the first
choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN
is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode.
So it's necessary to add this extension to support buffer sharing between
different libraries.
v2:
Seperate clCreateMemObjectFromFdIntel into two extensions:
clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL.
v3:
Fix rebase conflict: add a parameter when invoke cl_mem_allocate.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
if image from buffer, the image's pitch should be same with
buffer bo's row pitch.
v2: correct style. image from buffer need update both aligned_pitch and
aligned_h, while image from user ptr only set aligned_pitch, so just
keep them independently.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo Yejun <yejun.guo@intel.com>
|
|
userptr requires the exact same memory layout between cpu and gpu,
since the current implementation uses the value of row_pitch*h, ignoring
the slice_pitch provided by the application. so, enable userptr
only if slice_pitch == row_pitch*h for image3d, 2darray and 1darray.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
this regression issue is reported from conformance test, to enable
userptr for climage + use_host_ptr, the memory layout between the
host_ptr (for CPU) and drm bo (for GPU) must be the same. it means
bo's row pitch should be the same as image's row pitch, and h should
be the same as aligned h.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
it could cause sampler data mismatch on IVB.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
the pointer must be 64 byte aligned, and only when w,h equals to its
aligned value, otherwise, roll back to the old method with extra copying.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
currently, 'void* data' has two meanings: the pointer from application,
and the buffer to create image2d from. It makes the code a little
complex when supporting userptr for image. So, add a new function
parameter to separate the two meanings.
V2: fix when HAS_USERPTR is not enabled
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: xionghu.luo@intel.com
|
|
this patch allows create 2d image with a cl buffer with zero copy.
v2: should use reference to manage the release the buffer and image.
After being created, the buffer reference count is 2, and image reference
count is 1.
if image is released first, decrease the image reference count and
buffer reference count both, release the bo when the buffer is released
at last;
if buffer is released first, decrease the buffer reference count only,
release the buffer when the image is released.
add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info.
v3: move is_image_from_buffer to _cl_mem_image; return
CL_INVALID_IMAGE_SIZE if image size is larger than the buffer.
v4: pitchalignment set to 2.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
This patch fixed two thread-safe bugs in the builtin-kernel
usage code path.
1. The builtin kernel array itself need to be protected.
2. Each caller need to get a dup of the builtin kernel,
rather than share the same kernel structure.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
We can change the image_channel_order to CL_RGBA and
image_channel_data_type to CL_UNSIGNED_INT32 for some special
case, thus 16 bytes can be read by one work item. Bandwidth is
fully used.
v2: merge patch 3 of initializing region0;
remove k dimension in kernel for 2d image.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
We can change the image_channel_order to CL_RGBA and
image_channel_data_type to CL_UNSIGNED_INT32 for some special
case, thus 16 bytes can be read by one work item. Bandwidth is
fully used.
v2:
Now we just optimize for IMAGE2D, so add judgement to not affect
other image type's code path.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
SKL need use aligned_pitch * aligned_h to calculate slice pitch, so add a new type of
cl_buffer_get_tiling_align to get it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
When the buffer has CL_MEM_ALLOC_HOST_PTR, the runtime
need to free the host_ptr at destructor. But if the buffer
is a subbuffer, then its host ptr is not allocated by itself,
we should not free it here. Otherwise, it may cause
some weird errors such as:
"corrupted double-linked list..".
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
|
|
instead of cpu way.
Before this patch, cl_mem_kernel_copy_image do cpu memory copy in order
to copy image array objects. This is very slow for large image size.
This patch implement image array copy in cl way, which dramatically
accelerate image array related clEnqueueCopyImage.
clCopyImage case in OpenCL conformance test will not be blocked anymore.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
As previously noted, when cl_mem_allocate fails, its error handling
then calls cl_mem_delete on the incompletely-set-up buffer, which
aborts at assert(mem->ctx).
This patch appears to fix the problem, but be warned I don't know this
code well enough to know what else it might break.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
3D Image can't use TILE_X in skl so change to default TILING MODE to TILE_Y.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: He Junyan <Junyan.he@inbox.com>
|