|
The CL_ENQUEUE_FILL_BUFFER_ALIGN8_* internal kernels share the same
program, so the program's reference is added only once. But when the
context is deleted, the internal program count is calculated by adding
each entry individually. This mismatch can cause the context to be freed
by mistake. Create a separate CL_ENQUEUE_FILL_BUFFER_ALIGN8_* program for
each entry for clarity.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
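A minimal sketch of the mismatch, assuming each distinct internal program holds exactly one reference on its context. The type and helpers (`program_t`, `held_refs`, `per_slot_refs`) are hypothetical and do not mirror beignet's real structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an internal program; each distinct program
 * is assumed to hold one reference on its context. */
typedef struct { int id; } program_t;

/* References actually held: a program shared by several slots counts once. */
static int held_refs(program_t *const slots[], size_t n)
{
  int refs = 0;
  for (size_t i = 0; i < n; i++) {
    int seen = 0;
    if (slots[i] == NULL)
      continue;
    for (size_t j = 0; j < i; j++)
      if (slots[j] == slots[i])
        seen = 1;
    if (!seen)
      refs++;
  }
  return refs;
}

/* References assumed at context deletion: every non-empty slot counts once. */
static int per_slot_refs(program_t *const slots[], size_t n)
{
  int refs = 0;
  for (size_t i = 0; i < n; i++)
    if (slots[i] != NULL)
      refs++;
  return refs;
}
```

With one program shared by all ALIGN8 slots, the two counts differ (1 versus the number of slots); giving each slot its own program makes them agree.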
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
This patch mainly contains:
1. An implementation of the built-in function __gen_ocl_ime.
2. Implementations of many built-in functions of
cl_intel_device_side_avc_motion_estimation.
3. This extension is required to run in SIMD16 mode.
v2: move the utests into separate patches, one by one;
since all the utests have an extension function check, there is no need
to put them in a standalone utest;
uncomment the self test;
fix the extension check logic issue: it should be && instead of ||.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
built_in_prgs and built_in_kernels seem useless; remove them.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Before releasing internal resources, set them to NULL; otherwise,
deleting these resources will call release context again.
ctx->built_in_prgs should be released by the application.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
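A minimal sketch of why the field must be cleared first, assuming that deleting an internal resource drops its context reference and thereby re-enters the context's release path. All names (`context_t`, `resource_t`, `context_release_internal`, `resource_delete`) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INTERNAL 2

typedef struct context context_t;
typedef struct { context_t *ctx; int deleted; } resource_t;

struct context {
  resource_t *internal[NUM_INTERNAL];
  int delete_calls;
};

static void context_release_internal(context_t *ctx);

static void resource_delete(resource_t *res)
{
  res->deleted++;
  res->ctx->delete_calls++;
  /* Dropping the resource's context reference calls back into the context. */
  context_release_internal(res->ctx);
}

static void context_release_internal(context_t *ctx)
{
  for (int i = 0; i < NUM_INTERNAL; i++) {
    resource_t *res = ctx->internal[i];
    if (res == NULL)
      continue;
    /* Clear the field before deleting so the re-entrant call skips it.
     * Without this, the re-entrant call would find the same resource
     * again and recurse without end. */
    ctx->internal[i] = NULL;
    resource_delete(res);
  }
}
```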
|
|
Change the max group size to 256, which is a reasonable
size for Gen9. According to performance tests, 256 gives
a good improvement in OpenCV with no regressions, so change it.
Signed-off-by: rander.wang <rander.wang@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
This patch makes the following changes:
1. Clone the module before calling LLVMLinkModules2, and remove the
other clones made for it.
2. Don't delete the module in function llvmToGen.
3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM
and buildFromLLVMModule only handle LLVM modules. Actually,
programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel,
and I think it is useless; maybe we could delete it entirely.
V2: define errDiag beside #if/#endif.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
It seems we missed some newly added device IDs for SKL.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Account for internal program ctx references in cl_context_delete
Signed-off-by: Patrick Beaulieu <patrick.beaulieu@avigilon.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add
CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add
CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for clGetKernelSubGroupInfo.
This extension is only enabled for LLVM 40+, which provides the frontend support.
V2: Add opencl-c define
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
This is similar to the 2D image case: it avoids truncating the extended image width.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
1. Only copy the data defined by origin and region.
2. Add clFinish to guarantee the kernel copy has finished for a blocking write.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
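A host-side sketch of copying only the origin/region window of a 2D image, assuming origin and region are in pixels and pitches in bytes. The helper `copy_region_2d` is hypothetical; beignet performs this copy with a GPU kernel instead:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a 2D region from a host buffer into a destination with its own row
 * pitch. bpp is bytes per pixel. Only bytes inside the region are written. */
static void copy_region_2d(unsigned char *dst, size_t dst_row_pitch,
                           const unsigned char *src, size_t src_row_pitch,
                           const size_t origin[2], const size_t region[2],
                           size_t bpp)
{
  for (size_t y = 0; y < region[1]; y++) {
    unsigned char *d = dst + (origin[1] + y) * dst_row_pitch + origin[0] * bpp;
    const unsigned char *s = src + y * src_row_pitch;
    memcpy(d, s, region[0] * bpp);
  }
}
```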
|
|
1. Support writing data in mapping/unmapping mode.
2. Add mapping record logic.
3. Add clFinish to guarantee the kernel copy has finished.
4. Fix the error in calling clEnqueueMapImageByKernel:
blocking_map and map_flags need to be swapped.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
large image.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
image by kernel copying.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Found a missing macro that needs to change to support LLVM40+.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The "imagedim_non_pow_2" cases of the basic module of the conformance test show a
regression after the previous patch started using TILE_Y mode for large images.
This bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage
and clEnqueueCopyImageToBuffer.
It forces the CL_RGBA/CL_UNORM_INT8/8191x8192 image of the conformance test
into a CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying.
That makes the width 8191 x 4 = 32764, which exceeds the maximum
width (16 x 1024 = 16384) allowed by the 14-bit width field of the GEN surface state structure.
So use the align4 copy kernel to avoid this bug.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
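The width arithmetic above, as a small sketch. It assumes the align4 kernel moves one 4-byte element per RGBA8 pixel; the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Maximum width representable by the 14-bit width field cited above. */
#define GEN_MAX_SURFACE_WIDTH ((size_t)1 << 14) /* 16 x 1024 = 16384 */

/* Width seen by a copy kernel that reinterprets each pixel of
 * pixel_bytes bytes as elements of elem_bytes bytes. */
static size_t reinterpreted_width(size_t width, size_t pixel_bytes,
                                  size_t elem_bytes)
{
  return width * pixel_bytes / elem_bytes;
}

static int fits_surface_width(size_t width)
{
  return width <= GEN_MAX_SURFACE_WIDTH;
}
```

Viewing the 8191-pixel RGBA8 row as 1-byte elements gives 32764, which does not fit; viewing it as 4-byte elements keeps the width at 8191.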
|
|
There is a race condition between building .bc and header files and
generating code from .cl targets. Fix the race by adding the
dependency on the generated files.
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
|
|
Copying data from host ptr to a large TILE_Y image by memcpy will fail.
Use clEnqueueCopyBufferToImage to do this on the GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Copying data from a large TILE_Y image to a buffer by memcpy will fail.
Use clEnqueueCopyImageToBuffer to do this on the GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Copying data from a large TILE_Y image to a buffer by memcpy will fail.
Use clEnqueueCopyImageToBuffer to do this.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Copying data from host ptr to a large TILE_Y image may fail,
so use clCopyBufferToImage to do this on the GPU side.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Geminilake is almost the same as bxt, except for the intel_gpgpu_read_ts_reg
function.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217635
Signed-off-by: Jan Beich <jbeich@freebsd.org>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
commit ff57cee0519d ("ocl20/runtime: take the first 64KB page table
entries") tries to allocate a bo at offset 0, but fails to take into
account that something may already be allocated there that it is not
allowed to evict (particularly when not using full-ppgtt separation).
This causes all subsequent execution to fail with
"drm_intel_gem_bo_context_exec() failed: Device or resource busy"
Reported-by: Kenneth Johansson <ken@kenjo.org>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98647
Contributor: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
v2: add #define intel_media_block_io in libocl; move extension check
code to this patch;
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Create a bo of size w * (3/2 * h) for the whole CL_NV12_INTEL format
surface. The Y surface (format CL_R) shares the first w * h part, and
the UV surface (format CL_RG) shares the remaining w * 1/2 h part. Set
the correct bo offset for the UV surface on each platform.
v2: add extension define in libocl; fix error check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
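A sketch of the layout described above, assuming a tightly packed pitch equal to the width in bytes and an even height. It deliberately ignores the per-platform pitch and tiling alignment that the patch handles; `nv12_layout` and `nv12_layout_for` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  size_t bo_size;   /* whole CL_NV12_INTEL surface: w * 3/2 * h bytes */
  size_t y_offset;  /* Y plane (CL_R): first w * h bytes */
  size_t uv_offset; /* UV plane (CL_RG): remaining w * h/2 bytes */
} nv12_layout;

/* h is assumed even, as NV12 subsamples chroma vertically by 2. */
static nv12_layout nv12_layout_for(size_t w, size_t h)
{
  nv12_layout l;
  l.y_offset = 0;
  l.uv_offset = w * h;
  l.bo_size = w * h + w * (h / 2);
  return l;
}
```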
|
|
Move the generated builtin str and bin files into the CMake build
directory to avoid confusion when changing LLVM versions.
V2: Fix a bug where builtin.cl was not written into the build dir.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Some applications use the program binary by default. If such an
application loads a gen binary built by an older version, then because
the fields of the gen binary have changed and there is no version
checking, clCreateProgramWithBinary will fail, which may cause the
application to fail silently.
Add a warning to alert the user.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
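One way to make such a failure visible is a version field checked at load time. The header layout, magic string and `GEN_BINARY_VERSION` below are hypothetical and are not beignet's actual gen binary format:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical binary header and version, for illustration only. */
#define GEN_BINARY_MAGIC "GENB"
#define GEN_BINARY_VERSION 2u

typedef struct {
  char magic[4];
  uint32_t version;
} gen_binary_header;

/* Returns 1 if the binary may be loaded; otherwise warns and returns 0 so
 * the caller can fail loudly instead of silently. */
static int gen_binary_check(const gen_binary_header *h)
{
  if (memcmp(h->magic, GEN_BINARY_MAGIC, 4) != 0) {
    fprintf(stderr, "Warning: not a gen binary\n");
    return 0;
  }
  if (h->version != GEN_BINARY_VERSION) {
    fprintf(stderr,
            "Warning: gen binary version %u does not match %u; "
            "rebuild the program from source\n",
            (unsigned)h->version, GEN_BINARY_VERSION);
    return 0;
  }
  return 1;
}
```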
|
|
Only return supported device types (GPU and default) in function
cl_get_gt_device.
Contributor: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
This allows a single beignet binary to both offer 2.0 where
available, and still work on older hardware.
V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec,
and also likely to be faster).
V3: Only enable OpenCL 2.0 when llvm version is 39.
V4: Only enable OpenCL 2.0 on x64 host.
V5: Always return 32 as address bits.
Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
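A sketch of the V2 rule (default to 1.2 when -cl-std is absent). The helper `select_cl_std` and its return encoding (major * 10 + minor, -1 for unusable) are hypothetical; only the `-cl-std=CLx.y` option spelling comes from the OpenCL specification:

```c
#include <assert.h>
#include <string.h>

/* Returns the OpenCL C version to compile for, or -1 if the requested
 * version cannot be used on this device. */
static int select_cl_std(const char *options, int device_supports_20)
{
  static const char flag[] = "-cl-std=CL";
  const char *p = options ? strstr(options, flag) : NULL;

  if (p == NULL)
    return 12; /* -cl-std not set: default to OpenCL C 1.2 */
  p += sizeof(flag) - 1;
  if (strncmp(p, "2.0", 3) == 0)
    return device_supports_20 ? 20 : -1;
  if (strncmp(p, "1.2", 3) == 0)
    return 12;
  if (strncmp(p, "1.1", 3) == 0)
    return 11;
  return -1;
}
```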
|
|
This is needed to support the chroma plane of P010 surfaces being
mapped from VAAPI.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The context owns the array of devices passed to cl_context_new, so it's
its duty to free it.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The conditional was equal to the one before, and would never be hit
because internal kernels were reset after release. Instead, since the
body is resetting built-in kernels, it appears obvious that the
conditional should be on the existence of built-in kernels.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We are using SLM as local memory and we should return CL_LOCAL for
CL_DEVICE_LOCAL_MEM_TYPE.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Chuanbo Weng <chuanbo.weng@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
1. Call finish before we destroy the command queue.
We should make sure all the commands in the queue are
finished before we really destroy the command_queue.
Otherwise, event status errors may occur. We leave the queue's
lifetime to the user and do not ref the queue when creating
an event.
2. Loosen the assert condition when notifying the queue.
There are cases where the queue's ref is 0 but it still needs
to be notified.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
As the "do we have any usable devices?" check uses this,
it needs to not crash even when we don't.
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
SVM enqueues need to call cl_event_exec every time.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
When an event completes, we need to notify all the command_queues
within the same context. But sometimes some command_queues in
the context are already invalid.
Ensure that all the command_queues to be notified are
valid.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
1. NDrangeKernel needs to call cl_event_exec every time.
2. The enqueue barrier event needs to be added to the queue every time.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
cl_event_exec is the uniform entry point for all event command execution;
calling cl_enqueue_handle directly may miss the time stamp record.
Replace all calls to cl_enqueue_handle with cl_event_exec.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
‘cl_devices_list_include_check’ warning.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Device enqueue is handled in several steps:
1. Allocate the device enqueue bo to store the device enqueue
information for the parent kernel. All global buffers must also be
converted to SVM buffers to make sure the child kernels see the same GPU addresses.
2. When flushing the command, check whether there is a device enqueue. If
there is, wait for it to finish and parse the device enqueue info.
3. Start the child ndrange according to the device enqueue info, with the
parent's global buffers as the exec info.
Because of non-uniform work-group sizes, one enqueue API call may flush
several times, but the device enqueue only needs to be handled once, so add a flag
to function cl_command_queue_flush to indicate the last flush.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
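A sketch of why one enqueue can produce several flushes and how an "is last" flag limits device enqueue handling to one of them. It assumes each dimension whose global size is not a multiple of the local size splits into a uniform part and a remainder part; `queue_model`, `queue_flush` and `enqueue_ndrange` are hypothetical stand-ins, not cl_command_queue_flush itself:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int flushes;
  int device_enqueue_parsed;
} queue_model;

/* Up to 2^dims launches: one extra split per non-uniform dimension.
 * If global < local, the dimension is a single (clipped) launch. */
static int ndrange_pieces(int dims, const size_t *global, const size_t *local)
{
  int pieces = 1;
  for (int d = 0; d < dims; d++)
    if (global[d] > local[d] && global[d] % local[d] != 0)
      pieces *= 2;
  return pieces;
}

/* Stand-in for a flush taking an "is last" flag. */
static void queue_flush(queue_model *q, int is_last)
{
  q->flushes++;
  if (is_last)
    q->device_enqueue_parsed++; /* parse device enqueue info only once */
}

static void enqueue_ndrange(queue_model *q, int dims,
                            const size_t *global, const size_t *local)
{
  int n = ndrange_pieces(dims, global, local);
  for (int i = 0; i < n; i++)
    queue_flush(q, i == n - 1);
}
```

For example, a 1D launch with global size 100 and local size 16 splits into a 96-item uniform part and a 4-item remainder, so it flushes twice but parses the device enqueue info once.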
|
|
When flushing the command queue, we must check whether the currently flushed
command queue has a device enqueue, which requires the cl_kernel. So store
the cl_kernel pointer in gpgpu, and add two functions, intel_gpgpu_set_kernel
and intel_gpgpu_get_kernel, for it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
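A minimal sketch of the idea, using hypothetical types; the real intel_gpgpu and cl_kernel carry far more state, and the real intel_gpgpu_set_kernel/intel_gpgpu_get_kernel signatures may differ:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int has_device_enqueue; } kernel_model;
typedef struct { kernel_model *kernel; } gpgpu_model;

static void gpgpu_set_kernel(gpgpu_model *gpgpu, kernel_model *kernel)
{
  gpgpu->kernel = kernel;
}

static kernel_model *gpgpu_get_kernel(const gpgpu_model *gpgpu)
{
  return gpgpu->kernel;
}

/* At flush time, the stored kernel tells us whether to handle device enqueue. */
static int flush_needs_device_enqueue(const gpgpu_model *gpgpu)
{
  const kernel_model *k = gpgpu_get_kernel(gpgpu);
  return k != NULL && k->has_device_enqueue;
}
```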
|