path: root/src
Age | Commit message | Author | Files | Lines
2017-07-27 | Runtime: fix the "context ref is not 0" assert on delete. | Yang, Rong R | 1 | -22/+8
The CL_ENQUEUE_FILL_BUFFER_ALIGN8_* internal programs were all the same program, so the program's reference was added only once; but when the context is deleted, the internal program count is calculated by adding each of them individually. This mismatch causes the context to be freed by mistake. Create a different program for each CL_ENQUEUE_FILL_BUFFER_ALIGN8_* entry for clarity. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
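The counting mismatch described in this message can be illustrated with a minimal C sketch. It is not Beignet's code: `struct program`, `count_slots`, and `count_distinct` are hypothetical names, and the slot array only stands in for the context's table of internal programs. When several slots alias one program that was referenced once, counting slots over-counts the references the context actually holds.

```c
#include <stddef.h>

/* Hypothetical stand-in for an internal program object. */
struct program {
    int id;
};

/* Counts every non-NULL slot; over-counts when several slots alias one program. */
size_t count_slots(struct program *const *slots, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i] != NULL)
            count++;
    }
    return count;
}

/* Counts each distinct program once, matching "one reference per program". */
size_t count_distinct(struct program *const *slots, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        if (slots[i] == NULL)
            continue;
        for (size_t j = 0; j < i; j++) {
            if (slots[j] == slots[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

With three slots sharing one program plus one separate program, `count_slots` yields 4 while `count_distinct` yields 2. The commit resolves the mismatch from the other side, by giving each slot its own program so that both counts agree.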
2017-07-27 | Runtime: fix a cl_gpgpu_bind_image_for_vme NULL SIGSEGV. | Yang, Rong R | 1 | -1/+2
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-07-12 | Implement extension cl_intel_device_side_avc_motion_estimation. | Chuanbo Weng | 6 | -3/+148
This patch mainly contains: 1. The implementation of the built-in function __gen_ocl_ime. 2. Implementations of many built-in functions of cl_intel_device_side_avc_motion_estimation. 3. This extension is required to run in SIMD16 mode. v2: move the utests to separate patches one by one; as all the utests have an extension function check, there is no need to put them in a standalone utest; uncomment the self test; fix an extension check logic issue, which should use && instead of ||. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Signed-off-by: Xionghu Luo <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-07-10 | Runtime: remove ctx's useless fields. | Yang, Rong R | 3 | -43/+5
built_in_prgs and built_in_kernels seem to be unused, so remove them. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-07-10 | Runtime: fix a recurrent release context error. | Yang, Rong R | 1 | -10/+8
Before releasing internal resources, the pointers to them must be set to NULL; otherwise, deleting these resources will call release context again. The ctx->built_in_prgs should be released by the application. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
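The "set to NULL before release" rule can be sketched in C as follows. This is an illustration under assumptions, not the Beignet implementation: `struct resource`, `struct context`, `resource_delete`, and `context_release_internal` are hypothetical names, and the `frees` counter exists only to make the effect observable.

```c
#include <stdlib.h>

struct context;

/* Hypothetical internal resource that points back at its owning context. */
struct resource {
    struct context *owner;
};

struct context {
    struct resource *internal; /* resource owned by the context */
    int frees;                 /* number of times the resource was freed */
};

void context_release_internal(struct context *ctx);

/* Deleting the resource re-enters the context's release path. */
static void resource_delete(struct resource *res)
{
    struct context *owner = res->owner;
    free(res);
    owner->frees++;
    context_release_internal(owner);
}

/* Detach the pointer first so the re-entrant call finds nothing left to release. */
void context_release_internal(struct context *ctx)
{
    struct resource *res = ctx->internal;
    if (res == NULL)
        return;
    ctx->internal = NULL;
    resource_delete(res);
}
```

If the pointer were cleared only after `resource_delete` returned, the nested call would see the same resource again and free it a second time.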
2017-07-04 | Runtime: refine max group size for SKL & KBL | rander | 1 | -9/+9
Change the max group size to 256, which is a reasonable size for Gen9. According to performance tests, 256 gives a good improvement in OpenCV with no regressions. Signed-off-by: rander.wang <rander.wang@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-23 | GBE: clean llvm module's clone and release. | Yang, Rong R | 3 | -1/+7
There are some changes: 1. Clone the module before calling LLVMLinkModules2, and remove the other clones of it. 2. Don't delete the module in function llvmToGen. 3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM and buildFromLLVMModule only handle the LLVM module. Actually, programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel, and I think it is useless; maybe we could delete it altogether. V2: define errDiag beside #if/#endif. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-06-22 | Add missed kernel names into built-in kernel list. | Yan Wang | 1 | -1/+16
Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-22 | Runtime: Add missing SKL device ID | Pan Xiuli | 2 | -1/+9
It seems we missed some newly added device IDs for SKL. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-16 | Fix context leak with internal kernels | Patrick Beaulieu | 1 | -1/+21
Account for internal program ctx references in cl_context_delete Signed-off-by: Patrick Beaulieu <patrick.beaulieu@avigilon.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-16 | Runtime: Add new API enums for cl_intel_required_subgroup_size extension | Pan Xiuli | 5 | -0/+40
Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for clGetKernelSubGroupInfo. This extension is only available with LLVM 4.0+, which provides the frontend support. V2: Add opencl-c define. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-14 | Use align16 and align4 kernels to copy large 3D images with TILE_Y. | Yan Wang | 7 | -37/+149
This is similar to the 2D image case and avoids truncation of the extended image width. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13 | Optimize clEnqueueWriteImageByKernel and clEnqueuReadImageByKernel. | Yan Wang | 1 | -7/+18
1. Only copy the data defined by origin and region. 2. Add clFinish to guarantee that the kernel copy has finished for blocking writes. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
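Copying only the rectangle described by an origin and a region can be sketched on the host side in C. The commit does this with a GPU kernel; the function below is only an analogy with a hypothetical name (`copy_region_2d`) and an assumed layout in which source and destination are linear buffers that use the same origin and have their own row pitches.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies a region_w x region_h rectangle starting at (origin_x, origin_y)
 * from src to the same position in dst. Pitches are row strides in bytes;
 * bpp is the number of bytes per pixel.
 */
void copy_region_2d(unsigned char *dst, size_t dst_pitch,
                    const unsigned char *src, size_t src_pitch,
                    size_t origin_x, size_t origin_y,
                    size_t region_w, size_t region_h, size_t bpp)
{
    for (size_t row = 0; row < region_h; row++) {
        const unsigned char *s = src + (origin_y + row) * src_pitch + origin_x * bpp;
        unsigned char *d = dst + (origin_y + row) * dst_pitch + origin_x * bpp;
        memcpy(d, s, region_w * bpp); /* one row of the region at a time */
    }
}
```

Bytes outside the region are left untouched, which is the saving compared with copying the whole image.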
2017-06-13 | Fix bug of clEnqueueUnmapMemObjectForKernel and clEnqueueMapImageByKernel. | Yan Wang | 1 | -34/+113
1. Support writing data in mapping/unmapping mode. 2. Add mapping record logic. 3. Add clFinish to guarantee that the kernel copy has finished. 4. Fix the call to clEnqueueMapImageByKernel: the blocking_map and map_flags arguments need to be swapped. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13 | Add clFinish to guarantee the kernel copying is finished when creating a TILE_Y large image. | Yan Wang | 1 | -0/+7
Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-13 | Add cl_mem_record_map_mem_for_kernel() to record the map address for TILE_Y images copied by kernel. | Yan Wang | 2 | -26/+88
Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-06-09 | Runtime: Fix a missing LLVM version macro for LLVM40+ | Pan Xiuli | 1 | -1/+1
Found a missing macro that needs to change to support LLVM40+. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-25 | Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. | Yan Wang | 5 | -28/+89
The "imagedim_non_pow_2" cases of the basic module of the conformance tests show a regression after TILE_Y mode was enabled for large images by the previous patch. The bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer, which forces the CL_RGBA/CL_UNORM_INT8/8191x8192 image of the conformance test to a CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying. This makes the width 8191 x 4 = 32764, which exceeds the maximum width (16 x 1024 = 16384) of the GEN surface state structure, whose width field has only 14 bits. So use the align4 copy kernel to avoid this bug. Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
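The width arithmetic in this message can be checked with a small C sketch. The function names are hypothetical, and the 4-byte element view used for the align4 case is an assumption about what that kernel does; only the numbers 8191, 32764, and the 14-bit limit of 16384 come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* 14-bit width field, per the commit message: 16 x 1024 = 16384. */
#define MAX_SURFACE_WIDTH (1u << 14)

/* Width of a row when its pixels are reinterpreted as elements of another size. */
uint32_t copy_view_width(uint32_t width_px, uint32_t bytes_per_pixel,
                         uint32_t bytes_per_element)
{
    return width_px * bytes_per_pixel / bytes_per_element;
}

/* True when the width can be expressed in the surface state's width field. */
bool fits_surface_width(uint32_t width)
{
    return width <= MAX_SURFACE_WIDTH;
}
```

Viewing the 8191-pixel RGBA8 row as single bytes gives `copy_view_width(8191, 4, 1)`, which is 32764 and does not fit. Viewing it as 4-byte elements keeps the width at 8191, which fits.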
2017-05-25 | build: fix cmake code generation dependencies. | Ismo Puustinen | 1 | -2/+2
There is a race condition between building .bc and header files and generating code from .cl targets. Fix the race by adding the dependency to generated files. Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
2017-05-18 | Implement TILE_Y large image in clEnqueueWriteImage. | Yan Wang | 1 | -0/+46
Copying data from the host pointer to a TILE_Y large image with memcpy fails. Use clEnqueueCopyBufferToImage to do the copy on the GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-18 | Implement TILE_Y large image in clEnqueueReadImage. | Yan Wang | 1 | -0/+55
Copying data from a TILE_Y large image to a buffer with memcpy fails. Use clEnqueueCopyImageToBuffer to do the copy on the GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-18 | Implement TILE_Y large image in clEnqueueMapImage and clEnqueueUnmapMemObject. | Yan Wang | 1 | -0/+111
Copying data from a TILE_Y large image to a buffer with memcpy fails. Use clEnqueueCopyImageToBuffer to do the copy. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-18 | Still create images in TILE_Y mode when image size > 128MB, for performance. | Yan Wang | 4 | -6/+111
Copying data from the host pointer to a TILE_Y large image may fail, so use clCopyBufferToImage to do the copy on the GPU side. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-05-15 | GLK: add Geminilake runtime support. | Yang Rong | 2 | -2/+47
Geminilake is almost the same as BXT, except for the intel_gpgpu_read_ts_reg function. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-05-15 | GLK: add Geminilake pciids. | Yang Rong | 1 | -1/+8
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-23 | Limit get_program_global_data() calls to OpenCL 2.0 | Jan Beich | 1 | -2/+4
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217635 Signed-off-by: Jan Beich <jbeich@freebsd.org> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-17 | intel: Check that we can reserve the zero-offset | Yang Rong | 1 | -11/+20
commit ff57cee0519d ("ocl20/runtime: take the first 64KB page table entries") tries to allocate a bo at offset 0, but failed to take into account that something may already be allocated there that it is not allowed to evict (particularly when not using full-ppgtt separation). Failure to do so causes all execution to subsequently fail with "drm_intel_gem_bo_context_exec() failed: Device or resource busy". Reported-by: Kenneth Johansson <ken@kenjo.org> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98647 Contributor: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-13 | add extension cl_intel_media_block_io READ related function | Luo Xionghu | 1 | -0/+1
v2: add #define intel_media_block_io in libocl; move extension check code to this patch; Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13 | add extension intel_planar_yuv. | Luo Xionghu | 9 | -24/+217
Create a bo of size w * (3/2 * h) for the whole CL_NV12_INTEL format surface; the Y surface (format CL_R) shares the first w * h part, and the UV surface (format CL_RG) shares the remaining w * 1/2 h part. Set the correct bo offset for the UV surface for each platform. v2: add extension define in libocl; fix error check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
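The size split described in this message can be written out as a short C sketch. It is a simplification under stated assumptions: one byte per sample, an even height, and no row pitch, tiling alignment, or platform-specific offset adjustment (the commit notes that the real UV offset differs per platform). `struct nv12_layout` and `nv12_layout_for` are hypothetical names.

```c
#include <stddef.h>

/* Byte layout of a w x h NV12 surface inside one buffer object. */
struct nv12_layout {
    size_t y_size;     /* Y plane: w * h bytes */
    size_t uv_offset;  /* UV plane starts right after the Y plane */
    size_t uv_size;    /* interleaved UV plane: w * (h / 2) bytes */
    size_t total_size; /* w * (3/2 * h) bytes in total */
};

struct nv12_layout nv12_layout_for(size_t w, size_t h)
{
    struct nv12_layout l;
    l.y_size = w * h;
    l.uv_offset = l.y_size;
    l.uv_size = w * (h / 2);
    l.total_size = l.y_size + l.uv_size;
    return l;
}
```

For a 1920x1080 surface this gives a Y plane of 2073600 bytes, a UV plane of 1036800 bytes starting at offset 2073600, and 3110400 bytes in total.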
2017-03-07 | CMAKE: Refine builtin kernel bin generator | Pan Xiuli | 1 | -7/+7
Move the generated builtin str and bin files into the Cmake build directory to avoid chaos when changing LLVM. V2: Fix a bug that the builtin.cl was not written into build dir. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-14 | Runtime: add a warning when loading a gen binary fails. | Yang Rong | 1 | -0/+1
Some applications use a program's binary by default. If an application loads a gen binary produced by a former version, clCreateProgramWithBinary fails, because the fields of the gen binary have changed and there is no version checking; this may cause the application to fail silently. Add a warning to hint the user. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-14 | Runtime: fix get unsupported type device bug. | Yang Rong | 2 | -4/+8
Only return supported device types (GPU and default) in function cl_get_gt_device. Contributor: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-14 | Enable OpenCL 2.0 only where supported | Pan Xiuli | 5 | -8/+18
This allows a single beignet binary to both offer 2.0 where available, and still work on older hardware. V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec, and also likely to be faster). V3: Only enable OpenCL 2.0 when the LLVM version is 3.9. V4: Only enable OpenCL 2.0 on x64 hosts. V5: Always return 32 as address bits. Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com> Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-14 | Enable support for two-component 16-bit planes | Mark Thompson | 1 | -0/+2
This is needed to support the chroma plane of P010 surfaces being mapped from VAAPI. Signed-off-by: Mark Thompson <sw@jkqxz.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10 | Free context devices on context release | Giuseppe Bilotta | 1 | -0/+1
The context owns the array of devices passed to cl_context_new, so it's its duty to free it. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10 | Fix obvious copy-paste | Giuseppe Bilotta | 1 | -1/+1
The conditional was equal to the one before, and would never be hit because internal kernels were reset after release. Instead, since the body is resetting built-in kernels, it appears obvious that the conditional should be on the existence of built-in kernels. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10 | API: Fix local memory type to CL_LOCAL | Pan Xiuli | 4 | -4/+4
We are using SLM as local memory and we should return CL_LOCAL for CL_DEVICE_LOCAL_MEM_TYPE. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-08 | Typo in error message | Giuseppe Bilotta | 1 | -1/+1
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: He Junyan <junyan.he@inbox.com>
2017-02-06 | Make CL-GL sharing available via ICD | Rebecca N. Palmer | 1 | -12/+16
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Chuanbo Weng <chuanbo.weng@intel.com>
2017-01-19 | Android.mk: update Android.mk for android build. | Yang Rong | 1 | -3/+15
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-01-11 | Add some pointer access checks. | Yang Rong | 3 | -1/+5
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-01-11 | Fix two bugs about command queue destroy. | Junyan He | 2 | -1/+4
1. Call finish before we destroy the command queue. We should make sure all the commands in the queue have finished before we really destroy the command_queue; otherwise, this may cause an event status error. We leave the queue's lifetime to the user and do not ref the queue when creating an event. 2. Loosen the assert condition when notifying the queue. There is a case where the ref of the queue is 0 but we still need to notify. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-09 | Fail, don't assert, if unable to create context | Rebecca N. Palmer | 1 | -3/+5
As the "do we have any usable devices?" check uses this, it needs to not crash even when we don't. Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-06 | Runtime: Fix an event bug. | Yang Rong | 1 | -7/+21
SVM enqueues need to call cl_event_exec every time. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2017-01-06 | Fix an event notify bug. | Junyan He | 4 | -48/+27
When an event completes, we need to notify all the command_queues within the same context. But sometimes, some command_queue in the context is already invalid. Modify this to ensure that all the command_queues to be notified are valid. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-05 | Fix two bugs about event. | Junyan He | 2 | -16/+13
1. NDRangeKernel needs to call cl_event_exec every time. 2. An enqueue barrier event needs to be added to the queue every time. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-30 | Runtime: fix a profiling failure. | Yang Rong | 3 | -23/+6
cl_event_exec is the uniform entry for all event command execution; calling cl_enqueue_handle may miss the time stamp record. Replace all cl_enqueue_handle calls with cl_event_exec. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-30 | Runtime: add the header file to avoid the implicit declaration of function ‘cl_devices_list_include_check’ warning. | Yang Rong | 1 | -0/+1
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-30 | OCL20: handle device enqueue in runtime. | Yang, Rong R | 15 | -700/+1006
There are some steps to handle device enqueue: 1. Allocate the device enqueue bo to store the device enqueue information for the parent kernel, and convert all global buffers to SVM buffers to make sure the child kernels have the same GPU address. 2. When flushing the command, check whether there is a device enqueue. If so, wait for it to finish and parse the device enqueue info. 3. Start the child ndrange according to the device enqueue info, with the parent's global buffers as the exec info. Because of non-uniform workgroup sizes, one enqueue API will flush several times, but device enqueue only needs to be handled once, so add a flag to function cl_command_queue_flush to indicate the last flush. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2016-12-30 | OCL20: add a cl_kernel pointer to gpgpu. | Yang, Rong R | 4 | -5/+31
Because flushing the command queue must check whether the currently flushed command queue has a device enqueue, it needs the cl_kernel. So store the cl_kernel pointer in gpgpu, and add two functions, intel_gpgpu_set_kernel and intel_gpgpu_get_kernel, for it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>