summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)AuthorFilesLines
2015-12-23Driver: Fix GPGPU delete bugPan Xiuli1-2/+2
The first patch 192feb51 has something wrong in rebase and takes new bug in. Now fix both the original bug and revert the wrong patch. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-23Runtime: because double's built-ins haven't completely support, so disable ↵Yang Rong1-0/+21
it by default. Add a cmake option for it, cmake with option -DEXPERIMENTAL_DOUBLE=true to enable it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-12-21add support for build option -cl-fast-relaxed-mathGuo Yejun1-1/+1
current code handles this option at clang level, actually, it is also necessary at LLVM -> GEN stage. V2: check if options is NULL Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14Runtime: Add the threadid calculation for curbe.Junyan He1-1/+11
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14add Broxton supportGuo Yejun3-7/+41
special versions of linux kernel and libdrm are needed. utest and conformance test PASSED. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2015-12-10runtime: add macro DEBUGP() to handle debug printf.Ruiling Song2-3/+9
It will output debug message under debug mode, and will not output under release mode. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09SKL: Use kernel-defined MOCS values instead of assuming hardware defaults.Francisco Jerez1-2/+2
Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09fix gcc build error.Luo Xionghu2-2/+2
this link fail appears on gcc 5.2.1. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: add missing supported format image_1d_buffer.Luo Xionghu1-0/+1
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: fix clLinkProgram bug.Luo Xionghu1-0/+10
clLinkProgram need check the existence of "-cl-kernel-arg-info" build_option of all the input_programs. User may link two SPIR program and call clGetKernelArgInfo to query kernel args. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: fix clCompileProgram bug.Luo Xionghu3-13/+6
forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is BINARY_TYPE_INTERMIDIATE if create from SPIR binary. v2: refine the source_type logic: source_type is already set in clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set in clBuildProgram or clCompileProgram. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09runtime: initialize the memory content to 0.Luo Xionghu1-0/+1
v2: move the memset inside the pointer check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09make Beignet as intermedia layer of CMRTGuo Yejun14-14/+491
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-26check image from buffer's base address alignment.Luo Xionghu1-0/+9
per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25runtime: set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to kernel's ↵Zhigang Gong5-6/+13
SIMD_WIDTH. It makes sense to set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the corresponding SIMD size. Then it provides a way for intel's OCL application to get SIMD width at runtime and make some SIMD width dependant optimization possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-19Runtime: return the correct error code in cl_event_check_waitlist.Yang Rong1-2/+4
Return CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
2015-11-17Runtime: Bind the profiling buffer when profiling enabled.Junyan He6-1/+126
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17Add profiling info APIs to runtime.Junyan He2-0/+18
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17CMake: Add -lrt to the link command of libcl.soJunyan He1-0/+1
The clock_gettime will cause the linkage error on some version of GCC, we need to add -lrt at the end of the link command line. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-17Full support of cl_intel_motion_estimation extension.Chuanbo Weng1-33/+166
The following items are supported in this commit: 1. Return residuals. 2. All types of mb_block_type, subpixel_mode, sad_adjust_mode in cl_motion_estimation_desc_intel. After this commit, cl_intel_motion_estimation is fully supported. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-11runtime: extension size not enough.Luo Xionghu3-3/+10
define a MACRO to hold the value. v2: use same MACRO in cl_extensions.h; add header file protection for cl_extension.h. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10Add extensions intel_accelerator and basic intel_motion_estimation.Chuanbo Weng21-33/+914
v2: 1. Just upload the first vme_state. 2. Remove duplicated code in check_opt1_extension. 3. Check image format before cl_gpgpu_bind_image_for_vme. 4. Fix error of getting mv. Because we suppose this kernel run in SIMD16 mode, so dword 0 of grf 1 should be __gen_ocl_region(8,vme_result.s0), not __gen_ocl_region(0,vme_result.s1). v3: Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required one. v4: Fix two conflicts after code rebase and wordaround a curbe related bug. v6: Treat simd8 and simd16 differently when getting mv. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-10Add extension clCreateImageFromFdINTEL to create cl image by external fd.Chuanbo Weng6-0/+125
Before this patch, Beignet can only create cl image from external bo by its handle using clCreateImageFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type. Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10Add extension clCreateBufferFromFdINTEL to create cl buffer by external ↵Chuanbo Weng6-3/+93
buffer object's fd. Before this patch, Beignet can only create cl buffer from external bo by its handle using clCreateBufferFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-04set the pitch of image from buffer to the buffer's pitch.Luo Xionghu1-1/+6
if image from buffer, the image's pitch should be same with buffer bo's row pitch. v2: correct style. image from buffer need update both aligned_pitch and aligned_h, while image from user ptr only set aligned_pitch, so just keep them independently. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>
2015-11-03fix regression issue for climage + uesrptrGuo Yejun1-1/+2
userptr requires the exact same memory layout between cpu and gpu, since the current implementation uses the value of row_pitch*h, ignoring the slice_pitch provided by the application. so, enable userptr only if slice_pitch == row_pitch*h for image3d, 2darray and 1darray. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-03runtime: dynamically get global memory size and max alloc sizePan Xiuli2-4/+20
Now device and driver can support bigger memory, we need to abandon our old 2G hard code. We get global memory by considering device limitation, drm driver and kernel support and raw, this will ensure a bigger global memory and a more stable system. We get max mem alloc size from global memory size and the device limition. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-11-03driver: add setup_bti_gen9 for bigger buffer up to 4GPan Xiuli1-2/+39
Now gen9 can support bigger buffer size, and it can also support 4G global memory. We add new function to support it. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-11-03runtime: refine the cl_device_id to support bigger memoryPan Xiuli6-11/+73
Now gen8 and gen9 support 4G global memory, and gen9 support 4G single buffer. Need to move the global_mem and max_mem_alloc size into each define header. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-11-03drivers: change the buf size to size_tPan Xiuli2-9/+12
The uint32_t size is not enough for coming bigger gpu memory, now GEN9 support 4G buffer. Also add assertion for invalid size. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-10-27Runtime: Refine ext enable function for platform.Junyan He3-19/+54
We enable fp64 extension just on BDW platform. The platforms before Gen7 will not have fp64 support. We will enable fp64 on gen8 later platforms after this feature is stable. V3: Unify the extersion setting for FP16 and FP64. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-10-21Runtime: add CL_DEVICE_SPIR_VERSIONS to clGetDeviceInfo.Yang Rong3-0/+4
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
2015-10-21use table to define and query binary headers.Luo Xionghu2-26/+39
currently, we support create program from 4 types of binary: SPIR(BITCODE), LLVM Compiled Object, LLVM Library and GEN Binary. The detailed formats are listed in code. also use table to match or fill gen binary header in backend. v2: use enum to replace the magic number. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-20add conditions of pitch and h to enable userptr for climage_use_host_ptrGuo Yejun1-1/+4
this regression issue is reported from conformance test, to enable userptr for climage + use_host_ptr, the memory layout between the host_ptr (for CPU) and drm bo (for GPU) must be the same. it means bo's row pitch should be the same as image's row pitch, and h should be the same as aligned h. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-20pitchalignment should be set to 1.Luo Xionghu1-1/+1
fix case precesion fail: opencv_test_video/OCL_Video/PyrLKOpticalFlow. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-20Revert "return 32 could gain 0.2% performance on opencv optical flow case."Luo Xionghu1-1/+1
This reverts commit 729b16fdb387437f97115e938745ab1135151553. ./opencv_test_imgproc --gtest_filter=OCL_Imgproc/CLAHETest.* failed due to this patch. Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-20alignment of NO TILING surface limitation shouldn't be removed.Luo Xionghu1-3/+4
it could cause sampler data mismatch on IVB. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-14enable USE_HOST_PTR for cl image with userptr to avoid extra copyingGuo Yejun3-18/+44
the pointer must be 64 byte aligned, and only when w,h equals to its aligned value, otherwise, roll back to the old method with extra copying. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-14refine code to separate the usage of data and image2d_from_bufferGuo Yejun3-26/+38
currently, 'void* data' has two meanings: the pointer from application, and the buffer to create image2d from. It makes the code a little complex when supporting userptr for image. So, add a new function parameter to separate the two meanings. V2: fix when HAS_USERPTR is not enabled Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: xionghu.luo@intel.com
2015-10-13Fix a event leak in create contextPan Xiuli1-0/+1
We get an event out of NDRangeKernel, and we don't release it. As an gpgpu event it can also make drm buffer leak, to avoid potenial error we just release it.w Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13runtime: refine the last_event in queue to a listPan Xiuli3-27/+55
Refine the event struct to make last_event become a list to store all uncompeleted events and update them every queue flush. This can make sure all events created in the runtime have a chance to update status and run callback functions and then be deleted. We will also fix the memory leak problem casued by uncompeted events. This is a bugfix for https://bugs.freedesktop.org/show_bug.cgi?id=91710 The leaked events with gpu buffers will be unreferenced and cause other drm buffer leak and result in terrible memory leak. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13Fix gpgpu node related bugPan Xiuli1-1/+1
This should be a typo, we should wait for the gpgpu and create node only if the batch buffer is busy. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13Add -dump-opt-asm support to the clLinkProgram() APIManasi Navare1-3/+1
This will dump the Gen ASM output to the file specified in the -dump-opt-asm Link option during the Link program step. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-10-12GBE: fix kernel arguments uploading bug.Zhigang Gong2-5/+9
After the curbe allocation refactor, not all kernel arguments will be allocated unconditional. If some kernel arguments haven't been used at all, the corresponding arguments will be ignored at backend thus we may get a -1 offset. On the runtime driver side, we need check this situation. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-24GBE: refactor curbe register allocation.Zhigang Gong3-20/+30
The major motivation is to normalize the curbe payload's allocation and prepare to use liveness information to avoid unecessary payload register allocation and avoid fragments when allocate curbe registers. For an example, for GBE_CURBE_LOCAL_ID_Y/Z, many one dimention kernels don't need them. But previous curbe allocation occurs before the liveness interval computing, thus it will allocate that curbe anyway. Altough it will be expired soon but it still need us to prepare those payload at host side. After this patch, this type of overhead has been eliminated easily. Another purpose is to eliminate the ugly curbe patch list handling in backend. After this patch, the curbe register handling is much cleaner than before. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23Driver: fix the annoying "Failed to release userptr..." error messagePan Xiuli1-2/+4
It is a drm related bug. As the drm driver changed the time to free their test userptr to bufmgr destroy(30921483c70c6939f017476eac13da6aa26b3b3c), we need anothr order to release our driver to make sure the test userptr can be freed with a valid fd. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-23Calculate appropriate timestamps for cl profileMidhun Kodiyath3-4/+71
Fix to calculate the current cpu monotonic raw timestamp in nanoseconds for enqueued,submitted,start and finshed and send this to application based on the parameter queries. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22enable create image 2d from buffer in clCreateImage.Luo Xionghu7-29/+99
this patch allows create 2d image with a cl buffer with zero copy. v2: should use reference to manage the release the buffer and image. After being created, the buffer reference count is 2, and image reference count is 1. if image is released first, decrease the image reference count and buffer reference count both, release the bo when the buffer is released at last; if buffer is released first, decrease the buffer reference count only, release the buffer when the image is released. add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if image size is larger than the buffer. v4: pitchalignment set to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-22return 32 could gain 0.2% performance on opencv optical flow case.Luo Xionghu1-1/+1
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-21should check the return value of cl_program_new.Luo Xionghu1-0/+18
catch the error: out of host memery. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>