Age | Commit message (Collapse) | Author | Files | Lines |
|
special versions of linux kernel and libdrm are needed.
utest and conformance test PASSED.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
It will output debug message under debug mode, and will
not output under release mode.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and
Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
this link fail appears on gcc 5.2.1.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
clLinkProgram need check the existence of "-cl-kernel-arg-info"
build_option of all the input_programs. User may link two SPIR
program and call clGetKernelArgInfo to query kernel args.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is
BINARY_TYPE_INTERMIDIATE if create from SPIR binary.
v2: refine the source_type logic: source_type is already set in
clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set
in clBuildProgram or clCompileProgram.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
v2: move the memset inside the pointer check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt.
There is a request to make Beignet as intermedia layer of CMRT, in
other words, application programer write OpenCL APIs to execute the
CM kernel on GPU, the following shows the key code, and please refer
to the next patch of unit test for detail.
prog = clCreateProgramWithBinary("cm kernel");
clBuildProgram(prog);
kernel = clCreateKernel(prog, "kernel name");
image = clCreateImage();
clSetKernelArg(kernel, image);
clEnqueueNDRangeKernel(kernel);
Inside Beignet, once cm kernel is invoked, the following relative APIs
will be directly passed to CMRT library (libcmrt.so) which is loaded
via dlopen only when necessary. Since we use this simple method to
keep the code clean, OpenCL spec is not strictly followed, and cl_event
is not supported for this case.
v2: add comments about the cm queue in fuction cmrt_enqueue
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
per spec, if create image from USE_HOST_PTR buffer, the buffer's base
address need be aligned.
v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
SIMD_WIDTH.
It makes sense to set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the
corresponding SIMD size. Then it provides a way for intel's OCL application
to get SIMD width at runtime and make some SIMD width dependant optimization
possible.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Return CL_INVALID_CONTEXT if the context associated with
command_queue and events in event_wait_list are not the same.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The clock_gettime will cause the linkage error on some
version of GCC, we need to add -lrt at the end of the
link command line.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
The following items are supported in this commit:
1. Return residuals.
2. All types of mb_block_type, subpixel_mode, sad_adjust_mode in
cl_motion_estimation_desc_intel.
After this commit, cl_intel_motion_estimation is fully supported.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
define a MACRO to hold the value.
v2: use same MACRO in cl_extensions.h; add header file protection for
cl_extension.h.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
v2:
1. Just upload the first vme_state.
2. Remove duplicated code in check_opt1_extension.
3. Check image format before cl_gpgpu_bind_image_for_vme.
4. Fix error of getting mv. Because we suppose this kernel run in SIMD16
mode, so dword 0 of grf 1 should be
__gen_ocl_region(8,vme_result.s0), not
__gen_ocl_region(0,vme_result.s1).
v3:
Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required
one.
v4:
Fix two conflicts after code rebase and wordaround a curbe related bug.
v6:
Treat simd8 and simd16 differently when getting mv.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Before this patch, Beignet can only create cl image from external bo by
its handle using clCreateImageFromLibvaIntel. Render node is the first
choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN
is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode.
So it's necessary to add this extension to support buffer sharing between
different libraries.
v2:
Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL
and clCreateImageFromFdINTEL.
v3:
Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type.
Fix rebase conflict: add a parameter when invoke cl_mem_allocate.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
buffer object's fd.
Before this patch, Beignet can only create cl buffer from external bo by
its handle using clCreateBufferFromLibvaIntel. Render node is the first
choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN
is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode.
So it's necessary to add this extension to support buffer sharing between
different libraries.
v2:
Seperate clCreateMemObjectFromFdIntel into two extensions:
clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL.
v3:
Fix rebase conflict: add a parameter when invoke cl_mem_allocate.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
if image from buffer, the image's pitch should be same with
buffer bo's row pitch.
v2: correct style. image from buffer need update both aligned_pitch and
aligned_h, while image from user ptr only set aligned_pitch, so just
keep them independently.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo Yejun <yejun.guo@intel.com>
|
|
userptr requires the exact same memory layout between cpu and gpu,
since the current implementation uses the value of row_pitch*h, ignoring
the slice_pitch provided by the application. so, enable userptr
only if slice_pitch == row_pitch*h for image3d, 2darray and 1darray.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Now device and driver can support bigger memory, we need to abandon
our old 2G hard code. We get global memory by considering device
limitation, drm driver and kernel support and raw, this will ensure
a bigger global memory and a more stable system. We get max mem alloc
size from global memory size and the device limition.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Now gen9 can support bigger buffer size, and it can also support
4G global memory. We add new function to support it.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Now gen8 and gen9 support 4G global memory, and gen9 support
4G single buffer. Need to move the global_mem and max_mem_alloc
size into each define header.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
The uint32_t size is not enough for coming bigger
gpu memory, now GEN9 support 4G buffer. Also add
assertion for invalid size.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
We enable fp64 extension just on BDW platform. The
platforms before Gen7 will not have fp64 support.
We will enable fp64 on gen8 later platforms after
this feature is stable.
V3:
Unify the extersion setting for FP16 and FP64.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
currently, we support create program from 4 types of binary: SPIR(BITCODE),
LLVM Compiled Object, LLVM Library and GEN Binary. The detailed formats are
listed in code.
also use table to match or fill gen binary header in backend.
v2: use enum to replace the magic number.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
this regression issue is reported from conformance test, to enable
userptr for climage + use_host_ptr, the memory layout between the
host_ptr (for CPU) and drm bo (for GPU) must be the same. it means
bo's row pitch should be the same as image's row pitch, and h should
be the same as aligned h.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
fix case precesion fail: opencv_test_video/OCL_Video/PyrLKOpticalFlow.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
This reverts commit 729b16fdb387437f97115e938745ab1135151553.
./opencv_test_imgproc --gtest_filter=OCL_Imgproc/CLAHETest.* failed due
to this patch.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
it could cause sampler data mismatch on IVB.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
the pointer must be 64 byte aligned, and only when w,h equals to its
aligned value, otherwise, roll back to the old method with extra copying.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
currently, 'void* data' has two meanings: the pointer from application,
and the buffer to create image2d from. It makes the code a little
complex when supporting userptr for image. So, add a new function
parameter to separate the two meanings.
V2: fix when HAS_USERPTR is not enabled
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: xionghu.luo@intel.com
|
|
We get an event out of NDRangeKernel, and we don't release it.
As an gpgpu event it can also make drm buffer leak, to avoid
potenial error we just release it.w
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Refine the event struct to make last_event become a list to store
all uncompeleted events and update them every queue flush. This can
make sure all events created in the runtime have a chance to update
status and run callback functions and then be deleted. We will also
fix the memory leak problem casued by uncompeted events.
This is a bugfix for https://bugs.freedesktop.org/show_bug.cgi?id=91710
The leaked events with gpu buffers will be unreferenced and cause other
drm buffer leak and result in terrible memory leak.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
This should be a typo, we should wait for the gpgpu and create
node only if the batch buffer is busy.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
This will dump the Gen ASM output to the file specified in the
-dump-opt-asm Link option during the Link program step.
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
After the curbe allocation refactor, not all kernel arguments
will be allocated unconditional. If some kernel arguments haven't
been used at all, the corresponding arguments will be ignored
at backend thus we may get a -1 offset. On the runtime driver
side, we need check this situation.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
The major motivation is to normalize the curbe payload's
allocation and prepare to use liveness information
to avoid unecessary payload register allocation and avoid
fragments when allocate curbe registers. For an example,
for GBE_CURBE_LOCAL_ID_Y/Z, many one dimention
kernels don't need them. But previous curbe allocation
occurs before the liveness interval computing, thus it
will allocate that curbe anyway. Altough it will be expired
soon but it still need us to prepare those payload at
host side. After this patch, this type of overhead
has been eliminated easily.
Another purpose is to eliminate the ugly curbe patch list
handling in backend. After this patch, the curbe register
handling is much cleaner than before.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
It is a drm related bug. As the drm driver changed the time to free their test
userptr to bufmgr destroy(30921483c70c6939f017476eac13da6aa26b3b3c), we need
anothr order to release our driver to make sure the test userptr can be freed
with a valid fd.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Fix to calculate the current cpu monotonic raw timestamp in nanoseconds
for enqueued,submitted,start and finshed and send this to application
based on the parameter queries.
Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
this patch allows create 2d image with a cl buffer with zero copy.
v2: should use reference to manage the release the buffer and image.
After being created, the buffer reference count is 2, and image reference
count is 1.
if image is released first, decrease the image reference count and
buffer reference count both, release the bo when the buffer is released
at last;
if buffer is released first, decrease the buffer reference count only,
release the buffer when the image is released.
add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info.
v3: move is_image_from_buffer to _cl_mem_image; return
CL_INVALID_IMAGE_SIZE if image size is larger than the buffer.
v4: pitchalignment set to 2.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
catch the error: out of host memery.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
All programs or none programs specified by input_programs contain a compiled binary or library
for the device. Otherwise return CL_INVALID_OPERATION.
Correct this condition check.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
|
|
cl_buffer_get_subdata sometime is very very very slow in linux kernel, in skl and chv,
and it is random. So temporary disable it, use map/copy/unmap to read.
Should re-enable it after find root cause.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
|
|
1. return CL_INVALID_LINKER_OPTIONS when invalid options, using clang to check the options.
2. return CL_INVALID_OPERATION when the binary type is not same.
3. When link fail, will not return CL_LINK_PROGRAM_FAILURE, fix it.
4. Should not delete program in genProgramBuildFromLLVM, the program is new and delete from runtime.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
|
|
There is no NULL pointer check for kernel->program->build_opts.
This will cause utest test_get_arg_info crash.
In fact, we will add -cl-kernel-arg-info flag for compiling
ever time, and so the arg info is always avaible.
But some test case deliberately unset this flag and expect the ERR
return value, so we really need a check here.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|