Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
This patch allows creating a 2D image from a CL buffer with zero copy.
v2: use reference counting to manage the release of the buffer and image.
After creation, the buffer reference count is 2 and the image reference
count is 1.
If the image is released first, decrease both the image and the buffer
reference counts, and release the bo when the buffer is finally
released;
if the buffer is released first, decrease the buffer reference count only,
and release the buffer when the image is released.
Add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT to cl_device_info.
v3: move is_image_from_buffer to _cl_mem_image; return
CL_INVALID_IMAGE_SIZE if the image size is larger than the buffer.
v4: pitch alignment set to 2.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
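The release ordering described in v2 can be sketched as below. This is only an illustration, not the runtime's code: the struct and function names (sketch_buffer, sketch_image, and so on) are hypothetical, and a boolean stands in for freeing the bo.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's buffer and image objects. */
struct sketch_buffer {
    int ref;          /* reference count */
    bool bo_released; /* set once the underlying buffer object is freed */
};

struct sketch_image {
    int ref;
    struct sketch_buffer *buffer; /* backing buffer, shared with zero copy */
    bool released;
};

/* Drop one reference; the last one releases the bo. */
static void sketch_buffer_unref(struct sketch_buffer *b)
{
    if (--b->ref == 0)
        b->bo_released = true;
}

/* Creating the image takes an extra reference on the buffer (1 -> 2). */
static void sketch_image_from_buffer(struct sketch_image *img,
                                     struct sketch_buffer *b)
{
    b->ref++;
    img->ref = 1;
    img->buffer = b;
    img->released = false;
}

/* Releasing the image also drops the reference it holds on the buffer. */
static void sketch_image_unref(struct sketch_image *img)
{
    if (--img->ref == 0) {
        img->released = true;
        sketch_buffer_unref(img->buffer);
    }
}
```

With either release order, the bo is freed only after both the image and the user's buffer reference are gone.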
|
|
The device ID and vendor ID are not the same. Set the correct vendor ID.
Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
The cl device may have different extensions from the
platform. We will add some items based on the platform
extensions.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Check the selftest kernel return value; if enqueuing the kernel failed,
set the flag so that atomics in L3 are not enabled for HSW.
This reverts commit 83f8739b6fc4893fac60145326052ccb5cf653dc.
v2: don't use a global variable to pass the value from runtime to driver.
v3: add type SELF_TEST_OTHER_FAIL to differentiate from SELF_TEST_ATOMIC_FAIL;
separate ATOMIC_FAIL from SLM_FAIL; only SLM_FAIL can be controlled by
the env variable OCL_IGNORE_SELF_TEST.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang, Rong <rong.r.yang@intel.com>
|
|
To make the license statements consistent with each other, adjust
all license versions to v2.1+. Thus beignet should have a pure
LGPL v2.1+ license.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Per OpenCL spec 1.2:
CL_DEVICE_IMAGE_MAX_BUFFER_SIZE should be size_t type rather
than cl_ulong.
This bug will cause problems on the i386 platform.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
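A minimal sketch of why the type matters, assuming a query function that follows the clGetDeviceInfo convention of rejecting a too-small param_value_size. The function name and the returned value are hypothetical; only the error code value mirrors CL_INVALID_VALUE. A caller that passes a size_t (4 bytes on i386) would be rejected if the runtime insisted on 8 bytes for a cl_ulong.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_SUCCESS        0
#define SKETCH_INVALID_VALUE  (-30) /* same value as CL_INVALID_VALUE */

/* Hypothetical query for an image-buffer size limit, reported as size_t. */
static int sketch_get_image_max_buffer_size(size_t param_value_size,
                                            void *param_value,
                                            size_t *param_value_size_ret)
{
    const size_t value = 65536; /* placeholder limit, not a real device value */

    if (param_value != NULL) {
        /* The caller's storage must hold exactly the documented type. */
        if (param_value_size < sizeof(value))
            return SKETCH_INVALID_VALUE;
        memcpy(param_value, &value, sizeof(value));
    }
    if (param_value_size_ret != NULL)
        *param_value_size_ret = sizeof(value);
    return SKETCH_SUCCESS;
}
```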
|
|
When SLM is enabled, querying the kernel's max work group size should return a sub slice's max thread count * SIMD width.
So the sub slice information is needed.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
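The calculation above amounts to a single multiplication, presumably because a work group that uses SLM has to fit on one sub slice. The helper name and the numbers in any call are illustrative only.

```c
#include <stddef.h>

/* Hypothetical helper: max work group size for a kernel that uses SLM. */
static size_t sketch_slm_max_wg_size(size_t sub_slice_max_threads,
                                     size_t simd_width)
{
    return sub_slice_max_threads * simd_width;
}
```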
|
|
Add the CL_KERNEL_GLOBAL_WORK_SIZE option for clGetKernelWorkGroupInfo.
v2: should return the max global work size instead of the current work size.
This function needs to return CL_INVALID_VALUE if the device is not a custom
device or the kernel is not a built-in kernel.
We have 3 kinds of built-in kernels for 1d/2d/3d memories; the max global
work size is decided by the dimension and memory type.
The piglit failure is caused by calling non-built-in kernels, so a patch
needs to be sent to piglit later.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
If the kernel doesn't use slm/barrier, there is no hard limitation
for the max group size. And if the max work group size is more than
1024, the original 64 urb entry count will not be sufficient to hold
all the curbe payload. Change the entry count to max thread count to
fix this potential issue.
I found this bug when I tried to run phoronix test suite's juliagpu
test case on my MBA.
v2:
Refine the max kernel work group size calculation mechanism.
The wg_sz should not be a member variable of the device; it should be
a variable derived from the kernel's and device's attributes at runtime.
Also fix the wrong configuration for IVB GT1.
v3:
Add an important max thread limitation in the GPGPU_WALKER command.
For non-Baytrail, max thread depth * max thread height * max thread width
should be less than 64 (under either simd16 or simd8), no matter whether
SLM/barrier is used. We overlooked that limitation before, so a
simd8 kernel that uses work group size 1024 will exceed this limitation
and half of the threads will not be executed at all.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
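The v3 limitation can be checked with a little arithmetic. The sketch below assumes each hardware thread covers simd_width work items and treats the limit as inclusive (at most 64 threads per group), which is an assumption consistent with half of the threads being dropped in the simd8 case. All names are hypothetical.

```c
#include <stddef.h>

/* Assumed per-group thread limit for GPGPU_WALKER on non-Baytrail. */
#define SKETCH_MAX_WALKER_THREADS 64

/* Number of hardware threads needed for one work group (rounded up). */
static size_t sketch_threads_per_group(size_t wg_size, size_t simd_width)
{
    return (wg_size + simd_width - 1) / simd_width;
}

/* Largest work group size that stays within the thread limit. */
static size_t sketch_max_wg_size(size_t simd_width)
{
    return SKETCH_MAX_WALKER_THREADS * simd_width;
}
```

A simd8 kernel with work group size 1024 needs 128 threads, twice the limit, while the same size under simd16 needs 64.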
|
|
Include CL_DEVICE_LINKER_AVAILABLE, CL_DEVICE_PRINTF_BUFFER_SIZE, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Support CL_PROGRAM_KERNEL_NAMES and CL_PROGRAM_NUM_KERNELS in API clGetProgramInfo,
and CL_DOUBLE_FP_CONFIG in API clGetDeviceInfo.
Also fix a bug of CL_MEM_HOST_PTR in API clGetMemObjectInfo.
v2:
also fix the utest get_mem_info.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
As the sample LD message doesn't support an array index, we have
to create a 2D array surface with the same buffer object.
Thus one 1D array image will have two surfaces bound to it:
one at index and the second at 128 + index.
Then, on the kernel side, we access the corresponding
2D array surface when the LD message is required, and otherwise
access the original 1D array surface.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
creates an array of sub-devices that each reference a non-intersecting
set of compute units within in_device, according to a partition scheme
given by properties.
Reviewed-by: He Junyan <junyan.he@inbox.com>
Signed-off-by: Luo <xionghu.luo@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Actually, the scratch size is much like the local memory size,
which should be device-dependent information.
This patch puts the scratch mem size into the device attribute
structure. When the kernel needs more than the maximum scratch
memory, we just return an out-of-resource error rather than trigger
an assertion.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
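A minimal sketch of the behaviour change, assuming the limit lives in a per-device structure. The struct, field, and function names are hypothetical; only the error code value mirrors CL_OUT_OF_RESOURCES.

```c
#include <stddef.h>

#define SKETCH_SUCCESS           0
#define SKETCH_OUT_OF_RESOURCES  (-5) /* same value as CL_OUT_OF_RESOURCES */

/* Hypothetical device attribute structure carrying the scratch limit. */
struct sketch_device {
    size_t scratch_mem_size; /* maximum scratch memory, device dependent */
};

/* Report an error instead of asserting when the kernel needs too much. */
static int sketch_check_scratch(const struct sketch_device *dev,
                                size_t kernel_scratch_size)
{
    if (kernel_scratch_size > dev->scratch_mem_size)
        return SKETCH_OUT_OF_RESOURCES;
    return SKETCH_SUCCESS;
}
```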
|
|
When a kernel has __attribute__((reqd_work_group_size(X, Y, Z))) qualifier,
the kernel will only accept that group size.
v2: add binary load/store support.
v3: fix the MDNode parsing according to spir spec. It's using the following
structure rather than a tbaa tree.
!spir.functions = !{!0, !1, ..., !N}
; Note: The first element is always an LLVM::Function signature
!0 = metadata !{< function signature >, !01, !02, ..., !0i}
!1 = metadata !{< function signature >, !11, !12, ..., !1j}
...
!N = metadata !{< function signature >, !N1, !N2, ..., !Nk}
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
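On the enqueue side, enforcing the attribute comes down to comparing the requested local size with the required one. The sketch below is not the runtime's code: the struct and function names are hypothetical, an all-zero array is assumed to mean "no attribute", and the error value mirrors CL_INVALID_WORK_GROUP_SIZE, which the OpenCL spec specifies for a mismatch.

```c
#include <stddef.h>

#define SKETCH_SUCCESS                  0
#define SKETCH_INVALID_WORK_GROUP_SIZE  (-54) /* CL_INVALID_WORK_GROUP_SIZE */

/* Hypothetical kernel record; {0, 0, 0} means no reqd_work_group_size. */
struct sketch_kernel {
    size_t reqd_wg_size[3];
};

static int sketch_check_local_size(const struct sketch_kernel *k,
                                   const size_t local[3])
{
    int i;

    /* No attribute on the kernel: any local size is acceptable here. */
    if (k->reqd_wg_size[0] == 0 && k->reqd_wg_size[1] == 0 &&
        k->reqd_wg_size[2] == 0)
        return SKETCH_SUCCESS;

    /* With the attribute, every dimension must match exactly. */
    for (i = 0; i < 3; i++) {
        if (local[i] != k->reqd_wg_size[i])
            return SKETCH_INVALID_WORK_GROUP_SIZE;
    }
    return SKETCH_SUCCESS;
}
```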
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
- CL_DEVICE_MAX_PARAMETER_SIZE is of type size_t
- CL_DEVICE_MAX_WORK_GROUP_SIZE is of type size_t
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Currently, there are no built-in kernels, so this function returns an empty
string.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This returns the library major/minor version. As it does not follow the
usual naming scheme, the output code is duplicated.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This adds a pointer to the dispatch table at the beginning of every object
of type
- cl_command_queue
- cl_context
- cl_device_id
- cl_event
- cl_kernel
- cl_mem
- cl_platform_id
- cl_program
- cl_sampler
as required by the ICD specification. The layout of the dispatch table
comes from the OpenCL ICD loader by Brice Videau <brice.videau@imag.fr> and
Vincent Danjean <Vincent.Danjean@ens-lyon.org>.
To avoid dispatch table entries being overwritten with the ICD loader's
implementations of the CL functions (as would be the proper behaviour for
the ELF loader), the -Bsymbolic option is given to the linker.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
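The layout requirement can be illustrated as follows. This is a sketch with hypothetical, heavily reduced types: a real dispatch table has one entry per CL API function, and the real objects carry many more fields. The point is only that the dispatch pointer is the first member, so a loader can fetch it without knowing the object's concrete type.

```c
#include <stddef.h>

/* Hypothetical one-entry dispatch table; the real one is much larger. */
struct sketch_icd_dispatch {
    int (*get_platform_ids)(void);
};

/* Every CL object type starts with the dispatch pointer. */
struct sketch_cl_context {
    struct sketch_icd_dispatch *dispatch; /* must stay the first member */
    int ref_n;
};

struct sketch_cl_mem {
    struct sketch_icd_dispatch *dispatch; /* must stay the first member */
    size_t size;
};

/* What a loader can do with any object handle: read the first pointer. */
static struct sketch_icd_dispatch *sketch_dispatch_of(void *obj)
{
    return *(struct sketch_icd_dispatch **)obj;
}
```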
|
|
|
|
|
|
Currently, a static solution based on the type of device has been implemented. This has been done for the sake of completeness. A real implementation should ideally parse the kernel and extract this information.
Fixed issue:
A return value issue in drm_intel_bo_subdata, where different versions of the library differ in how they treat errors. In one case, an rval of zero indicates success,
and in the other, it indicates failure. The fix is to remove the checking of rval entirely.
|
|
One repository; kernels are compiled and sorted per generation.
|
|
|