summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)AuthorFilesLines
2015-10-13Fix gpgpu node related bugPan Xiuli1-1/+1
This should be a typo, we should wait for the gpgpu and create node only if the batch buffer is busy. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13Add -dump-opt-asm support to the clLinkProgram() APIManasi Navare1-3/+1
This will dump the Gen ASM output to the file specified in the -dump-opt-asm Link option during the Link program step. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-10-12GBE: fix kernel arguments uploading bug.Zhigang Gong2-5/+9
After the curbe allocation refactor, not all kernel arguments will be allocated unconditional. If some kernel arguments haven't been used at all, the corresponding arguments will be ignored at backend thus we may get a -1 offset. On the runtime driver side, we need check this situation. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-24GBE: refactor curbe register allocation.Zhigang Gong3-20/+30
The major motivation is to normalize the curbe payload's allocation and prepare to use liveness information to avoid unecessary payload register allocation and avoid fragments when allocate curbe registers. For an example, for GBE_CURBE_LOCAL_ID_Y/Z, many one dimention kernels don't need them. But previous curbe allocation occurs before the liveness interval computing, thus it will allocate that curbe anyway. Altough it will be expired soon but it still need us to prepare those payload at host side. After this patch, this type of overhead has been eliminated easily. Another purpose is to eliminate the ugly curbe patch list handling in backend. After this patch, the curbe register handling is much cleaner than before. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23Driver: fix the annoying "Failed to release userptr..." error messagePan Xiuli1-2/+4
It is a drm related bug. As the drm driver changed the time to free their test userptr to bufmgr destroy(30921483c70c6939f017476eac13da6aa26b3b3c), we need anothr order to release our driver to make sure the test userptr can be freed with a valid fd. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-23Calculate appropriate timestamps for cl profileMidhun Kodiyath3-4/+71
Fix to calculate the current cpu monotonic raw timestamp in nanoseconds for enqueued,submitted,start and finshed and send this to application based on the parameter queries. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22enable create image 2d from buffer in clCreateImage.Luo Xionghu7-29/+99
this patch allows create 2d image with a cl buffer with zero copy. v2: should use reference to manage the release the buffer and image. After being created, the buffer reference count is 2, and image reference count is 1. if image is released first, decrease the image reference count and buffer reference count both, release the bo when the buffer is released at last; if buffer is released first, decrease the buffer reference count only, release the buffer when the image is released. add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if image size is larger than the buffer. v4: pitchalignment set to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-22return 32 could gain 0.2% performance on opencv optical flow case.Luo Xionghu1-1/+1
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-21should check the return value of cl_program_new.Luo Xionghu1-0/+18
catch the error: out of host memery. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-21Fix clLinkProgram error.Yang Rong2-16/+29
All programs or none programs specified by input_programs contain a compiled binary or library for the device. Otherwise return CL_INVALID_OPERATION. Correct this condition check. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-18Don't use cl_buffer_get_subdata in clEnqueueReadBuffer.Yang Rong1-1/+4
cl_buffer_get_subdata sometime is very very very slow in linux kernel, in skl and chv, and it is random. So temporary disable it, use map/copy/unmap to read. Should re-enable it after find root cause. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-18Fix piglit clLinkProgram fail.Yang Rong4-3/+24
1. return CL_INVALID_LINKER_OPTIONS when invalid options, using clang to check the options. 2. return CL_INVALID_OPERATION when the binary type is not same. 3. When link fail, will not return CL_LINK_PROGRAM_FAILURE, fix it. 4. Should not delete program in genProgramBuildFromLLVM, the program is new and delete from runtime. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo, Xionghu <xionghu.luo@intel.com>
2015-09-07Runtime: Add NULL pointer check in clGetKernelArgInfoJunyan He1-1/+2
There is no NULL pointer check for kernel->program->build_opts. This will cause utest test_get_arg_info crash. In fact, we will add -cl-kernel-arg-info flag for compiling ever time, and so the arg info is always avaible. But some test case deliberately unset this flag and expect the ERR return value, so we really need a check here. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-02Fix clGetKernelArgInfo fail on piglitPan Xiuli2-9/+13
1.Change the code for null param_value 2.Add the return value check for build option "-cl-kernel-arg-info" 3.Correct one return value typo Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-27remove GBE_CURBE_STACK_POINTER in payloadGuo Yejun1-9/+0
initialize the data inside kernel with packed integer vector V2: call functions from ctx, instead of ctx.registerAllocator Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-13backend, src: Add ASM file name to gbe_program_new_from_llvmLaura Ekstrand1-1/+1
Part of the plumbing that passes the ASM file name from the compiler options level down to the emitCode level so that the assembly can be written to that file Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Laura Ekstrand <laura.d.ekstrand@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
2015-08-13Set proper Vendor IDMidhun Kodiyath6-27/+30
Device ID and vendor ID are not same.Set the correct vendor ID. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-08-13generate sub_group_id inside kernel instead of payloadGuo Yejun1-8/+0
get_sub_group_id ranges at [0, 7] for SIMD8 and [0, 15] for SIMD16, previously we set up the values in kernel payload, now change it to generate the values inside kernel with packed integer vector. v2: encapsulate into a function so that others can get the lane id easily. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-10runtime: always try to update event status in clGetEventProfilingInfo().Zhigang Gong1-0/+1
Some applications forgot to call clWaitForEvents() before calling to clGetEventProfilingInfo(). Let's update the event's status here. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-10Update last event status in clFinish.Luo Xionghu1-0/+3
The event should have been finished after clFinish, update the event status. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-08-10runtime: add detailed broadwell device name.Ruiling Song1-6/+6
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-08-10runtime: add Broadwell deviceID 0x162BRuiling Song2-1/+5
Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17Runtime: correct event and the wait events compare when check event.Yang Rong1-1/+1
When the event parament is not NULL, the event will point to a new event, so need to check address of the event and the wait events. V2: check the address of the event and the wait events. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-07-17Remove deprecated function cl_context_get_static_kernel().Zhigang Gong2-66/+12
Also fix a spelling bug - s/internel/internal. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17runtime: fix a builtin-kernel related thread safe bug.Zhigang Gong2-8/+26
This patch fixed two thread-safe bugs in the builtin-kernel usage code path. 1. The builtin kernel array itself need to be protected. 2. Each caller need to get a dup of the builtin kernel, rather than share the same kernel structure. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17runtime bug: brw GT3 devices reported to GT2.Luo Xionghu1-5/+5
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-17Reorder GBE_BIN_GENERATER arguments.Koop Mast1-2/+2
Basically, it boils down to a difference in getopt(3). The getopt(3) on (Free)BSD will exit parsing arguments at the first unknown argument. Signed-off-by: Koop Mast <kwm@rainbow-runner.nl> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-15Need to check eventWaitList in clEnqueueNDRangeKernel.Zhigang Gong1-0/+1
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-15Fixed a thread safe bug.Zhigang Gong6-18/+56
last_event and current_event should be thread private data. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-15runtime: Need to separate atomic in L3 test and SLM test in self_test().Zhigang Gong1-3/+12
On HSW, if we use default 4.0.x kernel without the i915.enable_ppgtt=2 boot argument, then the atomic in L3 will not work and the SLM will not work neither. We need to test atomic in L3 firstly, if it fails we need to test SLM again. Otherwise, beignet will not give any error/warning information for both atomic in L3 and SLM not working case. v2: shold set the atomic test result before the second round SLM test. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
2015-07-14Runtime: Add default extension for platforms before BDW.Junyan He3-0/+14
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-06runtime: Add cl device's standalone extension.Junyan He4-22/+24
The cl device may have different extensions from the platform. We will add some items based on the platform extensions. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-06runtime: Use cl_get_platform_default to replace global value.Junyan He6-68/+66
The init order of the intel_platform and the intel_extension is somehow not clear. When some API such as clGetDeviceIDs can pass NULL as cl_platform_id, we just use the global value intel_platform as the default but do not care about the init state of the extension. The init of the extension may be done when the cl device is created. This is OK if the paltform and the device have the same extensions. But now because of the fp16, they are not always the same. Use cl_get_platform_default to replace the global value to ensure that when default platform is available, the extension is also inited. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-03enable CL_UNSIGNED_INT8 for CL_RG to fix regressionGuo Yejun1-0/+1
the regression is caused when only enable CL_UNORM_INT8 for CL_RG, the reason is that during the image copy implementation with internal kernel, all formats are considerd as integer format, it becomes unknown since CL_UNSIGNED_INT8 is not enabled yet. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-07-02runtime: Add fp16 extension to BDW later platform.Junyan He6-56/+102
Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-06-30use self test to determine enable/or disable atomics in L3 for HSW.Luo Xionghu9-23/+66
check the selftest kernel return value, if enqueue kernel failed, set the flag to not enable atomics the L3 for HSW. This reverts commit 83f8739b6fc4893fac60145326052ccb5cf653dc. v2: don't use global variable to pass value from runtime to driver. v3: add type SELF_TEST_OTHER_FAIL to differentiate from SELF_TEST_ATOMIC_FAIL; seperate the ATOMIC_FAIL from SLM_FAIL, only SLM_FAIL can be control by env OCL_IGNORE_SELF_TEST. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang, Rong <rong.r.yang@intel.com>
2015-06-19enable CL_RG + CL_UNORM_INT8 for imageGuo Yejun1-0/+5
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang, Rong <rong.r.yang@intel.com>
2015-06-18fix global variable out of boundary writing in libocl.Luo Xionghu1-1/+1
need minus one when fill '\0' to sizeof char type array. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18Docs: update/clarify Haswell issuesRebecca N. Palmer1-5/+9
Reflect recent beignet and Linux changes. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-18Add a sanity test in clGetDeviceIDsRebecca N. Palmer1-0/+82
Run a small __local-using kernel in clGetDeviceIDs; if this returns the wrong result, return CL_DEVICE_NOT_FOUND. As far as I can see, there's no way to tell in advance (except unreliably with a global version check) whether __local-using batches will be accepted...so the easiest solution is probably to just try running one and see what result we get. Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com> Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-05-12rename __gen_ocl_get_simd_id/size to get_sub_group_id/sizeGuo Yejun1-1/+1
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-05-04Revert "CL/Driver: enable atomics in L3 for HSW."Zhigang Gong2-14/+1
This issue should be fixed in KMD in the future. Let's revert it here. This reverts commit ef7127c03bd533277afc443b335c37a69927250a.
2015-05-04add benckmark for copy data from buffer to image.Luo Xionghu1-1/+1
v2: use random input data; update comments. v3: change the image attribute to __write_only. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04Optimization of clEnqueueCopyBufferToImage for 16 aligned case.Luo Xionghu4-9/+56
We can change the image_channel_order to CL_RGBA and image_channel_data_type to CL_UNSIGNED_INT32 for some special case, thus 16 bytes can be read by one work item. Bandwidth is fully used. v2: merge patch 3 of initializing region0; remove k dimension in kernel for 2d image. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
2015-05-04CHV: Add cherryview support in the runtime.Meng Mengmeng4-3/+56
Cherryview's EU configurations is not decided by pciid, must get from kernel by libdrm. Thanks for Jeff adding this support in the kernel and libdrm. V2: Add the warning when can't get configurations. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-04-24add simd level function __gen_ocl_get_simd_idGuo Yejun1-0/+8
uint __gen_ocl_get_simd_id(); return value ranges from 0 to simdsize - 1 V2: use function sel.selReg to refine code V3: correct the uniform condition in liveness.cpp Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-14runtime: don't try to open nonexistent render nodes or device files.Zhigang Gong1-1/+4
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: refine error handling for private libva buffer sharing extension.Zhigang Gong2-0/+11
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-13GBE: Extend front label ip to 32 bit on demand.Zhigang Gong1-4/+12
If the front end label ip exceed 0xffff, then the backend will use real DW to represent each block's IP address. This is a dynamic behaviour according to the actual front end's label number. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-10runtime: Enhance the error handling when flush gpgpu command queue.Zhigang Gong9-30/+30
Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to linux drm driver layer. We need to check the return value of that function, as it may fail when the application uses very large array. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>