path: root/src
Age | Commit message | Author | Files | Lines
2015-01-29 | SKL: Add function intel_gpgpu_bind_image_gen9. | Yang Rong | 4 | -6/+90
SKL's qpitch differs from BDW's, and for SURFTYPE_1D the qpitch means the distance in pixels between array slices. So add two parameters, slice_pitch and bpp, to calculate it.
2015-01-29 | SKL: add skl select_pipeline and cache_control functions. | Yang Rong | 2 | -2/+25
The cache control field in SKL's surface state changed to an index into the pre-defined registers. Because index 9 is what beignet needs, use it directly. SKL's select_pipeline command needs the mask, so add intel_gpgpu_select_pipeline_gen9 for it. Signed-off-by: Yang Rong <rong.r.yang@intel.com>
2015-01-29 | SKL: Add the function gen9's intel_build_idrt. | Yang Rong | 2 | -3/+48
Correct struct gen8_interface_descriptor. Add function intel_gpgpu_build_idrt_gen9 for the different SLM size setting. Disable SKL's global barrier for now.
2015-01-29 | SKL: correct the pipe control struct. | Yang Rong | 2 | -2/+77
From BDW on, pipe control needs 6 DWs; correct it. This also affects BDW. Signed-off-by: Yang Rong <rong.r.yang@intel.com>
2015-01-29 | SKL: Use TILE_Y as default TILING mode in skl. | Yang Rong | 1 | -1/+2
3D images can't use TILE_X on SKL, so change the default tiling mode to TILE_Y. Signed-off-by: Yang Rong <rong.r.yang@intel.com>
2015-01-29 | SKL: enable skl device. | Yang Rong | 3 | -3/+73
Add intel_gpgpu_set_base_address_gen9 for SKL; the other functions in intel_GPGPU are the same as BDW's. The SKL backend is also the same as BDW's and should derive from GEN8 later. With this commit, some utests pass.
2015-01-29 | SKL: Add skl pci ids and device. | Yang Rong | 2 | -3/+164
SKL adds the new GT4 device type.
2015-01-23 | loosen the alignment limitation for host_ptr of CL_MEM_USE_HOST_PTR | Guo Yejun | 3 | -4/+22
The current limitation is that both host_ptr and the buffer size must be page aligned. Loosen the limitation on host_ptr to cache line size (64 byte) alignment, and remove the limitation on the size. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
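The loosened rule can be illustrated with a small check. This is only a sketch: `host_ptr_is_usable` and `CACHE_LINE_SIZE` are hypothetical names chosen for illustration and are not taken from the beignet source; only the 64-byte figure comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache line size in bytes, as stated in the commit message. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: returns 1 when host_ptr meets the loosened
 * requirement (cache line alignment), 0 otherwise. The buffer size
 * is deliberately not checked, since the size limitation was removed. */
static int host_ptr_is_usable(const void *host_ptr)
{
    assert(host_ptr != NULL);
    return ((uintptr_t)host_ptr % CACHE_LINE_SIZE) == 0;
}
```

Under this sketch an address such as 128 passes while 130 does not, whatever the buffer size is.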
2015-01-23 | correct the cache line size to be 64 | Guo Yejun | 2 | -2/+2
The correct value of the cache line size is 64 bytes, not 128. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-13 | Fix the printf buffer size bug. | Junyan He | 4 | -5/+9
We cannot know the exact size of the printf buffer before running the kernel. Sometimes, especially when the global work size is huge, the output buffer is not large enough and the print message logic causes a segmentation fault. We increase the printf buffer to at most 16M and add an out-of-range check to avoid the crash. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
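The two parts of this fix, a capped buffer size and a range check before reading a record, might look roughly like the following. This is a sketch under assumptions: the function names and the record layout are hypothetical, and only the 16M cap is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound for the printf buffer (16M), from the commit message. */
#define PRINTF_BUF_MAX ((size_t)16 * 1024 * 1024)

/* Hypothetical helper: clamp the requested buffer size to the cap. */
static size_t clamp_printf_buf_size(size_t requested)
{
    return requested > PRINTF_BUF_MAX ? PRINTF_BUF_MAX : requested;
}

/* Hypothetical helper: returns 1 only if a record of len bytes starting
 * at offset lies entirely inside a buffer of buf_size bytes. Written as
 * a subtraction so that offset + len cannot overflow. */
static int printf_record_in_range(size_t offset, size_t len, size_t buf_size)
{
    assert(buf_size <= PRINTF_BUF_MAX);
    return offset <= buf_size && len <= buf_size - offset;
}
```

A reader of the buffer would skip any record for which the range check fails instead of dereferencing past the end.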
2015-01-12 | add CMake option USE_STANDALONE_GBE_COMPILER and STANDALONE_GBE_COMPILER_DIR | Guo Yejun | 1 | -2/+3
On some platforms with an old C/C++ environment, C++11 features are not supported, so the gbe compiler part, which depends on LLVM/clang using C++11 features, fails to build. The solution is to build a standalone gbe compiler on another suitable system, and then build beignet with the already built standalone gbe compiler by setting USE_STANDALONE_GBE_COMPILER=true. The path of the standalone compiler defaults to /usr/local/lib/beignet, or can be specified with STANDALONE_GBE_COMPILER_DIR. Once USE_STANDALONE_GBE_COMPILER is given, none of the gbe compiler related code is built any longer; only libcl.so and libgbeinterp.so are built. libcl.so is specific to GEN_PCI_ID, which is queried from the build machine or can be specified as a CMake option.
v2: separate the CMake option name. Update the commit comments. Add back the script for gen pci id, and build the driver with it.
v3: add file FindStandaloneGbeCompiler.cmake to keep the main cmakefile clean.
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09 | CL/Driver/HSW: Convert L3 cycle for texture to uncacheable. | Zhigang Gong | 1 | -1/+1
This works around a bug we found with darktable. After this patch, darktable works fine on HSW. Based on the test results, most of the benchmarks are not affected much by this patch. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2015-01-07 | CL/Driver: quick fix for regression caused by removing MI_FLUSH. | Zhigang Gong | 1 | -0/+2
On Gen8, we also need an extra pipe control after the MEDIA_STATE_FLUSH. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-07 | CL/Driver: enable atomics in L3 for HSW. | Zhigang Gong | 2 | -1/+14
This can give a boost of more than 10x for some atomic stress workloads. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-06 | Remove obsolete MI_FLUSH | Zhenyu Wang | 4 | -11/+3
Emulator debugging caught that MI_FLUSH is obsolete from IVB/HSW on, and that beignet used the wrong flush bit too, so don't take the risk; remove it. The current kernel takes care of flushing the ring after each request, so no extra flush should be needed. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-04 | runtime: fix max work group size for IVBGT1. | Zhigang Gong | 1 | -2/+2
If the kernel is compiled in simd8 mode, the maximum work group size should be 8 * 6 * 6 = 288. The original value of 512 is too large for it. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
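The arithmetic can be written out as a small function. The commit message only gives the product 8 * 6 * 6; reading the two factors of 6 as the number of execution units and the number of hardware threads per unit on IVB GT1 is an assumption made here, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: largest work group that fits on the device when
 * every hardware thread runs simd_width work items.
 * Assumption: the factors are SIMD width, execution units, and threads
 * per execution unit; the commit message states only the numbers. */
static unsigned max_work_group_size(unsigned simd_width,
                                    unsigned exec_units,
                                    unsigned threads_per_unit)
{
    assert(simd_width > 0 && exec_units > 0 && threads_per_unit > 0);
    return simd_width * exec_units * threads_per_unit;
}
```

With the numbers from the commit message, `max_work_group_size(8, 6, 6)` gives 288, which is below the previous limit of 512.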
2015-01-04 | runtime: tweak max memory allocation size. | Zhigang Gong | 2 | -2/+12
Increase the maximum memory allocation size to at least 512MB, and set it larger if the system has more total memory. This tweak lets darktable handle big pictures. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> v2: reduce max constant buffer to 128MB. v3: fix the sysinfo usage. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2014-12-29 | Separate flush and invalidate in function intel_gpgpu_pipe_control. | Yang, Rong | 2 | -2/+36
HSW has a limitation on PIPECONTROL with RO cache invalidation: prior to programming a PIPECONTROL command with any of the RO cache invalidation bits set, program a PIPECONTROL flush command with the CS stall bit and HDC Flush bit set. So two PIPECONTROL commands must be used to flush and invalidate the L3 cache on HSW. This patch fixes some random failures in workloads with very heavy DC read/write on HSW. Signed-off-by: Yang, Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-29 | Use libdrm interface to get device id | Zhenyu Wang | 2 | -22/+2
Remove our own ioctl call for the device id and use the libdrm interface instead. This not only saves one extra ioctl call, as the id has already been read when the gem bufmgr initializes, but also allows overriding the device id with the libdrm helper environment variable 'INTEL_DEVID_OVERRIDE'. Combined with aub dump, you can do device debugging with the fulsim emulator by choosing any device you want, without needing real hardware at all. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-29 | Add aub dump support | Zhenyu Wang | 1 | -1/+16
Use the current libdrm interface to dump an aub file for debugging in the emulator. This adds a new driver environment variable, OCL_DUMP_AUB=1, to enable it. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-25 | Remove deprecated fulsim code | Zhenyu Wang | 3 | -237/+1
Remove the very old fulsim code, which seems to have no users, used interfaces not in open source libdrm, and called the Windows fulsim binary instead of the Linux one. We will use the current libdrm interface instead. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-25 | fix min_max_read_image_args and min_max_parameter_size issue. | Luo Xionghu | 6 | -6/+7
This patch reverts fb4bced99b7c08d0d43386abf33448860fb7fc41, as the spec defines the minimum value of min_max_parameter_size as 1024; BTI_MAX_NUM and btiBase can be 130 because of 128 images plus 1 const surface and 1 private surface. v2: add BTI_MAX_READ_IMAGE_ARGS and BTI_MAX_WRITE_IMAGE_ARGS in the backend. Change BTI_MAX_ID to 253. The image numbers will be calculated and checked against the limit in a later patch. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-23 | fix max_parameter_size not correct on x86 platforms. | Luo Xionghu | 2 | -2/+2
This value should depend on the system's pointer size. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-18 | GBE/CL: use 2D image to implement large image1D_buffer. | Zhigang Gong | 4 | -12/+50
Per the OpenCL spec, the minimum CL_DEVICE_IMAGE_MAX_BUFFER_SIZE is 65536, which is too large for a 1D surface on Gen platforms, so we have to use a 2D surface to implement it. As the OpenCL spec only allows the image1d_t to be accessed via the default sampler, this is doable because it never uses float coordinates or linear (non-nearest) filters. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
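The idea of backing a long 1D buffer with a 2D surface can be sketched as an index mapping. This is an illustration only: the row width, the type, and the function names are hypothetical, and the actual mapping used in beignet is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D coordinate of one element of the 1D buffer. */
typedef struct {
    size_t x; /* column inside the row */
    size_t y; /* row index */
} coord2d;

/* Map an integer 1D element index to a 2D coordinate, assuming the
 * elements are laid out row by row with row_width elements per row.
 * Integer coordinates are enough here because the access never uses
 * float coordinates or linear filtering. */
static coord2d buffer_index_to_2d(size_t index, size_t row_width)
{
    coord2d c;
    assert(row_width > 0);
    c.x = index % row_width;
    c.y = index / row_width;
    return c;
}

/* Number of rows needed to hold count elements (rounded up). */
static size_t rows_needed(size_t count, size_t row_width)
{
    assert(row_width > 0);
    return (count + row_width - 1) / row_width;
}
```

For example, with an assumed row width of 8192 elements, a 65536-element buffer needs 8 rows, and element 65535 lands at column 8191 of row 7.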
2014-12-18 | GBE: switch to CLANG native sampler_t. | Zhigang Gong | 1 | -7/+5
CLANG has had sampler_t support since LLVM 3.3, so let's switch to that type rather than the old hacky way. One major problem is the static checking of samplers. Gen platforms have some hardware restrictions, and if the sampler value is a constant defined on the kernel side, we need to use the value to optimize the code path. Now that sampler_t has become an opaque type, CLANG doesn't support any arithmetic operations on it, so we have to introduce a new pass to do this optimization. v2: fix comments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2014-12-03 | Change CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR from 8 to 16. | Chuanbo Weng | 1 | -1/+1
Because accessing global memory with uchar16/char16 fully utilizes memory bandwidth, change CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR from 8 to 16. Three OpenCV cases speed up with this patch: OCL_ThreshFixture_Threshold, 25% improvement; OCL_MaxFixture_Max, 105% improvement; OCL_MinFixture_Min, 105% improvement. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-12-02 | enable CL_MEM_ALLOC_HOST_PTR with user_ptr to avoid copy between GPU/CPU | Guo Yejun | 3 | -16/+33
When user ptr is enabled, allocate page-aligned system memory for CL_MEM_ALLOC_HOST_PTR inside the driver and wrap it as GPU memory to avoid the copy between GPU and CPU. Also refine the related user_ptr code. Tests verified: beignet/utest, conformance/basic, buffers, mem_host_flags. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2014-12-02 | clean code, the logic is already at the beginning of function | Guo Yejun | 1 | -16/+0
Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2014-12-02 | Fix based on piglit OpenCL failed case (cl-api-compile-program). | Yan Wang | 1 | -4/+2
1. Return the expected error code. 2. Don't destroy the cl_program object after a compile error because it may still be used in the future. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-28 | fix issue to pass utest of runtime_climage_from_boname for BDW | Guo Yejun | 1 | -2/+2
To create a cl image from a bo name with an offset, the offset needs to be added to surface_base_addr_lo/hi. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Tested-by: "Zhu, BingbingX" <bingbingx.zhu@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-27 | fix issue to create cl image from libva with non-zero offset | Guo Yejun | 4 | -7/+5
Beignet accepts a buffer object name to share data with libva. It supports creating a cl image from the bo name with a non-zero offset, but this does not work on some platforms. The driver calls intel_bo_gem_create_from_name to retrieve the dri_bo, and the offset of the dri_bo is changed by the non-zero offset. On some platforms, this change of offset has a side effect when the kernel is executed again and intel_bo_gem_create_from_name is therefore called a second time. So do not change the offset of the dri_bo; instead keep the non-zero offset in cl_image until we write the surface state into the batch buffer. V2: correct the offset parameter passed to dri_bo_emit_reloc. Signed-off-by: Guo Yejun <yejun.guo@intel.com>
2014-11-24 | Change the IVB/HSW's max_work_group_size to 512, and BYT to 256. | Yang Rong | 1 | -15/+15
To decide a kernel's work group size, an application should first query CL_DEVICE_MAX_WORK_GROUP_SIZE and then query CL_KERNEL_WORK_GROUP_SIZE after clBuildProgram. But some applications only check CL_DEVICE_MAX_WORK_GROUP_SIZE, and if the kernel runs in simd8 mode, or for other reasons, they may exceed CL_KERNEL_WORK_GROUP_SIZE. So change CL_DEVICE_MAX_WORK_GROUP_SIZE to the minimum CL_KERNEL_WORK_GROUP_SIZE. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-21 | Fix the opencv_test_core/OCL_Arithm random segmentation fault. | Yang Rong | 1 | -37/+36
If cl_event_delete is called before the callback, the event will already be deleted if the application releases the event in the callback. So the cl_event_delete call must be moved to the end. V2: V1 did not delete the event if it was not a user event; it also needs to be deleted. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-21 | BDW: Change the default tiling mode to TILING_Y on BDW. | Yang Rong | 1 | -3/+7
TILING_Y performs better than TILING_X on BDW, but almost the same on IVB/HSW. Use TILING_Y as the default tiling mode temporarily; we still need to find the root cause of the different behavior between BDW and IVB/HSW. V2: still use a static variable and only initialize it once. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-19 | Fix NO_TILING alignment bug. | Yang Rong | 1 | -1/+1
The height also needs to be aligned when using CL_NO_TILING. This patch fixes some tiling_y errors. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-19 | re-enable userptr with fix: CPU access after GPU finishes the rendering | Guo Yejun | 2 | -8/+35
1. the wait logic is integrated into function cl_mem_map/unmap_auto 2. use cl_mem_map/unmap_auto for userptr inside clEnqueueRead/WriteBuffer 3. do not use cl_buffer_subdata for userptr, use cl_mem_map/memcpy instead Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-18 | Change the IVB/HSW L3 SQC credit setting. | Yang Rong | 1 | -2/+2
Set the L3SQ General Priority Credit to max and the L3SQ High Priority Credit to zero. This slightly improves performance, by about 2% in luxmark. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-18 | Remove patch version on master branch. | Zhigang Gong | 1 | -1/+0
Master branch is for the next major release. 1.0.x series will be maintained on Release_v1.0 branch. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-11-13 | Fix the bug of multi-thread crash | Junyan He | 1 | -13/+0
cl_thread has a potential problem. If threads are created and destroyed very quickly while the queue remains available, the resources of a destroyed thread are not freed correctly and are wrongly reused by a later created thread. V2: Use an easy way to handle this case. We do not clear the resources and just keep them, so a later thread will not wrongly reuse them. The number of threads will not be very large, so it is reasonable to clear all the resources when the command queue is destroyed. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-13 | runtime: fix bug in cl_enqueue_read_buffer. | Zhigang Gong | 1 | -3/+8
If the buffer is a userptr buffer, we should copy it directly. Otherwise it fails in libdrm, as drm_intel_gem_bo_subdata() refuses to read a userptr buffer object. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
2014-11-13 | runtime: refine version handling. | Zhigang Gong | 2 | -0/+5
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-11-12 | runtime: fix one bug in BDW image. | Zhigang Gong | 1 | -2/+4
As we still have the image 1d array workaround, we need to fix it for BDW as well. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2014-11-12 | Revert "BDW: Change the default tiling mode to TILING_Y on BDW." | Zhigang Gong | 1 | -8/+4
This reverts commit f2c57a46de4f51fa5d4c8e02cc751fce7ff417c8.
2014-11-11 | License: adjust all license version to LGPL v2.1+. | Zhigang Gong | 57 | -57/+57
To make the license statements consistent with each other, adjust all license versions to v2.1+. Thus beignet should have a pure LGPL v2.1+ license. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-11-11 | Revert "fix issue to create cl image from libva with non-zero offset" | Zhigang Gong | 4 | -7/+9
We found that this patch causes some serious regressions. Considering it is not part of the OCL standard API, we choose to revert it for the 1.0 release. This reverts commit b6660fa343e4e80231123695834cc24e3fc5487b.
2014-11-10 | use posix_memalign instead of aligned_alloc to be more compatible | Guo Yejun | 1 | -7/+11
On some systems, the function aligned_alloc is not supported. From the Linux Programmer's Manual: The function aligned_alloc() was added to glibc in version 2.16. The function posix_memalign() is available since glibc 2.1.91. V2: add a check for the return value of posix_memalign. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
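A minimal sketch of the pattern this commit describes, including the return value check added in V2. The wrapper name `alloc_aligned` is hypothetical; `posix_memalign` itself is the standard POSIX call, which returns 0 on success and an error number otherwise, and requires the alignment to be a power of two that is a multiple of `sizeof(void *)`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign in strict modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns memory aligned to `alignment` bytes,
 * or NULL if posix_memalign reports an error. The caller releases the
 * memory with free(). */
static void *alloc_aligned(size_t alignment, size_t size)
{
    void *ptr = NULL;
    if (posix_memalign(&ptr, alignment, size) != 0)
        return NULL; /* invalid alignment or out of memory */
    assert(((uintptr_t)ptr % alignment) == 0);
    return ptr;
}
```

Unlike `aligned_alloc`, this call does not require the size to be a multiple of the alignment, so a 100-byte request with 4096-byte alignment is valid.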
2014-11-10 | BDW: Change the default tiling mode to TILING_Y on BDW. | Yang Rong | 1 | -4/+8
TILING_Y performs better than TILING_X on BDW, but almost the same on IVB/HSW. Use TILING_Y as the default tiling mode temporarily; we still need to find the root cause of the different behavior between BDW and IVB/HSW. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-10 | fix issue to create cl image from libva with non-zero offset | Guo Yejun | 4 | -9/+7
Beignet accepts a buffer object name to share data with libva. It is supposed to support creating a cl image from the bo name with a non-zero offset, but this does not work on some platforms. The driver calls intel_bo_gem_create_from_name to retrieve the dri_bo, and the offset of the dri_bo is changed by the non-zero offset. On some platforms, this change of offset has a side effect when the kernel is executed again and intel_bo_gem_create_from_name is therefore called a second time. So do not change the offset of the dri_bo; instead keep the non-zero offset in cl_image and use it when we fill the surface state. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-10 | fix a bug in clCompileProgram(). | Luo Xionghu | 1 | -0/+4
Passing a binary program to clCompileProgram() should return CL_INVALID_OPERATION. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-10 | fix piglit clCreateProgramWithBinary fail. | Luo Xionghu | 1 | -0/+8
The program should be deserialized and loaded when created from an EXECUTABLE binary. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>