~hejunyan/beignet - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2015-12-23	Driver: Fix GPGPU delete bug	Pan Xiuli	1	-2/+2
	The first patch 192feb51 has something wrong in rebase and takes new bug in. Now fix both the original bug and revert the wrong patch. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-23	Runtime: because double's built-ins haven't completely support, so disable ↵	Yang Rong	1	-0/+21
	it by default. Add a cmake option for it, cmake with option -DEXPERIMENTAL_DOUBLE=true to enable it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-12-21	add support for build option -cl-fast-relaxed-math	Guo Yejun	1	-1/+1
	current code handles this option at clang level, actually, it is also necessary at LLVM -> GEN stage. V2: check if options is NULL Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	Runtime: Add the threadid calculation for curbe.	Junyan He	1	-1/+11
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-14	add Broxton support	Guo Yejun	3	-7/+41
	special versions of linux kernel and libdrm are needed. utest and conformance test PASSED. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
2015-12-10	runtime: add macro DEBUGP() to handle debug printf.	Ruiling Song	2	-3/+9
	It will output debug message under debug mode, and will not output under release mode. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	SKL: Use kernel-defined MOCS values instead of assuming hardware defaults.	Francisco Jerez	1	-2/+2
	Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	fix gcc build error.	Luo Xionghu	2	-2/+2
	this link fail appears on gcc 5.2.1. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: add missing supported format image_1d_buffer.	Luo Xionghu	1	-0/+1
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: fix clLinkProgram bug.	Luo Xionghu	1	-0/+10
	clLinkProgram need check the existence of "-cl-kernel-arg-info" build_option of all the input_programs. User may link two SPIR program and call clGetKernelArgInfo to query kernel args. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: fix clCompileProgram bug.	Luo Xionghu	3	-13/+6
	forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is BINARY_TYPE_INTERMIDIATE if create from SPIR binary. v2: refine the source_type logic: source_type is already set in clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set in clBuildProgram or clCompileProgram. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	runtime: initialize the memory content to 0.	Luo Xionghu	1	-0/+1
	v2: move the memset inside the pointer check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09	make Beignet as intermedia layer of CMRT	Guo Yejun	14	-14/+491
	CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-26	check image from buffer's base address alignment.	Luo Xionghu	1	-0/+9
	per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-25	runtime: set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to kernel's ↵	Zhigang Gong	5	-6/+13
	SIMD_WIDTH. It makes sense to set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the corresponding SIMD size. Then it provides a way for intel's OCL application to get SIMD width at runtime and make some SIMD width dependant optimization possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-19	Runtime: return the correct error code in cl_event_check_waitlist.	Yang Rong	1	-2/+4
	Return CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
2015-11-17	Runtime: Bind the profiling buffer when profiling enabled.	Junyan He	6	-1/+126
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17	Add profiling info APIs to runtime.	Junyan He	2	-0/+18
	Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-17	CMake: Add -lrt to the link command of libcl.so	Junyan He	1	-0/+1
	The clock_gettime will cause the linkage error on some version of GCC, we need to add -lrt at the end of the link command line. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-17	Full support of cl_intel_motion_estimation extension.	Chuanbo Weng	1	-33/+166
	The following items are supported in this commit: 1. Return residuals. 2. All types of mb_block_type, subpixel_mode, sad_adjust_mode in cl_motion_estimation_desc_intel. After this commit, cl_intel_motion_estimation is fully supported. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-11	runtime: extension size not enough.	Luo Xionghu	3	-3/+10
	define a MACRO to hold the value. v2: use same MACRO in cl_extensions.h; add header file protection for cl_extension.h. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10	Add extensions intel_accelerator and basic intel_motion_estimation.	Chuanbo Weng	21	-33/+914
	v2: 1. Just upload the first vme_state. 2. Remove duplicated code in check_opt1_extension. 3. Check image format before cl_gpgpu_bind_image_for_vme. 4. Fix error of getting mv. Because we suppose this kernel run in SIMD16 mode, so dword 0 of grf 1 should be __gen_ocl_region(8,vme_result.s0), not __gen_ocl_region(0,vme_result.s1). v3: Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required one. v4: Fix two conflicts after code rebase and wordaround a curbe related bug. v6: Treat simd8 and simd16 differently when getting mv. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-11-10	Add extension clCreateImageFromFdINTEL to create cl image by external fd.	Chuanbo Weng	6	-0/+125
	Before this patch, Beignet can only create cl image from external bo by its handle using clCreateImageFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type. Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-10	Add extension clCreateBufferFromFdINTEL to create cl buffer by external ↵	Chuanbo Weng	6	-3/+93
	buffer object's fd. Before this patch, Beignet can only create cl buffer from external bo by its handle using clCreateBufferFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-11-04	set the pitch of image from buffer to the buffer's pitch.	Luo Xionghu	1	-1/+6
	if image from buffer, the image's pitch should be same with buffer bo's row pitch. v2: correct style. image from buffer need update both aligned_pitch and aligned_h, while image from user ptr only set aligned_pitch, so just keep them independently. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>
2015-11-03	fix regression issue for climage + uesrptr	Guo Yejun	1	-1/+2
	userptr requires the exact same memory layout between cpu and gpu, since the current implementation uses the value of row_pitchh, ignoring the slice_pitch provided by the application. so, enable userptr only if slice_pitch == row_pitchh for image3d, 2darray and 1darray. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-11-03	runtime: dynamically get global memory size and max alloc size	Pan Xiuli	2	-4/+20
	Now device and driver can support bigger memory, we need to abandon our old 2G hard code. We get global memory by considering device limitation, drm driver and kernel support and raw, this will ensure a bigger global memory and a more stable system. We get max mem alloc size from global memory size and the device limition. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-11-03	driver: add setup_bti_gen9 for bigger buffer up to 4G	Pan Xiuli	1	-2/+39
	Now gen9 can support bigger buffer size, and it can also support 4G global memory. We add new function to support it. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-11-03	runtime: refine the cl_device_id to support bigger memory	Pan Xiuli	6	-11/+73
	Now gen8 and gen9 support 4G global memory, and gen9 support 4G single buffer. Need to move the global_mem and max_mem_alloc size into each define header. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-11-03	drivers: change the buf size to size_t	Pan Xiuli	2	-9/+12
	The uint32_t size is not enough for coming bigger gpu memory, now GEN9 support 4G buffer. Also add assertion for invalid size. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
2015-10-27	Runtime: Refine ext enable function for platform.	Junyan He	3	-19/+54
	We enable fp64 extension just on BDW platform. The platforms before Gen7 will not have fp64 support. We will enable fp64 on gen8 later platforms after this feature is stable. V3: Unify the extersion setting for FP16 and FP64. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-10-21	Runtime: add CL_DEVICE_SPIR_VERSIONS to clGetDeviceInfo.	Yang Rong	3	-0/+4
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
2015-10-21	use table to define and query binary headers.	Luo Xionghu	2	-26/+39
	currently, we support create program from 4 types of binary: SPIR(BITCODE), LLVM Compiled Object, LLVM Library and GEN Binary. The detailed formats are listed in code. also use table to match or fill gen binary header in backend. v2: use enum to replace the magic number. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-20	add conditions of pitch and h to enable userptr for climage_use_host_ptr	Guo Yejun	1	-1/+4
	this regression issue is reported from conformance test, to enable userptr for climage + use_host_ptr, the memory layout between the host_ptr (for CPU) and drm bo (for GPU) must be the same. it means bo's row pitch should be the same as image's row pitch, and h should be the same as aligned h. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-20	pitchalignment should be set to 1.	Luo Xionghu	1	-1/+1
	fix case precesion fail: opencv_test_video/OCL_Video/PyrLKOpticalFlow. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-20	Revert "return 32 could gain 0.2% performance on opencv optical flow case."	Luo Xionghu	1	-1/+1
	This reverts commit 729b16fdb387437f97115e938745ab1135151553. ./opencv_test_imgproc --gtest_filter=OCL_Imgproc/CLAHETest.* failed due to this patch. Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-20	alignment of NO TILING surface limitation shouldn't be removed.	Luo Xionghu	1	-3/+4
	it could cause sampler data mismatch on IVB. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-10-14	enable USE_HOST_PTR for cl image with userptr to avoid extra copying	Guo Yejun	3	-18/+44
	the pointer must be 64 byte aligned, and only when w,h equals to its aligned value, otherwise, roll back to the old method with extra copying. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-14	refine code to separate the usage of data and image2d_from_buffer	Guo Yejun	3	-26/+38
	currently, 'void* data' has two meanings: the pointer from application, and the buffer to create image2d from. It makes the code a little complex when supporting userptr for image. So, add a new function parameter to separate the two meanings. V2: fix when HAS_USERPTR is not enabled Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: xionghu.luo@intel.com
2015-10-13	Fix a event leak in create context	Pan Xiuli	1	-0/+1
	We get an event out of NDRangeKernel, and we don't release it. As an gpgpu event it can also make drm buffer leak, to avoid potenial error we just release it.w Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13	runtime: refine the last_event in queue to a list	Pan Xiuli	3	-27/+55
	Refine the event struct to make last_event become a list to store all uncompeleted events and update them every queue flush. This can make sure all events created in the runtime have a chance to update status and run callback functions and then be deleted. We will also fix the memory leak problem casued by uncompeted events. This is a bugfix for https://bugs.freedesktop.org/show_bug.cgi?id=91710 The leaked events with gpu buffers will be unreferenced and cause other drm buffer leak and result in terrible memory leak. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13	Fix gpgpu node related bug	Pan Xiuli	1	-1/+1
	This should be a typo, we should wait for the gpgpu and create node only if the batch buffer is busy. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-10-13	Add -dump-opt-asm support to the clLinkProgram() API	Manasi Navare	1	-3/+1
	This will dump the Gen ASM output to the file specified in the -dump-opt-asm Link option during the Link program step. Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2015-10-12	GBE: fix kernel arguments uploading bug.	Zhigang Gong	2	-5/+9
	After the curbe allocation refactor, not all kernel arguments will be allocated unconditional. If some kernel arguments haven't been used at all, the corresponding arguments will be ignored at backend thus we may get a -1 offset. On the runtime driver side, we need check this situation. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-24	GBE: refactor curbe register allocation.	Zhigang Gong	3	-20/+30
	The major motivation is to normalize the curbe payload's allocation and prepare to use liveness information to avoid unecessary payload register allocation and avoid fragments when allocate curbe registers. For an example, for GBE_CURBE_LOCAL_ID_Y/Z, many one dimention kernels don't need them. But previous curbe allocation occurs before the liveness interval computing, thus it will allocate that curbe anyway. Altough it will be expired soon but it still need us to prepare those payload at host side. After this patch, this type of overhead has been eliminated easily. Another purpose is to eliminate the ugly curbe patch list handling in backend. After this patch, the curbe register handling is much cleaner than before. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-09-23	Driver: fix the annoying "Failed to release userptr..." error message	Pan Xiuli	1	-2/+4
	It is a drm related bug. As the drm driver changed the time to free their test userptr to bufmgr destroy(30921483c70c6939f017476eac13da6aa26b3b3c), we need anothr order to release our driver to make sure the test userptr can be freed with a valid fd. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-23	Calculate appropriate timestamps for cl profile	Midhun Kodiyath	3	-4/+71
	Fix to calculate the current cpu monotonic raw timestamp in nanoseconds for enqueued,submitted,start and finshed and send this to application based on the parameter queries. Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-09-22	enable create image 2d from buffer in clCreateImage.	Luo Xionghu	7	-29/+99
	this patch allows create 2d image with a cl buffer with zero copy. v2: should use reference to manage the release the buffer and image. After being created, the buffer reference count is 2, and image reference count is 1. if image is released first, decrease the image reference count and buffer reference count both, release the bo when the buffer is released at last; if buffer is released first, decrease the buffer reference count only, release the buffer when the image is released. add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info. v3: move is_image_from_buffer to _cl_mem_image; return CL_INVALID_IMAGE_SIZE if image size is larger than the buffer. v4: pitchalignment set to 2. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-22	return 32 could gain 0.2% performance on opencv optical flow case.	Luo Xionghu	1	-1/+1
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
2015-09-21	should check the return value of cl_program_new.	Luo Xionghu	1	-0/+18
	catch the error: out of host memery. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>