~hejunyan/beignet

Age	Commit message	Author	Files	Lines
2015-01-12	add utest of CL_MEM_ALLOC_HOST_PTR	Guo Yejun	3	-0/+32
	Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-12	add CMake option USE_STANDALONE_GBE_COMPILER and STANDALONE_GBE_COMPILER_DIR	Guo Yejun	7	-16/+110
	On some platforms with an old C/C++ environment, C++11 features are not supported. This breaks the build of the gbe compiler part, which depends on LLVM/clang and therefore on C++11 features. The solution is to build a standalone gbe compiler on another suitable system, then build beignet against that prebuilt compiler by setting USE_STANDALONE_GBE_COMPILER=true. The path of the standalone compiler defaults to /usr/local/lib/beignet and can be overridden with STANDALONE_GBE_COMPILER_DIR. Once USE_STANDALONE_GBE_COMPILER is given, none of the gbe compiler related code is built; only libcl.so and libgbeinterp.so are built. libcl.so is specific to GEN_PCI_ID, which is queried from the build machine or can be specified as a CMake option. v2: separate the CMake option name. update the commit comments. add back the script for gen pci id, and build driver with it. v3: add file FindStandaloneGbeCompiler.cmake to make the main cmakefile clean. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-12	add option BUILD_STANDALONE_GBE_COMPILER to build static compiler	Guo Yejun	1	-10/+29
	The standalone compiler (gbe_bin_generater) depends on LLVM/clang and can only be built with C++11 features. To make it usable in environments with an old C/C++ toolchain, add a CMake option that links it against all static libraries, and pack the compiler and the necessary files into a tarball. v2: change the option name to BUILD_STANDALONE_GBE_COMPILER. pack the necessary files into a tarball. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	Add read buffer/image benchmark.	Yang Rong	5	-1/+159
	Add these two benchmarks to compare buffer and image performance. V2: init the coord before reading the image. V3: Correct the image's width and the buffer's read index. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	CL/Driver/HSW: Convert L3 cycle for texture to uncachable.	Zhigang Gong	1	-1/+1
	This works around a bug we found with darktable. After this patch, darktable works fine on HSW. Based on the test results, most of the benchmarks are not affected much by this patch. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2015-01-09	Change the IVB/HSW L3 SQC credit setting.	Yang Rong	1	-2/+2
	Set the L3SQ General Priority Credit to max and the L3SQ High Priority Credit to zero. This slightly improves performance, by about 2% on luxmark. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	utests: skip one test when it fails to open XDisplay.	Zhigang Gong	1	-0/+4
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-09	Fix loop condition of PrintfSet constructor.	Yan Wang	1	-1/+1
	Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	remove useless dependency libocl	Guo Yejun	1	-2/+0
	libocl is the name of the subdirectory and of the project inside it; it is not something that other targets can depend on. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	CL/Driver: quick fix regression caused by remove MI_FLUSH.	Zhigang Gong	1	-0/+2
	On Gen8, we also need an extra pipe control after the MEDIA_STATE_FLUSH. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-09	refine gbe_bin_generater usage to add -t option	Guo Yejun	1	-1/+1
	The -t option specifies the Gen target PCI ID, which tells gbe_bin_generater which platform to compile for. If this option is not given, the compile result is an LLVM-level binary. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: Reimplement trigonometric functions.	Ruiling Song	1	-378/+172
	The previous version was ported from msun, which derives from fdlibm. That code suits a CPU, with many if-condition checks that optimize for different input data, but it performs badly on a GPU. So I reimplemented these functions based on the well-known Payne-Hanek algorithm. Compared with the previous version, this reduces the static ASM instruction count of sin/cos from about 1700 to 400. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	CL/Driver: enable atomics in L3 for HSW.	Zhigang Gong	2	-1/+14
	This could get more than 10x boost for some atomic stress workloads. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-09	libocl: remove useless code.	Ruiling Song	1	-57/+0
	This kind of logic is already handled by atan2(). Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	fix utest build for some old gcc version	Guo Yejun	5	-26/+26
	change the keyword from constexpr to const, update the code for explicit type conversion and std::map's iterator. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	do not use C++11 features inside libgbeinterp	Guo Yejun	12	-87/+111
	Some embedded systems have not upgraded their C/C++ environment, which led to the request to remove C++11 features. This makes CL_EMBEDDED_PROFILE possible with some more changes (to be done later). This change replaces uses of the keywords auto and nullptr. Note that C++11 remains a must for libgbe (the OpenCL compiler), which depends on LLVM/clang. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	change Immediate::operator= from private to public	Guo Yejun	1	-1/+2
	Change the access of "Immediate & operator= (const Immediate &)" in class Immediate from private to public. Otherwise, a compile error appears when building with old gcc versions for the following code in function.hpp: INLINE ImmediateIndex newImmediate(const Immediate &imm) { const ImmediateIndex index(this->immediateNum()); this->immediates.push_back(imm); return index; } Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	do not include llvm/clang headers for libgbeinterp	Guo Yejun	2	-1/+12
	libgbeinterp does not depend on llvm/clang, so remove these header files to clean up the code. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	Remove obsolete MI_FLUSH	Zhenyu Wang	4	-11/+3
	Emulator debugging showed that MI_FLUSH is obsolete from IVB/HSW on and that beignet also used the wrong flush bit, so remove it rather than take the risk. The current kernel takes care of flushing the ring after each request, so no extra flush should be needed. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-09	Don't check some edge conditions in non-strict mode.	Zhigang Gong	1	-2/+2
	Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-09	add edge case detection for powr in utests	Meng Mengmeng	2	-3/+6
	The spec says powr(x,y) returns NaN for x<0, so add that case for powr. Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
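The edge case named in this commit can be illustrated with a minimal sketch. powr_sketch is a hypothetical helper, not beignet code, and it covers only the x<0 case; the other powr edge cases defined by the OpenCL spec are left out.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (assumption, not from beignet): models only the
// "x < 0 yields NaN" rule that the commit adds to the powr test.
inline float powr_sketch(float x, float y) {
    if (x < 0.0f) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    // For x >= 0 this sketch simply defers to the standard pow().
    return std::pow(x, y);
}
```

Plain pow() accepts a negative base with an integral exponent, which is why a powr test needs a separate expectation for x<0.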
2015-01-09	utests: make utests maths ULP values consistent with specification	Meng Mengmeng	3	-8/+96
	Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	runtime: fix max work group size for IVBGT1.	Zhigang Gong	1	-2/+2
	If the kernel is compiled under simd8 mode, the maximum work group size should be 8 * 6 * 6 = 288. The original 512 is too large for it. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
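The arithmetic above can be written out as a small sketch. The commit does not name the factors; reading 8 * 6 * 6 as SIMD width, threads per EU and EU count is an assumption, and max_work_group_size is a hypothetical helper rather than a beignet function.

```cpp
#include <cstddef>

// Assumed meaning of the factors (not stated in the commit):
// SIMD width x hardware threads per EU x number of EUs.
constexpr std::size_t max_work_group_size(std::size_t simd_width,
                                          std::size_t threads_per_eu,
                                          std::size_t eu_count) {
    return simd_width * threads_per_eu * eu_count;
}

// SIMD8 on IVB GT1 under the assumption above: 8 * 6 * 6 = 288.
static_assert(max_work_group_size(8, 6, 6) == 288,
              "SIMD8 limit quoted in the commit message");
```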
2015-01-09	runtime: tweak max memory allocation size.	Zhigang Gong	2	-2/+12
	Increase the maximum memory allocation size to at least 512MB, and set it larger if the system has more total memory. This tweak lets darktable handle big pictures. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> v2: reduce max constant buffer to 128MB. v3: fix the sysinfo usage. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2015-01-09	utests: reduce test count.	Zhigang Gong	1	-4/+5
	No need to iterate so many times. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2015-01-09	Fix PrintfState copying.	Yan Wang	1	-4/+29
	PrintfState includes a std::string object and shouldn't be copied by malloc/memcpy. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: He Junyan <junyan.he@inbox.com>
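A minimal sketch of why such an object needs a real copy. StateSketch and clone_state are hypothetical stand-ins, not the actual PrintfState code; the point is only that a type owning a std::string must be duplicated through its copy constructor.

```cpp
#include <string>

// Hypothetical stand-in for a state object that owns a std::string.
struct StateSketch {
    int index;
    std::string text;
};

// Copy through the copy constructor so the std::string member gets its
// own storage. malloc + memcpy would only duplicate the string's internal
// pointers, leaving two objects that refer to the same buffer.
inline StateSketch *clone_state(const StateSketch &src) {
    return new StateSketch(src);
}
```

The caller owns the returned object and releases it with delete.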
2015-01-09	Separate flush and invalidate in function intel_gpgpu_pipe_control.	Yang, Rong	2	-2/+36
	HSW has a limitation on PIPECONTROL with RO Cache Invalidation: Prior to programming a PIPECONTROL command with any of the RO cache invalidation bit set, program a PIPECONTROL flush command with CS stall bit and HDC Flush bit set. So two PIPECONTROL commands must be used to flush and invalidate the L3 cache on HSW. This patch fixes some random failures in workloads with very heavy DC read/write on HSW. Signed-off-by: Yang, Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	Use libdrm interface to get device id	Zhenyu Wang	2	-22/+2
	Remove our own ioctl call for the device id and use the libdrm interface instead. This saves one extra ioctl call, since the id has already been read when the gem bufmgr initializes, and it also allows overriding the device id with the libdrm helper environment variable 'INTEL_DEVID_OVERRIDE'. Combined with aub dump, you can do device debugging with the fulsim emulator by choosing any device you want, without needing the real hardware at all. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	Add aub dump support	Zhenyu Wang	1	-1/+16
	Use the current libdrm interface to dump an aub file for debugging in the emulator. This adds a new driver environment variable, OCL_DUMP_AUB=1, to enable it. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	Remove deprecated fulsim code	Zhenyu Wang	7	-274/+2
	Remove the rather old fulsim code, which seems to have no users, used interfaces that are not in open source libdrm, and called the Windows fulsim binary instead of the Linux one. We will use the current libdrm interface instead. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	replace hash_map with map	Guo Yejun	5	-91/+5
	There is no strong evidence that hash_map gives better performance for beignet, and hash_map requires std::hash, which is not supported in some old g++ versions, so replace hash_map with map. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-09	add collectImageArgs to handle image count limitations.	Luo Xionghu	1	-0/+28
	The number of read-only images in a kernel must not exceed MAX_READ_IMAGE_ARS; the number of write-only images in a kernel must not exceed MAX_WRITE_IMAGE_ARS. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	fix min_max_read_image_args and min_max_parameter_size issue.	Luo Xionghu	8	-10/+13
	This patch reverts fb4bced99b7c08d0d43386abf33448860fb7fc41, because the spec defines the minimum value of min_max_parameter_size as 1024. BTI_MAX_NUM and btiBase can be 130 because of 128 images plus 1 const surface and 1 private surface. v2: add BTI_MAX_READ_IMAGE_ARGS and BTI_MAX_WRITE_IMAGE_ARGS in the backend. change BTI_MAX_ID to 253. the image counts will be calculated and checked against their limits in a later patch. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: implement high precision pown()	Ruiling Song	1	-5/+232
	This version is based on pow() implementation ported from msun. I just modify it to support a floating point to the power of an integer. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	GBE: remove software maintained SLM offset related code.	Zhigang Gong	7	-37/+1
	v2: also remove allocSLMOffsetCurbe(). Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-09	GBE: support const private array initialization.	Ruiling Song	2	-45/+54
	Developers are allowed to declare an initialized private array like the one below: void func() { const arr[]={1, 2, 3, 4}; } The implementation simply puts such arrays into the __constant memory space. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	GBE: use sr0.1's SLM Offset to eliminate the software SLM offset for HSW.	Zhigang Gong	2	-4/+10
	sr0.1 has a SLM Offset bits field which could be used to set slm offset (4K unit), so we just need to initialize it at the beginning of the kernel and don't need to maintain the software SLM offset. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
2015-01-09	GBE: fix an image regression.	Zhigang Gong	2	-29/+30
	This patch fixes a regression in the image processing path. For all non-workarounded images whose image offset is 0, we should always use float type coordinates. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
2015-01-09	fix max_parameter_size not correct on x86 platforms.	Luo Xionghu	2	-2/+2
	this value should depend on the pointer size according to the system. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: flush denorm to zero in remquo()	Ruiling Song	1	-0/+2
	Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: Correctly handle -inf in exp10.	Ruiling Song	1	-3/+3
	exp10(-inf) should return 0.0f Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: flush denorm into zero in ldexp()	Ruiling Song	1	-1/+1
	The inf and denorm logic in internal_ldexp() is useless, as inf and denorm are already handled in __gen_ocl_scalbnf() and the wrapper function. It is better to flush denorm to zero in the wrapper function, so we don't have to change the internal implementation. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: Flush denorm input into zero in rootn()	Ruiling Song	1	-0/+8
	Gen does not support denorm. We have to flush input to zero. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
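Flushing a denormal input to zero can be sketched on the host side as follows. flush_denorm_to_zero is a hypothetical helper, not the libocl code, and keeping the sign of the input on the resulting zero is an assumption.

```cpp
#include <cmath>

// Hypothetical helper (assumption, not from libocl): replace a subnormal
// input with a zero of the same sign and pass everything else through.
inline float flush_denorm_to_zero(float x) {
    if (std::fpclassify(x) == FP_SUBNORMAL) {
        return std::copysign(0.0f, x);
    }
    return x;
}
```

A wrapper would apply this to its argument before calling the internal implementation, so the internal code never sees a denormal value.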
2015-01-09	libocl: Improve precision of exp()	Ruiling Song	1	-9/+27
	This patch reverts most of the logic in 500843d36ab6631d71570130c0c08048f9b8f3fe. It seems native_exp loses some precision, which can make the result fail the OpenCL spec. Such cases often come from other functions that invoke internal_exp(), like sinh/cosh. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	libocl: Improve precision of pow/powr.	Ruiling Song	1	-14/+67
	pow: When splitting a float into two floats, we normally use 0xfffff000 as the mask, which leaves a 12-bit effective mantissa in the high part. After some calculation, some of the high bits appear to be lost, so the mask is changed to 0xffffe000, which leaves only an 11-bit mantissa in the high part. The precision then meets the OpenCL spec requirement. powr: powr() defines different edge case behavior in OpenCL Spec 7.5. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
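The mask-based split described for pow can be sketched like this. SplitFloat and split_float are hypothetical names, not the libocl code; only the two mask values come from the commit message.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical result type: x == hi + lo, with hi holding the leading bits.
struct SplitFloat {
    float hi;
    float lo;
};

// Clear the low mantissa bits selected by the mask to form the high part;
// the low part is the remainder. 0xfffff000 keeps 11 stored mantissa bits
// (12 with the implicit leading bit), 0xffffe000 keeps 10 (11 effective).
inline SplitFloat split_float(float x, std::uint32_t mask) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float as raw bits
    bits &= mask;
    float hi;
    std::memcpy(&hi, &bits, sizeof hi);
    return SplitFloat{hi, x - hi};
}
```

With the narrower mask the high part carries one bit less, which leaves more headroom before later products of high parts have to round.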
2015-01-09	add half math function support.	Luo Xionghu	1	-1/+15
	Simply define the half_xxx functions as xxx. v2: the functions need to be defined as native_xxx, since those pass under non-strict conformance mode, except for sin/cos/powr/tan. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2015-01-09	GBE: code cleanup.	Zhigang Gong	3	-12/+2
	Remove some useless comments according to Matt's suggestion. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2015-01-09	GBE/CL: use 2D image to implement large image1D_buffer.	Zhigang Gong	5	-15/+67
	Per the OpenCL spec, the minimum CL_DEVICE_IMAGE_MAX_BUFFER_SIZE is 65536, which is too large for a 1D surface on Gen platforms, so a 2D surface has to be used to implement it. As the OpenCL spec only allows the image1d_t to be accessed via the default sampler, this is doable: it never uses float coordinates and never uses linear (non-nearest) filters. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
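One way to picture backing a large 1D buffer image with a 2D surface is a row-major index mapping. The commit message does not describe the actual mapping, so the layout here is an assumption; Coord2D, buffer_index_to_2d and row_width are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical 2D texel coordinate.
struct Coord2D {
    std::size_t x;
    std::size_t y;
};

// Assumed row-major layout: row_width is a hypothetical number of texels
// per row of the backing 2D surface. Integer coordinates suffice because
// the access is sampler-less, with no float coordinates or filtering.
inline Coord2D buffer_index_to_2d(std::size_t index, std::size_t row_width) {
    return Coord2D{index % row_width, index / row_width};
}
```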
2015-01-09	GBE: remove some image1d_buffer related builtin functions.	Zhigang Gong	2	-9/+9
	Per the OpenCL spec, an image1d buffer supports only sampler-less access. Remove the unsupported functions. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-01-09	GBE: switch to CLANG native sampler_t.	Zhigang Gong	10	-25/+184
	CLANG has had sampler_t support since LLVM 3.3, so let's switch to that type rather than the old hacky way. One major problem is the static checking of samplers. The Gen platform has some hardware restrictions, and if the sampler value is a constant defined on the kernel side, we need to use the value to optimize the code path. Now that sampler_t has become an opaque type, CLANG doesn't support any arithmetic operations on it, so we have to introduce a new pass to do this optimization. v2: fix comments. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>