beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs (mirrored from https://gitlab.freedesktop.org/beignet/beignet)

Age	Commit message	Author	Files	Lines
2017-04-17	backend: add convert_double_R(float x)	rander	1	-3/+3
	Just call convert_double(float): double fully covers the data range of float, so no data is lost. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	backend: add double support to convert_double_rte\|n\|z\|p(double x)	rander	1	-1/+1
	Just call convert_double(double x); this is actually just a mov. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	backend: add double support to convert_u\|char\|short\|int_rtp(double x)	rander	1	-0/+21
	First convert double to u\|long, then convert to the smaller type. Converting double directly to the smaller type does not save any instructions. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	backend: add int8 convert to double.	rander	1	-0/+72
	The algorithm is very simple: for convert_double_rte\|z\|p\|n(int8 x), every input in -128 ~ 127 or 0 ~ 255 is exactly representable as a double, so all rounding modes should give the same result. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	backend: add double support to convert_u\|char\|short\|int\|long_sat_rte\|z\|n\|p(double x)	rander	1	-1/+49
	Algorithm: when the value is within the destination range, do the operation as rte\|z\|n\|p without sat. If it is out of range, just clamp to the max\|min. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
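The clamp-then-convert rule in this commit message can be sketched on the host side as follows. This is only an illustrative model, not Beignet's backend code: the function name `convert_int_sat_rtz_model` is hypothetical, the rtz rounding mode and the int destination are picked as one example, and mapping NaN to 0 is an assumption about the saturated mode.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical host-side model of convert_int_sat_rtz(double x); not Beignet code.
inline int32_t convert_int_sat_rtz_model(double x) {
    const int32_t lo = std::numeric_limits<int32_t>::min();
    const int32_t hi = std::numeric_limits<int32_t>::max();
    if (std::isnan(x)) return 0;                   // assumption: NaN saturates to 0
    if (x <= static_cast<double>(lo)) return lo;   // below range: clamp to min
    if (x >= static_cast<double>(hi)) return hi;   // above range: clamp to max
    return static_cast<int32_t>(std::trunc(x));    // in range: plain rtz conversion
}
```

Both int32 limits are exactly representable as doubles, so the range checks themselves introduce no rounding error.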
2017-04-17	Backend: add double support to convert_u\|long_rtp(double x)	rander	1	-0/+45
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_u\|char\|short\|int\|long_rtz(double x)	rander	1	-2/+23
	For unsigned types, rtz can be done with rtn. For signed types, apply rtn to abs(x), then restore the sign. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
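The identity this commit relies on (round toward zero equals round toward negative infinity applied to the magnitude, with the sign restored afterwards) can be checked with a small host-side model. The name `convert_long_rtz_model` is hypothetical, and the sketch assumes x is not NaN and |x| < 2^63 so the cast is well defined.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical model of convert_long_rtz(double x), built from rtn on |x|.
// Assumes x is not NaN and |x| < 2^63.
inline int64_t convert_long_rtz_model(double x) {
    const double magnitude = std::floor(std::fabs(x));  // rtn on a non-negative value
    const int64_t r = static_cast<int64_t>(magnitude);
    return std::signbit(x) ? -r : r;                    // add the sign back
}
```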
2017-04-17	Backend: add double support to convert_u\|char\|u\|short\|u\|int_rte(double x)	rander	1	-0/+9
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_u\|long_rte(double x)	rander	1	-2/+37
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_float_rtn(double x)	rander	2	-0/+31
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_uchar\|short_rtn(double x)	rander	1	-0/+20
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_u\|int_rtn(double x)	rander	1	-0/+35
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_u\|long_rtn(double)	rander	2	-1/+58
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to convert_uchar\|char\|short\|ushort\|int\|uint\|long\|ulong_sat(double x)	rander	3	-1/+25
	HW supports double to int16 and int32 conversion from IVB onward; the others are done in software. Double to int64 is supported on BDW+; skip it for now and refine it later. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend:add double support to max min min step	rander	2	-0/+35
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support to prefetch. Actually it does nothing	rander	2	-0/+2
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support for shuffle	rander	2	-0/+4
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17	Backend: add double support for select.	Yang Rong	1	-0/+2
	Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13	Backend: Add LLVM40 support	Pan Xiuli	21	-34/+217
	1.Refine APFloat fltSemantics. 2.Refine bitcode read/write header. 3.Refine clang invocation. 4.Refine return llvm::error handler. 5.Refine ilist_iterator usage. 6.Refine CFG Printer pass manager. 7.Refine GEP with pointer type changing. 8.Refine libocl 20 support V2: Add missing ocl_sampler.ll and ocl_sampler_20.ll file V3: Fix some build problem for llvm36 Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13	Backend: Refine FCmp one and une	Pan Xiuli	1	-4/+6
	LLVM will merge: %1 = fcmp olt %a, %b; %2 = fcmp ogt %a, %b; %dst = or %1, %2 into %dst = fcmp one %a, %b. Our own CMP.NE is actually une, so lower fcmp one into CMP.LT, CMP.GT and OR. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
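The difference between the two predicates only shows up with NaN operands, which a short host-side sketch makes visible. The helper names `fcmp_one` and `fcmp_une` are hypothetical and simply mirror the LLVM predicate names; the sketch assumes IEEE comparison semantics (no fast-math flags).

```cpp
#include <limits>

// "one": ordered and not equal, expressed as (a < b) OR (a > b).
// False whenever either operand is NaN.
inline bool fcmp_one(float a, float b) { return (a < b) || (a > b); }

// "une": unordered or not equal, which is what a plain != computes.
// True whenever either operand is NaN.
inline bool fcmp_une(float a, float b) { return a != b; }
```

For ordinary operands the two agree, so a not-equal compare with une behaviour is only wrong for `fcmp one` when a NaN is involved.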
2017-04-13	Backend: Refine LLVM version check macro	Pan Xiuli	17	-104/+104
	LLVM 4.0 is coming, we should refine our version check to fit the LLVM_MAJOR_VERSION bump to 4. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13	Backend: Refine GEP lowering code	Pan Xiuli	3	-16/+30
	A pointer is not like an array or vector; we should handle it in a standalone path to fit future changes to PointerType inheritance. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13	Backend: Fix an include file error problem	Pan Xiuli	4	-5/+4
	We should not include any llvm header in the ir unit, and we need to add the missing headers for profiling after deleting the llvm headers. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13	Backend: Remove old llvm support code.	Pan Xiuli	6	-90/+0
	LLVM 3.3 or older is no longer supported by Beignet, so we need to delete this code. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13	Backend: Fix flag and subflag setting for src 3 instruction	Pan Xiuli	3	-6/+19
	Before gen8, src 3 instruction has different flag and subflag bits V2: Fix the sub flag bit. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-23	Backend: Add hole reuse in reg allocation	Pan Xiuli	2	-17/+121
	We first find regs that have pool in simple linear scale, and save them in HoleRegPool; when allocating regs we first search the pool for fitting candidates and choose the best fit to reuse. V2: Refine hole reuse only in one block. V3: Refine data structure with fewer variables, add OCL_REUSE_HOLE_REG to control the optimization. V4: Split the patch into an instruction ID part and hole reuse, refine the blockID of the reg. V5: Refine some variable and function names. Add a check to not spill hole regs that have already been used. V6: Fix some cases when the dst is a partial write. V7: Fix hole spill dead loop. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-23	Backend: Store the spill register information	Pan Xiuli	1	-5/+33
	In some cases we may use a subnr of a spilled reg, so we need to use the reg information of the spilled reg in unspill. V2: Fix some uninit register problem. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-23	llvm3.9 will assert if output is empty string.	Luo Xionghu	1	-4/+8
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-23	fix regression on pre-BDW platform.	Luo Xionghu	1	-3/+7
	ivb/hsw will split the 32X32 into two simd8 instructions, and a noMask instruction is introduced there; the if-opt pass shouldn't change the predicate state for noMask instructions. v2: fix typo. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-23	Properly check return value from __cxa_demangle	Jan Beich	1	-2/+2
	FreeBSD uses libcxxrt (via libc++) instead of GNU libiberty (via libstdc++) for __cxa_demangle(). When output_buffer and length are both NULL, it doesn't modify status on success. Rather than rely on a possibly uninitialized variable, check that the function doesn't return NULL. Fixes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213732 Signed-off-by: Jan Beich <jbeich@freebsd.org> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
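The portable pattern described here, testing the returned pointer instead of the status out-parameter, might look like the sketch below. The wrapper name `demangle_or_original` is hypothetical and this is not the code from the patch; only `abi::__cxa_demangle` from `<cxxabi.h>` is a real API.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical wrapper: returns the demangled name, or the input on failure.
inline std::string demangle_or_original(const char* mangled) {
    int status = 0;  // may be left untouched by some runtimes, so it is not trusted
    char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (demangled == nullptr)        // the return value is the reliable signal
        return std::string(mangled);
    std::string result(demangled);
    std::free(demangled);            // the buffer is malloc()ed by the runtime
    return result;
}
```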
2017-03-13	Backend:add double support for some relation function	rander	3	-0/+87
	Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-13	Backend: add double support to bitselect	rander	2	-0/+4
	Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-13	implement extension cl_intel_media_block_io WRITE related function	Luo Xionghu	8	-31/+194
	v2: use static fixBlockSize; no need set default width/height in IR level. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13	fix build error log not output issue.	Luo Xionghu	1	-4/+4
	v2: output build option and err if variable set. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13	add extension cl_intel_media_block_io READ related function	Luo Xionghu	9	-33/+272
	v2: add #define intel_media_block_io in libocl; move extension check code to this patch; Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13	add extension intel_planar_yuv.	Luo Xionghu	1	-0/+1
	Create a bo of size w * (3/2 h) for the whole CL_NV12_INTEL format surface; the Y surface (format CL_R) shares the first w * h part, and the UV surface (format CL_RG) shares the remaining w * 1/2 h part. Set the correct bo offset for the UV surface on each platform. v2: add extension define in libocl; fix error check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
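The plane arithmetic in this message can be written out as a small sketch. The `Nv12Layout` struct and `nv12_layout` function are hypothetical names; the sketch assumes one byte per sample, a row pitch equal to the width, an even height, and no platform-specific alignment of the UV offset (the commit notes that the real offset differs per platform).

```cpp
#include <cstddef>

// Hypothetical description of how one buffer object is shared by both planes.
struct Nv12Layout {
    std::size_t total_bytes;  // w * 3/2 h: whole NV12 surface
    std::size_t y_offset;     // Y plane starts at the beginning of the bo
    std::size_t y_bytes;      // w * h
    std::size_t uv_offset;    // UV plane follows the Y plane (unaligned here)
    std::size_t uv_bytes;     // w * 1/2 h, interleaved U and V samples
};

inline Nv12Layout nv12_layout(std::size_t width, std::size_t height) {
    const std::size_t y_bytes = width * height;
    const std::size_t uv_bytes = width * (height / 2);
    return Nv12Layout{y_bytes + uv_bytes, 0, y_bytes, y_bytes, uv_bytes};
}
```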
2017-03-07	Backend: refine the geometry function	rander	1	-4/+4
	Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-07	Backend: for BDW and after, According to BSpec no need to split CMP when src is DW DF	rander	4	-0/+11
	Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-07	Backend: Add missing Unaligned OWord Block Read disasm	Pan Xiuli	1	-1/+1
	Now OWord Block Read disasm is missing, add it with Oword Block Read. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-28	Backend: Fix a selection ir optimization bug	Pan Xiuli	1	-1/+4
	We used to check for unpacked instructions, but we will also ignore some patterns like: MOV %1, %2.1; MUL %4, %3, %1 ==> MUL %4, %3, %2.1. Add more checks to keep this kind of optimization. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-24	MAD compact instruction could not support "absolute" attribute.	Yan Wang	1	-0/+2
	If the absolute attribute of any SRC of a MAD instruction is 1, don't use the compact instruction. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-17	move simpleBlock check and if/endif optimize after select.	Luo Xionghu	4	-2/+123
	The if opt could be an independent pass-like function that checks instruction state changes and special instructions like I64, mixed bit etc. This could reduce the code complexity of the structure code. v2: as the GenInstructionState flag/subFlag default value is 0.0, the isSimpleBlock function returns false if the insn state uses 0.1 as flag. This rule makes the function more straightforward; there is no need to enumerate the special instructions except SEL_OP_SEL_CMP (no predication per spec). v3: update code per review comments: remove duplicate code; redefine MACRO name; endifOffset rename patch moved to a later patchset. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-17	revert patch 2edb7451a8f92295f79e29ef16740b5cd16127f2.	Luo Xionghu	2	-101/+17
	The if/endif optimization needs to be located after instruction selection to make the code modular and reduce complexity. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-17	remove useless code.	Luo Xionghu	1	-1/+0
	Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-14	Enable OpenCL 2.0 only where supported	Pan Xiuli	1	-9/+10
	This allows a single beignet binary to both offer 2.0 where available, and still work on older hardware. V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec, and also likely to be faster). V3: Only enable OpenCL 2.0 when llvm version is 39. V4: Only enable OpenCL 2.0 on x64 host. V5: Always return 32 as address bits. Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com> Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10	GBE: use shr instead of division as possible.	Yang Rong	1	-1/+12
	GEN's div instruction needs several cycles; use the shr instruction when the divisor is a power-of-2 constant. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-10	GBE: use shl instead of multiply as possible.	Yang Rong	1	-0/+19
	i32 multiply and i64 multiply need several instructions; use the shl instruction when one source is a power-of-2 constant. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
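The strength reduction used by this commit and the preceding division commit can be illustrated with unsigned host-side arithmetic. All helper names below are hypothetical, and the sketch covers only unsigned operands: signed division by a power of 2 is not a plain arithmetic shift, and how the backend handles that case is not stated in the log.

```cpp
#include <cassert>
#include <cstdint>

// True when v has exactly one bit set.
inline bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Shift amount for a power-of-2 value, e.g. 8 -> 3.
inline unsigned log2_pow2(uint32_t v) {
    unsigned shift = 0;
    while (v > 1) { v >>= 1; ++shift; }
    return shift;
}

// Multiply by a constant, using shl when the constant is a power of 2.
inline uint32_t mul_const(uint32_t x, uint32_t c) {
    return is_pow2(c) ? (x << log2_pow2(c)) : (x * c);
}

// Unsigned divide by a constant, using shr when the constant is a power of 2.
inline uint32_t udiv_const(uint32_t x, uint32_t c) {
    assert(c != 0);  // division by zero is outside this sketch
    return is_pow2(c) ? (x >> log2_pow2(c)) : (x / c);
}
```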
2017-02-06	Fix typo	Rebecca N. Palmer	1	-1/+1
	Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-06	GBE: use shift for PowerOf2 size when lowering GEP.	Ruiling Song	1	-6/+13
	For 64-bit addresses, the multiply would expand to several instructions. Since the size is a power of 2 most of the time, we can use a left shift instead. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-19	Android.mk: update Android.mk for android build.	Yang Rong	2	-1/+11
	Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>