path: root/backend
Age | Commit message | Author | Files | Lines
2017-04-17 | backend: add convert_double_R(float x) | rander | 1 | -3/+3
Just call convert_double(float); double can fully cover the data range of float, so no data is lost. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | backend: add double support to convert_double_rte|n|z|p(double x) | rander | 1 | -1/+1
Just call convert_double(double x); it is actually just a mov. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | backend: add double support to convert_u|char|short|int_rtp(double x) | rander | 1 | -0/+21
First convert the double to u|long, then convert to the smaller type. Converting double directly to the smaller type would not save any instructions. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | backend: add int8 convert to double. | rander | 1 | -0/+72
The algorithm is very simple: for convert_double_rte|z|p|n(int8 x), any input in -128 ~ 127 or 0 ~ 255 should get the same result regardless of rounding mode. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | backend: add double support to convert_u|char|short|int|long_sat_rte|z|n|p(double x) | rander | 1 | -1/+49
Algorithm: do the rte|z|n|p operation without sat when the value is in range; if out of range, clamp to the max|min. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_u|long_rtp(double x) | rander | 1 | -0/+45
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_u|char|short|int|long_rtz(double x) | rander | 1 | -2/+23
rtz can be done with rtn for unsigned types; for signed types, apply rtn to abs(x), then restore the sign. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_u|char|u|short|u|int_rte(double x) | rander | 1 | -0/+9
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_u|long_rte(double x) | rander | 1 | -2/+37
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_float_rtn(double x) | rander | 2 | -0/+31
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_uchar|short_rtn(double x) | rander | 1 | -0/+20
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_u|int_rtn(double x) | rander | 1 | -0/+35
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_u|long_rtn(double) | rander | 2 | -1/+58
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to convert_uchar|char|short|ushort|int|uint|long|ulong_sat(double x) | rander | 3 | -1/+25
HW supports double to int16/int32 from IVB on; the others are done in software. Double to int64 is supported on BDW+; skip it for now and refine it later. Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to max min min step | rander | 2 | -0/+35
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support to prefetch; actually it does nothing | rander | 2 | -0/+2
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support for shuffle | rander | 2 | -0/+4
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-17 | Backend: add double support for select. | Yang Rong | 1 | -0/+2
Signed-off-by: rander <rander.wang@intel.com> Tested-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Add LLVM40 support | Pan Xiuli | 21 | -34/+217
1. Refine APFloat fltSemantics.
2. Refine bitcode read/write header.
3. Refine clang invocation.
4. Refine return llvm::error handler.
5. Refine ilist_iterator usage.
6. Refine CFG Printer pass manager.
7. Refine GEP with pointer type changing.
8. Refine libocl 20 support.
V2: Add missing ocl_sampler.ll and ocl_sampler_20.ll files.
V3: Fix some build problems for llvm36.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Refine FCmp one and une | Pan Xiuli | 1 | -4/+6
llvm will merge:
  %1 = fcmp olt %a, %b
  %2 = fcmp ogt %a, %b
  %dst = or %1, %2
into:
  %dst = fcmp one %a, %b
Our own CMP.NE is actually une, so lower FCmp one into CMP.LT, CMP.GT and OR. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Refine LLVM version check macro | Pan Xiuli | 17 | -104/+104
LLVM 4.0 is coming; we should refine our version check to fit the LLVM_MAJOR_VERSION bump to 4. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Refine GEP lowering code | Pan Xiuli | 3 | -16/+30
A pointer is not like an array or a vector; we should handle it in a standalone path to fit the future change to PointerType inheritance. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Fix an include file problem | Pan Xiuli | 4 | -5/+4
We should not include any llvm header in the ir unit, and we need to add the missing headers for profiling after deleting the llvm headers. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Remove old llvm support code. | Pan Xiuli | 6 | -90/+0
LLVM 3.3 and older are no longer supported by Beignet, so delete this code. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-04-13 | Backend: Fix flag and subflag setting for 3-src instructions | Pan Xiuli | 3 | -6/+19
Before Gen8, 3-src instructions have different flag and subflag bits.
V2: Fix the sub flag bit.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-23 | Backend: Add hole reuse in reg allocation | Pan Xiuli | 2 | -17/+121
We first find regs that have holes during the simple linear scan and save them in HoleRegPool; when allocating regs we first search for a fitting candidate in the pool and choose the best fit to reuse.
V2: Refine hole reuse to stay within one block.
V3: Refine the data structure with fewer variables; add OCL_REUSE_HOLE_REG to control the optimization.
V4: Split the patch into the instruction ID part and hole reuse; refine the blockID of the reg.
V5: Refine some variable and function names. Add a check to not spill hole regs that are already used.
V6: Fix some cases where the dst is a partial write.
V7: Fix hole spill dead loop.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-23 | Backend: Store the spill register information | Pan Xiuli | 1 | -5/+33
In some cases we may use a subnr of a spilled reg, so we need to use the spilled reg's information in unspill.
V2: Fix some uninitialized register problems.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-23 | llvm3.9 will assert if output is an empty string. | Luo Xionghu | 1 | -4/+8
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-23 | fix regression on pre-BDW platform. | Luo Xionghu | 1 | -3/+7
IVB/HSW will split the 32x32 multiply into two SIMD8 instructions, and a noMask instruction is introduced there; the if-opt pass shouldn't change the predicate state of noMask instructions.
v2: fix typo.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-23 | Properly check return value from __cxa_demangle | Jan Beich | 1 | -2/+2
FreeBSD uses libcxxrt (via libc++) instead of GNU libiberty (via libstdc++) for __cxa_demangle(). When *output_buffer* and *length* are both NULL it doesn't modify *status* on success. Rather than rely on a possibly uninitialized variable, check that the function doesn't return NULL.
Fixes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213732
Signed-off-by: Jan Beich <jbeich@freebsd.org> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13 | Backend: add double support for some relational functions | rander | 3 | -0/+87
Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-13 | Backend: add double support to bitselect | rander | 2 | -0/+4
Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-13 | implement extension cl_intel_media_block_io WRITE related functions | Luo Xionghu | 8 | -31/+194
v2: use static fixBlockSize; no need to set the default width/height at the IR level.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13 | fix build error log not being output | Luo Xionghu | 1 | -4/+4
v2: output the build option and error if the variable is set.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13 | add extension cl_intel_media_block_io READ related functions | Luo Xionghu | 9 | -33/+272
v2: add #define intel_media_block_io in libocl; move the extension check code to this patch.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-03-13 | add extension intel_planar_yuv. | Luo Xionghu | 1 | -0/+1
Create a w * (3/2 * h) size bo for the whole CL_NV12_INTEL format surface; the y surface (format CL_R) shares the first w * h part, and the uv surface (format CL_RG) shares the remaining w * (1/2)h part. Set the correct bo offset for the uv surface per platform.
v2: add the extension define in libocl; fix the error check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-03-07 | Backend: refine the geometry function | rander | 1 | -4/+4
Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-07 | Backend: for BDW and after, according to BSpec there is no need to split CMP when src is DW or DF | rander | 4 | -0/+11
Signed-off-by: rander <rander.wang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-03-07 | Backend: Add missing Unaligned OWord Block Read disasm | Pan Xiuli | 1 | -1/+1
The Unaligned OWord Block Read disasm is missing; add it alongside OWord Block Read. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-28 | Backend: Fix a selection ir optimization bug | Pan Xiuli | 1 | -1/+4
We used to check for unpacked instructions, but we would also ignore some patterns like:
  MOV %1, %2.1
  MUL %4, %3, %1
==>
  MUL %4, %3, %2.1
Add more checks to keep this kind of optimization. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-24 | MAD compact instruction cannot support the "absolute" attribute. | Yan Wang | 1 | -0/+2
If the absolute attribute of any SRC of a MAD instruction is 1, don't use the compact instruction. Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-17 | move simpleBlock check and if/endif optimize after select. | Luo Xionghu | 4 | -2/+123
The if opt could be an independent pass-like function that checks instruction state changes and special instructions like I64, mixed bit, etc.; this could reduce the code complexity of the structured code.
v2: as the GenInstructionState flag/subFlag default value is 0.0, the isSimpleBlock function returns false if the insn state uses 0.1 as its flag. This rule makes the function more straightforward; no need to enumerate the special instructions except SEL_OP_SEL_CMP (no predication per spec).
v3: update code per review comments: remove duplicate code; rename the MACRO; the endifOffset rename patch moved to a later patchset.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-17 | revert patch 2edb7451a8f92295f79e29ef16740b5cd16127f2. | Luo Xionghu | 2 | -101/+17
The if/endif optimization needs to be located after instruction selection to make the code modular and reduce complexity. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-17 | remove useless code. | Luo Xionghu | 1 | -1/+0
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-14 | Enable OpenCL 2.0 only where supported | Pan Xiuli | 1 | -9/+10
This allows a single beignet binary to both offer 2.0 where available and still work on older hardware.
V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec, and also likely to be faster).
V3: Only enable OpenCL 2.0 when the llvm version is 39.
V4: Only enable OpenCL 2.0 on x64 host.
V5: Always return 32 as address bits.
Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-02-10 | GBE: use shr instead of division where possible. | Yang Rong | 1 | -1/+12
GEN's div instruction needs several cycles, so use the shr instruction when the divisor is a power-of-2 constant. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-10 | GBE: use shl instead of multiply where possible. | Yang Rong | 1 | -0/+19
i32 and i64 multiplies need several instructions, so use the shl instruction when one source is a power-of-2 constant. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
2017-02-06 | Fix typo | Rebecca N. Palmer | 1 | -1/+1
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
2017-02-06 | GBE: use shift for PowerOf2 size when lowering GEP. | Ruiling Song | 1 | -6/+13
For 64-bit addresses, the multiply would expand to several instructions. Most of the time the size is a power of 2, so we can use a left shift instead. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2017-01-19 | Android.mk: update Android.mk for android build. | Yang Rong | 2 | -1/+11
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>