Age | Commit message | Author | Files | Lines |
|
Just call convert_double(float): double fully covers the data range of
float, so no data is lost.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
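A minimal sketch of why the float case above needs no rounding logic, in plain C++ rather than OpenCL C. The helper names `widen_to_double` and `round_trip_is_exact` are hypothetical and not taken from the patch.

```cpp
// Hypothetical helper: every float value is exactly representable as a
// double, so the widening conversion gives the same result in any rounding mode.
double widen_to_double(float x) {
  return static_cast<double>(x);
}

// Round-trips a float through double and reports whether it came back unchanged.
bool round_trip_is_exact(float x) {
  return static_cast<float>(widen_to_double(x)) == x;
}
```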
|
|
Just call convert_double(double x); this is actually just a mov.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
First convert double to u|long, then convert to the smaller type.
Converting double directly to the smaller type does not save any instructions.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
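The two-step path above can be sketched in plain C++ as follows. The function names are hypothetical, and only in-range inputs are considered; the sketch does not try to model what the non-saturating conversions do for out-of-range values.

```cpp
#include <cstdint>

// Step 1: double -> 64-bit integer (the cast truncates toward zero).
// Step 2: narrow the 64-bit value to the requested smaller type.
int8_t double_to_char(double x) {
  const int64_t wide = static_cast<int64_t>(x);
  return static_cast<int8_t>(wide);
}

// Same two steps for an unsigned target, going through a 64-bit unsigned value.
uint16_t double_to_ushort(double x) {
  const uint64_t wide = static_cast<uint64_t>(x);
  return static_cast<uint16_t>(wide);
}
```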
|
|
The algorithm is very simple: for convert_double_rte|z|p|n(int8 x), an
input in -128 ~ 127 or 0 ~ 255 should get the same result.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
convert_u|char|short|int|long_sat_rte|z|n|p(double x)
Algorithm: when the value is in range, do the operation as rte|z|n|p without sat.
If it is out of range, just clamp to the max|min.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
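A minimal sketch of the round-then-clamp idea above, in plain C++. The names `Round`, `round_to_integral` and `convert_sat` are hypothetical. Mapping NaN to 0 and relying on the default floating-point environment for round-to-nearest-even are assumptions of this sketch, not statements about the patch.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

enum class Round { RTE, RTZ, RTN, RTP };

// Rounds x to an integral double in the requested mode.
// RTE assumes the default rounding environment (round to nearest, ties to even).
inline double round_to_integral(double x, Round mode) {
  switch (mode) {
    case Round::RTZ: return std::trunc(x);
    case Round::RTN: return std::floor(x);
    case Round::RTP: return std::ceil(x);
    case Round::RTE: break;
  }
  return std::nearbyint(x);
}

template <typename T>
T convert_sat(double x, Round mode) {
  using Lim = std::numeric_limits<T>;
  if (std::isnan(x)) return 0;  // assumption: NaN maps to 0
  const double lo = static_cast<double>(Lim::min());
  // max + 1 is a power of two, so it is exactly representable as a double.
  const double hi_exclusive = std::ldexp(1.0, Lim::digits);
  const double r = round_to_integral(x, mode);
  if (r <= lo) return Lim::min();            // out of range: clamp to min
  if (r >= hi_exclusive) return Lim::max();  // out of range: clamp to max
  return static_cast<T>(r);  // in range: same as the non-saturating result
}
```

Comparing against `max + 1` instead of `max` avoids the case where `max` itself (for example the 64-bit maximum) is not exactly representable as a double.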
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
For unsigned types, rtz can be done with rtn.
For signed types, apply rtn to abs(x), then add the sign effect.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
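The identity above can be sketched on doubles in plain C++. The function names are hypothetical; `std::floor` stands in for rtn and `std::copysign` for "add the sign effect".

```cpp
#include <cmath>

// Unsigned case: for x >= 0, rounding toward zero equals rounding down.
double rtz_unsigned(double x) {
  return std::floor(x);
}

// Signed case: round |x| down, then put the original sign back.
double rtz_signed(double x) {
  return std::copysign(std::floor(std::fabs(x)), x);
}
```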
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
convert_uchar|char|short|ushort|int|uint|long|ulong_sat(double x)
HW supports double to int16 and int32 from IVB on; the others are done in software.
Double to int64 is supported by BDW+; skip it for now and refine it later.
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
|
|
1. Refine APFloat fltSemantics.
2. Refine bitcode read/write header.
3. Refine clang invocation.
4. Refine return llvm::error handler.
5. Refine ilist_iterator usage.
6. Refine CFG Printer pass manager.
7. Refine GEP with pointer type changing.
8. Refine libocl 20 support.
V2: Add missing ocl_sampler.ll and ocl_sampler_20.ll files.
V3: Fix some build problems for llvm36.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
llvm will merge:
%1 = fcmp olt %a, %b
%2 = fcmp ogt %a, %b
%dst = or %1, %2
into
%dst = fcmp one %a, %b
Our own CMP.NE is actually une, so refine fcmp one into CMP.LT, CMP.GT
and OR.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
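The difference between the two predicates only shows up with NaN operands. A small C++ sketch (function names are hypothetical) that mirrors the LLVM semantics: `one` is "ordered and not equal", `une` is "unordered or not equal".

```cpp
#include <cmath>

// fcmp one: false whenever either operand is NaN, because both < and > are false.
bool fcmp_one(double a, double b) {
  return (a < b) || (a > b);
}

// fcmp une: true whenever either operand is NaN. C++ operator!= behaves this way.
bool fcmp_une(double a, double b) {
  return a != b;
}
```

For non-NaN inputs both functions agree, which is why lowering `one` to a `une`-style compare would only misbehave on NaN.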
|
|
LLVM 4.0 is coming, so we should refine our version check to fit the
LLVM_MAJOR_VERSION bump to 4.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
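One way such a check can survive a major-version bump is to fold major and minor into a single comparable number. The sketch below is an assumption about the general approach, not the form used by the patch; `version_code` and `at_least` are hypothetical names, and the encoding assumes the minor number stays below 10.

```cpp
// Hypothetical helper: 3.9 -> 39, 4.0 -> 40.
constexpr int version_code(int major, int minor) {
  return major * 10 + minor;
}

// True if (major, minor) is at least (req_major, req_minor).
constexpr bool at_least(int major, int minor, int req_major, int req_minor) {
  return version_code(major, minor) >= version_code(req_major, req_minor);
}

// A check that compares only the minor number would reject 4.0 when 3.9 is
// required; the combined code accepts it.
static_assert(at_least(4, 0, 3, 9), "4.0 must satisfy a 3.9 requirement");
static_assert(!at_least(3, 8, 3, 9), "3.8 must not satisfy a 3.9 requirement");
```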
|
|
A pointer is not like an array or vector; we should handle it in a
standalone path to fit future changes to PointerType inheritance.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We should not include any llvm header in the ir unit, and we need to add
the missing headers for profiling after deleting the llvm headers.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
LLVM 3.3 or older is no longer supported by Beignet, so we need to delete
this code.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Before gen8, the 3-source instruction has different flag and subflag bits.
V2: Fix the sub flag bit.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We first find regs that have holes in a simple linear scan and save them
in HoleRegPool. When allocating regs, we first search the pool for fitting
candidates and choose the best fit to reuse.
V2: Refine hole reuse only in one block.
V3: Refine data structure with fewer variables, add OCL_REUSE_HOLE_REG to
control the optimization.
V4: Split the patch into instruction ID part and hole reuse, refine the
blockID of the reg.
V5: Refine some variable and function names. Add a check to not spill the
hole regs that have already been used.
V6: Fix some cases where the dst is a partial write.
V7: Fix hole spill dead loop.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
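The "search the pool and pick the best fit" step described above can be illustrated with a generic best-fit search. This is only a sketch under assumptions: the `Hole` fields and `pick_best_fit` are hypothetical, and "best fit" is taken to mean the smallest unused hole that is large enough, which the message does not spell out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one reusable hole in the register file.
struct Hole {
  int reg;    // register that owns the hole
  int size;   // free space available in the hole
  bool used;  // already handed out to another value
};

// Returns the index of the smallest unused hole with size >= needed, or -1.
int pick_best_fit(const std::vector<Hole>& pool, int needed) {
  int best = -1;
  for (std::size_t i = 0; i < pool.size(); ++i) {
    const Hole& h = pool[i];
    if (h.used || h.size < needed) continue;  // taken, or too small
    if (best < 0 || h.size < pool[best].size) best = static_cast<int>(i);
  }
  return best;
}
```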
|
|
In some cases we may use some subnr of a spilled reg, so we need to use
the reg information of the spilled reg in unspill.
V2: Fix some uninit register problems.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
ivb/hsw will split the 32X32 into two simd8 instructions, which introduces
noMask instructions; the if-opt pass shouldn't change the predicate state
for noMask instructions.
v2: fix typo.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
FreeBSD uses libcxxrt (via libc++) instead of GNU libiberty (via
libstdc++) for __cxa_demangle(). When *output_buffer* and *length*
are both NULL, it doesn't modify *status* on success. Rather than rely
on a possibly uninitialized variable, check that the function doesn't
return NULL.
Fixes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213732
Signed-off-by: Jan Beich <jbeich@freebsd.org>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
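A minimal sketch of the pattern described above: decide success from the returned pointer rather than from `status`. The wrapper name `demangle` is hypothetical; `abi::__cxa_demangle` comes from `<cxxabi.h>` as shipped with GCC and Clang toolchains.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled name, or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
  int status = 0;  // passed for completeness, deliberately not consulted
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (out == nullptr) return mangled;  // NULL result is the failure signal
  std::string result(out);
  std::free(out);  // the buffer is malloc()ed by __cxa_demangle
  return result;
}
```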
|
|
Signed-off-by: rander <rander.wang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Signed-off-by: rander <rander.wang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
v2: use static fixBlockSize; no need to set default width/height at IR
level.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
v2: output build option and err if variable set.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
v2: add #define intel_media_block_io in libocl; move extension check
code to this patch;
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Create a w * (3/2 * h) size bo for the whole CL_NV12_INTEL format
surface. The y surface (format CL_R) shares the first w * h part, and
the uv surface (format CL_RG) shares the remaining w * 1/2 h part. Set
the correct bo offset for the uv surface per platform.
v2: add extension define in libocl; fix error check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
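The plane arithmetic above can be sketched as follows. This assumes one byte per luma sample, even width and height, and tightly packed rows; the per-platform offset adjustment mentioned in the message is not modelled. `Nv12Layout` and `nv12_layout` are hypothetical names.

```cpp
#include <cstddef>

// Byte layout of a single buffer holding both NV12 planes.
struct Nv12Layout {
  std::size_t total_size;  // w * h * 3 / 2
  std::size_t y_offset;    // y plane starts at the beginning
  std::size_t y_size;      // w * h
  std::size_t uv_offset;   // uv plane follows the y plane
  std::size_t uv_size;     // w * h / 2 (interleaved u and v)
};

Nv12Layout nv12_layout(std::size_t w, std::size_t h) {
  Nv12Layout l;
  l.y_offset = 0;
  l.y_size = w * h;
  l.uv_offset = l.y_size;
  l.uv_size = w * h / 2;
  l.total_size = l.y_size + l.uv_size;
  return l;
}
```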
|
|
Signed-off-by: rander <rander.wang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
is DW DF
Signed-off-by: rander <rander.wang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Now OWord Block Read disasm is missing, add it with Oword Block Read.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We used to check for unpacked instructions, but then we also ignore
some patterns like:
MOV %1, %2.1
MUL %4, %3, %1
==>
MUL %4, %3, %2.1
Add more checks to keep this kind of optimization.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
If the absolute flag of the SRCs of a MAD instruction is 1, don't use the
compact instruction.
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
The if opt could be an independent pass-like function that checks
instruction state changes and special instructions like I64, mixed bit,
etc. This could reduce the complexity of the structure code.
v2: the GenInstructionState flag/subFlag default value is 0.0, so the
isSimpleBlock function returns false if the insn state uses 0.1 as flag.
This rule makes the function more straightforward; there is no need to
enumerate the special instructions except SEL_OP_SEL_CMP (no predication
per spec).
v3: update code per review comments. Remove duplicate code; redefine
MACRO name; endifOffset rename patch moved to later patchset.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
The if/endif optimization needs to be located after instruction selection
to make the code modular and reduce complexity.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
This allows a single beignet binary to both offer 2.0 where
available, and still work on older hardware.
V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec,
and also likely to be faster).
V3: Only enable OpenCL 2.0 when llvm version is 39.
V4: Only enable OpenCL 2.0 on x64 host.
V5: Always return 32 as address bits.
Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
GEN's div instruction needs several cycles; use the shr instruction
when the divisor is a power-of-2 constant.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
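The strength reduction above can be sketched for the unsigned case, where dividing by 2^k equals a logical right shift by k. The helper names are hypothetical, the divisor is assumed to be non-zero, and signed division would need an extra fix-up for negative values that this sketch does not cover.

```cpp
#include <cstdint>

// True if v has exactly one bit set.
bool is_pow2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// log2 of a power of two, computed by counting shifts.
unsigned log2_pow2(uint32_t v) {
  unsigned n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Unsigned division by a constant: shift when possible, divide otherwise.
uint32_t udiv_const(uint32_t x, uint32_t divisor) {
  if (is_pow2(divisor)) return x >> log2_pow2(divisor);
  return x / divisor;
}
```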
|
|
i32 multiply and i64 multiply need several instructions; use the shl
instruction when one source is a power-of-2 constant.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
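A sketch of the multiply case in plain C++, using 64-bit unsigned arithmetic so that the shift and the multiply wrap identically. `pow2_shift` and `mul_const` are hypothetical names, not taken from the patch.

```cpp
#include <cstdint>

// Returns k if v == 2^k, otherwise -1.
int pow2_shift(uint64_t v) {
  if (v == 0 || (v & (v - 1)) != 0) return -1;
  int n = 0;
  while ((v >>= 1) != 0) ++n;
  return n;
}

// Multiply by a constant: use a left shift when the constant is a power of two.
uint64_t mul_const(uint64_t x, uint64_t c) {
  const int s = pow2_shift(c);
  return s >= 0 ? (x << s) : (x * c);
}
```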
|
|
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
For a 64-bit address, the multiply would expand to several instructions.
Most of the time, the size is a power of 2, so we can use a left shift
instead.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|