|
|
|
|
|
I suspect that this is too conformant to OpenCL 1.2 spec.
|
|
|
|
|
|
|
|
|
|
On some distributions, CMAKE_INSTALL_FULL_LIBDIR or CMAKE_LIBRARY_ARCHITECTURE
may be undefined. To avoid generating an intel-beignet-.icd file name, we need to
drop the extra "-" in that case.
Reported by Igor Gnatenko.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
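The naming rule described above can be sketched outside of CMake as follows. This is only an illustration: `icd_file_name` is a hypothetical helper, and its `arch` parameter stands in for whatever architecture suffix the build system provides.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the build system: builds the ICD file
// name from an optional architecture suffix.
std::string icd_file_name(const std::string& arch) {
    const std::string base = "intel-beignet";
    // Only emit the "-" separator when there is a suffix to separate.
    if (arch.empty()) {
        return base + ".icd";
    }
    return base + "-" + arch + ".icd";
}
```

With an empty suffix this yields intel-beignet.icd rather than intel-beignet-.icd.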
|
|
LLVM 3.6 will give an UNDEF value for NaN. This will cause
the store instruction for UNDEF to be ignored. We need
to change it to NaN here.
Comments from Zhigang:
"
The related commit of why LLVM won't just simply return NaN for such
case is at:
Make the sqrt intrinsic return undef for a negative input.
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html
And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html
The sqrt of a negative number when using the llvm intrinsic is undefined.
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.
This change should not affect any code that isn't using "no-nans-fp-math";
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.
Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either.
We knowingly approve of this difference with the other compilers in an attempt to flag code
that is invoking undefined behavior.
A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093
Differential Revision: http://reviews.llvm.org/D5527
This patch is a workaround for the following scenario:
printf("%f \n", sqrt(-1.0f));
"
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
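A minimal sketch of the idea behind this workaround, assuming a toy constant folder instead of the real LLVM classes. `Folded`, `fold_sqrt` and `fold_sqrt_or_nan` are hypothetical names; the `is_undef` flag stands in for an LLVM undef value.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Toy model of a folded constant: is_undef stands in for LLVM's undef.
struct Folded {
    bool is_undef;
    float value;
};

// Mirrors the quoted LLVM behavior: sqrt of a negative input folds to undef.
Folded fold_sqrt(float x) {
    if (x < 0.0f) {
        return Folded{true, 0.0f};
    }
    return Folded{false, std::sqrt(x)};
}

// The workaround: replace undef with a quiet NaN so that a later store of
// the result still has a concrete value to write.
float fold_sqrt_or_nan(float x) {
    const Folded f = fold_sqrt(x);
    return f.is_undef ? std::numeric_limits<float>::quiet_NaN() : f.value;
}
```

For an input of -1.0f the result is a NaN, which is what printf("%f \n", sqrt(-1.0f)) is expected to print.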
|
|
According to OpenCL 1.2 Rev 17:
"CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a pointer and the referenced type is declared with the restrict or const qualifier. For
example, a kernel argument declared as global int const *x returns CL_KERNEL_ARG_TYPE_CONST but a kernel argument declared as global int *
const x does not."
So we only need to return CL_KERNEL_ARG_TYPE_CONST for pointer arguments.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
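The rule quoted from the spec can be sketched as a small decision function. The `ArgInfo` struct and the `ARG_TYPE_*` constants are local stand-ins introduced for this illustration, not the runtime's actual data structures.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the kernel argument type qualifier bits.
constexpr uint32_t ARG_TYPE_NONE = 0;
constexpr uint32_t ARG_TYPE_CONST = 1u << 0;

// Hypothetical description of one kernel argument.
struct ArgInfo {
    bool is_pointer;     // the argument is a pointer
    bool pointee_const;  // the referenced type is const, e.g. "global int const *x"
    bool self_const;     // the argument itself is const, e.g. "global int * const x"
};

// Only a pointer whose referenced type is const reports the const qualifier.
// Constness of the argument itself is deliberately ignored.
uint32_t arg_type_qualifier(const ArgInfo& a) {
    if (a.is_pointer && a.pointee_const) {
        return ARG_TYPE_CONST;
    }
    return ARG_TYPE_NONE;
}
```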
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The CallInst may contain a bitcast instruction if the function's type
differs from its declaration. Strip the bitcast instruction to get
the real name.
v2: remove printf message.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The LLVM include directory should be specified when LLVM is
not installed in a standard directory.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This is the dev branch for next major release 1.1.0.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
And update document accordingly.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
The bug was introduced when we removed the hacky invalid
register. Now we will not pass in a fixed count of coordinates
for the typed_write instruction.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Clang 3.5 calls CallGraphSCCPass to add the attribute "Attribute::ReadOnly"
to parameters that only read memory, but this attribute is not
supported by the VerifierPass of LLVM 3.3. This is a bug in LLVM 3.3.
v2: disable this extension in runtime for old llvm.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
v2: split into a separate patch.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
It causes a performance regression according to VIZ-5046.
This reverts commit 3c407838c11c52be6f2ccb237884073566ed8c90.
|
|
translate native pow to llvm.pow for fast path.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
translate native mad to llvm.fma.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
translate native rndd to llvm.floor.
v2:
fix ocl_convert.sh.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
translate native rndu to llvm.ceil.
v2:
fix ocl_convert.sh
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
translate native rnde to llvm.rint.
v2:
fix ocl_convert.sh.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
translate native rndz to llvm.trunc.
v2:
fix ocl_convert.h
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
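The four rounding translations (rndd, rndu, rnde, rndz) correspond to the standard floor, ceil, rint and trunc operations. The sketch below shows the expected semantics using the C++ standard library; the wrapper names are shortened for illustration and are not the driver's actual builtin names. Note that `std::rint` follows the current rounding mode, assumed here to be the default round-to-nearest-even.

```cpp
#include <cassert>
#include <cmath>

// Round down, toward negative infinity (llvm.floor).
float rndd(float x) { return std::floor(x); }

// Round up, toward positive infinity (llvm.ceil).
float rndu(float x) { return std::ceil(x); }

// Round to nearest, ties to even under the default rounding mode (llvm.rint).
float rnde(float x) { return std::rint(x); }

// Round toward zero (llvm.trunc).
float rndz(float x) { return std::trunc(x); }
```

The cases that distinguish the four modes are negative inputs and exact halves, for example -2.5f and 2.5f.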
|
|
translate native fabs to llvm.fabs for fast path.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
v2:
fix ocl_geometric.cl.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
As constant propagation will introduce ConstantExpr and GEP instructions,
I chose not to run the constant propagation pass after the RemoveGep pass.
So here we only generate a Multiply when needed.
We may do this kind of optimization at the Gen IR level in the future.
This could fix the performance regression introduced by:
"GBE: Import constantexpr lower pass from pNaCl"
to the opencv case:
opencv_perf_imgproc/OCL_BilateralFixture_Bilateral
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
We add the test case for uniform when doing the bswap.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
We move the bswap logic from llvm_to_gen to the backend for
efficiency, using indirect mode.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
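As a rough illustration of why byte-level addressing suits bswap, the operation can be written as a pure byte permutation with no shifts or masks. This sketch only models the idea with indexed byte reads on the host; it does not reproduce the backend's actual instruction sequence, and `bswap32_bytewise` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte swap expressed as a permutation: each destination byte is read from
// a computed source byte index, which is the kind of access that indirect
// register addressing provides.
uint32_t bswap32_bytewise(uint32_t v) {
    unsigned char src[4];
    unsigned char dst[4];
    std::memcpy(src, &v, sizeof(v));
    for (int i = 0; i < 4; ++i) {
        dst[i] = src[3 - i];  // reversed byte index
    }
    uint32_t out;
    std::memcpy(&out, dst, sizeof(out));
    return out;
}
```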
|
|
Because Gen8 has 16 sub-registers for A0, we can use
them to reduce the number of instructions.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
As the address register, a0 plays a very important role in
indirect-mode access. We add auxiliary functions to set
its content correctly and efficiently.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Add a0_subnr and addr_imm to GenRegister in order to
represent an indirect register, which may be at some
immediate offset from the a0.x sub-register's base address.
Also add the to_indirect1xN helper function to convert a register
to an indirect 1xN register.
V3:
1. Add Gen8 encoder setting.
2. Reorder the patches.
3. Add logic for gen8 context, using 16 a0 sub-registers.
4. Fix some bugs of uniform src.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
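The addressing described above can be sketched as a base taken from one a0 sub-register plus an immediate offset. The field names follow the commit message, but the `IndirectReg` struct and `effective_byte_address` function are assumptions made for this illustration, not the actual GenRegister layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of an indirect register operand.
struct IndirectReg {
    uint8_t a0_subnr;  // which a0 sub-register holds the base byte address
    int16_t addr_imm;  // immediate byte offset added to that base
};

// Computes the byte address the operand refers to, given the current
// contents of the a0 sub-registers.
uint32_t effective_byte_address(const std::vector<uint16_t>& a0, IndirectReg r) {
    assert(r.a0_subnr < a0.size());
    return static_cast<uint32_t>(a0[r.a0_subnr] + r.addr_imm);
}
```

With more sub-registers available, as on Gen8, more independent base addresses can be live at once, so fewer instructions are spent reloading a0.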
|
|
To generate SPIR binary, please refer to the page
https://github.com/KhronosGroup/SPIR.
For llvm3.2, the command is "clang -cc1 -emit-llvm-bc -triple
spir-unknown-unknown -cl-std=CL1.2 -include opencl_spir.h
compiler_ceil.cl -o compiler_ceil32.spir"
For llvm3.5, the option -cl-kernel-arg-info is required,
and option -fno-builtin is required to avoid warning.
v2: add missing load_program_from_spir.cpp file.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
rename "printf" to "__gen_ocl_printf_stub" and "puts" to
"__gen_ocl_puts_stub" in PrintfParser after link.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The SPIR header file requires these functions to be overloadable.
(https://github.com/KhronosGroup/SPIR-Tools/blob/master/headers/opencl_spir.h)
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
SPIR binaries are built by clang, which generates a standard LLVM Module file.
Beignet needs to insert one byte before the module to represent the binary type,
then parse the module to link it.
Enable the cl_khr_spir extension output string;
enable the SPIR calling convention CallingConv::SPIR_KERNEL;
get_global_id should be OVERLOADABLE; fix some bugs in the printf parser
and the backend.
v2: move OVERLOADABLE change to another patch to keep clean;
rename FROM_INTERMEDIATE to FROM_LLVM_SPIR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
|
|
LLVM 3.6 reverted the C API LLVMLinkModules to its LLVM 3.5 form at the last minute. Stay consistent with it.
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
There may be other LLVM users, such as Mesa, and they
may link to a different LLVM library. To avoid this type of
conflict, we use -Bsymbolic to disable symbol preemption.
This patch should fix the build bug at:
https://bugs.freedesktop.org/show_bug.cgi?id=89325
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
I found that some optimization passes may add the fastcall attribute to some
builtin functions. We need to add the corresponding support.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
LLVM 3.6 may generate the following instruction:
%Pivot = icmp slt i1 %trunc49, false
when running the switch lowering pass.
To support it we must use GEN_TYPE_W rather than GEN_TYPE_UW
to represent B, and we also need to remove the corresponding
assertions.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
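The need for a signed type can be seen in a small host-side sketch. It assumes that a true boolean is stored as all ones in a 16-bit lane, which is an assumption made for illustration; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding for this sketch: true is all ones, false is zero.
constexpr uint16_t BOOL_TRUE = 0xFFFF;
constexpr uint16_t BOOL_FALSE = 0x0000;

// "slt" evaluated on signed 16-bit lanes: all ones reads as -1, so
// true < false holds, matching the signed meaning of a 1-bit true.
// The narrowing cast wraps modulo 2^16 on all mainstream compilers.
bool slt_as_signed(uint16_t a, uint16_t b) {
    return static_cast<int16_t>(a) < static_cast<int16_t>(b);
}

// The same comparison on unsigned lanes: all ones reads as 65535, so
// true < false never holds and the result is wrong for "icmp slt i1".
bool slt_as_unsigned(uint16_t a, uint16_t b) {
    return a < b;
}
```

Under this encoding, only the signed interpretation makes `icmp slt i1 %x, false` evaluate to true exactly when %x is true.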
|
|
The backend SEL instruction can support the bool type
since we changed the bool representation to the normal
S16 data type. Now let us remove this assertion
check.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|