Age | Commit message | Author | Files | Lines |
|
This patch is based on Rebecca's patch at:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090.
It also fixes another bug: the tests should not use absolute error checking.
They should compare in ULPs and take the strict or non-strict
conformance state into account.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
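A minimal sketch of ULP-based checking as opposed to an absolute-error threshold. The helper names and the two tolerance values are hypothetical placeholders, not taken from the patch; the real limits depend on the function under test and on the conformance mode.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that grows monotonically with
 * the float's value, so that adjacent floats differ by exactly 1. */
static int64_t float_to_ordered(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* Negative floats are stored sign-magnitude; mirror them below zero. */
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Distance between two finite floats, counted in representable values. */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}

/* Accept a result if it lies within max_ulp of the reference value. */
static int within_ulp(float result, float expected, int64_t max_ulp)
{
    if (isnan(result) || isnan(expected))
        return isnan(result) && isnan(expected);
    return ulp_distance(result, expected) <= max_ulp;
}

/* Hypothetical tolerances: a tight bound in strict conformance mode and a
 * looser one otherwise. Both numbers are placeholders for illustration. */
static int check_result(float result, float expected, int strict)
{
    return within_ulp(result, expected, strict ? 16 : 8192);
}
```

Unlike a fixed absolute threshold, the same ULP bound stays meaningful for both very small and very large expected values.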
|
|
tgamma was actually computing lgamma, a related but very different function.
This patch is from:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
erf/erfc diverge (instead of converging to 1 or 0) for arguments above
about 2.
This patch is from:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
pow/pown ignore the sign of their first argument (e.g. pow(-2,3) gives
8 instead of -8)
This patch is from:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Add support for the following OPs:
FCmp/ICmp/FPToSI/FPToUI/SIToFP/UIToFP.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
cmake interprets OCL_PCM_PATH=... as a command and will enclose it in
quotes if it contains characters requiring protection, e.g. ~.
A quoted "FOO=bar" is interpreted by /bin/sh as a command (that does not
exist), not as a variable setting for the following command.
Use env to set the variables unambiguously.
Signed-off-by: Andreas Beckmann <anbe@debian.org>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Andreas Beckmann <anbe@debian.org>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
For 0.9.x, we only support building with GCC.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
It seems that the hardware returns a wrong result when y is equal to
0x80000000 in sub_sat(int x, int y). So we rewrite it as:
add_sat(add_sat(0x7fffffff, x), 1)
Also enable the corresponding utest.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
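A host-side sketch of why the rewrite is equivalent, assuming it is applied only when y is 0x80000000. The function names are hypothetical and 64-bit arithmetic stands in for the saturating hardware instructions.

```c
#include <stdint.h>

/* Reference saturating add, computed in 64 bits and clamped to int32. */
static int32_t add_sat_i32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Saturating subtract. x - INT32_MIN equals x + 2^31, which is split into
 * (x + 0x7fffffff) + 1 so that each step fits a saturating add. */
static int32_t sub_sat_i32(int32_t x, int32_t y)
{
    if (y == INT32_MIN)
        return add_sat_i32(add_sat_i32(INT32_MAX, x), 1);

    int64_t d = (int64_t)x - (int64_t)y;
    if (d > INT32_MAX) return INT32_MAX;
    if (d < INT32_MIN) return INT32_MIN;
    return (int32_t)d;
}
```

For any x >= 0 the inner add already saturates, so the result is 0x7fffffff; for negative x the two adds give x + 2^31 exactly.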
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
work_group_size_hint should define another variable.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The backend needs to return the kernel function attribute string to
clGetKernelInfo.
There are 3 kinds of function attributes so far; the vec_type_hint parameter
cannot be returned because LLVM lacks that information.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
According to the spec, the build status:
Returns the build, compile or link status,
whichever was performed last on program for
device.
The previous implementation only considered clBuildProgram and
did not consider compilation. Fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
If the binary is an executable type, the first byte is zero and
we need to set the binary type correctly to CL_PROGRAM_BINARY_TYPE_EXECUTABLE.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
If we want to link multiple files together, and a kernel
function needs to refer to kernel functions in other files,
we must not mark those functions as link-once.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
LunarGLASS has updated its copyright, so update the copyright in llvm_scalarize.cpp.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Actually, we do not fully support double yet.
Let's disable it for now. This introduces a small incompatibility with the 1.2 spec,
which does not require the kernel to use the following pragma to enable fp64:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
If an application wants to try the partially supported double with beignet
under OpenCL 1.2, it still needs to add the above pragma.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
When we fail to compile a module, fileName may be NULL, so we can't
access it unconditionally.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
The memory object should be checked for validity against the context's buffers before being set as a kernel argument.
v2: rename the function from mem_in_buffers to is_valid_mem, and move the
magic header check into it.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Add the CL_KERNEL_GLOBAL_WORK_SIZE option for clGetKernelWorkGroupInfo.
v2: should return the max global work size instead of the current work size.
This function needs to return CL_INVALID_VALUE if the device is not a custom
device or the kernel is not a built-in kernel.
We have 3 kinds of built-in kernels for 1d/2d/3d memories; the max global
work size is decided by the dimension and memory type.
The piglit failure is caused by calling non-built-in kernels, so a patch
needs to be sent to piglit later.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Actually, Clang does accept this option and we should not
filter it out. We also change the default option used to create
the PCH file to -cl-std=CL1.2. If the user passes in CL1.1,
we have to disable PCH.
Another change is that for CL1.2 we enable
cl_khr_fp64 by default, since from CL1.2 on this extension should
be enabled by default.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
The -cl-std= option specifies the minimum version needed to compile
the source code provided to our API. So we need to
check it early and return failure if our platform's
version cannot meet the request. In the backend, we
just ignore this command line option.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
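A minimal sketch of such an early check, assuming the version is encoded the same way as __OPENCL_VERSION__ (120 for OpenCL 1.2). parse_cl_std and cl_std_supported are hypothetical helpers, not the functions added by the patch.

```c
#include <stdio.h>
#include <string.h>

/* Extract the requested version from a build option string.
 * Returns e.g. 120 for "-cl-std=CL1.2", 0 if the option is absent,
 * and -1 if the option is present but malformed. */
static int parse_cl_std(const char *options)
{
    const char *p = options ? strstr(options, "-cl-std=") : NULL;
    int major = 0, minor = 0;

    if (p == NULL)
        return 0;
    if (sscanf(p, "-cl-std=CL%d.%d", &major, &minor) != 2)
        return -1;
    return major * 100 + minor * 10;
}

/* The request can be met only if it does not exceed the platform version. */
static int cl_std_supported(const char *options, int platform_version)
{
    int requested = parse_cl_std(options);
    return requested >= 0 && requested <= platform_version;
}
```

Doing this before invoking the compiler lets the API report the failure directly instead of relying on the backend to reject the option.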
|
|
It seems that this function is required by the latest PyOpenCL.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Now beignet is a pure OpenCL 1.2 implementation.
Set some predefined macros correctly:
__OPENCL_C_VERSION__ and __OPENCL_VERSION__ should
be 120 by default.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
add pointer check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Reported-by: Jérôme
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
needNewBTI is state that is only valid for the current candidate,
so it needs to be reset to its default value for each candidate.
This fixes the regression in OpenCV 3.0:
./opencv_perf_objdetect OCL_Cascade_Image_MinSize_CascadeClassifier.CascadeClassifier
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Clear to zero to avoid garbage data, as we do not
assign it later for local/constant memory access.
v2: move initialization code into constructor.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Due to a hardware limitation on Gen7/Gen75 when sampling an
integer image1Darray surface with clamp address mode and
nearest filter mode, we have to bind
one buffer to a bti. The previous implementation hard
coded it to 128 + original index, and the check for
this type of bti in the driver layer assumed that the bti reserved
is 3, which is wrong now.
This patch fixes those hard coded functions and uses the
macros defined in program.h.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
When there are multiple printf statements in multiple kernel
functions within the same translation unit and they have
the same string parameter, Clang will generate just
one global string named .strXXX to represent that string.
So when translating the kernel to gen, we cannot unref
that global var. Just ignore it to avoid an assert.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
1. Move the bti/Register map from gbe::Context to ir::Function.
2. Use a GlobalVariable instead of a 'call' to get the base address of the internal buffer (used for printf).
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Previously, we simply mapped a 2G surface for memory access,
which is an obvious security issue: a user can easily read/write graphics
memory that does not belong to them. To prevent this kind of behaviour,
we bind each surface to a dedicated bti. The hardware provides automatic
bounds checking. An out-of-bounds write is ignored, and an
out-of-bounds read simply returns zero.
The idea behind the patch is that, for a load/store instruction, we search
through the LLVM use-def chain until we find where the address
comes from. The bti is then saved in ir::Instruction and used for
the later code generation. In the mixed pointer case, a load/store
will access more than one bti.
To simplify some code, '0' is reserved for the constant address space and
'1' is reserved for the private address space. Other btis are assigned
automatically by the backend.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
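A small host-side model of the bounds behaviour described above, for illustration only. surface_t and the two accessors are hypothetical; on the real hardware the check is done by the surface state bound to each bti, not by software.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one surface bound to a dedicated bti. */
typedef struct {
    uint32_t *data;   /* backing storage of this surface only */
    size_t    count;  /* number of 32-bit elements in the surface */
} surface_t;

/* An out-of-bounds read returns zero instead of foreign memory. */
static uint32_t surface_read(const surface_t *s, size_t index)
{
    return index < s->count ? s->data[index] : 0u;
}

/* An out-of-bounds write is silently dropped. */
static void surface_write(surface_t *s, size_t index, uint32_t value)
{
    if (index < s->count)
        s->data[index] = value;
}
```

With a single 2G surface there is no per-buffer limit to check against, which is why each buffer needs its own bti for the bounds check to be useful.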
|
|
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
|
|
Integer types wider than 64 bits are hard to handle on Gen.
Let's try to prevent the ScalarReplAggregates pass from generating
such integer types.
v2:
fix compilation error with LLVM 3.3.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
The fast path trades some accuracy for speed; it must not
produce wrong results. rootn has many special inputs that need
to be handled before we call the native pow directly.
This patch fixes all the pow-related failures in the OpenCV 3.0 test
suite.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Similar to the bug found by Junyan, some events are
accessed before being assigned.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
1. Fix an INSERT_REGINSERT_REG typo.
2. Release main_buf in utest sub_buffer_check.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
1. Some systems don't define the ulong type; use unsigned long instead.
2. Use sA, sB... instead of sa, sb... to access vector 16, because sa, sb sometimes cause a clang error.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|