EXEC_PROGRAM may truncate its output variable in some cases, so the
generated unit test cases can be incorrect, which sometimes breaks the
configuration.
This patch replaces every use of the deprecated EXEC_PROGRAM with
EXECUTE_PROCESS, which does not truncate the command's output variable.
v2:
fix the error in examples/CMakeLists.txt.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
|
|
Compare with tgamma instead of tgammaf for better accuracy.
Include negative inputs, and handle the resulting denormals.
Print maximum error found.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Make the build scripts work in both Python 2 and Python 3.
(CMake prefers Python 2 if both are available, but will use
Python 3 if only it is installed.)
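As a rough illustration of what "works in both" means in practice, the sketch below uses only constructs that Python 2.7 and Python 3 interpret the same way. The function name and the sample table are hypothetical and are not taken from the Beignet build scripts.

```python
import sys

def emit_defines(bit_widths):
    """Return C '#define' lines giving the byte size of each named type.

    Only constructs that behave the same in Python 2.7 and Python 3 are used.
    """
    lines = []
    # sorted(d.items()) works in both versions; d.iteritems() is Python 2 only.
    for name, bits in sorted(bit_widths.items()):
        # '//' is floor division in both versions, while '/' changed meaning.
        lines.append("#define SIZEOF_%s %d" % (name, bits // 8))
    return lines

if __name__ == "__main__":
    # print(...) with a single argument behaves identically in both versions.
    for line in emit_defines({"INT": 32, "CHAR": 8}):
        print(line)
    print("running on Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
```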
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
The old tgamma=exp(lgamma) implementation had high rounding error on
large outputs, exceeding the 16ulp specification for approx. x>8
(hence the test failure in strict conformance mode).
Replace this with an implementation based on glibc's
http://sources.debian.net/src/glibc/2.19-17/sysdeps/ieee754/flt-32/e_gammaf_r.c/
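The reason exp(lgamma(x)) degrades on large outputs is that an absolute rounding error in lgamma(x) turns into a relative error in the result, and that absolute error grows with the magnitude of lgamma(x). The sketch below models this with float32 rounding emulated through the struct module and math.gamma as a double precision reference. It is an illustration under those assumptions, not Beignet's code, and it only covers positive x.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ulp_f32(x):
    """Spacing between adjacent float32 values near x (normal range only)."""
    _, exponent = math.frexp(abs(x))   # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 24)

def tgamma_via_lgamma(x):
    """Model the exp(lgamma(x)) approach with float32 intermediate results."""
    log_gamma = to_f32(math.lgamma(x))
    return to_f32(math.exp(log_gamma))

def error_in_ulps(x):
    """Error of the modelled result, measured in float32 ulps of the exact value."""
    exact = math.gamma(x)              # double precision reference
    return abs(tgamma_via_lgamma(x) - exact) / ulp_f32(exact)

def worst_error(lo, hi, steps):
    """Largest error in ulps over evenly spaced float32 inputs in [lo, hi]."""
    xs = [to_f32(lo + (hi - lo) * k / float(steps)) for k in range(steps + 1)]
    return max(error_in_ulps(x) for x in xs)

if __name__ == "__main__":
    for lo, hi in ((2.0, 4.0), (8.0, 16.0), (28.0, 34.0)):
        print("x in [%g, %g]: worst error %.1f ulp" % (lo, hi, worst_error(lo, hi, 60)))
```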
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
0.01**20.5 is denormal; at least Ivy Bridge does not support
denormals and hence returns 0. As this is allowed by the
OpenCL standard, it shouldn't fail the test.
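A minimal sketch of the check this implies, assuming a test that compares a device result against a reference value. The helper names and the relative tolerance are hypothetical; only the flush-to-zero allowance comes from the message above.

```python
import struct

FLT_MIN = 2.0 ** -126   # smallest positive normal float32 value

def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def flush_denormal(x):
    """Model a device without denormal support: tiny results become 0."""
    return 0.0 if abs(x) < FLT_MIN else x

def result_ok(expected, actual, rel_tol=1e-5):
    """Accept a result that is close to expected, or a denormal flushed to 0."""
    if abs(expected) < FLT_MIN and actual == 0.0:
        return True
    return abs(actual - expected) <= rel_tol * abs(expected)

if __name__ == "__main__":
    expected = to_f32(0.01 ** 20.5)          # about 1e-41, below FLT_MIN
    device_result = flush_denormal(expected)
    print(expected < FLT_MIN, device_result, result_ok(expected, device_result))
```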
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Reflect recent beignet and Linux changes.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Run a small __local-using kernel in clGetDeviceIDs; if this returns
the wrong result, return CL_DEVICE_NOT_FOUND.
As far as I can see, there's no way to tell in advance (except
unreliably with a global version check) whether __local-using batches
will be accepted...so the easiest solution is probably to just try
running one and see what result we get.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
We will defer the fix of this known issue to 1.1.0. Let's document it
before that.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
On the HSW platform, because of the register usage related to atomics in L3,
we always need to disable the cmd parser.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Although the recommended LLVM version is 3.5, Beignet still supports
building with LLVM 3.3/3.4.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Mengmeng Meng <mengmeng.meng@intel.com>
|
|
In uniform mode, we should set simd width to 1 and set noMask bit.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Using insertPos is good enough. Using --insertPos has a potential
issue: when insertPos is the head of a list, it triggers an
assertion.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
When we delete an old instruction and replace it with a new one, we only
call the LLVM IR replace function. That is not sufficient for the
scalarize pass, because it also keeps local references in the vecVals
map, and all of those local references need to be replaced as well.
Otherwise, the deleted values may be used by subsequent instructions,
which causes a fatal error later.
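The idea can be sketched as follows, with plain dictionaries standing in for LLVM instructions and for the vecVals map. All names here are hypothetical; the real pass works on LLVM Value pointers.

```python
def replace_all_uses(instructions, vec_vals, old, new):
    """Replace 'old' with 'new' in the IR and in the pass's own side map."""
    # Step 1: the IR-level replacement, which only rewrites instruction operands.
    for inst in instructions:
        inst["operands"] = [new if op == old else op for op in inst["operands"]]
    # Step 2: the extra bookkeeping. Without it the side map keeps stale
    # references to the deleted value.
    for key in list(vec_vals):
        vec_vals[key] = [new if v == old else v for v in vec_vals[key]]
    if old in vec_vals:
        vec_vals[new] = vec_vals.pop(old)

if __name__ == "__main__":
    insts = [{"name": "%c", "operands": ["%a", "%b"]}]
    vec_vals = {"%v": ["%a", "%x"]}
    replace_all_uses(insts, vec_vals, "%a", "%a.new")
    print(insts, vec_vals)
```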
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Private constant buffer support sometimes mixes the private address space
with the constant address space. More generally, data in the constant
address space may be placed in the private address space at the LLVM IR
layer, as in the following code:
__kernel ...
{
const int2 foo[] = {{0, 1}, {2, 3}};
int2 data = foo[get_global_id(0) % 2];
}
foo is in the private address space, but the Gen backend ends up using the
__constant BTI to access it. The code above causes an assertion failure
in the Gen instruction selection stage, because it generates a vector load
instruction on a __constant buffer.
So we should use the actual BTI data to determine a pointer's address space
rather than taking it from the LLVM IR layer.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
If a front end label IP exceeds 0xffff, the backend uses a real
DW to represent each block's IP address. This behaviour is
dynamic and depends on the actual number of front end
labels.
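A small sketch of the dynamic choice, assuming labels are encoded as little-endian unsigned integers. The function names and the byte layout are hypothetical and only illustrate switching from a 16-bit word to a 32-bit DW once a label exceeds 0xffff.

```python
import struct

def label_format(max_label):
    """Pick a 16-bit word unless some label needs a full 32-bit DW."""
    return "<I" if max_label > 0xFFFF else "<H"

def encode_labels(labels):
    """Encode every label with the width required by the largest one.

    'labels' must be non-empty.
    """
    fmt = label_format(max(labels))
    return b"".join(struct.pack(fmt, label) for label in labels)

if __name__ == "__main__":
    print(len(encode_labels([0, 1, 0xFFFF])))    # three 16-bit labels
    print(len(encode_labels([0, 1, 0x10000])))   # three 32-bit labels
```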
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
The front end label is still 16 bit. But the auxiliary
label could be larger than that. This is the preparation
to support 32 bit label for both front end and backend.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
For some extremely large kernels, these values may be larger than
0xFFFF, so we have to extend them to 32 bits.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Hard-coded uint16_t should not be used for the safe type definitions.
Prepare to extend some types to uint32_t.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Beignet uses drm_intel_gem_bo_context_exec() to flush the command queue to
the Linux DRM driver layer. We need to check the return value of that function,
as it may fail when the application uses a very large array.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
All gen registers should get their uniform information from the
corresponding virtual registers. Using GenRegister::xxxgrf on a
virtual register is dangerous and may cause inconsistency.
This patch eliminates all uses of it in the gen_insn_selection stage.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Otherwise, it may cause segfault in instruction encoding stage.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
CL_COMPLETE + thread safety for callbacks
clEnqueueReadBuffer is always processed synchronously, so the command was
marked CL_COMPLETE every time. If the event it returned was then used to
register a callback, the registration did not check whether the callback
needed to be executed immediately.
Modified the callback registration in cl_set_event_callback to call the
callback being created only if its status has already been reached.
Added thread safety measures for pfn_notify calls, since the status value
can change while a callback is executing.
Grouped the pfn_notify calls into a unified function, cl_event_call_callback,
that handles thread safety: it queues callbacks in a node list while holding
the pthread_mutex and then calls them after releasing it. This is required
because a callback can deadlock if it calls a cl_api function that uses the
mutex.
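The pattern can be sketched as follows, with a non-reentrant threading.Lock standing in for the pthread mutex. The Event class and its methods are hypothetical and much simpler than the real cl_event code; for brevity a callback fires only on an exact status match.

```python
import threading

CL_COMPLETE = 0   # value defined by the OpenCL headers
CL_QUEUED = 3

class Event(object):
    def __init__(self):
        self._lock = threading.Lock()   # non-reentrant, like a plain mutex
        self._status = CL_QUEUED
        self._callbacks = []            # pending (trigger_status, function) pairs

    def get_status(self):
        """Stands in for an API entry point that takes the same lock."""
        with self._lock:
            return self._status

    def set_callback(self, trigger, fn):
        with self._lock:
            already_reached = (self._status == trigger)
            if not already_reached:
                self._callbacks.append((trigger, fn))
        # If the status was reached before registration, call right away,
        # after the lock has been released.
        if already_reached:
            fn(self, trigger)

    def set_status(self, status):
        with self._lock:
            self._status = status
            # Collect the callbacks to run while holding the lock ...
            ready = [cb for cb in self._callbacks if cb[0] == status]
            self._callbacks = [cb for cb in self._callbacks if cb[0] != status]
        # ... and call them only after it is released, so a callback may
        # safely call back into get_status() without deadlocking.
        for trigger, fn in ready:
            fn(self, trigger)

if __name__ == "__main__":
    ev = Event()
    ev.set_callback(CL_COMPLETE, lambda e, s: print("status", e.get_status()))
    ev.set_status(CL_COMPLETE)
```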
Signed-off-by: David Couturier <david.couturier@polymtl.ca>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
If /dev/dri/cardX is inaccessible, return CL_DEVICE_NOT_FOUND,
don't assert-fail.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
As beignet now works with LLVM/Clang 3.6, accept this version
when searching for llvm-config.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Compile the OpenCL standard library with the same version of clang
as will compile OpenCL user code, not plain "clang" (i.e. the
system default version, which may be different).
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
As presently written, a 'make package' will attempt to INSTALL the
Beignet ICD loader to /etc/OpenCL/vendors whereas it should just
do a local install and then package the file. The proposed change instructs
CPack to include the `DESTDIR` variable when it calls `make install`, thus
directing the destination for the ICD loader to a local directory instead
of a system path.
Signed-off-by: Brian Kloppenborg <brian@arrayfire.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
LLVM 3.6 yields an UNDEF value where NaN is expected. This causes
the store instruction for the UNDEF value to be ignored, so we need
to change it to NaN here.
Comments from Zhigang:
"
The LLVM commit that explains why LLVM does not simply return NaN in
this case is:
Make the sqrt intrinsic return undef for a negative input.
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html
And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html
The sqrt of a negative number when using the llvm intrinsic is undefined.
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.
This change should not affect any code that isn't using "no-nans-fp-math";
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.
Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either.
We knowingly approve of this difference with the other compilers in an attempt to flag code
that is invoking undefined behavior.
A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093
Differential Revision: http://reviews.llvm.org/D5527
This patch is a workaround for the following scenario:
printf("%f \n", sqrt(-1.0f));
"
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
According to OpenCL 1.2 Rev 17:
"CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a pointer and the referenced type is declared with the restrict or const qualifier. For
example, a kernel argument declared as global int const *x returns CL_KERNEL_ARG_TYPE_CONST but a kernel argument declared as global int *
const x does not."
So we only need to return CL_KERNEL_ARG_TYPE_CONST for pointer arguments.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Weng, Chuanbo" <chuanbo.weng@intel.com>
|
|
On some distributions, CMAKE_INSTALL_FULL_LIBDIR or CMAKE_LIBRARY_ARCHITECTURE
may be undefined. To avoid generating the file name intel-beignet-.icd, we need
to get rid of the extra "-" in that case.
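The intended naming rule can be sketched like this. The helper is hypothetical (the real logic lives in the CMake scripts), and it assumes the suffix is an architecture string that may be empty or missing.

```python
def icd_file_name(arch):
    """Build the ICD file name, omitting the '-' when no suffix is known."""
    base = "intel-beignet"
    if arch:
        return "%s-%s.icd" % (base, arch)
    return base + ".icd"

if __name__ == "__main__":
    print(icd_file_name("x86_64-linux-gnu"))
    print(icd_file_name(""))
```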
Reported by Igor Gnatenko.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
And update document accordingly.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
Constant propagation introduces constantExpr and gep instructions, so
I chose not to run the constant propagation pass after the RemoveGep pass.
Instead, we only generate a Multiply here when it is needed.
We may do such kind of optimization in Gen IR level in the future.
This could fix the performance regression introduced by:
"GBE: Import constantexpr lower pass from pNaCl"
to the opencv case:
opencv_perf_imgproc/OCL_BilateralFixture_Bilateral
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
|
|
LLVM 3.6 reverted the C API LLVMLinkModules to its LLVM 3.5 form at the last minute. Stay consistent with it.
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
There may be other LLVM users, such as Mesa, that link to a
different LLVM library. To avoid this type of conflict, we use
-Bsymbolic to disable symbol preemption.
This patch should fix the build bug at:
https://bugs.freedesktop.org/show_bug.cgi?id=89325
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Some optimization passes may add the fastcall attribute to some
builtin functions. We need to add the corresponding support.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
LLVM 3.6 may generate the following instructions:
%Pivot = icmp slt i1 %trunc49, false
during the switch lowering pass.
To support it we must use GEN_TYPE_W to represent B rather
than GEN_TYPE_UW and we also need to remove the corresponding
assertions.
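Why signedness matters here can be sketched with plain integers. Under LLVM's signed interpretation the single bit of an i1 is its sign bit, so true reads as -1 and `icmp slt i1 %x, false` is true exactly when %x is true. The 16-bit widening below is an assumption about how a boolean might be held in a word register, not a description of the actual Gen encoding.

```python
def as_signed(value, bits):
    """Interpret the low 'bits' bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def as_unsigned(value, bits):
    """Interpret the low 'bits' bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)

def icmp_slt_i1(lhs, rhs):
    """Signed less-than on 1-bit values, following LLVM's icmp slt."""
    return as_signed(lhs, 1) < as_signed(rhs, 1)

if __name__ == "__main__":
    true_word = as_signed(1, 1) & 0xFFFF   # i1 true sign-extended to 16 bits
    print(icmp_slt_i1(1, 0))               # signed 1-bit compare
    print(as_signed(true_word, 16) < 0)    # compare as a signed word
    print(as_unsigned(true_word, 16) < 0)  # compare as an unsigned word
```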
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|