Age | Commit message (Collapse) | Author | Files | Lines |
|
There are some step to handle device enqueue:
1. allocate the device enqueue bo to store the device enqueue
information for parent kernel. Add must convert all global buffers to
SVM buffers to make sure the child kernels have the same GPU address.
2. When flush the command, check whether have device enqueue or not. If
has device enqueue, must wait finish and parse the device enqueue info.
3. Start the child ndrange according the device enqueue info, and the
parent's global buffers as the exec info.
Because of non uniform workgroup size, one enqueue api will flush
serveral times, but device enqueue only need handle once, so add a flag
to function cl_command_queue_flush to indicate the last flush.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
In future there may be more than one device on the platform,
we need to consider multi devices within one context.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
V2: Remove check for device queue and add device queue flag.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The extra exec info need reloc, otherwize gpu can't read/write.
And it don't need set to curbe.
So reloc it to unused binding table.
Signed-off-by: Yang Rong <rong.r.yang at intel.com>
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
The CL object with ref == 0 should be considered
as a invalid object.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Rewrite the cl_event, and modify all the event functions
using this new event manner. Event will co-operate with
command queue's thread together.
v2:
Fix a logic problem in event create failed.
V3:
Set enqueue default to do nothing, handle some enqueue has nothing
to do.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
According to Spec, event should be more suitable to implement
in async mode. We now add a thread to each command queue to
handle the event commands. The basic idea is:
1. If a command depends on other events which are not COMPLETED,
this command must be queued to that thread. Every event's status
change will notify the command queue, and give that thread a
chance to dequeue and run the enqueued commands.
2. For some BLOCK API, such as MapBuffer with BLOCK flag set, we
will wait for all the events in wait list ready and execute it
in sync mode, no event will be queued to that thread.
3. For NDRange like commands, because we want to gain the best
performance, we will check its wait list, if all are COMPLETED,
we SUBMIT that NDRange command, and set the event to SUBMUTTED
status. Event will also be queued to that thread, and that
thread will wait for it COMPLETED.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
We use context's lock when we add and delete cl objects.
Every cl object should use it's own lock to protect itself.
We also add some helper functions to ease the adding and
removing operations.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt.
There is a request to make Beignet as intermedia layer of CMRT, in
other words, application programer write OpenCL APIs to execute the
CM kernel on GPU, the following shows the key code, and please refer
to the next patch of unit test for detail.
prog = clCreateProgramWithBinary("cm kernel");
clBuildProgram(prog);
kernel = clCreateKernel(prog, "kernel name");
image = clCreateImage();
clSetKernelArg(kernel, image);
clEnqueueNDRangeKernel(kernel);
Inside Beignet, once cm kernel is invoked, the following relative APIs
will be directly passed to CMRT library (libcmrt.so) which is loaded
via dlopen only when necessary. Since we use this simple method to
keep the code clean, OpenCL spec is not strictly followed, and cl_event
is not supported for this case.
v2: add comments about the cm queue in fuction cmrt_enqueue
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
last_event and current_event should be thread private data.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to
linux drm driver layer. We need to check the return value of that function,
as it may fail when the application uses very large array.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
Remove pretty old fulsim code which seems having no users
also used interfaces not in open source libdrm, and call
windows fulsim binary instead of linux.
We will use current libdrm interface instead.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
To make the license statement consistent to each other, adjust
all license versions to v2.1+. Thus beignet should have a pure
LGPL v2.1+ license.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
When pending a command queue, we need to record the whole gpgpu
structure not just the batch buffer. For the following reason:
1. We need to keep those private buffer, for example those printf buffers.
2. We need to make sure this gpgpu will not be reused by other enqueuement.
v2:
Don't try to flush all user event attached to the queue.
Just need to flush the current event when doing command queue flush.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
This command blocks command execution, that is, any following commands
enqueued after it do not execute until it completes;
API clEnqueueMarkerWithWaitList patch didn't push the latest, update in
this patch.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Signed-off-by: Luo <xionghu.luo@intel.com>
Conflicts:
src/cl_event.c
|
|
We find some cases will use multi-threads to run on the same queue,
executing the same kernel. This will cause the gpgpu struct which
is very important for GPU context setting be destroyed because we
do not implement any sync protect on it now.
Move the gpgpu struct into thread specific space will fix this problem
because the lib_drm will do the GPU command serialization for us.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Zou, Nanhai" <nanhai.zou@intel.com>
|
|
Add some event info to cl_command_queue.
One is non-complete user events, used to block marker event and barrier.
After these events become CL_COMPLETE, the events blocked by these events also
become CL_COMPLETE, so marker event will also set to CL_COMPLETE. If there is no
user events, need wait last event complete and set marker event to complete.
Add barrier_index, for clEnqueueBarrier, point to user events, indicate the enqueue
apis follow clEnqueueBarrier should wait on how many user events.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Currently, simply allocate enough graphics memory as constant memory space.
And bind it to bti 2. Constant cache read are backed by dword scatter read.
Different from other data port messages, the address need to be dword aligned,
and the addresses are in units of dword.
The constant address space data are placed in order: first global constant,
then the constant buffer kernel argument.
v2: change function & variable naming, to make clear 'curbe' and 'constant buffer'
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Though we support get the CL_QUEUE_PROPERTIES,
but because the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
and CL_QUEUE_PROFILING_ENABLE will never be set when
create the queue, it just return a all 0 bitfield now.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
|
|
The previous implementation forget to set the last batch
buffer, so it always does nothing at clFinish(). Now we
move the gpu sync to driver side, and set the last batch
to proper buffer object and make clFinish work as expected.
Reported and tested by: Edward Ching <edward.k.ching@gmail.com>
Signed-off-by: Zou, Nanhai <nanhai.zou@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yi Sun <yi.sun@intel.com>
|
|
This commit does two major things as below:
1. Allocate image surface at compile time, and add new gbe interfaces to let runtime know
how many image surfaces we have, and the image allocation informations. Thus the runtime
library know how to bind those image surfaces.
2. As now for both image and sampler, at compile time, we know the eaxct binding table
index. We no longer need to get those index from the input argument(curbe) and prepare
the desc to the architecture register. We can use imm as the desc thus we can save
4 out of 4 instructions for SampleInstruction and save 2 out of 12 instructions for
the TypedWriteInstruction.
This patch is also a major prepartion for the get_image_width/height/... functions.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
|
|
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
This adds a pointer to the dispatch table at the beginning of every object
of type
- cl_command_queue
- cl_context
- cl_device_id
- cl_event
- cl_kernel
- cl_mem
- cl_platform_id
- cl_program
- cl_sampler
as required by the ICD specification. The layout of the dispatch table
comes from the OpenCL ICD loader by Brice Videau <brice.videau@imag.fr> and
Vincent Danjean <Vincent.Danjean@ens-lyon.org>.
To avoid dispatch table entries being overwritten with the ICD loader's
implementations of the CL functions (as would be the proper behaviour for
the ELF loader), the -Bsymbolic option is given to the linker.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
intel/) and replaced them by dependencies to the call backs. This should allow an easier integration with the performance simulator
|
|
|
|
|
|
|
|
|
|
|