path: root/src/cl_command_queue.h
Age | Commit message | Author | Files | Lines
2016-12-30 | OCL20: handle device enqueue in runtime. | Yang, Rong R | 1 | -4/+9
Device enqueue is handled in several steps: 1. Allocate the device enqueue bo to store the device enqueue information for the parent kernel, and convert all global buffers to SVM buffers so that the child kernels see the same GPU addresses. 2. When flushing the command, check whether there is a device enqueue; if so, wait for the parent kernel to finish and parse the device enqueue info. 3. Start the child NDRange according to the device enqueue info, with the parent's global buffers as the exec info. Because of non-uniform work-group sizes, one enqueue API call may flush several times, but the device enqueue only needs to be handled once, so a flag is added to cl_command_queue_flush to indicate the last flush. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
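To illustrate the "last flush" idea from this commit, here is a minimal C sketch. It is not Beignet's actual code: the struct, field, and function names below are hypothetical, and the real cl_command_queue_flush has a different signature.

    /* Hypothetical sketch of the last-flush flag: device enqueue work is only
     * parsed on the final flush of a (possibly split) enqueue call. */
    struct fake_queue {
      int has_device_enqueue;   /* set when the parent kernel performed a device enqueue */
    };

    static int fake_queue_flush(struct fake_queue *q, int is_last_flush)
    {
      /* ... submit the batch for this slice of the non-uniform ND-range ... */
      if (is_last_flush && q->has_device_enqueue) {
        /* Wait for the parent kernel to finish, then parse the device enqueue
         * buffer and launch the child ND-ranges, passing the parent's global
         * (now SVM) buffers as exec info. */
      }
      return 0;
    }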
2016-12-19 | Move clCreateCommandQueueWithProperties API to command_queue file. | Junyan He | 1 | -12/+2
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16 | refine clCreateCommandQueue and clRetainCommandQueue. | Junyan He | 1 | -7/+4
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-12-16 | Add multi-device support in context. | Junyan He | 1 | -0/+1
In the future there may be more than one device on the platform, so we need to support multiple devices within one context. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-11-08 | Runtime: Add support for queue size and fix error handling | Pan Xiuli | 1 | -0/+1
V2: Remove check for device queue and add device queue flag. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
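For reference, this is roughly how an application requests an on-device queue with an explicit size through the standard OpenCL 2.0 API; a minimal sketch, assuming ctx and dev are a valid context and device, with error handling abbreviated.

    #include <CL/cl.h>

    /* Create a default on-device queue; device queues must be out-of-order. */
    cl_command_queue create_device_queue(cl_context ctx, cl_device_id dev)
    {
      cl_int err;
      cl_queue_properties props[] = {
        CL_QUEUE_PROPERTIES,
        CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT |
        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
        CL_QUEUE_SIZE, 64 * 1024,  /* bytes; must not exceed the device maximum */
        0
      };
      cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, props, &err);
      return (err == CL_SUCCESS) ? q : NULL;
    }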
2016-11-08 | OCL20: Implement clSetKernelExecInfo API | Yang Rong | 1 | -2/+11
The extra exec info needs relocation, otherwise the GPU can't read or write it, and it doesn't need to be placed in the curbe, so relocate it to an unused binding table entry. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
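As a usage sketch of the standard API implemented here: the host registers indirectly accessed SVM pointers with clSetKernelExecInfo so the runtime can make them resident (assuming kernel is valid and svm_ptrs holds pointers obtained from clSVMAlloc).

    #include <CL/cl.h>

    /* Tell the runtime which SVM allocations a kernel may touch indirectly. */
    cl_int register_indirect_svm(cl_kernel kernel, void **svm_ptrs, size_t count)
    {
      return clSetKernelExecInfo(kernel,
                                 CL_KERNEL_EXEC_INFO_SVM_PTRS,
                                 count * sizeof(void *),
                                 svm_ptrs);
    }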
2016-10-10 | Delete useless event list in command queue struct. | Junyan He | 1 | -15/+4
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-10-10 | Delete useless cl_thread files. | Junyan He | 1 | -14/+0
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-28 | Add ref check for CL object validation. | Junyan He | 1 | -1/+3
A CL object with ref == 0 should be considered an invalid object. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-28 | Modify all event-related functions to use the new event handle. | Junyan He | 1 | -3/+4
Rewrite cl_event and modify all the event functions to use the new event implementation. Events now cooperate with the command queue's thread. v2: Fix a logic problem when event creation fails. v3: Make the enqueue callback default to doing nothing, to handle enqueues that have nothing to do. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-23 | Add command queue's enqueue thread. | Junyan He | 1 | -0/+21
According to the spec, events are better implemented in an asynchronous mode, so we add a thread to each command queue to handle the event commands. The basic idea is: 1. If a command depends on events that are not COMPLETED, the command is queued to that thread. Every event status change notifies the command queue and gives the thread a chance to dequeue and run the enqueued commands. 2. For blocking APIs, such as MapBuffer with the blocking flag set, we wait for all the events in the wait list to be ready and execute the call synchronously; no event is queued to the thread. 3. For NDRange-like commands, to get the best performance we check the wait list; if all events are COMPLETED, we submit the NDRange command and set the event to SUBMITTED status. The event is still queued to the thread, which waits for it to become COMPLETED. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
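The host-side pattern this thread is meant to serve looks roughly like the sketch below: enqueue without blocking, optionally poll the event status, then wait (q, kernel and gws are assumed to be set up elsewhere).

    #include <CL/cl.h>

    cl_int run_async(cl_command_queue q, cl_kernel kernel, const size_t *gws)
    {
      cl_event ev;
      cl_int err = clEnqueueNDRangeKernel(q, kernel, 1, NULL, gws, NULL, 0, NULL, &ev);
      if (err != CL_SUCCESS)
        return err;

      /* Non-blocking status check: CL_QUEUED, CL_SUBMITTED, CL_RUNNING or CL_COMPLETE. */
      cl_int status;
      clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(status), &status, NULL);

      err = clWaitForEvents(1, &ev);   /* block until the command completes */
      clReleaseEvent(ev);
      return err;
    }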
2016-09-23 | Delete all the verbose locks and use a list to store CL objects. | Junyan He | 1 | -1/+0
We use the context's lock when we add and delete CL objects; every CL object should use its own lock to protect itself. We also add some helper functions to ease the adding and removing operations. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2016-09-02 | Runtime: Apply base object to cl_command_queue. | Junyan He | 1 | -3/+5
Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
2015-12-09 | Make Beignet an intermediate layer for CMRT | Guo Yejun | 1 | -0/+2
CMRT is the C for Media Runtime on Intel GPUs, see https://github.com/01org/cmrt. There is a request to make Beignet an intermediate layer for CMRT; in other words, an application programmer writes OpenCL API calls to execute a CM kernel on the GPU. The following shows the key code; please refer to the next patch, the unit test, for details. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once a CM kernel is invoked, the related APIs are passed directly to the CMRT library (libcmrt.so), which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, the OpenCL spec is not strictly followed, and cl_event is not supported in this case. v2: add comments about the CM queue in function cmrt_enqueue. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
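Written out as compilable host code, the flow quoted in the commit message looks roughly like this; the kernel name, image format and sizes are placeholders, and ctx, dev and q are assumed to exist.

    #include <CL/cl.h>

    cl_int run_cm_kernel(cl_context ctx, cl_device_id dev, cl_command_queue q,
                         const unsigned char *cm_binary, size_t cm_size)
    {
      cl_int err, binary_status;
      cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &cm_size,
                                                  &cm_binary, &binary_status, &err);
      if (err != CL_SUCCESS) return err;
      if ((err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL)) != CL_SUCCESS) return err;

      cl_kernel kernel = clCreateKernel(prog, "kernel_name", &err);   /* placeholder name */
      if (err != CL_SUCCESS) return err;

      cl_image_format fmt = { CL_RGBA, CL_UNORM_INT8 };
      cl_image_desc desc = { 0 };
      desc.image_type = CL_MEM_OBJECT_IMAGE2D;
      desc.image_width = 256;            /* placeholder dimensions */
      desc.image_height = 256;
      cl_mem image = clCreateImage(ctx, CL_MEM_READ_WRITE, &fmt, &desc, NULL, &err);
      if (err != CL_SUCCESS) return err;

      clSetKernelArg(kernel, 0, sizeof(cl_mem), &image);

      size_t gws[2] = { 256, 256 };
      /* No cl_event: as noted above, events are not supported on this path. */
      return clEnqueueNDRangeKernel(q, kernel, 2, NULL, gws, NULL, 0, NULL, NULL);
    }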
2015-07-15 | Fixed a thread-safety bug. | Zhigang Gong | 1 | -2/+0
last_event and current_event should be thread private data. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2015-04-10 | runtime: Enhance error handling when flushing the gpgpu command queue. | Zhigang Gong | 1 | -1/+1
Beignet uses drm_intel_gem_bo_context_exec() to flush the command queue to the Linux DRM driver layer. We need to check the return value of that function, as it may fail when the application uses a very large array. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
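A minimal sketch of the check being added, assuming libdrm's drm_intel_gem_bo_context_exec(); the surrounding function and names are illustrative, not Beignet's actual flush path.

    #include <intel_bufmgr.h>   /* libdrm Intel buffer manager */

    static int flush_batch(drm_intel_bo *batch_bo, drm_intel_context *ctx, int used_bytes)
    {
      int err = drm_intel_gem_bo_context_exec(batch_bo, ctx, used_bytes, 0);
      if (err != 0) {
        /* Propagate the failure instead of assuming the submission succeeded;
         * very large working sets can make the exec call fail. */
        return err;
      }
      return 0;
    }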
2014-12-25 | Remove deprecated fulsim code | Zhenyu Wang | 1 | -4/+0
Remove the pretty old fulsim code, which seems to have no users; it also used interfaces not in the open-source libdrm and called a Windows fulsim binary instead of a Linux one. We will use the current libdrm interface instead. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2014-11-11 | License: adjust all license versions to LGPL v2.1+. | Zhigang Gong | 1 | -1/+1
To make the license statements consistent with each other, adjust all license versions to v2.1+. Thus Beignet should have a pure LGPL v2.1+ license. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
2014-07-03 | runtime: fix a gpgpu event and thread-local gpgpu handling bug. | Zhigang Gong | 1 | -0/+4
When pending a command queue, we need to record the whole gpgpu structure, not just the batch buffer, for the following reasons: 1. We need to keep the private buffers, for example the printf buffers. 2. We need to make sure this gpgpu will not be reused by another enqueue. v2: Don't try to flush all user events attached to the queue; just flush the current event when flushing the command queue. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
2014-06-13 | add [opencl-1.2] API clEnqueueBarrierWithWaitList. | Luo | 1 | -3/+6
This command blocks command execution: any commands enqueued after it do not execute until it completes. The clEnqueueMarkerWithWaitList patch did not push the latest version; it is updated in this patch. Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com> Signed-off-by: Luo <xionghu.luo@intel.com> Conflicts: src/cl_event.c
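Usage of the new API looks like the sketch below: commands enqueued after the barrier do not start until its wait list has completed (q, k1, k2 and gws are assumed valid).

    #include <CL/cl.h>

    cl_int barrier_example(cl_command_queue q, cl_kernel k1, cl_kernel k2, const size_t *gws)
    {
      cl_event first_done;
      cl_int err = clEnqueueNDRangeKernel(q, k1, 1, NULL, gws, NULL, 0, NULL, &first_done);
      if (err != CL_SUCCESS) return err;

      /* Everything enqueued after this point waits for first_done. */
      err = clEnqueueBarrierWithWaitList(q, 1, &first_done, NULL);
      if (err != CL_SUCCESS) { clReleaseEvent(first_done); return err; }

      err = clEnqueueNDRangeKernel(q, k2, 1, NULL, gws, NULL, 0, NULL, NULL);
      clReleaseEvent(first_done);
      return err;
    }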
2013-11-08 | Move the gpgpu struct from cl_command_queue to a thread-specific context | Junyan He | 1 | -1/+8
We found that some cases use multiple threads running on the same queue, executing the same kernel. This can destroy the gpgpu struct, which is essential for the GPU context setup, because we do not implement any synchronization protection on it now. Moving the gpgpu struct into thread-specific storage fixes this problem, because libdrm does the GPU command serialization for us. Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Reviewed-by: "Zou, Nanhai" <nanhai.zou@intel.com>
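A minimal sketch of the thread-specific storage idea using pthread keys; the gpgpu type and its constructor/destructor here are placeholders for whatever the driver layer actually provides.

    #include <pthread.h>
    #include <stddef.h>

    struct gpgpu;                              /* opaque per-thread GPU state (placeholder) */
    extern struct gpgpu *gpgpu_new(void);      /* placeholder constructor */
    extern void gpgpu_delete(void *gpgpu);     /* placeholder destructor, used as key dtor */

    static pthread_key_t gpgpu_key;
    static pthread_once_t gpgpu_once = PTHREAD_ONCE_INIT;

    static void gpgpu_key_init(void)
    {
      pthread_key_create(&gpgpu_key, gpgpu_delete);
    }

    /* Each thread lazily gets its own gpgpu; libdrm serializes the GPU commands. */
    static struct gpgpu *thread_gpgpu(void)
    {
      pthread_once(&gpgpu_once, gpgpu_key_init);
      struct gpgpu *g = pthread_getspecific(gpgpu_key);
      if (g == NULL) {
        g = gpgpu_new();
        pthread_setspecific(gpgpu_key, g);
      }
      return g;
    }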
2013-09-18 | Implement clEnqueueMarker and clEnqueueBarrier. | Yang Rong | 1 | -0/+15
Add some event info to cl_command_queue. One field tracks non-complete user events, used to block the marker event and the barrier. After these events become CL_COMPLETE, the events blocked by them also become CL_COMPLETE, so the marker event is set to CL_COMPLETE as well. If there are no user events, we need to wait for the last event to complete and then set the marker event to complete. Also add barrier_index for clEnqueueBarrier; it points into the user events and indicates how many user events the enqueue APIs following clEnqueueBarrier should wait on. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
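Host-side use of the two calls implemented here (both were later deprecated in OpenCL 1.2 in favour of the WithWaitList variants): a marker's event completes once all previously enqueued commands complete, and a barrier orders later commands after earlier ones.

    #include <CL/cl.h>

    cl_int marker_barrier_example(cl_command_queue q)
    {
      cl_event marker;
      cl_int err = clEnqueueMarker(q, &marker);   /* completes when prior commands do */
      if (err != CL_SUCCESS) return err;

      err = clEnqueueBarrier(q);                  /* later commands wait for earlier ones */
      if (err != CL_SUCCESS) { clReleaseEvent(marker); return err; }

      err = clWaitForEvents(1, &marker);
      clReleaseEvent(marker);
      return err;
    }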
2013-09-04 | Implement constant buffer based on constant cache. | Ruiling Song | 1 | -3/+0
Currently, we simply allocate enough graphics memory as the constant memory space and bind it to BTI 2. Constant cache reads are backed by dword scatter reads. Unlike other data port messages, the address needs to be dword aligned, and the addresses are in units of dwords. The constant address space data is laid out in order: first the global constants, then the constant buffer kernel arguments. v2: change function and variable naming to make 'curbe' and 'constant buffer' clear. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-08-12 | Add function cl_command_queue_flush to flush a command | Yang Rong | 1 | -0/+3
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-06-26 | Add support for the clGetCommandQueueInfo API | Junyan He | 1 | -7/+8
Though we support querying CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE and CL_QUEUE_PROFILING_ENABLE are never set when the queue is created, so it just returns an all-zero bitfield for now. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
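A small usage sketch of the query; given the note above, on Beignet at this point both bits will read back as 0.

    #include <CL/cl.h>
    #include <stdio.h>

    void print_queue_properties(cl_command_queue q)
    {
      cl_command_queue_properties props = 0;
      if (clGetCommandQueueInfo(q, CL_QUEUE_PROPERTIES,
                                sizeof(props), &props, NULL) == CL_SUCCESS) {
        printf("out-of-order: %d, profiling: %d\n",
               (props & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) != 0,
               (props & CL_QUEUE_PROFILING_ENABLE) != 0);
      }
    }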
2013-06-13 | CL: Fix the bug in clFinish. | Zou, Nanhai | 1 | -1/+0
The previous implementation forgot to set the last batch buffer, so clFinish() always did nothing. Now we move the GPU sync to the driver side, set the last batch to the proper buffer object, and make clFinish work as expected. Reported and tested by: Edward Ching <edward.k.ching@gmail.com> Signed-off-by: Zou, Nanhai <nanhai.zou@intel.com> Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-06-05 | GBE: Add two builtin functions get_work_dim / get_global_offset. | Zhigang Gong | 1 | -0/+1
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Tested-by: Yi Sun <yi.sun@intel.com>
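Both builtins simply reflect launch parameters supplied by the host: work_dim is what get_work_dim() returns, and the global_work_offset array is what get_global_offset(dim) returns inside the kernel (q and kernel are assumed valid).

    #include <CL/cl.h>

    cl_int launch_with_offset(cl_command_queue q, cl_kernel kernel)
    {
      const cl_uint work_dim = 2;        /* get_work_dim() returns 2 in the kernel   */
      size_t offset[2] = { 16, 32 };     /* get_global_offset(0) == 16, (1) == 32    */
      size_t gws[2]    = { 256, 256 };
      return clEnqueueNDRangeKernel(q, kernel, work_dim, offset, gws,
                                    NULL, 0, NULL, NULL);
    }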
2013-05-15 | GBE/Runtime: Optimize Sample/TypedWrite instruction. | Zhigang Gong | 1 | -0/+3
This commit does two major things: 1. Allocate image surfaces at compile time, and add new GBE interfaces to let the runtime know how many image surfaces we have and the image allocation information, so the runtime library knows how to bind those image surfaces. 2. For both images and samplers we now know the exact binding table index at compile time. We no longer need to get the index from the input argument (curbe) and prepare the descriptor in the architecture register; we can use an immediate as the descriptor, which saves 4 out of 4 instructions for SampleInstruction and 2 out of 12 instructions for TypedWriteInstruction. This patch is also a major preparation for the get_image_width/height/... functions. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com> Tested-by: Lv, Meng <meng.lv@intel.com>
2013-04-22 | Add constant pointer as argument support in runtime. | Yang Rong | 1 | -0/+2
Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-04-18 | Implement KHR ICD extension | Simon Richter | 1 | -0/+1
This adds a pointer to the dispatch table at the beginning of every object of type cl_command_queue, cl_context, cl_device_id, cl_event, cl_kernel, cl_mem, cl_platform_id, cl_program, and cl_sampler, as required by the ICD specification. The layout of the dispatch table comes from the OpenCL ICD loader by Brice Videau <brice.videau@imag.fr> and Vincent Danjean <Vincent.Danjean@ens-lyon.org>. To avoid dispatch table entries being overwritten with the ICD loader's implementations of the CL functions (as would be the proper behaviour for the ELF loader), the -Bsymbolic option is given to the linker. Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
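The essence of the requirement is that every CL object begins with a pointer to the loader's dispatch table; a minimal illustrative sketch, with member names that are not Beignet's actual ones.

    /* The ICD loader defines the real table of CL entry points; vendor objects
     * must carry a pointer to it as their very first member so the loader can
     * route each call to the right driver. */
    struct _cl_icd_dispatch;

    struct my_command_queue {
      struct _cl_icd_dispatch *dispatch;   /* must be the first field */
      /* ... implementation-specific state follows ... */
    };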
2012-08-10 | Added first test for stack handling | Benjamin Segovia | 1 | -7/+1
2012-08-10 | Made the structure naming more consistent | Benjamin Segovia | 1 | -11/+11
2012-08-10 | Removed all direct dependencies on the Intel driver (i.e. files contained in intel/) and replaced them with dependencies on the callbacks. This should allow easier integration with the performance simulator. | Benjamin Segovia | 1 | -11/+12
2012-08-10 | Added first support for (still linear) images for Gen7 | bsegovia | 1 | -0/+1
2012-08-10 | Cleaned and simplified code for gen6. Code now starts to work for gen7. | bsegovia | 1 | -1/+0
2012-08-10 | Cleaned up code to start GPGPU_WALKER integration | bsegovia | 1 | -6/+14
2012-08-10 | %s/genx_gpgpu/intel_gpgpu/ | bsegovia | 1 | -2/+2
2012-08-10 | Added all miniCL files | bsegovia | 1 | -0/+70