drm/i915: Start of GPU scheduler - ~johnharr/scheduler

author	John Harrison <John.C.Harrison@Intel.com>	2014-04-01 16:27:39 +0100
committer	John Harrison <John.C.Harrison@Intel.com>	2016-05-06 14:12:50 +0100
commit	be7c60b7b2c301a22be806529d28b9c6240be7ab
tree	3e76766c0b0fcb5cea092cbe8e5eda98eb19b520 /drivers/gpu/drm/i915/intel_psr.c
parent	2924c20373c1b86c9a8fd422b786a54692c97241

drm/i915: Start of GPU scheduler

Initial creation of scheduler source files. Note that this patch implements most of the scheduler functionality but does not hook it into the driver yet. It also leaves the scheduler code in 'pass through' mode so that even when it is hooked in, it will not actually do very much. This allows the hooks to be added one at a time in bite-sized chunks; only when the scheduler is finally enabled at the end does anything start happening.

The general theory of operation is that when batch buffers are submitted to the driver, the execbuffer() code packages up all the information required to execute the batch buffer at a later time. This package is given over to the scheduler, which adds it to an internal node list. The scheduler also scans the list of objects associated with the batch buffer and compares them against the objects already in use by other buffers in the node list. If matches are found, the new batch buffer node is marked as being dependent upon the matching node. The same is done for the context object. The scheduler also bumps up the priority of such matching nodes, on the grounds that the more batch buffers depend on a given batch buffer, the more important it is likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers in flight on the hardware at any given time. If fewer than this are currently executing when a new node is queued, the node is passed straight through to the submit function. Otherwise it is simply added to the queue and the driver returns to userland.

The scheduler is notified when each batch buffer completes and updates its internal tracking accordingly. At the end of the completion interrupt processing, if any scheduler-tracked batches were processed, the scheduler's deferred worker thread is woken up.
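
A minimal user-space sketch of the queue-time logic described above, assuming a simplified model in which objects are integer handles and contexts are integer IDs. The `sketch_*` names, the bump of 1 per new dependent and the `min_flying` in-flight target are hypothetical; they are not the i915 scheduler's actual structures or functions. The model includes the read-read exemption and the +/-1023 priority range mentioned in the changelog, and returns whether the caller should go straight to submission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: not the i915 data structures. */
#define SKETCH_MAX_NODES 32
#define SKETCH_MAX_OBJS 8
#define SKETCH_PRIO_MIN (-1023)
#define SKETCH_PRIO_MAX 1023
#define SKETCH_DEP_BUMP 1 /* hypothetical bump per new dependent */

struct sketch_obj_ref {
	int handle;     /* stands in for a GEM object */
	bool read_only; /* the batch only reads this object */
};

struct sketch_node {
	int ctx_id;   /* stands in for the context object */
	int priority; /* kept within [SKETCH_PRIO_MIN, SKETCH_PRIO_MAX] */
	size_t num_objs;
	struct sketch_obj_ref objs[SKETCH_MAX_OBJS];
	size_t num_deps;
	struct sketch_node *deps[SKETCH_MAX_NODES];
};

struct sketch_queue {
	size_t count;
	struct sketch_node *nodes[SKETCH_MAX_NODES];
	unsigned int flying;     /* batches currently on the hardware */
	unsigned int min_flying; /* tunable in-flight target */
};

static int sketch_clamp_prio(int prio)
{
	if (prio < SKETCH_PRIO_MIN)
		return SKETCH_PRIO_MIN;
	if (prio > SKETCH_PRIO_MAX)
		return SKETCH_PRIO_MAX;
	return prio;
}

/* A new batch depends on an existing one if they share a context, or share
 * an object that at least one of them writes (read-read is not a dependency). */
static bool sketch_conflicts(const struct sketch_node *a,
			     const struct sketch_node *b)
{
	if (a->ctx_id == b->ctx_id)
		return true;
	for (size_t i = 0; i < a->num_objs; i++)
		for (size_t j = 0; j < b->num_objs; j++)
			if (a->objs[i].handle == b->objs[j].handle &&
			    !(a->objs[i].read_only && b->objs[j].read_only))
				return true;
	return false;
}

/* Record dependencies, bump the priority of each node depended upon, and add
 * the node to the list. Returns true when fewer than min_flying batches are
 * executing, i.e. when the caller should go straight to submission. */
static bool sketch_queue_node(struct sketch_queue *q, struct sketch_node *node)
{
	assert(q->count < SKETCH_MAX_NODES);
	node->num_deps = 0;
	for (size_t i = 0; i < q->count; i++) {
		struct sketch_node *other = q->nodes[i];

		if (sketch_conflicts(node, other)) {
			node->deps[node->num_deps++] = other;
			other->priority =
				sketch_clamp_prio(other->priority + SKETCH_DEP_BUMP);
		}
	}
	q->nodes[q->count++] = node;
	return q->flying < q->min_flying;
}
```

For example, a batch that reads an object written by an already-queued batch records one dependency and raises the writer's priority by one, whereas two batches that only read the same object stay independent. Removing completed nodes is not modelled here; in the driver that is left to the deferred worker thread.
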

The worker thread can do more involved processing, such as actually removing completed nodes from the queue and freeing up the resources associated with them (internal memory allocations, DRM object references, context reference, etc.). The work handler also checks the in-flight count and calls the submission code if a new slot has appeared.

When the scheduler's submit code is called, it scans the queued node list for the highest priority node that has no unmet dependencies. Note that the dependency calculation is complex, as it must take inter-ring dependencies and potential preemptions into account. Note also that in the future this will be extended to include external dependencies such as the Android Native Sync file descriptors and/or the Linux dma-buf synchronisation scheme.

If a suitable node is found, it is sent to execbuff_final() for submission to the hardware. The in-flight count is then re-checked and a new node popped from the list if appropriate. All nodes that are not submitted have their priority bumped. This ensures that low priority tasks do not get starved out by busy higher priority ones: everything will eventually get its turn to run.

Note that this patch does not implement pre-emptive scheduling. Only basic scheduling by re-ordering batch buffer submission is currently implemented. Pre-emption of actively executing batch buffers comes in the next patch series.

v2: Changed priority levels to +/-1023 due to feedback from Chris Wilson.
Removed redundant index from scheduler node.
Changed time stamps to use jiffies instead of raw monotonic time. This provides lower resolution but improved compatibility with other i915 code.
Major re-write of completion tracking code due to struct fence conversion. The scheduler no longer has its own private IRQ handler but just lets the existing request code handle completion events. Instead, the scheduler now hooks into the request notify code to be told when a request has completed.
Reduced driver mutex locking scope.
Removal of scheduler nodes no longer grabs the mutex lock.

v3: Refactored dependency generation to make the code more readable. Also added read-read optimisation support, i.e. don't treat a shared read-only buffer as being a dependency.
Allowed the killing of queued nodes rather than only flying ones.

v4: Updated the commit message to better reflect the current state of the code.
Downgraded some BUG_ONs to WARN_ONs.
Used the correct array memory allocator function (kmalloc_array instead of kmalloc).
Corrected the format of some comments.
Wrapped some lines differently to keep the style checker happy.
Fixed a WARN_ON when killing nodes. The dependency removal code checks that nodes being destroyed do not have any outstanding dependencies (which would imply they should not have been executed yet). When nodes are destroyed, e.g. due to context banning, this might well be the case: they have not been executed and do indeed have outstanding dependencies.
Re-instated the code to disable interrupts when not in use. The underlying problem causing broken IRQ reference counts seems to have been fixed now.

v5: Shuffled various functions around to remove forward declarations, as apparently these are frowned upon. Removed lots of white space, as apparently having easy to read code is also frowned upon.
Split the direct submission scheduler bypass code out into a separate function.
Squashed down the i915_scheduler.c sections of various patches into this patch. Thus the later patches simply hook existing code into various parts of the driver rather than adding the code as well.
Added documentation to various functions.
Re-worked the submit function in terms of mutex locking, error handling and exit paths.
Split the delayed work handler function in half.
Made use of the kernel 'clamp' macro. [Joonas Lahtinen]
Added runtime PM calls, as these must be done at the top level before acquiring the driver mutex lock.
[Chris Wilson]
Removed some obsolete debug code that had been forgotten about.
Moved more clean-up code into the 'i915_gem_scheduler_clean_node()' function rather than replicating it in multiple places.
Used lighter-weight spinlocks.

v6: Updated to newer nightly (lots of ring -> engine renaming).
Added 'for_each_scheduler_node()' and 'assert_scheduler_lock_held()' helper macros.
Renamed 'i915_gem_execbuff_release_batch_obj' to 'i915_gem_execbuf_release_batch_obj'.
Updated to use 'to_i915()' instead of dev_private.
Converted all enum labels to uppercase.
Removed various unnecessary WARNs.
Renamed 'saved_objects' to just 'objs'.
Split code for counting incomplete nodes out into a separate function.
Removed even more white space.
Added a destroy() function. [review feedback from Joonas Lahtinen]
Added running totals of 'flying' and 'queued' nodes rather than re-calculating them each time, as a minor CPU performance optimisation.
Removed support for out-of-order seqno completion. All the prep-work patch series (seqno to request conversion, late seqno assignment, etc.) that have now been done mean that the scheduler no longer generates out-of-order seqno completions. Thus all the complex code for coping with such completions is no longer required and can be removed.
Fixed a bug in scheduler bypass mode introduced in the clean-up code refactoring of v5. The clean-up function was seeing the node in the wrong state and thus refusing to process it.

For: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
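
The submit path described in the theory of operation (pick the highest-priority queued node with no unmet dependencies, keep going while in-flight slots are free, and bump the priority of every node that is passed over) can be sketched in the same hedged way. This is a user-space model, not the driver code: the `sketch_*` names, the aging step of 1 and the `min_flying` target are hypothetical, and it assumes a single engine that executes batches in submission order, so a dependency counts as met as soon as the node it refers to has left the queue. Inter-ring dependencies and preemption are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: not the i915 data structures. */
#define SKETCH_MAX_NODES 32
#define SKETCH_PRIO_MAX 1023
#define SKETCH_AGE_BUMP 1 /* hypothetical bump for each skipped node */

/* SKETCH_COMPLETE would be set by the completion/worker side (not shown). */
enum sketch_state { SKETCH_QUEUED, SKETCH_FLYING, SKETCH_COMPLETE };

struct sketch_node {
	int priority;
	enum sketch_state state;
	size_t num_deps;
	struct sketch_node *deps[SKETCH_MAX_NODES];
};

/* Assumes one in-order engine: a dependency is met once it has left the queue. */
static bool sketch_deps_met(const struct sketch_node *n)
{
	for (size_t i = 0; i < n->num_deps; i++)
		if (n->deps[i]->state == SKETCH_QUEUED)
			return false;
	return true;
}

/* Choose the highest-priority ready node and mark it flying; every node still
 * queued afterwards gets its priority bumped, capped at SKETCH_PRIO_MAX. */
static struct sketch_node *sketch_pop_next(struct sketch_node **nodes,
					   size_t count)
{
	struct sketch_node *best = NULL;

	for (size_t i = 0; i < count; i++) {
		struct sketch_node *n = nodes[i];

		if (n->state != SKETCH_QUEUED || !sketch_deps_met(n))
			continue;
		if (!best || n->priority > best->priority)
			best = n;
	}
	if (!best)
		return NULL;

	best->state = SKETCH_FLYING;
	for (size_t i = 0; i < count; i++) {
		struct sketch_node *n = nodes[i];

		if (n->state != SKETCH_QUEUED)
			continue;
		n->priority += SKETCH_AGE_BUMP;
		if (n->priority > SKETCH_PRIO_MAX)
			n->priority = SKETCH_PRIO_MAX;
	}
	return best;
}

/* Keep submitting ready nodes until the in-flight target is reached or
 * nothing is ready. Returns the number of nodes submitted. */
static size_t sketch_submit(struct sketch_node **nodes, size_t count,
			    unsigned int *flying, unsigned int min_flying)
{
	size_t submitted = 0;

	assert(flying != NULL);
	while (*flying < min_flying) {
		if (!sketch_pop_next(nodes, count))
			break;
		(*flying)++;
		submitted++;
	}
	return submitted;
}
```

For example, with queued nodes A (priority 5), B (priority 10, depending on A) and C (priority 1), A is chosen first even though B outranks it, and B and C each gain one priority level for having been passed over. Bumping on every pass is what lets long-waiting, low-priority work climb towards the top of the queue instead of starving.
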
