summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-09-03drm/i915: Unlock the shared hwsp_gtt object after pinningdrm-intel-gt-nextThomas Hellström1-0/+9
The hwsp_gtt object is used for sub-allocation and could therefore be shared by many contexts causing unnecessary contention during concurrent context pinning. However since we're currently locking it only for pinning, it remains resident until we unpin it, and therefore it's safe to drop the lock early, allowing for concurrent thread access. Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915: Filter wake_flags passed to default_wake_functionChris Wilson1-3/+7
(NOTE: This is the minimal backportable fix, a full fix is being developed at https://patchwork.freedesktop.org/patch/388048/) The flags passed to the wait_entry.func are passed onwards to try_to_wake_up(), which has a very particular interpretation for its wake_flags. In particular, beyond the published WF_SYNC, it has a few internal flags as well. Since we passed the fence->error down the chain via the flags argument, these ended up in the default_wake_function confusing the kernel/sched. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2110 Fixes: ef4688497512 ("drm/i915: Propagate fence errors") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152144.1100-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added a note and link about more complete fix] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915: Remove i915_request.lock requirement for execution callbacksChris Wilson1-47/+58
To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. Previous fix 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") however did not correctly serialize request retirement with the execution callbacks. We were using the i915_request.lock to serialise adding an execution callback with __i915_request_submit. However, if we use an atomic llist_add to serialise multiple waiters and then check to see if the request is already executing, we can remove the irq-spinlock and fix serialization between retirement and execution callbacks in one go. v2: Avoid using the irq_work when outside of the irq-spinlocks, where we can execute the callbacks immediately. v3: Pay close attention to the order of setting ACTIVE on retirement, we need to ensure the request is signaled and breadcrumbs detached before we finish removing the request from the engine. v4: Expanded commit message. Fixes: 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915: Be wary of data races when reading the active execlistsChris Wilson2-6/+34
To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. The latest occurence of this was spotted by KCSAN: [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915] [ 1413.563221] [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1: [ 1413.563548] __await_execution+0x217/0x370 [i915] [ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915] [ 1413.564235] i915_request_await_object+0x421/0x490 [i915] [ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915] [ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915] [ 1413.564998] drm_ioctl_kernel+0x156/0x1b0 [ 1413.565022] drm_ioctl+0x2ff/0x480 [ 1413.565046] __x64_sys_ioctl+0x87/0xd0 [ 1413.565069] do_syscall_64+0x4d/0x80 [ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To complicate matters, we have to both avoid the read tearing of *active and avoid any write tearing as perform the pending[] -> inflight[] promotion of the execlists. This is because we cannot rely on the memcpy doing u64 aligned copies on all kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE annotations to satisfy KCSAN. v2: When in doubt, write the same comment again. v3: Expanded commit message. Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915: Add ww locking to pin_to_display_plane, v2.Maarten Lankhorst2-16/+41
Use ww locking for pin_to_display_plane for all the pinning and locking. With the locking removed from set_cache_level, we need to fix i915_gem_set_caching_ioctl to take the object reservation lock. As this is a single lock, we don't need to use the ww dance. Changes since v1: - Do not use ww locking in i915_gem_set_caching_ioctl (Thomas). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-24-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Add ww locking to vm_fault_gttMaarten Lankhorst1-18/+33
We want to start requiring the reservation_lock instead of obj->mm.lock for pinning objects, take the ww lock inside vm_fault_gtt as a first step towards the legacy lock removal. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-23-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Move i915_vma_lock in the selftests to avoid lock inversion, v3.Maarten Lankhorst5-96/+125
Make sure vma_lock is not used as inner lock when kernel context is used, and add ww handling where appropriate. Ensure that execbuf selftests keep passing by using ww handling. Changes since v2: - Fix i915_gem_context finally. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-22-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Use ww pinning for intel_context_create_request()Maarten Lankhorst1-5/+15
We want to get rid of intel_context_pin(), convert intel_context_create_request() first. :) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-21-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915/selftests: Fix locking inversion in lrc selftest.Maarten Lankhorst1-3/+12
This function does not use intel_context_create_request, so it has to use the same locking order as normal code. This is required to shut up lockdep in selftests. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-20-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Dirty hack to fix selftests locking inversionMaarten Lankhorst1-0/+12
Some i915 selftests still use i915_vma_lock() as inner lock, and intel_context_create_request() intel_timeline->mutex as outer lock. Fortunately for selftests this is not an issue, they should be fixed but we can move ahead and cleanify lockdep now. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-19-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Convert i915_perf to ww locking as wellMaarten Lankhorst1-15/+42
We have the ordering of timeline->mutex vs resv_lock wrong, convert the i915_pin_vma and intel_context_pin as well to future-proof this. We may need to do future changes to do this more transaction-like, and only get down to a single i915_gem_ww_ctx, but for now this should work. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-18-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Kill last user of intel_context_create_request outside of selftestsMaarten Lankhorst1-14/+29
Instead of using intel_context_create_request(), use intel_context_pin() and i915_create_request directly. Now all those calls are gone outside of selftests. :) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-17-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Convert i915_gem_object/client_blt.c to use ww locking as well, v2.Maarten Lankhorst3-70/+163
This is the last part outside of selftests that still don't use the correct lock ordering of timeline->mutex vs resv_lock. With gem fixed, there are a few places that still get locking wrong: - gvt/scheduler.c - i915_perf.c - Most if not all selftests. Changes since v1: - Add intel_engine_pm_get/put() calls to fix use-after-free when using intel_engine_get_pool(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-16-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin.Maarten Lankhorst26-137/+217
As a preparation step for full object locking and wait/wound handling during pin and object mapping, ensure that we always pass the ww context in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this happens. This also requires changing the order of eb_parse slightly, to ensure we pass ww at a point where we could still handle -EDEADLK safely. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Rework intel_context pinning to do everything outside of pin_mutexMaarten Lankhorst5-106/+190
Instead of doing everything inside of pin_mutex, we move all pinning outside. Because i915_active has its own reference counting and pinning is also having the same issues vs mutexes, we make sure everything is pinned first, so the pinning in i915_active only needs to bump refcounts. This allows us to take pin refcounts correctly all the time. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-14-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Pin engine before pinning all objects, v5.Maarten Lankhorst2-65/+152
We want to lock all gem objects, including the engine context objects, rework the throttling to ensure that we can do this. Now we only throttle once, but can take eb_pin_engine while acquiring objects. This means we will have to drop the lock to wait. If we don't have to throttle we can still take the fastpath, if not we will take the slowpath and wait for the throttle request while unlocked. The engine has to be pinned as first step, otherwise gpu relocations won't work. Changes since v1: - Only need to get a throttled request in the fastpath, no need for a global flag any more. - Always free the waited request correctly. Changes since v2: - Use intel_engine_pm_get()/put() to keeep engine pool alive during EDEADLK handling. Changes since v3: - Fix small rq leak. Changes since v4: - Use a single reloc_context, for intel_context_pin_ww(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-13-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Nuke arguments to eb_pin_engineMaarten Lankhorst1-10/+7
Those arguments are already set as eb.file and eb.args, so kill off the extra arguments. This will allow us to move eb_pin_engine() to after we reserved all BO's. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-12-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Add ww context handling to context_barrier_taskMaarten Lankhorst2-29/+48
This is required if we want to pass a ww context in intel_context_pin and gen6_ppgtt_pin(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-11-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Use ww locking in intel_renderstate.Maarten Lankhorst3-32/+71
We want to start using ww locking in intel_context_pin, for this we need to lock multiple objects, and the single i915_gem_object_lock is not enough. Convert to using ww-waiting, and make sure we always pin intel_context_state, even if we don't have a renderstate object. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-10-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Use per object locking in execbuf, v12.Maarten Lankhorst4-170/+262
Now that we changed execbuf submission slightly to allow us to do all pinning in one place, we can now simply add ww versions on top of struct_mutex. All we have to do is a separate path for -EDEADLK handling, which needs to unpin all gem bo's before dropping the lock, then starting over. This finally allows us to do parallel submission, but because not all of the pinning code uses the ww ctx yet, we cannot completely drop struct_mutex yet. Changes since v1: - Keep struct_mutex for now. :( Changes since v2: - Make sure we always lock the ww context in slowpath. Changes since v3: - Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be done on normal unlock path. - Unconditionally release vmas and context. Changes since v4: - Rebased on top of struct_mutex reduction. Changes since v5: - Remove training wheels. Changes since v6: - Fix accidentally broken -ENOSPC handling. Changes since v7: - Handle gt buffer pool better. Changes since v8: - Properly clear variables, to make -EDEADLK handling not BUG. Change since v9: - Fix unpinning fence on pnv and below. Changes since v10: - Make relocation gpu chaining working again. Changes since v11: - Remove relocation chaining, pain to make it work. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-9-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Parse command buffer earlier in eb_relocate(slow)Maarten Lankhorst1-31/+37
We want to introduce backoff logic, but we need to lock the pool object as well for command parsing. Because of this, we will need backoff logic for the engine pool obj, move the batch validation up slightly to eb_lookup_vmas, and the actual command parsing in a separate function which can get called from execbuf relocation fast and slowpath. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-8-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Remove locking from i915_gem_object_prepare_read/writeMaarten Lankhorst8-27/+59
Execbuffer submission will perform its own WW locking, and we cannot rely on the implicit lock there. This also makes it clear that the GVT code will get a lockdep splat when multiple batchbuffer shadows need to be performed in the same instance, fix that up. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-7-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.Maarten Lankhorst22-39/+170
i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but lets start adding the definition first. To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini. Changes since v1: - Change ww_ctx and obj order in locking functions (Jonas Lahtinen) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com
2020-09-03Revert "drm/i915/gem: Split eb_vma into its own allocation"Maarten Lankhorst1-73/+51
This reverts commit 0f1dd02295f3 ("drm/i915/gem: Split eb_vma into its own allocation") and also moves all unreserving to a single place at the end, which is a minor simplification. With the WW locking, we will drop all references only at the end when unlocking, so refcounting can now be removed. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-5-maarten.lankhorst@linux.intel.com
2020-09-03Revert "drm/i915/gem: Drop relocation slowpath".Maarten Lankhorst1-10/+252
This reverts commit 7dc8f1143778 ("drm/i915/gem: Drop relocation slowpath"). We need the slowpath relocation for taking ww-mutex inside the page fault handler, and we will take this mutex when pinning all objects. We also functionally revert ef398881d27d ("drm/i915/gem: Limit struct_mutex to eb_reserve"), as we need the struct_mutex in the slowpath as well, and a tiny part of 003d8b9143a6 ("drm/i915/gem: Only call eb_lookup_vma once during execbuf ioctl"). Specifically, we make the -EAGAIN handling part of fallback to slowpath again. With this, we have a proper working slowpath again, which will allow us to do fault handling with WW locks held. [mlankhorst: Adjusted for reloc_gpu_flush() changes] Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [mlankhorst: Removed extra reloc_gpu_flush()] Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-4-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915: Revert relocation chaining commits.Maarten Lankhorst2-144/+35
This reverts commit 964a9b0f611ee ("drm/i915/gem: Use chained reloc batches") and commit 0e97fbb080553 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf"). When adding ww locking to execbuf, it's hard enough to deal with a single BO that is part of relocation execution. Chaining is hard to get right, and with GPU relocation deprecated, it's best to drop this altogether, instead of trying to fix something we will remove. This is not a completely 1:1 revert, we reset rq_size to 0 in reloc_cache_init, this was from e3d291301f99 ("drm/i915/gem: Implement legacy MI_STORE_DATA_IMM"), because we don't want to break the selftests. (Daniel) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-3-maarten.lankhorst@linux.intel.com
2020-09-03Revert "drm/i915/gem: Async GPU relocations only"Maarten Lankhorst2-27/+313
This reverts commit 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only"), and related commit 7ac2d2536dfa7 ("drm/i915/gem: Delete unused code"). Async GPU relocations are not the path forward, we want to remove GPU accelerated relocation support eventually when userspace is fixed to use VM_BIND, and this is the first step towards that. We will keep async gpu relocations around for now, until userspace is fixed. Relocation support will be disabled completely on platforms where there was never any userspace that depends on it, as the hardware doesn't require it from at least gen9+ onward. For older platforms, the plan is to use cpu relocations only. The igt side is fixed in igt commit 39e9aa1032a4e ("tests/i915: Remove subtests that rely on async relocation behavior"). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-2-maarten.lankhorst@linux.intel.com
2020-09-03drm/i915/gem: Free the fence after a fence-chain lookup failureChris Wilson1-0/+1
If dma_fence_chain_find_seqno() reports an error, it does so in its preamble before it disposes of the input fence. On handling the error, we need to drop the reference to the fence. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2292 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 13149e8bafc4 ("drm/i915: add syncobj timeline support") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200806161056.17593-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gem: Reduce context termination list iteration guard to RCUChris Wilson1-13/+19
As we now protect the timeline list using RCU, we can drop the timeline->mutex for guarding the list iteration during context close, as we are searching for an inflight request. Any new request will see the context is banned and not be submitted. In doing so, pull the checks for a concurrent submission of the request (notably the i915_request_completed()) under the engine spinlock, to fully serialise with __i915_request_submit()). That is in the case of preempt-to-busy where the request may be completed during the __i915_request_submit(), we need to be careful that we sample the request status after serialising so that we don't miss the request the engine is actually submitting. Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()") References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests References: https://gitlab.freedesktop.org/drm/intel/-/issues/1622 References: https://gitlab.freedesktop.org/drm/intel/-/issues/2158 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200806105954.7766-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/selftests: Prevent selecting 0 for our random width/alignChris Wilson1-3/+3
When igt_random_offset() is a given a range of [0, PAGE_SIZE], it is allowed to return 0. However, attempting to use a size of 0 for the igt_lmem_write_cpu() byte poking, leads to call igt_random_offset() with a range of [offset, offset + 0] and ask it to find a length of 4 within it. This triggers the bug on that the requested length should fit within the range! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200806145728.16495-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Hold context/request reference while breadcrumbs are activeChris Wilson2-38/+74
Currently we hold no actual reference to the request nor context while they are attached to a breadcrumb. To avoid freeing the request/context too early, we serialise with cancel-breadcrumbs by taking the irq spinlock in i915_request_retire(). The alternative is to take a reference for a new breadcrumb and release it upon signaling; removing the more frequently hit contention point in i915_request_retire(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915/gt: Move intel_breadcrumbs_arm_irq earlierChris Wilson1-42/+42
Move the __intel_breadcrumbs_arm_irq earlier, next to the disarm_irq, so that we can make use of it in the following patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Shrink i915_page_directory's slab bucketChris Wilson4-28/+50
kmalloc uses power-of-two slab buckets for small allocations (up to a few pages). Since i915_page_directory is a page of pointers, plus a couple more, this is rounded up to 8K, and we waste nearly 50% of that allocation. Long terms this leads to poor memory utilisation, bloating the kernel footprint, but the problem is exacerbated by our conservative preallocation scheme for binding VMA. As we are required to allocate all levels for each vma just in case we need to insert them upon binding, this leads to a large multiplication factor for a single page vma. By halving the allocation we need for the page directory structure, we halve the impact of that factor, bringing workloads that once fitted into memory, hopefully back to fitting into memory. We maintain the split between i915_page_directory and i915_page_table as we only need half the allocation for the lowest, most populous, level. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Switch to object allocations for page directoriesChris Wilson18-420/+289
The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system memory, we need to ditch using plain struct page wrappers. In the process, we drop the WC mapping as we ended up clflushing everything anyway due to various issues across a wider range of platforms. Though in a future step, we need to drop the kmap_atomic approach which suggests we need to pre-map all the pages and keep them mapped. v2: Verify our large scratch page is suitably DMA aligned; and manually clear the scratch since we are allocating plain struct pages full of prior content. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Preallocate stashes for vma page-directoriesChris Wilson9-190/+237
We need to make the DMA allocations used for page directories to be performed up front so that we can include those allocations in our memory reservation pass. The downside is that we have to assume the worst case, even before we know the final layout, and always allocate enough page directories for this object, even when there will be overlap. This unfortunately can be quite expensive, especially as we have to clear/reset the page directories and DMA pages, but it should only be required during early phases of a workload when new objects are being discovered, or after memory/eviction pressure when we need to rebind. Once we reach steady state, the objects should not be moved and we no longer need to preallocating the pages tables. It should be noted that the lifetime for the page directories DMA is more or less decoupled from individual fences as they will be shared across objects across timelines. v2: Only allocate enough PD space for the PTE we may use, we do not need to allocate PD that will be left as scratch. v3: Store the shift unto the first PD level to encapsulate the different PTE counts for gen6/gen8. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbsChris Wilson16-95/+162
On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Only transfer the virtual context to the new engine if activeChris Wilson1-25/+40
One more complication of preempt-to-busy with respect to the virtual engine is that we may have retired the last request along the virtual engine at the same time as preparing to submit the completed request to a new engine. That submit will be shortcircuited, but not before we have updated the context with the new register offsets and marked the virtual engine as bound to the new engine (by calling swap on ve->siblings[]). As we may have just retired the completed request, we may also be in the middle of calling virtual_context_exit() to turn off the power management associated with the virtual engine, and that in turn walks the ve->siblings[]. If we happen to call swap() on the array as we walk, we will call intel_engine_pm_put() twice on the same engine. In this patch, we prevent this by only updating the bound engine after a successful submission which weeds out the already completed requests. Alternatively, we could walk a non-volatile array for the pm, such as using the engine->mask. The small advantage to performing the update after the submit is that we then only have to do a swap for active requests. Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine" Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbsChris Wilson4-52/+21
After staring at the breadcrumb enabling/cancellation and coming to the conclusion that the cause of the mysterious stale breadcrumbs must the act of submitting a completed requests, we can then redirect those completed requests onto a dedicated signaled_list at the time of construction and so eliminate intel_engine_transfer_stale_breadcrumbs(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Remove requirement for holding i915_request.lock for breadcrumbsChris Wilson3-87/+123
Since the breadcrumb enabling/cancelling itself is serialised by the breadcrumbs.irq_lock, with a bit of care we can remove the outer serialisation with i915_request.lock for concurrent dma_fence_enable_signaling(). This has the important side-effect of eliminating the nested i915_request.lock within request submission. The challenge in serialisation is around the unsubmission where we take an active request that wants a breadcrumb on the signaling engine and put it to sleep. We do not want a concurrent dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so we must mark the request as no longer active before serialising with the concurrent enable-signaling. On retire, we serialise with the concurrent enable-signaling, but instead of clearing ACTIVE, we mark it as SIGNALED. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915: Provide a fastpath for waiting on vma bindingsChris Wilson2-2/+22
Before we can execute a request, we must wait for all of its vma to be bound. This is a frequent operation for which we can optimise away a few atomic operations (notably a cmpxchg) in lieu of the RCU protection. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-7-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()Chris Wilson1-5/+5
As the conversion between idle-barrier and full i915_active_fence is already serialised by explicit memory barriers, we can reduce the spinlock in i915_active_acquire_preallocate_barrier() for finding an idle-barrier to reuse to an RCU read lock to ensure the fence remains valid, only taking the spinlock for the update of the rbtree itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-6-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Make the stale cached active node available for any timelineChris Wilson1-2/+28
Rather than require the next timeline after idling to match the MRU before idling, reset the index on the node and allow it to match the first request. However, this requires cmpxchg(u64) and so is not trivial on 32b, so for compatibility we just fallback to keeping the cached node pointing to the MRU timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-5-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Keep the most recently used active-fence upon discardChris Wilson2-11/+20
Whenever an i915_active idles, we prune its tree of old fence slots to prevent a gradual leak should it be used to track many, many timelines. The downside is that we then have to frequently reallocate the rbtree. A compromise is that we keep the most recently used fence slot, and reuse that for the next active reference as that is the most likely timeline to be reused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Export a preallocate variant of i915_active_acquire()Chris Wilson4-38/+130
Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. v2: Refactor active_lookup() so it can be called again before/after locking to resolve contention. Since we protect the rbtree until we idle, we can do a lockfree lookup, with the caveat that if another thread performs a concurrent insertion, the rotations from the insert may cause us to not find our target. A second pass holding the treelock will find the target if it exists, or the place to perform our insertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915: Skip taking acquire mutex for no ref->active callbackChris Wilson1-8/+16
If no active callback is defined for i915_active, we do not need to serialise its enabling with the mutex. We still do only want to call the debug activate once, and must still serialise with a concurrent retire. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/selftests: Drop stale timeline constructor assertChris Wilson1-1/+0
Since we pass around encoded parameters to the kernel context constructor using the ce->timeline pointer, we can no longer assert that it should be zero for mock timeline construction. Fixes: b4ee5af8fc5f ("drm/i915/gt: Support multiple pinned timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731102206.6793-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Updated Fixes: link after rebasing and reordering into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-03drm/i915/gt: Pull release of node->age under the spinlockChris Wilson1-1/+1
We need to ensure that the list is valid prior to marking the node as retrievable, otherwise we may see two threads compete over the same node in intel_gt_get_buffer_pool(). If the first thread acquires and releases the node in the same jiffie, the second thread may then acquire it (as the jiffie now again matches the expected value) and claim the node before it is put back into the list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200730134049.8822-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Support multiple pinned timelinesChris Wilson8-23/+61
We may need to allocate more than one pinned context/timeline for each engine which can utilise the per-engine HWSP, so we need to give each a different offset within it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200730183906.25422-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gem: Delay tracking the GEM context until it is registeredChris Wilson1-5/+11
Avoid exposing a partially constructed context by deferring the list_add() from the initial construction to the end of registration. Otherwise, if we peek into the list of contexts from inside debugfs, we may see the partially constructed context and chase down some dangling incomplete pointers. Reported-by: CQ Tang <cq.tang@intel.com> Fixes: 3aa9945a528e ("drm/i915: Separate GEM context construction and registration to userspace") References: f6e8aa387171 ("drm/i915: Report the number of closed vma held by each context in debugfs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: CQ Tang <cq.tang@intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200730092856.23615-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-03drm/i915/gt: Fix termination condition for freeing all buffer objectsChris Wilson1-5/+8
A last minute change, that unfortunately broke CI so badly it declared SUCCESS, was to refactor the debug free all buffer pool code to reuse the normal worker, inverted the termination condition so that it instead of discarding the nodes, they were all declared young enough and eligible for reuse. Fixes: bed4ce6f50aa ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729110756.2344-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Updating Fixes: link after rebasing and reordering into drm-intel-gt-next] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>