diff options
author | Peter Zijlstra <peterz@infradead.org> | 2022-10-06 15:00:39 +0200 |
---|---|---|
committer | Peter Zijlstra <peterz@infradead.org> | 2022-10-17 16:32:05 +0200 |
commit | ca6c21327c6af02b7eec31ce4b9a740a18c6c13f (patch) | |
tree | 9d1f4e125bef71966eef12256f3495ba21940281 /include | |
parent | 9abf2313adc1ca1b6180c508c25f22f9395cc780 (diff) |
perf: Fix missing SIGTRAPs
Marco reported:
Due to the implementation of how SIGTRAP are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:
1. Missing SIGTRAP due to a race with event_sched_out() (more
details below).
2. Hardware PMU events being disabled due to returning 1 from
perf_event_overflow(). The only way to re-enable the event is
for user space to first "properly" disable the event and then
re-enable it.
3. The inability to automatically disable an event after a
specified number of overflows via PERF_EVENT_IOC_REFRESH.
The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:
CPU0 | CPU1
--------------------------------+---------------------------
__perf_event_overflow() |
perf_event_disable_inatomic() |
pending_disable = CPU0 | ...
| _perf_event_enable()
| event_function_call()
| task_function_call()
| /* sends IPI to CPU0 */
<IPI> | ...
__perf_event_enable() +---------------------------
ctx_resched()
task_ctx_sched_out()
ctx_sched_out()
group_sched_out()
event_sched_out()
pending_disable = -1
</IPI>
<IRQ-work>
perf_pending_event()
perf_pending_event_disable()
/* Fails to send SIGTRAP because no pending_disable! */
</IRQ-work>
In the above case, not only is that particular SIGTRAP missed, but also
all future SIGTRAPs because 'event_limit' is not reset back to 1.
To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
of a separate 'pending_sigtrap', no longer using 'event_limit' and
'pending_disable' for its delivery.
Additionally; and different to Marco's proposed patch:
- recognise that pending_disable effectively duplicates oncpu for
the case where it is set. As such, change the irq_work handler to
use ->oncpu to target the event and use pending_* as boolean toggles.
- observe that SIGTRAP targets the ctx->task, so the context switch
optimization that carries contexts between tasks is invalid. If
the irq_work were delayed enough to hit after a context switch the
SIGTRAP would be delivered to the wrong task.
- observe that if the event gets scheduled out
(rotation/migration/context-switch/...) the irq-work would be
insufficient to deliver the SIGTRAP when the event gets scheduled
back in (the irq-work might still be pending on the old CPU).
Therefore have event_sched_out() convert the pending sigtrap into a
task_work which will deliver the signal at return_to_user.
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Debugged-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Marco Elver <elver@google.com>
Debugged-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Marco Elver <elver@google.com>
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/perf_event.h | 19 |
1 files changed, 15 insertions, 4 deletions
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 853f64b6c8c2..0031f7b4d9ab 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -756,11 +756,14 @@ struct perf_event { struct fasync_struct *fasync; /* delayed work for NMIs and such */ - int pending_wakeup; - int pending_kill; - int pending_disable; + unsigned int pending_wakeup; + unsigned int pending_kill; + unsigned int pending_disable; + unsigned int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ - struct irq_work pending; + struct irq_work pending_irq; + struct callback_head pending_task; + unsigned int pending_work; atomic_t event_limit; @@ -877,6 +880,14 @@ struct perf_event_context { #endif void *task_ctx_data; /* pmu specific data */ struct rcu_head rcu_head; + + /* + * Sum (event->pending_sigtrap + event->pending_work) + * + * The SIGTRAP is targeted at ctx->task, as such it won't do changing + * that until the signal is delivered. + */ + local_t nr_pending; }; /* |