author James Morse <james.morse@arm.com> 2020-05-01 17:45:42 +0100
committer Rafael J. Wysocki <rafael.j.wysocki@intel.com> 2020-05-19 19:51:11 +0200
commit 7f17b4a121d0d50eca22cb1edebf0a157f3e43bf (patch)
tree 31cd01e9c0f47de182fb69e6d1520ccfd53edf9e /include/acpi
parent 062022315e8ad9e0628515dfc756ab54b5fdb26b (diff)
ACPI: APEI: Kick the memory_failure() queue for synchronous errors

memory_failure() offlines or repairs pages of memory that have been discovered to be corrupt. These may be detected by an external component (e.g. the memory controller) and notified via an IRQ. In this case the work is queued, as not all of memory_failure()'s work can happen in IRQ context.

If the error was detected as a result of user-space accessing a corrupt memory location, the CPU may take an abort instead. On arm64 this is a 'synchronous external abort', and on a firmware-first system it is replayed using NOTIFY_SEA.

This notification has NMI-like properties (it can interrupt IRQ-masked code), so the memory_failure() work is queued. If we return to user-space before the queued memory_failure() work is processed, we will take the fault again. This loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error.

For NMI-like notifications, keep track of whether memory_failure() work was queued, and make task_work pending to flush out the queue. To save memory allocations, the task_work is allocated as part of the ghes_estatus_node, and free()ing it back to the pool is deferred.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Diffstat (limited to 'include/acpi')
-rw-r--r-- include/acpi/ghes.h 3
1 file changed, 3 insertions, 0 deletions
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e3f1cddb4ac8..517a5231cc1b 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -33,6 +33,9 @@ struct ghes_estatus_node {
 	struct llist_node llnode;
 	struct acpi_hest_generic *generic;
 	struct ghes *ghes;
+
+	int task_work_cpu;
+	struct callback_head task_work;
 };
 
 struct ghes_estatus_cache {
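
The following is a minimal user-space sketch of the pattern the commit message describes, not the kernel code itself. It models why the callback_head is embedded in the node: the callback can recover the node (and the CPU whose queue needs flushing) from the callback_head pointer, and freeing the node is deferred until the callback has run. All names here (struct estatus_node, handle_sync_error(), return_to_user(), flush_failure_queue(), and the counters) are hypothetical stand-ins; malloc()/free() stand in for the pool, and a simple linked list stands in for the task's pending task_work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified user-space stand-in for the kernel's struct callback_head. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
};

/* Hypothetical model of ghes_estatus_node: only the two new fields. */
struct estatus_node {
	int task_work_cpu;
	struct callback_head task_work;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int queued_failures;                 /* models queued memory_failure() work */
static int handled_failures;                /* work that has been processed */
static int freed_nodes;                     /* nodes released after the callback ran */
static struct callback_head *pending_work;  /* models the task's pending task_work */

/* Hypothetical: process everything queued for the given CPU. */
static void flush_failure_queue(int cpu)
{
	(void)cpu; /* this model keeps one global queue */
	handled_failures += queued_failures;
	queued_failures = 0;
}

/* Callback: recover the node from the embedded head, flush, then free. */
static void kick_task_work(struct callback_head *head)
{
	struct estatus_node *node =
		container_of(head, struct estatus_node, task_work);

	flush_failure_queue(node->task_work_cpu);
	free(node); /* deferred free: only after the callback has run */
	freed_nodes++;
}

/* Models the NMI-like path: queue the work and make the callback pending. */
static void handle_sync_error(int cpu)
{
	struct estatus_node *node = calloc(1, sizeof(*node));

	assert(node != NULL);
	queued_failures++;
	node->task_work_cpu = cpu;
	node->task_work.func = kick_task_work;
	node->task_work.next = pending_work;
	pending_work = &node->task_work;
}

/* Models the return to user-space: run pending callbacks first. */
static void return_to_user(void)
{
	while (pending_work) {
		struct callback_head *head = pending_work;

		pending_work = head->next;
		head->func(head);
	}
}
```

Calling handle_sync_error() followed by return_to_user() leaves no queued work behind, which is the property that prevents the task from taking the same fault again in this model.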