From dcdfdd40fa82b6704d2841938e5c8ec3051eb0d6 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Tue, 6 Jun 2023 17:26:29 +0300 Subject: mm: Add support for unaccepted memory UEFI Specification version 2.9 introduces the concept of memory acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform. There are several ways the kernel can deal with unaccepted memory: 1. Accept all the memory during boot. It is easy to implement and it doesn't have runtime cost once the system is booted. The downside is very long boot time. Accept can be parallelized to multiple CPUs to keep it manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate memory bandwidth and does not scale beyond the point. 2. Accept a block of memory on the first use. It requires more infrastructure and changes in page allocator to make it work, but it provides good boot time. On-demand memory accept means latency spikes every time kernel steps onto a new memory block. The spikes will go away once workload data set size gets stabilized or all memory gets accepted. 3. Accept all memory in background. Introduce a thread (or multiple) that gets memory accepted proactively. It will minimize time the system experience latency spikes on memory allocation while keeping low boot time. This approach cannot function on its own. It is an extension of #2: background memory acceptance requires functional scheduler, but the page allocator may need to tap into unaccepted memory before that. The downside of the approach is that these threads also steal CPU cycles and memory bandwidth from the user's workload and may hurt user experience. Implement #1 and #2 for now. #2 is the default. Some workloads may want to use #1 with accept_memory=eager in kernel command line. #3 can be implemented later based on user's demands. Support of unaccepted memory requires a few changes in core-mm code: - memblock accepts memory on allocation. It serves early boot memory allocations and doesn't limit them to pre-accepted pool of memory. - page allocator accepts memory on the first allocation of the page. When kernel runs out of accepted memory, it accepts memory until the high watermark is reached. It helps to minimize fragmentation. EFI code will provide two helpers if the platform supports unaccepted memory: - accept_memory() makes a range of physical addresses accepted. - range_contains_unaccepted_memory() checks anything within the range of physical addresses requires acceptance. Signed-off-by: Kirill A. Shutemov Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Vlastimil Babka Acked-by: Mike Rapoport # memblock Link: https://lore.kernel.org/r/20230606142637.5171-2-kirill.shutemov@linux.intel.com --- drivers/base/node.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'drivers') diff --git a/drivers/base/node.c b/drivers/base/node.c index b46db17124f3..655975946ef6 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -448,6 +448,9 @@ static ssize_t node_read_meminfo(struct device *dev, "Node %d ShmemPmdMapped: %8lu kB\n" "Node %d FileHugePages: %8lu kB\n" "Node %d FilePmdMapped: %8lu kB\n" +#endif +#ifdef CONFIG_UNACCEPTED_MEMORY + "Node %d Unaccepted: %8lu kB\n" #endif , nid, K(node_page_state(pgdat, NR_FILE_DIRTY)), @@ -477,6 +480,10 @@ static ssize_t node_read_meminfo(struct device *dev, nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)), nid, K(node_page_state(pgdat, NR_FILE_THPS)), nid, K(node_page_state(pgdat, NR_FILE_PMDMAPPED)) +#endif +#ifdef CONFIG_UNACCEPTED_MEMORY + , + nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED)) #endif ); len += hugetlb_report_node_meminfo(buf, len, nid); -- cgit v1.2.3 From 2e9f46ee1599be8a50a5366eb3ef4a4b5acff0b7 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Tue, 6 Jun 2023 17:26:30 +0300 Subject: efi/x86: Get full memory map in allocate_e820() Currently allocate_e820() is only interested in the size of map and size of memory descriptor to determine how many e820 entries the kernel needs. UEFI Specification version 2.9 introduces a new memory type -- unaccepted memory. To track unaccepted memory, the kernel needs to allocate a bitmap. The size of the bitmap is dependent on the maximum physical address present in the system. A full memory map is required to find the maximum address. Modify allocate_e820() to get a full memory map. Signed-off-by: Kirill A. Shutemov Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Borislav Petkov Reviewed-by: Tom Lendacky Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20230606142637.5171-3-kirill.shutemov@linux.intel.com --- drivers/firmware/efi/libstub/x86-stub.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) (limited to 'drivers') diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index a0bfd31358ba..cd77a7a61470 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -681,28 +681,24 @@ static efi_status_t allocate_e820(struct boot_params *params, struct setup_data **e820ext, u32 *e820ext_size) { - unsigned long map_size, desc_size, map_key; + struct efi_boot_memmap *map; efi_status_t status; - __u32 nr_desc, desc_version; + __u32 nr_desc; - /* Only need the size of the mem map and size of each mem descriptor */ - map_size = 0; - status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key, - &desc_size, &desc_version); - if (status != EFI_BUFFER_TOO_SMALL) - return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED; - - nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS; + status = efi_get_memory_map(&map, false); + if (status != EFI_SUCCESS) + return status; - if (nr_desc > ARRAY_SIZE(params->e820_table)) { - u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table); + nr_desc = map->map_size / map->desc_size; + if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) { + u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) + + EFI_MMAP_NR_SLACK_SLOTS; status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size); - if (status != EFI_SUCCESS) - return status; } - return EFI_SUCCESS; + efi_bs_call(free_pool, map); + return status; } struct exit_boot_struct { -- cgit v1.2.3 From 745e3ed85f71a6382a239b03d9278a8025f2beae Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Tue, 6 Jun 2023 17:26:31 +0300 Subject: efi/libstub: Implement support for unaccepted memory UEFI Specification version 2.9 introduces the concept of memory acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, requiring memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific for the Virtual Machine platform. Accepting memory is costly and it makes VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead. The kernel needs to know what memory has been accepted. Firmware communicates this information via memory map: a new memory type -- EFI_UNACCEPTED_MEMORY -- indicates such memory. Range-based tracking works fine for firmware, but it gets bulky for the kernel: e820 (or whatever the arch uses) has to be modified on every page acceptance. It leads to table fragmentation and there's a limited number of entries in the e820 table. Another option is to mark such memory as usable in e820 and track if the range has been accepted in a bitmap. One bit in the bitmap represents a naturally aligned power-2-sized region of address space -- unit. For x86, unit size is 2MiB: 4k of the bitmap is enough to track 64GiB or physical address space. In the worst-case scenario -- a huge hole in the middle of the address space -- It needs 256MiB to handle 4PiB of the address space. Any unaccepted memory that is not aligned to unit_size gets accepted upfront. The bitmap is allocated and constructed in the EFI stub and passed down to the kernel via EFI configuration table. allocate_e820() allocates the bitmap if unaccepted memory is present, according to the size of unaccepted region. Signed-off-by: Kirill A. Shutemov Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20230606142637.5171-4-kirill.shutemov@linux.intel.com --- arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/mem.c | 9 + arch/x86/include/asm/efi.h | 2 + drivers/firmware/efi/Kconfig | 14 ++ drivers/firmware/efi/efi.c | 1 + drivers/firmware/efi/libstub/Makefile | 2 + drivers/firmware/efi/libstub/bitmap.c | 41 +++++ drivers/firmware/efi/libstub/efistub.h | 6 + drivers/firmware/efi/libstub/find.c | 43 +++++ drivers/firmware/efi/libstub/unaccepted_memory.c | 222 +++++++++++++++++++++++ drivers/firmware/efi/libstub/x86-stub.c | 13 ++ include/linux/efi.h | 12 +- 12 files changed, 365 insertions(+), 1 deletion(-) create mode 100644 arch/x86/boot/compressed/mem.c create mode 100644 drivers/firmware/efi/libstub/bitmap.c create mode 100644 drivers/firmware/efi/libstub/find.c create mode 100644 drivers/firmware/efi/libstub/unaccepted_memory.c (limited to 'drivers') diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 6b6cfe607bdb..cc4978123c30 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -107,6 +107,7 @@ endif vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c new file mode 100644 index 000000000000..67594fcb11d9 --- /dev/null +++ b/arch/x86/boot/compressed/mem.c @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "error.h" + +void arch_accept_memory(phys_addr_t start, phys_addr_t end) +{ + /* Platform-specific memory-acceptance call goes here */ + error("Cannot accept memory"); +} diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 419280d263d2..8b4be7cecdb8 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -31,6 +31,8 @@ extern unsigned long efi_mixed_mode_stack_pa; #define ARCH_EFI_IRQ_FLAGS_MASK X86_EFLAGS_IF +#define EFI_UNACCEPTED_UNIT_SIZE PMD_SIZE + /* * The EFI services are called through variadic functions in many cases. These * functions are implemented in assembler and support only a fixed number of diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 043ca31c114e..231f1c70d1db 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -269,6 +269,20 @@ config EFI_COCO_SECRET virt/coco/efi_secret module to access the secrets, which in turn allows userspace programs to access the injected secrets. +config UNACCEPTED_MEMORY + bool + depends on EFI_STUB + help + Some Virtual Machine platforms, such as Intel TDX, require + some memory to be "accepted" by the guest before it can be used. + This mechanism helps prevent malicious hosts from making changes + to guest memory. + + UEFI specification v2.9 introduced EFI_UNACCEPTED_MEMORY memory type. + + This option adds support for unaccepted memory and makes such memory + usable by the kernel. + config EFI_EMBEDDED_FIRMWARE bool select CRYPTO_LIB_SHA256 diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index abeff7dc0b58..7dce06e419c5 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -843,6 +843,7 @@ static __initdata char memory_type_name[][13] = { "MMIO Port", "PAL Code", "Persistent", + "Unaccepted", }; char * __init efi_md_typeattr_format(char *buf, size_t size, diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index 3abb2b357482..16d64a34d1e1 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -96,6 +96,8 @@ CFLAGS_arm32-stub.o := -DTEXT_OFFSET=$(TEXT_OFFSET) zboot-obj-$(CONFIG_RISCV) := lib-clz_ctz.o lib-ashldi3.o lib-$(CONFIG_EFI_ZBOOT) += zboot.o $(zboot-obj-y) +lib-$(CONFIG_UNACCEPTED_MEMORY) += unaccepted_memory.o bitmap.o find.o + extra-y := $(lib-y) lib-y := $(patsubst %.o,%.stub.o,$(lib-y)) diff --git a/drivers/firmware/efi/libstub/bitmap.c b/drivers/firmware/efi/libstub/bitmap.c new file mode 100644 index 000000000000..5c9bba0d549b --- /dev/null +++ b/drivers/firmware/efi/libstub/bitmap.c @@ -0,0 +1,41 @@ +#include + +void __bitmap_set(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_set >= 0) { + *p |= mask_to_set; + len -= bits_to_set; + bits_to_set = BITS_PER_LONG; + mask_to_set = ~0UL; + p++; + } + if (len) { + mask_to_set &= BITMAP_LAST_WORD_MASK(size); + *p |= mask_to_set; + } +} + +void __bitmap_clear(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_clear >= 0) { + *p &= ~mask_to_clear; + len -= bits_to_clear; + bits_to_clear = BITS_PER_LONG; + mask_to_clear = ~0UL; + p++; + } + if (len) { + mask_to_clear &= BITMAP_LAST_WORD_MASK(size); + *p &= ~mask_to_clear; + } +} diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index 54a2822cae77..6aa38a1bf126 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -1136,4 +1136,10 @@ void efi_remap_image(unsigned long image_base, unsigned alloc_size, asmlinkage efi_status_t __efiapi efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab); +efi_status_t allocate_unaccepted_bitmap(__u32 nr_desc, + struct efi_boot_memmap *map); +void process_unaccepted_memory(u64 start, u64 end); +void accept_memory(phys_addr_t start, phys_addr_t end); +void arch_accept_memory(phys_addr_t start, phys_addr_t end); + #endif diff --git a/drivers/firmware/efi/libstub/find.c b/drivers/firmware/efi/libstub/find.c new file mode 100644 index 000000000000..4e7740d28987 --- /dev/null +++ b/drivers/firmware/efi/libstub/find.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include + +/* + * Common helper for find_next_bit() function family + * @FETCH: The expression that fetches and pre-processes each word of bitmap(s) + * @MUNGE: The expression that post-processes a word containing found bit (may be empty) + * @size: The bitmap size in bits + * @start: The bitnumber to start searching at + */ +#define FIND_NEXT_BIT(FETCH, MUNGE, size, start) \ +({ \ + unsigned long mask, idx, tmp, sz = (size), __start = (start); \ + \ + if (unlikely(__start >= sz)) \ + goto out; \ + \ + mask = MUNGE(BITMAP_FIRST_WORD_MASK(__start)); \ + idx = __start / BITS_PER_LONG; \ + \ + for (tmp = (FETCH) & mask; !tmp; tmp = (FETCH)) { \ + if ((idx + 1) * BITS_PER_LONG >= sz) \ + goto out; \ + idx++; \ + } \ + \ + sz = min(idx * BITS_PER_LONG + __ffs(MUNGE(tmp)), sz); \ +out: \ + sz; \ +}) + +unsigned long _find_next_bit(const unsigned long *addr, unsigned long nbits, unsigned long start) +{ + return FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start); +} + +unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits, + unsigned long start) +{ + return FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start); +} diff --git a/drivers/firmware/efi/libstub/unaccepted_memory.c b/drivers/firmware/efi/libstub/unaccepted_memory.c new file mode 100644 index 000000000000..ca61f4733ea5 --- /dev/null +++ b/drivers/firmware/efi/libstub/unaccepted_memory.c @@ -0,0 +1,222 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include "efistub.h" + +struct efi_unaccepted_memory *unaccepted_table; + +efi_status_t allocate_unaccepted_bitmap(__u32 nr_desc, + struct efi_boot_memmap *map) +{ + efi_guid_t unaccepted_table_guid = LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID; + u64 unaccepted_start = ULLONG_MAX, unaccepted_end = 0, bitmap_size; + efi_status_t status; + int i; + + /* Check if the table is already installed */ + unaccepted_table = get_efi_config_table(unaccepted_table_guid); + if (unaccepted_table) { + if (unaccepted_table->version != 1) { + efi_err("Unknown version of unaccepted memory table\n"); + return EFI_UNSUPPORTED; + } + return EFI_SUCCESS; + } + + /* Check if there's any unaccepted memory and find the max address */ + for (i = 0; i < nr_desc; i++) { + efi_memory_desc_t *d; + unsigned long m = (unsigned long)map->map; + + d = efi_early_memdesc_ptr(m, map->desc_size, i); + if (d->type != EFI_UNACCEPTED_MEMORY) + continue; + + unaccepted_start = min(unaccepted_start, d->phys_addr); + unaccepted_end = max(unaccepted_end, + d->phys_addr + d->num_pages * PAGE_SIZE); + } + + if (unaccepted_start == ULLONG_MAX) + return EFI_SUCCESS; + + unaccepted_start = round_down(unaccepted_start, + EFI_UNACCEPTED_UNIT_SIZE); + unaccepted_end = round_up(unaccepted_end, EFI_UNACCEPTED_UNIT_SIZE); + + /* + * If unaccepted memory is present, allocate a bitmap to track what + * memory has to be accepted before access. + * + * One bit in the bitmap represents 2MiB in the address space: + * A 4k bitmap can track 64GiB of physical address space. + * + * In the worst case scenario -- a huge hole in the middle of the + * address space -- It needs 256MiB to handle 4PiB of the address + * space. + * + * The bitmap will be populated in setup_e820() according to the memory + * map after efi_exit_boot_services(). + */ + bitmap_size = DIV_ROUND_UP(unaccepted_end - unaccepted_start, + EFI_UNACCEPTED_UNIT_SIZE * BITS_PER_BYTE); + + status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, + sizeof(*unaccepted_table) + bitmap_size, + (void **)&unaccepted_table); + if (status != EFI_SUCCESS) { + efi_err("Failed to allocate unaccepted memory config table\n"); + return status; + } + + unaccepted_table->version = 1; + unaccepted_table->unit_size = EFI_UNACCEPTED_UNIT_SIZE; + unaccepted_table->phys_base = unaccepted_start; + unaccepted_table->size = bitmap_size; + memset(unaccepted_table->bitmap, 0, bitmap_size); + + status = efi_bs_call(install_configuration_table, + &unaccepted_table_guid, unaccepted_table); + if (status != EFI_SUCCESS) { + efi_bs_call(free_pool, unaccepted_table); + efi_err("Failed to install unaccepted memory config table!\n"); + } + + return status; +} + +/* + * The accepted memory bitmap only works at unit_size granularity. Take + * unaligned start/end addresses and either: + * 1. Accepts the memory immediately and in its entirety + * 2. Accepts unaligned parts, and marks *some* aligned part unaccepted + * + * The function will never reach the bitmap_set() with zero bits to set. + */ +void process_unaccepted_memory(u64 start, u64 end) +{ + u64 unit_size = unaccepted_table->unit_size; + u64 unit_mask = unaccepted_table->unit_size - 1; + u64 bitmap_size = unaccepted_table->size; + + /* + * Ensure that at least one bit will be set in the bitmap by + * immediately accepting all regions under 2*unit_size. This is + * imprecise and may immediately accept some areas that could + * have been represented in the bitmap. But, results in simpler + * code below + * + * Consider case like this (assuming unit_size == 2MB): + * + * | 4k | 2044k | 2048k | + * ^ 0x0 ^ 2MB ^ 4MB + * + * Only the first 4k has been accepted. The 0MB->2MB region can not be + * represented in the bitmap. The 2MB->4MB region can be represented in + * the bitmap. But, the 0MB->4MB region is <2*unit_size and will be + * immediately accepted in its entirety. + */ + if (end - start < 2 * unit_size) { + arch_accept_memory(start, end); + return; + } + + /* + * No matter how the start and end are aligned, at least one unaccepted + * unit_size area will remain to be marked in the bitmap. + */ + + /* Immediately accept a phys_base) { + arch_accept_memory(start, + min(unaccepted_table->phys_base, end)); + start = unaccepted_table->phys_base; + } + + /* Nothing to record */ + if (end < unaccepted_table->phys_base) + return; + + /* Translate to offsets from the beginning of the bitmap */ + start -= unaccepted_table->phys_base; + end -= unaccepted_table->phys_base; + + /* Accept memory that doesn't fit into bitmap */ + if (end > bitmap_size * unit_size * BITS_PER_BYTE) { + unsigned long phys_start, phys_end; + + phys_start = bitmap_size * unit_size * BITS_PER_BYTE + + unaccepted_table->phys_base; + phys_end = end + unaccepted_table->phys_base; + + arch_accept_memory(phys_start, phys_end); + end = bitmap_size * unit_size * BITS_PER_BYTE; + } + + /* + * 'start' and 'end' are now both unit_size-aligned. + * Record the range as being unaccepted: + */ + bitmap_set(unaccepted_table->bitmap, + start / unit_size, (end - start) / unit_size); +} + +void accept_memory(phys_addr_t start, phys_addr_t end) +{ + unsigned long range_start, range_end; + unsigned long bitmap_size; + u64 unit_size; + + if (!unaccepted_table) + return; + + unit_size = unaccepted_table->unit_size; + + /* + * Only care for the part of the range that is represented + * in the bitmap. + */ + if (start < unaccepted_table->phys_base) + start = unaccepted_table->phys_base; + if (end < unaccepted_table->phys_base) + return; + + /* Translate to offsets from the beginning of the bitmap */ + start -= unaccepted_table->phys_base; + end -= unaccepted_table->phys_base; + + /* Make sure not to overrun the bitmap */ + if (end > unaccepted_table->size * unit_size * BITS_PER_BYTE) + end = unaccepted_table->size * unit_size * BITS_PER_BYTE; + + range_start = start / unit_size; + bitmap_size = DIV_ROUND_UP(end, unit_size); + + for_each_set_bitrange_from(range_start, range_end, + unaccepted_table->bitmap, bitmap_size) { + unsigned long phys_start, phys_end; + + phys_start = range_start * unit_size + unaccepted_table->phys_base; + phys_end = range_end * unit_size + unaccepted_table->phys_base; + + arch_accept_memory(phys_start, phys_end); + bitmap_clear(unaccepted_table->bitmap, + range_start, range_end - range_start); + } +} diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index cd77a7a61470..3cc7faac001d 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -613,6 +613,16 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s e820_type = E820_TYPE_PMEM; break; + case EFI_UNACCEPTED_MEMORY: + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) { + efi_warn_once( +"The system has unaccepted memory, but kernel does not support it\nConsider enabling CONFIG_UNACCEPTED_MEMORY\n"); + continue; + } + e820_type = E820_TYPE_RAM; + process_unaccepted_memory(d->phys_addr, + d->phys_addr + PAGE_SIZE * d->num_pages); + break; default: continue; } @@ -697,6 +707,9 @@ static efi_status_t allocate_e820(struct boot_params *params, status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size); } + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) && status == EFI_SUCCESS) + status = allocate_unaccepted_bitmap(nr_desc, map); + efi_bs_call(free_pool, map); return status; } diff --git a/include/linux/efi.h b/include/linux/efi.h index 571d1a6e1b74..8ffe451a6a2f 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -108,7 +108,8 @@ typedef struct { #define EFI_MEMORY_MAPPED_IO_PORT_SPACE 12 #define EFI_PAL_CODE 13 #define EFI_PERSISTENT_MEMORY 14 -#define EFI_MAX_MEMORY_TYPE 15 +#define EFI_UNACCEPTED_MEMORY 15 +#define EFI_MAX_MEMORY_TYPE 16 /* Attribute values: */ #define EFI_MEMORY_UC ((u64)0x0000000000000001ULL) /* uncached */ @@ -417,6 +418,7 @@ void efi_native_runtime_setup(void); #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89) #define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47) #define LINUX_EFI_BOOT_MEMMAP_GUID EFI_GUID(0x800f683f, 0xd08b, 0x423a, 0xa2, 0x93, 0x96, 0x5c, 0x3c, 0x6f, 0xe2, 0xb4) +#define LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID EFI_GUID(0xd5d1de3c, 0x105c, 0x44f9, 0x9e, 0xa9, 0xbc, 0xef, 0x98, 0x12, 0x00, 0x31) #define RISCV_EFI_BOOT_PROTOCOL_GUID EFI_GUID(0xccd15fec, 0x6f73, 0x4eec, 0x83, 0x95, 0x3e, 0x69, 0xe4, 0xb9, 0x40, 0xbf) @@ -534,6 +536,14 @@ struct efi_boot_memmap { efi_memory_desc_t map[]; }; +struct efi_unaccepted_memory { + u32 version; + u32 unit_size; + u64 phys_base; + u64 size; + unsigned long bitmap[]; +}; + /* * Architecture independent structure for describing a memory map for the * benefit of efi_memmap_init_early(), and for passing context between -- cgit v1.2.3 From 2053bc57f36763febced0b5cd91821698bcf6b3d Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Tue, 6 Jun 2023 17:26:33 +0300 Subject: efi: Add unaccepted memory support efi_config_parse_tables() reserves memory that holds unaccepted memory configuration table so it won't be reused by page allocator. Core-mm requires few helpers to support unaccepted memory: - accept_memory() checks the range of addresses against the bitmap and accept memory if needed. - range_contains_unaccepted_memory() checks if anything within the range requires acceptance. Architectural code has to provide efi_get_unaccepted_table() that returns pointer to the unaccepted memory configuration table. arch_accept_memory() handles arch-specific part of memory acceptance. Signed-off-by: Kirill A. Shutemov Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Ard Biesheuvel Reviewed-by: Tom Lendacky Link: https://lore.kernel.org/r/20230606142637.5171-6-kirill.shutemov@linux.intel.com --- arch/x86/platform/efi/efi.c | 3 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/efi.c | 25 +++++++ drivers/firmware/efi/unaccepted_memory.c | 112 +++++++++++++++++++++++++++++++ include/linux/efi.h | 1 + 5 files changed, 142 insertions(+) create mode 100644 drivers/firmware/efi/unaccepted_memory.c (limited to 'drivers') diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index f3f2d87cce1b..e9f99c56f3ce 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -96,6 +96,9 @@ static const unsigned long * const efi_tables[] = { #ifdef CONFIG_EFI_COCO_SECRET &efi.coco_secret, #endif +#ifdef CONFIG_UNACCEPTED_MEMORY + &efi.unaccepted, +#endif }; u64 efi_setup; /* efi setup_data physical address */ diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index b51f2a4c821e..e489fefd23da 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -41,3 +41,4 @@ obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o obj-$(CONFIG_EFI_EARLYCON) += earlycon.o obj-$(CONFIG_UEFI_CPER_ARM) += cper-arm.o obj-$(CONFIG_UEFI_CPER_X86) += cper-x86.o +obj-$(CONFIG_UNACCEPTED_MEMORY) += unaccepted_memory.o diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 7dce06e419c5..d817e7afd266 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -50,6 +50,9 @@ struct efi __read_mostly efi = { #ifdef CONFIG_EFI_COCO_SECRET .coco_secret = EFI_INVALID_TABLE_ADDR, #endif +#ifdef CONFIG_UNACCEPTED_MEMORY + .unaccepted = EFI_INVALID_TABLE_ADDR, +#endif }; EXPORT_SYMBOL(efi); @@ -605,6 +608,9 @@ static const efi_config_table_type_t common_tables[] __initconst = { #ifdef CONFIG_EFI_COCO_SECRET {LINUX_EFI_COCO_SECRET_AREA_GUID, &efi.coco_secret, "CocoSecret" }, #endif +#ifdef CONFIG_UNACCEPTED_MEMORY + {LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID, &efi.unaccepted, "Unaccepted" }, +#endif #ifdef CONFIG_EFI_GENERIC_STUB {LINUX_EFI_SCREEN_INFO_TABLE_GUID, &screen_info_table }, #endif @@ -759,6 +765,25 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, } } + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) && + efi.unaccepted != EFI_INVALID_TABLE_ADDR) { + struct efi_unaccepted_memory *unaccepted; + + unaccepted = early_memremap(efi.unaccepted, sizeof(*unaccepted)); + if (unaccepted) { + unsigned long size; + + if (unaccepted->version == 1) { + size = sizeof(*unaccepted) + unaccepted->size; + memblock_reserve(efi.unaccepted, size); + } else { + efi.unaccepted = EFI_INVALID_TABLE_ADDR; + } + + early_memunmap(unaccepted, sizeof(*unaccepted)); + } + } + return 0; } diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c new file mode 100644 index 000000000000..08a9a843550a --- /dev/null +++ b/drivers/firmware/efi/unaccepted_memory.c @@ -0,0 +1,112 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include + +/* Protects unaccepted memory bitmap */ +static DEFINE_SPINLOCK(unaccepted_memory_lock); + +/* + * accept_memory() -- Consult bitmap and accept the memory if needed. + * + * Only memory that is explicitly marked as unaccepted in the bitmap requires + * an action. All the remaining memory is implicitly accepted and doesn't need + * acceptance. + * + * No need to accept: + * - anything if the system has no unaccepted table; + * - memory that is below phys_base; + * - memory that is above the memory that addressable by the bitmap; + */ +void accept_memory(phys_addr_t start, phys_addr_t end) +{ + struct efi_unaccepted_memory *unaccepted; + unsigned long range_start, range_end; + unsigned long flags; + u64 unit_size; + + unaccepted = efi_get_unaccepted_table(); + if (!unaccepted) + return; + + unit_size = unaccepted->unit_size; + + /* + * Only care for the part of the range that is represented + * in the bitmap. + */ + if (start < unaccepted->phys_base) + start = unaccepted->phys_base; + if (end < unaccepted->phys_base) + return; + + /* Translate to offsets from the beginning of the bitmap */ + start -= unaccepted->phys_base; + end -= unaccepted->phys_base; + + /* Make sure not to overrun the bitmap */ + if (end > unaccepted->size * unit_size * BITS_PER_BYTE) + end = unaccepted->size * unit_size * BITS_PER_BYTE; + + range_start = start / unit_size; + + spin_lock_irqsave(&unaccepted_memory_lock, flags); + for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap, + DIV_ROUND_UP(end, unit_size)) { + unsigned long phys_start, phys_end; + unsigned long len = range_end - range_start; + + phys_start = range_start * unit_size + unaccepted->phys_base; + phys_end = range_end * unit_size + unaccepted->phys_base; + + arch_accept_memory(phys_start, phys_end); + bitmap_clear(unaccepted->bitmap, range_start, len); + } + spin_unlock_irqrestore(&unaccepted_memory_lock, flags); +} + +bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end) +{ + struct efi_unaccepted_memory *unaccepted; + unsigned long flags; + bool ret = false; + u64 unit_size; + + unaccepted = efi_get_unaccepted_table(); + if (!unaccepted) + return false; + + unit_size = unaccepted->unit_size; + + /* + * Only care for the part of the range that is represented + * in the bitmap. + */ + if (start < unaccepted->phys_base) + start = unaccepted->phys_base; + if (end < unaccepted->phys_base) + return false; + + /* Translate to offsets from the beginning of the bitmap */ + start -= unaccepted->phys_base; + end -= unaccepted->phys_base; + + /* Make sure not to overrun the bitmap */ + if (end > unaccepted->size * unit_size * BITS_PER_BYTE) + end = unaccepted->size * unit_size * BITS_PER_BYTE; + + spin_lock_irqsave(&unaccepted_memory_lock, flags); + while (start < end) { + if (test_bit(start / unit_size, unaccepted->bitmap)) { + ret = true; + break; + } + + start += unit_size; + } + spin_unlock_irqrestore(&unaccepted_memory_lock, flags); + + return ret; +} diff --git a/include/linux/efi.h b/include/linux/efi.h index 8ffe451a6a2f..67cb72d7a764 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -646,6 +646,7 @@ extern struct efi { unsigned long tpm_final_log; /* TPM2 Final Events Log table */ unsigned long mokvar_table; /* MOK variable config table */ unsigned long coco_secret; /* Confidential computing secret table */ + unsigned long unaccepted; /* Unaccepted memory table */ efi_get_time_t *get_time; efi_set_time_t *set_time; -- cgit v1.2.3 From c211c19e80d046441655e372c6ae15f29d358259 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Tue, 6 Jun 2023 17:26:34 +0300 Subject: efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory load_unaligned_zeropad() can lead to unwanted loads across page boundaries. The unwanted loads are typically harmless. But, they might be made to totally unrelated or even unmapped memory. load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now #VE) to recover from these unwanted loads. But, this approach does not work for unaccepted memory. For TDX, a load from unaccepted memory will not lead to a recoverable exception within the guest. The guest will exit to the VMM where the only recourse is to terminate the guest. There are two parts to fix this issue and comprehensively avoid access to unaccepted memory. Together these ensure that an extra "guard" page is accepted in addition to the memory that needs to be used. 1. Implicitly extend the range_contains_unaccepted_memory(start, end) checks up to end+unit_size if 'end' is aligned on a unit_size boundary. 2. Implicitly extend accept_memory(start, end) to end+unit_size if 'end' is aligned on a unit_size boundary. Side note: This leads to something strange. Pages which were accepted at boot, marked by the firmware as accepted and will never _need_ to be accepted might be on unaccepted_pages list This is a cue to ensure that the next page is accepted before 'page' can be used. This is an actual, real-world problem which was discovered during TDX testing. Signed-off-by: Kirill A. Shutemov Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Dave Hansen Reviewed-by: Ard Biesheuvel Reviewed-by: Tom Lendacky Link: https://lore.kernel.org/r/20230606142637.5171-7-kirill.shutemov@linux.intel.com --- drivers/firmware/efi/unaccepted_memory.c | 35 ++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) (limited to 'drivers') diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c index 08a9a843550a..853f7dc3c21d 100644 --- a/drivers/firmware/efi/unaccepted_memory.c +++ b/drivers/firmware/efi/unaccepted_memory.c @@ -46,6 +46,34 @@ void accept_memory(phys_addr_t start, phys_addr_t end) start -= unaccepted->phys_base; end -= unaccepted->phys_base; + /* + * load_unaligned_zeropad() can lead to unwanted loads across page + * boundaries. The unwanted loads are typically harmless. But, they + * might be made to totally unrelated or even unmapped memory. + * load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now + * #VE) to recover from these unwanted loads. + * + * But, this approach does not work for unaccepted memory. For TDX, a + * load from unaccepted memory will not lead to a recoverable exception + * within the guest. The guest will exit to the VMM where the only + * recourse is to terminate the guest. + * + * There are two parts to fix this issue and comprehensively avoid + * access to unaccepted memory. Together these ensure that an extra + * "guard" page is accepted in addition to the memory that needs to be + * used: + * + * 1. Implicitly extend the range_contains_unaccepted_memory(start, end) + * checks up to end+unit_size if 'end' is aligned on a unit_size + * boundary. + * + * 2. Implicitly extend accept_memory(start, end) to end+unit_size if + * 'end' is aligned on a unit_size boundary. (immediately following + * this comment) + */ + if (!(end % unit_size)) + end += unit_size; + /* Make sure not to overrun the bitmap */ if (end > unaccepted->size * unit_size * BITS_PER_BYTE) end = unaccepted->size * unit_size * BITS_PER_BYTE; @@ -93,6 +121,13 @@ bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end) start -= unaccepted->phys_base; end -= unaccepted->phys_base; + /* + * Also consider the unaccepted state of the *next* page. See fix #1 in + * the comment on load_unaligned_zeropad() in accept_memory(). + */ + if (!(end % unit_size)) + end += unit_size; + /* Make sure not to overrun the bitmap */ if (end > unaccepted->size * unit_size * BITS_PER_BYTE) end = unaccepted->size * unit_size * BITS_PER_BYTE; -- cgit v1.2.3 From c0461bd16666351f0de11578b1e02dcdae4db736 Mon Sep 17 00:00:00 2001 From: Dionna Glaze Date: Tue, 6 Jun 2023 09:51:27 -0500 Subject: x86/efi: Safely enable unaccepted memory in UEFI The UEFI v2.9 specification includes a new memory type to be used in environments where the OS must accept memory that is provided from its host. Before the introduction of this memory type, all memory was accepted eagerly in the firmware. In order for the firmware to safely stop accepting memory on the OS's behalf, the OS must affirmatively indicate support to the firmware. This is only a problem for AMD SEV-SNP, since Linux has had support for it since 5.19. The other technology that can make use of unaccepted memory, Intel TDX, does not yet have Linux support, so it can strictly require unaccepted memory support as a dependency of CONFIG_TDX and not require communication with the firmware. Enabling unaccepted memory requires calling a 0-argument enablement protocol before ExitBootServices. This call is only made if the kernel is compiled with UNACCEPTED_MEMORY=y This protocol will be removed after the end of life of the first LTS that includes it, in order to give firmware implementations an expiration date for it. When the protocol is removed, firmware will strictly infer that a SEV-SNP VM is running an OS that supports the unaccepted memory type. At the earliest convenience, when unaccepted memory support is added to Linux, SEV-SNP may take strict dependence in it. After the firmware removes support for the protocol, this should be reverted. [tl: address some checkscript warnings] Signed-off-by: Dionna Glaze Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Ard Biesheuvel Link: https://lore.kernel.org/r/0d5f3d9a20b5cf361945b7ab1263c36586a78a42.1686063086.git.thomas.lendacky@amd.com --- drivers/firmware/efi/libstub/x86-stub.c | 36 +++++++++++++++++++++++++++++++++ include/linux/efi.h | 3 +++ 2 files changed, 39 insertions(+) (limited to 'drivers') diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 3cc7faac001d..220be75a5cdc 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -26,6 +26,17 @@ const efi_dxe_services_table_t *efi_dxe_table; u32 image_offset __section(".data"); static efi_loaded_image_t *image = NULL; +typedef union sev_memory_acceptance_protocol sev_memory_acceptance_protocol_t; +union sev_memory_acceptance_protocol { + struct { + efi_status_t (__efiapi * allow_unaccepted_memory)( + sev_memory_acceptance_protocol_t *); + }; + struct { + u32 allow_unaccepted_memory; + } mixed_mode; +}; + static efi_status_t preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom) { @@ -310,6 +321,29 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size) #endif } +static void setup_unaccepted_memory(void) +{ + efi_guid_t mem_acceptance_proto = OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID; + sev_memory_acceptance_protocol_t *proto; + efi_status_t status; + + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + return; + + /* + * Enable unaccepted memory before calling exit boot services in order + * for the UEFI to not accept all memory on EBS. + */ + status = efi_bs_call(locate_protocol, &mem_acceptance_proto, NULL, + (void **)&proto); + if (status != EFI_SUCCESS) + return; + + status = efi_call_proto(proto, allow_unaccepted_memory); + if (status != EFI_SUCCESS) + efi_err("Memory acceptance protocol failed\n"); +} + static const efi_char16_t apple[] = L"Apple"; static void setup_quirks(struct boot_params *boot_params, @@ -908,6 +942,8 @@ asmlinkage unsigned long efi_main(efi_handle_t handle, setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start); + setup_unaccepted_memory(); + status = exit_boot(boot_params, handle); if (status != EFI_SUCCESS) { efi_err("exit_boot() failed!\n"); diff --git a/include/linux/efi.h b/include/linux/efi.h index 67cb72d7a764..18d83a613635 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -437,6 +437,9 @@ void efi_native_runtime_setup(void); #define DELLEMC_EFI_RCI2_TABLE_GUID EFI_GUID(0x2d9f28a2, 0xa886, 0x456a, 0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55) #define AMD_SEV_MEM_ENCRYPT_GUID EFI_GUID(0x0cf29b71, 0x9e51, 0x433a, 0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75) +/* OVMF protocol GUIDs */ +#define OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID EFI_GUID(0xc5a010fe, 0x38a7, 0x4531, 0x8a, 0x4a, 0x05, 0x00, 0xd2, 0xfd, 0x16, 0x49) + typedef struct { efi_guid_t guid; u64 table; -- cgit v1.2.3 From 84b9b44b99780d35fe72ac63c4724f158771e898 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 17 Jan 2023 18:13:56 +0100 Subject: virt: sevguest: Add CONFIG_CRYPTO dependency This driver fails to link when CRYPTO is disabled, or in a loadable module: WARNING: unmet direct dependencies detected for CRYPTO_GCM WARNING: unmet direct dependencies detected for CRYPTO_AEAD2 Depends on [m]: CRYPTO [=m] Selected by [y]: - SEV_GUEST [=y] && VIRT_DRIVERS [=y] && AMD_MEM_ENCRYPT [=y] x86_64-linux-ld: crypto/aead.o: in function `crypto_register_aeads': Fixes: fce96cf04430 ("virt: Add SEV-SNP guest driver") Signed-off-by: Arnd Bergmann Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/r/20230117171416.2715125-1-arnd@kernel.org --- drivers/virt/coco/sev-guest/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers') diff --git a/drivers/virt/coco/sev-guest/Kconfig b/drivers/virt/coco/sev-guest/Kconfig index f9db0799ae67..da2d7ca531f0 100644 --- a/drivers/virt/coco/sev-guest/Kconfig +++ b/drivers/virt/coco/sev-guest/Kconfig @@ -2,6 +2,7 @@ config SEV_GUEST tristate "AMD SEV Guest driver" default m depends on AMD_MEM_ENCRYPT + select CRYPTO select CRYPTO_AEAD2 select CRYPTO_GCM help -- cgit v1.2.3