summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2024-08-25Merge tag 's390-6.11-4' of ↵Linus Torvalds8-33/+85
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix KASLR base offset to account for symbol offsets in the vmlinux ELF file, preventing tool breakages like the drgn debugger - Fix potential memory corruption of physmem_info during kernel physical address randomization - Fix potential memory corruption due to overlap between the relocated lowcore and identity mapping by correctly reserving lowcore memory - Fix performance regression and avoid randomizing identity mapping base by default - Fix unnecessary delay of AP bus binding complete uevent to prevent startup lag in KVM guests using AP * tag 's390-6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/boot: Fix KASLR base offset off by __START_KERNEL bytes s390/boot: Avoid possible physmem_info segment corruption s390/ap: Refine AP bus bindings complete processing s390/mm: Pin identity mapping base to zero s390/mm: Prevent lowcore vs identity mapping overlap
2024-08-24Merge tag 'mips-fixes_6.11_1' of ↵Linus Torvalds2-8/+11
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Set correct timer mode on Loongson64 - Only request r4k clockevent interrupt on one CPU * tag 'mips-fixes_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: cevt-r4k: Don't call get_c0_compare_int if timer irq is installed MIPS: Loongson64: Set timer mode in cpu-probe
2024-08-24Merge tag 'arm64-fixes' of ↵Linus Torvalds6-5/+33
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 kvm fixes from Catalin Marinas: - Don't drop references on LPIs that weren't visited by the vgic-debug iterator - Cure lock ordering issue when unregistering vgic redistributors - Fix for misaligned stage-2 mappings when VMs are backed by hugetlb pages - Treat SGI registers as UNDEFINED if a VM hasn't been configured for GICv3 * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors KVM: arm64: vgic-debug: Don't put unmarked LPIs
2024-08-23Merge tag 'kvmarm-fixes-6.11-2' of ↵Catalin Marinas18-38/+72
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into for-next/fixes KVM/arm64 fixes for 6.11, round #2 - Don't drop references on LPIs that weren't visited by the vgic-debug iterator - Cure lock ordering issue when unregistering vgic redistributors - Fix for misaligned stage-2 mappings when VMs are backed by hugetlb pages - Treat SGI registers as UNDEFINED if a VM hasn't been configured for GICv3 * tag 'kvmarm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm: KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors KVM: arm64: vgic-debug: Don't put unmarked LPIs KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list KVM: arm64: Tidying up PAuth code in KVM KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain docs: KVM: Fix register ID of SPSR_FIQ KVM: arm64: vgic: fix unexpected unlock sparse warnings KVM: arm64: fix kdoc warnings in W=1 builds KVM: arm64: fix override-init warnings in W=1 builds KVM: arm64: free kvm->arch.nested_mmus with kvfree()
2024-08-22s390/boot: Fix KASLR base offset off by __START_KERNEL bytesAlexander Gordeev6-31/+52
Symbol offsets to the KASLR base do not match symbol address in the vmlinux image. That is the result of setting the KASLR base to the beginning of .text section as result of an optimization. Revert that optimization and allocate virtual memory for the whole kernel image including __START_KERNEL bytes as per the linker script. That allows keeping the semantics of the KASLR base offset in sync with other architectures. Rename __START_KERNEL to TEXT_OFFSET, since it represents the offset of the .text section within the kernel image, rather than a virtual address. Still skip mapping TEXT_OFFSET bytes to save memory on pgtables and provoke exceptions in case an attempt to access this area is made, as no kernel symbol may reside there. In case CONFIG_KASAN is enabled the location counter might exceed the value of TEXT_OFFSET, while the decompressor linker script forcefully resets it to TEXT_OFFSET, which leads to a sections overlap link failure. Use MAX() expression to avoid that. Reported-by: Omar Sandoval <osandov@osandov.com> Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/ Fixes: 56b1069c40c7 ("s390/boot: Rework deployment of the kernel image") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-22s390/boot: Avoid possible physmem_info segment corruptionAlexander Gordeev1-2/+2
When physical memory for the kernel image is allocated it does not consider extra memory required for offsetting the image start to match it with the lower 20 bits of KASLR virtual base address. That might lead to kernel access beyond its memory range. Suggested-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-22KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3Marc Zyngier2-0/+13
On a system with a GICv3, if a guest hasn't been configured with GICv3 and that the host is not capable of GICv2 emulation, a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2. We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?). The obvious fix is to give the guest what it deserves, in the shape of a UNDEF exception. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-22KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling faultOliver Upton1-1/+8
Zenghui reports that VMs backed by hugetlb pages are no longer booting after commit fd276e71d1e7 ("KVM: arm64: nv: Handle shadow stage 2 page faults"). Support for shadow stage-2 MMUs introduced the concept of a fault IPA and canonical IPA to stage-2 fault handling. These are identical in the non-nested case, as the hardware stage-2 context is always that of the canonical IPA space. Both addresses need to be hugepage-aligned when preparing to install a hugepage mapping to ensure that KVM uses the correct GFN->PFN translation and installs that at the correct IPA for the current stage-2. And now I'm feeling thirsty after all this talk of IPAs... Fixes: fd276e71d1e7 ("KVM: arm64: nv: Handle shadow stage 2 page faults") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240822071710.2291690-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-21s390/mm: Pin identity mapping base to zeroAlexander Gordeev2-1/+15
SIE instruction performs faster when the virtual address of SIE block matches the physical one. Pin the identity mapping base to zero for the benefit of SIE and other instructions that have similar performance impact. Still, randomize the base when DEBUG_VM kernel configuration option is enabled. Suggested-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21s390/mm: Prevent lowcore vs identity mapping overlapAlexander Gordeev1-1/+18
The identity mapping position in virtual memory is randomized together with the kernel mapping. That position can never overlap with the lowcore even when the lowcore is relocated. Prevent overlapping with the lowcore to allow independent positioning of the identity mapping. With the current value of the alternative lowcore address of 0x70000 the overlap could happen in case the identity mapping is placed at zero. This is a prerequisite for uncoupling of randomization base of kernel image and identity mapping in virtual memory. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-20MIPS: cevt-r4k: Don't call get_c0_compare_int if timer irq is installedJiaxun Yang1-8/+7
This avoids warning: [ 0.118053] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 Caused by get_c0_compare_int on secondary CPU. We also skipped saving IRQ number to struct clock_event_device *cd as it's never used by clockevent core, as per comments it's only meant for "non CPU local devices". Reported-by: Serge Semin <fancer.lancer@gmail.com> Closes: https://lore.kernel.org/linux-mips/6szkkqxpsw26zajwysdrwplpjvhl5abpnmxgu2xuj3dkzjnvsf@4daqrz4mf44k/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-08-19KVM: arm64: vgic: Don't hold config_lock while unregistering redistributorsMarc Zyngier2-3/+11
We recently moved the teardown of the vgic part of a vcpu inside a critical section guarded by the config_lock. This teardown phase involves calling into kvm_io_bus_unregister_dev(), which takes the kvm->srcu lock. However, this violates the established order where kvm->srcu is taken on a memory fault (such as an MMIO access), possibly followed by taking the config_lock if the GIC emulation requires mutual exclusion from the other vcpus. It therefore results in a bad lockdep splat, as reported by Zenghui. Fix this by moving the call to kvm_io_bus_unregister_dev() outside of the config_lock critical section. At this stage, there shouln't be any need to hold the config_lock. As an additional bonus, document the ordering between kvm->slots_lock, kvm->srcu and kvm->arch.config_lock so that I cannot pretend I didn't know about those anymore. Fixes: 9eb18136af9f ("KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240819125045.3474845-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-19KVM: arm64: vgic-debug: Don't put unmarked LPIsZenghui Yu1-1/+1
If there were LPIs being mapped behind our back (i.e., between .start() and .stop()), we would put them at iter_unmark_lpis() without checking if they were actually *marked*, which is obviously not good. Switch to use the xa_for_each_marked() iterator to fix it. Cc: stable@vger.kernel.org Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-18Merge tag 'driver-core-6.11-rc4' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two driver fixes for regressions from 6.11-rc1 due to the driver core change making a structure in a driver core callback const. These were missed by all testing EXCEPT for what Bart happened to be running, so I appreciate the fixes provided here for some odd/not-often-used driver subsystems that nothing else happened to catch. Both of these fixes have been in linux-next all week with no reported issues" * tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: mips: sgi-ip22: Fix the build ARM: riscpc: ecard: Fix the build
2024-08-17Merge tag 'powerpc-6.11-2' of ↵Linus Torvalds4-4/+16
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes on 85xx with some configs since the recent hugepd rework. - Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some platforms. - Don't enable offline cores when changing SMT modes, to match existing userspace behaviour. Thanks to Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal Jan K.A, Shrikanth Hegde, Thomas Gleixner, and Tyrel Datwyler. * tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/topology: Check if a core is online cpu/SMT: Enable SMT only if a core is online powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL powerpc/mm: Fix size of allocated PGDIR soc: fsl: qbman: remove unused struct 'cgr_comp'
2024-08-16Merge tag 'arm64-fixes' of ↵Linus Torvalds4-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS() macro instead of the *_ERR() one in order to avoid writing -EFAULT to the value register in case of a fault - Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE. Prior to this fix, only the first element was initialised - Move the KASAN random tag seed initialisation after the per-CPU areas have been initialised (prng_state is __percpu) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix KASAN random tag seed initialization arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE arm64: uaccess: correct thinko in __get_mem_asm()
2024-08-16Merge tag 'riscv-for-linus-6.11-rc4' of ↵Linus Torvalds10-21/+32
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - reintroduce the text patching global icache flush - fix syscall entry code to correctly initialize a0, which manifested as a strace bug - XIP kernels now map the entire kernel, which fixes boot under at least DEBUG_VIRTUAL=y - initialize all nodes in the acpi_early_node_map initializer - fix OOB access in the Andes vendor extension probing code - A new key for scalar misaligned access performance in hwprobe, which correctly treat the values as an enum (as opposed to a bitmap) * tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix out-of-bounds when accessing Andes per hart vendor extension array RISC-V: hwprobe: Add SCALAR to misaligned perf defines RISC-V: hwprobe: Add MISALIGNED_PERF key RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE riscv: change XIP's kernel_map.size to be size of the entire kernel riscv: entry: always initialize regs->a0 to -ENOSYS riscv: Re-introduce global icache flush in patch_text_XXX()
2024-08-15Merge patch series "RISC-V: hwprobe: Misaligned scalar perf fix and rename"Palmer Dabbelt5-15/+22
Evan Green <evan@rivosinc.com> says: The CPUPERF0 hwprobe key was documented and identified in code as a bitmask value, but its contents were an enum. This produced incorrect behavior in conjunction with the WHICH_CPUS hwprobe flag. The first patch in this series fixes the bitmask/enum problem by creating a new hwprobe key that returns the same data, but is properly described as a value instead of a bitmask. The second patch renames the value definitions in preparation for adding vector misaligned access info. As of this version, the old defines are kept in place to maintain source compatibility with older userspace programs. * b4-shazam-merge: RISC-V: hwprobe: Add SCALAR to misaligned perf defines RISC-V: hwprobe: Add MISALIGNED_PERF key Link: https://lore.kernel.org/r/20240809214444.3257596-1-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-15riscv: Fix out-of-bounds when accessing Andes per hart vendor extension arrayAlexandre Ghiti1-1/+1
The out-of-bounds access is reported by UBSAN: [ 0.000000] UBSAN: array-index-out-of-bounds in ../arch/riscv/kernel/vendor_extensions.c:41:66 [ 0.000000] index -1 is out of range for type 'riscv_isavendorinfo [32]' [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.11.0-rc2ubuntu-defconfig #2 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] Call Trace: [ 0.000000] [<ffffffff94e078ba>] dump_backtrace+0x32/0x40 [ 0.000000] [<ffffffff95c83c1a>] show_stack+0x38/0x44 [ 0.000000] [<ffffffff95c94614>] dump_stack_lvl+0x70/0x9c [ 0.000000] [<ffffffff95c94658>] dump_stack+0x18/0x20 [ 0.000000] [<ffffffff95c8bbb2>] ubsan_epilogue+0x10/0x46 [ 0.000000] [<ffffffff95485a82>] __ubsan_handle_out_of_bounds+0x94/0x9c [ 0.000000] [<ffffffff94e09442>] __riscv_isa_vendor_extension_available+0x90/0x92 [ 0.000000] [<ffffffff94e043b6>] riscv_cpufeature_patch_func+0xc4/0x148 [ 0.000000] [<ffffffff94e035f8>] _apply_alternatives+0x42/0x50 [ 0.000000] [<ffffffff95e04196>] apply_boot_alternatives+0x3c/0x100 [ 0.000000] [<ffffffff95e05b52>] setup_arch+0x85a/0x8bc [ 0.000000] [<ffffffff95e00ca0>] start_kernel+0xa4/0xfb6 The dereferencing using cpu should actually not happen, so remove it. Fixes: 23c996fc2bc1 ("riscv: Extend cpufeature.c to detect vendor extensions") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240814192619.276794-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-15arm64: Fix KASAN random tag seed initializationSamuel Holland2-3/+2
Currently, kasan_init_sw_tags() is called before setup_per_cpu_areas(), so per_cpu(prng_state, cpu) accesses the same address regardless of the value of "cpu", and the same seed value gets copied to the percpu area for every CPU. Fix this by moving the call to smp_prepare_boot_cpu(), which is the first architecture hook after setup_per_cpu_areas(). Fixes: 3c9e3aa11094 ("kasan: add tag related helper functions") Fixes: 3f41b6093823 ("kasan: fix random seed generation for tag-based mode") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20240814091005.969756-1-samuel.holland@sifive.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-14RISC-V: hwprobe: Add SCALAR to misaligned perf definesEvan Green4-14/+19
In preparation for misaligned vector performance hwprobe keys, rename the hwprobe key values associated with misaligned scalar accesses to include the term SCALAR. Leave the old defines in place to maintain source compatibility. This change is intended to be a functional no-op. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240809214444.3257596-3-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14RISC-V: hwprobe: Add MISALIGNED_PERF keyEvan Green3-1/+3
RISCV_HWPROBE_KEY_CPUPERF_0 was mistakenly flagged as a bitmask in hwprobe_key_is_bitmask(), when in reality it was an enum value. This causes problems when used in conjunction with RISCV_HWPROBE_WHICH_CPUS, since SLOW, FAST, and EMULATED have values whose bits overlap with each other. If the caller asked for the set of CPUs that was SLOW or EMULATED, the returned set would also include CPUs that were FAST. Introduce a new hwprobe key, RISCV_HWPROBE_KEY_MISALIGNED_PERF, which returns the same values in response to a direct query (with no flags), but is properly handled as an enumerated value. As a result, SLOW, FAST, and EMULATED are all correctly treated as distinct values under the new key when queried with the WHICH_CPUS flag. Leave the old key in place to avoid disturbing applications which may have already come to rely on the key, with or without its broken behavior with respect to the WHICH_CPUS flag. Fixes: e178bf146e4b ("RISC-V: hwprobe: Introduce which-cpus flag") Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240809214444.3257596-2-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODEHaibo Xu1-1/+1
Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE. Fixes: eabd9db64ea8 ("ACPI: RISCV: Add NUMA support based on SRAT and SLIT") Reported-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/0d362a8ae50558b95685da4c821b2ae9e8cf78be.1722828421.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14riscv: change XIP's kernel_map.size to be size of the entire kernelNam Cao1-2/+2
With XIP kernel, kernel_map.size is set to be only the size of data part of the kernel. This is inconsistent with "normal" kernel, who sets it to be the size of the entire kernel. More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is enabled, because there are checks on virtual addresses with the assumption that kernel_map.size is the size of the entire kernel (these checks are in arch/riscv/mm/physaddr.c). Change XIP's kernel_map.size to be the size of the entire kernel. Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14riscv: entry: always initialize regs->a0 to -ENOSYSCeleste Liu1-2/+2
Otherwise when the tracer changes syscall number to -1, the kernel fails to initialize a0 with -ENOSYS and subsequently fails to return the error code of the failed syscall to userspace. For example, it will break strace syscall tampering. Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1") Reported-by: "Dmitry V. Levin" <ldv@strace.io> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Cc: stable@vger.kernel.org Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com> Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODEHaibo Xu1-1/+1
Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE. Fixes: e18962491696 ("arm64: numa: rework ACPI NUMA initialization") Cc: <stable@vger.kernel.org> # 4.19.x Reported-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.1722828421.git.haibo1.xu@intel.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-14arm64: uaccess: correct thinko in __get_mem_asm()Mark Rutland1-1/+1
In the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y version of __get_mem_asm(), we incorrectly use _ASM_EXTABLE_##type##ACCESS_ERR() such that upon a fault the extable fixup handler writes -EFAULT into "%w0", which is the register containing 'x' (the result of the load). This was a thinko in commit: 86a6a68febfcf57b ("arm64: start using 'asm goto' for get_user() when available") Prior to that commit _ASM_EXTABLE_##type##ACCESS_ERR_ZERO() was used such that the extable fixup handler wrote -EFAULT into "%w0" (the register containing 'err'), and zero into "%w1" (the register containing 'x'). When the 'err' variable was removed, the extable entry was updated incorrectly. Writing -EFAULT to the value register is unnecessary but benign: * We never want -EFAULT in the value register, and previously this would have been zeroed in the extable fixup handler. * In __get_user_error() the value is overwritten with zero explicitly in the error path. * The asm goto outputs cannot be used when the goto label is taken, as older compilers (e.g. clang < 16.0.0) do not guarantee that asm goto outputs are usable in this path and may use a stale value rather than the value in an output register. Consequently, zeroing in the extable fixup handler is insufficient to ensure callers see zero in the error path. * The expected usage of unsafe_get_user() and get_kernel_nofault() requires that the value is not consumed in the error path. Some versions of GCC would mis-compile asm goto with outputs, and erroneously omit subsequent assignments, breaking the error path handling in __get_user_error(). This was discussed at: https://lore.kernel.org/lkml/ZpfxLrJAOF2YNqCk@J2N7QTR9R3.cambridge.arm.com/ ... and was fixed by removing support for asm goto with outputs on those broken compilers in commit: f2f6a8e887172503 ("init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND") With that out of the way, we can safely replace the usage of _ASM_EXTABLE_##type##ACCESS_ERR() with _ASM_EXTABLE_##type##ACCESS(), leaving the value register unchanged in the case a fault is taken, as was originally intended. This matches other architectures and matches our __put_mem_asm(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240807103731.2498893-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-14KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)Sean Christopherson1-0/+2
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't directly emulate instructions for ES/SNP, and instead the guest must explicitly request emulation. Unless the guest explicitly requests emulation without accessing memory, ES/SNP relies on KVM creating an MMIO SPTE, with the subsequent #NPF being reflected into the guest as a #VC. But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs, because except for ES/SNP, doing so requires setting reserved bits in the SPTE, i.e. the SPTE can't be readable while also generating a #VC on writes. Because KVM never creates MMIO SPTEs and jumps directly to emulation, the guest never gets a #VC. And since KVM simply resumes the guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU into an infinite #NPF loop if the vCPU attempts to write read-only memory. Disallow read-only memory for all VMs with protected state, i.e. for upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible to support read-only memory, as TDX uses EPT Violation #VE to reflect the fault into the guest, e.g. KVM could configure read-only SPTEs with RX protections and SUPPRESS_VE=0. But there is no strong use case for supporting read-only memslots on TDX, e.g. the main historical usage is to emulate option ROMs, but TDX disallows executing from shared memory. And if someone comes along with a legitimate, strong use case, the restriction can always be lifted for TDX. Don't bother trying to retroactively apply the restriction to SEV-ES VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't possibly work for SEV-ES, i.e. disallowing such memslots is really just means reporting an error to userspace instead of silently hanging vCPUs. Trying to deal with the ordering between KVM_SEV_INIT and memslot creation isn't worth the marginal benefit it would provide userspace. Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES") Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support") Cc: Peter Gonda <pgonda@google.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Ackerly Tng <ackerleytng@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240809190319.1710470-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: x86: Make x2APIC ID 100% readonlySean Christopherson1-7/+15
Ignore the userspace provided x2APIC ID when fixing up APIC state for KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM. Commit a92e2543d6a8 ("KVM: x86: use hardware-compatible format for APIC ID register"), which added the fixup, didn't intend to allow userspace to modify the x2APIC ID. In fact, that commit is when KVM first started treating the x2APIC ID as readonly, apparently to fix some race: static inline u32 kvm_apic_id(struct kvm_lapic *apic) { - return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff; + /* To avoid a race between apic_base and following APIC_ID update when + * switching to x2apic_mode, the x2apic mode returns initial x2apic id. + */ + if (apic_x2apic_mode(apic)) + return apic->vcpu->vcpu_id; + + return kvm_lapic_get_reg(apic, APIC_ID) >> 24; } Furthermore, KVM doesn't support delivering interrupts to vCPUs with a modified x2APIC ID, but KVM *does* return the modified value on a guest RDMSR and for KVM_GET_LAPIC. I.e. no remotely sane setup can actually work with a modified x2APIC ID. Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map calculation, which expects the LDR to align with the x2APIC ID. WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm] CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014 RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm] Call Trace: <TASK> kvm_apic_set_state+0x1cf/0x5b0 [kvm] kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm] kvm_vcpu_ioctl+0x663/0x8a0 [kvm] __x64_sys_ioctl+0xb8/0xf0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fade8b9dd6f Unfortunately, the WARN can still trigger for other CPUs than the current one by racing against KVM_SET_LAPIC, so remove it completely. Reported-by: Michal Luczaj <mhal@rbox.co> Closes: https://lore.kernel.org/all/814baa0c-1eaa-4503-129f-059917365e80@rbox.co Reported-by: Haoyu Wu <haoyuwu254@gmail.com> Closes: https://lore.kernel.org/all/20240126161633.62529-1-haoyuwu254@gmail.com Reported-by: syzbot+545f1326f405db4e1c3e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000c2a6b9061cbca3c3@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240802202941.344889-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())Isaku Yamahata1-4/+2
Use this_cpu_ptr() instead of open coding the equivalent in various user return MSR helpers. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Message-ID: <20240802201630.339306-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()Yue Haibing1-1/+0
There is no caller in tree since introduction in commit b4f69df0f65e ("KVM: x86: Make Hyper-V emulation optional") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Message-ID: <20240803113233.128185-1-yuehaibing@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: SVM: Fix an error code in sev_gmem_post_populate()Dan Carpenter1-2/+3
The copy_from_user() function returns the number of bytes which it was not able to copy. Return -EFAULT instead. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Message-ID: <20240612115040.2423290-4-dan.carpenter@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13Merge tag 'kvm-s390-master-6.11-1' of ↵Paolo Bonzini2-2/+10
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Fix invalid gisa designation value when gisa is not in use. Panic if (un)share fails to maintain security.
2024-08-13Merge tag 'kvmarm-fixes-6.11-1' of ↵Paolo Bonzini16-33/+39
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.11, round #1 - Use kvfree() for the kvmalloc'd nested MMUs array - Set of fixes to address warnings in W=1 builds - Make KVM depend on assembler support for ARMv8.4 - Fix for vgic-debug interface for VMs without LPIs - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest - Minor code / comment cleanups for configuring PAuth traps - Take kvm->arch.config_lock to prevent destruction / initialization race for a vCPU's CPUIF which may lead to a UAF
2024-08-13KVM: SVM: Fix uninitialized variable bugDan Carpenter1-1/+1
If snp_lookup_rmpentry() fails then "assigned" is printed in the error message but it was never initialized. Initialize it to false. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Message-ID: <20240612115040.2423290-3-dan.carpenter@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13mips: sgi-ip22: Fix the buildBart Van Assche1-1/+1
Fix a recently introduced build failure. Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240805232026.65087-3-bvanassche@acm.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-13ARM: riscpc: ecard: Fix the buildBart Van Assche1-1/+1
Fix a recently introduced build failure. Cc: Russell King <rmk+kernel@armlinux.org.uk> Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240805232026.65087-2-bvanassche@acm.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-13powerpc/topology: Check if a core is onlineNysal Jan K.A1-0/+13
topology_is_core_online() checks if the core a CPU belongs to is online. The core is online if at least one of the sibling CPUs is online. The first CPU of an online core is also online in the common case, so this should be fairly quick. Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support") Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731030126.956210-3-nysal@linux.ibm.com
2024-08-12powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUALChristophe Leroy2-2/+1
Booting with CONFIG_DEBUG_VIRTUAL leads to following warning when passing hugepage reservation on command line: Kernel command line: hugepagesz=1g hugepages=1 hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1 noreboot HugeTLB: allocating 1 of page size 1.00 GiB failed. Only allocated 0 hugepages. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/io.h:948 __alloc_bootmem_huge_page+0xd4/0x284 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc6-00396-g6b0e82791bd0-dirty #936 Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS NIP: c1020240 LR: c10201d0 CTR: 00000000 REGS: c13fdd30 TRAP: 0700 Not tainted (6.10.0-rc6-00396-g6b0e82791bd0-dirty) MSR: 00021000 <CE,ME> CR: 44084288 XER: 20000000 GPR00: c10201d0 c13fde20 c130b560 e8000000 e8001000 00000000 00000000 c1420000 GPR08: 00000000 00028001 00000000 00000004 44084282 01066ac0 c0eb7c9c efffe149 GPR16: c0fc4228 0000005f ffffffff c0eb7d0c c0eb7cc0 c0eb7ce0 ffffffff 00000000 GPR24: c1441cec efffe153 e8001000 c14240c0 00000000 c1441d64 00000000 e8000000 NIP [c1020240] __alloc_bootmem_huge_page+0xd4/0x284 LR [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 Call Trace: [c13fde20] [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 (unreliable) [c13fde50] [c10207b8] hugetlb_hstate_alloc_pages+0x8c/0x3e8 [c13fdeb0] [c1021384] hugepages_setup+0x240/0x2cc [c13fdef0] [c1000574] unknown_bootoption+0xfc/0x280 [c13fdf30] [c0078904] parse_args+0x200/0x4c4 [c13fdfa0] [c1000d9c] start_kernel+0x238/0x7d0 [c13fdff0] [c0000434] set_ivor+0x12c/0x168 Code: 554aa33e 7c042840 3ce0c142 80a7427c 5109a016 50caa016 7c9a2378 7fdcf378 4180000c 7c052040 41810160 7c095040 <0fe00000> 38c00000 40800108 3c60c0eb ---[ end trace 0000000000000000 ]--- This is due to virt_addr_valid() using high_memory before it is set. high_memory is set in mem_init() using max_low_pfn, but max_low_pfn is available long before, it is set in mem_topology_setup(). So just like commit daa9ada2093e ("powerpc/mm: Fix boot crash with FLATMEM") moved the setting of max_mapnr immediately after the call to mem_topology_setup(), the same can be done for high_memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/62b69c4baad067093f39e7e60df0fe27a86b8d2a.1723100702.git.christophe.leroy@csgroup.eu
2024-08-12powerpc/mm: Fix size of allocated PGDIRChristophe Leroy1-2/+2
Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") increased the size of PGD entries but failed to increase the PGD directory. Use the size of pgd_t instead of the size of pointers to calculate the allocated size. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1cdaacb391cbd3e0240f0e0faf691202874e9422.1723109462.git.christophe.leroy@csgroup.eu
2024-08-11Merge tag 'x86-urgent-2024-08-11' of ↵Linus Torvalds5-27/+41
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Fix 32-bit PTI for real. pti_clone_entry_text() is called twice, once before initcalls so that initcalls can use the user-mode helper and then again after text is set read only. Setting read only on 32-bit might break up the PMD mapping, which makes the second invocation of pti_clone_entry_text() find the mappings out of sync and failing. Allow the second call to split the existing PMDs in the user mapping and synchronize with the kernel mapping. - Don't make acpi_mp_wake_mailbox read-only after init as the mail box must be writable in the case that CPU hotplug operations happen after boot. Otherwise the attempt to start a CPU crashes with a write to read only memory. - Add a missing sanity check in mtrr_save_state() to ensure that the fixed MTRR MSRs are supported. Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but the WARN_ON() can bring systems down when panic on warn is set. * tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed MTRRs exist before saving them x86/paravirt: Fix incorrect virt spinlock setting on bare metal x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox x86/mm: Fix PTI for i386 some more
2024-08-09Merge tag 'arm-fixes-6.11-1' of ↵Linus Torvalds8-53/+23
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There are three sets of patches for the soc tree: - Marek Behún addresses multiple build time regressions caused by changes to the cznic turris-omnia support - Dmitry Torokhov fixes a regression in the legacy "gumstix" board code he cleaned up earlier - The TI K3 maintainers found multiple bugs in the in gpio, audio and pcie devicetree nodes" * tag 'arm-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: pxa/gumstix: fix attaching properties to vbus gpio device doc: platform: cznic: turris-omnia-mcu: Use double backticks for attribute value doc: platform: cznic: turris-omnia-mcu: Fix sphinx-build warning platform: cznic: turris-omnia-mcu: Make GPIO code optional platform: cznic: turris-omnia-mcu: Make poweroff and wakeup code optional platform: cznic: turris-omnia-mcu: Make TRNG code optional platform: cznic: turris-omnia-mcu: Make watchdog code optional arm64: dts: ti: k3-j784s4-main: Correct McASP DMAs arm64: dts: ti: k3-j722s: Fix gpio-range for main_pmx0 arm64: dts: ti: k3-am62p: Fix gpio-range for main_pmx0 arm64: dts: ti: k3-am62p: Add gpio-ranges for mcu_gpio0 arm64: dts: ti: k3-am62-verdin-dahlia: Keep CTRL_SLEEP_MOCI# regulator on arm64: dts: ti: k3-j784s4-evm: Consolidate serdes0 references arm64: dts: ti: k3-j784s4-evm: Assign only lanes 0 and 1 to PCIe1
2024-08-08KVM: arm64: vgic: Hold config_lock while tearing down a CPU interfaceMarc Zyngier1-2/+1
Tearing down a vcpu CPU interface involves freeing the private interrupt array. If we don't hold the lock, we may race against another thread trying to configure it. Yeah, fuzzers do wonderful things... Taking the lock early solves this particular problem. Fixes: 03b3d00a70b5 ("KVM: arm64: vgic: Allocate private interrupts on demand") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240808091546.3262111-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-08MIPS: Loongson64: Set timer mode in cpu-probeJiaxun Yang1-0/+4
Loongson64 C and G processors have EXTIMER feature which is conflicting with CP0 counter. Although the processor resets in EXTIMER disabled & INTIMER enabled mode, which is compatible with MIPS CP0 compare, firmware may attempt to enable EXTIMER and interfere CP0 compare. Set timer mode back to MIPS compatible mode to fix booting on systems with such firmware before we have an actual driver for EXTIMER. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-08-08x86/mtrr: Check if fixed MTRRs exist before saving themAndi Kleen1-1/+1
MTRRs have an obsolete fixed variant for fine grained caching control of the 640K-1MB region that uses separate MSRs. This fixed variant has a separate capability bit in the MTRR capability MSR. So far all x86 CPUs which support MTRR have this separate bit set, so it went unnoticed that mtrr_save_state() does not check the capability bit before accessing the fixed MTRR MSRs. Though on a CPU that does not support the fixed MTRR capability this results in a #GP. The #GP itself is harmless because the RDMSR fault is handled gracefully, but results in a WARN_ON(). Add the missing capability check to prevent this. Fixes: 2b1f6278d77c ("[PATCH] x86: Save the MTRRs of the BSP before booting an AP") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240808000244.946864-1-ak@linux.intel.com
2024-08-07KVM: arm64: Tidying up PAuth code in KVMFuad Tabba4-15/+7
Tidy up some of the PAuth trapping code to clear up some comments and avoid clang/checkpatch warnings. Also, don't bother setting PAuth HCR_EL2 bits in pKVM, since it's handled by the hypervisor. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240722163311.1493879-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-07KVM: arm64: vgic-debug: Exit the iterator properly w/o LPIZenghui Yu1-2/+3
In case the guest doesn't have any LPI, we previously relied on the iterator setting 'intid = nr_spis + VGIC_NR_PRIVATE_IRQS' && 'lpi_idx = 1' to exit the iterator. But it was broken with commit 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") -- the intid remains at 'nr_spis + VGIC_NR_PRIVATE_IRQS - 1', and we end up endlessly printing the last SPI's state. Consider that it's meaningless to search the LPI xarray and populate lpi_idx when there is no LPI, let's just skip the process for that case. The result is that * If there's no LPI, we focus on the intid and exit the iterator when it runs out of the valid SPI range. * Otherwise we keep the current logic and let the xarray drive the iterator. Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240807052024.2084-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-07KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchainMarc Zyngier1-0/+1
With the NV support of TLBI-range operations, KVM makes use of instructions that are only supported by binutils versions >= 2.30. This breaks the build for very old toolchains. Make KVM support conditional on having ARMv8.4 support in the assembler, side-stepping the issue. Fixes: 5d476ca57d7d ("KVM: arm64: nv: Add handling of range-based TLBI operations") Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Suggested-by: Arnd Bergmann <arnd@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240807115144.3237260-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-07x86/paravirt: Fix incorrect virt spinlock setting on bare metalChen Yu2-9/+10
The kernel can change spinlock behavior when running as a guest. But this guest-friendly behavior causes performance problems on bare metal. The kernel uses a static key to switch between the two modes. In theory, the static key is enabled by default (run in guest mode) and should be disabled for bare metal (and in some guests that want native behavior or paravirt spinlock). A performance drop is reported when running encode/decode workload and BenchSEE cache sub-workload. Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS is disabled the virt_spin_lock_key is incorrectly set to true on bare metal. The qspinlock degenerates to test-and-set spinlock, which decreases the performance on bare metal. Set the default value of virt_spin_lock_key to false. If booting in a VM, enable this key. Later during the VM initialization, if other high-efficient spinlock is preferred (e.g. paravirt-spinlock), or the user wants the native qspinlock (via nopvspin boot commandline), the virt_spin_lock_key is disabled accordingly. This results in the following decision matrix: X86_FEATURE_HYPERVISOR Y Y Y N CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N PV spinlock Y N N Y/N virt_spin_lock_key N Y/N Y N Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning") Reported-by: Prem Nath Dey <prem.nath.dey@intel.com> Reported-by: Xiaoping Zhou <xiaoping.zhou@intel.com> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Suggested-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240806112207.29792-1-yu.c.chen@intel.com
2024-08-07x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailboxZhiquan Li1-1/+1
On a platform using the "Multiprocessor Wakeup Structure"[1] to startup secondary CPUs the control processor needs to memremap() the physical address of the MP Wakeup Structure mailbox to the variable acpi_mp_wake_mailbox, which holds the virtual address of mailbox. To wake up the AP the control processor writes the APIC ID of AP, the wakeup vector and the ACPI_MP_WAKE_COMMAND_WAKEUP command into the mailbox. Current implementation doesn't consider the case which restricts boot time CPU bringup to 1 with the kernel parameter "maxcpus=1" and brings other CPUs online later from user space as it sets acpi_mp_wake_mailbox to read-only after init. So when the first AP is tried to brought online after init, the attempt to update the variable results in a kernel panic. The memremap() call that initializes the variable cannot be moved into acpi_parse_mp_wake() because memremap() is not functional at that point in the boot process. Also as the APs might never be brought up, keep the memremap() call in acpi_wakeup_cpu() so that the operation only takes place when needed. Fixes: 24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init") Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/all/20240805103531.1230635-1-zhiquan1.li@intel.com