summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2024-11-11Improve consistency of '#error' directive messagesNataniel Farzan2-2/+2
Remove the use of contractions and use proper punctuation in #error directive messages that discourage the direct inclusion of header files. Link: https://lkml.kernel.org/r/20241105032231.28833-1-natanielfarzan@gmail.com Signed-off-by: Nataniel Farzan <natanielfarzan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need toAl Viro1-5/+0
... if they are identical to fallbacks, just leave them alone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-10powerpc/irq: use seq_put_decimal_ull_width() for decimal valuesDavid Wang1-22/+22
On a system with n CPUs and m interrupts, there will be n*m decimal values yielded via seq_printf(.."%10u "..) which is less efficient than seq_put_decimal_ull_width(), stress reading /proc/interrupts indicates ~30% performance improvement with this patch. Signed-off-by: David Wang <00107082@163.com> [mpe: Flesh out change log based on original submission] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/all/20241103080552.4787-1-00107082@163.com Link: https://patch.msgid.link/20241108162327.9887-1-00107082@163.com
2024-11-10powerpc/pseries: Fix KVM guest detection for disabling hardlockup detectorGautam Menghani1-0/+1
As per the kernel documentation[1], hardlockup detector should be disabled in KVM guests as it may give false positives. On PPC, hardlockup detector is enabled inside KVM guests because disable_hardlockup_detector() is marked as early_initcall and it relies on kvm_guest static key (is_kvm_guest()) which is initialized later during boot by check_kvm_guest(), which is a core_initcall. check_kvm_guest() is also called in pSeries_smp_probe(), which is called before initcalls, but it is skipped if KVM guest does not have doorbell support or if the guest is launched with SMT=1. Call check_kvm_guest() in disable_hardlockup_detector() so that is_kvm_guest() check goes through fine and hardlockup detector can be disabled inside the KVM guest. [1]: Documentation/admin-guide/sysctl/kernel.rst Fixes: 633c8e9800f3 ("powerpc/pseries: Enable hardlockup watchdog for PowerVM partitions") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241108094839.33084-1-gautam@linux.ibm.com
2024-11-10fadump: reserve param area if below boot_mem_topSourabh Jain1-1/+1
The param area is a memory region where the kernel places additional command-line arguments for fadump kernel. Currently, the param memory area is reserved in fadump kernel if it is above boot_mem_top. However, it should be reserved if it is below boot_mem_top because the fadump kernel already reserves memory from boot_mem_top to the end of DRAM. Currently, there is no impact from not reserving param memory if it is below boot_mem_top, as it is not used after the early boot phase of the fadump kernel. However, if this changes in the future, it could lead to issues in the fadump kernel. Fixes: 3416c9daa6b1 ("powerpc/fadump: pass additional parameters when fadump is active") Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241107055817.489795-2-sourabhjain@linux.ibm.com
2024-11-10powerpc/fadump: allocate memory for additional parameters earlyHari Bathini3-5/+15
Memory for passing additional parameters to fadump capture kernel is allocated during subsys_initcall level, using memblock. But as slab is already available by this time, allocation happens via the buddy allocator. This may work for radix MMU but is likely to fail in most cases for hash MMU as hash MMU needs this memory in the first memory block for it to be accessible in real mode in the capture kernel (second boot). So, allocate memory for additional parameters area as soon as MMU mode is obvious. Fixes: 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel") Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> Closes: https://lore.kernel.org/lkml/a70e4064-a040-447b-8556-1fd02f19383d@linux.vnet.ibm.com/T/#u Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241107055817.489795-1-sourabhjain@linux.ibm.com
2024-11-10powerpc/ftrace: Fix ftrace bug with KASAN=yMichael Ellerman1-3/+4
Booting a KASAN=y kernel with the recently added ftrace out-of-line support causes a warning at boot: ------------[ cut here ]------------ Stub index overflow (1729 > 1728) WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/trace/ftrace.c:209 ftrace_init_nop+0x408/0x444 ... NIP ftrace_init_nop+0x408/0x444 LR ftrace_init_nop+0x404/0x444 Call Trace: ftrace_init_nop+0x404/0x444 (unreliable) ftrace_process_locs+0x544/0x8a0 ftrace_init+0xb4/0x22c start_kernel+0x1dc/0x4d4 start_here_common+0x1c/0x20 ... ftrace failed to modify [<c0000000030beddc>] _sub_I_65535_1+0x8/0x3c actual: 00:00:00:60 Initializing ftrace call sites ftrace record flags: 0 (0) expected tramp: c00000000008b418 ------------[ cut here ]------------ The function in question, _sub_I_65535_1 is some sort of trampoline generated for KASAN, and is in the .text.startup section. That section is part of INIT_TEXT, meaning is_kernel_inittext() returns true for it. But the script that determines how many out-of-line ftrace stubs are needed isn't doesn't consider .text.startup as inittext, leading to there not being enough space for the init stubs. Conversely the logic to calculate how many stubs are needed for the text section isn't filtering out the symbols in .text.startup and so ends up over counting. Fix both problems by calculating the total number of stubs first, then the number that count as inittext, and then subtract the latter from the former to get the count for the text section. Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241107111630.31068-1-mpe@ellerman.id.au
2024-11-08Merge tag 'kvm-riscv-6.13-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini6-4/+6
KVM/riscv changes for 6.13 - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side
2024-11-08KVM: powerpc: remove remaining traces of KVM_CAP_PPC_RMAPaolo Bonzini1-3/+0
This was only needed for PPC970 support, which is long gone: the implementation was removed in 2014. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241023124507.280382-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-08powerpc/cell: Use for_each_of_range() iteratorRob Herring (Arm)1-33/+16
Simplify the cell_iommu_get_fixed_address() dma-ranges parsing to use the for_each_of_range() iterator. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241106212647.341857-1-robh@kernel.org
2024-11-08powerpc/44x: Use for_each_of_range() iteratorRob Herring (Arm)1-14/+9
Simplify the ppc44x PCI dma-ranges parsing to use the for_each_of_range() iterator. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241106212640.341677-1-robh@kernel.org
2024-11-07asm-generic: introduce text-patching.hMike Rapoport (Microsoft)41-40/+40
Several architectures support text patching, but they name the header files that declare patching functions differently. Make all such headers consistently named text-patching.h and add an empty header in asm-generic for architectures that do not support text patching. Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: drop hugetlb_get_unmapped_area{_*} functionsOscar Salvador1-10/+0
Hugetlb mappings are now handled through normal channels just like any other mapping, so we no longer need hugetlb_get_unmapped_area* specific functions. Link: https://lkml.kernel.org/r/20241007075037.267650-8-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06arch/powerpc: teach book3s64 arch_get_unmapped_area{_topdown} to handle ↵Oscar Salvador1-10/+30
hugetlb mappings We want to stop special casing hugetlb mappings and make them go through generic channels, so teach arch_get_unmapped_area{_topdown} to handle those. Reshuffle file_to_psize() definition so arch_get_unmapped_area{_topdown} can make use of it. Link: https://lkml.kernel.org/r/20241007075037.267650-6-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07powerpc/ps3: Mark ps3_setup_uhc_device() __initGeert Uytterhoeven1-1/+1
ps3_setup_uhc_device() is only called from ps3_setup_ehci_device() and ps3_setup_ohci_device(), which are both marked __init. Hence replace the former's __ref marker by __init. Note that before commit bd721ea73e1f9655 ("treewide: replace obsolete _refok by __ref"), the function was marked __init_refok, which probably should have been __init in the first place. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/31fe9435056fcfbf82c3a01693be278d5ce4ad0f.1730899557.git.geert+renesas@glider.be
2024-11-06fs/xattr: add *at family syscallsChristian Göttsche1-0/+4
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and removexattrat(). Those can be used to operate on extended attributes, especially security related ones, either relative to a pinned directory or on a file descriptor without read access, avoiding a /proc/<pid>/fd/<fd> detour, requiring a mounted procfs. One use case will be setfiles(8) setting SELinux file contexts ("security.selinux") without race conditions and without a file descriptor opened with read access requiring SELinux read permission. Use the do_{name}at() pattern from fs/open.c. Pass the value of the extended attribute, its length, and for setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added struct xattr_args to not exceed six syscall arguments and not merging the AT_* and XATTR_* flags. [AV: fixes by Christian Brauner folded in, the entire thing rebased on top of {filename,file}_...xattr() primitives, treatment of empty pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling is cheap, so f...(2) can use it] Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-alpha@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-ia64@vger.kernel.org CC: linux-m68k@lists.linux-m68k.org CC: linux-mips@vger.kernel.org CC: linux-parisc@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org CC: linux-sh@vger.kernel.org CC: sparclinux@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: audit@vger.kernel.org CC: linux-arch@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-security-module@vger.kernel.org CC: selinux@vger.kernel.org [brauner: slight tweaks] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06powerpc: Add __must_check to set_memory_...()Christophe Leroy1-7/+7
After the following powerpc commits, all calls to set_memory_...() functions check returned value. - Commit 8f17bd2f4196 ("powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()") - Commit f7f18e30b468 ("powerpc/kprobes: Handle error returned by set_memory_rox()") - Commit 009cf11d4aab ("powerpc: Don't ignore errors from set_memory_{n}p() in __kernel_map_pages()") - Commit 9cbacb834b4a ("powerpc: Don't ignore errors from set_memory_{n}p() in __kernel_map_pages()") - Commit 78cb0945f714 ("powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()") All calls in core parts of the kernel also always check returned value, can be looked at with following query: $ git grep -w -e set_memory_ro -e set_memory_rw -e set_memory_x -e set_memory_nx -e set_memory_rox `find . -maxdepth 1 -type d | grep -v arch | grep /` It is now possible to flag those functions with __must_check to make sure no new unchecked call it added. Link: https://github.com/KSPP/linux/issues/7 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/775dae48064a661554802ed24ed5bdffe1784724.1725723351.git.christophe.leroy@csgroup.eu
2024-11-06KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid ↵Gautam Menghani1-0/+12
spurious interrupts Running a L2 vCPU (see [1] for terminology) with LPCR_MER bit set and no pending interrupts results in that L2 vCPU getting an infinite flood of spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets the LPCR_MER bit if there are pending interrupts. The spurious flood problem can be observed in 2 cases: 1. Crashing the guest while interrupt heavy workload is running a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm) b. While the workload is running, crash the guest (make sure kdump is configured) c. Any one of the vCPUs of the guest will start getting an infinite flood of spurious interrupts. 2. Running LTP stress tests in multiple guests at the same time a. Start 4 L2 guests. b. Start running LTP stress tests on all 4 guests at same time. c. In some time, any one/more of the vCPUs of any of the guests will start getting an infinite flood of spurious interrupts. The root cause of both the above issues is the same: 1. A NMI is sent to a running vCPU that has LPCR_MER bit set. 2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE is called for all the registers. 3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this new value is always used whenever that vCPU runs, regardless of whether there was a pending interrupt. 4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external interrupt handler, and this cycle never ends. Fix the spurious flood by masking off the LPCR_MER bit before running a L2 vCPU to ensure that it is not set if there are no pending interrupts. [1] Terminology: 1. L0 : PAPR hypervisor running in HV mode 2. L1 : Linux guest (logical partition) running on top of L0 3. L2 : KVM guest running on top of L1 Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
2024-11-05powerpc: assert_pte_locked() use pte_offset_map_ro_nolock()Qi Zheng1-1/+1
In assert_pte_locked(), we just get the ptl and assert if it was already held, so convert it to using pte_offset_map_ro_nolock(). Link: https://lkml.kernel.org/r/42559e042eb6fc3129a40f710d671712030646b4.1727332572.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05KVM: PPC: Book3S HV: Add Power11 capability support for Nested PAPR guestsAmit Machhiwal5-8/+17
The Power11 architected and raw mode support in Linux was merged in commit c2ed087ed35c ("powerpc: Add Power11 architected and raw mode"), and the corresponding support in QEMU is pending in [1], which is currently in its V6. Currently, booting a KVM guest inside a pseries LPAR (Logical Partition) on a kernel without P11 support results the guest boot in a Power10 compatibility mode (i.e., with logical PVR of Power10). However, booting a KVM guest on a kernel with P11 support causes the following boot crash. On a Power11 LPAR, the Power Hypervisor (L0) returns a support for both Power10 and Power11 capabilities through H_GUEST_GET_CAPABILITIES hcall. However, KVM currently supports only Power10 capabilities, resulting in only Power10 capabilities being set as "nested capabilities" via an H_GUEST_SET_CAPABILITIES hcall. In the guest entry path, gs_msg_ops_kvmhv_nestedv2_config_fill_info() is called by kvmhv_nestedv2_flush_vcpu() to fill the GSB (Guest State Buffer) elements. The arch_compat is set to the logical PVR of Power11, followed by an H_GUEST_SET_STATE hcall. This hcall returns H_INVALID_ELEMENT_VALUE as a return code when setting a Power11 logical PVR, as only Power10 capabilities were communicated as supported between PHYP and KVM, utimately resulting in the KVM guest boot crash. KVM: unknown exit, hardware reason ffffffffffffffea NIP 000000007daf97e0 LR 000000007daf1aec CTR 000000007daf1ab4 XER 0000000020040000 CPU#0 MSR 8000000000103000 HID0 0000000000000000 HF 6c002000 iidx 3 didx 3 TB 00000000 00000000 DECR 0 GPR00 8000000000003000 000000007e580e20 000000007db26700 0000000000000000 GPR04 00000000041a0c80 000000007df7f000 0000000000200000 000000007df7f000 GPR08 000000007db6d5d8 000000007e65fa90 000000007db6d5d0 0000000000003000 GPR12 8000000000000001 0000000000000000 0000000000000000 0000000000000000 GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20 0000000000000000 0000000000000000 0000000000000000 000000007db21a30 GPR24 000000007db65000 0000000000000000 0000000000000000 0000000000000003 GPR28 000000007db6d5e0 000000007db22220 000000007daf27ac 000000007db75000 CR 20000404 [ E - - - - G - G ] RES 000@ffffffffffffffff SRR0 000000007daf97e0 SRR1 8000000000102000 PVR 0000000000820200 VRSAVE 0000000000000000 SPRG0 0000000000000000 SPRG1 000000000000ff20 SPRG2 0000000000000000 SPRG3 0000000000000000 SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000 CFAR 0000000000000000 LPCR 0000000000020400 PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000 Fix this by adding the Power11 capability support and the required plumbing in place. Note: * Booting a Power11 KVM nested PAPR guest requires [1] in QEMU. [1] https://lore.kernel.org/all/20240731055022.696051-1-adityag@linux.ibm.com/ Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241028101622.741573-1-amachhiw@linux.ibm.com
2024-11-05powerpc: Use str_enabled_disabled() helper functionThorsten Blum1-2/+3
Remove hard-coded strings by using the str_enabled_disabled() helper function. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241027222219.1173-2-thorsten.blum@linux.dev
2024-11-05powerpc/modules: start/end_opd are only needed for ABI v1Michael Ellerman1-0/+2
The start_opd/end_opd members of struct mod_arch_specific are only needed for kernels built using ELF ABI v1. Guard them with an ifdef to save a little bit of space on ELF ABI v2 kernels. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20240812063312.730496-1-mpe@ellerman.id.au
2024-11-05powerpc/ps3: replace open-coded sysfs_emit functionPaulo Miguel Almeida1-3/+2
sysfs_emit() helper function should be used when formatting the value to be returned to user space. This patch replaces open-coded sysfs_emit() in sysfs .show() callbacks Link: https://github.com/KSPP/linux/issues/105 Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/ZxMV3YvSulJFZ8rk@mail.google.com
2024-11-04powerpc/vdso: Drop -mstack-protector-guard flags in 32-bit files with clangNathan Chancellor1-2/+6
Under certain conditions, the 64-bit '-mstack-protector-guard' flags may end up in the 32-bit vDSO flags, resulting in build failures due to the structure of clang's argument parsing of the stack protector options, which validates the arguments of the stack protector guard flags unconditionally in the frontend, choking on the 64-bit values when targeting 32-bit: clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2 clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2 make[3]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1 make[3]: *** [arch/powerpc/kernel/vdso/Makefile:87: arch/powerpc/kernel/vdso/vgetrandom-32.o] Error 1 Remove these flags by adding them to the CC32FLAGSREMOVE variable, which already handles situations similar to this. Additionally, reformat and align a comment better for the expanding CONFIG_CC_IS_CLANG block. Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030-powerpc-vdso-drop-stackp-flags-clang-v1-1-d95e7376d29c@kernel.org
2024-11-03convert spu_run(2)Al Viro1-9/+5
all failure exits prior to fdget() are returns, fdput() is immediately followed by return. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03switch spufs_calls_{get,put}() to CLASS() useAl Viro1-35/+17
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03fdget(), trivial conversionsAl Viro3-37/+14
fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-02powerpc: Split systemcfg struct definitions out from vdsoThomas Weißschuh8-37/+58
The systemcfg data has nothing to do anymore with the vdso. Split it into a dedicated header file. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-27-b64f0842d512@linutronix.de
2024-11-02powerpc: Split systemcfg data out of vdso data pageThomas Weißschuh8-54/+48
The systemcfg data only has minimal overlap with the vdso data. Splitting the two avoids mapping the implementation-defined vdso data into /proc/ppc64/systemcfg. It is also a preparation for the standardization of vdso data storage. The only field actually used by both systemcfg and vdso is tb_ticks_per_sec and it is only changed once during time_init(). Initialize it in both structures there. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-26-b64f0842d512@linutronix.de
2024-11-02powerpc: Add kconfig option for the systemcfg pageThomas Weißschuh2-2/+10
The systemcfg page through procfs is only a backwards-compatible interface for very old applications. Make it possible to be disabled. This also creates a convenient config #define to guard any accesses to the systemcfg page. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-25-b64f0842d512@linutronix.de
2024-11-02powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processorsThomas Weißschuh1-2/+1
The systemcfg processorCount variable tracks currently online variables, not possible ones, so the stored value is wrong. The code preferably tries to use the ibm,lrdr-capacity field 4 which "represents the maximum number of processors that the guest can have." Switch from processorCount to the better matching num_possible_cpus(). Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-24-b64f0842d512@linutronix.de
2024-11-02powerpc/pseries/lparcfg: Fix printing of system_active_processorsThomas Weißschuh1-1/+1
When printing the information "system_active_processors", the variable partition_potential_processors is used instead of partition_active_processors. The wrong value is displayed. Use partition_active_processors instead. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-23-b64f0842d512@linutronix.de
2024-11-02powerpc/procfs: Propagate error of remap_pfn_range()Thomas Weißschuh1-4/+3
If the operation fails and userspace is unaware it will access unmapped memory, crashing the process. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-22-b64f0842d512@linutronix.de
2024-11-02powerpc/vdso: Remove offset comment from 32bit vdso_arch_dataThomas Weißschuh1-1/+1
This offset was copy-pasted from the systemcfg structure. It has no meaning for the 32bit VDSO. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-21-b64f0842d512@linutronix.de
2024-10-31powerpc64/bpf: Add support for bpf trampolinesNaveen N Rao5-5/+891
Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline() for 64-bit powerpc. While the code is generic, BPF trampolines are only enabled on 64-bit powerpc. 32-bit powerpc will need testing and some updates. BPF Trampolines adhere to the existing ftrace ABI utilizing a two-instruction profiling sequence, as well as the newer ABI utilizing a three-instruction profiling sequence enabling return with a 'blr'. The trampoline code itself closely follows x86 implementation. BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace having a single nop at function entry, followed by the function profiling sequence out-of-line and a separate long branch stub for calls to trampolines that are out of range. A dummy_tramp is provided to simplify synchronization similar to arm64. When attaching a bpf trampoline to a bpf prog, we can patch up to three things: - the nop at bpf prog entry to go to the out-of-line stub - the instruction in the out-of-line stub to either call the bpf trampoline directly, or to branch to the long_branch stub. - the trampoline address before the long_branch stub. We do not need any synchronization here since we always have a valid branch target regardless of the order in which the above stores are seen. dummy_tramp ensures that the long_branch stub goes to a valid destination on other cpus, even when the branch to the long_branch stub is seen before the updated trampoline address. However, when detaching a bpf trampoline from a bpf prog, or if changing the bpf trampoline address, we need synchronization to ensure that other cpus can no longer branch into the older trampoline so that it can be safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus make forward progress, but we still need to ensure that other cpus execute isync (or some CSI) so that they don't go back into the trampoline again. While here, update the stale comment that describes the redzone usage in ppc64 BPF JIT. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-18-hbathini@linux.ibm.com
2024-10-31samples/ftrace: Add support for ftrace direct samples on powerpcNaveen N Rao1-0/+2
Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to show the sample instruction sequence to be used by ftrace direct calls to adhere to the ftrace ABI. On 64-bit powerpc, TOC setup requires some additional work. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-17-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLSNaveen N Rao5-29/+116
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64 implementation. ftrace direct calls allow custom trampolines to be called into directly from function ftrace call sites, bypassing the ftrace trampoline completely. This functionality is currently utilized by BPF trampolines to hook into kernel function entries. Since we have limited relative branch range, we support ftrace direct calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this approach, ftrace trampoline is not entirely bypassed. Rather, it is re-purposed into a stub that reads direct_call field from the associated ftrace_ops structure and branches into that, if it is not NULL. For this, it is sufficient if we can ensure that the ftrace trampoline is reachable from all traceable functions. When multiple ftrace_ops are associated with a call site, we utilize a call back to set pt_regs->orig_gpr3 that can then be tested on the return path from the ftrace trampoline to branch into the direct caller. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-16-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPSNaveen N Rao7-12/+102
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the arm64 implementation. This works by patching-in a pointer to an associated ftrace_ops structure before each traceable function. If multiple ftrace_ops are associated with a call site, then a special ftrace_list_ops is used to enable iterating over all the registered ftrace_ops. If no ftrace_ops are associated with a call site, then a special ftrace_nop_ops structure is used to render the ftrace call as a no-op. ftrace trampoline can then read the associated ftrace_ops for a call site by loading from an offset from the LR, and branch directly to the associated function. The primary advantage with this approach is that we don't have to iterate over all the registered ftrace_ops for call sites that have a single ftrace_ops registered. This is the equivalent of implementing support for dynamic ftrace trampolines, which set up a special ftrace trampoline for each registered ftrace_ops and have individual call sites branch into those directly. A secondary advantage is that this gives us a way to add support for direct ftrace callers without having to resort to using stubs. The address of the direct call trampoline can be loaded from the ftrace_ops structure. To support this, we reserve a nop before each function on 32-bit powerpc. For 64-bit powerpc, two nops are reserved before each out-of-line stub. During ftrace activation, we update this location with the associated ftrace_ops pointer. Then, on ftrace entry, we load from this location and call into ftrace_ops->func(). For 64-bit powerpc, we ensure that the out-of-line stub area is doubleword aligned so that ftrace_ops address can be updated atomically. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-15-hbathini@linux.ibm.com
2024-10-31powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubsNaveen N Rao6-13/+58
We are restricted to a .text size of ~32MB when using out-of-line function profile sequence. Allow this to be extended up to the previous limit of ~64MB by reserving space in the middle of .text. A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is introduced to specify the number of function stubs that are reserved in .text. On boot, ftrace utilizes stubs from this area first before using the stub area at the end of .text. A ppc64le defconfig has ~44k functions that can be traced. A more conservative value of 32k functions is chosen as the default value of PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space than necessary by default. If building a kernel that only has 32k trace-able functions, we won't allot any more space at the end of .text during the pass on vmlinux.o. Otherwise, only the remaining functions get space for stubs at the end of .text. This default value should help cover a .text size of ~48MB in total (including space reserved at the end of .text which can cover up to 32MB), which should be sufficient for most common builds. For a very small kernel build, this can be set to 0. Or, this can be bumped up to a larger value to support vmlinux .text size up to ~64MB. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-14-hbathini@linux.ibm.com
2024-10-31powerpc64/ftrace: Move ftrace sequence out of lineNaveen N Rao12-38/+380
Function profile sequence on powerpc includes two instructions at the beginning of each function: mflr r0 bl ftrace_caller The call to ftrace_caller() gets nop'ed out during kernel boot and is patched in when ftrace is enabled. Given the sequence, we cannot return from ftrace_caller with 'blr' as we need to keep LR and r0 intact. This results in link stack (return address predictor) imbalance when ftrace is enabled. To address that, we would like to use a three instruction sequence: mflr r0 bl ftrace_caller mtlr r0 Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to reserve two instruction slots before the function. This results in a total of five instruction slots to be reserved for ftrace use on each function that is traced. Move the function profile sequence out-of-line to minimize its impact. To do this, we reserve a single nop at function entry using -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine the total number of functions that can be traced. This is then used to generate a .S file reserving the appropriate amount of space for use as ftrace stubs, which is built and linked into vmlinux. On bootup, the stub space is split into separate stubs per function and populated with the proper instruction sequence. A pointer to the associated stub is maintained in dyn_arch_ftrace. For modules, space for ftrace stubs is reserved from the generic module stub space. This is restricted to and enabled by default only on 64-bit powerpc, though there are some changes to accommodate 32-bit powerpc. This is done so that 32-bit powerpc could choose to opt into this based on further tests and benchmarks. As an example, after this patch, kernel functions will have a single nop at function entry: <kernel_clone>: addis r2,r12,467 addi r2,r2,-16028 nop mfocrf r11,8 ... When ftrace is enabled, the nop is converted to an unconditional branch to the stub associated with that function: <kernel_clone>: addis r2,r12,467 addi r2,r2,-16028 b ftrace_ool_stub_text_end+0x11b28 mfocrf r11,8 ... The associated stub: <ftrace_ool_stub_text_end+0x11b28>: mflr r0 bl ftrace_caller mtlr r0 b kernel_clone+0xc ... This change showed an improvement of ~10% in null_syscall benchmark on a Power 10 system with ftrace enabled. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-13-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Add a postlink script to validate function tracerNaveen N Rao3-1/+59
Function tracer on powerpc can only work with vmlinux having a .text size of up to ~64MB due to powerpc branch instruction having a limited relative branch range of 32MB. Today, this is only detected on kernel boot when ftrace is init'ed. Add a post-link script to check the size of .text so that we can detect this at build time, and break the build if necessary. We add a dependency on !COMPILE_TEST for CONFIG_HAVE_FUNCTION_TRACER so that allyesconfig and other test builds can continue to work without enabling ftrace. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-11-hbathini@linux.ibm.com
2024-10-31powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into ↵Naveen N Rao1-44/+17
bpf_jit_emit_func_call_rel() Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced bpf_jit_emit_func_call_hlp() to handle calls out to module region, where bpf progs are generated. The only difference now between bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in handling of the initial pass where target function address is not known. Fold that logic into bpf_jit_emit_func_call_hlp() and rename it to bpf_jit_emit_func_call_rel() to simplify bpf function call JIT code. We don't actually need to load/restore TOC across a call out to a different kernel helper or to a different bpf program since they all work with the kernel TOC. We only need to do it if we have to call out to a module function. So, guard TOC load/restore with appropriate conditions. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-10-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Move ftrace stub used for init text before _einittextNaveen N Rao1-2/+1
Move the ftrace stub used to cover inittext before _einittext so that it is within kernel text, as seen through core_kernel_text(). This is required for a subsequent change to ftrace. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-9-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Skip instruction patching if the instructions are the sameNaveen N Rao1-1/+1
To simplify upcoming changes to ftrace, add a check to skip actual instruction patching if the old and new instructions are the same. We still validate that the instruction is what we expect, but don't actually patch the same instruction again. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-8-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftraceNaveen N Rao3-63/+56
Pointer to struct module is only relevant for ftrace records belonging to kernel modules. Having this field in dyn_arch_ftrace wastes memory for all ftrace records belonging to the kernel. Remove the same in favour of looking up the module from the ftrace record address, similar to other architectures. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-7-hbathini@linux.ibm.com
2024-10-31powerpc/module_64: Convert #ifdef to IS_ENABLED()Naveen N Rao1-8/+2
Minor refactor for converting #ifdef to IS_ENABLED(). Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-6-hbathini@linux.ibm.com
2024-10-31powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry codeNaveen N Rao2-4/+6
On 32-bit powerpc, gcc generates a three instruction sequence for function profiling: mflr r0 stw r0, 4(r1) bl _mcount On kernel boot, the call to _mcount() is nop-ed out, to be patched back in when ftrace is actually enabled. The 'stw' instruction therefore is not necessary unless ftrace is enabled. Nop it out during ftrace init. When ftrace is enabled, we want the 'stw' so that stack unwinding works properly. Perform the same within the ftrace handler, similar to 64-bit powerpc. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-5-hbathini@linux.ibm.com
2024-10-31powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.xNaveen N Rao1-1/+5
Gcc v5.x emits a 3-instruction sequence for -mprofile-kernel: mflr r0 std r0, 16(r1) bl _mcount Gcc v6.x moved to a simpler 2-instruction sequence by removing the 'std' instruction. The store saved the return address in the LR save area in the caller stack frame for stack unwinding. However, with dynamic ftrace, we no longer have a call to _mcount on kernel boot when ftrace is not enabled. When ftrace is enabled, that store is performed within ftrace_caller(). As such, the additional 'std' instruction is redundant. Nop it out on kernel boot. With this change, we now use the same 2-instruction profiling sequence with both -mprofile-kernel, as well as -fpatchable-function-entry on 64-bit powerpc. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-4-hbathini@linux.ibm.com
2024-10-31powerpc/kprobes: Use ftrace to determine if a probe is at function entryNaveen N Rao1-10/+8
Rather than hard-coding the offset into a function to be used to determine if a kprobe is at function entry, use ftrace_location() to determine the ftrace location within the function and categorize all instructions till that offset to be function entry. For functions that cannot be traced, we fall back to using a fixed offset of 8 (two instructions) to categorize a probe as being at function entry for 64-bit elfv2, unless we are using pcrel. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-3-hbathini@linux.ibm.com
2024-10-31powerpc/trace: Account for -fpatchable-function-entry support by toolchainNaveen N Rao1-4/+7
So far, we have relied on the fact that gcc supports both -mprofile-kernel, as well as -fpatchable-function-entry, and clang supports neither. Our Makefile only checks for CONFIG_MPROFILE_KERNEL to decide which files to build. Clang has a feature request out [*] to implement -fpatchable-function-entry, and is unlikely to support -mprofile-kernel. Update our Makefile checks so that we pick up the correct files to build once clang picks up support for -fpatchable-function-entry. [*] https://github.com/llvm/llvm-project/issues/57031 Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-2-hbathini@linux.ibm.com