Age | Commit message (Collapse) | Author | Files | Lines |
|
Replace an of_get_property() call by of_property_match_string()
so that this function implementation can be simplified.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/linuxppc-dev/d9bdc1b6-ea7e-47aa-80aa-02ae649abf72@csgroup.eu/
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linuxppc-dev/87cyk97ufp.fsf@mail.lhotse/
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/ede25e03-7a14-4787-ae1b-4fc9290add5a@web.de
|
|
Commit 384e338a9187 ("powerpc: drop MPC8540_ADS and MPC8560_ADS platform
support") and commit b751ed04bc5e ("powerpc: drop MPC85xx_CDS platform
support") removes the platform support for MPC8540_ADS, MPC8560_ADS and
MPC85xx_CDS in the source tree, but misses to remove the config options in
the Kconfig file. Hence, these three config options are without any effect
since then.
Drop these three dead config options.
Fixes: 384e338a9187 ("powerpc: drop MPC8540_ADS and MPC8560_ADS platform support")
Fixes: b751ed04bc5e ("powerpc: drop MPC85xx_CDS platform support")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20240927095203.392365-1-lukas.bulwahn@redhat.com
|
|
Replace `cpumask_any_and(a, b) >= nr_cpu_ids`
with the more readable `!cpumask_intersects(a, b)`.
Comparison between cpumask_any_and() and cpumask_intersects()
The cpumask_any_and() function expands using FIND_FIRST_BIT(),
resulting in a loop that iterates through each bit of the bitmask:
for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
val = (FETCH);
if (val) {
sz = min(idx * BITS_PER_LONG + __ffs(MUNGE(val)), sz);
break;
}
}
The cpumask_intersects() function expands using __bitmap_intersects(),
resulting in that the first loop iterates through each long word of the bitmask,
and the second through each bit within a long word:
unsigned int k, lim = bits/BITS_PER_LONG;
for (k = 0; k < lim; ++k)
if (bitmap1[k] & bitmap2[k])
return true;
if (bits % BITS_PER_LONG)
if ((bitmap1[k] & bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
return true;
Conclusion: cpumask_intersects() is at least as efficient as cpumask_any_and(),
if not more so, as it typically performs fewer loops and comparisons.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20240926092623.399577-2-costa.shul@redhat.com
|
|
If there were no anamolies noted, then we can simply remove the log file
and return, but only after the path variable has been initialized.
Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20240930012757.2395-1-zhangjiao2@cmss.chinamobile.com
|
|
Currently this cannot lookup symbol beyond 64 characters in some cases
like "ls", "lp" and "t"
Fix this by using KSYM_NAME_LEN instead of fixed 64 characters
Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241024191232.1570894-2-mchauras@linux.ibm.com
|
|
The correct format string for resource_size_t is %pa which
acts on the address of the variable to be formatted [1].
[1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229
Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string")
Flagged by gcc-14 as:
arch/powerpc/platforms/82xx/ep8248e.c: In function 'ep8248e_mdio_probe':
arch/powerpc/platforms/82xx/ep8248e.c:131:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
131 | snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start);
| ~^ ~~~~~~~~~
| | |
| | resource_size_t {aka long long unsigned int}
| unsigned int
| %llx
No functional change intended.
Compile tested only.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241014-ep8248e-pa-fmt-v1-1-009ea0dcc18f@kernel.org
|
|
Reorganize kerneldoc parameter names to match the parameter
order in the function header.
Problems identified using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20240930112121.95324-12-Julia.Lawall@inria.fr
|
|
These functions are not used outside of sstep.c
Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241001130356.14664-1-msuchanek@suse.de
|
|
These offsets are not used anymore, delete them.
Fixes: c39b1dcf055d ("powerpc/vdso: Add a page for non-time data")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241113-vdso-powerpc-asm-offsets-v1-1-3f7e589f090d@linutronix.de
|
|
spu_priv1_beat_ops were removed in commit bf4981a00636 ("powerpc: Remove
the celleb support"), remove the unneeded extern.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241112114805.453894-1-mpe@ellerman.id.au
|
|
This driver is no longer buildable since the PPC_MAPLE platform was
removed, see commit 62f8f307c80e ("powerpc/64: Remove maple platform").
Remove the driver.
Note that the comment in the driver says it supports "SMU & 970FX
based G5 Macs", but that's not true, that comment was copied from
pmac64-cpufreq.c, which still exists and continues to support those
machines.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241112085148.415574-1-mpe@ellerman.id.au
|
|
On a system with n CPUs and m interrupts, there will be n*m decimal
values yielded via seq_printf(.."%10u "..) which is less efficient
than seq_put_decimal_ull_width(), stress reading /proc/interrupts
indicates ~30% performance improvement with this patch.
Signed-off-by: David Wang <00107082@163.com>
[mpe: Flesh out change log based on original submission]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/all/20241103080552.4787-1-00107082@163.com
Link: https://patch.msgid.link/20241108162327.9887-1-00107082@163.com
|
|
As per the kernel documentation[1], hardlockup detector should
be disabled in KVM guests as it may give false positives. On
PPC, hardlockup detector is enabled inside KVM guests because
disable_hardlockup_detector() is marked as early_initcall and it
relies on kvm_guest static key (is_kvm_guest()) which is initialized
later during boot by check_kvm_guest(), which is a core_initcall.
check_kvm_guest() is also called in pSeries_smp_probe(), which is called
before initcalls, but it is skipped if KVM guest does not have doorbell
support or if the guest is launched with SMT=1.
Call check_kvm_guest() in disable_hardlockup_detector() so that
is_kvm_guest() check goes through fine and hardlockup detector can be
disabled inside the KVM guest.
[1]: Documentation/admin-guide/sysctl/kernel.rst
Fixes: 633c8e9800f3 ("powerpc/pseries: Enable hardlockup watchdog for PowerVM partitions")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241108094839.33084-1-gautam@linux.ibm.com
|
|
The param area is a memory region where the kernel places additional
command-line arguments for fadump kernel. Currently, the param memory
area is reserved in fadump kernel if it is above boot_mem_top. However,
it should be reserved if it is below boot_mem_top because the fadump
kernel already reserves memory from boot_mem_top to the end of DRAM.
Currently, there is no impact from not reserving param memory if it is
below boot_mem_top, as it is not used after the early boot phase of the
fadump kernel. However, if this changes in the future, it could lead to
issues in the fadump kernel.
Fixes: 3416c9daa6b1 ("powerpc/fadump: pass additional parameters when fadump is active")
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241107055817.489795-2-sourabhjain@linux.ibm.com
|
|
Memory for passing additional parameters to fadump capture kernel
is allocated during subsys_initcall level, using memblock. But
as slab is already available by this time, allocation happens via
the buddy allocator. This may work for radix MMU but is likely to
fail in most cases for hash MMU as hash MMU needs this memory in
the first memory block for it to be accessible in real mode in the
capture kernel (second boot). So, allocate memory for additional
parameters area as soon as MMU mode is obvious.
Fixes: 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Closes: https://lore.kernel.org/lkml/a70e4064-a040-447b-8556-1fd02f19383d@linux.vnet.ibm.com/T/#u
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241107055817.489795-1-sourabhjain@linux.ibm.com
|
|
Booting a KASAN=y kernel with the recently added ftrace out-of-line
support causes a warning at boot:
------------[ cut here ]------------
Stub index overflow (1729 > 1728)
WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/trace/ftrace.c:209 ftrace_init_nop+0x408/0x444
...
NIP ftrace_init_nop+0x408/0x444
LR ftrace_init_nop+0x404/0x444
Call Trace:
ftrace_init_nop+0x404/0x444 (unreliable)
ftrace_process_locs+0x544/0x8a0
ftrace_init+0xb4/0x22c
start_kernel+0x1dc/0x4d4
start_here_common+0x1c/0x20
...
ftrace failed to modify
[<c0000000030beddc>] _sub_I_65535_1+0x8/0x3c
actual: 00:00:00:60
Initializing ftrace call sites
ftrace record flags: 0
(0)
expected tramp: c00000000008b418
------------[ cut here ]------------
The function in question, _sub_I_65535_1 is some sort of trampoline
generated for KASAN, and is in the .text.startup section. That section
is part of INIT_TEXT, meaning is_kernel_inittext() returns true for it.
But the script that determines how many out-of-line ftrace stubs are
needed isn't doesn't consider .text.startup as inittext, leading to
there not being enough space for the init stubs.
Conversely the logic to calculate how many stubs are needed for the text
section isn't filtering out the symbols in .text.startup and so ends up
over counting.
Fix both problems by calculating the total number of stubs first, then
the number that count as inittext, and then subtract the latter from the
former to get the count for the text section.
Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241107111630.31068-1-mpe@ellerman.id.au
|
|
Simplify the cell_iommu_get_fixed_address() dma-ranges parsing to use
the for_each_of_range() iterator.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106212647.341857-1-robh@kernel.org
|
|
Simplify the ppc44x PCI dma-ranges parsing to use the
for_each_of_range() iterator.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106212640.341677-1-robh@kernel.org
|
|
Currently the mitigation patching test errors out if the kernel is
tainted prior to the test running.
That causes the test to fail unnecessarily if some other test has caused
the kernel to be tainted, or if a proprietary or force module is loaded
for example.
Instead just warn if the kernel is tainted to begin with, and only
report a change in the taint state as an error in the test.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106130453.1741013-5-mpe@ellerman.id.au
|
|
Fix some tests which weren't returning an error code from main.
Although these tests only ever return success, they can still fail if
they time out and the harness kills them. If that happens they still
return success to the shell, which is incorrect and confuses the higher
level error reporting.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106130453.1741013-4-mpe@ellerman.id.au
|
|
Starting with Ubuntu 24.04, building the selftests with the big endian
compiler (which defaults to 32-bit) fails with errors:
stack_expansion_ldst.c:178:37: error: format '%lx' expects argument
of type 'long unsigned int', but argument 2 has type 'rlim_t' {aka 'long long unsigned int'}
subpage_prot.c:214:38: error: format '%lx' expects argument of type
'long unsigned int', but argument 3 has type 'off_t' {aka 'long long int'}
Prior to 24.04 rlim_t was long unsigned int, and off_t was long int.
Cast to unsigned long long and long long before passing to printf to
avoid the errors.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106130453.1741013-3-mpe@ellerman.id.au
|
|
Each of the powerpc selftests runs with a timeout of 2 minutes by
default (see tools/testing/selftests/powerpc/harness.c).
But when tests are run with run_kselftest.sh it uses a timeout of 45
seconds, meaning some tests run OK standalone but fail when run with the
test runner.
So tell run_kselftest.sh to give each test 130 seconds, that should
allow the tests to complete, or be killed by the powerpc test harness
after 2 minutes. If for some reason the harness fails, or for the few
tests that don't use the harness, the 130 second timeout should catch
them if they get stuck.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106130453.1741013-2-mpe@ellerman.id.au
|
|
The count_stcx_fail test runs for close to or just over 2 minutes, which
means it sometimes times out.
That's overkill for a test that just demonstrates some PMU counters
are working. Drop the 64 billion instruction case, to lower the runtime
to ~30s.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241106130453.1741013-1-mpe@ellerman.id.au
|
|
ps3_setup_uhc_device() is only called from ps3_setup_ehci_device() and
ps3_setup_ohci_device(), which are both marked __init. Hence replace
the former's __ref marker by __init.
Note that before commit bd721ea73e1f9655 ("treewide: replace obsolete
_refok by __ref"), the function was marked __init_refok, which probably
should have been __init in the first place.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/31fe9435056fcfbf82c3a01693be278d5ce4ad0f.1730899557.git.geert+renesas@glider.be
|
|
After the following powerpc commits, all calls to set_memory_...()
functions check returned value.
- Commit 8f17bd2f4196 ("powerpc: Handle error in mark_rodata_ro() and
mark_initmem_nx()")
- Commit f7f18e30b468 ("powerpc/kprobes: Handle error returned by
set_memory_rox()")
- Commit 009cf11d4aab ("powerpc: Don't ignore errors from
set_memory_{n}p() in __kernel_map_pages()")
- Commit 9cbacb834b4a ("powerpc: Don't ignore errors from
set_memory_{n}p() in __kernel_map_pages()")
- Commit 78cb0945f714 ("powerpc: Handle error in mark_rodata_ro() and
mark_initmem_nx()")
All calls in core parts of the kernel also always check returned value,
can be looked at with following query:
$ git grep -w -e set_memory_ro -e set_memory_rw -e set_memory_x -e set_memory_nx -e set_memory_rox `find . -maxdepth 1 -type d | grep -v arch | grep /`
It is now possible to flag those functions with __must_check to make
sure no new unchecked call it added.
Link: https://github.com/KSPP/linux/issues/7
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/775dae48064a661554802ed24ed5bdffe1784724.1725723351.git.christophe.leroy@csgroup.eu
|
|
Remove hard-coded strings by using the str_enabled_disabled() helper
function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241027222219.1173-2-thorsten.blum@linux.dev
|
|
The start_opd/end_opd members of struct mod_arch_specific are only
needed for kernels built using ELF ABI v1. Guard them with an ifdef to
save a little bit of space on ELF ABI v2 kernels.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20240812063312.730496-1-mpe@ellerman.id.au
|
|
sysfs_emit() helper function should be used when formatting the value
to be returned to user space.
This patch replaces open-coded sysfs_emit() in sysfs .show() callbacks
Link: https://github.com/KSPP/linux/issues/105
Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/ZxMV3YvSulJFZ8rk@mail.google.com
|
|
Under certain conditions, the 64-bit '-mstack-protector-guard' flags may
end up in the 32-bit vDSO flags, resulting in build failures due to the
structure of clang's argument parsing of the stack protector options,
which validates the arguments of the stack protector guard flags
unconditionally in the frontend, choking on the 64-bit values when
targeting 32-bit:
clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2
clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2
make[3]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1
make[3]: *** [arch/powerpc/kernel/vdso/Makefile:87: arch/powerpc/kernel/vdso/vgetrandom-32.o] Error 1
Remove these flags by adding them to the CC32FLAGSREMOVE variable, which
already handles situations similar to this. Additionally, reformat and
align a comment better for the expanding CONFIG_CC_IS_CLANG block.
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030-powerpc-vdso-drop-stackp-flags-clang-v1-1-d95e7376d29c@kernel.org
|
|
Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc. While the code is generic, BPF trampolines are only
enabled on 64-bit powerpc. 32-bit powerpc will need testing and some
updates.
BPF Trampolines adhere to the existing ftrace ABI utilizing a
two-instruction profiling sequence, as well as the newer ABI utilizing a
three-instruction profiling sequence enabling return with a 'blr'. The
trampoline code itself closely follows x86 implementation.
BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace
having a single nop at function entry, followed by the function
profiling sequence out-of-line and a separate long branch stub for calls
to trampolines that are out of range. A dummy_tramp is provided to
simplify synchronization similar to arm64.
When attaching a bpf trampoline to a bpf prog, we can patch up to three
things:
- the nop at bpf prog entry to go to the out-of-line stub
- the instruction in the out-of-line stub to either call the bpf trampoline
directly, or to branch to the long_branch stub.
- the trampoline address before the long_branch stub.
We do not need any synchronization here since we always have a valid
branch target regardless of the order in which the above stores are
seen. dummy_tramp ensures that the long_branch stub goes to a valid
destination on other cpus, even when the branch to the long_branch stub
is seen before the updated trampoline address.
However, when detaching a bpf trampoline from a bpf prog, or if changing
the bpf trampoline address, we need synchronization to ensure that other
cpus can no longer branch into the older trampoline so that it can be
safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus
make forward progress, but we still need to ensure that other cpus
execute isync (or some CSI) so that they don't go back into the
trampoline again. While here, update the stale comment that describes
the redzone usage in ppc64 BPF JIT.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-18-hbathini@linux.ibm.com
|
|
Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to
show the sample instruction sequence to be used by ftrace direct calls
to adhere to the ftrace ABI.
On 64-bit powerpc, TOC setup requires some additional work.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-17-hbathini@linux.ibm.com
|
|
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.
ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.
Since we have limited relative branch range, we support ftrace direct
calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads direct_call field from the associated
ftrace_ops structure and branches into that, if it is not NULL. For
this, it is sufficient if we can ensure that the ftrace trampoline is
reachable from all traceable functions.
When multiple ftrace_ops are associated with a call site, we utilize a
call back to set pt_regs->orig_gpr3 that can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-16-hbathini@linux.ibm.com
|
|
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.
This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. ftrace trampoline can then
read the associated ftrace_ops for a call site by loading from an offset
from the LR, and branch directly to the associated function.
The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.
A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.
To support this, we reserve a nop before each function on 32-bit
powerpc. For 64-bit powerpc, two nops are reserved before each
out-of-line stub. During ftrace activation, we update this location with
the associated ftrace_ops pointer. Then, on ftrace entry, we load from
this location and call into ftrace_ops->func().
For 64-bit powerpc, we ensure that the out-of-line stub area is
doubleword aligned so that ftrace_ops address can be updated atomically.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-15-hbathini@linux.ibm.com
|
|
We are restricted to a .text size of ~32MB when using out-of-line
function profile sequence. Allow this to be extended up to the previous
limit of ~64MB by reserving space in the middle of .text.
A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is
introduced to specify the number of function stubs that are reserved in
.text. On boot, ftrace utilizes stubs from this area first before using
the stub area at the end of .text.
A ppc64le defconfig has ~44k functions that can be traced. A more
conservative value of 32k functions is chosen as the default value of
PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space
than necessary by default. If building a kernel that only has 32k
trace-able functions, we won't allot any more space at the end of .text
during the pass on vmlinux.o. Otherwise, only the remaining functions
get space for stubs at the end of .text. This default value should help
cover a .text size of ~48MB in total (including space reserved at the
end of .text which can cover up to 32MB), which should be sufficient for
most common builds. For a very small kernel build, this can be set to 0.
Or, this can be bumped up to a larger value to support vmlinux .text
size up to ~64MB.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-14-hbathini@linux.ibm.com
|
|
Function profile sequence on powerpc includes two instructions at the
beginning of each function:
mflr r0
bl ftrace_caller
The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.
Given the sequence, we cannot return from ftrace_caller with 'blr' as we
need to keep LR and r0 intact. This results in link stack (return
address predictor) imbalance when ftrace is enabled. To address that, we
would like to use a three instruction sequence:
mflr r0
bl ftrace_caller
mtlr r0
Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
reserve two instruction slots before the function. This results in a
total of five instruction slots to be reserved for ftrace use on each
function that is traced.
Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.
On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.
For modules, space for ftrace stubs is reserved from the generic module
stub space.
This is restricted to and enabled by default only on 64-bit powerpc,
though there are some changes to accommodate 32-bit powerpc. This is
done so that 32-bit powerpc could choose to opt into this based on
further tests and benchmarks.
As an example, after this patch, kernel functions will have a single nop
at function entry:
<kernel_clone>:
addis r2,r12,467
addi r2,r2,-16028
nop
mfocrf r11,8
...
When ftrace is enabled, the nop is converted to an unconditional branch
to the stub associated with that function:
<kernel_clone>:
addis r2,r12,467
addi r2,r2,-16028
b ftrace_ool_stub_text_end+0x11b28
mfocrf r11,8
...
The associated stub:
<ftrace_ool_stub_text_end+0x11b28>:
mflr r0
bl ftrace_caller
mtlr r0
b kernel_clone+0xc
...
This change showed an improvement of ~10% in null_syscall benchmark on a
Power 10 system with ftrace enabled.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-13-hbathini@linux.ibm.com
|
|
On powerpc, we would like to be able to make a pass on vmlinux.o and
generate a new object file to be linked into vmlinux. Add a generic pass
in Makefile.vmlinux that architectures can use for this purpose.
Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which
will be invoked prior to the final vmlinux link step.
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-12-hbathini@linux.ibm.com
|
|
Function tracer on powerpc can only work with vmlinux having a .text
size of up to ~64MB due to powerpc branch instruction having a limited
relative branch range of 32MB. Today, this is only detected on kernel
boot when ftrace is init'ed. Add a post-link script to check the size of
.text so that we can detect this at build time, and break the build if
necessary.
We add a dependency on !COMPILE_TEST for CONFIG_HAVE_FUNCTION_TRACER so
that allyesconfig and other test builds can continue to work without
enabling ftrace.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-11-hbathini@linux.ibm.com
|
|
bpf_jit_emit_func_call_rel()
Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced
bpf_jit_emit_func_call_hlp() to handle calls out to module region, where
bpf progs are generated. The only difference now between
bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in
handling of the initial pass where target function address is not known.
Fold that logic into bpf_jit_emit_func_call_hlp() and rename it to
bpf_jit_emit_func_call_rel() to simplify bpf function call JIT code.
We don't actually need to load/restore TOC across a call out to a
different kernel helper or to a different bpf program since they all
work with the kernel TOC. We only need to do it if we have to call out
to a module function. So, guard TOC load/restore with appropriate
conditions.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-10-hbathini@linux.ibm.com
|
|
Move the ftrace stub used to cover inittext before _einittext so that it
is within kernel text, as seen through core_kernel_text(). This is
required for a subsequent change to ftrace.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-9-hbathini@linux.ibm.com
|
|
To simplify upcoming changes to ftrace, add a check to skip actual
instruction patching if the old and new instructions are the same. We
still validate that the instruction is what we expect, but don't
actually patch the same instruction again.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-8-hbathini@linux.ibm.com
|
|
Pointer to struct module is only relevant for ftrace records belonging
to kernel modules. Having this field in dyn_arch_ftrace wastes memory
for all ftrace records belonging to the kernel. Remove the same in
favour of looking up the module from the ftrace record address, similar
to other architectures.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-7-hbathini@linux.ibm.com
|
|
Minor refactor for converting #ifdef to IS_ENABLED().
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-6-hbathini@linux.ibm.com
|
|
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
mflr r0
stw r0, 4(r1)
bl _mcount
On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.
When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-5-hbathini@linux.ibm.com
|
|
Gcc v5.x emits a 3-instruction sequence for -mprofile-kernel:
mflr r0
std r0, 16(r1)
bl _mcount
Gcc v6.x moved to a simpler 2-instruction sequence by removing the 'std'
instruction. The store saved the return address in the LR save area in
the caller stack frame for stack unwinding. However, with dynamic
ftrace, we no longer have a call to _mcount on kernel boot when ftrace
is not enabled. When ftrace is enabled, that store is performed within
ftrace_caller(). As such, the additional 'std' instruction is redundant.
Nop it out on kernel boot.
With this change, we now use the same 2-instruction profiling sequence
with both -mprofile-kernel, as well as -fpatchable-function-entry on
64-bit powerpc.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-4-hbathini@linux.ibm.com
|
|
Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions till that offset to be function entry.
For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2, unless we are using pcrel.
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-3-hbathini@linux.ibm.com
|
|
So far, we have relied on the fact that gcc supports both
-mprofile-kernel, as well as -fpatchable-function-entry, and clang
supports neither. Our Makefile only checks for CONFIG_MPROFILE_KERNEL to
decide which files to build. Clang has a feature request out [*] to
implement -fpatchable-function-entry, and is unlikely to support
-mprofile-kernel.
Update our Makefile checks so that we pick up the correct files to build
once clang picks up support for -fpatchable-function-entry.
[*] https://github.com/llvm/llvm-project/issues/57031
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-2-hbathini@linux.ibm.com
|
|
The maple platform was added in 2004 [1], to support the "Maple" 970FX
evaluation board.
It was later used for IBM JS20/JS21 machines, as well as the Bimini
machine, aka "Yellow Dog Powerstation".
Sadly all those machines have passed into memory, and there's been no
evidence for years that anyone is still using any of them.
Remove the platform and related code. It can always be reinstated if
there's interest.
Note that this has no impact on support for 970FX based Power Macs.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=f0d068d65c5e555ffcfbc189de32598f6f00770c
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241013102957.548291-1-mpe@ellerman.id.au
|
|
The help text refers to lilo, but the install script does not run lilo
and never has. The reference to lilo seems to have come originally from
arch/ppc/Makefile, but it was not true there either.
Remove it.
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Link: https://fosstodon.org/@kernellogger/113032940928131612
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241009053806.135807-1-mpe@ellerman.id.au
|
|
The dtl_access_lock needs to be a rw_sempahore, a sleeping lock, because
the code calls kmalloc() while holding it, which can sleep:
# echo 1 > /proc/powerpc/vcpudispatch_stats
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 199, name: sh
preempt_count: 1, expected: 0
3 locks held by sh/199:
#0: c00000000a0743f8 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x324/0x438
#1: c0000000028c7058 (dtl_enable_mutex){+.+.}-{3:3}, at: vcpudispatch_stats_write+0xd4/0x5f4
#2: c0000000028c70b8 (dtl_access_lock){+.+.}-{2:2}, at: vcpudispatch_stats_write+0x220/0x5f4
CPU: 0 PID: 199 Comm: sh Not tainted 6.10.0-rc4 #152
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
Call Trace:
dump_stack_lvl+0x130/0x148 (unreliable)
__might_resched+0x174/0x410
kmem_cache_alloc_noprof+0x340/0x3d0
alloc_dtl_buffers+0x124/0x1ac
vcpudispatch_stats_write+0x2a8/0x5f4
proc_reg_write+0xf4/0x150
vfs_write+0xfc/0x438
ksys_write+0x88/0x148
system_call_exception+0x1c4/0x5a0
system_call_common+0xf4/0x258
Fixes: 06220d78f24a ("powerpc/pseries: Introduce rwlock to gatekeep DTLB usage")
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Nysal Jan K.A <nysal@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20240819122401.513203-1-mpe@ellerman.id.au
|
|
Drop the include of dma-mapping.h in machdep.h, replace it with forward
declarations of struct device and struct pci_dev, and include time64.h
and page.h which are required for time64_t and pgprot_t respectively.
Add direct includes of some other headers to some files that were
getting them via machdep.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241009051826.132805-2-mpe@ellerman.id.au
|