Age | Commit message (Collapse) | Author | Files | Lines |
|
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.
For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.
For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.
If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).
Please note that any bpf_dynptr_write() automatically invalidates any prior
data slices of the skb dynptr. This is because the skb may be cloned or
may need to pull its paged buffer into the head. As such, any
bpf_dynptr_write() will automatically have its prior data slices
invalidated, even if the write is to data in the skb head of an uncloned
skb. Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well, for the same reasons.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add xdp dynptrs, which are dynptrs whose underlying pointer points
to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).
For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.
For examples of how xdp dynptrs can be used, please see the attached
selftests.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).
For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error)
For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.
For examples of how skb dynptrs can be used, please see the attached
selftests.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some bpf dynptr functions will be called from places where
if CONFIG_BPF_SYSCALL is not set, then the dynptr function is
undefined. For example, when skb type dynptrs are added in the
next commit, dynptr functions are called from net/core/filter.c
This patch defines no-op implementations of these dynptr functions
so that they do not break compilation by being an undefined reference.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-5-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This change cleans up process_dynptr_func's flow to be more intuitive
and updates some comments with more context.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-3-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The size of bpf_cpumask is fixed, so there is no need to allocate many
bpf_mem_caches for bpf_cpumask_ma, just one bpf_mem_cache is enough.
Also add comments for bpf_mem_alloc_init() in bpf_mem_alloc.h to prevent
future miuse.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230216024821.2202916-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Use kmap_local instead of kmap_atomic
- Change request callback to take void pointer
- Print FIPS status in /proc/crypto (when enabled)
Algorithms:
- Add rfc4106/gcm support on arm64
- Add ARIA AVX2/512 support on x86
Drivers:
- Add TRNG driver for StarFive SoC
- Delete ux500/hash driver (subsumed by stm32/hash)
- Add zlib support in qat
- Add RSA support in aspeed"
* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
crypto: x86/aria-avx - Do not use avx2 instructions
crypto: aspeed - Fix modular aspeed-acry
crypto: hisilicon/qm - fix coding style issues
crypto: hisilicon/qm - update comments to match function
crypto: hisilicon/qm - change function names
crypto: hisilicon/qm - use min() instead of min_t()
crypto: hisilicon/qm - remove some unused defines
crypto: proc - Print fips status
crypto: crypto4xx - Call dma_unmap_page when done
crypto: octeontx2 - Fix objects shared between several modules
crypto: nx - Fix sparse warnings
crypto: ecc - Silence sparse warning
tls: Pass rec instead of aead_req into tls_encrypt_done
crypto: api - Remove completion function scaffolding
tls: Remove completion function scaffolding
tipc: Remove completion function scaffolding
net: ipv6: Remove completion function scaffolding
net: ipv4: Remove completion function scaffolding
net: macsec: Remove completion function scaffolding
dm: Remove completion function scaffolding
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- AMD PMC: Improvements to aid s2idle debugging
- Dell WMI-DDV: hwmon support
- INT3472 camera sensor power-management: Improve privacy LED support
- Intel VSEC: Base TPMI (Topology Aware Register and PM Capsule
Interface) support
- Mellanox: SN5600 and Nvidia L1 switch support
- Microsoft Surface Support: Various cleanups + code improvements
- tools/intel-speed-select: Various improvements
- Miscellaneous other cleanups / fixes
* tag 'platform-drivers-x86-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (80 commits)
platform/x86: nvidia-wmi-ec-backlight: Add force module parameter
platform/x86/amd/pmf: Add depends on CONFIG_POWER_SUPPLY
platform/x86: dell-ddv: Prefer asynchronous probing
platform/x86: dell-ddv: Add hwmon support
Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces
platform: mellanox: mlx-platform: Move bus shift assignment out of the loop
platform: mellanox: mlx-platform: Add mux selection register to regmap
platform_data/mlxreg: Add field with mapped resource address
platform/mellanox: mlxreg-hotplug: Allow more flexible hotplug events configuration
platform: mellanox: Extend all systems with I2C notification callback
platform: mellanox: Split logic in init and exit flow
platform: mellanox: Split initialization procedure
platform: mellanox: Introduce support of new Nvidia L1 switch
platform: mellanox: Introduce support for next-generation 800GB/s switch
platform: mellanox: Cosmetic changes - rename to more common name
platform: mellanox: Change "reset_pwr_converter_fail" attribute
platform: mellanox: Introduce support for rack manager switch
MAINTAINERS: dell-wmi-sysman: drop Divya Bharathi
x86/platform/uv: Make kobj_type structure constant
platform/x86: think-lmi: Make kobj_type structure constant
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"New drivers:
- cros_ec_uart for ChromeOS EC protocol over UART
- cros_typec_vdm for USB PD Vendor Defined Message
Improvements:
- Preserve logs as much as possible when EC panics
- Shutdown to refrain from potential HW damages when EC panics
Fixes:
- Fix DP_PORT_VDO to include DP_CAP_RECEPTACLE
- Fix a lockdep false positive
Cleanups:
- Use sysfs_emit*() instead of scnprintf()
- Use asm instead of asm-generic for unaligned.h
Misc:
- Rename module name from cros_ec_typec to cros-ec-typec
- Minor fixes"
* tag 'tag-chrome-platform-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (34 commits)
platform/chrome: cros_ec_typec: Fix spelling mistake
platform/chrome: cros_typec_vdm: Add Attention support
platform/chrome: cros_ec: Add VDM attention headers
platform/chrome: cros_typec_vdm: Fix VDO copy
platform/chrome: cros_ec_typec: allow deferred probe of switch handles
platform/chrome: cros_ec_proto: remove big stub objects from stack
platform/chrome: cros_ec_uart: fix negative type promoted to high
platform/chrome: cros_ec: Use per-device lockdep key
platform/chrome: fix kernel-doc warnings for cros_ec_command
platform/chrome: fix kernel-doc warning for last_resume_result
platform/chrome: fix kernel-doc warning for suspend_timeout_ms
platform/chrome: fix kernel-doc warnings for panic notifier
platform/chrome: cros_ec_lpc: initialize the buf variable
platform/chrome: cros_ec: Fix panic notifier registration
platform/chrome: cros_typec_switch: Check for retimer flag
platform/chrome: cros_typec_switch: Use fwnode* prop check
platform/chrome: cros_typec_vdm: Add VDM send support
platform/chrome: cros_typec_vdm: Add VDM reply support
platform/chrome: cros_ec_typec: Add initial VDM support
platform/chrome: cros_ec_typec: Alter module name with hyphens
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- allow Linux to run as the nested root partition for Microsoft
Hypervisor (Jinank Jain and Nuno Das Neves)
- clean up the return type of callback functions (Dawei Li)
* tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Fix hv_get/set_register for nested bringup
Drivers: hv: Make remove callback of hyperv driver void returned
Drivers: hv: Enable vmbus driver for nested root partition
x86/hyperv: Add an interface to do nested hypercalls
Drivers: hv: Setup synic registers in case of nested root partition
x86/hyperv: Add support for detecting nested hypervisor
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
architectural register (ZT0, for the look-up table feature) that
Linux needs to save/restore
- Include TPIDR2 in the signal context and add the corresponding
kselftests
- Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
(ARM CMN) at probe time
- Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64
- Permit EFI boot with MMU and caches on. Instead of cleaning the
entire loaded kernel image to the PoC and disabling the MMU and
caches before branching to the kernel bare metal entry point, leave
the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
all of system RAM to populate the initial page tables
- Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
kernel (the arm32 kernel only defines the values)
- Harden the arm64 shadow call stack pointer handling: stash the shadow
stack pointer in the task struct on interrupt, load it directly from
this structure
- Signal handling cleanups to remove redundant validation of size
information and avoid reading the same data from userspace twice
- Refactor the hwcap macros to make use of the automatically generated
ID registers. It should make new hwcaps writing less error prone
- Further arm64 sysreg conversion and some fixes
- arm64 kselftest fixes and improvements
- Pointer authentication cleanups: don't sign leaf functions, unify
asm-arch manipulation
- Pseudo-NMI code generation optimisations
- Minor fixes for SME and TPIDR2 handling
- Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
replace strtobool() to kstrtobool() in the cpufeature.c code, apply
dynamic shadow call stack in two passes, intercept pfn changes in
set_pte_at() without the required break-before-make sequence, attempt
to dump all instructions on unhandled kernel faults
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
arm64: fix .idmap.text assertion for large kernels
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
arm64: kprobes: Drop ID map text from kprobes blacklist
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
arm64/sme: Fix __finalise_el2 SMEver check
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
arm64/signal: Only read new data when parsing the ZT context
arm64/signal: Only read new data when parsing the ZA context
arm64/signal: Only read new data when parsing the SVE context
arm64/signal: Avoid rereading context frame sizes
arm64/signal: Make interface for restore_fpsimd_context() consistent
arm64/signal: Remove redundant size validation from parse_user_sigframe()
arm64/signal: Don't redundantly verify FPSIMD magic
arm64/cpufeature: Use helper macros to specify hwcaps
arm64/cpufeature: Always use symbolic name for feature value in hwcaps
arm64/sysreg: Initial unsigned annotations for ID registers
arm64/sysreg: Initial annotation of signed ID registers
...
|
|
The results of "access_ok()" can be mis-speculated. The result is that
you can end speculatively:
if (access_ok(from, size))
// Right here
even for bad from/size combinations. On first glance, it would be ideal
to just add a speculation barrier to "access_ok()" so that its results
can never be mis-speculated.
But there are lots of system calls just doing access_ok() via
"copy_to_user()" and friends (example: fstat() and friends). Those are
generally not problematic because they do not _consume_ data from
userspace other than the pointer. They are also very quick and common
system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controller pointer and
is frequently followed up with code that might affect caches. Take
something like this:
if (!copy_from_user(&kernelvar, uptr, size))
do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel
addresses, and then do_something_with() has cache (or other)
side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent
mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec().
This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the
BPF code can also go away.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"The majority of changes here are related to the general switch-over to
using arrays of generic trip point structures registered along with a
thermal zone instead of trip point callbacks (this has been done
mostly by Daniel Lezcano with some help from yours truly on the Intel
drivers front).
Apart from that and the related reorganization of code, there are some
enhancements of the existing driver and a new Mediatek Low Voltage
Thermal Sensor (LVTS) driver. The Intel powerclamp undergoes a major
rework so it will use the generic idle_inject facility for CPU idle
time injection going forward and it will take additional module
parameters for specifying the subset of CPUs to be affected by it
(work done by Srinivas Pandruvada).
Also included are assorted fixes and a whole bunch of cleanups.
Specifics:
- Rework a large bunch of drivers to use the generic thermal trip
structure and use the opportunity to do more cleanups by removing
unused functions from the OF code (Daniel Lezcano)
- Remove core header inclusion from drivers (Daniel Lezcano)
- Fix some locking issues related to the generic thermal trip rework
(Johan Hovold)
- Fix a crash when requesting the critical temperature on tegra,
which is related to the generic trip point work (Jon Hunter)
- Clean up thermal device unregistration code (Viresh Kumar)
- Fix and clean up thermal control core initialization error code
paths (Daniel Lezcano)
- Relocate the trip points handling code into a separate file (Daniel
Lezcano)
- Make the thermal core fail registration of thermal zones and
cooling devices if the thermal class has not been registered
(Rafael Wysocki)
- Add trip point initialization helper functions for ACPI-defined
trip points and modify two thermal drivers to use them (Rafael
Wysocki, Daniel Lezcano)
- Make the core thermal control code use sysfs_emit_at() instead of
scnprintf() where applicable (ye xingchen)
- Consolidate code accessing the Intel TCC (Thermal Control
Circuitry) MSRs by introducing library functions for that and
making the TCC-related code in thermal drivers use them (Zhang Rui)
- Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax
changes (Zhang Rui)
- Address an "unsigned expression compared with zero" warning in the
intel_soc_dts_iosf thermal driver (Yang Li)
- Update comments regarding two functions in the Intel Menlow thermal
driver (Deming Wang)
- Use sysfs_emit_at() instead of scnprintf() in the int340x thermal
driver (ye xingchen)
- Make the intel_pch thermal driver support the Wellsburg PCH (Tim
Zimmermann)
- Modify the intel_pch and processor_thermal_device_pci thermal
drivers use generic trip point tables instead of thermal zone trip
point callbacks (Daniel Lezcano)
- Add production mode attribute sysfs attribute to the int340x
thermal driver (Srinivas Pandruvada)
- Rework dynamic trip point updates handling and locking in the
int340x thermal driver (Rafael Wysocki)
- Make the int340x thermal driver use a generic trip points table
instead of thermal zone trip point callbacks (Rafael Wysocki,
Daniel Lezcano)
- Clean up and improve the int340x thermal driver (Rafael Wysocki)
- Simplify and clean up the intel_pch thermal driver (Rafael Wysocki)
- Fix the Intel powerclamp thermal driver and make it use the common
idle injection framework (Srinivas Pandruvada)
- Add two module parameters, cpumask and max_idle, to the Intel
powerclamp thermal driver to allow it to affect only a specific
subset of CPUs instead of all of them (Srinivas Pandruvada)
- Make the Intel quark_dts thermal driver Use generic trip point
objects instead of its own trip point representation (Daniel
Lezcano)
- Add toctree entry for thermal documents and fix two issues in the
Intel powerclamp driver documentation (Bagas Sanjaya)
- Use strscpy() to instead of strncpy() in the thermal core (Xu
Panda)
- Fix thermal_sampling_exit() (Vincent Guittot)
- Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam
Chihi)
- Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert
Uytterhoeven)
- Fix useless call to set_trips() when resuming in the rcar_gen3
thermal control driver and add interrupt support detection at init
time to it (Niklas Söderlund)
- Fix memory corruption in the hi3660 thermal driver (Yongqin Liu)
- Fix include path for libnl3 in pkg-config file for libthermal
(Vibhav Pant)
- Remove syscfg-based driver for st as the platform is not supported
any more (Alain Volmat)"
* tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (135 commits)
thermal/drivers/st: Remove syscfg based driver
thermal: Remove core header inclusion from drivers
tools/lib/thermal: Fix include path for libnl3 in pkg-config file.
thermal/drivers/hisi: Drop second sensor hi3660
thermal/drivers/rcar_gen3_thermal: Fix device initialization
thermal/drivers/rcar_gen3_thermal: Create device local ops struct
thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resuming
thermal/drivers/rcar_gen3: Add support for R-Car V4H
dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 support
thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver
dt-bindings: thermal: mediatek: Add LVTS thermal controllers
thermal/drivers/mediatek: Relocate driver to mediatek folder
tools/lib/thermal: Fix thermal_sampling_exit()
Documentation: powerclamp: Fix numbered lists formatting
Documentation: powerclamp: Escape wildcard in cpumask description
Documentation: admin-guide: Add toctree entry for thermal docs
thermal: intel: powerclamp: Add two module parameters
Documentation: admin-guide: Move intel_powerclamp documentation
thermal: core: Use sysfs_emit_at() instead of scnprintf()
thermal: intel: powerclamp: Fix duration module parameter
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add EPP support to the AMD P-state cpufreq driver, add support
for new platforms to the Intel RAPL power capping driver, intel_idle
and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
drop the custom cpufreq driver for loongson1 that is not necessary any
more (and the corresponding cpufreq platform device), fix assorted
issues and clean up code.
Specifics:
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
Karny, Arnd Bergmann, Bagas Sanjaya)
- Drop the custom cpufreq driver for loongson1 that is not necessary
any more and the corresponding cpufreq platform device (Keguang
Zhang)
- Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
entries (Paul E. McKenney)
- Enable thermal cooling for Tegra194 (Yi-Wei Wang)
- Register module device table and add missing compatibles for
cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)
- Various dt binding updates for qcom-cpufreq-nvmem and
opp-v2-kryo-cpu (Christian Marangi)
- Make kobj_type structure in the cpufreq core constant (Thomas
Weißschuh)
- Make cpufreq_unregister_driver() return void (Uwe Kleine-König)
- Make the TEO cpuidle governor check CPU utilization in order to
refine idle state selection (Kajetan Puchalski)
- Make Kconfig select the haltpoll cpuidle governor when the haltpoll
cpuidle driver is selected and replace a default_idle() call in
that driver with arch_cpu_idle() to allow MWAIT to be used (Li
RongQing)
- Add Emerald Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy)
- Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
avoid randconfig build failures (Arnd Bergmann)
- Make kobj_type structures used in the cpuidle sysfs interface
constant (Thomas Weißschuh)
- Make the cpuidle driver registration code update microsecond values
of idle state parameters in accordance with their nanosecond values
if they are provided (Rafael Wysocki)
- Make the PSCI cpuidle driver prevent topology CPUs from being
suspended on PREEMPT_RT (Krzysztof Kozlowski)
- Document that pm_runtime_force_suspend() cannot be used with
DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)
- Add EXPORT macros for exporting PM functions from drivers (Richard
Fitzgerald)
- Remove /** from non-kernel-doc comments in hibernation code (Randy
Dunlap)
- Fix possible name leak in powercap_register_zone() (Yang Yingliang)
- Add Meteor Lake and Emerald Rapids support to the intel_rapl power
capping driver (Zhang Rui)
- Modify the idle_inject power capping facility to support 100% idle
injection (Srinivas Pandruvada)
- Fix large time windows handling in the intel_rapl power capping
driver (Zhang Rui)
- Fix memory leaks with using debugfs_lookup() in the generic PM
domains and Energy Model code (Greg Kroah-Hartman)
- Add missing 'cache-unified' property in the example for kryo OPP
bindings (Rob Herring)
- Fix error checking in opp_migrate_dentry() (Qi Zheng)
- Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
Dybcio)
- Modify some power management utilities to use the canonical ftrace
path (Ross Zwisler)
- Correct spelling problems for Documentation/power/ as reported by
codespell (Randy Dunlap)"
* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
Documentation: amd-pstate: disambiguate user space sections
cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
PM: Add EXPORT macros for exporting PM functions
cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
MIPS: loongson32: Drop obsolete cpufreq platform device
powercap: intel_rapl: Fix handling for large time window
cpuidle: driver: Update microsecond values of state parameters as needed
cpuidle: sysfs: make kobj_type structures constant
cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
PM: EM: fix memory leak with using debugfs_lookup()
PM: domains: fix memory leak with using debugfs_lookup()
cpufreq: Make kobj_type structure constant
cpufreq: davinci: Fix clk use after free
cpufreq: amd-pstate: avoid uninitialized variable use
cpufreq: Make cpufreq_unregister_driver() return void
OPP: fix error checking in opp_migrate_dentry()
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"Beyond some specific LoadPin, UBSAN, and fortify features, there are
other fixes scattered around in various subsystems where maintainers
were okay with me carrying them in my tree or were non-responsive but
the patches were reviewed by others:
- Replace 0-length and 1-element arrays with flexible arrays in
various subsystems (Paulo Miguel Almeida, Stephen Rothwell, Kees
Cook)
- randstruct: Disable Clang 15 support (Eric Biggers)
- GCC plugins: Drop -std=gnu++11 flag (Sam James)
- strpbrk(): Refactor to use strchr() (Andy Shevchenko)
- LoadPin LSM: Allow root filesystem switching when non-enforcing
- fortify: Use dynamic object size hints when available
- ext4: Fix CFI function prototype mismatch
- Nouveau: Fix DP buffer size arguments
- hisilicon: Wipe entire crypto DMA pool on error
- coda: Fully allocate sig_inputArgs
- UBSAN: Improve arm64 trap code reporting
- copy_struct_from_user(): Add minimum bounds check on kernel buffer
size"
* tag 'hardening-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: disable Clang 15 support
uaccess: Add minimum bounds check on kernel buffer size
arm64: Support Clang UBSAN trap codes for better reporting
coda: Avoid partial allocation of sig_inputArgs
gcc-plugins: drop -std=gnu++11 to fix GCC 13 build
lib/string: Use strchr() in strpbrk()
crypto: hisilicon: Wipe entire pool on error
net/i40e: Replace 0-length array with flexible array
io_uring: Replace 0-length array with flexible array
ext4: Fix function prototype mismatch for ext4_feat_ktype
i915/gvt: Replace one-element array with flexible-array member
drm/nouveau/disp: Fix nvif_outp_acquire_dp() argument size
LoadPin: Allow filesystem switch when not enforcing
LoadPin: Move pin reporting cleanly out of locking
LoadPin: Refactor sysctl initialization
LoadPin: Refactor read-only check into a helper
ARM: ixp4xx: Replace 0-length arrays with flexible arrays
fortify: Use __builtin_dynamic_object_size() when available
rxrpc: replace zero-lenth array with DECLARE_FLEX_ARRAY() helper
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes, perhaps most notably:
- Throttling callback invocation based on the number of callbacks
that are now ready to invoke instead of on the total number of
callbacks
- Several patches that suppress false-positive boot-time
diagnostics, for example, due to lockdep not yet being
initialized
- Make expedited RCU CPU stall warnings dump stacks of any tasks
that are blocking the stalled grace period. (Normal RCU CPU
stall warnings have done this for many years)
- Lazy-callback fixes to avoid delays during boot, suspend, and
resume. (Note that lazy callbacks must be explicitly enabled, so
this should not (yet) affect production use cases)
- Make kfree_rcu() and friends take advantage of polled grace periods,
thus reducing memory footprint by almost two orders of magnitude,
admittedly on a microbenchmark
This also begins the transition from kfree_rcu(p) to
kfree_rcu_mightsleep(p). This transition was motivated by bugs where
kfree_rcu(p), which can block, was typed instead of the intended
kfree_rcu(p, rh)
- SRCU updates, perhaps most notably fixing a bug that causes SRCU to
fail when booted on a system with a non-zero boot CPU. This
surprising situation actually happens for kdump kernels on the
powerpc architecture
This also adds an srcu_down_read() and srcu_up_read(), which act like
srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side
critical section to be handed off from one task to another
- Clean up the now-useless SRCU Kconfig option
There are a few more commits that are not yet acked or pulled into
maintainer trees, and these will be in a pull request for a later
merge window
- RCU-tasks updates, perhaps most notably these fixes:
- A strange interaction between PID-namespace unshare and the
RCU-tasks grace period that results in a low-probability but
very real hang
- A race between an RCU tasks rude grace period on a single-CPU
system and CPU-hotplug addition of the second CPU that can
result in a too-short grace period
- A race between shrinking RCU tasks down to a single callback
list and queuing a new callback to some other CPU, but where
that queuing is delayed for more than an RCU grace period. This
can result in that callback being stranded on the non-boot CPU
- Torture-test updates and fixes
- Torture-test scripting updates and fixes
- Provide additional RCU CPU stall-warning information in kernels built
with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute
timeout limit for expedited RCU CPU stall warnings
* tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep()
kernel/notifier: Remove CONFIG_SRCU
init: Remove "select SRCU"
fs/quota: Remove "select SRCU"
fs/notify: Remove "select SRCU"
fs/btrfs: Remove "select SRCU"
fs: Remove CONFIG_SRCU
drivers/pci/controller: Remove "select SRCU"
drivers/net: Remove "select SRCU"
drivers/md: Remove "select SRCU"
drivers/hwtracing/stm: Remove "select SRCU"
drivers/dax: Remove "select SRCU"
drivers/base: Remove CONFIG_SRCU
rcu: Disable laziness if lazy-tracking says so
rcu: Track laziness during boot and suspend
rcu: Remove redundant call to rcu_boost_kthread_setaffinity()
rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts
rcu: Align the output of RCU CPU stall warning messages
rcu: Add RCU stall diagnosis information
sched: Add helper nr_context_switches_cpu()
...
|
|
Pull workqueue updates from Tejun Heo:
- When per-cpu workqueue workers expire after sitting idle for too
long, they used to wake up to the CPU that they're bound to in order
to exit. This unfortunately could cause unwanted disturbances on CPUs
isolated for e.g. RT applications.
The worker exit path is restructured so that an existing worker is
unbound from its CPU before being woken up for the last time,
allowing it to migrate away from an isolated CPU for exiting.
- A couple debug improvements. Watchdog dump is made more compact and
workqueue now warns if used-after-free during the RCU grace period
after destroy_workqueue().
* tag 'wq-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fold rebind_worker() within rebind_workers()
workqueue: Unbind kworkers before sending them to exit()
workqueue: Don't hold any lock while rcuwait'ing for !POOL_MANAGER_ACTIVE
workqueue: Convert the idle_timer to a timer + work_struct
workqueue: Factorize unbind/rebind_workers() logic
workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
workqueue: Make show_pwq() use run-length encoding
workqueue: Add a new flag to spot the potential UAF error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Move the interrupt affinity spreading mechanism into lib/group_cpus
so it can be used for similar spreading requirements, e.g. in the
block multi-queue code
This also contains a first usecase in the block multi-queue code
which Jens asked to take along with the librarization
- Improve irqdomain locking to close a number race conditions which
can be observed with massive parallel device driver probing
- Enforce and document the semantics of disable_irq() which cannot be
invoked safely from non-sleepable context
- Move the IPI multiplexing code from the Apple AIC driver into the
core, so it can be reused by RISCV
Drivers:
- Plug OF node refcounting leaks in various drivers
- Correctly mark level triggered interrupts in the Broadcom L2
drivers
- The usual small fixes and improvements
- No new drivers for the record!"
* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
irqdomain: Switch to per-domain locking
irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
irqchip/gic-v2m: Use irq_domain_create_hierarchy()
irqchip/alpine-msi: Use irq_domain_add_hierarchy()
x86/uv: Use irq_domain_create_hierarchy()
x86/ioapic: Use irq_domain_create_hierarchy()
irqdomain: Clean up irq_domain_push/pop_irq()
irqdomain: Drop leftover brackets
irqdomain: Drop dead domain-name assignment
irqdomain: Drop revmap mutex
irqdomain: Fix domain registration race
irqdomain: Fix mapping-creation race
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timekeeping, timers and clockevent/source drivers:
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be
directed to the no restart function instead of doing a partial
setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a
user space bug, preventing this is a correctness issue and caters
to the least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and
RISC-V variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place"
* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
clocksource/drivers/em_sti: Mark driver as non-removable
clocksource/drivers/sh_tmu: Mark driver as non-removable
clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
clocksource/drivers/timer-microchip-pit64b: Add delay timer
clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
dt-bindings: timer: mediatek,mtk-timer: add MT8365
clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
clocksource/drivers/sh_cmt: Mark driver as non-removable
clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
clocksource/drivers/riscv: Increase the clock source rating
clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
dt-bindings: timer: Add bindings for the RISC-V timer device
RISC-V: time: initialize hrtimer based broadcast clock event device
dt-bindings: timer: rk-timer: Add rktimer for rv1126
time/debug: Fix memory leak with using debugfs_lookup()
clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
clocksource: Verify HPET and PMTMR when TSC unverified
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Add support for a new AMD feature called slow memory bandwidth
allocation. Its goal is to control resource allocation in external
slow memory which is connected to the machine like for example
through CXL devices, accelerators etc
* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix a silly -Wunused-but-set-variable warning
Documentation/x86: Update resctrl.rst for new features
x86/resctrl: Add interface to write mbm_local_bytes_config
x86/resctrl: Add interface to write mbm_total_bytes_config
x86/resctrl: Add interface to read mbm_local_bytes_config
x86/resctrl: Add interface to read mbm_total_bytes_config
x86/resctrl: Support monitor configuration
x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
x86/resctrl: Include new features in command line options
x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve the scalability of the CFS bandwidth unthrottling logic with
large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with the
generic scheduler code. Add __cpuidle methods as noinstr to objtool's
noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period, to
improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- Misc other cleanups, fixes
* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
sched/rt: pick_next_rt_entity(): check list_entry
sched/deadline: Add more reschedule cases to prio_changed_dl()
sched/fair: sanitize vruntime of entity being placed
sched/fair: Remove capacity inversion detection
sched/fair: unlink misfit task from cpu overutilized
objtool: mem*() are not uaccess safe
cpuidle: Fix poll_idle() noinstr annotation
sched/clock: Make local_clock() noinstr
sched/clock/x86: Mark sched_clock() noinstr
x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
x86/atomics: Always inline arch_atomic64*()
cpuidle: tracing, preempt: Squash _rcuidle tracing
cpuidle: tracing: Warn about !rcu_is_watching()
cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
cpuidle: drivers: firmware: psci: Dont instrument suspend code
KVM: selftests: Fix build of rseq test
exit: Detect and fix irq disabled state in oops
cpuidle, arm64: Fix the ARM64 cpuidle logic
cpuidle: mvebu: Fix duplicate flags assignment
sched/fair: Limit sched slice duration
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
- Optimize perf_sample_data layout
- Prepare sample data handling for BPF integration
- Update the x86 PMU driver for Intel Meteor Lake
- Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
discovery breakage
- Fix the x86 Zhaoxin PMU driver
- Cleanups
* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
perf/x86/intel/uncore: Add Meteor Lake support
x86/perf/zhaoxin: Add stepping check for ZXC
perf/x86/intel/ds: Fix the conversion from TSC to perf time
perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
perf/x86/uncore: Add a quirk for UPI on SPR
perf/x86/uncore: Ignore broken units in discovery table
perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
perf/x86/uncore: Factor out uncore_device_to_die()
perf/core: Call perf_prepare_sample() before running BPF
perf/core: Introduce perf_prepare_header()
perf/core: Do not pass header for sample ID init
perf/core: Set data->sample_flags in perf_prepare_sample()
perf/core: Add perf_sample_save_brstack() helper
perf/core: Add perf_sample_save_raw_data() helper
perf/core: Add perf_sample_save_callchain() helper
perf/core: Save the dynamic parts of sample data size
x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
perf/core: Change the layout of perf_sample_data
perf/x86/msr: Add Meteor Lake support
perf/x86/cstate: Add Meteor Lake support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rwsem micro-optimizations
- spinlock micro-optimizations
- cleanups, simplifications
* tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vduse: Remove include of rwlock.h
locking/lockdep: Remove lockdep_init_map_crosslock.
x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
x86/PAT: Use try_cmpxchg() in set_page_memtype()
locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
locking/qspinlock: Micro-optimize pending state waiting for unlock
|
|
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.
CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.
Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2023-02-20
Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.
Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).
Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
tx path
* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
ieee802154: Drop device trackers
mac802154: Fix an always true condition
mac802154: Send beacons using the MLME Tx path
ieee802154: Change error code on monitor scan netlink request
ieee802154: Convert scan error messages to extack
ieee802154: Use netlink policies when relevant on scan parameters
ieee802154: at86rf230: switch to using gpiod API
ieee802154: at86rf230: drop support for platform data
Revert "at86rf230: convert to gpio descriptors"
cc2520: move to gpio descriptors
mac802154: Avoid superfluous endianness handling
at86rf230: convert to gpio descriptors
mac802154: Handle basic beaconing
ieee802154: Add support for user beaconing requests
mac802154: Handle passive scanning
mac802154: Add MLME Tx locked helpers
mac802154: Prepare forcing specific symbol duration
ieee802154: Introduce a helper to validate a channel
ieee802154: Define a beacon frame header
ieee802154: Add support for user scanning requests
====================
Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
memcpy() warning on ppc64 platform:
In function ‘fortify_memcpy_chk’,
inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
[-Werror=attribute-warning]
513 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Same behaviour on x86 you can get if you use "__always_inline" instead of
"inline" for skb_copy_from_linear_data() in skbuff.h
The call here copies data into inlined tx destricptor, which has 104
bytes (MAX_INLINE) space for data payload. In this case "spc" is known
in compile-time but the destination is used with hidden knowledge
(real structure of destination is different from that the compiler
can see). That cause the fortify warning because compiler can check
bounds, but the real bounds are different. "spc" can't be bigger than
64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined
tx descriptor. The fact that "inl" points into inlined tx descriptor is
determined earlier in mlx4_en_xmit().
Avoid confusing the compiler with "inl + 1" constructions to get to past
the inl header by introducing a flexible array "data" to the struct so
that the compiler can see that we are not dealing with an array of inl
structs, but rather, arbitrary data following the structure. There are
no changes to the structure layout reported by pahole, and the resulting
machine code is actually smaller.
Reported-by: Josef Oskera <joskera@redhat.com>
Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com
Fixes: f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time")
Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-17
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull ARM SoC updates from Arnd Bergmann:
"The majority of the changes are for the OMAP2 platform, mostly
removing some dead code that got left behind from previous cleanups.
Aside from that, there are very minor updates and correctness fixes
for Zynq, i.MX, Samsung, Broadcom, AT91, ep93xx, and OMAP1"
* tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
dt-bindings: soc: samsung: exynos-pmu: allow phys as child
ARM: imx: mach-imx6ul: add imx6ulz support
ARM: imx: Call ida_simple_remove() for ida_simple_get
arm64: drop redundant "ARMv8" from Kconfig option title
ARM: ep93xx: Convert to use descriptors for GPIO LEDs
ARM: s3c: fix s3c64xx_set_timer_source prototype
ARM: OMAP2+: Fix spelling typos in comment
ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/machine.h>
ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/pinmux.h>
ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()
ARM: BCM63xx: remove useless goto statement
ARM: omap2: make functions static
ARM: omap2: remove unused omap2_pm_init
ARM: omap2: remove unused declarations
ARM: omap2: remove unused functions
ARM: omap2: smartreflex: remove on_init control
ARM: omap2: remove APLL control
ARM: omap2: simplify clock2xxx header
ARM: omap2: remove unused omap_hwmod_reset.c
ARM: omap2: remove unused headers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC boardfile updates from Arnd Bergmann
"Unused boardfile removal for 6.3
This is a follow-up to the deprecation of most of the old-style board
files that was merged in linux-6.0, removing them for good.
This branch is almost exclusively dead code removal based on those
annotations. Some device driver removals went through separate
subsystem trees, but the majority is in the same branch, in order to
better handle dependencies between the patches and avoid breaking
bisection.
Unfortunately that leads to merge conflicts against other changes in
the subsystem trees, but they should all be trivial to resolve by
removing the files.
See commit 7d0d3fa7339e ("Merge tag 'arm-boardfiles-6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc") for the
description of which machines were marked unused and are now removed.
The only removals that got postponed are Terastation WXL (mv78xx0) and
Jornada720 (StrongARM1100), which turned out to still have potential
users"
* tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (91 commits)
mmc: omap: drop TPS65010 dependency
ARM: pxa: restore mfp-pxa320.h
usb: ohci-omap: avoid unused-variable warning
ARM: debug: remove references in DEBUG_UART_8250_SHIFT to removed configs
ARM: s3c: remove obsolete s3c-cpu-freq header
MAINTAINERS: adjust SAMSUNG SOC CLOCK DRIVERS after s3c24xx support removal
MAINTAINERS: update file entries after arm multi-platform rework and mach-pxa removal
ARM: remove CONFIG_UNUSED_BOARD_FILES
mfd: remove htc-pasic3 driver
w1: remove ds1wm driver
usb: remove ohci-tmio driver
fbdev: remove w100fb driver
fbdev: remove tmiofb driver
mmc: remove tmio_mmc driver
mfd: remove ucb1400 support
mfd: remove toshiba tmio drivers
rtc: remove v3020 driver
power: remove pda_power supply driver
ASoC: pxa: remove unused board support
pcmcia: remove unused pxa/sa1100 drivers
...
|
|
Pull block updates from Jens Axboe:
- NVMe updates via Christoph:
- Small improvements to the logging functionality (Amit Engel)
- Authentication cleanups (Hannes Reinecke)
- Cleanup and optimize the DMA mapping cod in the PCIe driver
(Keith Busch)
- Work around the command effects for Format NVM (Keith Busch)
- Misc cleanups (Keith Busch, Christoph Hellwig)
- Fix and cleanup freeing single sgl (Keith Busch)
- MD updates via Song:
- Fix a rare crash during the takeover process
- Don't update recovery_cp when curr_resync is ACTIVE
- Free writes_pending in md_stop
- Change active_io to percpu
- Updates to drbd, inching us closer to unifying the out-of-tree driver
with the in-tree one (Andreas, Christoph, Lars, Robert)
- BFQ update adding support for multi-actuator drives (Paolo, Federico,
Davide)
- Make brd compliant with REQ_NOWAIT (me)
- Fix for IOPOLL and queue entering, fixing stalled IO waiting on
timeouts (me)
- Fix for REQ_NOWAIT with multiple bios (me)
- Fix memory leak in blktrace cleanup (Greg)
- Clean up sbitmap and fix a potential hang (Kemeng)
- Clean up some bits in BFQ, and fix a bug in the request injection
(Kemeng)
- Clean up the request allocation and issue code, and fix some bugs
related to that (Kemeng)
- ublk updates and fixes:
- Add support for unprivileged ublk (Ming)
- Improve device deletion handling (Ming)
- Misc (Liu, Ziyang)
- s390 dasd fixes (Alexander, Qiheng)
- Improve utility of request caching and fixes (Anuj, Xiao)
- zoned cleanups (Pankaj)
- More constification for kobjs (Thomas)
- blk-iocost cleanups (Yu)
- Remove bio splitting from drivers that don't need it (Christoph)
- Switch blk-cgroups to use struct gendisk. Some of this is now
incomplete as select late reverts were done. (Christoph)
- Add bvec initialization helpers, and convert callers to use that
rather than open-coding it (Christoph)
- Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
Matthew, Ulf, Zhong)
* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
brd: use radix_tree_maybe_preload instead of radix_tree_preload
block: use proper return value from bio_failfast()
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
block: Fix io statistics for cgroup in throttle path
brd: mark as nowait compatible
brd: check for REQ_NOWAIT and set correct page allocation mask
brd: return 0/-error from brd_insert_page()
block: sync mixed merged request's failfast with 1st bio's
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
Revert "blk-cgroup: pass a gendisk to blkg_lookup"
Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
Revert "blk-cgroup: move the cgroup information to struct gendisk"
nvme-pci: remove iod use_sgls
nvme-pci: fix freeing single sgl
block: ublk: check IO buffer based on flag need_get_data
s390/dasd: Fix potential memleak in dasd_eckd_init()
s390/dasd: sort out physical vs virtual pointers usage
block: Remove the ALLOC_CACHE_SLACK constant
block: make kobj_type structures constant
...
|
|
Pull io_uring ITER_UBUF conversion from Jens Axboe:
"Since we now have ITER_UBUF available, switch to using it for single
ranges as it's more efficient than ITER_IOVEC for that"
* tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux:
block: use iter_ubuf for single range
iov_iter: move iter_ubuf check inside restore WARN
io_uring: use iter_ubuf for single range imports
io_uring: switch network send/recv to ITER_UBUF
iov: add import_ubuf()
|
|
Pull io_uring updates from Jens Axboe:
- Cleanup series making the async prep and handling of
REQ_F_FORCE_ASYNC easier to follow and verify (Dylan)
- Enable specifying specific flags for OP_MSG_RING (Breno)
- Enable use of KASAN with the internal request cache (Breno)
- Split the opcode definition structs into a hot and cold part (Breno)
- OP_MSG_RING fixes (Pavel, me)
- Fix an issue with IOPOLL cancelation and PREEMPT_NONE (me)
- Handle TIF_NOTIFY_RESUME for the io-wq threads that never return to
userspace (me)
- Add support for using io_uring_register() with a registered ring fd
(Josh)
- Improve handling of poll on the ring fd (Pavel)
- Series improving the task_work handling (Pavel)
- Misc cleanups, fixes, improvements (Dmitrii, Quanfa, Richard, Pavel,
me)
* tag 'for-6.3/io_uring-2023-02-16' of git://git.kernel.dk/linux: (51 commits)
io_uring: Support calling io_uring_register with a registered ring fd
io_uring,audit: don't log IORING_OP_MADVISE
io_uring: mark task TASK_RUNNING before handling resume/task work
io_uring: always go async for unsupported open flags
io_uring: always go async for unsupported fadvise flags
io_uring: for requests that require async, force it
io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async
io_uring: add reschedule point to handle_tw_list()
io_uring: add a conditional reschedule to the IOPOLL cancelation loop
io_uring: return normal tw run linking optimisation
io_uring: refactor tctx_task_work
io_uring: refactor io_put_task helpers
io_uring: refactor req allocation
io_uring: improve io_get_sqe
io_uring: kill outdated comment about overflow flush
io_uring: use user visible tail in io_uring_poll()
io_uring: pass in io_issue_def to io_assign_file()
io_uring: Enable KASAN for request cache
io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
io_uring/msg-ring: ensure flags passing works for task_work completions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"The usual mix of performance improvements and new features.
The core change is reworking how checksums are processed, with
followup cleanups and simplifications. There are two minor changes in
block layer and iomap code.
Features:
- block group allocation class heuristics:
- pack files by size (up to 128k, up to 8M, more) to avoid
fragmentation in block groups, assuming that file size and life
time is correlated, in particular this may help during balance
- with tracepoints and extensible in the future
Performance:
- send: cache directory utimes and only emit the command when
necessary
- speedup up to 10x
- smaller final stream produced (no redundant utimes commands
issued)
- compatibility not affected
- fiemap: skip backref checks for shared leaves
- speedup 3x on sample filesystem with all leaves shared (e.g. on
snapshots)
- micro optimized b-tree key lookup, speedup in metadata operations
(sample benchmark: fs_mark +10% of files/sec)
Core changes:
- change where checksumming is done in the io path:
- checksum and read repair does verification at lower layer
- cascaded cleanups and simplifications
- raid56 refactoring and cleanups
Fixes:
- sysfs: make sure that a run-time change of a feature is correctly
tracked by the feature files
- scrub: better reporting of tree block errors
Other:
- locally enable -Wmaybe-uninitialized after fixing all warnings
- misc cleanups, spelling fixes
Other code:
- block: export bio_split_rw
- iomap: remove IOMAP_F_ZONE_APPEND"
* tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (109 commits)
btrfs: make kobj_type structures constant
btrfs: remove the bdev argument to btrfs_rmap_block
btrfs: don't rely on unchanging ->bi_bdev for zone append remaps
btrfs: never return true for reads in btrfs_use_zone_append
btrfs: pass a btrfs_bio to btrfs_use_append
btrfs: set bbio->file_offset in alloc_new_bio
btrfs: use file_offset to limit bios size in calc_bio_boundaries
btrfs: do unsigned integer division in the extent buffer binary search loop
btrfs: eliminate extra call when doing binary search on extent buffer
btrfs: raid56: handle endio in scrub_rbio
btrfs: raid56: handle endio in recover_rbio
btrfs: raid56: handle endio in rmw_rbio
btrfs: raid56: submit the read bios from scrub_assemble_read_bios
btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios
btrfs: raid56: fold recover_assemble_read_bios into recover_rbio
btrfs: raid56: add a bio_list_put helper
btrfs: raid56: wait for I/O completion in submit_read_bios
btrfs: raid56: simplify code flow in rmw_rbio
btrfs: raid56: simplify error handling and code flow in raid56_parity_write
btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"Support for auditing decisions regarding fanotify permission events"
* tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify,audit: Allow audit to use the full permission event response
fanotify: define struct members to hold response decision context
fanotify: Ensure consistent variable type for response
|
|
Pull fsverity updates from Eric Biggers:
"Fix the longstanding implementation limitation that fsverity was only
supported when the Merkle tree block size, filesystem block size, and
PAGE_SIZE were all equal.
Specifically, add support for Merkle tree block sizes less than
PAGE_SIZE, and make ext4 support fsverity on filesystems where the
filesystem block size is less than PAGE_SIZE.
Effectively, this means that fsverity can now be used on systems with
non-4K pages, at least on ext4. These changes have been tested using
the verity group of xfstests, newly updated to cover the new code
paths.
Also update fs/verity/ to support verifying data from large folios.
There's also a similar patch for fs/crypto/, to support decrypting
data from large folios, which I'm including in here to avoid a merge
conflict between the fscrypt and fsverity branches"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fscrypt: support decrypting data from large folios
fsverity: support verifying data from large folios
fsverity.rst: update git repo URL for fsverity-utils
ext4: allow verity with fs block size < PAGE_SIZE
fs/buffer.c: support fsverity in block_read_full_folio()
f2fs: simplify f2fs_readpage_limit()
ext4: simplify ext4_readpage_limit()
fsverity: support enabling with tree block size < PAGE_SIZE
fsverity: support verification with tree block size < PAGE_SIZE
fsverity: replace fsverity_hash_page() with fsverity_hash_block()
fsverity: use EFBIG for file too large to enable verity
fsverity: store log2(digest_size) precomputed
fsverity: simplify Merkle tree readahead size calculation
fsverity: use unsigned long for level_start
fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG
fsverity: pass pos and size to ->write_merkle_tree_block
fsverity: optimize fsverity_cleanup_inode() on non-verity files
fsverity: optimize fsverity_prepare_setattr() on non-verity files
fsverity: optimize fsverity_file_open() on non-verity files
|
|
Pull fscrypt updates from Eric Biggers:
"Simplify the implementation of the test_dummy_encryption mount option
by adding the 'test dummy key' on-demand"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: clean up fscrypt_add_test_dummy_key()
fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
f2fs: stop calling fscrypt_add_test_dummy_key()
ext4: stop calling fscrypt_add_test_dummy_key()
fscrypt: add the test dummy encryption key on-demand
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs hardening update from Christian Brauner:
"Jan pointed out that during shutdown both filp_close() and super block
destruction will use basic printk logging when bugs are detected. This
causes issues in a few scenarios:
- Tools like syzkaller cannot figure out that the logged message
indicates a bug.
- Users that explicitly opt in to have the kernel bug on data
corruption by selecting CONFIG_BUG_ON_DATA_CORRUPTION should see
the kernel crash when they did actually select that option.
- When there are busy inodes after the superblock is shut down later
access to such a busy inodes walks through freed memory. It would
be better to cleanly crash instead.
All of this can be addressed by using the already existing
CHECK_DATA_CORRUPTION() macro in these places when kernel bugs are
detected. Its logging improvement is useful for all users.
Otherwise this only has a meaningful behavioral effect when users do
select CONFIG_BUG_ON_DATA_CORRUPTION which means this is backward
compatible for regular users"
* tag 'fs.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs idmapping updates from Christian Brauner:
- Last cycle we introduced the dedicated struct mnt_idmap type for
mount idmapping and the required infrastucture in 256c8aed2b42 ("fs:
introduce dedicated idmap type for mounts"). As promised in last
cycle's pull request message this converts everything to rely on
struct mnt_idmap.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem with
namespaces that are relevant on the mount level. Especially for
non-vfs developers without detailed knowledge in this area this was a
potential source for bugs.
This finishes the conversion. Instead of passing the plain namespace
around this updates all places that currently take a pointer to a
mnt_userns with a pointer to struct mnt_idmap.
Now that the conversion is done all helpers down to the really
low-level helpers only accept a struct mnt_idmap argument instead of
two namespace arguments.
Conflating mount and other idmappings will now cause the compiler to
complain loudly thus eliminating the possibility of any bugs. This
makes it impossible for filesystem developers to mix up mount and
filesystem idmappings as they are two distinct types and require
distinct helpers that cannot be used interchangeably.
Everything associated with struct mnt_idmap is moved into a single
separate file. With that change no code can poke around in struct
mnt_idmap. It can only be interacted with through dedicated helpers.
That means all filesystems are and all of the vfs is completely
oblivious to the actual implementation of idmappings.
We are now also able to extend struct mnt_idmap as we see fit. For
example, we can decouple it completely from namespaces for users that
don't require or don't want to use them at all. We can also extend
the concept of idmappings so we can cover filesystem specific
requirements.
In combination with the vfs{g,u}id_t work we finished in v6.2 this
makes this feature substantially more robust and thus difficult to
implement wrong by a given filesystem and also protects the vfs.
- Enable idmapped mounts for tmpfs and fulfill a longstanding request.
A long-standing request from users had been to make it possible to
create idmapped mounts for tmpfs. For example, to share the host's
tmpfs mount between multiple sandboxes. This is a prerequisite for
some advanced Kubernetes cases. Systemd also has a range of use-cases
to increase service isolation. And there are more users of this.
However, with all of the other work going on this was way down on the
priority list but luckily someone other than ourselves picked this
up.
As usual the patch is tiny as all the infrastructure work had been
done multiple kernel releases ago. In addition to all the tests that
we already have I requested that Rodrigo add a dedicated tmpfs
testsuite for idmapped mounts to xfstests. It is to be included into
xfstests during the v6.3 development cycle. This should add a slew of
additional tests.
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
shmem: support idmapped mounts for tmpfs
fs: move mnt_idmap
fs: port vfs{g,u}id helpers to mnt_idmap
fs: port fs{g,u}id helpers to mnt_idmap
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
quota: port to mnt_idmap
fs: port privilege checking helpers to mnt_idmap
fs: port inode_owner_or_capable() to mnt_idmap
fs: port inode_init_owner() to mnt_idmap
fs: port acl to mnt_idmap
fs: port xattr to mnt_idmap
fs: port ->permission() to pass mnt_idmap
fs: port ->fileattr_set() to pass mnt_idmap
fs: port ->set_acl() to pass mnt_idmap
fs: port ->get_acl() to pass mnt_idmap
fs: port ->tmpfile() to pass mnt_idmap
fs: port ->rename() to pass mnt_idmap
fs: port ->mknod() to pass mnt_idmap
fs: port ->mkdir() to pass mnt_idmap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull i_version updates from Jeff Layton:
"This overhauls how we handle i_version queries from nfsd.
Instead of having special routines and grabbing the i_version field
directly out of the inode in some cases, we've moved most of the
handling into the various filesystems' getattr operations. As a bonus,
this makes ceph's change attribute usable by knfsd as well.
This should pave the way for future work to make this value queryable
by userland, and to make it more resilient against rolling back on a
crash"
* tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
nfsd: remove fetch_iversion export operation
nfsd: use the getattr operation to fetch i_version
nfsd: move nfsd4_change_attribute to nfsfh.c
ceph: report the inode version in getattr if requested
nfs: report the inode version in getattr if requested
vfs: plumb i_version handling into struct kstat
fs: clarify when the i_version counter must be updated
fs: uninline inode_query_iversion
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
"The main change here is that I've broken out most of the file locking
definitions into a new header file. I also went ahead and completed
the removal of locks_inode function"
* tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fs: remove locks_inode
filelock: move file locking definitions to separate header file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"In additon to bug fixes, these are noteworthy changes:
- In TPM I2C drivers, migrate from probe() to probe_new() (a new
driver model in I2C).
- TPM CRB: Pluton support
- Add duplicate hash detection to the blacklist keyring in order to
give more meaningful klog output than e.g. [1]"
Link: https://askubuntu.com/questions/1436856/ubuntu-22-10-blacklist-problem-blacklisting-hash-13-message-on-boot [1]
* tag 'tpm-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: add vendor flag to command code validation
tpm: Add reserved memory event log
tpm: Use managed allocation for bios event log
tpm: tis_i2c: Convert to i2c's .probe_new()
tpm: tpm_i2c_nuvoton: Convert to i2c's .probe_new()
tpm: tpm_i2c_infineon: Convert to i2c's .probe_new()
tpm: tpm_i2c_atmel: Convert to i2c's .probe_new()
tpm: st33zp24: Convert to i2c's .probe_new()
KEYS: asymmetric: Fix ECDSA use via keyctl uapi
certs: don't try to update blacklist keys
KEYS: Add new function key_create()
certs: make blacklisted hash available in klog
tpm_crb: Add support for CRB devices based on Pluton
crypto: certs: fix FIPS selftest dependency
|
|
https://git.linaro.org/people/jens.wiklander/linux-tee
Pull TEE update from Jens Wiklander:
"Remove get_kernel_pages()
Vmalloc page support is removed from shm_get_kernel_pages() and the
get_kernel_pages() call is replaced by calls to get_page(). With no
remaining callers of get_kernel_pages() the function is removed"
[ This looks like it's just some random 'tee' cleanup, but the bigger
picture impetus for this is really to to to remove historical
confusion with mixed use of kernel virtual addresses and 'struct page'
pointers.
Kernel virtual pointers in the vmalloc space is then particularly
confusing - both for looking up a page pointer (when trying to then
unify a "virtual address or page" interface) and _particularly_ when
mixed with HIGHMEM support and the kmap*() family of remapping.
This is particularly true with HIGHMEM getting much less test coverage
with 32-bit architectures being increasingly legacy targets.
So we actively wanted to remove get_kernel_pages() to make sure nobody
else used it too, and thus the 'tee' part is "finally remove last
user".
See also commit 6647e76ab623 ("v4l2: don't fall back to follow_pfn()
if pin_user_pages_fast() fails") for a totally different version of a
conceptually similar "let's stop this confusion of different ways of
referring to memory". - Linus ]
* tag 'remove-get_kernel_pages-for-6.3' of https://git.linaro.org/people/jens.wiklander/linux-tee:
mm: Remove get_kernel_pages()
tee: Remove call to get_kernel_pages()
tee: Remove vmalloc page support
highmem: Enhance is_kmap_addr() to check kmap_local_page() mappings
|
|
That really was meant to be a per netns attribute from the beginning.
The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.
To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Add safeguard to check for NULL tupe in objects updates via
NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.
2) Incorrect pointer check in the new destroy rule command,
from Yang Yingliang.
3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
from Florian Westphal.
4) Simplify seq_print_acct(), from Ilia Gavrilov.
5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
from Julian Anastasov.
6) TCP connection enters CLOSE state in conntrack for locally
originated TCP reset packet from the reject target,
from Florian Westphal.
The fixes #2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single build fix for the PCI/MSI infrastructure.
The addition of the new alloc/free interfaces in this cycle forgot to
add stub functions for pci_msix_alloc_irq_at() and pci_msix_free_irq()
for the CONFIG_PCI_MSI=n case"
* tag 'irq-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Provide missing stubs for CONFIG_PCI_MSI=n
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- New and improved irqdomain locking, closing a number of races that
became apparent now that we are able to probe drivers in parallel
- A bunch of OF node refcounting bugs have been fixed
- We now have a new IPI mux, lifted from the Apple AIC code and
made common. It is expected that riscv will eventually benefit
from it
- Two small fixes for the Broadcom L2 drivers
- Various cleanups and minor bug fixes
Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 regression fix from Will Deacon:
"Apologies for the _extremely_ late pull request here, but we had a
'perf' (i.e. CPU PMU) regression on the Apple M1 reported on Wednesday
[1] which was introduced by bd2756811766 ("perf: Rewrite core context
handling") during the merge window.
Mark and I looked into this and noticed an additional problem caused
by the same patch, where the 'CHAIN' event (used to combine two
adjacent 32-bit counters into a single 64-bit counter) was not being
filtered correctly. Mark posted a series on Thursday [2] which
addresses both of these regressions and I queued it the same day.
The changes are small, self-contained and have been confirmed to fix
the original regression.
Summary:
- Fix 'perf' regression for non-standard CPU PMU hardware (i.e. Apple
M1)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: perf: reject CHAIN events at creation time
arm_pmu: fix event CPU filtering
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Six hotfixes. Five are cc:stable: four for MM, one for nilfs2.
Also a MAINTAINERS update"
* tag 'mm-hotfixes-stable-2023-02-17-15-16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix underflow in second superblock position calculations
hugetlb: check for undefined shift on 32 bit architectures
mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
MAINTAINERS: update FPU EMULATOR web page
mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
mm/filemap: fix page end in filemap_get_read_batch
|
|
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the flags
argument to encode the base-2 logarithm of the desired page size. The
routine hstate_sizelog() uses the log2 value to find the corresponding
hugetlb hstate structure. Converting the log2 value (page_size_log) to
potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not be
greater than 63. This is fine on 64 bit architectures where a long is 64
bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the undefined
shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/
Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|