142 files changed, 1613 insertions, 652 deletions
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 69af2173555f..6d02168d78be 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1599,6 +1599,15 @@ The following nested keys are defined. pglazyfreed (npn) Amount of reclaimed lazyfree pages + swpin_zero + Number of pages swapped into memory and filled with zero, where I/O + was optimized out because the page content was detected to be zero + during swapout. + + swpout_zero + Number of zero-filled pages swapped out with I/O skipped due to the + content being detected as zero. + zswpin Number of pages moved in to memory from zswap. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1666576acc0e..d401577b5a6a 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6727,6 +6727,15 @@ torture.verbose_sleep_duration= [KNL] Duration of each verbose-printk() sleep in jiffies. + tpm.disable_pcr_integrity= [HW,TPM] Do not protect PCR registers from unintended physical access, or from interposers on the bus, by means of an integrity-protected session wrapped around the TPM2_PCR_Extend command. Consider this in a situation where the TPM is heavily utilized by IMA, so the protection would cause a major performance hit, and the space where the machines are deployed is guarded by other means. + tpm_suspend_pcr=[HW,TPM] Format: integer pcr id Specify that at suspend time, the tpm driver diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst index a55bf21f671c..d95363645331 100644 --- a/Documentation/networking/devmem.rst +++ b/Documentation/networking/devmem.rst @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner. Failure to do so will exhaust the limited dmabuf that is bound to the RX queue and will lead to packet drops. +The user must pass no more than 128 tokens, with no more than 1024 total frags +summed over the token->token_count of all the tokens. If the user provides more +than 1024 frags, the kernel will free up to 1024 frags and return early. + +The kernel returns the number of frags actually freed. This number can be less +than the number of frags the user's tokens cover in case of: + +(a) an internal kernel leak bug. +(b) the user passed more than 1024 frags. Implementation & Caveats ======================== diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst index 36f26501fd15..59ecdb1c0d4d 100644 --- a/Documentation/security/landlock.rst +++ b/Documentation/security/landlock.rst @@ -11,18 +11,18 @@ Landlock LSM: kernel documentation Landlock's goal is to create scoped access-control (i.e. sandboxing). To harden a whole system, this feature should be available to any process, -including unprivileged ones. Because such process may be compromised or +including unprivileged ones. Because such a process may be compromised or backdoored (i.e. untrusted), Landlock's features must be safe to use from the kernel and other processes point of view. Landlock's interface must therefore expose a minimal attack surface. Landlock is designed to be usable by unprivileged processes while following the system security policy enforced by other access control mechanisms (e.g. DAC, -LSM). Indeed, a Landlock rule shall not interfere with other access-controls -enforced on the system, only add more restrictions. +LSM).
A Landlock rule shall not interfere with other access-controls enforced +on the system, only add more restrictions. Any user can enforce Landlock rulesets on their processes. They are merged and -evaluated according to the inherited ones in a way that ensures that only more +evaluated against inherited rulesets in a way that ensures that only more constraints can be added. User space documentation can be found here: @@ -43,7 +43,7 @@ Guiding principles for safe access controls only impact the processes requesting them. * Resources (e.g. file descriptors) directly obtained from the kernel by a sandboxed process shall retain their scoped accesses (at the time of resource - acquisition) whatever process use them. + acquisition) whatever process uses them. Cf. `File descriptor access rights`_. Design choices @@ -71,7 +71,7 @@ the same results, when they are executed under the same Landlock domain. Taking the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right as an example, it may be allowed to open a file for writing without being allowed to :manpage:`ftruncate` the resulting file descriptor if the related file -hierarchy doesn't grant such access right. The following sequences of +hierarchy doesn't grant that access right. The following sequences of operations have the same semantic and should then have the same result: * ``truncate(path);`` @@ -81,7 +81,7 @@ Similarly to file access modes (e.g. ``O_RDWR``), Landlock access rights attached to file descriptors are retained even if they are passed between processes (e.g. through a Unix domain socket). Such access rights will then be enforced even if the receiving process is not sandboxed by Landlock. Indeed, -this is required to keep a consistent access control over the whole system, and +this is required to keep access controls consistent over the whole system, and this avoids unattended bypasses through file descriptor passing (i.e. confused deputy attack). diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index c8d3e46badc5..d639c61cb472 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -8,13 +8,13 @@ Landlock: unprivileged access control ===================================== :Author: Mickaël Salaün -:Date: September 2024 +:Date: October 2024 -The goal of Landlock is to enable to restrict ambient rights (e.g. global +The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock -is a stackable LSM, it makes possible to create safe security sandboxes as new -security layers in addition to the existing system-wide access-controls. This -kind of sandbox is expected to help mitigate the security impact of bugs or +is a stackable LSM, it makes it possible to create safe security sandboxes as +new security layers in addition to the existing system-wide access-controls. +This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. @@ -86,8 +86,8 @@ to be explicit about the denied-by-default access rights. LANDLOCK_SCOPE_SIGNAL, }; -Because we may not know on which kernel version an application will be -executed, it is safer to follow a best-effort security approach. 
Indeed, we +Because we may not know which kernel version an application will be executed +on, it is safer to follow a best-effort security approach. Indeed, we should try to protect users as much as possible whatever the kernel they are using. @@ -129,7 +129,7 @@ version, and only use the available subset of access rights: LANDLOCK_SCOPE_SIGNAL); } -This enables to create an inclusive ruleset that will contain our rules. +This enables the creation of an inclusive ruleset that will contain our rules. .. code-block:: c @@ -219,42 +219,41 @@ If the ``landlock_restrict_self`` system call succeeds, the current thread is now restricted and this policy will be enforced on all its subsequently created children as well. Once a thread is landlocked, there is no way to remove its security policy; only adding more restrictions is allowed. These threads are -now in a new Landlock domain, merge of their parent one (if any) with the new -ruleset. +now in a new Landlock domain, which is a merger of their parent one (if any) +with the new ruleset. Full working code can be found in `samples/landlock/sandboxer.c`_. Good practices -------------- -It is recommended setting access rights to file hierarchy leaves as much as +It is recommended to set access rights to file hierarchy leaves as much as possible. For instance, it is better to be able to have ``~/doc/`` as a read-only hierarchy and ``~/tmp/`` as a read-write hierarchy, compared to ``~/`` as a read-only hierarchy and ``~/tmp/`` as a read-write hierarchy. Following this good practice leads to self-sufficient hierarchies that do not depend on their location (i.e. parent directories). This is particularly relevant when we want to allow linking or renaming. Indeed, having consistent -access rights per directory enables to change the location of such directory +access rights per directory enables changing the location of such directories without relying on the destination directory access rights (except those that are required for this operation, see ``LANDLOCK_ACCESS_FS_REFER`` documentation). Having self-sufficient hierarchies also helps to tighten the required access rights to the minimal set of data. This also helps avoid sinkhole directories, -i.e. directories where data can be linked to but not linked from. However, +i.e. directories where data can be linked to but not linked from. However, this depends on data organization, which might not be controlled by developers. In this case, granting read-write access to ``~/tmp/``, instead of write-only -access, would potentially allow to move ``~/tmp/`` to a non-readable directory +access, would potentially allow moving ``~/tmp/`` to a non-readable directory and still keep the ability to list the content of ``~/tmp/``. Layers of file path access rights --------------------------------- Each time a thread enforces a ruleset on itself, it updates its Landlock domain -with a new layer of policy. Indeed, this complementary policy is stacked with -the potentially other rulesets already restricting this thread. A sandboxed -thread can then safely add more constraints to itself with a new enforced -ruleset. +with a new layer of policy. This complementary policy is stacked with any +other rulesets potentially already restricting this thread. A sandboxed thread +can then safely add more constraints to itself with a new enforced ruleset. One policy layer grants access to a file path if at least one of its rules encountered on the path grants the access. A sandboxed thread can only access @@ -265,7 +264,7 @@ etc.). 
Bind mounts and OverlayFS ------------------------- -Landlock enables to restrict access to file hierarchies, which means that these +Landlock enables restricting access to file hierarchies, which means that these access rights can be propagated with bind mounts (cf. Documentation/filesystems/sharedsubtree.rst) but not with Documentation/filesystems/overlayfs.rst. @@ -278,21 +277,21 @@ access to multiple file hierarchies at the same time, whether these hierarchies are the result of bind mounts or not. An OverlayFS mount point consists of upper and lower layers. These layers are -combined in a merge directory, result of the mount point. This merge hierarchy -may include files from the upper and lower layers, but modifications performed -on the merge hierarchy only reflects on the upper layer. From a Landlock -policy point of view, each OverlayFS layers and merge hierarchies are -standalone and contains their own set of files and directories, which is -different from bind mounts. A policy restricting an OverlayFS layer will not -restrict the resulted merged hierarchy, and vice versa. Landlock users should -then only think about file hierarchies they want to allow access to, regardless -of the underlying filesystem. +combined in a merge directory, and that merged directory becomes available at +the mount point. This merge hierarchy may include files from the upper and +lower layers, but modifications performed on the merge hierarchy only reflect +on the upper layer. From a Landlock policy point of view, all OverlayFS layers +and merge hierarchies are standalone and each contains its own set of files +and directories, which is different from bind mounts. A policy restricting an +OverlayFS layer will not restrict the resulting merged hierarchy, and vice versa. +Landlock users should then only think about file hierarchies they want to allow +access to, regardless of the underlying filesystem. Inheritance ----------- Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain -restrictions from its parent. This is similar to the seccomp inheritance (cf. +restrictions from its parent. This is similar to seccomp inheritance (cf. Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with task's :manpage:`credentials(7)`. For instance, one process's thread may apply Landlock rules to itself, but they will not be automatically applied to other @@ -311,8 +310,8 @@ Ptrace restrictions A sandboxed process has less privileges than a non-sandboxed process and must then be subject to additional restrictions when manipulating another process. To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target -process, a sandboxed process should have a subset of the target process rules, -which means the tracee must be in a sub-domain of the tracer. +process, a sandboxed process should have a superset of the target process's +access rights, which means the tracee must be in a sub-domain of the tracer. IPC scoping ----------- @@ -322,7 +321,7 @@ interactions between sandboxes. Each Landlock domain can be explicitly scoped for a set of actions by specifying it on a ruleset. For example, if a sandboxed process should not be able to :manpage:`connect(2)` to a non-sandboxed process through abstract :manpage:`unix(7)` sockets, we can -specify such restriction with ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``. +specify such a restriction with ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``.
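Such a scope is requested through the same ruleset object as filesystem rules. A minimal sketch, reusing the assumed syscall wrappers from the earlier sketch and assuming a kernel and uapi headers with Landlock ABI version 6 or later (other ``LANDLOCK_SCOPE_*`` flags can be OR'ed into the same ``scoped`` field):

.. code-block:: c

    /* Deny connect(2) to abstract UNIX sockets outside the sandbox domain. */
    struct landlock_ruleset_attr ruleset_attr = {
        .scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET,
    };
    int ruleset_fd = landlock_create_ruleset(&ruleset_attr,
                                             sizeof(ruleset_attr), 0);

    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    landlock_restrict_self(ruleset_fd, 0);
    close(ruleset_fd);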
Moreover, if a sandboxed process should not be able to send a signal to a non-sandboxed process, we can specify this restriction with ``LANDLOCK_SCOPE_SIGNAL``. @@ -394,7 +393,7 @@ Backward and forward compatibility Landlock is designed to be compatible with past and future versions of the kernel. This is achieved thanks to the system call attributes and the associated bitflags, particularly the ruleset's ``handled_access_fs``. Making -handled access right explicit enables the kernel and user space to have a clear +handled access rights explicit enables the kernel and user space to have a clear contract with each other. This is required to make sure sandboxing will not get stricter with a system update, which could break applications. @@ -563,33 +562,34 @@ always allowed when using a kernel that only supports the first or second ABI. Starting with the Landlock ABI version 3, it is now possible to securely control truncation thanks to the new ``LANDLOCK_ACCESS_FS_TRUNCATE`` access right. -Network support (ABI < 4) -------------------------- +TCP bind and connect (ABI < 4) +------------------------------ Starting with the Landlock ABI version 4, it is now possible to restrict TCP bind and connect actions to only a set of allowed ports thanks to the new ``LANDLOCK_ACCESS_NET_BIND_TCP`` and ``LANDLOCK_ACCESS_NET_CONNECT_TCP`` access rights. -IOCTL (ABI < 5) ---------------- +Device IOCTL (ABI < 5) +---------------------- IOCTL operations could not be denied before the fifth Landlock ABI, so :manpage:`ioctl(2)` is always allowed when using a kernel that only supports an earlier ABI. Starting with the Landlock ABI version 5, it is possible to restrict the use of -:manpage:`ioctl(2)` using the new ``LANDLOCK_ACCESS_FS_IOCTL_DEV`` right. +:manpage:`ioctl(2)` on character and block devices using the new +``LANDLOCK_ACCESS_FS_IOCTL_DEV`` right. -Abstract UNIX socket scoping (ABI < 6) --------------------------------------- +Abstract UNIX socket (ABI < 6) +------------------------------ Starting with the Landlock ABI version 6, it is possible to restrict connections to an abstract :manpage:`unix(7)` socket by setting ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET`` to the ``scoped`` ruleset attribute. -Signal scoping (ABI < 6) ------------------------- +Signal (ABI < 6) +---------------- Starting with the Landlock ABI version 6, it is possible to restrict :manpage:`signal(7)` sending by setting ``LANDLOCK_SCOPE_SIGNAL`` to the @@ -605,9 +605,9 @@ Build time configuration Landlock was first introduced in Linux 5.13 but it must be configured at build time with ``CONFIG_SECURITY_LANDLOCK=y``. Landlock must also be enabled at boot -time as the other security modules. The list of security modules enabled by +time like other security modules. The list of security modules enabled by default is set with ``CONFIG_LSM``. The kernel configuration should then -contains ``CONFIG_LSM=landlock,[...]`` with ``[...]`` as the list of other +contain ``CONFIG_LSM=landlock,[...]`` with ``[...]`` as the list of other potentially useful security modules for the running system (see the ``CONFIG_LSM`` help). @@ -669,7 +669,7 @@ Questions and answers What about user space sandbox managers? --------------------------------------- -Using user space process to enforce restrictions on kernel resources can lead +Using user space processes to enforce restrictions on kernel resources can lead to race conditions or inconsistent evaluations (i.e. 
`Incorrect mirroring of the OS code and state <https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_). diff --git a/MAINTAINERS b/MAINTAINERS index 21fdaa19229a..b878ddc99f94 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -19579,6 +19579,17 @@ S: Supported F: Documentation/devicetree/bindings/i2c/renesas,iic-emev2.yaml F: drivers/i2c/busses/i2c-emev2.c +RENESAS ETHERNET AVB DRIVER +M: Paul Barker <paul.barker.ct@bp.renesas.com> +M: Niklas Söderlund <niklas.soderlund@ragnatech.se> +L: netdev@vger.kernel.org +L: linux-renesas-soc@vger.kernel.org +S: Supported +F: Documentation/devicetree/bindings/net/renesas,etheravb.yaml +F: drivers/net/ethernet/renesas/Kconfig +F: drivers/net/ethernet/renesas/Makefile +F: drivers/net/ethernet/renesas/ravb* + RENESAS ETHERNET SWITCH DRIVER R: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> L: netdev@vger.kernel.org @@ -19628,6 +19639,14 @@ F: Documentation/devicetree/bindings/i2c/renesas,rmobile-iic.yaml F: drivers/i2c/busses/i2c-rcar.c F: drivers/i2c/busses/i2c-sh_mobile.c +RENESAS R-CAR SATA DRIVER +M: Geert Uytterhoeven <geert+renesas@glider.be> +L: linux-ide@vger.kernel.org +L: linux-renesas-soc@vger.kernel.org +S: Supported +F: Documentation/devicetree/bindings/ata/renesas,rcar-sata.yaml +F: drivers/ata/sata_rcar.c + RENESAS R-CAR THERMAL DRIVERS M: Niklas Söderlund <niklas.soderlund@ragnatech.se> L: linux-renesas-soc@vger.kernel.org @@ -19703,6 +19722,17 @@ S: Supported F: Documentation/devicetree/bindings/i2c/renesas,rzv2m.yaml F: drivers/i2c/busses/i2c-rzv2m.c +RENESAS SUPERH ETHERNET DRIVER +M: Niklas Söderlund <niklas.soderlund@ragnatech.se> +L: netdev@vger.kernel.org +L: linux-renesas-soc@vger.kernel.org +S: Supported +F: Documentation/devicetree/bindings/net/renesas,ether.yaml +F: drivers/net/ethernet/renesas/Kconfig +F: drivers/net/ethernet/renesas/Makefile +F: drivers/net/ethernet/renesas/sh_eth* +F: include/linux/sh_eth.h + RENESAS USB PHY DRIVER M: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> L: linux-renesas-soc@vger.kernel.org diff --git a/arch/loongarch/include/asm/kasan.h b/arch/loongarch/include/asm/kasan.h index c6bce5fbff57..7f52bd31b9d4 100644 --- a/arch/loongarch/include/asm/kasan.h +++ b/arch/loongarch/include/asm/kasan.h @@ -25,6 +25,7 @@ /* 64-bit segment value. */ #define XKPRANGE_UC_SEG (0x8000) #define XKPRANGE_CC_SEG (0x9000) +#define XKPRANGE_WC_SEG (0xa000) #define XKVRANGE_VC_SEG (0xffff) /* Cached */ @@ -41,20 +42,28 @@ #define XKPRANGE_UC_SHADOW_SIZE (XKPRANGE_UC_SIZE >> KASAN_SHADOW_SCALE_SHIFT) #define XKPRANGE_UC_SHADOW_END (XKPRANGE_UC_KASAN_OFFSET + XKPRANGE_UC_SHADOW_SIZE) +/* WriteCombine */ +#define XKPRANGE_WC_START WRITECOMBINE_BASE +#define XKPRANGE_WC_SIZE XRANGE_SIZE +#define XKPRANGE_WC_KASAN_OFFSET XKPRANGE_UC_SHADOW_END +#define XKPRANGE_WC_SHADOW_SIZE (XKPRANGE_WC_SIZE >> KASAN_SHADOW_SCALE_SHIFT) +#define XKPRANGE_WC_SHADOW_END (XKPRANGE_WC_KASAN_OFFSET + XKPRANGE_WC_SHADOW_SIZE) + /* VMALLOC (Cached or UnCached) */ #define XKVRANGE_VC_START MODULES_VADDR #define XKVRANGE_VC_SIZE round_up(KFENCE_AREA_END - MODULES_VADDR + 1, PGDIR_SIZE) -#define XKVRANGE_VC_KASAN_OFFSET XKPRANGE_UC_SHADOW_END +#define XKVRANGE_VC_KASAN_OFFSET XKPRANGE_WC_SHADOW_END #define XKVRANGE_VC_SHADOW_SIZE (XKVRANGE_VC_SIZE >> KASAN_SHADOW_SCALE_SHIFT) #define XKVRANGE_VC_SHADOW_END (XKVRANGE_VC_KASAN_OFFSET + XKVRANGE_VC_SHADOW_SIZE) /* KAsan shadow memory start right after vmalloc. 
*/ #define KASAN_SHADOW_START round_up(KFENCE_AREA_END, PGDIR_SIZE) #define KASAN_SHADOW_SIZE (XKVRANGE_VC_SHADOW_END - XKPRANGE_CC_KASAN_OFFSET) -#define KASAN_SHADOW_END round_up(KASAN_SHADOW_START + KASAN_SHADOW_SIZE, PGDIR_SIZE) +#define KASAN_SHADOW_END (round_up(KASAN_SHADOW_START + KASAN_SHADOW_SIZE, PGDIR_SIZE) - 1) #define XKPRANGE_CC_SHADOW_OFFSET (KASAN_SHADOW_START + XKPRANGE_CC_KASAN_OFFSET) #define XKPRANGE_UC_SHADOW_OFFSET (KASAN_SHADOW_START + XKPRANGE_UC_KASAN_OFFSET) +#define XKPRANGE_WC_SHADOW_OFFSET (KASAN_SHADOW_START + XKPRANGE_WC_KASAN_OFFSET) #define XKVRANGE_VC_SHADOW_OFFSET (KASAN_SHADOW_START + XKVRANGE_VC_KASAN_OFFSET) extern bool kasan_early_stage; diff --git a/arch/loongarch/include/asm/page.h b/arch/loongarch/include/asm/page.h index e85df33f11c7..8f21567a3188 100644 --- a/arch/loongarch/include/asm/page.h +++ b/arch/loongarch/include/asm/page.h @@ -113,10 +113,7 @@ struct page *tlb_virt_to_page(unsigned long kaddr); extern int __virt_addr_valid(volatile void *kaddr); #define virt_addr_valid(kaddr) __virt_addr_valid((volatile void *)(kaddr)) -#define VM_DATA_DEFAULT_FLAGS \ - (VM_READ | VM_WRITE | \ - ((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0) | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) +#define VM_DATA_DEFAULT_FLAGS VM_DATA_FLAGS_TSK_EXEC #include <asm-generic/memory_model.h> #include <asm-generic/getorder.h> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c index f1a74b80f22c..382a09a7152c 100644 --- a/arch/loongarch/kernel/acpi.c +++ b/arch/loongarch/kernel/acpi.c @@ -58,48 +58,48 @@ void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size) return ioremap_cache(phys, size); } -static int cpu_enumerated = 0; - #ifdef CONFIG_SMP -static int set_processor_mask(u32 id, u32 flags) +static int set_processor_mask(u32 id, u32 pass) { - int nr_cpus; - int cpu, cpuid = id; - - if (!cpu_enumerated) - nr_cpus = NR_CPUS; - else - nr_cpus = nr_cpu_ids; + int cpu = -1, cpuid = id; - if (num_processors >= nr_cpus) { + if (num_processors >= NR_CPUS) { pr_warn(PREFIX "nr_cpus limit of %i reached." 
- " processor 0x%x ignored.\n", nr_cpus, cpuid); + " processor 0x%x ignored.\n", NR_CPUS, cpuid); return -ENODEV; } + if (cpuid == loongson_sysconf.boot_cpu_id) cpu = 0; - else - cpu = find_first_zero_bit(cpumask_bits(cpu_present_mask), NR_CPUS); - - if (!cpu_enumerated) - set_cpu_possible(cpu, true); - if (flags & ACPI_MADT_ENABLED) { + switch (pass) { + case 1: /* Pass 1 handle enabled processors */ + if (cpu < 0) + cpu = find_first_zero_bit(cpumask_bits(cpu_present_mask), NR_CPUS); num_processors++; set_cpu_present(cpu, true); - __cpu_number_map[cpuid] = cpu; - __cpu_logical_map[cpu] = cpuid; - } else + break; + case 2: /* Pass 2 handle disabled processors */ + if (cpu < 0) + cpu = find_first_zero_bit(cpumask_bits(cpu_possible_mask), NR_CPUS); disabled_cpus++; + break; + default: + return cpu; + } + + set_cpu_possible(cpu, true); + __cpu_number_map[cpuid] = cpu; + __cpu_logical_map[cpu] = cpuid; return cpu; } #endif static int __init -acpi_parse_processor(union acpi_subtable_headers *header, const unsigned long end) +acpi_parse_p1_processor(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_core_pic *processor = NULL; @@ -110,13 +110,30 @@ acpi_parse_processor(union acpi_subtable_headers *header, const unsigned long en acpi_table_print_madt_entry(&header->common); #ifdef CONFIG_SMP acpi_core_pic[processor->core_id] = *processor; - set_processor_mask(processor->core_id, processor->flags); + if (processor->flags & ACPI_MADT_ENABLED) + set_processor_mask(processor->core_id, 1); #endif return 0; } static int __init +acpi_parse_p2_processor(union acpi_subtable_headers *header, const unsigned long end) +{ + struct acpi_madt_core_pic *processor = NULL; + + processor = (struct acpi_madt_core_pic *)header; + if (BAD_MADT_ENTRY(processor, end)) + return -EINVAL; + +#ifdef CONFIG_SMP + if (!(processor->flags & ACPI_MADT_ENABLED)) + set_processor_mask(processor->core_id, 2); +#endif + + return 0; +} +static int __init acpi_parse_eio_master(union acpi_subtable_headers *header, const unsigned long end) { static int core = 0; @@ -143,12 +160,14 @@ static void __init acpi_process_madt(void) } #endif acpi_table_parse_madt(ACPI_MADT_TYPE_CORE_PIC, - acpi_parse_processor, MAX_CORE_PIC); + acpi_parse_p1_processor, MAX_CORE_PIC); + + acpi_table_parse_madt(ACPI_MADT_TYPE_CORE_PIC, + acpi_parse_p2_processor, MAX_CORE_PIC); acpi_table_parse_madt(ACPI_MADT_TYPE_EIO_PIC, acpi_parse_eio_master, MAX_IO_PICS); - cpu_enumerated = 1; loongson_sysconf.nr_cpus = num_processors; } @@ -310,6 +329,10 @@ static int __ref acpi_map_cpu2node(acpi_handle handle, int cpu, int physid) int nid; nid = acpi_get_node(handle); + + if (nid != NUMA_NO_NODE) + nid = early_cpu_to_node(cpu); + if (nid != NUMA_NO_NODE) { set_cpuid_to_node(physid, nid); node_set(nid, numa_nodes_parsed); @@ -324,12 +347,14 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id, int *pcpu { int cpu; - cpu = set_processor_mask(physid, ACPI_MADT_ENABLED); - if (cpu < 0) { + cpu = cpu_number_map(physid); + if (cpu < 0 || cpu >= nr_cpu_ids) { pr_info(PREFIX "Unable to map lapic to logical cpu number\n"); - return cpu; + return -ERANGE; } + num_processors++; + set_cpu_present(cpu, true); acpi_map_cpu2node(handle, cpu, physid); *pcpu = cpu; diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c index a5fc61f8b348..e5a39bbad078 100644 --- a/arch/loongarch/kernel/paravirt.c +++ b/arch/loongarch/kernel/paravirt.c @@ -51,11 +51,18 @@ static u64 paravt_steal_clock(int cpu) } #ifdef CONFIG_SMP 
+static struct smp_ops native_ops; + static void pv_send_ipi_single(int cpu, unsigned int action) { int min, old; irq_cpustat_t *info = &per_cpu(irq_stat, cpu); + if (unlikely(action == ACTION_BOOT_CPU)) { + native_ops.send_ipi_single(cpu, action); + return; + } + old = atomic_fetch_or(BIT(action), &info->message); if (old) return; @@ -75,6 +82,11 @@ static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action) if (cpumask_empty(mask)) return; + if (unlikely(action == ACTION_BOOT_CPU)) { + native_ops.send_ipi_mask(mask, action); + return; + } + action = BIT(action); for_each_cpu(i, mask) { info = &per_cpu(irq_stat, i); @@ -147,6 +159,8 @@ static void pv_init_ipi(void) { int r, swi; + /* Init native ipi irq for ACTION_BOOT_CPU */ + native_ops.init_ipi(); swi = get_percpu_irq(INT_SWI0); if (swi < 0) panic("SWI0 IRQ mapping failed\n"); @@ -193,6 +207,7 @@ int __init pv_ipi_init(void) return 0; #ifdef CONFIG_SMP + native_ops = mp_ops; mp_ops.init_ipi = pv_init_ipi; mp_ops.send_ipi_single = pv_send_ipi_single; mp_ops.send_ipi_mask = pv_send_ipi_mask; diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c index 9afc2d8b3414..5d59e9ce2772 100644 --- a/arch/loongarch/kernel/smp.c +++ b/arch/loongarch/kernel/smp.c @@ -302,7 +302,7 @@ static void __init fdt_smp_setup(void) __cpu_number_map[cpuid] = cpu; __cpu_logical_map[cpu] = cpuid; - early_numa_add_cpu(cpu, 0); + early_numa_add_cpu(cpuid, 0); set_cpuid_to_node(cpuid, 0); } @@ -331,11 +331,11 @@ void __init loongson_prepare_cpus(unsigned int max_cpus) int i = 0; parse_acpi_topology(); + cpu_data[0].global_id = cpu_logical_map(0); for (i = 0; i < loongson_sysconf.nr_cpus; i++) { set_cpu_present(i, true); csr_mail_send(0, __cpu_logical_map[i], 0); - cpu_data[i].global_id = __cpu_logical_map[i]; } per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE; @@ -380,6 +380,7 @@ void loongson_init_secondary(void) cpu_logical_map(cpu) / loongson_sysconf.cores_per_package; cpu_data[cpu].core = pptt_enabled ? cpu_data[cpu].core : cpu_logical_map(cpu) % loongson_sysconf.cores_per_package; + cpu_data[cpu].global_id = cpu_logical_map(cpu); } void loongson_smp_finish(void) diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c index 427d6b1aec09..d2681272d8f0 100644 --- a/arch/loongarch/mm/kasan_init.c +++ b/arch/loongarch/mm/kasan_init.c @@ -13,6 +13,13 @@ static pgd_t kasan_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); +#ifdef __PAGETABLE_P4D_FOLDED +#define __pgd_none(early, pgd) (0) +#else +#define __pgd_none(early, pgd) (early ? 
(pgd_val(pgd) == 0) : \ +(__pa(pgd_val(pgd)) == (unsigned long)__pa(kasan_early_shadow_p4d))) +#endif + #ifdef __PAGETABLE_PUD_FOLDED #define __p4d_none(early, p4d) (0) #else @@ -55,6 +62,9 @@ void *kasan_mem_to_shadow(const void *addr) case XKPRANGE_UC_SEG: offset = XKPRANGE_UC_SHADOW_OFFSET; break; + case XKPRANGE_WC_SEG: + offset = XKPRANGE_WC_SHADOW_OFFSET; + break; case XKVRANGE_VC_SEG: offset = XKVRANGE_VC_SHADOW_OFFSET; break; @@ -79,6 +89,8 @@ const void *kasan_shadow_to_mem(const void *shadow_addr) if (addr >= XKVRANGE_VC_SHADOW_OFFSET) return (void *)(((addr - XKVRANGE_VC_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) + XKVRANGE_VC_START); + else if (addr >= XKPRANGE_WC_SHADOW_OFFSET) + return (void *)(((addr - XKPRANGE_WC_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) + XKPRANGE_WC_START); else if (addr >= XKPRANGE_UC_SHADOW_OFFSET) return (void *)(((addr - XKPRANGE_UC_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) + XKPRANGE_UC_START); else if (addr >= XKPRANGE_CC_SHADOW_OFFSET) @@ -142,6 +154,19 @@ static pud_t *__init kasan_pud_offset(p4d_t *p4dp, unsigned long addr, int node, return pud_offset(p4dp, addr); } +static p4d_t *__init kasan_p4d_offset(pgd_t *pgdp, unsigned long addr, int node, bool early) +{ + if (__pgd_none(early, pgdp_get(pgdp))) { + phys_addr_t p4d_phys = early ? + __pa_symbol(kasan_early_shadow_p4d) : kasan_alloc_zeroed_page(node); + if (!early) + memcpy(__va(p4d_phys), kasan_early_shadow_p4d, sizeof(kasan_early_shadow_p4d)); + pgd_populate(&init_mm, pgdp, (p4d_t *)__va(p4d_phys)); + } + + return p4d_offset(pgdp, addr); +} + static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr, unsigned long end, int node, bool early) { @@ -178,19 +203,19 @@ static void __init kasan_pud_populate(p4d_t *p4dp, unsigned long addr, do { next = pud_addr_end(addr, end); kasan_pmd_populate(pudp, addr, next, node, early); - } while (pudp++, addr = next, addr != end); + } while (pudp++, addr = next, addr != end && __pud_none(early, READ_ONCE(*pudp))); } static void __init kasan_p4d_populate(pgd_t *pgdp, unsigned long addr, unsigned long end, int node, bool early) { unsigned long next; - p4d_t *p4dp = p4d_offset(pgdp, addr); + p4d_t *p4dp = kasan_p4d_offset(pgdp, addr, node, early); do { next = p4d_addr_end(addr, end); kasan_pud_populate(p4dp, addr, next, node, early); - } while (p4dp++, addr = next, addr != end); + } while (p4dp++, addr = next, addr != end && __p4d_none(early, READ_ONCE(*p4dp))); } static void __init kasan_pgd_populate(unsigned long addr, unsigned long end, @@ -218,7 +243,7 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end, asmlinkage void __init kasan_early_init(void) { BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_START, PGDIR_SIZE)); - BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, PGDIR_SIZE)); + BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END + 1, PGDIR_SIZE)); } static inline void kasan_set_pgd(pgd_t *pgdp, pgd_t pgdval) @@ -233,7 +258,7 @@ static void __init clear_pgds(unsigned long start, unsigned long end) * swapper_pg_dir. pgd_clear() can't be used * here because it's nop on 2,3-level pagetable setups */ - for (; start < end; start += PGDIR_SIZE) + for (; start < end; start = pgd_addr_end(start, end)) kasan_set_pgd((pgd_t *)pgd_offset_k(start), __pgd(0)); } @@ -243,6 +268,17 @@ void __init kasan_init(void) phys_addr_t pa_start, pa_end; /* + * If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will + * overflow UINTPTR_MAX and then looks like a user space address. 
+ * For example, PGDIR_SIZE of CONFIG_4KB_4LEVEL is 2^39, which is too + * large for Loongson-2K series whose cpu_vabits = 39. + */ + if (KASAN_SHADOW_END < vm_map_base) { + pr_warn("PGDIR_SIZE too large for cpu_vabits, KernelAddressSanitizer disabled.\n"); + return; + } + + /* * PGD was populated as invalid_pmd_table or invalid_pud_table * in pagetable_init() which depends on how many levels of page * table you are using, but we had to clean the gpd of kasan diff --git a/arch/mips/crypto/crc32-mips.c b/arch/mips/crypto/crc32-mips.c index a7a1d43a1b2c..90eacf00cfc3 100644 --- a/arch/mips/crypto/crc32-mips.c +++ b/arch/mips/crypto/crc32-mips.c @@ -123,20 +123,20 @@ static u32 crc32c_mips_le_hw(u32 crc_, const u8 *p, unsigned int len) for (; len >= sizeof(u64); p += sizeof(u64), len -= sizeof(u64)) { u64 value = get_unaligned_le64(p); - CRC32(crc, value, d); + CRC32C(crc, value, d); } if (len & sizeof(u32)) { u32 value = get_unaligned_le32(p); - CRC32(crc, value, w); + CRC32C(crc, value, w); p += sizeof(u32); } } else { for (; len >= sizeof(u32); len -= sizeof(u32)) { u32 value = get_unaligned_le32(p); - CRC32(crc, value, w); + CRC32C(crc, value, w); p += sizeof(u32); } } diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 2098dc689088..95c6beb8ce27 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; - if (apic->apicv_active) { - /* irr_pending is always true when apicv is activated. */ - apic->irr_pending = true; + /* + * When APICv is enabled, KVM must always search the IRR for a pending + * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU + * isn't running. If APICv is disabled, KVM _should_ search the IRR + * for a pending IRQ. But KVM currently doesn't ensure *all* hardware, + * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching + * the IRR at this time could race with IRQ delivery from hardware that + * still sees APICv as being enabled. + * + * FIXME: Ensure other vCPUs and devices observe the change in APICv + * state prior to updating KVM's metadata caches, so that KVM + * can safely search the IRR and set irr_pending accordingly. + */ + apic->irr_pending = true; + + if (apic->apicv_active) apic->isr_count = 1; - } else { - /* - * Don't clear irr_pending, searching the IRR can race with - * updates from the CPU as APICv is still active from hardware's - * perspective. The flag will be cleared as appropriate when - * KVM injects the interrupt. - */ + else apic->isr_count = count_vectors(apic->regs + APIC_ISR); - } + apic->highest_isr_cache = -1; } diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 0b851ef937f2..fb854cf20ac3 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -450,8 +450,11 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp, goto e_free; /* This needs to happen after SEV/SNP firmware initialization. 
*/ - if (vm_type == KVM_X86_SNP_VM && snp_guest_req_init(kvm)) - goto e_free; + if (vm_type == KVM_X86_SNP_VM) { + ret = snp_guest_req_init(kvm); + if (ret) + goto e_free; + } INIT_LIST_HEAD(&sev->regions_list); INIT_LIST_HEAD(&sev->mirror_vms); @@ -2212,10 +2215,6 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) if (sev->snp_context) return -EINVAL; - sev->snp_context = snp_context_create(kvm, argp); - if (!sev->snp_context) - return -ENOTTY; - if (params.flags) return -EINVAL; @@ -2230,6 +2229,10 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) return -EINVAL; + sev->snp_context = snp_context_create(kvm, argp); + if (!sev->snp_context) + return -ENOTTY; + start.gctx_paddr = __psp_pa(sev->snp_context); start.policy = params.policy; memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw)); diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index a8e7bc04d9bf..931a7361c30f 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1197,11 +1197,14 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu, kvm_hv_nested_transtion_tlb_flush(vcpu, enable_ept); /* - * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings - * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a - * full TLB flush from the guest's perspective. This is required even - * if VPID is disabled in the host as KVM may need to synchronize the - * MMU in response to the guest TLB flush. + * If VPID is disabled, then guest TLB accesses use VPID=0, i.e. the + * same VPID as the host, and so architecturally, linear and combined + * mappings for VPID=0 must be flushed at VM-Enter and VM-Exit. KVM + * emulates L2 sharing L1's VPID=0 by using vpid01 while running L2, + * and so KVM must also emulate TLB flush of VPID=0, i.e. vpid01. This + * is required if VPID is disabled in KVM, as a TLB flush (there are no + * VPIDs) still occurs from L1's perspective, and KVM may need to + * synchronize the MMU in response to the guest TLB flush. * * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use. * EPT is a special snowflake, as guest-physical mappings aren't @@ -2315,6 +2318,17 @@ static void prepare_vmcs02_early_rare(struct vcpu_vmx *vmx, vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA); + /* + * If VPID is disabled, then guest TLB accesses use VPID=0, i.e. the + * same VPID as the host. Emulate this behavior by using vpid01 for L2 + * if VPID is disabled in vmcs12. Note, if VPID is disabled, VM-Enter + * and VM-Exit are architecturally required to flush VPID=0, but *only* + * VPID=0. I.e. using vpid02 would be ok (so long as KVM emulates the + * required flushes), but doing so would cause KVM to over-flush. E.g. + * if L1 runs L2 X with VPID12=1, then runs L2 Y with VPID12 disabled, + * and then runs L2 X again, then KVM can and should retain TLB entries + * for VPID12=1. + */ if (enable_vpid) { if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02) vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02); @@ -5950,6 +5964,12 @@ static int handle_invvpid(struct kvm_vcpu *vcpu) return nested_vmx_fail(vcpu, VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID); + /* + * Always flush the effective vpid02, i.e. never flush the current VPID + * and never explicitly flush vpid01. INVVPID targets a VPID, not a + * VMCS, and so whether or not the current vmcs12 has VPID enabled is + * irrelevant (and there may not be a loaded vmcs12). 
+ */ vpid02 = nested_get_vpid02(vcpu); switch (type) { case VMX_VPID_EXTENT_INDIVIDUAL_ADDR: diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 81ed596e4454..d28618e9277e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -217,9 +217,11 @@ module_param(ple_window_shrink, uint, 0444); static unsigned int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX; module_param(ple_window_max, uint, 0444); -/* Default is SYSTEM mode, 1 for host-guest mode */ +/* Default is SYSTEM mode, 1 for host-guest mode (which is BROKEN) */ int __read_mostly pt_mode = PT_MODE_SYSTEM; +#ifdef CONFIG_BROKEN module_param(pt_mode, int, S_IRUGO); +#endif struct x86_pmu_lbr __ro_after_init vmx_lbr_caps; @@ -3216,7 +3218,7 @@ void vmx_flush_tlb_all(struct kvm_vcpu *vcpu) static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu) { - if (is_guest_mode(vcpu)) + if (is_guest_mode(vcpu) && nested_cpu_has_vpid(get_vmcs12(vcpu))) return nested_get_vpid02(vcpu); return to_vmx(vcpu)->vpid; } diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c index 438b92967bc3..30a32ebbcc68 100644 --- a/drivers/bluetooth/btintel.c +++ b/drivers/bluetooth/btintel.c @@ -3288,13 +3288,12 @@ static int btintel_diagnostics(struct hci_dev *hdev, struct sk_buff *skb) case INTEL_TLV_TEST_EXCEPTION: /* Generate devcoredump from exception */ if (!hci_devcd_init(hdev, skb->len)) { - hci_devcd_append(hdev, skb); + hci_devcd_append(hdev, skb_clone(skb, GFP_ATOMIC)); hci_devcd_complete(hdev); } else { bt_dev_err(hdev, "Failed to generate devcoredump"); - kfree_skb(skb); } - return 0; + break; default: bt_dev_err(hdev, "Invalid exception type %02X", tlv->val[0]); } diff --git a/drivers/char/tpm/tpm-buf.c b/drivers/char/tpm/tpm-buf.c index cad0048bcc3c..e49a19fea3bd 100644 --- a/drivers/char/tpm/tpm-buf.c +++ b/drivers/char/tpm/tpm-buf.c @@ -147,6 +147,26 @@ void tpm_buf_append_u32(struct tpm_buf *buf, const u32 value) EXPORT_SYMBOL_GPL(tpm_buf_append_u32); /** + * tpm_buf_append_handle() - Add a handle + * @chip: &tpm_chip instance + * @buf: &tpm_buf instance + * @handle: a TPM object handle + * + * Add a handle to the buffer, and increase the count tracking the number of + * handles in the command buffer. Works only for command buffers. 
+ */ +void tpm_buf_append_handle(struct tpm_chip *chip, struct tpm_buf *buf, u32 handle) +{ + if (buf->flags & TPM_BUF_TPM2B) { + dev_err(&chip->dev, "Invalid buffer type (TPM2B)\n"); + return; + } + + tpm_buf_append_u32(buf, handle); + buf->handles++; +} + +/** * tpm_buf_read() - Read from a TPM buffer * @buf: &tpm_buf instance * @offset: offset within the buffer diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index 1e856259219e..dfdcbd009720 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -14,6 +14,10 @@ #include "tpm.h" #include <crypto/hash_info.h> +static bool disable_pcr_integrity; +module_param(disable_pcr_integrity, bool, 0444); +MODULE_PARM_DESC(disable_pcr_integrity, "Disable integrity protection of TPM2_PCR_Extend"); + static struct tpm2_hash tpm2_hash_map[] = { {HASH_ALGO_SHA1, TPM_ALG_SHA1}, {HASH_ALGO_SHA256, TPM_ALG_SHA256}, @@ -232,18 +236,26 @@ int tpm2_pcr_extend(struct tpm_chip *chip, u32 pcr_idx, int rc; int i; - rc = tpm2_start_auth_session(chip); - if (rc) - return rc; + if (!disable_pcr_integrity) { + rc = tpm2_start_auth_session(chip); + if (rc) + return rc; + } rc = tpm_buf_init(&buf, TPM2_ST_SESSIONS, TPM2_CC_PCR_EXTEND); if (rc) { - tpm2_end_auth_session(chip); + if (!disable_pcr_integrity) + tpm2_end_auth_session(chip); return rc; } - tpm_buf_append_name(chip, &buf, pcr_idx, NULL); - tpm_buf_append_hmac_session(chip, &buf, 0, NULL, 0); + if (!disable_pcr_integrity) { + tpm_buf_append_name(chip, &buf, pcr_idx, NULL); + tpm_buf_append_hmac_session(chip, &buf, 0, NULL, 0); + } else { + tpm_buf_append_handle(chip, &buf, pcr_idx); + tpm_buf_append_auth(chip, &buf, 0, NULL, 0); + } tpm_buf_append_u32(&buf, chip->nr_allocated_banks); @@ -253,9 +265,11 @@ int tpm2_pcr_extend(struct tpm_chip *chip, u32 pcr_idx, chip->allocated_banks[i].digest_size); } - tpm_buf_fill_hmac_session(chip, &buf); + if (!disable_pcr_integrity) + tpm_buf_fill_hmac_session(chip, &buf); rc = tpm_transmit_cmd(chip, &buf, 0, "attempting extend a PCR value"); - rc = tpm_buf_check_hmac_response(chip, &buf, rc); + if (!disable_pcr_integrity) + rc = tpm_buf_check_hmac_response(chip, &buf, rc); tpm_buf_destroy(&buf); diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index 0739830904b2..b0f13c8ea79c 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -237,9 +237,7 @@ void tpm_buf_append_name(struct tpm_chip *chip, struct tpm_buf *buf, #endif if (!tpm2_chip_auth(chip)) { - tpm_buf_append_u32(buf, handle); - /* count the number of handles in the upper bits of flags */ - buf->handles++; + tpm_buf_append_handle(chip, buf, handle); return; } @@ -272,6 +270,31 @@ void tpm_buf_append_name(struct tpm_chip *chip, struct tpm_buf *buf, } EXPORT_SYMBOL_GPL(tpm_buf_append_name); +void tpm_buf_append_auth(struct tpm_chip *chip, struct tpm_buf *buf, + u8 attributes, u8 *passphrase, int passphrase_len) +{ + /* offset tells us where the sessions area begins */ + int offset = buf->handles * 4 + TPM_HEADER_SIZE; + u32 len = 9 + passphrase_len; + + if (tpm_buf_length(buf) != offset) { + /* not the first session so update the existing length */ + len += get_unaligned_be32(&buf->data[offset]); + put_unaligned_be32(len, &buf->data[offset]); + } else { + tpm_buf_append_u32(buf, len); + } + /* auth handle */ + tpm_buf_append_u32(buf, TPM2_RS_PW); + /* nonce */ + tpm_buf_append_u16(buf, 0); + /* attributes */ + tpm_buf_append_u8(buf, 0); + /* passphrase */ + tpm_buf_append_u16(buf, passphrase_len); + 
tpm_buf_append(buf, passphrase, passphrase_len); +} + /** * tpm_buf_append_hmac_session() - Append a TPM session element * @chip: the TPM chip structure @@ -309,26 +332,8 @@ void tpm_buf_append_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf, #endif if (!tpm2_chip_auth(chip)) { - /* offset tells us where the sessions area begins */ - int offset = buf->handles * 4 + TPM_HEADER_SIZE; - u32 len = 9 + passphrase_len; - - if (tpm_buf_length(buf) != offset) { - /* not the first session so update the existing length */ - len += get_unaligned_be32(&buf->data[offset]); - put_unaligned_be32(len, &buf->data[offset]); - } else { - tpm_buf_append_u32(buf, len); - } - /* auth handle */ - tpm_buf_append_u32(buf, TPM2_RS_PW); - /* nonce */ - tpm_buf_append_u16(buf, 0); - /* attributes */ - tpm_buf_append_u8(buf, 0); - /* passphrase */ - tpm_buf_append_u16(buf, passphrase_len); - tpm_buf_append(buf, passphrase, passphrase_len); + tpm_buf_append_auth(chip, buf, attributes, passphrase, + passphrase_len); return; } @@ -948,10 +953,13 @@ static int tpm2_load_null(struct tpm_chip *chip, u32 *null_key) /* Deduce from the name change TPM interference: */ dev_err(&chip->dev, "null key integrity check failed\n"); tpm2_flush_context(chip, tmp_null_key); - chip->flags |= TPM_CHIP_FLAG_DISABLE; err: - return rc ? -ENODEV : 0; + if (rc) { + chip->flags |= TPM_CHIP_FLAG_DISABLE; + rc = -ENODEV; + } + return rc; } /** diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index cd2ac1ba53d2..400337f3b572 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1028,26 +1028,29 @@ static void hybrid_update_cpu_capacity_scaling(void) } } -static void __hybrid_init_cpu_capacity_scaling(void) +static void __hybrid_refresh_cpu_capacity_scaling(void) { hybrid_max_perf_cpu = NULL; hybrid_update_cpu_capacity_scaling(); } -static void hybrid_init_cpu_capacity_scaling(bool refresh) +static void hybrid_refresh_cpu_capacity_scaling(void) { - bool disable_itmt = false; + guard(mutex)(&hybrid_capacity_lock); - mutex_lock(&hybrid_capacity_lock); + __hybrid_refresh_cpu_capacity_scaling(); +} +static void hybrid_init_cpu_capacity_scaling(bool refresh) +{ /* * If hybrid_max_perf_cpu is set at this point, the hybrid CPU capacity * scaling has been enabled already and the driver is just changing the * operation mode. */ if (refresh) { - __hybrid_init_cpu_capacity_scaling(); - goto unlock; + hybrid_refresh_cpu_capacity_scaling(); + return; } /* @@ -1056,19 +1059,13 @@ static void hybrid_init_cpu_capacity_scaling(bool refresh) * do not do that when SMT is in use. */ if (hwp_is_hybrid && !sched_smt_active() && arch_enable_hybrid_capacity_scale()) { - __hybrid_init_cpu_capacity_scaling(); - disable_itmt = true; - } - -unlock: - mutex_unlock(&hybrid_capacity_lock); - - /* - * Disabling ITMT causes sched domains to be rebuilt to disable asym - * packing and enable asym capacity. - */ - if (disable_itmt) + hybrid_refresh_cpu_capacity_scaling(); + /* + * Disabling ITMT causes sched domains to be rebuilt to disable asym + * packing and enable asym capacity. 
+ */ sched_clear_itmt_support(); + } } static bool hybrid_clear_max_perf_cpu(void) @@ -1404,7 +1401,7 @@ static void intel_pstate_update_limits_for_all(void) mutex_lock(&hybrid_capacity_lock); if (hybrid_max_perf_cpu) - __hybrid_init_cpu_capacity_scaling(); + __hybrid_refresh_cpu_capacity_scaling(); mutex_unlock(&hybrid_capacity_lock); } diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c index 2d77b5f40ca7..c7e5a34b254b 100644 --- a/drivers/firmware/arm_scmi/perf.c +++ b/drivers/firmware/arm_scmi/perf.c @@ -373,7 +373,7 @@ static int iter_perf_levels_update_state(struct scmi_iterator_state *st, return 0; } -static inline void +static inline int process_response_opp(struct device *dev, struct perf_dom_info *dom, struct scmi_opp *opp, unsigned int loop_idx, const struct scmi_msg_resp_perf_describe_levels *r) @@ -386,12 +386,16 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom, le16_to_cpu(r->opp[loop_idx].transition_latency_us); ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL); - if (ret) - dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n", + if (ret) { + dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n", opp->perf, dom->info.name, ret); + return ret; + } + + return 0; } -static inline void +static inline int process_response_opp_v4(struct device *dev, struct perf_dom_info *dom, struct scmi_opp *opp, unsigned int loop_idx, const struct scmi_msg_resp_perf_describe_levels_v4 *r) @@ -404,9 +408,11 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom, le16_to_cpu(r->opp[loop_idx].transition_latency_us); ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL); - if (ret) - dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n", + if (ret) { + dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n", opp->perf, dom->info.name, ret); + return ret; + } /* Note that PERF v4 reports always five 32-bit words */ opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq); @@ -415,13 +421,21 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom, ret = xa_insert(&dom->opps_by_idx, opp->level_index, opp, GFP_KERNEL); - if (ret) + if (ret) { dev_warn(dev, "Failed to add opps_by_idx at %d for %s - ret:%d\n", opp->level_index, dom->info.name, ret); + /* Cleanup by_lvl too */ + xa_erase(&dom->opps_by_lvl, opp->perf); + + return ret; + } + hash_add(dom->opps_by_freq, &opp->hash, opp->indicative_freq); } + + return 0; } static int @@ -429,16 +443,22 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph, const void *response, struct scmi_iterator_state *st, void *priv) { + int ret; struct scmi_opp *opp; struct scmi_perf_ipriv *p = priv; - opp = &p->perf_dom->opp[st->desc_index + st->loop_idx]; + opp = &p->perf_dom->opp[p->perf_dom->opp_count]; if (PROTOCOL_REV_MAJOR(p->version) <= 0x3) - process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx, - response); + ret = process_response_opp(ph->dev, p->perf_dom, opp, + st->loop_idx, response); else - process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx, - response); + ret = process_response_opp_v4(ph->dev, p->perf_dom, opp, + st->loop_idx, response); + + /* Skip BAD duplicates received from firmware */ + if (ret) + return ret == -EBUSY ? 
0 : ret; + p->perf_dom->opp_count++; dev_dbg(ph->dev, "Level %d Power %d Latency %dus Ifreq %d Index %d\n", diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index c4cf26f1d149..be0743dac3ff 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -269,8 +269,6 @@ rdma_find_ndev_for_src_ip_rcu(struct net *net, const struct sockaddr *src_in) break; #endif } - if (!ret && dev && is_vlan_dev(dev)) - dev = vlan_dev_real_dev(dev); return ret ? ERR_PTR(ret) : dev; } diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c index 6715c96a3eee..9eb290ec71a8 100644 --- a/drivers/infiniband/hw/bnxt_re/main.c +++ b/drivers/infiniband/hw/bnxt_re/main.c @@ -300,9 +300,6 @@ static void bnxt_re_shutdown(struct auxiliary_device *adev) struct bnxt_re_en_dev_info *en_info = auxiliary_get_drvdata(adev); struct bnxt_re_dev *rdev; - if (!en_info) - return; - rdev = en_info->rdev; ib_unregister_device(&rdev->ibdev); bnxt_re_dev_uninit(rdev, BNXT_RE_COMPLETE_REMOVE); @@ -316,9 +313,6 @@ static void bnxt_re_stop_irq(void *handle) struct bnxt_qplib_nq *nq; int indx; - if (!en_info) - return; - rdev = en_info->rdev; rcfw = &rdev->rcfw; @@ -339,9 +333,6 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent) struct bnxt_qplib_nq *nq; int indx, rc; - if (!en_info) - return; - rdev = en_info->rdev; msix_ent = rdev->en_dev->msix_entries; rcfw = &rdev->rcfw; @@ -1991,10 +1982,6 @@ static void bnxt_re_remove(struct auxiliary_device *adev) struct bnxt_re_dev *rdev; mutex_lock(&bnxt_re_mutex); - if (!en_info) { - mutex_unlock(&bnxt_re_mutex); - return; - } rdev = en_info->rdev; if (rdev) @@ -2025,7 +2012,15 @@ static int bnxt_re_probe(struct auxiliary_device *adev, auxiliary_set_drvdata(adev, en_info); rc = bnxt_re_add_device(adev, BNXT_RE_COMPLETE_INIT); + if (rc) + goto err; mutex_unlock(&bnxt_re_mutex); + return 0; + +err: + mutex_unlock(&bnxt_re_mutex); + kfree(en_info); + return rc; } @@ -2035,9 +2030,6 @@ static int bnxt_re_suspend(struct auxiliary_device *adev, pm_message_t state) struct bnxt_en_dev *en_dev; struct bnxt_re_dev *rdev; - if (!en_info) - return 0; - rdev = en_info->rdev; en_dev = en_info->en_dev; mutex_lock(&bnxt_re_mutex); @@ -2082,9 +2074,6 @@ static int bnxt_re_resume(struct auxiliary_device *adev) struct bnxt_re_en_dev_info *en_info = auxiliary_get_drvdata(adev); struct bnxt_re_dev *rdev; - if (!en_info) - return 0; - mutex_lock(&bnxt_re_mutex); /* L2 driver may invoke this callback during device recovery, resume. * reset. 
Current RoCE driver doesn't recover the device in case of diff --git a/drivers/mailbox/qcom-cpucp-mbox.c b/drivers/mailbox/qcom-cpucp-mbox.c index e5437c294803..44f4ed15f818 100644 --- a/drivers/mailbox/qcom-cpucp-mbox.c +++ b/drivers/mailbox/qcom-cpucp-mbox.c @@ -138,7 +138,7 @@ static int qcom_cpucp_mbox_probe(struct platform_device *pdev) return irq; ret = devm_request_irq(dev, irq, qcom_cpucp_mbox_irq_fn, - IRQF_TRIGGER_HIGH, "apss_cpucp_mbox", cpucp); + IRQF_TRIGGER_HIGH | IRQF_NO_SUSPEND, "apss_cpucp_mbox", cpucp); if (ret < 0) return dev_err_probe(dev, ret, "Failed to register irq: %d\n", irq); diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index d478aafa02c9..23e0b71b991e 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -2471,7 +2471,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign int r; unsigned int num_locks; struct dm_bufio_client *c; - char slab_name[27]; + char slab_name[64]; + static atomic_t seqno = ATOMIC_INIT(0); if (!block_size || block_size & ((1 << SECTOR_SHIFT) - 1)) { DMERR("%s: block size not specified or is not multiple of 512b", __func__); @@ -2522,7 +2523,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign (block_size < PAGE_SIZE || !is_power_of_2(block_size))) { unsigned int align = min(1U << __ffs(block_size), (unsigned int)PAGE_SIZE); - snprintf(slab_name, sizeof(slab_name), "dm_bufio_cache-%u", block_size); + snprintf(slab_name, sizeof(slab_name), "dm_bufio_cache-%u-%u", + block_size, atomic_inc_return(&seqno)); c->slab_cache = kmem_cache_create(slab_name, block_size, align, SLAB_RECLAIM_ACCOUNT, NULL); if (!c->slab_cache) { @@ -2531,9 +2533,11 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign } } if (aux_size) - snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer-%u", aux_size); + snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer-%u-%u", + aux_size, atomic_inc_return(&seqno)); else - snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer"); + snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer-%u", + atomic_inc_return(&seqno)); c->slab_buffer = kmem_cache_create(slab_name, sizeof(struct dm_buffer) + aux_size, 0, SLAB_RECLAIM_ACCOUNT, NULL); if (!c->slab_buffer) { diff --git a/drivers/md/dm-cache-background-tracker.c b/drivers/md/dm-cache-background-tracker.c index 9c5308298cf1..f3051bd7d2df 100644 --- a/drivers/md/dm-cache-background-tracker.c +++ b/drivers/md/dm-cache-background-tracker.c @@ -11,12 +11,6 @@ #define DM_MSG_PREFIX "dm-background-tracker" -struct bt_work { - struct list_head list; - struct rb_node node; - struct policy_work work; -}; - struct background_tracker { unsigned int max_work; atomic_t pending_promotes; @@ -26,10 +20,10 @@ struct background_tracker { struct list_head issued; struct list_head queued; struct rb_root pending; - - struct kmem_cache *work_cache; }; +struct kmem_cache *btracker_work_cache = NULL; + struct background_tracker *btracker_create(unsigned int max_work) { struct background_tracker *b = kmalloc(sizeof(*b), GFP_KERNEL); @@ -48,12 +42,6 @@ struct background_tracker *btracker_create(unsigned int max_work) INIT_LIST_HEAD(&b->queued); b->pending = RB_ROOT; - b->work_cache = KMEM_CACHE(bt_work, 0); - if (!b->work_cache) { - DMERR("couldn't create mempool for background work items"); - kfree(b); - b = NULL; - } return b; } @@ -66,10 +54,9 @@ void btracker_destroy(struct background_tracker *b) BUG_ON(!list_empty(&b->issued)); list_for_each_entry_safe (w, tmp, &b->queued, list) 
{ list_del(&w->list); - kmem_cache_free(b->work_cache, w); + kmem_cache_free(btracker_work_cache, w); } - kmem_cache_destroy(b->work_cache); kfree(b); } EXPORT_SYMBOL_GPL(btracker_destroy); @@ -180,7 +167,7 @@ static struct bt_work *alloc_work(struct background_tracker *b) if (max_work_reached(b)) return NULL; - return kmem_cache_alloc(b->work_cache, GFP_NOWAIT); + return kmem_cache_alloc(btracker_work_cache, GFP_NOWAIT); } int btracker_queue(struct background_tracker *b, @@ -203,7 +190,7 @@ int btracker_queue(struct background_tracker *b, * There was a race, we'll just ignore this second * bit of work for the same oblock. */ - kmem_cache_free(b->work_cache, w); + kmem_cache_free(btracker_work_cache, w); return -EINVAL; } @@ -244,7 +231,7 @@ void btracker_complete(struct background_tracker *b, update_stats(b, &w->work, -1); rb_erase(&w->node, &b->pending); list_del(&w->list); - kmem_cache_free(b->work_cache, w); + kmem_cache_free(btracker_work_cache, w); } EXPORT_SYMBOL_GPL(btracker_complete); diff --git a/drivers/md/dm-cache-background-tracker.h b/drivers/md/dm-cache-background-tracker.h index 5b8f5c667b81..09c8fc59f7bb 100644 --- a/drivers/md/dm-cache-background-tracker.h +++ b/drivers/md/dm-cache-background-tracker.h @@ -26,6 +26,14 @@ * protected with a spinlock. */ +struct bt_work { + struct list_head list; + struct rb_node node; + struct policy_work work; +}; + +extern struct kmem_cache *btracker_work_cache; + struct background_work; struct background_tracker; diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 40709310e327..849eb6333e98 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -10,6 +10,7 @@ #include "dm-bio-record.h" #include "dm-cache-metadata.h" #include "dm-io-tracker.h" +#include "dm-cache-background-tracker.h" #include <linux/dm-io.h> #include <linux/dm-kcopyd.h> @@ -2263,7 +2264,7 @@ static int parse_cache_args(struct cache_args *ca, int argc, char **argv, /*----------------------------------------------------------------*/ -static struct kmem_cache *migration_cache; +static struct kmem_cache *migration_cache = NULL; #define NOT_CORE_OPTION 1 @@ -3445,22 +3446,36 @@ static int __init dm_cache_init(void) int r; migration_cache = KMEM_CACHE(dm_cache_migration, 0); - if (!migration_cache) - return -ENOMEM; + if (!migration_cache) { + r = -ENOMEM; + goto err; + } + + btracker_work_cache = kmem_cache_create("dm_cache_bt_work", + sizeof(struct bt_work), __alignof__(struct bt_work), 0, NULL); + if (!btracker_work_cache) { + r = -ENOMEM; + goto err; + } r = dm_register_target(&cache_target); if (r) { - kmem_cache_destroy(migration_cache); - return r; + goto err; } return 0; + +err: + kmem_cache_destroy(migration_cache); + kmem_cache_destroy(btracker_work_cache); + return r; } static void __exit dm_cache_exit(void) { dm_unregister_target(&cache_target); kmem_cache_destroy(migration_cache); + kmem_cache_destroy(btracker_work_cache); } module_init(dm_cache_init); diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index 41e451235f63..e9f6e4e62290 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2957,8 +2957,8 @@ static int dw_mci_init_slot(struct dw_mci *host) if (host->use_dma == TRANS_MODE_IDMAC) { mmc->max_segs = host->ring_size; mmc->max_blk_size = 65535; - mmc->max_req_size = DW_MCI_DESC_DATA_LENGTH * host->ring_size; - mmc->max_seg_size = mmc->max_req_size; + mmc->max_seg_size = 0x1000; + mmc->max_req_size = mmc->max_seg_size * host->ring_size; mmc->max_blk_count = 
mmc->max_req_size / 512; } else if (host->use_dma == TRANS_MODE_EDMAC) { mmc->max_segs = 64; diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c index d3bd0ac99ec4..e0ab5fd635e6 100644 --- a/drivers/mmc/host/sunxi-mmc.c +++ b/drivers/mmc/host/sunxi-mmc.c @@ -1191,10 +1191,9 @@ static const struct sunxi_mmc_cfg sun50i_a64_emmc_cfg = { .needs_new_timings = true, }; -static const struct sunxi_mmc_cfg sun50i_a100_cfg = { +static const struct sunxi_mmc_cfg sun50i_h616_cfg = { .idma_des_size_bits = 16, .idma_des_shift = 2, - .clk_delays = NULL, .can_calibrate = true, .mask_data0 = true, .needs_new_timings = true, @@ -1217,8 +1216,9 @@ static const struct of_device_id sunxi_mmc_of_match[] = { { .compatible = "allwinner,sun20i-d1-mmc", .data = &sun20i_d1_cfg }, { .compatible = "allwinner,sun50i-a64-mmc", .data = &sun50i_a64_cfg }, { .compatible = "allwinner,sun50i-a64-emmc", .data = &sun50i_a64_emmc_cfg }, - { .compatible = "allwinner,sun50i-a100-mmc", .data = &sun50i_a100_cfg }, + { .compatible = "allwinner,sun50i-a100-mmc", .data = &sun20i_d1_cfg }, { .compatible = "allwinner,sun50i-a100-emmc", .data = &sun50i_a100_emmc_cfg }, + { .compatible = "allwinner,sun50i-h616-mmc", .data = &sun50i_h616_cfg }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, sunxi_mmc_of_match); diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index b1bffd8e9a95..15e0f14d0d49 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1008,6 +1008,8 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active, if (bond->dev->flags & IFF_UP) bond_hw_addr_flush(bond->dev, old_active->dev); + + bond_slave_ns_maddrs_add(bond, old_active); } if (new_active) { @@ -1024,6 +1026,8 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active, dev_mc_sync(new_active->dev, bond->dev); netif_addr_unlock_bh(bond->dev); } + + bond_slave_ns_maddrs_del(bond, new_active); } } @@ -2341,6 +2345,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, bond_compute_features(bond); bond_set_carrier(bond); + /* Needs to be called before bond_select_active_slave(), which will + * remove the maddrs if the slave is selected as active slave. + */ + bond_slave_ns_maddrs_add(bond, new_slave); + if (bond_uses_primary(bond)) { block_netpoll_tx(); bond_select_active_slave(bond); @@ -2350,7 +2359,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, if (bond_mode_can_use_xmit_hash(bond)) bond_update_slave_arr(bond, NULL); - if (!slave_dev->netdev_ops->ndo_bpf || !slave_dev->netdev_ops->ndo_xdp_xmit) { if (bond->xdp_prog) { @@ -2548,6 +2556,12 @@ static int __bond_release_one(struct net_device *bond_dev, if (oldcurrent == slave) bond_change_active_slave(bond, NULL); + /* Must be called after bond_change_active_slave () as the slave + * might change from an active slave to a backup slave. Then it is + * necessary to clear the maddrs on the backup slave. 
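In the dw_mmc hunk above, each IDMAC descriptor is now limited to one 4 KiB segment, so the request limits follow from the descriptor ring size rather than the other way around. A quick standalone check of that arithmetic (the ring_size value is made up):

#include <stdio.h>

int main(void)
{
	unsigned int ring_size = 128;		/* example descriptor ring size */
	unsigned int max_seg_size = 0x1000;	/* one 4 KiB segment per descriptor */
	unsigned int max_req_size = max_seg_size * ring_size;

	printf("max_req_size  = %u bytes\n", max_req_size);	/* 524288 */
	printf("max_blk_count = %u\n", max_req_size / 512);	/* 1024 */
	return 0;
}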
+ */ + bond_slave_ns_maddrs_del(bond, slave); + if (bond_is_lb(bond)) { /* Must be called only after the slave has been * detached from the list and the curr_active_slave diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 95d59a18c022..327b6ecdc77e 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -15,6 +15,7 @@ #include <linux/sched/signal.h> #include <net/bonding.h> +#include <net/ndisc.h> static int bond_option_active_slave_set(struct bonding *bond, const struct bond_opt_value *newval); @@ -1234,6 +1235,68 @@ static int bond_option_arp_ip_targets_set(struct bonding *bond, } #if IS_ENABLED(CONFIG_IPV6) +static bool slave_can_set_ns_maddr(const struct bonding *bond, struct slave *slave) +{ + return BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP && + !bond_is_active_slave(slave) && + slave->dev->flags & IFF_MULTICAST; +} + +static void slave_set_ns_maddrs(struct bonding *bond, struct slave *slave, bool add) +{ + struct in6_addr *targets = bond->params.ns_targets; + char slot_maddr[MAX_ADDR_LEN]; + int i; + + if (!slave_can_set_ns_maddr(bond, slave)) + return; + + for (i = 0; i < BOND_MAX_NS_TARGETS; i++) { + if (ipv6_addr_any(&targets[i])) + break; + + if (!ndisc_mc_map(&targets[i], slot_maddr, slave->dev, 0)) { + if (add) + dev_mc_add(slave->dev, slot_maddr); + else + dev_mc_del(slave->dev, slot_maddr); + } + } +} + +void bond_slave_ns_maddrs_add(struct bonding *bond, struct slave *slave) +{ + if (!bond->params.arp_validate) + return; + slave_set_ns_maddrs(bond, slave, true); +} + +void bond_slave_ns_maddrs_del(struct bonding *bond, struct slave *slave) +{ + if (!bond->params.arp_validate) + return; + slave_set_ns_maddrs(bond, slave, false); +} + +static void slave_set_ns_maddr(struct bonding *bond, struct slave *slave, + struct in6_addr *target, struct in6_addr *slot) +{ + char target_maddr[MAX_ADDR_LEN], slot_maddr[MAX_ADDR_LEN]; + + if (!bond->params.arp_validate || !slave_can_set_ns_maddr(bond, slave)) + return; + + /* remove the previous maddr from slave */ + if (!ipv6_addr_any(slot) && + !ndisc_mc_map(slot, slot_maddr, slave->dev, 0)) + dev_mc_del(slave->dev, slot_maddr); + + /* add new maddr on slave if target is set */ + if (!ipv6_addr_any(target) && + !ndisc_mc_map(target, target_maddr, slave->dev, 0)) + dev_mc_add(slave->dev, target_maddr); +} + static void _bond_options_ns_ip6_target_set(struct bonding *bond, int slot, struct in6_addr *target, unsigned long last_rx) @@ -1243,8 +1306,10 @@ static void _bond_options_ns_ip6_target_set(struct bonding *bond, int slot, struct slave *slave; if (slot >= 0 && slot < BOND_MAX_NS_TARGETS) { - bond_for_each_slave(bond, slave, iter) + bond_for_each_slave(bond, slave, iter) { slave->target_last_arp_rx[slot] = last_rx; + slave_set_ns_maddr(bond, slave, target, &targets[slot]); + } targets[slot] = *target; } } @@ -1296,15 +1361,30 @@ static int bond_option_ns_ip6_targets_set(struct bonding *bond, { return -EPERM; } + +static void slave_set_ns_maddrs(struct bonding *bond, struct slave *slave, bool add) {} + +void bond_slave_ns_maddrs_add(struct bonding *bond, struct slave *slave) {} + +void bond_slave_ns_maddrs_del(struct bonding *bond, struct slave *slave) {} #endif static int bond_option_arp_validate_set(struct bonding *bond, const struct bond_opt_value *newval) { + bool changed = !!bond->params.arp_validate != !!newval->value; + struct list_head *iter; + struct slave *slave; + netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n", newval->string, 
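The bonding helpers above subscribe each backup slave to the Ethernet multicast groups behind the NS targets. On Ethernet, ndisc_mc_map() reduces to the RFC 2464 mapping of an IPv6 multicast address to a 33:33 link-layer address; a self-contained sketch of that mapping (the sample group is illustrative):

#include <stdint.h>
#include <stdio.h>

/* IPv6 multicast -> Ethernet multicast (RFC 2464): 33:33 plus the
 * low 32 bits of the group address, which is what ndisc_mc_map()
 * computes for Ethernet-type devices.
 */
static void ipv6_mc_map(const uint8_t addr[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	mac[2] = addr[12];
	mac[3] = addr[13];
	mac[4] = addr[14];
	mac[5] = addr[15];
}

int main(void)
{
	/* example group: ff02::1:ff00:1 (solicited-node style address) */
	uint8_t a[16] = { 0xff, 0x02, [11] = 0x01, 0xff, 0x00, 0x00, 0x01 };
	uint8_t mac[6];

	ipv6_mc_map(a, mac);
	printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
	       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
	return 0;
}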
newval->value); bond->params.arp_validate = newval->value; + if (changed) { + bond_for_each_slave(bond, slave, iter) + slave_set_ns_maddrs(bond, slave, !!bond->params.arp_validate); + } + return 0; } diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index b83df5f94b1f..f1d088168723 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -907,7 +907,7 @@ static int igb_request_msix(struct igb_adapter *adapter) int i, err = 0, vector = 0, free_vector = 0; err = request_irq(adapter->msix_entries[vector].vector, - igb_msix_other, IRQF_NO_THREAD, netdev->name, adapter); + igb_msix_other, 0, netdev->name, adapter); if (err) goto err_out; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c index dcfccaaa8d91..92d5cfec3dc0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c @@ -866,7 +866,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv, return 0; err_rule: - mlx5_tc_ct_entry_destroy_mod_hdr(ct_priv, zone_rule->attr, zone_rule->mh); + mlx5_tc_ct_entry_destroy_mod_hdr(ct_priv, attr, zone_rule->mh); mlx5_put_label_mapping(ct_priv, attr->ct_attr.ct_labels_id); err_mod_hdr: kfree(attr); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c index d61be26a4df1..3db31cc10719 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -660,7 +660,7 @@ tx_sync_info_get(struct mlx5e_ktls_offload_context_tx *priv_tx, while (remaining > 0) { skb_frag_t *frag = &record->frags[i]; - get_page(skb_frag_page(frag)); + page_ref_inc(skb_frag_page(frag)); remaining -= skb_frag_size(frag); info->frags[i++] = *frag; } @@ -763,7 +763,7 @@ void mlx5e_ktls_tx_handle_resync_dump_comp(struct mlx5e_txqsq *sq, stats = sq->stats; mlx5e_tx_dma_unmap(sq->pdev, dma); - put_page(wi->resync_dump_frag_page); + page_ref_dec(wi->resync_dump_frag_page); stats->tls_dump_packets++; stats->tls_dump_bytes += wi->num_bytes; } @@ -816,12 +816,12 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx, err_out: for (; i < info.nr_frags; i++) - /* The put_page() here undoes the page ref obtained in tx_sync_info_get(). + /* The page_ref_dec() here undoes the page ref obtained in tx_sync_info_get(). * Page refs obtained for the DUMP WQEs above (by page_ref_add) will be * released only upon their completions (or in mlx5e_free_txqsq_descs, * if channel closes). 
*/ - put_page(skb_frag_page(&info.frags[i])); + page_ref_dec(skb_frag_page(&info.frags[i])); return MLX5E_KTLS_SYNC_FAIL; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index e601324a690a..13a3fa8dc0cb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4267,7 +4267,8 @@ void mlx5e_set_xdp_feature(struct net_device *netdev) struct mlx5e_params *params = &priv->channels.params; xdp_features_t val; - if (params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) { + if (!netdev->netdev_ops->ndo_bpf || + params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) { xdp_clear_features_flag(netdev); return; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c index 5bf8318cc48b..1d60465cc2ca 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -36,6 +36,7 @@ #include "en.h" #include "en/port.h" #include "eswitch.h" +#include "lib/mlx5.h" static int mlx5e_test_health_info(struct mlx5e_priv *priv) { @@ -247,6 +248,9 @@ static int mlx5e_cond_loopback(struct mlx5e_priv *priv) if (is_mdev_switchdev_mode(priv->mdev)) return -EOPNOTSUPP; + if (mlx5_get_sd(priv->mdev)) + return -EOPNOTSUPP; + return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index f24f91d213f2..8cf61ae8b89d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -2527,8 +2527,11 @@ static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep, u8 rep_type) { if (atomic_cmpxchg(&rep->rep_data[rep_type].state, - REP_LOADED, REP_REGISTERED) == REP_LOADED) + REP_LOADED, REP_REGISTERED) == REP_LOADED) { + if (rep_type == REP_ETH) + __esw_offloads_unload_rep(esw, rep, REP_IB); esw->offloads.rep_ops[rep_type]->unload(rep); + } } static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 8505d5e241e1..6e4f8aaf8d2f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -2105,13 +2105,22 @@ lookup_fte_locked(struct mlx5_flow_group *g, fte_tmp = NULL; goto out; } + + nested_down_write_ref_node(&fte_tmp->node, FS_LOCK_CHILD); + if (!fte_tmp->node.active) { + up_write_ref_node(&fte_tmp->node, false); + + if (take_write) + up_write_ref_node(&g->node, false); + else + up_read_ref_node(&g->node); + tree_put_node(&fte_tmp->node, false); - fte_tmp = NULL; - goto out; + + return NULL; } - nested_down_write_ref_node(&fte_tmp->node, FS_LOCK_CHILD); out: if (take_write) up_write_ref_node(&g->node, false); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c index 81a9232a03e1..7db9cab9bedf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c @@ -593,9 +593,11 @@ static void irq_pool_free(struct mlx5_irq_pool *pool) kvfree(pool); } -static int irq_pools_init(struct mlx5_core_dev *dev, int sf_vec, int pcif_vec) +static int irq_pools_init(struct mlx5_core_dev *dev, int sf_vec, int pcif_vec, + bool dynamic_vec) { struct mlx5_irq_table *table = 
dev->priv.irq_table; + int sf_vec_available = sf_vec; int num_sf_ctrl; int err; @@ -616,6 +618,13 @@ static int irq_pools_init(struct mlx5_core_dev *dev, int sf_vec, int pcif_vec) num_sf_ctrl = DIV_ROUND_UP(mlx5_sf_max_functions(dev), MLX5_SFS_PER_CTRL_IRQ); num_sf_ctrl = min_t(int, MLX5_IRQ_CTRL_SF_MAX, num_sf_ctrl); + if (!dynamic_vec && (num_sf_ctrl + 1) > sf_vec_available) { + mlx5_core_dbg(dev, + "Not enough IRQs for SFs control and completion pool, required=%d avail=%d\n", + num_sf_ctrl + 1, sf_vec_available); + return 0; + } + table->sf_ctrl_pool = irq_pool_alloc(dev, pcif_vec, num_sf_ctrl, "mlx5_sf_ctrl", MLX5_EQ_SHARE_IRQ_MIN_CTRL, @@ -624,9 +633,11 @@ static int irq_pools_init(struct mlx5_core_dev *dev, int sf_vec, int pcif_vec) err = PTR_ERR(table->sf_ctrl_pool); goto err_pf; } - /* init sf_comp_pool */ + sf_vec_available -= num_sf_ctrl; + + /* init sf_comp_pool, remaining vectors are for the SF completions */ table->sf_comp_pool = irq_pool_alloc(dev, pcif_vec + num_sf_ctrl, - sf_vec - num_sf_ctrl, "mlx5_sf_comp", + sf_vec_available, "mlx5_sf_comp", MLX5_EQ_SHARE_IRQ_MIN_COMP, MLX5_EQ_SHARE_IRQ_MAX_COMP); if (IS_ERR(table->sf_comp_pool)) { @@ -715,6 +726,7 @@ int mlx5_irq_table_get_num_comp(struct mlx5_irq_table *table) int mlx5_irq_table_create(struct mlx5_core_dev *dev) { int num_eqs = mlx5_max_eq_cap_get(dev); + bool dynamic_vec; int total_vec; int pcif_vec; int req_vec; @@ -724,21 +736,31 @@ int mlx5_irq_table_create(struct mlx5_core_dev *dev) if (mlx5_core_is_sf(dev)) return 0; + /* PCI PF vectors usage is limited by online cpus, device EQs and + * PCI MSI-X capability. + */ pcif_vec = MLX5_CAP_GEN(dev, num_ports) * num_online_cpus() + 1; pcif_vec = min_t(int, pcif_vec, num_eqs); + pcif_vec = min_t(int, pcif_vec, pci_msix_vec_count(dev->pdev)); total_vec = pcif_vec; if (mlx5_sf_max_functions(dev)) total_vec += MLX5_MAX_MSIX_PER_SF * mlx5_sf_max_functions(dev); total_vec = min_t(int, total_vec, pci_msix_vec_count(dev->pdev)); - pcif_vec = min_t(int, pcif_vec, pci_msix_vec_count(dev->pdev)); req_vec = pci_msix_can_alloc_dyn(dev->pdev) ? 
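The pci_irq.c change above makes the SF pool sizing respect the vectors actually granted when dynamic MSI-X allocation is unavailable. The budget check itself is a few integer operations; a standalone sketch with made-up capability values (both constants mirror the driver's but are assumptions here):

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define SFS_PER_CTRL_IRQ	512	/* assumption, mirrors MLX5_SFS_PER_CTRL_IRQ */
#define CTRL_IRQ_MAX		8	/* assumption, mirrors MLX5_IRQ_CTRL_SF_MAX */

int main(void)
{
	int max_sfs = 1000;		/* example: mlx5_sf_max_functions() */
	int sf_vec_available = 6;	/* example: MSI-X vectors left for SFs */
	int num_sf_ctrl = DIV_ROUND_UP(max_sfs, SFS_PER_CTRL_IRQ);

	if (num_sf_ctrl > CTRL_IRQ_MAX)
		num_sf_ctrl = CTRL_IRQ_MAX;

	/* Non-dynamic MSI-X: the ctrl pool plus at least one completion
	 * vector must fit in what pci_alloc_irq_vectors() really granted.
	 */
	if (num_sf_ctrl + 1 > sf_vec_available)
		printf("skip SF pools: need %d vectors, have %d\n",
		       num_sf_ctrl + 1, sf_vec_available);
	else
		printf("ctrl vectors %d, completion vectors %d\n",
		       num_sf_ctrl, sf_vec_available - num_sf_ctrl);
	return 0;
}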
1 : total_vec; n = pci_alloc_irq_vectors(dev->pdev, 1, req_vec, PCI_IRQ_MSIX); if (n < 0) return n; - err = irq_pools_init(dev, total_vec - pcif_vec, pcif_vec); + /* Further limit vectors of the pools based on platform for non dynamic case */ + dynamic_vec = pci_msix_can_alloc_dyn(dev->pdev); + if (!dynamic_vec) { + pcif_vec = min_t(int, n, pcif_vec); + total_vec = min_t(int, n, total_vec); + } + + err = irq_pools_init(dev, total_vec - pcif_vec, pcif_vec, dynamic_vec); if (err) pci_free_irq_vectors(dev->pdev); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c index d68f0c4e7835..9739bc9867c5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c @@ -108,7 +108,12 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (IS_ERR(dwmac->tx_clk)) return PTR_ERR(dwmac->tx_clk); - clk_prepare_enable(dwmac->tx_clk); + ret = clk_prepare_enable(dwmac->tx_clk); + if (ret) { + dev_err(&pdev->dev, + "Failed to enable tx_clk\n"); + return ret; + } /* Check and configure TX clock rate */ rate = clk_get_rate(dwmac->tx_clk); @@ -119,7 +124,7 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "Failed to set tx_clk\n"); - return ret; + goto err_tx_clk_disable; } } } @@ -133,7 +138,7 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "Failed to set clk_ptp_ref\n"); - return ret; + goto err_tx_clk_disable; } } } @@ -149,12 +154,15 @@ static int intel_eth_plat_probe(struct platform_device *pdev) } ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res); - if (ret) { - clk_disable_unprepare(dwmac->tx_clk); - return ret; - } + if (ret) + goto err_tx_clk_disable; return 0; + +err_tx_clk_disable: + if (dwmac->data->tx_clk_en) + clk_disable_unprepare(dwmac->tx_clk); + return ret; } static void intel_eth_plat_remove(struct platform_device *pdev) @@ -162,7 +170,8 @@ static void intel_eth_plat_remove(struct platform_device *pdev) struct intel_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev); stmmac_pltfr_remove(pdev); - clk_disable_unprepare(dwmac->tx_clk); + if (dwmac->data->tx_clk_en) + clk_disable_unprepare(dwmac->tx_clk); } static struct platform_driver intel_eth_plat_driver = { diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c index 2a9132d6d743..001857c294fb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c @@ -589,9 +589,9 @@ static int mediatek_dwmac_common_data(struct platform_device *pdev, plat->mac_interface = priv_plat->phy_mode; if (priv_plat->mac_wol) - plat->flags |= STMMAC_FLAG_USE_PHY_WOL; - else plat->flags &= ~STMMAC_FLAG_USE_PHY_WOL; + else + plat->flags |= STMMAC_FLAG_USE_PHY_WOL; plat->riwt_off = 1; plat->maxmtu = ETH_DATA_LEN; plat->host_dma_width = priv_plat->variant->dma_bit_mask; diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c index 5c20ceb164df..fe2fd1bfc904 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c @@ -16,6 +16,7 @@ #include <linux/if_hsr.h> #include <linux/if_vlan.h> #include <linux/interrupt.h> +#include <linux/io-64-nonatomic-hi-lo.h> #include <linux/kernel.h> #include <linux/mfd/syscon.h> #include <linux/module.h> @@ -411,6 +412,8 @@ static int prueth_perout_enable(void 
*clockops_data, struct prueth_emac *emac = clockops_data; u32 reduction_factor = 0, offset = 0; struct timespec64 ts; + u64 current_cycle; + u64 start_offset; u64 ns_period; if (!on) @@ -449,8 +452,14 @@ static int prueth_perout_enable(void *clockops_data, writel(reduction_factor, emac->prueth->shram.va + TIMESYNC_FW_WC_SYNCOUT_REDUCTION_FACTOR_OFFSET); - writel(0, emac->prueth->shram.va + - TIMESYNC_FW_WC_SYNCOUT_START_TIME_CYCLECOUNT_OFFSET); + current_cycle = icssg_read_time(emac->prueth->shram.va + + TIMESYNC_FW_WC_CYCLECOUNT_OFFSET); + + /* Rounding of current_cycle count to next second */ + start_offset = roundup(current_cycle, MSEC_PER_SEC); + + hi_lo_writeq(start_offset, emac->prueth->shram.va + + TIMESYNC_FW_WC_SYNCOUT_START_TIME_CYCLECOUNT_OFFSET); return 0; } diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h index 8722bb4a268a..f5c1d473e9f9 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h @@ -330,6 +330,18 @@ static inline int prueth_emac_slice(struct prueth_emac *emac) extern const struct ethtool_ops icssg_ethtool_ops; extern const struct dev_pm_ops prueth_dev_pm_ops; +static inline u64 icssg_read_time(const void __iomem *addr) +{ + u32 low, high; + + do { + high = readl(addr + 4); + low = readl(addr); + } while (high != readl(addr + 4)); + + return low + ((u64)high << 32); +} + /* Classifier helpers */ void icssg_class_set_mac_addr(struct regmap *miig_rt, int slice, u8 *mac); void icssg_class_set_host_mac_addr(struct regmap *miig_rt, const u8 *mac); diff --git a/drivers/net/ethernet/vertexcom/mse102x.c b/drivers/net/ethernet/vertexcom/mse102x.c index 2c37957478fb..89dc4c401a8d 100644 --- a/drivers/net/ethernet/vertexcom/mse102x.c +++ b/drivers/net/ethernet/vertexcom/mse102x.c @@ -437,13 +437,15 @@ static void mse102x_tx_work(struct work_struct *work) mse = &mses->mse102x; while ((txb = skb_dequeue(&mse->txq))) { + unsigned int len = max_t(unsigned int, txb->len, ETH_ZLEN); + mutex_lock(&mses->lock); ret = mse102x_tx_pkt_spi(mse, txb, work_timeout); mutex_unlock(&mses->lock); if (ret) { mse->ndev->stats.tx_dropped++; } else { - mse->ndev->stats.tx_bytes += txb->len; + mse->ndev->stats.tx_bytes += len; mse->ndev->stats.tx_packets++; } diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 4309317de3d1..3e9957b6aa14 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -78,7 +78,7 @@ struct phylink { unsigned int pcs_neg_mode; unsigned int pcs_state; - bool mac_link_dropped; + bool link_failed; bool using_mac_select_pcs; struct sfp_bus *sfp_bus; @@ -1475,9 +1475,9 @@ static void phylink_resolve(struct work_struct *w) cur_link_state = pl->old_link_state; if (pl->phylink_disable_state) { - pl->mac_link_dropped = false; + pl->link_failed = false; link_state.link = false; - } else if (pl->mac_link_dropped) { + } else if (pl->link_failed) { link_state.link = false; retrigger = true; } else { @@ -1572,7 +1572,7 @@ static void phylink_resolve(struct work_struct *w) phylink_link_up(pl, link_state); } if (!link_state.link && retrigger) { - pl->mac_link_dropped = false; + pl->link_failed = false; queue_work(system_power_efficient_wq, &pl->resolve); } mutex_unlock(&pl->state_mutex); @@ -1835,6 +1835,8 @@ static void phylink_phy_change(struct phy_device *phydev, bool up) pl->phy_state.pause |= MLO_PAUSE_RX; pl->phy_state.interface = phydev->interface; pl->phy_state.link = up; + if (!up) + pl->link_failed = true; 
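icssg_read_time(), added above, is the usual retry loop for a live 64-bit counter exposed as two 32-bit words: if the high word changes across the read, a low-word rollover raced with it and the read is retried. A userspace model of the same loop, with a thread standing in for the free-running hardware counter:

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* The "hardware": a monotonically increasing 64-bit counter. Each
 * 32-bit register read below samples it independently, just as two
 * readl() calls would sample live MMIO registers.
 */
static _Atomic uint64_t counter = 0xfffffff0ull;	/* near a low-word rollover */
static _Atomic int running = 1;

static uint32_t read_lo(void) { return (uint32_t)atomic_load(&counter); }
static uint32_t read_hi(void) { return (uint32_t)(atomic_load(&counter) >> 32); }

static void *tick(void *arg)
{
	(void)arg;
	while (atomic_load(&running))
		atomic_fetch_add(&counter, 1);
	return NULL;
}

/* Same shape as icssg_read_time(): hi, then lo, then confirm hi.
 * For a monotonic counter an unchanged high word guarantees the low
 * word belongs to the same high-word epoch.
 */
static uint64_t read_time(void)
{
	uint32_t low, high;

	do {
		high = read_hi();
		low = read_lo();
	} while (high != read_hi());

	return low + ((uint64_t)high << 32);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, tick, NULL);
	for (int i = 0; i < 4; i++)
		printf("0x%016llx\n", (unsigned long long)read_time());
	atomic_store(&running, 0);
	pthread_join(&t, NULL);
	return 0;
}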
mutex_unlock(&pl->state_mutex); phylink_run_resolve(pl); @@ -2158,7 +2160,7 @@ EXPORT_SYMBOL_GPL(phylink_disconnect_phy); static void phylink_link_changed(struct phylink *pl, bool up, const char *what) { if (!up) - pl->mac_link_dropped = true; + pl->link_failed = true; phylink_run_resolve(pl); phylink_dbg(pl, "%s link %s\n", what, up ? "up" : "down"); } @@ -2792,7 +2794,7 @@ int phylink_ethtool_set_pauseparam(struct phylink *pl, * link will cycle. */ if (manual_changed) { - pl->mac_link_dropped = true; + pl->link_failed = true; phylink_run_resolve(pl); } diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c index d7ef46ccd9b8..3693423459c9 100644 --- a/drivers/pmdomain/arm/scmi_perf_domain.c +++ b/drivers/pmdomain/arm/scmi_perf_domain.c @@ -125,7 +125,8 @@ static int scmi_perf_domain_probe(struct scmi_device *sdev) scmi_pd->ph = ph; scmi_pd->genpd.name = scmi_pd->info->name; scmi_pd->genpd.flags = GENPD_FLAG_ALWAYS_ON | - GENPD_FLAG_OPP_TABLE_FW; + GENPD_FLAG_OPP_TABLE_FW | + GENPD_FLAG_DEV_NAME_FW; scmi_pd->genpd.set_performance_state = scmi_pd_set_perf_state; scmi_pd->genpd.attach_dev = scmi_pd_attach_dev; scmi_pd->genpd.detach_dev = scmi_pd_detach_dev; diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c index 5ede0f7eda09..29ad510e881c 100644 --- a/drivers/pmdomain/core.c +++ b/drivers/pmdomain/core.c @@ -7,6 +7,7 @@ #define pr_fmt(fmt) "PM: " fmt #include <linux/delay.h> +#include <linux/idr.h> #include <linux/kernel.h> #include <linux/io.h> #include <linux/platform_device.h> @@ -23,6 +24,9 @@ #include <linux/cpu.h> #include <linux/debugfs.h> +/* Provides a unique ID for each genpd device */ +static DEFINE_IDA(genpd_ida); + #define GENPD_RETRY_MAX_MS 250 /* Approximate */ #define GENPD_DEV_CALLBACK(genpd, type, callback, dev) \ @@ -171,6 +175,7 @@ static const struct genpd_lock_ops genpd_raw_spin_ops = { #define genpd_is_cpu_domain(genpd) (genpd->flags & GENPD_FLAG_CPU_DOMAIN) #define genpd_is_rpm_always_on(genpd) (genpd->flags & GENPD_FLAG_RPM_ALWAYS_ON) #define genpd_is_opp_table_fw(genpd) (genpd->flags & GENPD_FLAG_OPP_TABLE_FW) +#define genpd_is_dev_name_fw(genpd) (genpd->flags & GENPD_FLAG_DEV_NAME_FW) static inline bool irq_safe_dev_in_sleep_domain(struct device *dev, const struct generic_pm_domain *genpd) @@ -189,7 +194,7 @@ static inline bool irq_safe_dev_in_sleep_domain(struct device *dev, if (ret) dev_warn_once(dev, "PM domain %s will not be powered off\n", - genpd->name); + dev_name(&genpd->dev)); return ret; } @@ -274,7 +279,7 @@ static void genpd_debug_remove(struct generic_pm_domain *genpd) if (!genpd_debugfs_dir) return; - debugfs_lookup_and_remove(genpd->name, genpd_debugfs_dir); + debugfs_lookup_and_remove(dev_name(&genpd->dev), genpd_debugfs_dir); } static void genpd_update_accounting(struct generic_pm_domain *genpd) @@ -731,7 +736,7 @@ static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed) genpd->states[state_idx].power_on_latency_ns = elapsed_ns; genpd->gd->max_off_time_changed = true; pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n", - genpd->name, "on", elapsed_ns); + dev_name(&genpd->dev), "on", elapsed_ns); out: raw_notifier_call_chain(&genpd->power_notifiers, GENPD_NOTIFY_ON, NULL); @@ -782,7 +787,7 @@ static int _genpd_power_off(struct generic_pm_domain *genpd, bool timed) genpd->states[state_idx].power_off_latency_ns = elapsed_ns; genpd->gd->max_off_time_changed = true; pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n", - genpd->name, "off", elapsed_ns); + 
dev_name(&genpd->dev), "off", elapsed_ns); out: raw_notifier_call_chain(&genpd->power_notifiers, GENPD_NOTIFY_OFF, @@ -1940,7 +1945,7 @@ int dev_pm_genpd_add_notifier(struct device *dev, struct notifier_block *nb) if (ret) { dev_warn(dev, "failed to add notifier for PM domain %s\n", - genpd->name); + dev_name(&genpd->dev)); return ret; } @@ -1987,7 +1992,7 @@ int dev_pm_genpd_remove_notifier(struct device *dev) if (ret) { dev_warn(dev, "failed to remove notifier for PM domain %s\n", - genpd->name); + dev_name(&genpd->dev)); return ret; } @@ -2013,7 +2018,7 @@ static int genpd_add_subdomain(struct generic_pm_domain *genpd, */ if (!genpd_is_irq_safe(genpd) && genpd_is_irq_safe(subdomain)) { WARN(1, "Parent %s of subdomain %s must be IRQ safe\n", - genpd->name, subdomain->name); + dev_name(&genpd->dev), subdomain->name); return -EINVAL; } @@ -2088,7 +2093,7 @@ int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd, if (!list_empty(&subdomain->parent_links) || subdomain->device_count) { pr_warn("%s: unable to remove subdomain %s\n", - genpd->name, subdomain->name); + dev_name(&genpd->dev), subdomain->name); ret = -EBUSY; goto out; } @@ -2225,6 +2230,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->status = is_off ? GENPD_STATE_OFF : GENPD_STATE_ON; genpd->device_count = 0; genpd->provider = NULL; + genpd->device_id = -ENXIO; genpd->has_provider = false; genpd->accounting_time = ktime_get_mono_fast_ns(); genpd->domain.ops.runtime_suspend = genpd_runtime_suspend; @@ -2265,7 +2271,18 @@ int pm_genpd_init(struct generic_pm_domain *genpd, return ret; device_initialize(&genpd->dev); - dev_set_name(&genpd->dev, "%s", genpd->name); + + if (!genpd_is_dev_name_fw(genpd)) { + dev_set_name(&genpd->dev, "%s", genpd->name); + } else { + ret = ida_alloc(&genpd_ida, GFP_KERNEL); + if (ret < 0) { + put_device(&genpd->dev); + return ret; + } + genpd->device_id = ret; + dev_set_name(&genpd->dev, "%s_%u", genpd->name, genpd->device_id); + } mutex_lock(&gpd_list_lock); list_add(&genpd->gpd_list_node, &gpd_list); @@ -2287,13 +2304,13 @@ static int genpd_remove(struct generic_pm_domain *genpd) if (genpd->has_provider) { genpd_unlock(genpd); - pr_err("Provider present, unable to remove %s\n", genpd->name); + pr_err("Provider present, unable to remove %s\n", dev_name(&genpd->dev)); return -EBUSY; } if (!list_empty(&genpd->parent_links) || genpd->device_count) { genpd_unlock(genpd); - pr_err("%s: unable to remove %s\n", __func__, genpd->name); + pr_err("%s: unable to remove %s\n", __func__, dev_name(&genpd->dev)); return -EBUSY; } @@ -2307,9 +2324,11 @@ static int genpd_remove(struct generic_pm_domain *genpd) genpd_unlock(genpd); genpd_debug_remove(genpd); cancel_work_sync(&genpd->power_off_work); + if (genpd->device_id != -ENXIO) + ida_free(&genpd_ida, genpd->device_id); genpd_free_data(genpd); - pr_debug("%s: removed %s\n", __func__, genpd->name); + pr_debug("%s: removed %s\n", __func__, dev_name(&genpd->dev)); return 0; } @@ -3272,12 +3291,12 @@ static int genpd_summary_one(struct seq_file *s, else snprintf(state, sizeof(state), "%s", status_lookup[genpd->status]); - seq_printf(s, "%-30s %-30s %u", genpd->name, state, genpd->performance_state); + seq_printf(s, "%-30s %-30s %u", dev_name(&genpd->dev), state, genpd->performance_state); /* * Modifications on the list require holding locks on both * parent and child, so we are safe. - * Also genpd->name is immutable. + * Also the device name is immutable. 
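The genpd conversion above switches diagnostics to dev_name() because, with GENPD_FLAG_DEV_NAME_FW set, the device name is the genpd name plus an IDA-allocated suffix, so it stays unique even when firmware hands out duplicate names. A hedged provider-side sketch (the function and fw_name source are hypothetical; scmi_perf_domain above is the real user):

#include <linux/device.h>
#include <linux/pm_domain.h>
#include <linux/slab.h>

/* Hypothetical provider: fw_name may repeat across instances ("perf", say). */
static int example_add_domain(struct device *dev, const char *fw_name)
{
	struct generic_pm_domain *pd;

	pd = devm_kzalloc(dev, sizeof(*pd), GFP_KERNEL);
	if (!pd)
		return -ENOMEM;

	pd->name = fw_name;
	/* genpd itself names the device "perf_0", "perf_1", ... */
	pd->flags = GENPD_FLAG_ALWAYS_ON | GENPD_FLAG_DEV_NAME_FW;

	return pm_genpd_init(pd, NULL, false);
}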
*/ list_for_each_entry(link, &genpd->parent_links, parent_node) { if (list_is_first(&link->parent_node, &genpd->parent_links)) @@ -3502,7 +3521,7 @@ static void genpd_debug_add(struct generic_pm_domain *genpd) if (!genpd_debugfs_dir) return; - d = debugfs_create_dir(genpd->name, genpd_debugfs_dir); + d = debugfs_create_dir(dev_name(&genpd->dev), genpd_debugfs_dir); debugfs_create_file("current_state", 0444, d, genpd, &status_fops); diff --git a/drivers/pmdomain/imx/imx93-blk-ctrl.c b/drivers/pmdomain/imx/imx93-blk-ctrl.c index 904ffa55b8f4..b10348ac10f0 100644 --- a/drivers/pmdomain/imx/imx93-blk-ctrl.c +++ b/drivers/pmdomain/imx/imx93-blk-ctrl.c @@ -313,7 +313,9 @@ static void imx93_blk_ctrl_remove(struct platform_device *pdev) of_genpd_del_provider(pdev->dev.of_node); - for (i = 0; bc->onecell_data.num_domains; i++) { + pm_runtime_disable(&pdev->dev); + + for (i = 0; i < bc->onecell_data.num_domains; i++) { struct imx93_blk_ctrl_domain *domain = &bc->domains[i]; pm_genpd_remove(&domain->genpd); diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c index 472daa588a9d..d5507b63b6cd 100644 --- a/drivers/vdpa/ifcvf/ifcvf_base.c +++ b/drivers/vdpa/ifcvf/ifcvf_base.c @@ -108,7 +108,7 @@ int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *pdev) u32 i; ret = pci_read_config_byte(pdev, PCI_CAPABILITY_LIST, &pos); - if (ret < 0) { + if (ret) { IFCVF_ERR(pdev, "Failed to read PCI capability list\n"); return -EIO; } diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 2dd21e0b399e..7d0c83b5b071 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -373,7 +373,7 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr struct page *pg; unsigned int nsg; int sglen; - u64 pa; + u64 pa, offset; u64 paend; struct scatterlist *sg; struct device *dma = mvdev->vdev.dma_dev; @@ -396,8 +396,10 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr sg = mr->sg_head.sgl; for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1); map; map = vhost_iotlb_itree_next(map, mr->start, mr->end - 1)) { - paend = map->addr + maplen(map, mr); - for (pa = map->addr; pa < paend; pa += sglen) { + offset = mr->start > map->start ? mr->start - map->start : 0; + pa = map->addr + offset; + paend = map->addr + offset + maplen(map, mr); + for (; pa < paend; pa += sglen) { pg = pfn_to_page(__phys_to_pfn(pa)); if (!sg) { mlx5_vdpa_warn(mvdev, "sg null. 
start 0x%llx, end 0x%llx\n", diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index dee019977716..5f581e71e201 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -3963,28 +3963,28 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, mvdev->vdev.dma_dev = &mdev->pdev->dev; err = mlx5_vdpa_alloc_resources(&ndev->mvdev); if (err) - goto err_mpfs; + goto err_alloc; err = mlx5_vdpa_init_mr_resources(mvdev); if (err) - goto err_res; + goto err_alloc; if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) { err = mlx5_vdpa_create_dma_mr(mvdev); if (err) - goto err_mr_res; + goto err_alloc; } err = alloc_fixed_resources(ndev); if (err) - goto err_mr; + goto err_alloc; ndev->cvq_ent.mvdev = mvdev; INIT_WORK(&ndev->cvq_ent.work, mlx5_cvq_kick_handler); mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq"); if (!mvdev->wq) { err = -ENOMEM; - goto err_res2; + goto err_alloc; } mvdev->vdev.mdev = &mgtdev->mgtdev; @@ -4010,17 +4010,6 @@ err_setup_vq_res: _vdpa_unregister_device(&mvdev->vdev); err_reg: destroy_workqueue(mvdev->wq); -err_res2: - free_fixed_resources(ndev); -err_mr: - mlx5_vdpa_clean_mrs(mvdev); -err_mr_res: - mlx5_vdpa_destroy_mr_resources(mvdev); -err_res: - mlx5_vdpa_free_resources(&ndev->mvdev); -err_mpfs: - if (!is_zero_ether_addr(config->mac)) - mlx5_mpfs_del_mac(pfmdev, config->mac); err_alloc: put_device(&mvdev->vdev.dev); return err; diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c index 99428a04068d..c8b74980dbd1 100644 --- a/drivers/vdpa/solidrun/snet_main.c +++ b/drivers/vdpa/solidrun/snet_main.c @@ -555,7 +555,7 @@ static const struct vdpa_config_ops snet_config_ops = { static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) { - char name[50]; + char *name; int ret, i, mask = 0; /* We don't know which BAR will be used to communicate.. * We will map every bar with len > 0. @@ -573,7 +573,10 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) return -ENODEV; } - snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev)); + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "psnet[%s]-bars", pci_name(pdev)); + if (!name) + return -ENOMEM; + ret = pcim_iomap_regions(pdev, mask, name); if (ret) { SNET_ERR(pdev, "Failed to request and map PCI BARs\n"); @@ -590,10 +593,13 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet) { - char name[50]; + char *name; int ret; - snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev)); + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "snet[%s]-bars", pci_name(pdev)); + if (!name) + return -ENOMEM; + /* Request and map BAR */ ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name); if (ret) { diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index ac4ab22f7d8b..16380764275e 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -612,7 +612,11 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto mdev_err; } - mdev_id = kzalloc(sizeof(struct virtio_device_id), GFP_KERNEL); + /* + * id_table should be a null terminated array, so allocate one additional + * entry here, see vdpa_mgmtdev_get_classes(). 
+ */ + mdev_id = kcalloc(2, sizeof(struct virtio_device_id), GFP_KERNEL); if (!mdev_id) { err = -ENOMEM; goto mdev_id_err; @@ -632,8 +636,8 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto probe_err; } - mdev_id->device = mdev->id.device; - mdev_id->vendor = mdev->id.vendor; + mdev_id[0].device = mdev->id.device; + mdev_id[0].vendor = mdev->id.vendor; mgtdev->id_table = mdev_id; mgtdev->max_supported_vqs = vp_modern_get_num_queues(mdev); mgtdev->supported_features = vp_modern_get_features(mdev); diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index c44d8ba00c02..88074451dd61 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -24,6 +24,16 @@ MODULE_PARM_DESC(force_legacy, "Force legacy mode for transitional virtio 1 devices"); #endif +bool vp_is_avq(struct virtio_device *vdev, unsigned int index) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + + if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ)) + return false; + + return index == vp_dev->admin_vq.vq_index; +} + /* wait for pending irq handlers */ void vp_synchronize_vectors(struct virtio_device *vdev) { @@ -234,10 +244,9 @@ out_info: return vq; } -static void vp_del_vq(struct virtqueue *vq) +static void vp_del_vq(struct virtqueue *vq, struct virtio_pci_vq_info *info) { struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev); - struct virtio_pci_vq_info *info = vp_dev->vqs[vq->index]; unsigned long flags; /* @@ -258,13 +267,16 @@ static void vp_del_vq(struct virtqueue *vq) void vp_del_vqs(struct virtio_device *vdev) { struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct virtio_pci_vq_info *info; struct virtqueue *vq, *n; int i; list_for_each_entry_safe(vq, n, &vdev->vqs, list) { - if (vp_dev->per_vq_vectors) { - int v = vp_dev->vqs[vq->index]->msix_vector; + info = vp_is_avq(vdev, vq->index) ? 
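The vp_vdpa fix above works because id_table consumers scan until an all-zero sentinel entry, and kcalloc(2, ...) leaves entry [1] zeroed. A userspace model of the consumer-side walk (the scan mirrors what vdpa_mgmtdev_get_classes() does; the ID values are illustrative):

#include <stdio.h>
#include <stdlib.h>

struct virtio_device_id {
	unsigned int device;
	unsigned int vendor;
};

int main(void)
{
	/* Two entries: [0] holds the real ID, [1] stays zeroed as the sentinel. */
	struct virtio_device_id *table = calloc(2, sizeof(*table));

	if (!table)
		return 1;
	table[0].device = 1;		/* e.g. a net device class */
	table[0].vendor = 0x1af4;

	/* Walk until the sentinel, as the kernel's class scan does. */
	for (const struct virtio_device_id *id = table; id->device; id++)
		printf("device %u, vendor 0x%x\n", id->device, id->vendor);

	free(table);
	return 0;
}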
vp_dev->admin_vq.info : + vp_dev->vqs[vq->index]; + if (vp_dev->per_vq_vectors) { + int v = info->msix_vector; if (v != VIRTIO_MSI_NO_VECTOR && !vp_is_slow_path_vector(v)) { int irq = pci_irq_vector(vp_dev->pci_dev, v); @@ -273,7 +285,7 @@ void vp_del_vqs(struct virtio_device *vdev) free_irq(irq, vq); } } - vp_del_vq(vq); + vp_del_vq(vq, info); } vp_dev->per_vq_vectors = false; @@ -354,7 +366,7 @@ vp_find_one_vq_msix(struct virtio_device *vdev, int queue_idx, vring_interrupt, 0, vp_dev->msix_names[msix_vec], vq); if (err) { - vp_del_vq(vq); + vp_del_vq(vq, *p_info); return ERR_PTR(err); } diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index 1d9c49947f52..8beecf23ec85 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -178,6 +178,7 @@ struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev); #define VIRTIO_ADMIN_CMD_BITMAP 0 #endif +bool vp_is_avq(struct virtio_device *vdev, unsigned int index); void vp_modern_avq_done(struct virtqueue *vq); int vp_modern_admin_cmd_exec(struct virtio_device *vdev, struct virtio_admin_cmd *cmd); diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 9193c30d640a..4fbcbc7a9ae1 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -43,16 +43,6 @@ static int vp_avq_index(struct virtio_device *vdev, u16 *index, u16 *num) return 0; } -static bool vp_is_avq(struct virtio_device *vdev, unsigned int index) -{ - struct virtio_pci_device *vp_dev = to_vp_device(vdev); - - if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ)) - return false; - - return index == vp_dev->admin_vq.vq_index; -} - void vp_modern_avq_done(struct virtqueue *vq) { struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev); @@ -245,7 +235,7 @@ static void vp_modern_avq_cleanup(struct virtio_device *vdev) if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ)) return; - vq = vp_dev->vqs[vp_dev->admin_vq.vq_index]->vq; + vq = vp_dev->admin_vq.info->vq; if (!vq) return; diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c index 47455a85c909..654a58132a4d 100644 --- a/fs/bcachefs/backpointers.c +++ b/fs/bcachefs/backpointers.c @@ -52,6 +52,12 @@ int bch2_backpointer_validate(struct bch_fs *c, struct bkey_s_c k, enum bch_validate_flags flags) { struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k); + int ret = 0; + + bkey_fsck_err_on(bp.v->level > BTREE_MAX_DEPTH, + c, backpointer_level_bad, + "backpointer level bad: %u >= %u", + bp.v->level, BTREE_MAX_DEPTH); rcu_read_lock(); struct bch_dev *ca = bch2_dev_rcu_noerror(c, bp.k->p.inode); @@ -64,7 +70,6 @@ int bch2_backpointer_validate(struct bch_fs *c, struct bkey_s_c k, struct bpos bucket = bp_pos_to_bucket(ca, bp.k->p); struct bpos bp_pos = bucket_pos_to_bp_noerror(ca, bucket, bp.v->bucket_offset); rcu_read_unlock(); - int ret = 0; bkey_fsck_err_on((bp.v->bucket_offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT) >= ca->mi.bucket_size || !bpos_eq(bp.k->p, bp_pos), @@ -947,9 +952,13 @@ int bch2_check_extents_to_backpointers(struct bch_fs *c) static int check_one_backpointer(struct btree_trans *trans, struct bbpos start, struct bbpos end, - struct bkey_s_c_backpointer bp, + struct bkey_s_c bp_k, struct bkey_buf *last_flushed) { + if (bp_k.k->type != KEY_TYPE_backpointer) + return 0; + + struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(bp_k); struct bch_fs *c = trans->c; struct btree_iter iter; struct bbpos pos = bp_to_bbpos(*bp.v); @@ -1004,9 +1013,7 @@ static int 
bch2_check_backpointers_to_extents_pass(struct btree_trans *trans, POS_MIN, BTREE_ITER_prefetch, k, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({ progress_update_iter(trans, &progress, &iter, "backpointers_to_extents"); - check_one_backpointer(trans, start, end, - bkey_s_c_to_backpointer(k), - &last_flushed); + check_one_backpointer(trans, start, end, k, &last_flushed); })); bch2_bkey_buf_exit(&last_flushed, c); diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index 0ca3feeb42c8..81dcf9e512c0 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -182,7 +182,7 @@ static int set_node_max(struct bch_fs *c, struct btree *b, struct bpos new_max) bch2_btree_node_drop_keys_outside_node(b); mutex_lock(&c->btree_cache.lock); - bch2_btree_node_hash_remove(&c->btree_cache, b); + __bch2_btree_node_hash_remove(&c->btree_cache, b); bkey_copy(&b->key, &new->k_i); ret = __bch2_btree_node_hash_insert(&c->btree_cache, b); diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c index 6296a11ccb09..839d68802e42 100644 --- a/fs/bcachefs/btree_io.c +++ b/fs/bcachefs/btree_io.c @@ -733,11 +733,8 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca, c, ca, b, i, NULL, bset_past_end_of_btree_node, "bset past end of btree node (offset %u len %u but written %zu)", - offset, sectors, ptr_written ?: btree_sectors(c))) { + offset, sectors, ptr_written ?: btree_sectors(c))) i->u64s = 0; - ret = 0; - goto out; - } btree_err_on(offset && !i->u64s, -BCH_ERR_btree_node_read_err_fixable, @@ -829,7 +826,6 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca, BSET_BIG_ENDIAN(i), write, &bn->format); } -out: fsck_err: printbuf_exit(&buf2); printbuf_exit(&buf1); diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c index 22740b605f0a..d596ef93239f 100644 --- a/fs/bcachefs/btree_update_interior.c +++ b/fs/bcachefs/btree_update_interior.c @@ -2398,7 +2398,8 @@ static int __bch2_btree_node_update_key(struct btree_trans *trans, if (new_hash) { mutex_lock(&c->btree_cache.lock); bch2_btree_node_hash_remove(&c->btree_cache, new_hash); - bch2_btree_node_hash_remove(&c->btree_cache, b); + + __bch2_btree_node_hash_remove(&c->btree_cache, b); bkey_copy(&b->key, new_key); ret = __bch2_btree_node_hash_insert(&c->btree_cache, b); diff --git a/fs/bcachefs/btree_write_buffer.c b/fs/bcachefs/btree_write_buffer.c index 3f56b584f8ec..1639c60dffa0 100644 --- a/fs/bcachefs/btree_write_buffer.c +++ b/fs/bcachefs/btree_write_buffer.c @@ -277,6 +277,10 @@ static int bch2_btree_write_buffer_flush_locked(struct btree_trans *trans) bool accounting_replay_done = test_bit(BCH_FS_accounting_replay_done, &c->flags); int ret = 0; + ret = bch2_journal_error(&c->journal); + if (ret) + return ret; + bch2_trans_unlock(trans); bch2_trans_begin(trans); @@ -491,7 +495,8 @@ static int fetch_wb_keys_from_journal(struct bch_fs *c, u64 seq) return ret; } -static int btree_write_buffer_flush_seq(struct btree_trans *trans, u64 seq) +static int btree_write_buffer_flush_seq(struct btree_trans *trans, u64 seq, + bool *did_work) { struct bch_fs *c = trans->c; struct btree_write_buffer *wb = &c->btree_write_buffer; @@ -502,6 +507,8 @@ static int btree_write_buffer_flush_seq(struct btree_trans *trans, u64 seq) fetch_from_journal_err = fetch_wb_keys_from_journal(c, seq); + *did_work |= wb->inc.keys.nr || wb->flushing.keys.nr; + /* * On memory allocation failure, bch2_btree_write_buffer_flush_locked() * is not guaranteed to empty wb->inc: @@ -521,17 +528,34 @@ static int 
bch2_btree_write_buffer_journal_flush(struct journal *j, struct journal_entry_pin *_pin, u64 seq) { struct bch_fs *c = container_of(j, struct bch_fs, journal); + bool did_work = false; - return bch2_trans_run(c, btree_write_buffer_flush_seq(trans, seq)); + return bch2_trans_run(c, btree_write_buffer_flush_seq(trans, seq, &did_work)); } int bch2_btree_write_buffer_flush_sync(struct btree_trans *trans) { struct bch_fs *c = trans->c; + bool did_work = false; trace_and_count(c, write_buffer_flush_sync, trans, _RET_IP_); - return btree_write_buffer_flush_seq(trans, journal_cur_seq(&c->journal)); + return btree_write_buffer_flush_seq(trans, journal_cur_seq(&c->journal), &did_work); +} + +/* + * The write buffer requires flushing when going RO: keys in the journal for the + * write buffer don't have a journal pin yet + */ +bool bch2_btree_write_buffer_flush_going_ro(struct bch_fs *c) +{ + if (bch2_journal_error(&c->journal)) + return false; + + bool did_work = false; + bch2_trans_run(c, btree_write_buffer_flush_seq(trans, + journal_cur_seq(&c->journal), &did_work)); + return did_work; } int bch2_btree_write_buffer_flush_nocheck_rw(struct btree_trans *trans) diff --git a/fs/bcachefs/btree_write_buffer.h b/fs/bcachefs/btree_write_buffer.h index 725e79654216..d535cea28bde 100644 --- a/fs/bcachefs/btree_write_buffer.h +++ b/fs/bcachefs/btree_write_buffer.h @@ -21,6 +21,7 @@ static inline bool bch2_btree_write_buffer_must_wait(struct bch_fs *c) struct btree_trans; int bch2_btree_write_buffer_flush_sync(struct btree_trans *); +bool bch2_btree_write_buffer_flush_going_ro(struct bch_fs *); int bch2_btree_write_buffer_flush_nocheck_rw(struct btree_trans *); int bch2_btree_write_buffer_tryflush(struct btree_trans *); diff --git a/fs/bcachefs/extents.c b/fs/bcachefs/extents.c index c4e91d123849..37e3d69bec06 100644 --- a/fs/bcachefs/extents.c +++ b/fs/bcachefs/extents.c @@ -1364,7 +1364,7 @@ void bch2_ptr_swab(struct bkey_s k) for (entry = ptrs.start; entry < ptrs.end; entry = extent_entry_next(entry)) { - switch (extent_entry_type(entry)) { + switch (__extent_entry_type(entry)) { case BCH_EXTENT_ENTRY_ptr: break; case BCH_EXTENT_ENTRY_crc32: @@ -1384,6 +1384,9 @@ void bch2_ptr_swab(struct bkey_s k) break; case BCH_EXTENT_ENTRY_rebalance: break; + default: + /* Bad entry type: will be caught by validate() */ + return; } } } diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c index ccaafa90f4f4..fb35dd336331 100644 --- a/fs/bcachefs/journal_io.c +++ b/fs/bcachefs/journal_io.c @@ -708,6 +708,9 @@ static void journal_entry_dev_usage_to_text(struct printbuf *out, struct bch_fs container_of(entry, struct jset_entry_dev_usage, entry); unsigned i, nr_types = jset_entry_dev_usage_nr_types(u); + if (vstruct_bytes(entry) < sizeof(*u)) + return; + prt_printf(out, "dev=%u", le32_to_cpu(u->dev)); printbuf_indent_add(out, 2); diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c index 4bbeac9e0526..dff589ddc984 100644 --- a/fs/bcachefs/recovery_passes.c +++ b/fs/bcachefs/recovery_passes.c @@ -27,6 +27,12 @@ const char * const bch2_recovery_passes[] = { NULL }; +/* Fake recovery pass, so that scan_for_btree_nodes isn't 0: */ +static int bch2_recovery_pass_empty(struct bch_fs *c) +{ + return 0; +} + static int bch2_set_may_go_rw(struct bch_fs *c) { struct journal_keys *keys = &c->journal_keys; diff --git a/fs/bcachefs/recovery_passes_types.h b/fs/bcachefs/recovery_passes_types.h index 9d96c06e365c..94dc20ca2065 100644 --- a/fs/bcachefs/recovery_passes_types.h +++ 
b/fs/bcachefs/recovery_passes_types.h @@ -13,6 +13,7 @@ * must never change: */ #define BCH_RECOVERY_PASSES() \ + x(recovery_pass_empty, 41, PASS_SILENT) \ x(scan_for_btree_nodes, 37, 0) \ x(check_topology, 4, 0) \ x(accounting_read, 39, PASS_ALWAYS) \ diff --git a/fs/bcachefs/sb-errors_format.h b/fs/bcachefs/sb-errors_format.h index 937275d061fe..9feb6739f77a 100644 --- a/fs/bcachefs/sb-errors_format.h +++ b/fs/bcachefs/sb-errors_format.h @@ -136,7 +136,9 @@ enum bch_fsck_flags { x(bucket_gens_nonzero_for_invalid_buckets, 122, FSCK_AUTOFIX) \ x(need_discard_freespace_key_to_invalid_dev_bucket, 123, 0) \ x(need_discard_freespace_key_bad, 124, 0) \ + x(discarding_bucket_not_in_need_discard_btree, 291, 0) \ x(backpointer_bucket_offset_wrong, 125, 0) \ + x(backpointer_level_bad, 294, 0) \ x(backpointer_to_missing_device, 126, 0) \ x(backpointer_to_missing_alloc, 127, 0) \ x(backpointer_to_missing_ptr, 128, 0) \ @@ -177,7 +179,9 @@ enum bch_fsck_flags { x(ptr_stripe_redundant, 163, 0) \ x(reservation_key_nr_replicas_invalid, 164, 0) \ x(reflink_v_refcount_wrong, 165, 0) \ + x(reflink_v_pos_bad, 292, 0) \ x(reflink_p_to_missing_reflink_v, 166, 0) \ + x(reflink_refcount_underflow, 293, 0) \ x(stripe_pos_bad, 167, 0) \ x(stripe_val_size_bad, 168, 0) \ x(stripe_csum_granularity_bad, 290, 0) \ @@ -302,7 +306,7 @@ enum bch_fsck_flags { x(accounting_key_replicas_devs_unsorted, 280, FSCK_AUTOFIX) \ x(accounting_key_version_0, 282, FSCK_AUTOFIX) \ x(logged_op_but_clean, 283, FSCK_AUTOFIX) \ - x(MAX, 291, 0) + x(MAX, 295, 0) enum bch_sb_error_id { #define x(t, n, ...) BCH_FSCK_ERR_##t = n, diff --git a/fs/bcachefs/sb-members.c b/fs/bcachefs/sb-members.c index fb08dd680dac..116131f95815 100644 --- a/fs/bcachefs/sb-members.c +++ b/fs/bcachefs/sb-members.c @@ -163,7 +163,7 @@ static int validate_member(struct printbuf *err, return -BCH_ERR_invalid_sb_members; } - if (m.btree_bitmap_shift >= 64) { + if (m.btree_bitmap_shift >= BCH_MI_BTREE_BITMAP_SHIFT_MAX) { prt_printf(err, "device %u: invalid btree_bitmap_shift %u", i, m.btree_bitmap_shift); return -BCH_ERR_invalid_sb_members; } @@ -450,7 +450,7 @@ static void __bch2_dev_btree_bitmap_mark(struct bch_sb_field_members_v2 *mi, uns m->btree_bitmap_shift += resize; } - BUG_ON(m->btree_bitmap_shift > 57); + BUG_ON(m->btree_bitmap_shift >= BCH_MI_BTREE_BITMAP_SHIFT_MAX); BUG_ON(end > 64ULL << m->btree_bitmap_shift); for (unsigned bit = start >> m->btree_bitmap_shift; diff --git a/fs/bcachefs/sb-members_format.h b/fs/bcachefs/sb-members_format.h index d727d2dfda08..2adf1221a440 100644 --- a/fs/bcachefs/sb-members_format.h +++ b/fs/bcachefs/sb-members_format.h @@ -66,6 +66,12 @@ struct bch_member { }; /* + * btree_allocated_bitmap can represent sector addresses of a u64: it itself has + * 64 elements, so 64 - ilog2(64) + */ +#define BCH_MI_BTREE_BITMAP_SHIFT_MAX 58 + +/* * This limit comes from the bucket_gens array - it's a single allocation, and * kernel allocation are limited to INT_MAX */ diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index 657fd3759e7b..a6ed9a0bf1c7 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -272,6 +272,7 @@ static void __bch2_fs_read_only(struct bch_fs *c) clean_passes++; if (bch2_btree_interior_updates_flush(c) || + bch2_btree_write_buffer_flush_going_ro(c) || bch2_journal_flush_all_pins(&c->journal) || bch2_btree_flush_all_writes(c) || seq != atomic64_read(&c->journal.seq)) { diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 65d841d7142c..cab94d141f66 100644 --- a/fs/btrfs/delayed-ref.c +++ 
b/fs/btrfs/delayed-ref.c @@ -298,7 +298,7 @@ static int comp_refs(struct btrfs_delayed_ref_node *ref1, if (ref1->ref_root < ref2->ref_root) return -1; if (ref1->ref_root > ref2->ref_root) - return -1; + return 1; if (ref1->type == BTRFS_EXTENT_DATA_REF_KEY) ret = comp_data_refs(ref1, ref2); } diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c index 57b4af5ad646..501ad7be5174 100644 --- a/fs/nilfs2/btnode.c +++ b/fs/nilfs2/btnode.c @@ -68,7 +68,6 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr) goto failed; } memset(bh->b_data, 0, i_blocksize(inode)); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = blocknr; set_buffer_mapped(bh); set_buffer_uptodate(bh); @@ -133,7 +132,6 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr, goto found; } set_buffer_mapped(bh); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = pblocknr; /* set block address for read */ bh->b_end_io = end_buffer_read_sync; get_bh(bh); diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c index 1c9ae36a03ab..ace22253fed0 100644 --- a/fs/nilfs2/gcinode.c +++ b/fs/nilfs2/gcinode.c @@ -83,10 +83,8 @@ int nilfs_gccache_submit_read_data(struct inode *inode, sector_t blkoff, goto out; } - if (!buffer_mapped(bh)) { - bh->b_bdev = inode->i_sb->s_bdev; + if (!buffer_mapped(bh)) set_buffer_mapped(bh); - } bh->b_blocknr = pbn; bh->b_end_io = end_buffer_read_sync; get_bh(bh); diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c index ceb7dc0b5bad..2db6350b5ac2 100644 --- a/fs/nilfs2/mdt.c +++ b/fs/nilfs2/mdt.c @@ -89,7 +89,6 @@ static int nilfs_mdt_create_block(struct inode *inode, unsigned long block, if (buffer_uptodate(bh)) goto failed_bh; - bh->b_bdev = sb->s_bdev; err = nilfs_mdt_insert_new_block(inode, block, bh, init_block); if (likely(!err)) { get_bh(bh); diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 10def4b55995..6dd8b854cd1f 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -39,7 +39,6 @@ static struct buffer_head *__nilfs_get_folio_block(struct folio *folio, first_block = (unsigned long)index << (PAGE_SHIFT - blkbits); bh = get_nth_bh(bh, block - first_block); - touch_buffer(bh); wait_on_buffer(bh); return bh; } @@ -64,6 +63,7 @@ struct buffer_head *nilfs_grab_buffer(struct inode *inode, folio_put(folio); return NULL; } + bh->b_bdev = inode->i_sb->s_bdev; return bh; } diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 3d404624bb96..c79b4291777f 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -2319,6 +2319,7 @@ static int ocfs2_verify_volume(struct ocfs2_dinode *di, struct ocfs2_blockcheck_stats *stats) { int status = -EAGAIN; + u32 blksz_bits; if (memcmp(di->i_signature, OCFS2_SUPER_BLOCK_SIGNATURE, strlen(OCFS2_SUPER_BLOCK_SIGNATURE)) == 0) { @@ -2333,11 +2334,15 @@ static int ocfs2_verify_volume(struct ocfs2_dinode *di, goto out; } status = -EINVAL; - if ((1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits)) != blksz) { + /* Acceptable block sizes are 512 bytes, 1K, 2K and 4K. 
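The one-character btrfs fix above restores antisymmetry: a comparator that returns -1 for both orderings makes rb-tree insertion and lookup disagree about where a ref lives. The canonical three-way pattern, checked in isolation:

#include <assert.h>
#include <stdio.h>

/* Correct three-way compare: must return +1 when a > b, which is
 * exactly what the fix restores for ref_root.
 */
static int cmp_root(unsigned long long a, unsigned long long b)
{
	if (a < b)
		return -1;
	if (a > b)
		return 1;
	return 0;
}

int main(void)
{
	/* Antisymmetry: cmp(a, b) == -cmp(b, a); the pre-fix code broke this. */
	assert(cmp_root(3, 5) == -cmp_root(5, 3));
	assert(cmp_root(5, 5) == 0);
	printf("comparator is antisymmetric\n");
	return 0;
}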
*/ + blksz_bits = le32_to_cpu(di->id2.i_super.s_blocksize_bits); + if (blksz_bits < 9 || blksz_bits > 12) { mlog(ML_ERROR, "found superblock with incorrect block " - "size: found %u, should be %u\n", - 1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits), - blksz); + "size bits: found %u, should be 9, 10, 11, or 12\n", + blksz_bits); + } else if ((1 << blksz_bits) != blksz) { + mlog(ML_ERROR, "found superblock with incorrect block " + "size: found %u, should be %u\n", 1 << blksz_bits, blksz); } else if (le16_to_cpu(di->id2.i_super.s_major_rev_level) != OCFS2_MAJOR_REV_LEVEL || le16_to_cpu(di->id2.i_super.s_minor_rev_level) != diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 34d2da05f2f1..e1b41554a5fb 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1760,8 +1760,9 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg) struct mem_cgroup *mem_cgroup_from_slab_obj(void *p); -static inline void count_objcg_event(struct obj_cgroup *objcg, - enum vm_event_item idx) +static inline void count_objcg_events(struct obj_cgroup *objcg, + enum vm_event_item idx, + unsigned long count) { struct mem_cgroup *memcg; @@ -1770,7 +1771,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg, rcu_read_lock(); memcg = obj_cgroup_memcg(objcg); - count_memcg_events(memcg, idx, 1); + count_memcg_events(memcg, idx, count); rcu_read_unlock(); } @@ -1825,8 +1826,9 @@ static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p) return NULL; } -static inline void count_objcg_event(struct obj_cgroup *objcg, - enum vm_event_item idx) +static inline void count_objcg_events(struct obj_cgroup *objcg, + enum vm_event_item idx, + unsigned long count) { } diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index b637ec14025f..cf4b11be3709 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -92,6 +92,10 @@ struct dev_pm_domain_list { * GENPD_FLAG_OPP_TABLE_FW: The genpd provider supports performance states, * but its corresponding OPP tables are not * described in DT, but are given directly by FW. + * + * GENPD_FLAG_DEV_NAME_FW: Instructs genpd to generate a unique device name + * using an IDA. It is used by genpd providers which + * get their genpd names directly from FW.
*/ #define GENPD_FLAG_PM_CLK (1U << 0) #define GENPD_FLAG_IRQ_SAFE (1U << 1) @@ -101,6 +105,7 @@ struct dev_pm_domain_list { #define GENPD_FLAG_RPM_ALWAYS_ON (1U << 5) #define GENPD_FLAG_MIN_RESIDENCY (1U << 6) #define GENPD_FLAG_OPP_TABLE_FW (1U << 7) +#define GENPD_FLAG_DEV_NAME_FW (1U << 8) enum gpd_status { GENPD_STATE_ON = 0, /* PM domain is on */ @@ -163,6 +168,7 @@ struct generic_pm_domain { atomic_t sd_count; /* Number of subdomains with power "on" */ enum gpd_status status; /* Current state of the domain */ unsigned int device_count; /* Number of devices */ + unsigned int device_id; /* unique device id */ unsigned int suspended_count; /* System suspend device counter */ unsigned int prepared_count; /* Suspend counter of prepared devices */ unsigned int performance_state; /* Aggregated max performance state */ diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index fc5a206c4043..195debe2b1db 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -77,7 +77,9 @@ static inline int copy_safe_from_sockptr(void *dst, size_t ksize, { if (optlen < ksize) return -EINVAL; - return copy_from_sockptr(dst, optval, ksize); + if (copy_from_sockptr(dst, optval, ksize)) + return -EFAULT; + return 0; } static inline int copy_struct_from_sockptr(void *dst, size_t ksize, diff --git a/include/linux/tpm.h b/include/linux/tpm.h index 587b96b4418e..20a40ade8030 100644 --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -421,6 +421,7 @@ void tpm_buf_append_u32(struct tpm_buf *buf, const u32 value); u8 tpm_buf_read_u8(struct tpm_buf *buf, off_t *offset); u16 tpm_buf_read_u16(struct tpm_buf *buf, off_t *offset); u32 tpm_buf_read_u32(struct tpm_buf *buf, off_t *offset); +void tpm_buf_append_handle(struct tpm_chip *chip, struct tpm_buf *buf, u32 handle); /* * Check if TPM device is in the firmware upgrade mode. 
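Context for the sockptr.h hunk above: copy_from_sockptr() forwards copy_from_user()'s return value, which is the number of bytes left uncopied, not an errno. Before the fix, copy_safe_from_sockptr() could therefore hand a positive count back to setsockopt handlers that expect 0 or a negative error. A hedged sketch of a caller relying on the fixed contract; the option struct and handler are hypothetical, only copy_safe_from_sockptr() is the real API:

/* Hypothetical setsockopt handler; illustrative, not from the patch. */
struct my_opt {
	u32 flags;
	u32 timeout_ms;
};

static int my_setsockopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
{
	struct my_opt opt;
	int err;

	/* 0 on success, -EINVAL if optlen < sizeof(opt), and now -EFAULT on
	 * a faulting copy -- never a positive "bytes not copied" count. */
	err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen);
	if (err)
		return err;

	/* ... apply opt under the socket lock ... */
	return 0;
}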
@@ -505,6 +506,8 @@ void tpm_buf_append_name(struct tpm_chip *chip, struct tpm_buf *buf, void tpm_buf_append_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf, u8 attributes, u8 *passphrase, int passphraselen); +void tpm_buf_append_auth(struct tpm_chip *chip, struct tpm_buf *buf, + u8 attributes, u8 *passphrase, int passphraselen); static inline void tpm_buf_append_hmac_session_opt(struct tpm_chip *chip, struct tpm_buf *buf, u8 attributes, diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index aed952d04132..f70d0958095c 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_SWAP SWAP_RA, SWAP_RA_HIT, + SWPIN_ZERO, + SWPOUT_ZERO, #ifdef CONFIG_KSM KSM_SWPIN_COPY, #endif diff --git a/include/net/bond_options.h b/include/net/bond_options.h index 473a0147769e..18687ccf0638 100644 --- a/include/net/bond_options.h +++ b/include/net/bond_options.h @@ -161,5 +161,7 @@ void bond_option_arp_ip_targets_clear(struct bonding *bond); #if IS_ENABLED(CONFIG_IPV6) void bond_option_ns_ip6_targets_clear(struct bonding *bond); #endif +void bond_slave_ns_maddrs_add(struct bonding *bond, struct slave *slave); +void bond_slave_ns_maddrs_del(struct bonding *bond, struct slave *slave); #endif /* _NET_BOND_OPTIONS_H */ diff --git a/include/net/tls.h b/include/net/tls.h index 3a33924db2bc..61fef2880114 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -390,8 +390,12 @@ tls_offload_ctx_tx(const struct tls_context *tls_ctx) static inline bool tls_sw_has_ctx_tx(const struct sock *sk) { - struct tls_context *ctx = tls_get_ctx(sk); + struct tls_context *ctx; + + if (!sk_is_inet(sk) || !inet_test_bit(IS_ICSK, sk)) + return false; + ctx = tls_get_ctx(sk); if (!ctx) return false; return !!tls_sw_ctx_tx(ctx); @@ -399,8 +403,12 @@ static inline bool tls_sw_has_ctx_tx(const struct sock *sk) static inline bool tls_sw_has_ctx_rx(const struct sock *sk) { - struct tls_context *ctx = tls_get_ctx(sk); + struct tls_context *ctx; + + if (!sk_is_inet(sk) || !inet_test_bit(IS_ICSK, sk)) + return false; + ctx = tls_get_ctx(sk); if (!ctx) return false; return !!tls_sw_ctx_rx(ctx); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 719e0ed1e976..a1c353a62c56 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5920,12 +5920,15 @@ static void prev_balance(struct rq *rq, struct task_struct *prev, #ifdef CONFIG_SCHED_CLASS_EXT /* - * SCX requires a balance() call before every pick_next_task() including - * when waking up from SCHED_IDLE. If @start_class is below SCX, start - * from SCX instead. + * SCX requires a balance() call before every pick_task() including when + * waking up from SCHED_IDLE. If @start_class is below SCX, start from + * SCX instead. Also, set a flag to detect missing balance() call. 
*/ - if (scx_enabled() && sched_class_above(&ext_sched_class, start_class)) - start_class = &ext_sched_class; + if (scx_enabled()) { + rq->scx.flags |= SCX_RQ_BAL_PENDING; + if (sched_class_above(&ext_sched_class, start_class)) + start_class = &ext_sched_class; + } #endif /* diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index b5f4b1a5ae98..751d73d500e5 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2634,7 +2634,7 @@ static int balance_one(struct rq *rq, struct task_struct *prev) lockdep_assert_rq_held(rq); rq->scx.flags |= SCX_RQ_IN_BALANCE; - rq->scx.flags &= ~SCX_RQ_BAL_KEEP; + rq->scx.flags &= ~(SCX_RQ_BAL_PENDING | SCX_RQ_BAL_KEEP); if (static_branch_unlikely(&scx_ops_cpu_preempt) && unlikely(rq->scx.cpu_released)) { @@ -2645,7 +2645,7 @@ static int balance_one(struct rq *rq, struct task_struct *prev) * emitted in scx_next_task_picked(). */ if (SCX_HAS_OP(cpu_acquire)) - SCX_CALL_OP(0, cpu_acquire, cpu_of(rq), NULL); + SCX_CALL_OP(SCX_KF_REST, cpu_acquire, cpu_of(rq), NULL); rq->scx.cpu_released = false; } @@ -2948,12 +2948,11 @@ static struct task_struct *pick_task_scx(struct rq *rq) { struct task_struct *prev = rq->curr; struct task_struct *p; + bool prev_on_scx = prev->sched_class == &ext_sched_class; + bool keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP; + bool kick_idle = false; /* - * If balance_scx() is telling us to keep running @prev, replenish slice - * if necessary and keep running @prev. Otherwise, pop the first one - * from the local DSQ. - * * WORKAROUND: * * %SCX_RQ_BAL_KEEP should be set iff $prev is on SCX as it must just @@ -2962,22 +2961,41 @@ static struct task_struct *pick_task_scx(struct rq *rq) * which then ends up calling pick_task_scx() without preceding * balance_scx(). * - * For now, ignore cases where $prev is not on SCX. This isn't great and - * can theoretically lead to stalls. However, for switch_all cases, this - * happens only while a BPF scheduler is being loaded or unloaded, and, - * for partial cases, fair will likely keep triggering this CPU. + * Keep running @prev if possible and avoid stalling from entering idle + * without balancing. * - * Once fair is fixed, restore WARN_ON_ONCE(). + * Once fair is fixed, remove the workaround and trigger WARN_ON_ONCE() + * if pick_task_scx() is called without preceding balance_scx(). */ - if ((rq->scx.flags & SCX_RQ_BAL_KEEP) && - prev->sched_class == &ext_sched_class) { + if (unlikely(rq->scx.flags & SCX_RQ_BAL_PENDING)) { + if (prev_on_scx) { + keep_prev = true; + } else { + keep_prev = false; + kick_idle = true; + } + } else if (unlikely(keep_prev && !prev_on_scx)) { + /* only allowed during transitions */ + WARN_ON_ONCE(scx_ops_enable_state() == SCX_OPS_ENABLED); + keep_prev = false; + } + + /* + * If balance_scx() is telling us to keep running @prev, replenish slice + * if necessary and keep running @prev. Otherwise, pop the first one + * from the local DSQ. 
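The scheduler hunks above pair two flags: prev_balance() sets SCX_RQ_BAL_PENDING before the balance callbacks may run, balance_one() clears it, and pick_task_scx() can then detect that it was reached without a preceding balance_scx() and fall back safely. A reduced sketch of that handshake; the flag names mirror the patch, the surrounding types are illustrative:

#include <stdbool.h>

enum {
	RQ_BAL_PENDING = 1 << 2, /* balance hasn't run yet */
	RQ_BAL_KEEP    = 1 << 3, /* balance decided to keep current */
};

struct toy_rq { unsigned int flags; };

static void prepare_pick(struct toy_rq *rq)
{
	rq->flags |= RQ_BAL_PENDING; /* set before balance might be skipped */
}

static void balance(struct toy_rq *rq, bool keep_current)
{
	rq->flags &= ~(RQ_BAL_PENDING | RQ_BAL_KEEP);
	if (keep_current)
		rq->flags |= RQ_BAL_KEEP;
}

static bool balance_ran(const struct toy_rq *rq)
{
	return !(rq->flags & RQ_BAL_PENDING); /* still pending: it was skipped */
}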
+ */ + if (keep_prev) { p = prev; if (!p->scx.slice) p->scx.slice = SCX_SLICE_DFL; } else { p = first_local_task(rq); - if (!p) + if (!p) { + if (kick_idle) + scx_bpf_kick_cpu(cpu_of(rq), SCX_KICK_IDLE); return NULL; + } if (unlikely(!p->scx.slice)) { if (!scx_rq_bypassing(rq) && !scx_warned_zero_slice) { @@ -4979,7 +4997,7 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link) if (!cpumask_equal(housekeeping_cpumask(HK_TYPE_DOMAIN), cpu_possible_mask)) { - pr_err("sched_ext: Not compatible with \"isolcpus=\" domain isolation"); + pr_err("sched_ext: Not compatible with \"isolcpus=\" domain isolation\n"); return -EINVAL; } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 6c54a57275cc..c03b3d7b320e 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -751,8 +751,9 @@ enum scx_rq_flags { */ SCX_RQ_ONLINE = 1 << 0, SCX_RQ_CAN_STOP_TICK = 1 << 1, - SCX_RQ_BAL_KEEP = 1 << 2, /* balance decided to keep current */ - SCX_RQ_BYPASSING = 1 << 3, + SCX_RQ_BAL_PENDING = 1 << 2, /* balance hasn't run yet */ + SCX_RQ_BAL_KEEP = 1 << 3, /* balance decided to keep current */ + SCX_RQ_BYPASSING = 1 << 4, SCX_RQ_IN_WAKEUP = 1 << 16, SCX_RQ_IN_BALANCE = 1 << 17, @@ -2273,20 +2273,57 @@ struct page *get_dump_page(unsigned long addr) #endif /* CONFIG_ELF_CORE */ #ifdef CONFIG_MIGRATION + +/* + * An array of either pages or folios ("pofs"). Although it may seem tempting to + * avoid this complication, by simply interpreting a list of folios as a list of + * pages, that approach won't work in the longer term, because eventually the + * layouts of struct page and struct folio will become completely different. + * Furthermore, this pof approach avoids excessive page_folio() calls. + */ +struct pages_or_folios { + union { + struct page **pages; + struct folio **folios; + void **entries; + }; + bool has_folios; + long nr_entries; +}; + +static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i) +{ + if (pofs->has_folios) + return pofs->folios[i]; + return page_folio(pofs->pages[i]); +} + +static void pofs_clear_entry(struct pages_or_folios *pofs, long i) +{ + pofs->entries[i] = NULL; +} + +static void pofs_unpin(struct pages_or_folios *pofs) +{ + if (pofs->has_folios) + unpin_folios(pofs->folios, pofs->nr_entries); + else + unpin_user_pages(pofs->pages, pofs->nr_entries); +} + /* * Returns the number of collected folios. Return value is always >= 0. */ static unsigned long collect_longterm_unpinnable_folios( - struct list_head *movable_folio_list, - unsigned long nr_folios, - struct folio **folios) + struct list_head *movable_folio_list, + struct pages_or_folios *pofs) { unsigned long i, collected = 0; struct folio *prev_folio = NULL; bool drain_allow = true; - for (i = 0; i < nr_folios; i++) { - struct folio *folio = folios[i]; + for (i = 0; i < pofs->nr_entries; i++) { + struct folio *folio = pofs_get_folio(pofs, i); if (folio == prev_folio) continue; @@ -2327,16 +2364,15 @@ static unsigned long collect_longterm_unpinnable_folios( * Returns -EAGAIN if all folios were successfully migrated or -errno for * failure (or partial success). 
*/ -static int migrate_longterm_unpinnable_folios( - struct list_head *movable_folio_list, - unsigned long nr_folios, - struct folio **folios) +static int +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list, + struct pages_or_folios *pofs) { int ret; unsigned long i; - for (i = 0; i < nr_folios; i++) { - struct folio *folio = folios[i]; + for (i = 0; i < pofs->nr_entries; i++) { + struct folio *folio = pofs_get_folio(pofs, i); if (folio_is_device_coherent(folio)) { /* @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios( * convert the pin on the source folio to a normal * reference. */ - folios[i] = NULL; + pofs_clear_entry(pofs, i); folio_get(folio); gup_put_folio(folio, 1, FOLL_PIN); @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios( * calling folio_isolate_lru() which takes a reference so the * folio won't be freed if it's migrating. */ - unpin_folio(folios[i]); - folios[i] = NULL; + unpin_folio(folio); + pofs_clear_entry(pofs, i); } if (!list_empty(movable_folio_list)) { @@ -2387,12 +2423,26 @@ static int migrate_longterm_unpinnable_folios( return -EAGAIN; err: - unpin_folios(folios, nr_folios); + pofs_unpin(pofs); putback_movable_pages(movable_folio_list); return ret; } +static long +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs) +{ + LIST_HEAD(movable_folio_list); + unsigned long collected; + + collected = collect_longterm_unpinnable_folios(&movable_folio_list, + pofs); + if (!collected) + return 0; + + return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs); +} + /* * Check whether all folios are *allowed* to be pinned indefinitely (long term). * Rather confusingly, all folios in the range are required to be pinned via @@ -2417,16 +2467,13 @@ err: static long check_and_migrate_movable_folios(unsigned long nr_folios, struct folio **folios) { - unsigned long collected; - LIST_HEAD(movable_folio_list); + struct pages_or_folios pofs = { + .folios = folios, + .has_folios = true, + .nr_entries = nr_folios, + }; - collected = collect_longterm_unpinnable_folios(&movable_folio_list, - nr_folios, folios); - if (!collected) - return 0; - - return migrate_longterm_unpinnable_folios(&movable_folio_list, - nr_folios, folios); + return check_and_migrate_movable_pages_or_folios(&pofs); } /* @@ -2436,22 +2483,13 @@ static long check_and_migrate_movable_folios(unsigned long nr_folios, static long check_and_migrate_movable_pages(unsigned long nr_pages, struct page **pages) { - struct folio **folios; - long i, ret; + struct pages_or_folios pofs = { + .pages = pages, + .has_folios = false, + .nr_entries = nr_pages, + }; - folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL); - if (!folios) { - unpin_user_pages(pages, nr_pages); - return -ENOMEM; - } - - for (i = 0; i < nr_pages; i++) - folios[i] = page_folio(pages[i]); - - ret = check_and_migrate_movable_folios(nr_pages, folios); - - kfree(folios); - return ret; + return check_and_migrate_movable_pages_or_folios(&pofs); } #else static long check_and_migrate_movable_pages(unsigned long nr_pages, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 03fd4bc39ea1..5734d5d5060f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3790,7 +3790,9 @@ next: * in the case it was underused, then consider it used and * don't add it back to split_queue. 
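The pages_or_folios refactor above folds two nearly identical entry points into one walker keyed by a tagged union, instead of allocating a temporary folio array as the deleted kmalloc_array() path did. A self-contained toy of the idiom, with stand-in types where the kernel has struct page, struct folio, and page_folio():

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins: a "folio" embedding its head "page" at offset zero makes the
 * cast below valid for this toy; the kernel uses page_folio() instead. */
struct page { int id; };
struct folio { struct page head; };

struct pofs {
	union {
		struct page **pages;
		struct folio **folios;
	};
	bool has_folios;
	long nr;
};

static struct folio *pofs_get_folio(struct pofs *p, long i)
{
	return p->has_folios ? p->folios[i] : (struct folio *)p->pages[i];
}

static void walk(struct pofs *p)
{
	for (long i = 0; i < p->nr; i++)
		printf("folio %d\n", pofs_get_folio(p, i)->head.id);
}

int main(void)
{
	struct folio f = { .head = { .id = 7 } };
	struct folio *folios[] = { &f };
	struct page *pages[] = { &f.head };
	struct pofs by_folio = { .folios = folios, .has_folios = true, .nr = 1 };
	struct pofs by_page = { .pages = pages, .has_folios = false, .nr = 1 };

	walk(&by_folio); /* one walker body serves both representations */
	walk(&by_page);
	return 0;
}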
*/ - if (!did_split && !folio_test_partially_mapped(folio)) { + if (did_split) { + ; /* folio already removed from list */ + } else if (!folio_test_partially_mapped(folio)) { list_del_init(&folio->_deferred_list); removed++; } else { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 06df2af97415..53db98d2c4a1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -431,6 +431,10 @@ static const unsigned int memcg_vm_event_stat[] = { PGDEACTIVATE, PGLAZYFREE, PGLAZYFREED, +#ifdef CONFIG_SWAP + SWPIN_ZERO, + SWPOUT_ZERO, +#endif #ifdef CONFIG_ZSWAP ZSWPIN, ZSWPOUT, diff --git a/mm/nommu.c b/mm/nommu.c index e9b5f527ab5b..9cb6e99215e2 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -573,7 +573,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma) VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_start); vma_iter_config(&vmi, vma->vm_start, vma->vm_end); - if (vma_iter_prealloc(&vmi, vma)) { + if (vma_iter_prealloc(&vmi, NULL)) { pr_warn("Allocation of vma tree for process %d failed\n", current->pid); return -ENOMEM; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c6c7bb3ea71b..216fbbfbedcf 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1048,6 +1048,7 @@ __always_inline bool free_pages_prepare(struct page *page, bool skip_kasan_poison = should_skip_kasan_poison(page); bool init = want_init_on_free(); bool compound = PageCompound(page); + struct folio *folio = page_folio(page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -1057,6 +1058,20 @@ __always_inline bool free_pages_prepare(struct page *page, if (memcg_kmem_online() && PageMemcgKmem(page)) __memcg_kmem_uncharge_page(page, order); + /* + * In rare cases, when truncation or holepunching raced with + * munlock after VM_LOCKED was cleared, Mlocked may still be + * found set here. This does not indicate a problem, unless + * "unevictable_pgs_cleared" appears worryingly large. 
+ */ + if (unlikely(folio_test_mlocked(folio))) { + long nr_pages = folio_nr_pages(folio); + + __folio_clear_mlocked(folio); + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); + } + if (unlikely(PageHWPoison(page)) && !order) { /* Do not let hwpoison pages hit pcplists/buddy */ reset_page_owner(page, order); diff --git a/mm/page_io.c b/mm/page_io.c index 69536a2b3c13..01749b99fb54 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -204,7 +204,9 @@ static bool is_folio_zero_filled(struct folio *folio) static void swap_zeromap_folio_set(struct folio *folio) { + struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio); struct swap_info_struct *sis = swp_swap_info(folio->swap); + int nr_pages = folio_nr_pages(folio); swp_entry_t entry; unsigned int i; @@ -212,6 +214,12 @@ static void swap_zeromap_folio_set(struct folio *folio) entry = page_swap_entry(folio_page(folio, i)); set_bit(swp_offset(entry), sis->zeromap); } + + count_vm_events(SWPOUT_ZERO, nr_pages); + if (objcg) { + count_objcg_events(objcg, SWPOUT_ZERO, nr_pages); + obj_cgroup_put(objcg); + } } static void swap_zeromap_folio_clear(struct folio *folio) @@ -503,6 +511,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret) static bool swap_read_folio_zeromap(struct folio *folio) { int nr_pages = folio_nr_pages(folio); + struct obj_cgroup *objcg; bool is_zeromap; /* @@ -517,6 +526,13 @@ static bool swap_read_folio_zeromap(struct folio *folio) if (!is_zeromap) return false; + objcg = get_obj_cgroup_from_folio(folio); + count_vm_events(SWPIN_ZERO, nr_pages); + if (objcg) { + count_objcg_events(objcg, SWPIN_ZERO, nr_pages); + obj_cgroup_put(objcg); + } + folio_zero_range(folio, 0, folio_size(folio)); folio_mark_uptodate(folio); return true; diff --git a/mm/swap.c b/mm/swap.c index b8e3259ea2c4..59f30a981c6f 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp, lruvec_del_folio(*lruvecp, folio); __folio_clear_lru_flags(folio); } - - /* - * In rare cases, when truncation or holepunching raced with - * munlock after VM_LOCKED was cleared, Mlocked may still be - * found set here. This does not indicate a problem, unless - * "unevictable_pgs_cleared" appears worryingly large. 
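The page_io.c hunks above follow a recurring memcg pattern: bump the global vmstat counter, then charge the same event count to the folio's obj_cgroup while holding a reference to it. A hedged consolidation of that pattern; the wrapper function is illustrative, but each call inside mirrors the patch's usage:

/* Illustrative wrapper, assuming the count_objcg_events() signature
 * introduced by this series. */
static void account_zero_swap(struct folio *folio, enum vm_event_item idx)
{
	struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
	long nr_pages = folio_nr_pages(folio);

	count_vm_events(idx, nr_pages);	/* global counter (/proc/vmstat) */
	if (objcg) {
		/* per-memcg counter; count_objcg_events() resolves
		 * objcg -> memcg under rcu_read_lock() internally */
		count_objcg_events(objcg, idx, nr_pages);
		obj_cgroup_put(objcg);	/* drop the ref taken above */
	}
}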
- */ - if (unlikely(folio_test_mlocked(folio))) { - long nr_pages = folio_nr_pages(folio); - - __folio_clear_mlocked(folio); - zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); - count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); - } } /* diff --git a/mm/swapfile.c b/mm/swapfile.c index 46bd4b1a3c07..9c85bd46ab7f 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -929,7 +929,7 @@ static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset, si->highest_bit = 0; del_from_avail_list(si); - if (vm_swap_full()) + if (si->cluster_info && vm_swap_full()) schedule_work(&si->reclaim_work); } } diff --git a/mm/vmstat.c b/mm/vmstat.c index b5a4cea423e1..ac6a5aa34eab 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1415,6 +1415,8 @@ const char * const vmstat_text[] = { #ifdef CONFIG_SWAP "swap_ra", "swap_ra_hit", + "swpin_zero", + "swpout_zero", #ifdef CONFIG_KSM "ksm_swpin_copy", #endif diff --git a/mm/zswap.c b/mm/zswap.c index 162013952074..0030ce8fecfc 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1053,7 +1053,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, count_vm_event(ZSWPWB); if (entry->objcg) - count_objcg_event(entry->objcg, ZSWPWB); + count_objcg_events(entry->objcg, ZSWPWB, 1); zswap_entry_free(entry); @@ -1483,7 +1483,7 @@ bool zswap_store(struct folio *folio) if (objcg) { obj_cgroup_charge_zswap(objcg, entry->length); - count_objcg_event(objcg, ZSWPOUT); + count_objcg_events(objcg, ZSWPOUT, 1); } /* @@ -1577,7 +1577,7 @@ bool zswap_load(struct folio *folio) count_vm_event(ZSWPIN); if (entry->objcg) - count_objcg_event(entry->objcg, ZSWPIN); + count_objcg_events(entry->objcg, ZSWPIN, 1); if (swapcache) { zswap_entry_free(entry); diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 96d097b21d13..0ac354db8177 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3788,8 +3788,6 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb) hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, handle); - if (conn && hci_dev_test_flag(hdev, HCI_MGMT)) - mgmt_device_connected(hdev, conn, NULL, 0); hci_dev_unlock(hdev); if (conn) { diff --git a/net/core/filter.c b/net/core/filter.c index e31ee8be2de0..fb56567c551e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2249,7 +2249,7 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb, rcu_read_unlock(); return ret; } - rcu_read_unlock_bh(); + rcu_read_unlock(); if (dst) IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES); out_drop: diff --git a/net/core/sock.c b/net/core/sock.c index 039be95c40cf..da50df485090 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1052,32 +1052,34 @@ static int sock_reserve_memory(struct sock *sk, int bytes) #ifdef CONFIG_PAGE_POOL -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in - * 1 syscall. The limit exists to limit the amount of memory the kernel - * allocates to copy these tokens. +/* This is the number of tokens and frags that the user can SO_DEVMEM_DONTNEED + * in 1 syscall. The limit exists to limit the amount of memory the kernel + * allocates to copy these tokens, and to prevent looping over the frags for + * too long. 
*/ #define MAX_DONTNEED_TOKENS 128 +#define MAX_DONTNEED_FRAGS 1024 static noinline_for_stack int sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen) { unsigned int num_tokens, i, j, k, netmem_num = 0; struct dmabuf_token *tokens; + int ret = 0, num_frags = 0; netmem_ref netmems[16]; - int ret = 0; if (!sk_is_tcp(sk)) return -EBADF; - if (optlen % sizeof(struct dmabuf_token) || + if (optlen % sizeof(*tokens) || optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS) return -EINVAL; - tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL); + num_tokens = optlen / sizeof(*tokens); + tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL); if (!tokens) return -ENOMEM; - num_tokens = optlen / sizeof(struct dmabuf_token); if (copy_from_sockptr(tokens, optval, optlen)) { kvfree(tokens); return -EFAULT; @@ -1086,24 +1088,28 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen) xa_lock_bh(&sk->sk_user_frags); for (i = 0; i < num_tokens; i++) { for (j = 0; j < tokens[i].token_count; j++) { + if (++num_frags > MAX_DONTNEED_FRAGS) + goto frag_limit_reached; + netmem_ref netmem = (__force netmem_ref)__xa_erase( &sk->sk_user_frags, tokens[i].token_start + j); - if (netmem && - !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) { - netmems[netmem_num++] = netmem; - if (netmem_num == ARRAY_SIZE(netmems)) { - xa_unlock_bh(&sk->sk_user_frags); - for (k = 0; k < netmem_num; k++) - WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); - netmem_num = 0; - xa_lock_bh(&sk->sk_user_frags); - } - ret++; + if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + continue; + + netmems[netmem_num++] = netmem; + if (netmem_num == ARRAY_SIZE(netmems)) { + xa_unlock_bh(&sk->sk_user_frags); + for (k = 0; k < netmem_num; k++) + WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); + netmem_num = 0; + xa_lock_bh(&sk->sk_user_frags); } + ret++; } } +frag_limit_reached: xa_unlock_bh(&sk->sk_user_frags); for (k = 0; k < netmem_num; k++) WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index da5dba120bc9..d6649246188d 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -618,7 +618,7 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) by tcp. Feel free to propose better solution. --ANK (980728) */ - if (np->rxopt.all) + if (np->rxopt.all && sk->sk_state != DCCP_LISTEN) opt_skb = skb_clone_and_charge_r(skb, sk); if (sk->sk_state == DCCP_OPEN) { /* Fast path */ diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c index 271dc03fc6db..f0af12a2f70b 100644 --- a/net/ipv4/ipmr_base.c +++ b/net/ipv4/ipmr_base.c @@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb, if (filter->filter_set) flags |= NLM_F_DUMP_FILTERED; - list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) { + list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list, + lockdep_rtnl_is_held()) { if (e < s_e) goto next_entry; if (filter->dev && diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index d71ab4e1efe1..c9de5ef8f267 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1618,7 +1618,7 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) by tcp. Feel free to propose better solution. 
--ANK (980728) */ - if (np->rxopt.all) + if (np->rxopt.all && sk->sk_state != TCP_LISTEN) opt_skb = skb_clone_and_charge_r(skb, sk); if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */ @@ -1656,8 +1656,6 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) if (reason) goto reset; } - if (opt_skb) - __kfree_skb(opt_skb); return 0; } } else diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index db586a5b3866..45a2b5f05d38 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -524,7 +524,8 @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) { struct mptcp_pm_addr_entry *entry; - list_for_each_entry(entry, &pernet->local_addr_list, list) { + list_for_each_entry_rcu(entry, &pernet->local_addr_list, list, + lockdep_is_held(&pernet->lock)) { if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) return entry; } diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c index 56dfea9862b7..e35178f5205f 100644 --- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -308,14 +308,17 @@ int mptcp_pm_nl_remove_doit(struct sk_buff *skb, struct genl_info *info) lock_sock(sk); + spin_lock_bh(&msk->pm.lock); match = mptcp_userspace_pm_lookup_addr_by_id(msk, id_val); if (!match) { GENL_SET_ERR_MSG(info, "address with specified id not found"); + spin_unlock_bh(&msk->pm.lock); release_sock(sk); goto out; } list_move(&match->list, &free_list); + spin_unlock_bh(&msk->pm.lock); mptcp_pm_remove_addrs(msk, &free_list); @@ -560,6 +563,7 @@ int mptcp_userspace_pm_set_flags(struct sk_buff *skb, struct genl_info *info) struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN]; struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR]; struct net *net = sock_net(skb->sk); + struct mptcp_pm_addr_entry *entry; struct mptcp_sock *msk; int ret = -EINVAL; struct sock *sk; @@ -601,6 +605,17 @@ int mptcp_userspace_pm_set_flags(struct sk_buff *skb, struct genl_info *info) if (loc.flags & MPTCP_PM_ADDR_FLAG_BACKUP) bkup = 1; + spin_lock_bh(&msk->pm.lock); + list_for_each_entry(entry, &msk->pm.userspace_pm_local_addr_list, list) { + if (mptcp_addresses_equal(&entry->addr, &loc.addr, false)) { + if (bkup) + entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP; + else + entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; + } + } + spin_unlock_bh(&msk->pm.lock); + lock_sock(sk); ret = mptcp_pm_nl_mp_prio_send_ack(msk, &loc.addr, &rem.addr, bkup); release_sock(sk); diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index d263091659e0..48d480982b78 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2082,7 +2082,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) slow = lock_sock_fast(ssk); WRITE_ONCE(ssk->sk_rcvbuf, rcvbuf); WRITE_ONCE(tcp_sk(ssk)->window_clamp, window_clamp); - tcp_cleanup_rbuf(ssk, 1); + if (tcp_can_send_ack(ssk)) + tcp_cleanup_rbuf(ssk, 1); unlock_sock_fast(ssk, slow); } } @@ -2205,7 +2206,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, cmsg_flags = MPTCP_CMSG_INQ; while (copied < len) { - int bytes_read; + int err, bytes_read; bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied, flags, &tss, &cmsg_flags); if (unlikely(bytes_read < 0)) { @@ -2267,9 +2268,16 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } pr_debug("block timeout %ld\n", timeo); - sk_wait_data(sk, &timeo, NULL); + mptcp_rcv_space_adjust(msk, copied); + err = sk_wait_data(sk, &timeo, NULL); + if (err < 0) { + err = copied ? 
: err; + goto out_err; + } } + mptcp_rcv_space_adjust(msk, copied); + out_err: if (cmsg_flags && copied >= 0) { if (cmsg_flags & MPTCP_CMSG_TS) @@ -2285,8 +2293,6 @@ out_err: pr_debug("msk=%p rx queue empty=%d:%d copied=%d\n", msk, skb_queue_empty_lockless(&sk->sk_receive_queue), skb_queue_empty(&msk->receive_queue), copied); - if (!(flags & MSG_PEEK)) - mptcp_rcv_space_adjust(msk, copied); release_sock(sk); return copied; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 0a9287fadb47..f84aad420d44 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -393,15 +393,6 @@ static void netlink_skb_set_owner_r(struct sk_buff *skb, struct sock *sk) static void netlink_sock_destruct(struct sock *sk) { - struct netlink_sock *nlk = nlk_sk(sk); - - if (nlk->cb_running) { - if (nlk->cb.done) - nlk->cb.done(&nlk->cb); - module_put(nlk->cb.module); - kfree_skb(nlk->cb.skb); - } - skb_queue_purge(&sk->sk_receive_queue); if (!sock_flag(sk, SOCK_DEAD)) { @@ -414,14 +405,6 @@ static void netlink_sock_destruct(struct sock *sk) WARN_ON(nlk_sk(sk)->groups); } -static void netlink_sock_destruct_work(struct work_struct *work) -{ - struct netlink_sock *nlk = container_of(work, struct netlink_sock, - work); - - sk_free(&nlk->sk); -} - /* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on * SMP. Look, when several writers sleep and reader wakes them up, all but one * immediately hit write lock and grab all the cpus. Exclusive sleep solves @@ -731,12 +714,6 @@ static void deferred_put_nlk_sk(struct rcu_head *head) if (!refcount_dec_and_test(&sk->sk_refcnt)) return; - if (nlk->cb_running && nlk->cb.done) { - INIT_WORK(&nlk->work, netlink_sock_destruct_work); - schedule_work(&nlk->work); - return; - } - sk_free(sk); } @@ -788,6 +765,14 @@ static int netlink_release(struct socket *sock) NETLINK_URELEASE, &n); } + /* Terminate any outstanding dump */ + if (nlk->cb_running) { + if (nlk->cb.done) + nlk->cb.done(&nlk->cb); + module_put(nlk->cb.module); + kfree_skb(nlk->cb.skb); + } + module_put(nlk->module); if (netlink_is_kernel(sk)) { diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 5b0e4e62ab8b..778a3809361f 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -4,7 +4,6 @@ #include <linux/rhashtable.h> #include <linux/atomic.h> -#include <linux/workqueue.h> #include <net/sock.h> /* flags */ @@ -50,7 +49,6 @@ struct netlink_sock { struct rhash_head node; struct rcu_head rcu; - struct work_struct work; }; static inline struct netlink_sock *nlk_sk(struct sock *sk) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 9412d88a99bc..d3a03c57545b 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -92,6 +92,16 @@ struct tc_u_common { long knodes; }; +static u32 handle2id(u32 h) +{ + return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h); +} + +static u32 id2handle(u32 id) +{ + return (id | 0x800U) << 20; +} + static inline unsigned int u32_hash_fold(__be32 key, const struct tc_u32_sel *sel, u8 fshift) @@ -310,7 +320,7 @@ static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr) int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL); if (id < 0) return 0; - return (id | 0x800U) << 20; + return id2handle(id); } static struct hlist_head *tc_u_common_hash; @@ -360,7 +370,7 @@ static int u32_init(struct tcf_proto *tp) return -ENOBUFS; refcount_set(&root_ht->refcnt, 1); - root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000; + root_ht->handle = tp_c ? 
gen_new_htid(tp_c, root_ht) : id2handle(0); root_ht->prio = tp->prio; root_ht->is_root = true; idr_init(&root_ht->handle_idr); @@ -612,7 +622,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, if (phn == ht) { u32_clear_hw_hnode(tp, ht, extack); idr_destroy(&ht->handle_idr); - idr_remove(&tp_c->handle_idr, ht->handle); + idr_remove(&tp_c->handle_idr, handle2id(ht->handle)); RCU_INIT_POINTER(*hn, ht->next); kfree_rcu(ht, rcu); return 0; @@ -989,7 +999,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, err = u32_replace_hw_hnode(tp, ht, userflags, extack); if (err) { - idr_remove(&tp_c->handle_idr, handle); + idr_remove(&tp_c->handle_idr, handle2id(handle)); kfree(ht); return err; } diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index f7b809c0d142..38e2fbdcbeac 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -683,7 +683,7 @@ static int sctp_v6_available(union sctp_addr *addr, struct sctp_sock *sp) struct sock *sk = &sp->inet.sk; struct net *net = sock_net(sk); struct net_device *dev = NULL; - int type; + int type, res, bound_dev_if; type = ipv6_addr_type(in6); if (IPV6_ADDR_ANY == type) @@ -697,14 +697,21 @@ static int sctp_v6_available(union sctp_addr *addr, struct sctp_sock *sp) if (!(type & IPV6_ADDR_UNICAST)) return 0; - if (sk->sk_bound_dev_if) { - dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); + rcu_read_lock(); + bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); + if (bound_dev_if) { + res = 0; + dev = dev_get_by_index_rcu(net, bound_dev_if); if (!dev) - return 0; + goto out; } - return ipv6_can_nonlocal_bind(net, &sp->inet) || - ipv6_chk_addr(net, in6, dev, 0); + res = ipv6_can_nonlocal_bind(net, &sp->inet) || + ipv6_chk_addr(net, in6, dev, 0); + +out: + rcu_read_unlock(); + return res; } /* This function checks if the address is a valid address to be used for diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 35681adedd9a..dfd29160fe11 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -836,6 +836,9 @@ static void vsock_sk_destruct(struct sock *sk) { struct vsock_sock *vsk = vsock_sk(sk); + /* Flush MSG_ZEROCOPY leftovers. */ + __skb_queue_purge(&sk->sk_error_queue); + vsock_deassign_transport(vsk); /* When clearing these addresses, there's no need to set the family and diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index ccbd2bc0d210..9acc13ab3f82 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -400,6 +400,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, if (virtio_transport_init_zcopy_skb(vsk, skb, info->msg, can_zcopy)) { + kfree_skb(skb); ret = -ENOMEM; break; } @@ -1109,6 +1110,7 @@ void virtio_transport_destruct(struct vsock_sock *vsk) struct virtio_vsock_sock *vvs = vsk->trans; kfree(vvs); + vsk->trans = NULL; } EXPORT_SYMBOL_GPL(virtio_transport_destruct); @@ -1512,6 +1514,14 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, return -ENOMEM; } + /* __vsock_release() might have already flushed accept_queue. + * Subsequent enqueues would lead to a memory leak. 
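Returning to the cls_u32 hunk above: the new id2handle()/handle2id() pair makes the IDR-id-to-handle packing explicit; the bug was calling idr_remove() with the packed handle instead of the unpacked id, leaking ids. A standalone userspace check of the round trip, using the same bit layout as the patch:

#include <assert.h>
#include <stdint.h>

static uint32_t id2handle(uint32_t id)
{
	return (id | 0x800U) << 20; /* ids 0..0x7FF land in bits 20..30, bit 31 set */
}

static uint32_t handle2id(uint32_t h)
{
	return (h & 0x80000000) ? ((h >> 20) & 0x7FF) : h;
}

int main(void)
{
	/* idr_alloc_cyclic() in the patch hands out ids below 0x7FF */
	for (uint32_t id = 0; id < 0x800; id++)
		assert(handle2id(id2handle(id)) == id);
	return 0;
}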
+ */ + if (sk->sk_shutdown == SHUTDOWN_MASK) { + virtio_transport_reset_no_sock(t, skb); + return -ESHUTDOWN; + } + child = vsock_create_connected(sk); if (!child) { virtio_transport_reset_no_sock(t, skb); diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c index f847e832ba14..57565dfd74a2 100644 --- a/samples/landlock/sandboxer.c +++ b/samples/landlock/sandboxer.c @@ -60,6 +60,25 @@ static inline int landlock_restrict_self(const int ruleset_fd, #define ENV_SCOPED_NAME "LL_SCOPED" #define ENV_DELIMITER ":" +static int str2num(const char *numstr, __u64 *num_dst) +{ + char *endptr = NULL; + int err = 0; + __u64 num; + + errno = 0; + num = strtoull(numstr, &endptr, 10); + if (errno != 0) + err = errno; + /* Was the string empty, or not entirely parsed successfully? */ + else if ((*numstr == '\0') || (*endptr != '\0')) + err = EINVAL; + else + *num_dst = num; + + return err; +} + static int parse_path(char *env_path, const char ***const path_list) { int i, num_paths = 0; @@ -160,7 +179,6 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd, char *env_port_name, *env_port_name_next, *strport; struct landlock_net_port_attr net_port = { .allowed_access = allowed_access, - .port = 0, }; env_port_name = getenv(env_var); @@ -171,7 +189,17 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd, env_port_name_next = env_port_name; while ((strport = strsep(&env_port_name_next, ENV_DELIMITER))) { - net_port.port = atoi(strport); + __u64 port; + + if (strcmp(strport, "") == 0) + continue; + + if (str2num(strport, &port)) { + fprintf(stderr, "Failed to parse port at \"%s\"\n", + strport); + goto out_free_name; + } + net_port.port = port; if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, &net_port, 0)) { fprintf(stderr, @@ -262,6 +290,44 @@ out_unset: #define LANDLOCK_ABI_LAST 6 +#define XSTR(s) #s +#define STR(s) XSTR(s) + +/* clang-format off */ + +static const char help[] = + "usage: " ENV_FS_RO_NAME "=\"...\" " ENV_FS_RW_NAME "=\"...\" " + "[other environment variables] %1$s <cmd> [args]...\n" + "\n" + "Execute the given command in a restricted environment.\n" + "Multi-valued settings (lists of ports, paths, scopes) are colon-delimited.\n" + "\n" + "Mandatory settings:\n" + "* " ENV_FS_RO_NAME ": paths allowed to be used in a read-only way\n" + "* " ENV_FS_RW_NAME ": paths allowed to be used in a read-write way\n" + "\n" + "Optional settings (when not set, their associated access check " + "is always allowed, which is different from an empty string which " + "means an empty list):\n" + "* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n" + "* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n" + "* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n" + " - \"a\" to restrict opening abstract unix sockets\n" + " - \"s\" to restrict sending signals\n" + "\n" + "Example:\n" + ENV_FS_RO_NAME "=\"${PATH}:/lib:/usr:/proc:/etc:/dev/urandom\" " + ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" " + ENV_TCP_BIND_NAME "=\"9418\" " + ENV_TCP_CONNECT_NAME "=\"80:443\" " + ENV_SCOPED_NAME "=\"a:s\" " + "%1$s bash -i\n" + "\n" + "This sandboxer can use Landlock features up to ABI version " + STR(LANDLOCK_ABI_LAST) ".\n"; + +/* clang-format on */ + int main(const int argc, char *const argv[], char *const *const envp) { const char *cmd_path; @@ -280,47 +346,7 @@ int main(const int argc, char *const argv[], char *const *const envp) }; if (argc < 2) { - fprintf(stderr, - "usage: 
%s=\"...\" %s=\"...\" %s=\"...\" %s=\"...\" %s=\"...\" %s " - "<cmd> [args]...\n\n", - ENV_FS_RO_NAME, ENV_FS_RW_NAME, ENV_TCP_BIND_NAME, - ENV_TCP_CONNECT_NAME, ENV_SCOPED_NAME, argv[0]); - fprintf(stderr, - "Execute a command in a restricted environment.\n\n"); - fprintf(stderr, - "Environment variables containing paths and ports " - "each separated by a colon:\n"); - fprintf(stderr, - "* %s: list of paths allowed to be used in a read-only way.\n", - ENV_FS_RO_NAME); - fprintf(stderr, - "* %s: list of paths allowed to be used in a read-write way.\n\n", - ENV_FS_RW_NAME); - fprintf(stderr, - "Environment variables containing ports are optional " - "and could be skipped.\n"); - fprintf(stderr, - "* %s: list of ports allowed to bind (server).\n", - ENV_TCP_BIND_NAME); - fprintf(stderr, - "* %s: list of ports allowed to connect (client).\n", - ENV_TCP_CONNECT_NAME); - fprintf(stderr, "* %s: list of scoped IPCs.\n", - ENV_SCOPED_NAME); - fprintf(stderr, - "\nexample:\n" - "%s=\"${PATH}:/lib:/usr:/proc:/etc:/dev/urandom\" " - "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" " - "%s=\"9418\" " - "%s=\"80:443\" " - "%s=\"a:s\" " - "%s bash -i\n\n", - ENV_FS_RO_NAME, ENV_FS_RW_NAME, ENV_TCP_BIND_NAME, - ENV_TCP_CONNECT_NAME, ENV_SCOPED_NAME, argv[0]); - fprintf(stderr, - "This sandboxer can use Landlock features " - "up to ABI version %d.\n", - LANDLOCK_ABI_LAST); + fprintf(stderr, help, argv[0]); return 1; } diff --git a/samples/pktgen/pktgen_sample01_simple.sh b/samples/pktgen/pktgen_sample01_simple.sh index cdb9f497f87d..66cb707479e6 100755 --- a/samples/pktgen/pktgen_sample01_simple.sh +++ b/samples/pktgen/pktgen_sample01_simple.sh @@ -76,7 +76,7 @@ if [ -n "$DST_PORT" ]; then pg_set $DEV "udp_dst_max $UDP_DST_MAX" fi -[ ! -z "$UDP_CSUM" ] && pg_set $dev "flag UDPCSUM" +[ ! -z "$UDP_CSUM" ] && pg_set $DEV "flag UDPCSUM" # Setup random UDP port src range pg_set $DEV "flag UDPSRC_RND" diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c index 6924ed508ebd..377e57e9084f 100644 --- a/security/integrity/evm/evm_main.c +++ b/security/integrity/evm/evm_main.c @@ -1084,7 +1084,8 @@ static void evm_file_release(struct file *file) if (!S_ISREG(inode->i_mode) || !(mode & FMODE_WRITE)) return; - if (iint && atomic_read(&inode->i_writecount) == 1) + if (iint && iint->flags & EVM_NEW_FILE && + atomic_read(&inode->i_writecount) == 1) iint->flags &= ~EVM_NEW_FILE; } diff --git a/security/integrity/ima/ima_template_lib.c b/security/integrity/ima/ima_template_lib.c index 4183956c53af..0e627eac9c33 100644 --- a/security/integrity/ima/ima_template_lib.c +++ b/security/integrity/ima/ima_template_lib.c @@ -318,15 +318,21 @@ static int ima_eventdigest_init_common(const u8 *digest, u32 digestsize, hash_algo_name[hash_algo]); } - if (digest) + if (digest) { memcpy(buffer + offset, digest, digestsize); - else + } else { /* * If digest is NULL, the event being recorded is a violation. * Make room for the digest by increasing the offset by the - * hash algorithm digest size. + * hash algorithm digest size. 
If the hash algorithm is not + specified, increase the offset by IMA_DIGEST_SIZE, which + fits SHA1 or MD5. */ - offset += hash_digest_size[hash_algo]; + if (hash_algo < HASH_ALGO__LAST) + offset += hash_digest_size[hash_algo]; + else + offset += IMA_DIGEST_SIZE; + } return ima_write_template_field_data(buffer, offset + digestsize, fmt, field_data); diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h index 660f76cb69d3..c2c2da691123 100644 --- a/security/integrity/integrity.h +++ b/security/integrity/integrity.h @@ -37,6 +37,8 @@ struct evm_ima_xattr_data { ); u8 data[]; } __packed; +static_assert(offsetof(struct evm_ima_xattr_data, data) == sizeof(struct evm_ima_xattr_data_hdr), + "struct member likely outside of __struct_group()"); /* Only used in the EVM HMAC code. */ struct evm_xattr { @@ -65,6 +67,8 @@ struct ima_digest_data { ); u8 digest[]; } __packed; +static_assert(offsetof(struct ima_digest_data, digest) == sizeof(struct ima_digest_data_hdr), + "struct member likely outside of __struct_group()"); /* * Instead of wrapping the ima_digest_data struct inside a local structure diff --git a/security/landlock/fs.c b/security/landlock/fs.c index 7d79fc8abe21..e31b97a9f175 100644 --- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -389,37 +389,21 @@ static bool is_nouser_or_private(const struct dentry *dentry) } static access_mask_t -get_raw_handled_fs_accesses(const struct landlock_ruleset *const domain) -{ - access_mask_t access_dom = 0; - size_t layer_level; - - for (layer_level = 0; layer_level < domain->num_layers; layer_level++) - access_dom |= - landlock_get_raw_fs_access_mask(domain, layer_level); - return access_dom; -} - -static access_mask_t get_handled_fs_accesses(const struct landlock_ruleset *const domain) { /* Handles all initially denied by default access rights.
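The integrity.h assertions above pin down a layout contract: with __struct_group(), the flexible array member must begin exactly where the standalone header struct ends, or code that copies sizeof(hdr) bytes breaks silently. A reduced sketch of the guard with illustrative types (the kernel's mirrored header is generated by __struct_group()):

#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

struct xattr_hdr {
	unsigned char type;
	unsigned char version;
} __attribute__((packed));

struct xattr_data {
	struct {			/* must mirror struct xattr_hdr */
		unsigned char type;
		unsigned char version;
	} __attribute__((packed));
	unsigned char data[];		/* payload starts right after the header */
} __attribute__((packed));

/* Fails to compile if a field is ever added outside the mirrored header. */
static_assert(offsetof(struct xattr_data, data) == sizeof(struct xattr_hdr),
	      "struct member likely outside of the shared header");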
*/ - return get_raw_handled_fs_accesses(domain) | + return landlock_union_access_masks(domain).fs | LANDLOCK_ACCESS_FS_INITIALLY_DENIED; } -static const struct landlock_ruleset * -get_fs_domain(const struct landlock_ruleset *const domain) -{ - if (!domain || !get_raw_handled_fs_accesses(domain)) - return NULL; - - return domain; -} +static const struct access_masks any_fs = { + .fs = ~0, +}; static const struct landlock_ruleset *get_current_fs_domain(void) { - return get_fs_domain(landlock_get_current_domain()); + return landlock_get_applicable_domain(landlock_get_current_domain(), + any_fs); } /* @@ -1517,7 +1501,8 @@ static int hook_file_open(struct file *const file) access_mask_t open_access_request, full_access_request, allowed_access, optional_access; const struct landlock_ruleset *const dom = - get_fs_domain(landlock_cred(file->f_cred)->domain); + landlock_get_applicable_domain( + landlock_cred(file->f_cred)->domain, any_fs); if (!dom) return 0; diff --git a/security/landlock/net.c b/security/landlock/net.c index c8bcd29bde09..d5dcc4407a19 100644 --- a/security/landlock/net.c +++ b/security/landlock/net.c @@ -39,27 +39,9 @@ int landlock_append_net_rule(struct landlock_ruleset *const ruleset, return err; } -static access_mask_t -get_raw_handled_net_accesses(const struct landlock_ruleset *const domain) -{ - access_mask_t access_dom = 0; - size_t layer_level; - - for (layer_level = 0; layer_level < domain->num_layers; layer_level++) - access_dom |= landlock_get_net_access_mask(domain, layer_level); - return access_dom; -} - -static const struct landlock_ruleset *get_current_net_domain(void) -{ - const struct landlock_ruleset *const dom = - landlock_get_current_domain(); - - if (!dom || !get_raw_handled_net_accesses(dom)) - return NULL; - - return dom; -} +static const struct access_masks any_net = { + .net = ~0, +}; static int current_check_access_socket(struct socket *const sock, struct sockaddr *const address, @@ -72,7 +54,9 @@ static int current_check_access_socket(struct socket *const sock, struct landlock_id id = { .type = LANDLOCK_KEY_NET_PORT, }; - const struct landlock_ruleset *const dom = get_current_net_domain(); + const struct landlock_ruleset *const dom = + landlock_get_applicable_domain(landlock_get_current_domain(), + any_net); if (!dom) return 0; diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h index 61bdbc550172..631e24d4ffe9 100644 --- a/security/landlock/ruleset.h +++ b/security/landlock/ruleset.h @@ -11,6 +11,7 @@ #include <linux/bitops.h> #include <linux/build_bug.h> +#include <linux/kernel.h> #include <linux/mutex.h> #include <linux/rbtree.h> #include <linux/refcount.h> @@ -47,6 +48,15 @@ struct access_masks { access_mask_t scope : LANDLOCK_NUM_SCOPE; }; +union access_masks_all { + struct access_masks masks; + u32 all; +}; + +/* Makes sure all fields are covered. */ +static_assert(sizeof(typeof_member(union access_masks_all, masks)) == + sizeof(typeof_member(union access_masks_all, all))); + typedef u16 layer_mask_t; /* Makes sure all layers can be checked. */ static_assert(BITS_PER_TYPE(layer_mask_t) >= LANDLOCK_MAX_NUM_LAYERS); @@ -260,6 +270,61 @@ static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset) refcount_inc(&ruleset->usage); } +/** + * landlock_union_access_masks - Return all access rights handled in the + * domain + * + * @domain: Landlock ruleset (used as a domain) + * + * Returns: an access_masks result of the OR of all the domain's access masks. 
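The access_masks_all union above is what lets the helpers that follow treat three bitfield families as one word: ORing or testing the u32 view touches every field at once, and the static_assert keeps the two views the same size. A self-contained sketch with illustrative field widths (Landlock sizes its fields from LANDLOCK_NUM_* limits):

#include <stddef.h>
#include <stdint.h>

struct access_masks {
	uint32_t fs : 16;
	uint32_t net : 2;
	uint32_t scope : 2;
};

union access_masks_all {
	struct access_masks masks;
	uint32_t all;
};

static struct access_masks
union_access_masks(const struct access_masks *layers, size_t nr_layers)
{
	union access_masks_all merged = { .all = 0 };

	for (size_t i = 0; i < nr_layers; i++) {
		union access_masks_all layer = { .masks = layers[i] };

		merged.all |= layer.all; /* one OR covers fs, net and scope */
	}
	return merged.masks;
}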
+ */ +static inline struct access_masks +landlock_union_access_masks(const struct landlock_ruleset *const domain) +{ + union access_masks_all matches = {}; + size_t layer_level; + + for (layer_level = 0; layer_level < domain->num_layers; layer_level++) { + union access_masks_all layer = { + .masks = domain->access_masks[layer_level], + }; + + matches.all |= layer.all; + } + + return matches.masks; +} + +/** + * landlock_get_applicable_domain - Return @domain if it applies to (handles) + * at least one of the access rights specified + * in @masks + * + * @domain: Landlock ruleset (used as a domain) + * @masks: access masks + * + * Returns: @domain if any access rights specified in @masks is handled, or + * NULL otherwise. + */ +static inline const struct landlock_ruleset * +landlock_get_applicable_domain(const struct landlock_ruleset *const domain, + const struct access_masks masks) +{ + const union access_masks_all masks_all = { + .masks = masks, + }; + union access_masks_all merge = {}; + + if (!domain) + return NULL; + + merge.masks = landlock_union_access_masks(domain); + if (merge.all & masks_all.all) + return domain; + + return NULL; +} + static inline void landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset, const access_mask_t fs_access_mask, @@ -296,18 +361,11 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset, } static inline access_mask_t -landlock_get_raw_fs_access_mask(const struct landlock_ruleset *const ruleset, - const u16 layer_level) -{ - return ruleset->access_masks[layer_level].fs; -} - -static inline access_mask_t landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset, const u16 layer_level) { /* Handles all initially denied by default access rights. */ - return landlock_get_raw_fs_access_mask(ruleset, layer_level) | + return ruleset->access_masks[layer_level].fs | LANDLOCK_ACCESS_FS_INITIALLY_DENIED; } diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c index f5a0e7182ec0..c097d356fa45 100644 --- a/security/landlock/syscalls.c +++ b/security/landlock/syscalls.c @@ -329,7 +329,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset, return -ENOMSG; /* Checks that allowed_access matches the @ruleset constraints. */ - mask = landlock_get_raw_fs_access_mask(ruleset, 0); + mask = ruleset->access_masks[0].fs; if ((path_beneath_attr.allowed_access | mask) != mask) return -EINVAL; diff --git a/security/landlock/task.c b/security/landlock/task.c index 4acbd7c40eee..dc7dab78392e 100644 --- a/security/landlock/task.c +++ b/security/landlock/task.c @@ -204,12 +204,17 @@ static bool is_abstract_socket(struct sock *const sock) return false; } +static const struct access_masks unix_scope = { + .scope = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET, +}; + static int hook_unix_stream_connect(struct sock *const sock, struct sock *const other, struct sock *const newsk) { const struct landlock_ruleset *const dom = - landlock_get_current_domain(); + landlock_get_applicable_domain(landlock_get_current_domain(), + unix_scope); /* Quick return for non-landlocked tasks. 
*/ if (!dom) @@ -225,7 +230,8 @@ static int hook_unix_may_send(struct socket *const sock, struct socket *const other) { const struct landlock_ruleset *const dom = - landlock_get_current_domain(); + landlock_get_applicable_domain(landlock_get_current_domain(), + unix_scope); if (!dom) return 0; @@ -243,6 +249,10 @@ static int hook_unix_may_send(struct socket *const sock, return 0; } +static const struct access_masks signal_scope = { + .scope = LANDLOCK_SCOPE_SIGNAL, +}; + static int hook_task_kill(struct task_struct *const p, struct kernel_siginfo *const info, const int sig, const struct cred *const cred) @@ -256,6 +266,7 @@ static int hook_task_kill(struct task_struct *const p, } else { dom = landlock_get_current_domain(); } + dom = landlock_get_applicable_domain(dom, signal_scope); /* Quick return for non-landlocked tasks. */ if (!dom) @@ -279,7 +290,8 @@ static int hook_file_send_sigiotask(struct task_struct *tsk, /* Lock already held by send_sigio() and send_sigurg(). */ lockdep_assert_held(&fown->lock); - dom = landlock_file(fown->file)->fown_domain; + dom = landlock_get_applicable_domain( + landlock_file(fown->file)->fown_domain, signal_scope); /* Quick return for unowned socket. */ if (!dom) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 571fa8a6c9e1..24b4fe99304a 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7450,7 +7450,6 @@ static void alc287_alc1318_playback_pcm_hook(struct hda_pcm_stream *hinfo, struct snd_pcm_substream *substream, int action) { - alc_write_coef_idx(codec, 0x10, 0x8806); /* Change MLK to GPIO3 */ switch (action) { case HDA_GEN_PCM_ACT_OPEN: alc_write_coefex_idx(codec, 0x5a, 0x00, 0x954f); /* write gpio3 to high */ @@ -7464,7 +7463,6 @@ static void alc287_alc1318_playback_pcm_hook(struct hda_pcm_stream *hinfo, static void alc287_s4_power_gpio3_default(struct hda_codec *codec) { if (is_s4_suspend(codec)) { - alc_write_coef_idx(codec, 0x10, 0x8806); /* Change MLK to GPIO3 */ alc_write_coefex_idx(codec, 0x5a, 0x00, 0x554f); /* write gpio3 as default value */ } } @@ -7473,9 +7471,17 @@ static void alc287_fixup_lenovo_thinkpad_with_alc1318(struct hda_codec *codec, const struct hda_fixup *fix, int action) { struct alc_spec *spec = codec->spec; + static const struct coef_fw coefs[] = { + WRITE_COEF(0x24, 0x0013), WRITE_COEF(0x25, 0x0000), WRITE_COEF(0x26, 0xC300), + WRITE_COEF(0x28, 0x0001), WRITE_COEF(0x29, 0xb023), + WRITE_COEF(0x24, 0x0013), WRITE_COEF(0x25, 0x0000), WRITE_COEF(0x26, 0xC301), + WRITE_COEF(0x28, 0x0001), WRITE_COEF(0x29, 0xb023), + }; if (action != HDA_FIXUP_ACT_PRE_PROBE) return; + alc_update_coef_idx(codec, 0x10, 1<<11, 1<<11); + alc_process_coef_fw(codec, coefs); spec->power_hook = alc287_s4_power_gpio3_default; spec->gen.pcm_playback_hook = alc287_alc1318_playback_pcm_hook; } @@ -10496,6 +10502,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x103c, 0x8b59, "HP Elite mt645 G7 Mobile Thin Client U89", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b5d, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b5e, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), + SND_PCI_QUIRK(0x103c, 0x8b5f, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b63, "HP Elite Dragonfly 13.5 inch G4", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b65, "HP ProBook 455 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b66, "HP", 
ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), @@ -11673,6 +11680,8 @@ static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = { {0x1a, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC, {0x19, 0x40000000}), + SND_HDA_PIN_QUIRK(0x10ec0255, 0x1558, "Clevo", ALC2XX_FIXUP_HEADSET_MIC, + {0x19, 0x40000000}), {} }; diff --git a/sound/soc/codecs/max9768.c b/sound/soc/codecs/max9768.c index e4793a5d179e..8af3c7e5317f 100644 --- a/sound/soc/codecs/max9768.c +++ b/sound/soc/codecs/max9768.c @@ -54,10 +54,17 @@ static int max9768_set_gpio(struct snd_kcontrol *kcontrol, { struct snd_soc_component *c = snd_soc_kcontrol_component(kcontrol); struct max9768 *max9768 = snd_soc_component_get_drvdata(c); + bool val = !ucontrol->value.integer.value[0]; + int ret; - gpiod_set_value_cansleep(max9768->mute, !ucontrol->value.integer.value[0]); + if (val != gpiod_get_value_cansleep(max9768->mute)) + ret = 1; + else + ret = 0; - return 0; + gpiod_set_value_cansleep(max9768->mute, val); + + return ret; } static const DECLARE_TLV_DB_RANGE(volume_tlv, diff --git a/sound/soc/generic/audio-graph-card2.c b/sound/soc/generic/audio-graph-card2.c index 4ad3d1b0714f..93eee40cec76 100644 --- a/sound/soc/generic/audio-graph-card2.c +++ b/sound/soc/generic/audio-graph-card2.c @@ -270,16 +270,19 @@ static enum graph_type __graph_get_type(struct device_node *lnk) if (of_node_name_eq(np, GRAPH_NODENAME_MULTI)) { ret = GRAPH_MULTI; + fw_devlink_purge_absent_suppliers(&np->fwnode); goto out_put; } if (of_node_name_eq(np, GRAPH_NODENAME_DPCM)) { ret = GRAPH_DPCM; + fw_devlink_purge_absent_suppliers(&np->fwnode); goto out_put; } if (of_node_name_eq(np, GRAPH_NODENAME_C2C)) { ret = GRAPH_C2C; + fw_devlink_purge_absent_suppliers(&np->fwnode); goto out_put; } diff --git a/sound/soc/intel/boards/sof_sdw.c b/sound/soc/intel/boards/sof_sdw.c index 35d707d3ae9c..4a0ab50d1e50 100644 --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -594,6 +594,14 @@ static const struct dmi_system_id sof_sdw_quirk_table[] = { .callback = sof_sdw_quirk_cb, .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc"), + DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "0CF1") + }, + .driver_data = (void *)(SOC_SDW_CODEC_SPKR), + }, + { + .callback = sof_sdw_quirk_cb, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc"), DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "0CF7") }, .driver_data = (void *)(SOC_SDW_CODEC_SPKR), diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h index 24c981c9b240..199d0603cf8e 100644 --- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -324,7 +324,6 @@ YAMAHA_DEVICE(0x105a, NULL), YAMAHA_DEVICE(0x105b, NULL), YAMAHA_DEVICE(0x105c, NULL), YAMAHA_DEVICE(0x105d, NULL), -YAMAHA_DEVICE(0x1718, "P-125"), { USB_DEVICE(0x0499, 0x1503), QUIRK_DRIVER_INFO { @@ -391,6 +390,19 @@ YAMAHA_DEVICE(0x1718, "P-125"), } } }, +{ + USB_DEVICE(0x0499, 0x1718), + QUIRK_DRIVER_INFO { + /* .vendor_name = "Yamaha", */ + /* .product_name = "P-125", */ + QUIRK_DATA_COMPOSITE { + { QUIRK_DATA_STANDARD_AUDIO(1) }, + { QUIRK_DATA_STANDARD_AUDIO(2) }, + { QUIRK_DATA_MIDI_YAMAHA(3) }, + QUIRK_COMPOSITE_END + } + } +}, YAMAHA_DEVICE(0x2000, "DGP-7"), YAMAHA_DEVICE(0x2001, "DGP-5"), YAMAHA_DEVICE(0x2002, NULL), diff --git a/tools/sched_ext/scx_show_state.py b/tools/sched_ext/scx_show_state.py index 8bc626ede1c4..c4b3fdda9a0b 100644 --- a/tools/sched_ext/scx_show_state.py +++ b/tools/sched_ext/scx_show_state.py @@ -35,6 +35,6 @@ print(f'enabled : {read_static_key("__scx_ops_enabled")}') print(f'switching_all 
diff --git a/tools/sched_ext/scx_show_state.py b/tools/sched_ext/scx_show_state.py
index 8bc626ede1c4..c4b3fdda9a0b 100644
--- a/tools/sched_ext/scx_show_state.py
+++ b/tools/sched_ext/scx_show_state.py
@@ -35,6 +35,6 @@ print(f'enabled : {read_static_key("__scx_ops_enabled")}')
 print(f'switching_all : {read_int("scx_switching_all")}')
 print(f'switched_all : {read_static_key("__scx_switched_all")}')
 print(f'enable_state : {ops_state_str(enable_state)} ({enable_state})')
-print(f'bypass_depth : {read_atomic("scx_ops_bypass_depth")}')
+print(f'bypass_depth : {prog["scx_ops_bypass_depth"].value_()}')
 print(f'nr_rejected : {read_atomic("scx_nr_rejected")}')
 print(f'enable_seq : {read_atomic("scx_enable_seq")}')
diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
index 156cc278e2fc..7c881bca9af5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
+++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
@@ -57,9 +57,15 @@ __description("null pointer")
 __success __retval(0)
 int null_pointer(void)
 {
-	int nr = 0;
+	struct bpf_iter_bits iter;
+	int err, nr = 0;
 	int *bit;
 
+	err = bpf_iter_bits_new(&iter, NULL, 1);
+	bpf_iter_bits_destroy(&iter);
+	if (err != -EINVAL)
+		return 1;
+
 	bpf_for_each(bits, bit, NULL, 1)
 		nr++;
 	return nr;
@@ -194,15 +200,33 @@ __description("bad words")
 __success __retval(0)
 int bad_words(void)
 {
-	void *bad_addr = (void *)(3UL << 30);
-	int nr = 0;
+	void *bad_addr = (void *)-4095;
+	struct bpf_iter_bits iter;
+	volatile int nr;
 	int *bit;
+	int err;
+
+	err = bpf_iter_bits_new(&iter, bad_addr, 1);
+	bpf_iter_bits_destroy(&iter);
+	if (err != -EFAULT)
+		return 1;
 
+	nr = 0;
 	bpf_for_each(bits, bit, bad_addr, 1)
 		nr++;
+	if (nr != 0)
+		return 2;
 
+	err = bpf_iter_bits_new(&iter, bad_addr, 4);
+	bpf_iter_bits_destroy(&iter);
+	if (err != -EFAULT)
+		return 3;
+
+	nr = 0;
 	bpf_for_each(bits, bit, bad_addr, 4)
 		nr++;
+	if (nr != 0)
+		return 4;
 
-	return nr;
+	return 0;
 }
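The new checks in verifier_bits_iter.c lean on the open-coded iterator contract: bpf_iter_bits_new() returns 0 on success or a negative errno (-EINVAL for a NULL source, -EFAULT for an unreadable one), and bpf_iter_bits_destroy() must run on every path, including after a failed constructor. bpf_for_each(bits, ...) expands to essentially the pattern below; this is a hedged sketch, with the kfunc declarations and the global words[] array assumed rather than taken from the patch (struct bpf_iter_bits comes from vmlinux.h):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	int bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign,
			      u32 nr_words) __ksym __weak;
	int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak;
	void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak;

	u64 words[2];

	SEC("syscall")
	int count_set_bits(void *ctx)
	{
		struct bpf_iter_bits it;
		int nr = 0;
		int *bit;

		/* constructor: 0 on success, -EINVAL/-EFAULT on bad input */
		if (!bpf_iter_bits_new(&it, words, 2))
			while ((bit = bpf_iter_bits_next(&it)))
				nr++;

		/* destructor is mandatory even when the constructor failed */
		bpf_iter_bits_destroy(&it);
		return nr;
	}

	char _license[] SEC("license") = "GPL";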
echo "$maddr_list" | grep -q "${m_maddr}"); then + RET=1 + check_err 1 "arp_valid $arp_valid active_slave $active_slave, eth$i has mcast group" + fi + done + + # Do failover + ip -n ${s_ns} link set ${active_slave} down + # wait for active link change + slowwait 2 active_slave_changed $active_slave + active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave") + + for i in $(seq 0 2); do + maddr_list=$(ip -n ${s_ns} maddr show dev eth${i}) + + # arp_valid == 0 or active_slave should not join any maddrs + if { [ "$arp_valid" == "null" ] || [ "eth${i}" == ${active_slave} ]; } && \ + echo "$maddr_list" | grep -qE "${c_maddr}|${g_maddr}"; then + RET=1 + check_err 1 "arp_valid $arp_valid active_slave $active_slave, eth$i has mcast group" + # arp_valid != 0 and backup_slave should join both maddrs + elif [ "$arp_valid" != "null" ] && [ "eth${i}" != ${active_slave} ] && \ + ( ! echo "$maddr_list" | grep -q "${c_maddr}" || \ + ! echo "$maddr_list" | grep -q "${m_maddr}"); then + RET=1 + check_err 1 "arp_valid $arp_valid active_slave $active_slave, eth$i has mcast group" + fi + done +} + arp_validate_arp() { local mode=$1 @@ -261,8 +311,10 @@ arp_validate_ns() fi for val in $(seq 0 6); do - arp_validate_test "mode $mode arp_interval 100 ns_ip6_target ${g_ip6} arp_validate $val" + arp_validate_test "mode $mode arp_interval 100 ns_ip6_target ${g_ip6},${c_ip6} arp_validate $val" log_test "arp_validate" "$mode ns_ip6_target arp_validate $val" + arp_validate_mcast + log_test "arp_validate" "join mcast group" done } diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 156fbfae940f..48645a2e29da 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -241,16 +241,18 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \ -Wno-gnu-variable-sized-type-not-at-end -MD -MP -DCONFIG_64BIT \ -fno-builtin-memcmp -fno-builtin-memcpy \ -fno-builtin-memset -fno-builtin-strnlen \ - -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \ - -I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \ - -I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \ - $(KHDR_INCLUDES) + -fno-stack-protector -fno-PIE -fno-strict-aliasing \ + -I$(LINUX_TOOL_INCLUDE) -I$(LINUX_TOOL_ARCH_INCLUDE) \ + -I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(ARCH_DIR) \ + -I ../rseq -I.. 
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 156fbfae940f..48645a2e29da 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -241,16 +241,18 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
 	-Wno-gnu-variable-sized-type-not-at-end -MD -MP -DCONFIG_64BIT \
 	-fno-builtin-memcmp -fno-builtin-memcpy \
 	-fno-builtin-memset -fno-builtin-strnlen \
-	-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-	-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-	-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
-	$(KHDR_INCLUDES)
+	-fno-stack-protector -fno-PIE -fno-strict-aliasing \
+	-I$(LINUX_TOOL_INCLUDE) -I$(LINUX_TOOL_ARCH_INCLUDE) \
+	-I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(ARCH_DIR) \
+	-I ../rseq -I.. $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
 ifeq ($(ARCH),s390)
 	CFLAGS += -march=z10
 endif
 ifeq ($(ARCH),x86)
+ifeq ($(shell echo "void foo(void) { }" | $(CC) -march=x86-64-v2 -x c - -c -o /dev/null 2>/dev/null; echo "$$?"),0)
 CFLAGS += -march=x86-64-v2
 endif
+endif
 ifeq ($(ARCH),arm64)
 tools_dir := $(top_srcdir)/tools
 arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ba0c8e996035..ce687f8d248f 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -134,7 +134,7 @@ static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 			    size);
 	}
 
-	for (flag = 0; flag; flag <<= 1) {
+	for (flag = BIT(0); flag; flag <<= 1) {
 		fd = __vm_create_guest_memfd(vm, page_size, flag);
 		TEST_ASSERT(fd == -1 && errno == EINVAL,
 			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 089b8925b6b2..d7ac122820bf 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -200,7 +200,7 @@ static inline void init_vmcs_control_fields(struct vmx_pages *vmx)
 	if (vmx->eptp_gpa) {
 		uint64_t ept_paddr;
 		struct eptPageTablePointer eptp = {
-			.memory_type = VMX_BASIC_MEM_TYPE_WB,
+			.memory_type = X86_MEMTYPE_WB,
 			.page_walk_length = 3, /* + 1 */
 			.ad_enabled = ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS),
 			.address = vmx->eptp_gpa >> PAGE_SHIFT_4K,
diff --git a/tools/testing/selftests/kvm/memslot_perf_test.c b/tools/testing/selftests/kvm/memslot_perf_test.c
index 989ffe0d047f..e3711beff7f3 100644
--- a/tools/testing/selftests/kvm/memslot_perf_test.c
+++ b/tools/testing/selftests/kvm/memslot_perf_test.c
@@ -417,7 +417,7 @@ static bool _guest_should_exit(void)
  */
 static noinline void host_perform_sync(struct sync_area *sync)
 {
-	alarm(2);
+	alarm(10);
 	atomic_store_explicit(&sync->sync_flag, true, memory_order_release);
 
 	while (atomic_load_explicit(&sync->sync_flag, memory_order_acquire))
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
index 60001c142ce9..432d5af15e66 100644
--- a/tools/testing/selftests/mm/hugetlb_dio.c
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -44,6 +44,13 @@ void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
 	if (fd < 0)
 		ksft_exit_fail_perror("Error opening file\n");
 
+	/* Get the free huge pages before allocation */
+	free_hpage_b = get_free_hugepages();
+	if (free_hpage_b == 0) {
+		close(fd);
+		ksft_exit_skip("No free hugepage, exiting!\n");
+	}
+
 	/* Allocate a hugetlb page */
 	orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
 	if (orig_buffer == MAP_FAILED) {
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 217d8b7a7365..59fe07ee2df9 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -19,6 +19,7 @@ log.txt
 msg_oob
 msg_zerocopy
 ncdevmem
+netlink-dumps
 nettest
 psock_fanout
 psock_snd
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 649f1fe0dc46..5e86f7a51b43 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -78,6 +78,7 @@ TEST_PROGS += test_vxlan_vnifiltering.sh
 TEST_GEN_FILES += io_uring_zerocopy_tx
 TEST_PROGS += io_uring_zerocopy_tx.sh
 TEST_GEN_FILES += bind_bhash
+TEST_GEN_PROGS += netlink-dumps
 TEST_GEN_PROGS += sk_bind_sendto_listen
 TEST_GEN_PROGS += sk_connect_zero_addr
 TEST_GEN_PROGS += sk_so_peek_off
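On the guest_memfd_test.c fix above: with flag initialized to 0 the loop condition was false on entry, so the invalid-flag probing never executed at all. Starting from BIT(0) visits every single-bit value and terminates once the left shift overflows to zero. A standalone sketch of the pattern:

	#include <stdint.h>
	#include <stdio.h>

	#define BIT(n) (1ULL << (n))

	int main(void)
	{
		uint64_t flag;
		int nr = 0;

		/* visits 1, 2, 4, ... 1ULL << 63; the next shift wraps to 0 */
		for (flag = BIT(0); flag; flag <<= 1)
			nr++;

		printf("probed %d single-bit flags\n", nr);	/* prints 64 */
		return 0;
	}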
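Similarly, the hugetlb_dio.c change above skips the test early when no huge pages are free instead of failing later in mmap(). For reference, one plausible implementation of such a counter, assuming the usual "HugePages_Free:" line in /proc/meminfo (the selftest's own get_free_hugepages() helper may differ):

	#include <stdio.h>

	static unsigned long free_hugepages(void)
	{
		unsigned long fhp = 0;
		char line[128];
		FILE *f = fopen("/proc/meminfo", "r");

		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "HugePages_Free: %lu", &fhp) == 1)
				break;
		}
		fclose(f);
		return fhp;
	}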
diff --git a/tools/testing/selftests/net/netlink-dumps.c b/tools/testing/selftests/net/netlink-dumps.c
new file mode 100644
index 000000000000..7ee6dcd334df
--- /dev/null
+++ b/tools/testing/selftests/net/netlink-dumps.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <linux/genetlink.h>
+#include <linux/netlink.h>
+#include <linux/mqueue.h>
+
+#include "../kselftest_harness.h"
+
+static const struct {
+	struct nlmsghdr nlhdr;
+	struct genlmsghdr genlhdr;
+	struct nlattr ahdr;
+	__u16 val;
+	__u16 pad;
+} dump_policies = {
+	.nlhdr = {
+		.nlmsg_len = sizeof(dump_policies),
+		.nlmsg_type = GENL_ID_CTRL,
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+		.nlmsg_seq = 1,
+	},
+	.genlhdr = {
+		.cmd = CTRL_CMD_GETPOLICY,
+		.version = 2,
+	},
+	.ahdr = {
+		.nla_len = 6,
+		.nla_type = CTRL_ATTR_FAMILY_ID,
+	},
+	.val = GENL_ID_CTRL,
+	.pad = 0,
+};
+
+// Sanity check for the test itself, make sure the dump doesn't fit in one msg
+TEST(test_sanity)
+{
+	int netlink_sock;
+	char buf[8192];
+	ssize_t n;
+
+	netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+	ASSERT_GE(netlink_sock, 0);
+
+	n = send(netlink_sock, &dump_policies, sizeof(dump_policies), 0);
+	ASSERT_EQ(n, sizeof(dump_policies));
+
+	n = recv(netlink_sock, buf, sizeof(buf), MSG_DONTWAIT);
+	ASSERT_GE(n, sizeof(struct nlmsghdr));
+
+	n = recv(netlink_sock, buf, sizeof(buf), MSG_DONTWAIT);
+	ASSERT_GE(n, sizeof(struct nlmsghdr));
+
+	close(netlink_sock);
+}
+
+TEST(close_in_progress)
+{
+	int netlink_sock;
+	ssize_t n;
+
+	netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+	ASSERT_GE(netlink_sock, 0);
+
+	n = send(netlink_sock, &dump_policies, sizeof(dump_policies), 0);
+	ASSERT_EQ(n, sizeof(dump_policies));
+
+	close(netlink_sock);
+}
+
+TEST(close_with_ref)
+{
+	char cookie[NOTIFY_COOKIE_LEN] = {};
+	int netlink_sock, mq_fd;
+	struct sigevent sigev;
+	ssize_t n;
+
+	netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+	ASSERT_GE(netlink_sock, 0);
+
+	n = send(netlink_sock, &dump_policies, sizeof(dump_policies), 0);
+	ASSERT_EQ(n, sizeof(dump_policies));
+
+	mq_fd = syscall(__NR_mq_open, "sed", O_CREAT | O_WRONLY, 0600, 0);
+	ASSERT_GE(mq_fd, 0);
+
+	memset(&sigev, 0, sizeof(sigev));
+	sigev.sigev_notify = SIGEV_THREAD;
+	sigev.sigev_value.sival_ptr = cookie;
+	sigev.sigev_signo = netlink_sock;
+
+	syscall(__NR_mq_notify, mq_fd, &sigev);
+
+	close(netlink_sock);
+
+	// give mqueue time to fire
+	usleep(100 * 1000);
+}
+
+TEST_HARNESS_MAIN
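For context on the netlink-dumps.c sanity test above: a dump that does not fit in one message arrives over several recv() calls, each buffer holding one or more nlmsghdr records that are walked with the standard NLMSG_OK()/NLMSG_NEXT() macros until NLMSG_DONE (or NLMSG_ERROR) terminates the dump. A hedged sketch of the reader side, not part of the selftest:

	#include <stdbool.h>
	#include <linux/netlink.h>

	/* Returns true once NLMSG_DONE or NLMSG_ERROR ends the dump. */
	static bool dump_complete(void *buf, int len)
	{
		struct nlmsghdr *nlh;

		for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_type == NLMSG_DONE)
				return true;	/* dump finished cleanly */
			if (nlh->nlmsg_type == NLMSG_ERROR)
				return true;	/* kernel reported an error */
		}
		return false;		/* caller should recv() again */
	}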
action drop", + "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop" + ], + "cmdUnderTest": "bash -c 'for i in {1..2048} ;do echo filter delete dev $DEV1 pref 3;echo filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop;done | $TC -b -'", + "expExitCode": "0", + "verifyCmd": "$TC filter show dev $DEV1", + "matchPattern": "protocol ip pref 3 u32", + "matchCount": "3", + "teardown": [ + "$TC qdisc del dev $DEV1 parent root drr" + ] } ] diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c index 43d3a6aa1dcf..b9591223437a 100644 --- a/tools/virtio/vringh_test.c +++ b/tools/virtio/vringh_test.c @@ -519,7 +519,7 @@ int main(int argc, char *argv[]) errx(1, "virtqueue_add_sgs: %i", err); __kmalloc_fake = NULL; - /* Host retreives it. */ + /* Host retrieves it. */ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov)); vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov)); |