summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-09-20Merge tag 'afs-fixes-20210913' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fixes for AFS problems that can cause data corruption due to interaction with another client modifying data cached locally: - When d_revalidating a dentry, don't look at the inode to which it points. Only check the directory to which the dentry belongs. This was confusing things and causing the silly-rename cleanup code to remove the file now at the dentry of a file that got deleted. - Fix mmap data coherency. When a callback break is received that relates to a file that we have cached, the data content may have been changed (there are other reasons, such as the user's rights having been changed). However, we're checking it lazily, only on entry to the kernel, which doesn't happen if we have a writeable shared mapped page on that file. We make the kernel keep track of mmapped files and clear all PTEs mapping to that file as soon as the callback comes in by calling unmap_mapping_pages() (we don't necessarily want to zap the pagecache). This causes the kernel to be reentered when userspace tries to access the mmapped address range again - and at that point we can query the server and, if we need to, zap the page cache. Ideally, I would check each file at the point of notification, but that involves poking the server[*] - which is holding an exclusive lock on the vnode it is changing, waiting for all the clients it notified to reply. This could then deadlock against the server. Further, invalidating the pagecache might call ->launder_page(), which would try to write to the file, which would definitely deadlock. (AFS doesn't lease file access). [*] Checking to see if the file content has changed is a matter of comparing the current data version number, but we have to ask the server for that. We also need to get a new callback promise and we need to poke the server for that too. - Add some more points at which the inode is validated, since we're doing it lazily, notably in ->read_iter() and ->page_mkwrite(), but also when performing some directory operations. Ideally, checking in ->read_iter() would be done in some derivation of filemap_read(). If we're going to call the server to read the file, then we get the file status fetch as part of that. - The above is now causing us to make a lot more calls to afs_validate() to check the inode - and afs_validate() takes the RCU read lock each time to make a quick check (ie. afs_check_validity()). This is entirely for the purpose of checking cb_s_break to see if the server we're using reinitialised its list of callbacks - however this isn't a very common event, so most of the time we're taking this needlessly. Add a new cell-wide counter to count the number of reinitialisations done by any server and check that - and only if that changes, take the RCU read lock and check the server list (the server list may change, but the cell a file is part of won't). - Don't update vnode->cb_s_break and ->cb_v_break inside the validity checking loop. The cb_lock is done with read_seqretry, so we might go round the loop a second time after resetting those values - and that could cause someone else checking validity to miss something (I think). Also included are patches for fixes for some bugs encountered whilst debugging this: - Fix a leak of afs_read objects and fix a leak of keys hidden by that. - Fix a leak of pages that couldn't be added to extend a writeback. - Fix the maintenance of i_blocks when i_size is changed by a local write or a local dir edit" Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217 [1] Link: https://lore.kernel.org/r/163111665183.283156.17200205573146438918.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/ # i_blocks patch * tag 'afs-fixes-20210913' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix updating of i_blocks on file/dir extension afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server afs: Try to avoid taking RCU read lock when checking vnode validity afs: Fix mmap coherency vs 3rd-party changes afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation afs: Add missing vnode validation checks afs: Fix page leak afs: Fix missing put on afs_read objects and missing get on the key therein
2021-09-20Merge tag '5.15-rc1-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-1/+0
Pull cifs client fixes from Steve French: - two deferred close fixes (for bugs found with xfstests 478 and 461) - a deferred close improvement in rename - two trivial fixes for incorrect Linux comment formatting of multiple cifs files (pointed out by automated kernel test robot and checkpatch) * tag '5.15-rc1-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: Not to defer close on file when lock is set cifs: Fix soft lockup during fsstress cifs: Deferred close performance improvements cifs: fix incorrect kernel doc comments cifs: remove pathname for file from SPDX header
2021-09-19pci_iounmap'2: Electric Boogaloo: try to make sense of it allLinus Torvalds1-23/+3
Nathan Chancellor reports that the recent change to pci_iounmap in commit 9caea0007601 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled") causes build errors on arm64. It took me about two hours to convince myself that I think I know what the logic of that mess of #ifdef's in the <asm-generic/io.h> header file really aim to do, and rewrite it to be easier to follow. Famous last words. Anyway, the code has now been lifted from that grotty header file into lib/pci_iomap.c, and has fairly extensive comments about what the logic is. It also avoids indirecting through another confusing (and badly named) helper function that has other preprocessor config conditionals. Let's see what odd architecture did something else strange in this area to break things. But my arm64 cross build is clean. Fixes: 9caea0007601 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled") Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ulrich Teichert <krypton@ulrich-teichert.org> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-19Merge tag 'x86_urgent_for_v5.15_rc2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Prevent a infinite loop in the MCE recovery on return to user space, which was caused by a second MCE queueing work for the same page and thereby creating a circular work list. - Make kern_addr_valid() handle existing PMD entries, which are marked not present in the higher level page table, correctly instead of blindly dereferencing them. - Pass a valid address to sanitize_phys(). This was caused by the mixture of inclusive and exclusive ranges. memtype_reserve() expect 'end' being exclusive, but sanitize_phys() wants it inclusive. This worked so far, but with end being the end of the physical address space the fail is exposed. - Increase the maximum supported GPIO numbers for 64bit. Newer SoCs exceed the previous maximum. * tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Avoid infinite loop for copy from user recovery x86/mm: Fix kern_addr_valid() to cope with existing but not present entries x86/platform: Increase maximum GPIO number for X86_64 x86/pat: Pass valid address to sanitize_phys()
2021-09-19parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabledHelge Deller2-10/+3
Linus noticed odd declaration rules for pci_iounmap() in iomap.h and pci_iomap.h, where it dependend on either NO_GENERIC_PCI_IOPORT_MAP or GENERIC_IOMAP when CONFIG_PCI was disabled. Testing on parisc seems to indicate that we need pci_iounmap() only when CONFIG_PCI is enabled, so the declaration of pci_iounmap() can be moved cleanly into pci_iomap.h in sync with the declarations of pci_iomap(). Link: https://lore.kernel.org/all/CAHk-=wjRrh98pZoQ+AzfWmsTZacWxTJKXZ9eKU2X_0+jM=O8nw@mail.gmail.com/ Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 97a29d59fc22 ("[PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional") Cc: Arnd Bergmann <arnd@arndb.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ulrich Teichert <krypton@ulrich-teichert.org> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-17Merge tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-5/+16
Pull io_uring iov_iter retry fixes from Jens Axboe: "This adds a helper to save/restore iov_iter state, and modifies io_uring to use it. After that is done, we can now kill the iter->truncated addition that we added for this release. The io_uring change is being overly cautious with the save/restore/advance, but better safe than sorry and we can always improve that and reduce the overhead if it proves to be of concern. The only case to be worried about in this regard is huge IO, where iteration can take a while to iterate segments. I spent some time writing test cases, and expanded the coverage quite a bit from the last posting of this. liburing carries this regression test case now: https://git.kernel.dk/cgit/liburing/tree/test/file-verify.c which exercises all of this. It now also supports provided buffers, and explicitly tests for end-of-file/device truncation as well. On top of that, Pavel sanitized the IOPOLL retry path to follow the exact same pattern as normal IO" * tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-block: io_uring: move iopoll reissue into regular IO path Revert "iov_iter: track truncated size" io_uring: use iov_iter state save/restore helpers iov_iter: add helper to save iov_iter state
2021-09-17Merge tag 'io_uring-5.15-2021-09-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+7
Pull io_uring fixes from Jens Axboe: "Mostly fixes for regressions in this cycle, but also a few fixes that predate this release. The odd one out is a tweak to the direct files added in this release, where attempting to reuse a slot is allowed instead of needing an explicit removal of that slot first. It's a considerable improvement in usability to that API, hence I'm sending it for -rc2. - io-wq race fix and cleanup (Hao) - loop_rw_iter() type fix - SQPOLL max worker race fix - Allow poll arm for O_NONBLOCK files, fixing a case where it's impossible to properly use io_uring if you cannot modify the file flags - Allow direct open to simply reuse a slot, instead of needing it explicitly removed first (Pavel) - Fix a case where we missed signal mask restoring in cqring_wait, if we hit -EFAULT (Xiaoguang)" * tag 'io_uring-5.15-2021-09-17' of git://git.kernel.dk/linux-block: io_uring: allow retry for O_NONBLOCK if async is supported io_uring: auto-removal for direct open/accept io_uring: fix missing sigmask restore in io_cqring_wait() io_uring: pin SQPOLL data before unlocking ring lock io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items io-wq: fix potential race of acct->nr_workers io-wq: code clean of io_wqe_create_worker() io_uring: ensure symmetry in handling iter types in loop_rw_iter()
2021-09-16Merge tag 'net-5.15-rc2' of ↵Linus Torvalds5-111/+34
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. Current release - regressions: - vhost_net: fix OoB on sendmsg() failure - mlx5: bridge, fix uninitialized variable usage - bnxt_en: fix error recovery regression Current release - new code bugs: - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset() Previous releases - regressions: - r6040: restore MDIO clock frequency after MAC reset - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() - dsa: flush switchdev workqueue before tearing down CPU/DSA ports Previous releases - always broken: - ptp: dp83640: don't define PAGE0, avoid compiler warning - igc: fix tunnel segmentation offloads - phylink: update SFP selected interface on advertising changes - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume - mlx5e: fix mutual exclusion between CQE compression and HW TS Misc: - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode - sfc: fallback for lack of xdp tx queues - hns3: add option to turn off page pool feature" * tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) mlxbf_gige: clear valid_polarity upon open igc: fix tunnel offloading net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert net: wan: wanxl: define CROSS_COMPILE_M68K selftests: nci: replace unsigned int with int net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports Revert "net: phy: Uniform PHY driver access" net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup ptp: dp83640: don't define PAGE0 bnx2x: Fix enabling network interfaces without VFs Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() net-caif: avoid user-triggerable WARN_ON(1) bpf, selftests: Add test case for mixed cgroup v1/v2 bpf, selftests: Add cgroup v1 net_cls classid helpers bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode bpf: Add oversize check before call kvcalloc() net: hns3: fix the timing issue of VF clearing interrupt sources net: hns3: fix the exception when query imp info net: hns3: disable mac in flr process ...
2021-09-15Merge tag 'hyperv-fixes-signed-20210915' of ↵Linus Torvalds1-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix kernel crash caused by uio driver (Vitaly Kuznetsov) - Remove on-stack cpumask from HV APIC code (Wei Liu) * tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself asm-generic/hyperv: provide cpumask_to_vpset_noself Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver
2021-09-15net: dsa: flush switchdev workqueue before tearing down CPU/DSA portsVladimir Oltean1-0/+5
Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error messages appear: mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2 (and similarly for other ports) What happens is that DSA has a policy "even if there are bugs, let's at least not leak memory" and dsa_port_teardown() clears the dp->fdbs and dp->mdbs lists, which are supposed to be empty. But deleting that cleanup code, the warnings go away. => the FDB and MDB lists (used for refcounting on shared ports, aka CPU and DSA ports) will eventually be empty, but are not empty by the time we tear down those ports. Aka we are deleting them too soon. The addresses that DSA complains about are host-trapped addresses: the local addresses of the ports, and the MAC address of the bridge device. The problem is that offloading those entries happens from a deferred work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this races with the teardown of the CPU and DSA ports where the refcounting is kept. In fact, not only it races, but fundamentally speaking, if we iterate through the port list linearly, we might end up tearing down the shared ports even before we delete a DSA user port which has a bridge upper. So as it turns out, we need to first tear down the user ports (and the unused ones, for no better place of doing that), then the shared ports (the CPU and DSA ports). In between, we need to ensure that all work items scheduled by our switchdev handlers (which only run for user ports, hence the reason why we tear them down first) have finished. Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-15Merge branch 'absolute-pointer' (patches from Guenter)Linus Torvalds1-0/+2
Merge absolute_pointer macro series from Guenter Roeck: "Kernel test builds currently fail for several architectures with error messages such as the following. drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe': arch/m68k/include/asm/string.h:72:25: error: '__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread] Such warnings may be reported by gcc 11.x for string and memory operations on fixed addresses if gcc's builtin functions are used for those operations. This series introduces absolute_pointer() to fix the problem. absolute_pointer() disassociates a pointer from its originating symbol type and context, and thus prevents gcc from making assumptions about pointers passed to memory operations" * emailed patches from Guenter Roeck <linux@roeck-us.net>: alpha: Use absolute_pointer to define COMMAND_LINE alpha: Move setup.h out of uapi net: i825xx: Use absolute_pointer for memcpy from fixed memory location compiler.h: Introduce absolute_pointer macro
2021-09-15compiler.h: Introduce absolute_pointer macroGuenter Roeck1-0/+2
absolute_pointer() disassociates a pointer from its originating symbol type and context. Use it to prevent compiler warnings/errors such as drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe': arch/m68k/include/asm/string.h:72:25: error: '__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread] Such warnings may be reported by gcc 11.x for string and memory operations on fixed addresses. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15Revert "iov_iter: track truncated size"Jens Axboe1-5/+1
This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa. We no longer need to track the truncation count, the one user that did need it has been converted to using iov_iter_restore() instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14memblock: introduce saner 'memblock_free_ptr()' interfaceLinus Torvalds1-0/+1
The boot-time allocation interface for memblock is a mess, with 'memblock_alloc()' returning a virtual pointer, but then you are supposed to free it with 'memblock_free()' that takes a _physical_ address. Not only is that all kinds of strange and illogical, but it actually causes bugs, when people then use it like a normal allocation function, and it fails spectacularly on a NULL pointer: https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/ or just random memory corruption if the debug checks don't catch it: https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/ I really don't want to apply patches that treat the symptoms, when the fundamental cause is this horribly confusing interface. I started out looking at just automating a sane replacement sequence, but because of this mix or virtual and physical addresses, and because people have used the "__pa()" macro that can take either a regular kernel pointer, or just the raw "unsigned long" address, it's all quite messy. So this just introduces a new saner interface for freeing a virtual address that was allocated using 'memblock_alloc()', and that was kept as a regular kernel pointer. And then it converts a couple of users that are obvious and easy to test, including the 'xbc_nodes' case in lib/bootconfig.c that caused problems. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14iov_iter: add helper to save iov_iter stateJens Axboe1-0/+15
In an ideal world, when someone is passed an iov_iter and returns X bytes, then X bytes would have been consumed/advanced from the iov_iter. But we have use cases that always consume the entire iterator, a few examples of that are iomap and bdev O_DIRECT. This means we cannot rely on the state of the iov_iter once we've called ->read_iter() or ->write_iter(). This would be easier if we didn't always have to deal with truncate of the iov_iter, as rewinding would be trivial without that. We recently added a commit to track the truncate state, but that grew the iov_iter by 8 bytes and wasn't the best solution. Implement a helper to save enough of the iov_iter state to sanely restore it after we've called the read/write iterator helpers. This currently only works for IOVEC/BVEC/KVEC as that's all we need, support for other iterator types are left as an exercise for the reader. Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-110/+28
Daniel Borkmann says: ==================== pull-request: bpf 2021-09-14 The following pull-request contains BPF updates for your *net* tree. We've added 7 non-merge commits during the last 13 day(s) which contain a total of 18 files changed, 334 insertions(+), 193 deletions(-). The main changes are: 1) Fix mmap_lock lockdep splat in BPF stack map's build_id lookup, from Yonghong Song. 2) Fix BPF cgroup v2 program bypass upon net_cls/prio activation, from Daniel Borkmann. 3) Fix kvcalloc() BTF line info splat on oversized allocation attempts, from Bixuan Cui. 4) Fix BPF selftest build of task_pt_regs test for arm64/s390, from Jean-Philippe Brucker. 5) Fix BPF's disasm.{c,h} to dual-license so that it is aligned with bpftool given the former is a build dependency for the latter, from Daniel Borkmann with ACKs from contributors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14x86/mce: Avoid infinite loop for copy from user recoveryTony Luck1-0/+1
There are two cases for machine check recovery: 1) The machine check was triggered by ring3 (application) code. This is the simpler case. The machine check handler simply queues work to be executed on return to user. That code unmaps the page from all users and arranges to send a SIGBUS to the task that triggered the poison. 2) The machine check was triggered in kernel code that is covered by an exception table entry. In this case the machine check handler still queues a work entry to unmap the page, etc. but this will not be called right away because the #MC handler returns to the fix up code address in the exception table entry. Problems occur if the kernel triggers another machine check before the return to user processes the first queued work item. Specifically, the work is queued using the ->mce_kill_me callback structure in the task struct for the current thread. Attempting to queue a second work item using this same callback results in a loop in the linked list of work functions to call. So when the kernel does return to user, it enters an infinite loop processing the same entry for ever. There are some legitimate scenarios where the kernel may take a second machine check before returning to the user. 1) Some code (e.g. futex) first tries a get_user() with page faults disabled. If this fails, the code retries with page faults enabled expecting that this will resolve the page fault. 2) Copy from user code retries a copy in byte-at-time mode to check whether any additional bytes can be copied. On the other side of the fence are some bad drivers that do not check the return value from individual get_user() calls and may access multiple user addresses without noticing that some/all calls have failed. Fix by adding a counter (current->mce_count) to keep track of repeated machine checks before task_work() is called. First machine check saves the address information and calls task_work_add(). Subsequent machine checks before that task_work call back is executed check that the address is in the same page as the first machine check (since the callback will offline exactly one page). Expected worst case is four machine checks before moving on (e.g. one user access with page faults disabled, then a repeat to the same address with page faults enabled ... repeat in copy tail bytes). Just in case there is some code that loops forever enforce a limit of 10. [ bp: Massage commit message, drop noinstr, fix typo, extend panic messages. ] Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
2021-09-13bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed modeDaniel Borkmann2-101/+28
Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used. Back in the days, commit bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") embedded per-socket cgroup information into sock->sk_cgrp_data and in order to save 8 bytes in struct sock made both mutually exclusive, that is, when cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2 falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp). The assumption made was "there is no reason to mix the two and this is in line with how legacy and v2 compatibility is handled" as stated in bd1060a1d671. However, with Kubernetes more widely supporting cgroups v2 as well nowadays, this assumption no longer holds, and the possibility of the v1/v2 mixed mode with the v2 root fallback being hit becomes a real security issue. Many of the cgroup v2 BPF programs are also used for policy enforcement, just to pick _one_ example, that is, to programmatically deny socket related system calls like connect(2) or bind(2). A v2 root fallback would implicitly cause a policy bypass for the affected Pods. In production environments, we have recently seen this case due to various circumstances: i) a different 3rd party agent and/or ii) a container runtime such as [0] in the user's environment configuring legacy cgroup v1 net_cls tags, which triggered implicitly mentioned root fallback. Another case is Kubernetes projects like kind [1] which create Kubernetes nodes in a container and also add cgroup namespaces to the mix, meaning programs which are attached to the cgroup v2 root of the cgroup namespace get attached to a non-root cgroup v2 path from init namespace point of view. And the latter's root is out of reach for agents on a kind Kubernetes node to configure. Meaning, any entity on the node setting cgroup v1 net_cls tag will trigger the bypass despite cgroup v2 BPF programs attached to the namespace root. Generally, this mutual exclusiveness does not hold anymore in today's user environments and makes cgroup v2 usage from BPF side fragile and unreliable. This fix adds proper struct cgroup pointer for the cgroup v2 case to struct sock_cgroup_data in order to address these issues; this implicitly also fixes the tradeoffs being made back then with regards to races and refcount leaks as stated in bd1060a1d671, and removes the fallback, so that cgroup v2 BPF programs always operate as expected. [0] https://github.com/nestybox/sysbox/ [1] https://kind.sigs.k8s.io/ Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20210913230759.2313-1-daniel@iogearbox.net
2021-09-13cifs: remove pathname for file from SPDX headerSteve French1-1/+0
checkpatch complains about source files with filenames (e.g. in these cases just below the SPDX header in comments at the top of various files in fs/cifs). It also is helpful to change this now so will be less confusing when the parent directory is renamed e.g. from fs/cifs to fs/smb_client (or fs/smbfs) Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-13Merge branch 'gcc-min-version-5.1' (make gcc-5.1 the minimum version)Linus Torvalds5-183/+4
Merge patch series from Nick Desaulniers to update the minimum gcc version to 5.1. This is some of the left-overs from the merge window that I didn't want to deal with yesterday, so it comes in after -rc1 but was sent before. Gcc-4.9 support has been an annoyance for some time, and with -Werror I had the choice of applying a fairly big patch from Kees Cook to remove a fair number of initializer warnings (still leaving some), or this patch series from Nick that just removes the source of the problem. The initializer cleanups might still be worth it regardless, but honestly, I preferred just tackling the problem with gcc-4.9 head-on. We've been more aggressiuve about no longer having to care about compilers that were released a long time ago, and I think it's been a good thing. I added a couple of patches on top to sort out a few left-overs now that we no longer support gcc-4.x. As noted by Arnd, as a result of this minimum compiler version upgrade we can probably change our use of '--std=gnu89' to '--std=gnu11', and finally start using local loop declarations etc. But this series does _not_ yet do that. Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/ Link: https://lore.kernel.org/lkml/CAK7LNASs6dvU6D3jL2GG3jW58fXfaj6VNOe55NJnTB8UPuk2pA@mail.gmail.com/ Link: https://github.com/ClangBuiltLinux/linux/issues/1438 * emailed patches from Nick Desaulniers <ndesaulniers@google.com>: Drop some straggling mentions of gcc-4.9 as being stale compiler_attributes.h: drop __has_attribute() support for gcc4 vmlinux.lds.h: remove old check for GCC 4.9 compiler-gcc.h: drop checks for older GCC versions Makefile: drop GCC < 5 -fno-var-tracking-assignments workaround arm64: remove GCC version check for ARCH_SUPPORTS_INT128 powerpc: remove GCC version check for UPD_CONSTR riscv: remove Kconfig check for GCC version for ARCH_RV64I Kconfig.debug: drop GCC 5+ version check for DWARF5 mm/ksm: remove old GCC 4.9+ check compiler.h: drop fallback overflow checkers Documentation: raise minimum supported version of GCC to 5.1
2021-09-13Drop some straggling mentions of gcc-4.9 as being staleLinus Torvalds1-1/+0
Fix up the admin-guide README file to the new gcc-5.1 requirement, and remove a stale comment about gcc support for the __assume_aligned__ attribute. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13compiler_attributes.h: drop __has_attribute() support for gcc4Linus Torvalds1-20/+0
Now that GCC 5.1 is the minimally supported default, the manual workaround for older gcc versions not having __has_attribute() are no longer relevant and can be removed. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13vmlinux.lds.h: remove old check for GCC 4.9Nick Desaulniers1-4/+0
Now that GCC 5.1 is the minimally supported version of GCC, we can effectively revert commit 85c2ce9104eb ("sched, vmlinux.lds: Increase STRUCT_ALIGNMENT to 64 bytes for GCC-4.9") Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13compiler-gcc.h: drop checks for older GCC versionsNick Desaulniers1-3/+1
Now that GCC 5.1 is the minimally supported default, drop the values we don't use. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13compiler.h: drop fallback overflow checkersNick Desaulniers3-152/+3
Once upgrading the minimum supported version of GCC to 5.1, we can drop the fallback code for !COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW. This is effectively a revert of commit f0907827a8a9 ("compiler.h: enable builtin overflow checkers and add fallback code") Link: https://github.com/ClangBuiltLinux/linux/issues/1438#issuecomment-916745801 Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg itemsEugene Syromiatnikov1-1/+7
The items passed in the array pointed by the arg parameter of IORING_REGISTER_IOWQ_MAX_WORKERS io_uring_register operation carry certain semantics: they refer to different io-wq worker categories; provide IO_WQ_* constants in the UAPI, so these categories can be referenced in the user space code. Suggested-by: Jens Axboe <axboe@kernel.dk> Complements: 2e480058ddc21ec5 ("io-wq: provide a way to limit max number of workers") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Link: https://lore.kernel.org/r/20210913154415.GA12890@asgard.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-13afs: Try to avoid taking RCU read lock when checking vnode validityDavid Howells1-2/+6
Try to avoid taking the RCU read lock when checking the validity of a vnode's callback state. The only thing it's needed for is to pin the parent volume's server list whilst we search it to find the record of the server we're currently using to see if it has been reinitialised (ie. it sent us a CB.InitCallBackState* RPC). Do this by the following means: (1) Keep an additional per-cell counter (fs_s_break) that's incremented each time any of the fileservers in the cell reinitialises. Since the new counter can be accessed without RCU from the vnode, we can check that first - and only if it differs, get the RCU read lock and check the volume's server list. (2) Replace afs_get_s_break_rcu() with afs_check_server_good() which now indicates whether the callback promise is still expected to be present on the server. This does the checks as described in (1). (3) Restructure afs_check_validity() to take account of the change in (2). We can also get rid of the valid variable and just use the need_clear variable with the addition of the afs_cb_break_no_promise reason. (4) afs_check_validity() probably shouldn't be altering vnode->cb_v_break and vnode->cb_s_break when it doesn't have cb_lock exclusively locked. Move the change to vnode->cb_v_break to __afs_break_callback(). Delegate the change to vnode->cb_s_break to afs_select_fileserver() and set vnode->cb_fs_s_break there also. (5) afs_validate() no longer needs to get the RCU read lock around its call to afs_check_validity() - and can skip the call entirely if we don't have a promise. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/163111669583.283156.1397603105683094563.stgit@warthog.procyon.org.uk/
2021-09-12Merge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of ↵Linus Torvalds3-9/+25
git://github.com/ojeda/linux Pull compiler attributes updates from Miguel Ojeda: - Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver) - Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers) - Move __compiletime_{error|warning} (Nick Desaulniers) * tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux: compiler_attributes.h: move __compiletime_{error|warning} MAINTAINERS: add Nick as Reviewer for compiler_attributes.h Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4
2021-09-12Merge tag 'smp-urgent-2021-09-12' of ↵Linus Torvalds3-46/+110
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "Updates for the SMP and CPU hotplug: - Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the original hotplug code and now causing trouble with the ARM64 cache topology setup due to the pointless SMP function call. It's not longer required as the hotplug callbacks are guaranteed to be invoked on the upcoming CPU. - Remove the deprecated and now unused CPU hotplug functions - Rewrite the CPU hotplug API documentation" * tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation: core-api/cpuhotplug: Rewrite the API section cpu/hotplug: Remove deprecated CPU-hotplug functions. thermal: Replace deprecated CPU-hotplug functions. drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
2021-09-12Merge tag 'locking_urgent_for_v5.15_rc1' of ↵Linus Torvalds1-10/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Fix the futex PI requeue machinery to not return to userspace in inconsistent state - Avoid a potential null pointer dereference in the ww_mutex deadlock check - Other smaller cleanups and optimizations * tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix ww_mutex deadlock check futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic() futex: Avoid redundant task lookup futex: Clarify comment for requeue_pi_wake_futex() futex: Prevent inconsistent state and exit race futex: Return error code instead of assigning it without effect locking/rwsem: Add missing __init_rwsem() for PREEMPT_RT
2021-09-12Merge tag 'timers_urgent_for_v5.15_rc1' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Handle negative second values properly when converting a timespec64 to nanoseconds. * tag 'timers_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Handle negative seconds correctly in timespec64_to_ns()
2021-09-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds6-27/+363
Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...
2021-09-11Merge tag 'trace-v5.15-3' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Minor fixes to the processing of the bootconfig tree" * tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey() tracing/boot: Fix to check the histogram control param is a leaf node tracing/boot: Fix trace_boot_hist_add_array() to check array is value
2021-09-11Merge tag 'pwm/for-5.15-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "The changes this time around are mostly janitorial in nature. A lot of this is simplifications of drivers using device-managed functions and improving compilation coverage. The Mediatek display PWM driver now supports the atomic API. Cleanups and minor fixes make up the remainder of this set" * tag 'pwm/for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (54 commits) pwm: mtk-disp: Implement atomic API .get_state() pwm: mtk-disp: Fix overflow in period and duty calculation pwm: mtk-disp: Implement atomic API .apply() pwm: mtk-disp: Adjust the clocks to avoid them mismatch dt-bindings: pwm: rockchip: Add description for rk3568 pwm: Make pwmchip_remove() return void pwm: sun4i: Don't check the return code of pwmchip_remove() pwm: sifive: Don't check the return code of pwmchip_remove() pwm: samsung: Don't check the return code of pwmchip_remove() pwm: renesas-tpu: Don't check the return code of pwmchip_remove() pwm: rcar: Don't check the return code of pwmchip_remove() pwm: pca9685: Don't check the return code of pwmchip_remove() pwm: omap-dmtimer: Don't check the return code of pwmchip_remove() pwm: mtk-disp: Don't check the return code of pwmchip_remove() pwm: imx-tpm: Don't check the return code of pwmchip_remove() pwm: img: Don't check the return code of pwmchip_remove() pwm: cros-ec: Don't check the return code of pwmchip_remove() pwm: brcmstb: Don't check the return code of pwmchip_remove() pwm: atmel-tcb: Don't check the return code of pwmchip_remove() pwm: atmel-hlcdc: Don't check the return code of pwmchip_remove() ...
2021-09-11Merge tag 'thermal-v5.15-rc1' of ↵Linus Torvalds2-3/+18
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add the tegra3 thermal sensor and fix the compilation testing on tegra by adding a dependency on ARCH_TEGRA along with COMPILE_TEST (Dmitry Osipenko) - Fix the error code for the exynos when devm_get_clk() fails (Dan Carpenter) - Add the TCC cooling support for AlderLake platform (Sumeet Pawnikar) - Add support for hardware trip points for the rcar gen3 thermal driver and store TSC id as unsigned int (Niklas Söderlund) - Replace the deprecated CPU-hotplug functions get_online_cpus() and put_online_cpus (Sebastian Andrzej Siewior) - Add the thermal tools directory in the MAINTAINERS file (Daniel Lezcano) - Fix the Makefile and the cross compilation flags for the userspace 'tmon' tool (Rolf Eike Beer) - Allow to use the IMOK independently from the GDDV on Int340x (Sumeet Pawnikar) - Fix the stub thermal_cooling_device_register() function prototype which does not match the real function (Arnd Bergmann) - Make the thermal trip point optional in the DT bindings (Maxime Ripard) - Fix a typo in a comment in the core code (Geert Uytterhoeven) - Reduce the verbosity of the trace in the SoC thermal tegra driver (Dmitry Osipenko) - Add the support for the LMh (Limit Management hardware) driver on the QCom platforms (Thara Gopinath) - Allow processing of HWP interrupt by adding a weak function in the Intel driver (Srinivas Pandruvada) - Prevent an abort of the sensor probe is a channel is not used (Matthias Kaehlcke) * tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal/drivers/qcom/spmi-adc-tm5: Don't abort probing if a sensor is not used thermal/drivers/intel: Allow processing of HWP interrupt dt-bindings: thermal: Add dt binding for QCOM LMh thermal/drivers/qcom: Add support for LMh driver firmware: qcom_scm: Introduce SCM calls to access LMh thermal/drivers/tegra-soctherm: Silence message about clamped temperature thermal: Spelling s/scallbacks/callbacks/ dt-bindings: thermal: Make trips node optional thermal/core: Fix thermal_cooling_device_register() prototype thermal/drivers/int340x: Use IMOK independently tools/thermal/tmon: Add cross compiling support thermal/tools/tmon: Improve the Makefile MAINTAINERS: Add missing userspace thermal tools to the thermal section thermal/drivers/intel_powerclamp: Replace deprecated CPU-hotplug functions. thermal/drivers/rcar_gen3_thermal: Store TSC id as unsigned int thermal/drivers/rcar_gen3_thermal: Add support for hardware trip points drivers/thermal/intel: Add TCC cooling support for AlderLake platform thermal/drivers/exynos: Fix an error code in exynos_tmu_probe() thermal/drivers/tegra: Correct compile-testing of drivers thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
2021-09-11asm-generic/hyperv: provide cpumask_to_vpset_noselfWei Liu1-2/+19
This is a new variant which removes `self' cpu from the vpset. It will be used in Hyper-V enlightened IPI code. Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210910185714.299411-2-wei.liu@kernel.org
2021-09-11Documentation: core-api/cpuhotplug: Rewrite the API sectionThomas Gleixner1-22/+110
Dave stumbled over the incomplete and confusing documentation of the CPU hotplug API. Rewrite it, add the missing function documentations and correct the existing ones. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210909123212.489059409@linutronix.de
2021-09-11cpu/hotplug: Remove deprecated CPU-hotplug functions.Sebastian Andrzej Siewior1-6/+0
No users in tree use the deprecated CPU-hotplug functions anymore. Remove them. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-39-bigeasy@linutronix.de
2021-09-11Merge branch 'linus' into smp/urgentThomas Gleixner509-4500/+15849
Ensure that all usage sites of get/put_online_cpus() except for the struggler in drivers/thermal are gone. So the last user and the deprecated inlines can be removed.
2021-09-10bpf, mm: Fix lockdep warning triggered by stack_map_get_build_id_offset()Yonghong Song1-9/+0
Currently the bpf selftest "get_stack_raw_tp" triggered the warning: [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0 [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod] [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G W O 5.14.0+ #53 [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0 [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8 [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246 [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000 [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0 [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0 [ 1411.304501] FS: 00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000 [ 1411.304504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0 [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1411.304512] Call Trace: [ 1411.304517] stack_map_get_build_id_offset+0x17c/0x260 [ 1411.304528] __bpf_get_stack+0x18f/0x230 [ 1411.304541] bpf_get_stack_raw_tp+0x5a/0x70 [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000 [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04 [ 1411.306978] bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0 [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108 [ 1411.309074] bpf_trace_run2+0x8f/0x1a0 [ 1411.309891] FS: 00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000 [ 1411.309896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1411.311221] syscall_trace_enter.isra.20+0x161/0x1f0 [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0 [ 1411.312291] do_syscall_64+0x15/0x80 [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1411.313803] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1411.315082] RIP: 0033:0x7f00ad80a0e0 [ 1411.315626] Call Trace: [ 1411.315632] stack_map_get_build_id_offset+0x17c/0x260 To reproduce, first build `test_progs` binary: make -C tools/testing/selftests/bpf -j60 and then run the binary at tools/testing/selftests/bpf directory: ./test_progs -t get_stack_raw_tp The warning is due to commit 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function asserts that mm->mmap_lock needs to be held. But this is not the case for bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held in bpf_get_stack[id]() use case, the above warning is emitted during test run. This patch fixed the issue by (1). using mmap_read_trylock() instead of mmap_read_trylock_non_owner() to satisfy lockdep checking in find_vma(), and (2). droping lockdep for mmap_lock right before the irq_work_queue(). The function mmap_read_trylock_non_owner() is also removed since after this patch nobody calls it any more. Fixes: 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Luigi Rizzo <lrizzo@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-mm@kvack.org Link: https://lore.kernel.org/bpf/20210909155000.1610299-1-yhs@fb.com
2021-09-10Merge tag 'pm-5.15-rc1-3' of ↵Linus Torvalds2-4/+9
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These improve hybrid processors support in intel_pstate, fix an issue in the core devices PM code, clean up the handling of dedicated wake IRQs, update the Energy Model documentation and update MAINTAINERS. Specifics: - Make the HWP performance levels calibration on hybrid processors in intel_pstate more straightforward (Rafael Wysocki). - Prevent the PM core from leaving devices in suspend after a failing system-wide suspend transition in some cases when driver PM flags are used (Prasad Sodagudi). - Drop unused function argument from the dedicated wake IRQs handling code (Sergey Shtylyov). - Fix up Energy Model kerneldoc comments and include them in the Energy Model documentation (Lukasz Luba). - Use my kernel.org address in MAINTAINERS insead of the personal one (Rafael Wysocki)" * tag 'pm-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: MAINTAINERS: Change Rafael's e-mail address PM: sleep: core: Avoid setting power.must_resume to false Documentation: power: include kernel-doc in Energy Model doc PM: EM: fix kernel-doc comments cpufreq: intel_pstate: hybrid: Rework HWP calibration ACPI: CPPC: Introduce cppc_get_nominal_perf() PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()
2021-09-10Merge tag 'char-misc-5.15-rc1-2' of ↵Linus Torvalds1-20/+166
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull habanalabs updates from Greg KH: "Here is another round of misc driver patches for 5.15-rc1. In here is only updates for the Habanalabs driver. This request is late because the previously-objected-to dma-buf patches are all removed and some fixes that you and others found are now included in here as well. All of these have been in linux-next for well over a week with no reports of problems, and they are all self-contained to only this one driver. Full details are in the shortlog" * tag 'char-misc-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (61 commits) habanalabs/gaudi: hwmon default card name habanalabs: add support for f/w reset habanalabs/gaudi: block ICACHE_BASE_ADDERESS_HIGH in TPC habanalabs: cannot sleep while holding spinlock habanalabs: never copy_from_user inside spinlock habanalabs: remove unnecessary device status check habanalabs: disable IRQ in user interrupts spinlock habanalabs: add "in device creation" status habanalabs/gaudi: invalidate PMMU mem cache on init habanalabs/gaudi: size should be printed in decimal habanalabs/gaudi: define DC POWER for secured PMC habanalabs/gaudi: unmask out of bounds SLM access interrupt habanalabs: add userptr_lookup node in debugfs habanalabs/gaudi: fetch TPC/MME ECC errors from F/W habanalabs: modify multi-CS to wait on stream masters habanalabs/gaudi: add monitored SOBs to state dump habanalabs/gaudi: restore user registers when context opens habanalabs/gaudi: increase boot fit timeout habanalabs: update to latest firmware headers habanalabs/gaudi: minimize number of register reads ...
2021-09-10Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'Rafael J. Wysocki2-4/+9
* pm-cpufreq: cpufreq: intel_pstate: hybrid: Rework HWP calibration ACPI: CPPC: Introduce cppc_get_nominal_perf() * pm-sleep: PM: sleep: core: Avoid setting power.must_resume to false PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq() * pm-em: Documentation: power: include kernel-doc in Energy Model doc PM: EM: fix kernel-doc comments
2021-09-10Merge tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-1/+2
Pull drm fixes from Dave Airlie: "Just an initial bunch of fixes for the merge window, amdgpu is most of them with a few ttm fixes and an fbdev avoid multiply overflow fix. core: - Make some dma-buf config options depend on DMA_SHARED_BUFFER - Handle multiplication overflow of fbdev xres/yres in the core ttm: - Fix ttm_bo_move_memcpy() when ttm_resource is subclassed - Fix ttm deadlock if target BO isn't idle - ttm build fix - ttm docs fix dma-buf: - config option fixes fbdev: - limit resolutions to avoid int overflow i915: - stddef change. amdgpu: - Misc cleanups, typo fixes - EEPROM fix - Add some new PCI IDs - Scatter/Gather display support for Yellow Carp - PCIe DPM fix for RKL platforms - RAS fix amdkfd: - SVM fix vc4: - static function fix mgag200: - fix uninit var panfrost: - lock_region fixes" * tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm: (36 commits) drm/ttm: Fix a deadlock if the target BO is not idle during swap fbmem: don't allow too huge resolutions dma-buf: DMABUF_SYSFS_STATS should depend on DMA_SHARED_BUFFER dma-buf: DMABUF_DEBUG should depend on DMA_SHARED_BUFFER drm/i915: use linux/stddef.h due to "isystem: trim/fixup stdarg.h and other headers" dma-buf: DMABUF_MOVE_NOTIFY should depend on DMA_SHARED_BUFFER drm/amdkfd: drop process ref count when xnack disable drm/amdgpu: enable more pm sysfs under SRIOV 1-VF mode drm/amdgpu: fix fdinfo race with process exit drm/amdgpu: Fix a deadlock if previous GEM object allocation fails drm/amdgpu: stop scheduler when calling hw_fini (v2) drm/amdgpu: Clear RAS interrupt status on aldebaran drm/amd/display: Initialize lt_settings on instantiation drm/amd/display: cleanup idents after a revert drm/amd/display: Fix memory leak reported by coverity drm/ttm: Fix ttm_bo_move_memcpy() for subclassed struct ttm_resource drm/amdgpu/swsmu: fix spelling mistake "minimun" -> "minimum" drm/amdgpu: Disable PCIE_DPM on Intel RKL Platform drm/amdgpu: show both cmd id and name when psp cmd failed drm/amd/display: setup system context for APUs ...
2021-09-09bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey()Masami Hiramatsu1-2/+2
Rename xbc_node_find_child() to xbc_node_find_subkey() for clarifying that function returns a key node (no value node). Since there are xbc_node_for_each_child() (loop on all child nodes) and xbc_node_for_each_subkey() (loop on only subkey nodes), this name distinction is necessary to avoid confusing users. Link: https://lkml.kernel.org/r/163119459826.161018.11200274779483115300.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-09Merge tag 'for-linus-5.15-rc1' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Support for VMAP_STACK - Support for splice_write in hostfs - Fixes for virt-pci - Fixes for virtio_uml - Various fixes * tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: fix stub location calculation um: virt-pci: fix uapi documentation um: enable VMAP_STACK um: virt-pci: don't do DMA from stack hostfs: support splice_write um: virtio_uml: fix memory leak on init failures um: virtio_uml: include linux/virtio-uml.h lib/logic_iomem: fix sparse warnings um: make PCI emulation driver init/exit static
2021-09-09Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds3-10/+25
Pull ARM development updates from Russell King: - Rename "mod_init" and "mod_exit" so that initcall debug output is actually useful (Randy Dunlap) - Update maintainers entries for linux-arm-kernel to indicate it is moderated for non-subscribers (Randy Dunlap) - Move install rules to arch/arm/Makefile (Masahiro Yamada) - Drop unnecessary ARCH_NR_GPIOS definition (Linus Walleij) - Don't warn about atags_to_fdt() stack size (David Heidelberg) - Speed up unaligned copy_{from,to}_kernel_nofault (Arnd Bergmann) - Get rid of set_fs() usage (Arnd Bergmann) - Remove checks for GCC prior to v4.6 (Geert Uytterhoeven) * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9118/1: div64: Remove always-true __div64_const32_is_OK() duplicate ARM: 9117/1: asm-generic: div64: Remove always-true __div64_const32_is_OK() ARM: 9116/1: unified: Remove check for gcc < 4 ARM: 9110/1: oabi-compat: fix oabi epoll sparse warning ARM: 9113/1: uaccess: remove set_fs() implementation ARM: 9112/1: uaccess: add __{get,put}_kernel_nofault ARM: 9111/1: oabi-compat: rework fcntl64() emulation ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation ARM: 9107/1: syscall: always store thread_info->abi_syscall ARM: 9109/1: oabi-compat: add epoll_pwait handler ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs() ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault ARM: 9105/1: atags_to_fdt: don't warn about stack size ARM: 9103/1: Drop ARCH_NR_GPIOS definition ARM: 9102/1: move theinstall rules to arch/arm/Makefile ARM: 9100/1: MAINTAINERS: mark all linux-arm-kernel@infradead list as moderated ARM: 9099/1: crypto: rename 'mod_init' & 'mod_exit' functions to be module-specific
2021-09-09Merge branch 'work.gfs2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull gfs2 setattr updates from Al Viro: "Make it possible for filesystems to use a generic 'may_setattr()' and switch gfs2 to using it" * 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gfs2: Switch to may_setattr in gfs2_setattr fs: Move notify_change permission checks into may_setattr
2021-09-09Merge branch 'work.init' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull root filesystem type handling updates from Al Viro: "Teach init/do_mounts.c to handle non-block filesystems, hopefully preventing even more special-cased kludges (such as root=/dev/nfs, etc)" * 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: simplify get_filesystem_list / get_all_fs_names init: allow mounting arbitrary non-blockdevice filesystems as root init: split get_fs_names
2021-09-09Merge branch 'work.iov_iter' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter fixes from Al Viro: "Fixes for io-uring handling of iov_iter reexpands" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: io_uring: reexpand under-reexpanded iters iov_iter: track truncated size