summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-10-14cifs: fix static checker warningPaulo Alcantara1-1/+1
Remove unnecessary NULL check of oparam->cifs_sb when parsing symlink error response as it's already set by all smb2_open_file() callers and deferenced earlier. This fixes below report: fs/cifs/smb2file.c:126 smb2_open_file() warn: variable dereferenced before check 'oparms->cifs_sb' (see line 112) Link: https://lore.kernel.org/r/Y0kt42j2tdpYakRu@kili Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13Merge tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds5-9/+54
Pull ceph updates from Ilya Dryomov: "A quiet round this time: several assorted filesystem fixes, the most noteworthy one being some additional wakeups in cap handling code, and a messenger cleanup" * tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client: ceph: remove Sage's git tree from documentation ceph: fix incorrectly showing the .snap size for stat ceph: fail the open_by_handle_at() if the dentry is being unlinked ceph: increment i_version when doing a setattr with caps ceph: Use kcalloc for allocating multiple elements ceph: no need to wait for transition RDCACHE|RD -> RD ceph: fail the request if the peer MDS doesn't support getvxattr op ceph: wake up the waiters if any new caps comes libceph: drop last_piece flag from ceph_msg_data_cursor
2022-10-13Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds17-36/+194
Pull NFS client updates from Anna Schumaker: "New Features: - Add NFSv4.2 xattr tracepoints - Replace xprtiod WQ in rpcrdma - Flexfiles cancels I/O on layout recall or revoke Bugfixes and Cleanups: - Directly use ida_alloc() / ida_free() - Don't open-code max_t() - Prefer using strscpy over strlcpy - Remove unused forward declarations - Always return layout states on flexfiles layout return - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of error - Allow more xprtrdma memory allocations to fail without triggering a reclaim - Various other xprtrdma clean ups - Fix rpc_killall_tasks() races" * tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked SUNRPC: Add API to force the client to disconnect SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls SUNRPC: Fix races with rpc_killall_tasks() xprtrdma: Fix uninitialized variable xprtrdma: Prevent memory allocations from driving a reclaim xprtrdma: Memory allocation should be allowed to fail during connect xprtrdma: MR-related memory allocation should be allowed to fail xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc() xprtrdma: Clean up synopsis of rpcrdma_req_create() svcrdma: Clean up RPCRDMA_DEF_GFP SUNRPC: Replace the use of the xprtiod WQ in rpcrdma NFSv4.2: Add a tracepoint for listxattr NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2 NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration NFSv4: remove nfs4_renewd_prepare_shutdown() declaration fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment NFSv4/pNFS: Always return layout stats on layout return for flexfiles ...
2022-10-13Merge tag 'for-linus-6.1-ofs1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs update from Mike Marshall: "Change iterate to iterate_shared" * tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: Orangefs: change iterate to iterate_shared
2022-10-13nfsd: ensure we always call fh_verify_error tracepointJeff Layton1-1/+1
This is a conditional tracepoint. Call it every time, not just when nfs_permission fails. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-10-13cifs: use ALIGN() and round_up() macrosEnzo Matsumiya5-38/+33
Improve code readability by using existing macros: Replace hardcoded alignment computations (e.g. (len + 7) & ~0x7) by ALIGN()/IS_ALIGNED() macros. Also replace (DIV_ROUND_UP(len, 8) * 8) with ALIGN(len, 8), which, if not optimized by the compiler, has the overhead of a multiplication and a division. Do the same for roundup() by replacing it by round_up() (division-less version, but requires the multiple to be a power of 2, which is always the case for us). And remove some unnecessary checks where !IS_ALIGNED() would fit, but calling round_up() directly is fine as it's a no-op if the value is already aligned. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: find and use the dentry for cached non-root directories alsoRonnie Sahlberg1-14/+49
This allows us to use cached attributes for the entries in a cached directory for as long as a lease is held on the directory itself. Previously we have always allowed "used cached attributes for 1 second" but this extends this to the lifetime of the lease as well as making the caching safer. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: enable caching of directories for which a lease is heldRonnie Sahlberg4-193/+263
This expands the directory caching to now cache an open handle for all directories (up to a maximum) and not just the root directory. In this patch, locking and refcounting is intended to work as so: The main function to get a reference to a cached handle is find_or_create_cached_dir() called from open_cached_dir() These functions are protected under the cfid_list_lock spin-lock to make sure we do not race creating new references for cached dirs with deletion of expired ones. An successful open_cached_dir() will take out 2 references to the cfid if this was the very first and successful call to open the directory and it acquired a lease from the server. One reference is for the lease and the other is for the cfid that we return. The is lease reference is tracked by cfid->has_lease. If the directory already has a handle with an active lease, then we just take out one new reference for the cfid and return it. It can happen that we have a thread that tries to open a cached directory where we have a cfid already but we do not, yet, have a working lease. In this case we will just return NULL, and this the caller will fall back to the case when no handle was available. In this model the total number of references we have on a cfid is 1 for while the handle is open and we have a lease, and one additional reference for each open instance of a cfid. Once we get a lease break (cached_dir_lease_break()) we remove the cfid from the list under the spinlock. This prevents any new threads to use it, and we also call smb2_cached_lease_break() via the work_queue in order to drop the reference we got for the lease (we drop it outside of the spin-lock.) Anytime a thread calls close_cached_dir() we also drop a reference to the cfid. When the last reference to the cfid is released smb2_close_cached_fid() will be invoked which will drop the reference ot the dentry we held for this cfid and it will also, if we the handle is open/has a lease also call SMB2_close() to close the handle on the server. Two events require special handling: invalidate_all_cached_dirs() this function is called from SMB2_tdis() and cifs_mark_open_files_invalid(). In both cases the tcon is either gone already or will be shortly so we do not need to actually close the handles. They will be dropped server side as part of the tcon dropping. But we have to be careful about a potential race with a concurrent lease break so we need to take out additional refences to avoid the cfid from being freed while we are still referencing it. free_cached_dirs() which is called from tconInfoFree(). This is called quite late in the umount process so there should no longer be any open handles or files and we can just free all the remaining data. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: prevent copying past input buffer boundariesPaulo Alcantara1-2/+2
Prevent copying past @data buffer in smb2_validate_and_copy_iov() as the output buffer in @iov might be potentially bigger and thus copying more bytes than requested in @minbufsize. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: fix uninitialised var in smb2_compound_op()Paulo Alcantara1-0/+1
Fix uninitialised variable @idata when calling smb2_compound_op() with SMB2_OP_POSIX_QUERY_INFO. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: improve symlink handling for smb2+Paulo Alcantara14-453/+451
When creating inode for symlink, the client used to send below requests to fill it in: * create+query_info+close (STATUS_STOPPED_ON_SYMLINK) * create(+reparse_flag)+query_info+close (set file attrs) * create+ioctl(get_reparse)+close (query reparse tag) and then for every access to the symlink dentry, the ->link() method would send another: * create+ioctl(get_reparse)+close (parse symlink) So, in order to improve: (i) Get rid of unnecessary roundtrips and then resolve symlinks as follows: * create+query_info+close (STATUS_STOPPED_ON_SYMLINK + parse symlink + get reparse tag) * create(+reparse_flag)+query_info+close (set file attrs) (ii) Set the resolved symlink target directly in inode->i_link and use simple_get_link() for ->link() to simply return it. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13smb3: clarify multichannel warningSteve French1-1/+2
When server does not return network interfaces, clarify the message to indicate that "multichannel not available" not just that "empty network interface returned by server ..." Suggested-by: Tom Talpey <tom@talpey.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: fix regression in very old smb1 mountsRonnie Sahlberg1-6/+5
BZ: 215375 Fixes: 76a3c92ec9e0 ("cifs: remove support for NTLM and weaker authentication algorithms") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-12ext4,f2fs: fix readahead of verity dataMatthew Wilcox (Oracle)2-2/+4
The recent change of page_cache_ra_unbounded() arguments was buggy in the two callers, causing us to readahead the wrong pages. Move the definition of ractl down to after the index is set correctly. This affected performance on configurations that use fs-verity. Link: https://lkml.kernel.org/r/20221012193419.1453558-1-willy@infradead.org Fixes: 73bb49da50cd ("mm/readahead: make page_cache_ra_unbounded take a readahead_control") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Jintao Yin <nicememory@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12Merge tag 'mm-hotfixes-stable-2022-10-11' of ↵Linus Torvalds2-5/+21
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one is not" * tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix leak of nilfs_root in case of writer thread creation failure nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level() nilfs2: fix use-after-free bug of struct nilfs_root mm/damon/core: initialize damon_target->list in damon_new_target() mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
2022-10-12Merge tag 'mm-nonmm-stable-2022-10-11' of ↵Linus Torvalds30-175/+260
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - hfs and hfsplus kmap API modernization (Fabio Francesco) - make crash-kexec work properly when invoked from an NMI-time panic (Valentin Schneider) - ntfs bugfixes (Hawkins Jiawei) - improve IPC msg scalability by replacing atomic_t's with percpu counters (Jiebin Sun) - nilfs2 cleanups (Minghao Chi) - lots of other single patches all over the tree! * tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype proc: test how it holds up with mapping'less process mailmap: update Frank Rowand email address ia64: mca: use strscpy() is more robust and safer init/Kconfig: fix unmet direct dependencies ia64: update config files nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure fork: remove duplicate included header files init/main.c: remove unnecessary (void*) conversions proc: mark more files as permanent nilfs2: remove the unneeded result variable nilfs2: delete unnecessary checks before brelse() checkpatch: warn for non-standard fixes tag style usr/gen_init_cpio.c: remove unnecessary -1 values from int file ipc/msg: mitigate the lock contention with percpu counter percpu: add percpu_counter_add_local and percpu_counter_sub_local fs/ocfs2: fix repeated words in comments relay: use kvcalloc to alloc page array in relay_alloc_page_array proc: make config PROC_CHILDREN depend on PROC_FS fs: uninline inode_maybe_inc_iversion() ...
2022-10-11nilfs2: fix leak of nilfs_root in case of writer thread creation failureRyusuke Konishi1-4/+3
If nilfs_attach_log_writer() failed to create a log writer thread, it frees a data structure of the log writer without any cleanup. After commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes a leak of struct nilfs_root, which started to leak an ifile metadata inode and a kobject on that struct. In addition, if the kernel is booted with panic_on_warn, the above ifile metadata inode leak will cause the following panic when the nilfs2 kernel module is removed: kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when called from nilfs_destroy_cachep+0x16/0x3a [nilfs2] WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140 ... RIP: 0010:kmem_cache_destroy+0x138/0x140 Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48 ... Call Trace: <TASK> ? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2] nilfs_destroy_cachep+0x16/0x3a [nilfs2] exit_nilfs_fs+0xa/0x1b [nilfs2] __x64_sys_delete_module+0x1d9/0x3a0 ? __sanitizer_cov_trace_pc+0x1a/0x50 ? syscall_trace_enter.isra.19+0x119/0x190 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> Kernel panic - not syncing: panic_on_warn set ... This patch fixes these issues by calling nilfs_detach_log_writer() cleanup function if spawning the log writer thread fails. Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com Fixes: e912a5b66837 ("nilfs2: use root object to get ifile") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+7381dc4ad60658ca4c05@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()Ryusuke Konishi1-0/+2
If the i_mode field in inode of metadata files is corrupted on disk, it can cause the initialization of bmap structure, which should have been called from nilfs_read_inode_common(), not to be called. This causes a lockdep warning followed by a NULL pointer dereference at nilfs_bmap_lookup_at_level(). This patch fixes these issues by adding a missing sanitiy check for the i_mode field of metadata file's inode. Link: https://lkml.kernel.org/r/20221002030804.29978-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+2b32eb36c1a825b7a74c@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11nilfs2: fix use-after-free bug of struct nilfs_rootRyusuke Konishi1-1/+16
If the beginning of the inode bitmap area is corrupted on disk, an inode with the same inode number as the root inode can be allocated and fail soon after. In this case, the subsequent call to nilfs_clear_inode() on that bogus root inode will wrongly decrement the reference counter of struct nilfs_root, and this will erroneously free struct nilfs_root, causing kernel oopses. This fixes the problem by changing nilfs_new_inode() to skip reserved inode numbers while repairing the inode bitmap. Link: https://lkml.kernel.org/r/20221003150519.39789-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b8c672b0e22615c80fe0@syzkaller.appspotmail.com Reported-by: Khalid Masum <khalid.masum.92@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failureRyusuke Konishi1-4/+10
If creation or finalization of a checkpoint fails due to anomalies in the checkpoint metadata on disk, a kernel warning is generated. This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted with panic_on_warn, does not panic. A nilfs_error is appropriate here to handle the abnormal filesystem condition. This also replaces the detected error codes with an I/O error so that neither of the internal error codes is returned to callers. Link: https://lkml.kernel.org/r/20220929123330.19658-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+fbb3e0b24e8dae5a16ee@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11treewide: use get_random_bytes() when possibleJason A. Donenfeld1-1/+1
The prandom_bytes() function has been a deprecated inline wrapper around get_random_bytes() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld12-16/+16
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 2Jason A. Donenfeld2-5/+3
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done by hand, covering things that coccinelle could not do on its own. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext2, ext4, and sbitmap Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld11-26/+25
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11cifs: fix skipping to incorrect offset in emit_cached_direntsRonnie Sahlberg1-6/+23
When application has done lseek() to a different offset on a directory fd we skipped one entry too many before we start emitting directory entries from the cache. We need to also make sure that when we are starting to emit directory entries from the cache, the ->pos sequence might have holes and skip some indices. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-11NFSD: unregister shrinker when nfsd_init_net() failsTetsuo Handa1-1/+3
syzbot is reporting UAF read at register_shrinker_prepared() [1], for commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") missed that nfsd4_leases_net_shutdown() from nfsd_exit_net() is called only when nfsd_init_net() succeeded. If nfsd_init_net() fails due to nfsd_reply_cache_init() failure, register_shrinker() from nfsd4_init_leases_net() has to be undone before nfsd_init_net() returns. Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89 [1] Reported-by: syzbot <syzbot+ff796f04613b4c84ad89@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-10-11btrfs: ignore fiemap path cache if we have multiple leaves for a data extentFilipe Manana2-0/+26
The path cache used during fiemap used to determine the sharedness of extent buffers in a path from a leaf containing a file extent item pointing to our data extent up to the root node of the tree, is meant to be used for a single path. Having a single path is by far the most common case, and therefore worth to optimize for, but it's possible to actually have multiple paths because we have 2 or more leaves. If we have multiple leaves, the 'level' variable keeps getting incremented in each iteration of the while loop at btrfs_is_data_extent_shared(), which means we will treat the second leaf in the 'tmp' ulist as a level 1 node, and so forth. In the worst case this can lead to getting a level greater than or equals to BTRFS_MAX_LEVEL (8), which will trigger a WARN_ON_ONCE() in the functions to lookup from or store in the path cache (lookup_backref_shared_cache() and store_backref_shared_cache()). If the current level never goes beyond 8, due to shared nodes in the paths and a fs tree height smaller than 8, it can still result in incorrectly marking one leaf as shared because some other leaf is shared and is stored one level below that other leaf, as when storing a true sharedness value in the cache results in updating the sharedness to true of all entries in the cache below the current level. Having multiple leaves happens in a case like the following: - We have a file extent item point to data extent at bytenr X, for a file range [0, 1M[ for example; - At this moment we have an extent data ref for the extent, with an offset of 0 and a count of 1; - A write into the middle of the extent happens, file range [64K, 128K) so the file extent item is split into two (at btrfs_drop_extents()): 1) One for file range [0, 64K), with a length (num_bytes field) of 64K and an extent offset of 0; 2) Another one for file range [128K, 1M), with a length of 896K (1M - 128K) and an extent offset of 128K. - At this moment the two file extent items are located in the same leaf; - A new file extent item for the range [64K, 128K), pointing to a new data extent, is inserted in the leaf. This results in a leaf split and now those two file extent items pointing to data extent X end up located in different leaves; - Once delayed refs are run, we still have a single extent data ref item for our data extent at bytenr X, for offset 0, but now with a count of 2 instead of 1; - So during fiemap, at btrfs_is_data_extent_shared(), after we call find_parent_nodes() for the data extent, we get two leaves, since we have two file extent items point to data extent at bytenr X that are located in two different leaves. So skip the use of the path cache when we get more than one leaf. Fixes: 12a824dc67a61e ("btrfs: speedup checking for extent sharedness during fiemap") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11btrfs: fix processing of delayed tree block refs during backref walkingFilipe Manana1-6/+7
During backref walking, when processing a delayed reference with a type of BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there: 1) We are accessing the delayed references extent_op, and its key, without the protection of the delayed ref head's lock; 2) If there's no extent op for the delayed ref head, we end up with an uninitialized key in the stack, variable 'tmp_op_key', and then pass it to add_indirect_ref(), which adds the reference to the indirect refs rb tree. This is wrong, because indirect references should have a NULL key when we don't have access to the key, and in that case they should be added to the indirect_missing_keys rb tree and not to the indirect rb tree. This means that if have BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting from freeing an extent buffer, therefore with a count of -1, it will not cancel out the corresponding reference we have in the extent tree (with a count of 1), since both references end up in different rb trees. When using fiemap, where we often need to check if extents are shared through shared subtrees resulting from snapshots, it means we can incorrectly report an extent as shared when it's no longer shared. However this is temporary because after the transaction is committed the extent is no longer reported as shared, as running the delayed reference results in deleting the tree block reference from the extent tree. Outside the fiemap context, the result is unpredictable, as the key was not initialized but it's used when navigating the rb trees to insert and search for references (prelim_ref_compare()), and we expect all references in the indirect rb tree to have valid keys. The following reproducer triggers the second bug: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount -o compress $DEV $MNT # With a compressed 128M file we get a tree height of 2 (level 1 root). xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo btrfs subvolume snapshot $MNT $MNT/snap # Fiemap should output 0x2008 in the flags column. # 0x2000 means shared extent # 0x8 means encoded extent (because it's compressed) echo echo "fiemap after snapshot, range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo # Overwrite one extent and fsync to flush delalloc and COW a new path # in the snapshot's tree. # # After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type # BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent # buffer in the path. # # In the extent tree we have inline references of type # BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent # buffers, so they should cancel each other, and the extent buffers in # the fs tree should no longer be considered as shared. # echo "Overwriting file range [120M, 120M + 128K)..." xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo xfs_io -c "fsync" $MNT/snap/foo # Fiemap should output 0x8 in the flags column. The extent in the range # [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs # tree. echo echo "fiemap after overwrite range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo umount $MNT Running it before this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 The extent in the range [120M, 120M + 128K) is still reported as shared (0x2000 bit set) after overwriting that range and flushing delalloc, which is not correct - an entire path was COWed in the snapshot's tree and the extent is now only referenced by the original fs tree. Running it after this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x8 Now the extent is not reported as shared anymore. So fix this by passing a NULL key pointer to add_indirect_ref() when processing a delayed reference for a tree block if there's no extent op for our delayed ref head with a defined key. Also access the extent op only after locking the delayed ref head's lock. The reproducer will be converted later to a test case for fstests. Fixes: 86d5f994425252 ("btrfs: convert prelimary reference tracking to use rbtrees") Fixes: a6dbceafb915e8 ("btrfs: Remove unused op_key var from add_delayed_refs") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11btrfs: fix processing of delayed data refs during backref walkingFilipe Manana1-9/+24
When processing delayed data references during backref walking and we are using a share context (we are being called through fiemap), whenever we find a delayed data reference for an inode different from the one we are interested in, then we immediately exit and consider the data extent as shared. This is wrong, because: 1) This might be a DROP reference that will cancel out a reference in the extent tree; 2) Even if it's an ADD reference, it may be followed by a DROP reference that cancels it out. In either case we should not exit immediately. Fix this by never exiting when we find a delayed data reference for another inode - instead add the reference and if it does not cancel out other delayed reference, we will exit early when we call extent_is_shared() after processing all delayed references. If we find a drop reference, then signal the code that processes references from the extent tree (add_inline_refs() and add_keyed_refs()) to not exit immediately if it finds there a reference for another inode, since we have delayed drop references that may cancel it out. In this later case we exit once we don't have references in the rb trees that cancel out each other and have two references for different inodes. Example reproducer for case 1): $ cat test-1.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 64K" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar echo echo "fiemap after cloning:" xfs_io -c "fiemap -v" $MNT/foo rm -f $MNT/bar echo echo "fiemap after removing file bar:" xfs_io -c "fiemap -v" $MNT/foo umount $MNT Running it before this patch, the extent is still listed as shared, it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set: $ ./test-1.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 Example reproducer for case 2): $ cat test-2.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 64K" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar # Flush delayed references to the extent tree and commit current # transaction. sync echo echo "fiemap after cloning:" xfs_io -c "fiemap -v" $MNT/foo rm -f $MNT/bar echo echo "fiemap after removing file bar:" xfs_io -c "fiemap -v" $MNT/foo umount $MNT Running it before this patch, the extent is still listed as shared, it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set: $ ./test-2.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 After this patch, after deleting bar in both tests, the extent is not reported with the 0x2000 flag anymore, it gets only the flag 0x1 (which is FIEMAP_EXTENT_LAST): $ ./test-1.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x1 $ ./test-2.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x1 These tests will later be converted to a test case for fstests. Fixes: dc046b10c8b7d4 ("Btrfs: make fiemap not blow when you have lots of snapshots") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11btrfs: delete stale comments after merge conflict resolutionDavid Sterba1-2/+0
There are two comments in btrfs_cache_block_group that I left when resolving conflict between commits ced8ecf026fd8 "btrfs: fix space cache corruption and potential double allocations" and 527c490f44f6f "btrfs: delete btrfs_wait_space_cache_v1_finished". The former reworked the caching logic to wait until the caching ends in btrfs_cache_block_group while the latter only open coded the waiting. Both removed btrfs_wait_space_cache_v1_finished, the correct code is with the waiting and returning error. Thus the conflict resolution was OK. Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11btrfs: unlock locked extent area if we have contentionJosef Bacik1-7/+8
In production we hit the following deadlock task 1 task 2 task 3 ------ ------ ------ fiemap(file) falloc(file) fsync(file) write(0, 1MiB) btrfs_commit_transaction() wait_on(!pending_ordered) lock(512MiB, 1GiB) start_transaction wait_on_transaction lock(0, 1GiB) wait_extent_bit(512MiB) task 4 ------ finish_ordered_extent(0, 1MiB) lock(0, 1MiB) **DEADLOCK** This occurs because when task 1 does it's lock, it locks everything from 0-512MiB, and then waits for the 512MiB chunk to unlock. task 2 will never unlock because it's waiting on the transaction commit to happen, the transaction commit is waiting for the outstanding ordered extents, and then the ordered extent thread is blocked waiting on the 0-1MiB range to unlock. To fix this we have to clear anything we've locked so far, wait for the extent_state that we contended on, and then try to re-lock the entire range again. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11btrfs: send: update command for protocol version checkDavid Sterba1-1/+4
For a protocol and command compatibility we have a helper that hasn't been updated for v3 yet. We use it for verity so update where necessary. Fixes: 38622010a6de ("btrfs: send: add support for fs-verity") Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUGBoris Burkov2-1/+6
We haven't finalized send stream v3 yet, so gate the send stream version behind CONFIG_BTRFS_DEBUG as we want some way to test it. The original verity send did not check the protocol version, so add that actual protection as well. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-10Merge tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds21-98/+116
Pull xfs updates from Dave Chinner: "There are relatively few updates this cycle; half the cycle was eaten by a grue, the other half was eaten by a tricky data corruption issue that I still haven't entirely solved. Hence there's no major changes in this cycle and it's largely just minor cleanups and small bug fixes: - fixes for filesystem shutdown procedure during a DAX memory failure notification - bug fixes - logic cleanups - log message cleanups - updates to use vfs{g,u}id_t helpers where appropriate" * tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: on memory failure, only shut down fs after scanning all mappings xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx xfs: trim the mapp array accordingly in xfs_da_grow_inode_int xfs: do not need to check return value of xlog_kvmalloc() xfs: port to vfs{g,u}id_t and associated helpers xfs: remove xfs_setattr_time() declaration xfs: Remove the unneeded result variable xfs: missing space in xfs trace log xfs: simplify if-else condition in xfs_reflink_trim_around_shared xfs: simplify if-else condition in xfs_validate_new_dalign xfs: replace unnecessary seq_printf with seq_puts xfs: clean up "%Ld/%Lu" which doesn't meet C standard xfs: remove redundant else for clean code xfs: remove the redundant word in comment
2022-10-10Merge tag 'f2fs-for-6.1-rc1' of ↵Linus Torvalds22-199/+465
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This round looks fairly small comparing to the previous updates and includes mostly minor bug fixes. Nevertheless, as we've still interested in improving the stability, Chao added some debugging methods to diagnoze subtle runtime inconsistency problem. Enhancements: - store all the corruption or failure reasons in superblock - detect meta inode, summary info, and block address inconsistency - increase the limit for reserve_root for low-end devices - add the number of compressed IO in iostat Bug fixes: - DIO write fix for zoned devices - do out-of-place writes for cold files - fix some stat updates (FS_CP_DATA_IO, dirty page count) - fix race condition on setting FI_NO_EXTENT flag - fix data races when freezing super - fix wrong continue condition check in GC - do not allow ATGC for LFS mode In addition, there're some code enhancement and clean-ups as usual" * tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits) f2fs: change to use atomic_t type form sbi.atomic_files f2fs: account swapfile inodes f2fs: allow direct read for zoned device f2fs: support recording errors into superblock f2fs: support recording stop_checkpoint reason into super_block f2fs: remove the unnecessary check in f2fs_xattr_fiemap f2fs: introduce cp_status sysfs entry f2fs: fix to detect corrupted meta ino f2fs: fix to account FS_CP_DATA_IO correctly f2fs: code clean and fix a type error f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file f2fs: fix to do sanity check on summary info f2fs: port to vfs{g,u}id_t and associated helpers f2fs: fix to do sanity check on destination blkaddr during recovery f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT f2fs: fix race condition on setting FI_NO_EXTENT flag f2fs: remove redundant check in f2fs_sanity_check_cluster f2fs: add static init_idisk_time function to reduce the code f2fs: fix typo f2fs: fix wrong dirty page count when race between mmap and fallocate. ...
2022-10-10Merge tag 'gfs2-nopid-for-v6.1' of ↵Linus Torvalds8-25/+247
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 debugfs updates from Andreas Gruenbacher: - Improve the way how the state of glocks is reported in debugfs for glocks which are not held by processes, but rather by other resouces like cached inodes or flocks. * tag 'gfs2-nopid-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Mark the remaining process-independent glock holders as GL_NOPID gfs2: Mark flock glock holders as GL_NOPID gfs2: Add GL_NOPID flag for process-independent glock holders gfs2: Add flocks to glockfd debugfs file gfs2: Add glockfd debugfs file
2022-10-10Merge tag 'gfs2-v6.0-rc2-fixes' of ↵Linus Torvalds6-25/+77
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Make sure to initialize the filesystem work queues before registering the filesystem; this prevents them from being used uninitialized. - On filesystem withdraw: prevent a a double iput() and immediately reject pending locking requests that can no longer succeed. - Use TRY lock in gfs2_inode_lookup() to prevent a rare glock hang during evict. - During filesystem mount, explicitly make sure that the sb_bsize and sb_bsize_shift super block fields are consistent with each other. This prevents messy error messages during fuzz testing. - Switch from strlcpy to strscpy. * tag 'gfs2-v6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Register fs after creating workqueues gfs2: Check sb_bsize_shift after reading superblock gfs2: Switch from strlcpy to strscpy gfs2: Clear flags when withdraw prevents xmote gfs2: Dequeue waiters when withdrawn gfs2: Prevent double iput for journal on error gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes
2022-10-10Merge tag '6.1-rc-smb3-client-fixes-part1' of ↵Linus Torvalds29-424/+444
git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: - data corruption fix when cache disabled - four RDMA (smbdirect) improvements, including enabling support for SoftiWARP - four signing improvements - three directory lease improvements - four cleanup fixes - minor security fix - two debugging improvements * tag '6.1-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (21 commits) smb3: fix oops in calculating shash_setkey cifs: secmech: use shash_desc directly, remove sdesc smb3: rename encryption/decryption TFMs cifs: replace kfree() with kfree_sensitive() for sensitive data cifs: remove initialization value cifs: Replace a couple of one-element arrays with flexible-array members smb3: do not log confusing message when server returns no network interfaces smb3: define missing create contexts cifs: store a pointer to a fid in the cfid structure instead of the struct cifs: improve handlecaching cifs: Make tcon contain a wrapper structure cached_fids instead of cached_fid smb3: add dynamic trace points for tree disconnect Fix formatting of client smbdirect RDMA logging Handle variable number of SGEs in client smbdirect send. Reduce client smbdirect max receive segment size Decrease the number of SMB3 smbdirect client SGEs cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message cifs: destage dirty pages before re-reading them for cache=none cifs: return correct error in ->calc_signature() MAINTAINERS: Add Tom Talpey as cifs.ko reviewer ...
2022-10-10Merge tag 'nfsd-6.1-1' of ↵Linus Torvalds1-59/+29
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: - filecache code clean-ups * tag 'nfsd-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: rework hashtable handling in nfsd_do_file_acquire nfsd: fix nfsd_file_unhash_and_dispose
2022-10-10Merge tag 'pull-tmpfile' of ↵Linus Torvalds19-227/+262
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs tmpfile updates from Al Viro: "Miklos' ->tmpfile() signature change; pass an unopened struct file to it, let it open the damn thing. Allows to add tmpfile support to FUSE" * tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse: implement ->tmpfile() vfs: open inside ->tmpfile() vfs: move open right after ->tmpfile() vfs: make vfs_tmpfile() static ovl: use vfs_tmpfile_open() helper cachefiles: use vfs_tmpfile_open() helper cachefiles: only pass inode to *mark_inode_inuse() helpers cachefiles: tmpfile error handling cleanup hugetlbfs: cleanup mknod and tmpfile vfs: add vfs_tmpfile_open() helper
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds35-472/+643
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-10Merge tag 'xtensa-20221010' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds1-1/+1
Pull xtensa updates from Max Filippov: - add support for FDPIC and static PIE executable formats for noMMU * tag 'xtensa-20221010' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: add FDPIC and static PIE support for noMMU xtensa: clean up ELF_PLAT_INIT macro
2022-10-10Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linuxLinus Torvalds1-2/+2
Pull bitmap updates from Yury Norov: - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld) - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me) This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. - optimize find_bit() functions (me) Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. - add find_nth_bit() (me) Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. - repair cpumask_check() (me) After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin Schneider) Extends the API with one more function and applies it in sched/core. * tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits) sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() lib/test_cpumask: Add for_each_cpu_and(not) tests cpumask: Introduce for_each_cpu_andnot() lib/find_bit: Introduce find_next_andnot_bit() cpumask: fix checking valid cpu range lib/bitmap: add tests for for_each() loops lib/find: optimize for_each() macros lib/bitmap: introduce for_each_set_bit_wrap() macro lib/find_bit: add find_next{,_and}_bit_wrap cpumask: switch for_each_cpu{,_not} to use for_each_bit() net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and} cpumask: add cpumask_nth_{,and,andnot} lib/bitmap: remove bitmap_ord_to_pos lib/bitmap: add tests for find_nth_bit() lib: add find_nth{,_and,_andnot}_bit() lib/bitmap: add bitmap_weight_and() lib/bitmap: don't call __bitmap_weight() in kernel code tools: sync find_bit() implementation lib/find_bit: optimize find_next_bit() functions lib/find_bit: create find_first_zero_bit_le() ...
2022-10-10Merge tag 'sysctl-6.1-rc1' of ↵Linus Torvalds1-8/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "Just some boring cleanups on the sysctl front for this release" * tag 'sysctl-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} kernel/sysctl.c: move sysctl_vals and sysctl_long_vals to sysctl.c sysctl: remove max_extfrag_threshold kernel/sysctl.c: remove unnecessary (void*) conversions proc: remove initialization assignment
2022-10-10Merge tag 'printk-for-6.1' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Initialize pointer hashing using the system workqueue. It avoids taking locks in printk()/vsprintf() code path - Misc code clean up * tag 'printk-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Mark __printk percpu data ready __ro_after_init printk: Remove bogus comment vs. boot consoles printk: Remove write only variable nr_ext_console_drivers printk: Declare log_wait properly printk: Make pr_flush() static lib/vsprintf: Initialize vsprintf's pointer hash once the random core is ready. lib/vsprintf: Remove static_branch_likely() from __ptr_to_hashval(). lib/vnsprintf: add const modifier for param 'bitmap'
2022-10-10Merge tag 'sched-rt-2022-10-05' of ↵Linus Torvalds1-11/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull preempt RT updates from Thomas Gleixner: "Introduce preempt_[dis|enable_nested() and use it to clean up various places which have open coded PREEMPT_RT conditionals. On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither disabling preemption nor interrupts. Though there are a few places which depend on the implicit preemption/interrupt disable of those locks, e.g. seqcount write sections, per CPU statistics updates etc. PREEMPT_RT added open coded CONFIG_PREEMPT_RT conditionals to disable/enable preemption in the related code parts all over the place. That's hard to read and does not really explain why this is necessary. Linus suggested to use helper functions (preempt_disable_nested() and preempt_enable_nested()) and use those in the affected places. On !RT enabled kernels these functions are NOPs, but contain a lockdep assert to validate that preemption is actually disabled to catch call sites which do not have preemption disabled. Clean up the affected code paths in mm, dentry and lib" * tag 'sched-rt-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: u64_stats: Streamline the implementation flex_proportions: Disable preemption entering the write section. mm/compaction: Get rid of RT ifdeffery mm/memcontrol: Replace the PREEMPT_RT conditionals mm/debug: Provide VM_WARN_ON_IRQS_ENABLED() mm/vmstat: Use preempt_[dis|en]able_nested() dentry: Use preempt_[dis|en]able_nested() preempt: Provide preempt_[dis|en]able_nested()
2022-10-10Merge tag 'sched-core-2022-10-07' of ↵Linus Torvalds10-32/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Debuggability: - Change most occurances of BUG_ON() to WARN_ON_ONCE() - Reorganize & fix TASK_ state comparisons, turn it into a bitmap - Update/fix misc scheduler debugging facilities Load-balancing & regular scheduling: - Improve the behavior of the scheduler in presence of lot of SCHED_IDLE tasks - in particular they should not impact other scheduling classes. - Optimize task load tracking, cleanups & fixes - Clean up & simplify misc load-balancing code Freezer: - Rewrite the core freezer to behave better wrt thawing and be simpler in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting all the fallout. Deadline scheduler: - Fix the DL capacity-aware code - Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period() - Relax/optimize locking in task_non_contending() Cleanups: - Factor out the update_current_exec_runtime() helper - Various cleanups, simplifications" * tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched: Fix more TASK_state comparisons sched: Fix TASK_state comparisons sched/fair: Move call to list_last_entry() in detach_tasks sched/fair: Cleanup loop_max and loop_break sched/fair: Make sure to try to detach at least one movable task sched: Show PF_flag holes freezer,sched: Rewrite core freezer logic sched: Widen TAKS_state literals sched/wait: Add wait_event_state() sched/completion: Add wait_for_completion_state() sched: Add TASK_ANY for wait_task_inactive() sched: Change wait_task_inactive()s match_state freezer,umh: Clean up freezer/initrd interaction freezer: Have {,un}lock_system_sleep() save/restore flags sched: Rename task_running() to task_on_cpu() sched/fair: Cleanup for SIS_PROP sched/fair: Default to false in test_idle_cores() sched/fair: Remove useless check in select_idle_core() sched/fair: Avoid double search on same cpu sched/fair: Remove redundant check in select_idle_smt() ...
2022-10-09Merge tag 'ucount-rlimits-cleanups-for-v5.19' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts update from Eric Biederman: "Split rlimit and ucount values and max values After the ucount rlimit code was merged a bunch of small but siginificant bugs were found and fixed. At the time it was realized that part of the problem was that while the ucount rlimits were very similar to the oridinary ucounts (in being nested counts with limits) the semantics were slightly different and the code would be less error prone if there was less sharing. This is the long awaited cleanup that should hopefully keep things more comprehensible and less error prone for whoever needs to touch that code next" * tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Split rlimit and ucount values and max values
2022-10-09Merge tag 'signal-for-v5.20' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace update from Eric Biederman: "ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Recently I had a conversation where it was pointed out to me that SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite difficult for a tracer to handle. Keeping SIGKILL working after the process has been killed is pain from an implementation point of view. So since the debuggers don't want this behavior let's see if we can remove this wart for the userspace API If a regression is detected it should only need to be the last change that is the reverted. The other two are just general cleanups that make the last patch simpler" * tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Drop signals received after a fatal signal has been processed signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
2022-10-09gfs2: Merge branch 'for-next.nopid' into for-nextAndreas Gruenbacher8-25/+247
Resolves a conflict in gfs2_inode_lookup() between the following commits: gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes gfs2: Mark the remaining process-independent glock holders as GL_NOPID Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>