Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull smb server fixes from Steve French:
"Six SMB3 server fixes for various races found by RO0T Lab of Huawei:
- Fix oops when racing between oplock break ack and freeing file
- Simultaneous request fixes for parallel logoffs, and for parallel
lock requests
- Fixes for tree disconnect race, session expire race, and close/open
race"
* tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix race condition between tree conn lookup and disconnect
ksmbd: fix race condition from parallel smb2 lock requests
ksmbd: fix race condition from parallel smb2 logoff requests
ksmbd: fix uaf in smb20_oplock_break_ack
ksmbd: fix race condition with fp
ksmbd: fix race condition between session lookup and expire
|
|
Pull smb client fixes from Steve French:
- protect cifs/smb3 socket connect from BPF address overwrite
- fix case when directory leases disabled but wasting resources with
unneeded thread on each mount
* tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: do not start laundromat thread on nohandlecache
smb: use kernel_connect() and kernel_bind()
|
|
Pull xfs fixes from Chandan Babu:
- Prevent filesystem hang when executing fstrim operations on large and
slow storage
* tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- reject unknown mount options
- adjust transaction abort error message level
- fix one more build warning with -Wmaybe-uninitialized
- proper error handling in several COW-related cases
* tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: error out when reallocating block for defrag using a stale transaction
btrfs: error when COWing block from a root that is being deleted
btrfs: error out when COWing block using a stale transaction
btrfs: always print transaction aborted messages with an error level
btrfs: reject unknown mount options early
btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Fix a memory leak issue when using LZMA global compressed
deduplication
- Fix empty device tags in flatdev mode
- Update documentation for recent new features
* tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: allow empty device tags in flatdev mode
erofs: fix memory leak of LZMA global compressed deduplication
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
- Fix for file reference leak regression
- Fix for NULL pointer deref regression
- Fixes for RCU-walk race regressions:
Two of the fixes were taken from Al's RCU pathwalk race fixes series
with his consent [1].
Note that unlike most of Al's series, these two patches are not about
racing with ->kill_sb() and they are also very recent regressions
from v6.5, so I think it's worth getting them into v6.5.y.
There is also a fix for an RCU pathwalk race with ->kill_sb(), which
may have been solved in vfs generic code as you suggested, but it
also rids overlayfs from a nasty hack, so I think it's worth anyway.
Link: https://lore.kernel.org/linux-fsdevel/20231003204749.GA800259@ZenIV/ [1]
* tag 'ovl-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix NULL pointer defer when encoding non-decodable lower fid
ovl: make use of ->layers safe in rcu pathwalk
ovl: fetch inode once in ovl_dentry_revalidate_common()
ovl: move freeing ovl_entry past rcu delay
ovl: fix file reference leak when submitting aio
|
|
if thread A in smb2_write is using work-tcon, other thread B use
smb2_tree_disconnect free the tcon, then thread A will use free'd tcon.
Time
+
Thread A | Thread A
smb2_write | smb2_tree_disconnect
|
|
| kfree(tree_conn)
|
// UAF! |
work->tcon->share_conf |
+
This patch add state, reference count and lock for tree conn to fix race
condition issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There is a race condition issue between parallel smb2 lock request.
Time
+
Thread A | Thread A
smb2_lock | smb2_lock
|
insert smb_lock to lock_list |
spin_unlock(&work->conn->llist_lock) |
|
| spin_lock(&conn->llist_lock);
| kfree(cmp_lock);
|
// UAF! |
list_add(&smb_lock->llist, &rollback_list) +
This patch swaps the line for adding the smb lock to the rollback list and
adding the lock list of connection to fix the race issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If parallel smb2 logoff requests come in before closing door, running
request count becomes more than 1 even though connection status is set to
KSMBD_SESS_NEED_RECONNECT. It can't get condition true, and sleep forever.
This patch fix race condition problem by returning error if connection
status was already set to KSMBD_SESS_NEED_RECONNECT.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
drop reference after use opinfo.
Signed-off-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
fp can used in each command. If smb2_close command is coming at the
same time, UAF issue can happen by race condition.
Time
+
Thread A | Thread B1 B2 .... B5
smb2_open | smb2_close
|
__open_id |
insert fp to file_table |
|
| atomic_dec_and_test(&fp->refcount)
| if fp->refcount == 0, free fp by kfree.
// UAF! |
use fp |
+
This patch add f_state not to use freed fp is used and not to free fp in
use.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Thread A + Thread B
ksmbd_session_lookup | smb2_sess_setup
sess = xa_load |
|
| xa_erase(&conn->sessions, sess->id);
|
| ksmbd_session_destroy(sess) --> kfree(sess)
|
// UAF! |
sess->last_active = jiffies |
+
This patch add rwsem to fix race condition between ksmbd_session_lookup
and ksmbd_expire_session.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Honor 'nohandlecache' mount option by not starting laundromat thread
even when SMB server supports directory leases. Do not waste system
resources by having laundromat thread running with no directory
caching at all.
Fixes: 2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Recent changes to kernel_connect() and kernel_bind() ensure that
callers are insulated from changes to the address parameter made by BPF
SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and
ops->bind() with kernel_connect() and kernel_bind() to ensure that SMB
mounts do not see their mount address overwritten in such cases.
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Jordan Rife <jrife@google.com>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
At btrfs_realloc_node() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger two WARN_ON(). This however is a
critical problem, highly unexpected and if it happens it's most likely due
to a bug, so we should error out and turn the fs into error state so that
such issue is much more easily noticed if it's triggered.
The problem is critical because in btrfs_realloc_node() we COW tree blocks,
and using such stale transaction will lead to not persisting the extent
buffers used for the COW operations, as allocating tree block adds the
range of the respective extent buffers to the ->dirty_pages iotree of the
transaction, and a stale transaction, in the unlocked state or higher,
will not flush dirty extent buffers anymore, therefore resulting in not
persisting the tree block and resource leaks (not cleaning the dirty_pages
iotree for example).
So do the following changes:
1) Return -EUCLEAN if we find a stale transaction;
2) Turn the fs into error state, with error -EUCLEAN, so that no
transaction can be committed, and generate a stack trace;
3) Combine both conditions into a single if statement, as both are related
and have the same error message;
4) Mark the check as unlikely, since this is not expected to ever happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At btrfs_cow_block() we check if the block being COWed belongs to a root
that is being deleted and if so we log an error message. However this is
an unexpected case and it indicates a bug somewhere, so we should return
an error and abort the transaction. So change this in the following ways:
1) Abort the transaction with -EUCLEAN, so that if the issue ever happens
it can easily be noticed;
2) Change the logged message level from error to critical, and change the
message itself to print the block's logical address and the ID of the
root;
3) Return -EUCLEAN to the caller;
4) As this is an unexpected scenario, that should never happen, mark the
check as unlikely, allowing the compiler to potentially generate better
code.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At btrfs_cow_block() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger a WARN with a message and a stack
trace. This however is a critical problem, highly unexpected and if it
happens it's most likely due to a bug, so we should error out and turn the
fs into error state so that such issue is much more easily noticed if it's
triggered.
The problem is critical because using such stale transaction will lead to
not persisting the extent buffer used for the COW operation, as allocating
a tree block adds the range of the respective extent buffer to the
->dirty_pages iotree of the transaction, and a stale transaction, in the
unlocked state or higher, will not flush dirty extent buffers anymore,
therefore resulting in not persisting the tree block and resource leaks
(not cleaning the dirty_pages iotree for example).
So do the following changes:
1) Return -EUCLEAN if we find a stale transaction;
2) Turn the fs into error state, with error -EUCLEAN, so that no
transaction can be committed, and generate a stack trace;
3) Combine both conditions into a single if statement, as both are related
and have the same error message;
4) Mark the check as unlikely, since this is not expected to ever happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit b7af0635c87f ("btrfs: print transaction aborted messages with an
error level") changed the log level of transaction aborted messages from
a debug level to an error level, so that such messages are always visible
even on production systems where the log level is normally above the debug
level (and also on some syzbot reports).
Later, commit fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to
transaction.c") changed the log level back to debug level when the error
number for a transaction abort should not have a stack trace printed.
This happened for absolutely no reason. It's always useful to print
transaction abort messages with an error level, regardless of whether
the error number should cause a stack trace or not.
So change back the log level to error level.
Fixes: fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c")
CC: stable@vger.kernel.org # 6.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
The following script would allow invalid mount options to be specified
(although such invalid options would just be ignored):
# mkfs.btrfs -f $dev
# mount $dev $mnt1 <<< Successful mount expected
# mount $dev $mnt2 -o junk <<< Failed mount expected
# echo $?
0
[CAUSE]
For the 2nd mount, since the fs is already mounted, we won't go through
open_ctree() thus no btrfs_parse_options(), but only through
btrfs_parse_subvol_options().
However we do not treat unrecognized options from valid but irrelevant
options, thus those invalid options would just be ignored by
btrfs_parse_subvol_options().
[FIX]
Add the handling for Opt_err to handle invalid options and error out,
while still ignore other valid options inside btrfs_parse_subvol_options().
Reported-by: Anand Jain <anand.jain@oracle.com>
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Jens reported the following warnings from -Wmaybe-uninitialized recent
Linus' branch.
In file included from ./include/asm-generic/rwonce.h:26,
from ./arch/arm64/include/asm/rwonce.h:71,
from ./include/linux/compiler.h:246,
from ./include/linux/export.h:5,
from ./include/linux/linkage.h:7,
from ./include/linux/kernel.h:17,
from fs/btrfs/ioctl.c:6:
In function ‘instrument_copy_from_user_before’,
inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
inlined from ‘btrfs_ioctl_space_info’ at fs/btrfs/ioctl.c:2999:6,
inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4616:10:
./include/linux/kasan-checks.h:38:27: warning: ‘space_args’ may be used
uninitialized [-Wmaybe-uninitialized]
38 | #define kasan_check_write __kasan_check_write
./include/linux/instrumented.h:129:9: note: in expansion of macro
‘kasan_check_write’
129 | kasan_check_write(to, n);
| ^~~~~~~~~~~~~~~~~
./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
volatile void *’ to ‘__kasan_check_write’ declared here
20 | bool __kasan_check_write(const volatile void *p, unsigned int
size);
| ^~~~~~~~~~~~~~~~~~~
fs/btrfs/ioctl.c:2981:39: note: ‘space_args’ declared here
2981 | struct btrfs_ioctl_space_args space_args;
| ^~~~~~~~~~
In function ‘instrument_copy_from_user_before’,
inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
inlined from ‘_btrfs_ioctl_send’ at fs/btrfs/ioctl.c:4343:9,
inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4658:10:
./include/linux/kasan-checks.h:38:27: warning: ‘args32’ may be used
uninitialized [-Wmaybe-uninitialized]
38 | #define kasan_check_write __kasan_check_write
./include/linux/instrumented.h:129:9: note: in expansion of macro
‘kasan_check_write’
129 | kasan_check_write(to, n);
| ^~~~~~~~~~~~~~~~~
./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
volatile void *’ to ‘__kasan_check_write’ declared here
20 | bool __kasan_check_write(const volatile void *p, unsigned int
size);
| ^~~~~~~~~~~~~~~~~~~
fs/btrfs/ioctl.c:4341:49: note: ‘args32’ declared here
4341 | struct btrfs_ioctl_send_args_32 args32;
| ^~~~~~
This was due to his config options and having KASAN turned on,
which adds some extra checks around copy_from_user(), which then
triggered the -Wmaybe-uninitialized checker for these cases.
Fix the warnings by initializing the different structs we're copying
into.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
A recent ext4 patch posting from Jan Kara reminded me of a
discussion a year ago about fstrim in progress preventing kernels
from suspending. The fix is simple, we should do the same for XFS.
This removes the -ERESTARTSYS error return from this code, replacing
it with either the last error seen or the number of blocks
successfully trimmed up to the point where we detected the stop
condition.
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
|
fstrim will hold the AGF lock for as long as it takes to walk and
discard all the free space in the AG that meets the userspace trim
criteria. For AGs with lots of free space extents (e.g. millions)
or the underlying device is really slow at processing discard
requests (e.g. Ceph RBD), this means the AGF hold time is often
measured in minutes to hours, not a few milliseconds as we normal
see with non-discard based operations.
This can result in the entire filesystem hanging whilst the
long-running fstrim is in progress. We can have transactions get
stuck waiting for the AGF lock (data or metadata extent allocation
and freeing), and then more transactions get stuck waiting on the
locks those transactions hold. We can get to the point where fstrim
blocks an extent allocation or free operation long enough that it
ends up pinning the tail of the log and the log then runs out of
space. At this point, every modification in the filesystem gets
blocked. This includes read operations, if atime updates need to be
made.
To fix this problem, we need to be able to discard free space
extents safely without holding the AGF lock. Fortunately, we already
do this with online discard via busy extents. We can mark free space
extents as "busy being discarded" under the AGF lock and then unlock
the AGF, knowing that nobody will be able to allocate that free
space extent until we remove it from the busy tree.
Modify xfs_trim_extents to use the same asynchronous discard
mechanism backed by busy extents as is used with online discard.
This results in the AGF only needing to be held for short periods of
time and it is never held while we issue discards. Hence if discard
submission gets throttled because it is slow and/or there are lots
of them, we aren't preventing other operations from being performed
on AGF while we wait for discards to complete...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
|
Because we are going to use the same list-based discard submission
interface for fstrim-based discards, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"Stable fixes:
- Revert "SUNRPC dont update timeout value on connection reset"
- NFSv4: Fix a state manager thread deadlock regression
Fixes:
- Fix potential NULL pointer dereference in nfs_inode_remove_request()
- Fix rare NULL pointer dereference in xs_tcp_tls_setup_socket()
- Fix long delay before failing a TLS mount when server does not
support TLS
- Fix various NFS state manager issues"
* tag 'nfs-for-6.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: decrement nrequests counter before releasing the req
SUNRPC/TLS: Lock the lower_xprt during the tls handshake
Revert "SUNRPC dont update timeout value on connection reset"
NFSv4: Fix a state manager thread deadlock regression
NFSv4: Fix a nfs4_state_manager() race
SUNRPC: Fail quickly when server does not recognize TLS
|
|
A wrong return value from ovl_check_encode_origin() would cause
ovl_dentry_to_fid() to try to encode fid from NULL upper dentry.
Reported-by: syzbot+2208f82282740c1c8915@syzkaller.appspotmail.com
Fixes: 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_permission() accesses ->layers[...].mnt; we can't have ->layers
freed without an RCU delay on fs shutdown.
Fortunately, kern_unmount_array() that is used to drop those mounts
does include an RCU delay, so freeing is delayed; unfortunately, the
array passed to kern_unmount_array() is formed by mangling ->layers
contents and that happens without any delays.
The ->layers[...].name string entries are used to store the strings to
display in "lowerdir=..." by ovl_show_options(). Those entries are not
accessed in RCU walk.
Move the name strings into a separate array ofs->config.lowerdirs and
reuse the ofs->config.lowerdirs array as the temporary mount array to
pass to kern_unmount_array().
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20231002023711.GP3389589@ZenIV/
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
d_inode_rcu() is right - we might be in rcu pathwalk;
however, OVL_E() hides plain d_inode() on the same dentry...
Fixes: a6ff2bc0be17 ("ovl: use OVL_E() and OVL_E_FLAGS() accessors")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
... into ->free_inode(), that is.
Fixes: 0af950f57fef "ovl: move ovl_entry into ovl_inode"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Commit 724768a39374 ("ovl: fix incorrect fdput() on aio completion")
took a refcount on real file before submitting aio, but forgot to
avoid clearing FDPUT_FPUT from real.flags stack variable.
This can result in a file reference leak.
Fixes: 724768a39374 ("ovl: fix incorrect fdput() on aio completion")
Reported-by: Gil Lev <contact@levgil.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Make sure 32-bit applications using user events have aligned access
when running on a 64-bit kernel.
- Add cond_resched in the loop that handles converting enums in
print_fmt string is trace events.
- Fix premature wake ups of polling processes in the tracing ring
buffer. When a task polls waiting for a percentage of the ring buffer
to be filled, the writer still will wake it up at every event. Add
the polling's percentage to the "shortest_full" list to tell the
writer when to wake it up.
- For eventfs dir lookups on dynamic events, an event system's only
event could be removed, leaving its dentry with no children. This is
totally legitimate. But in eventfs_release() it must not access the
children array, as it is only allocated when the dentry has children.
* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Test for dentries array allocated in eventfs_release()
tracing/user_events: Align set_bit() address for all archs
tracing: relax trace_event_eval_update() execution with cond_resched()
ring-buffer: Update "shortest_full" in polling
|
|
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.
Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ef36b4f92868 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pull iomap fixes from Darrick Wong:
- Handle a race between writing and shrinking block devices by
returning EIO
- Fix a typo in a comment
* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Spelling s/preceeding/preceding/g
iomap: add a workaround for racy i_size updates on block devices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix NFSv4 READ corner case
* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
|
|
Pull smb client fix from Steve French:
"Fix for password freeing potential oops (also for stable)"
* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb/client: Reset password pointer to NULL
|
|
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails. If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed. However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug. This patch moves the release
operation after unlocking and putting the page.
NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed. However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.
[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e89 ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example). The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).
On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary. This
matters since start_thread() will set the ARM CPSR register as required
based on this flag. If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.
Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it. This macro in
the generic case sets PER_LINUX and preserves the upper bytes.
Architectures can override this for their specific use case, and ARM does
exactly this.
The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware. If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.
Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23547 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull smb server fixes from Steve French:
"Two SMB3 server fixes for null pointer dereferences:
- invalid SMB3 request case (fixes issue found in testing the read
compound patch)
- iovec error case in response processing"
* tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: check iov vector index in ksmbd_conn_write()
ksmbd: return invalid parameter error response if smb2 request is invalid
|
|
Pull ceph fixes from Ilya Dryomov:
"A series that fixes an involved 'double watch error' deadlock in RBD
marked for stable and two cleanups"
* tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client:
rbd: take header_rwsem in rbd_dev_refresh() only when updating
rbd: decouple parent info read-in from updating rbd_dev
rbd: decouple header read-in from updating rbd_dev->header
rbd: move rbd_dev_refresh() definition
Revert "ceph: make members in struct ceph_mds_request_args_ext a union"
ceph: remove unnecessary check for NULL in parse_longname()
|
|
Pull xfs fix from Chandan Babu:
- fix for commit 68b957f64fca ("xfs: load uncached unlinked inodes into
memory on demand") which address review comments provided by Dave
Chinner
* tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix reloading entire unlinked bucket lists
|
|
Forget to reset ctx->password to NULL will lead to bug like double free
Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix a misspelling of "preceding".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
I hit this panic in testing:
[ 6235.500016] run fstests generic/464 at 2023-09-18 22:51:24
[ 6288.410761] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6288.412174] #PF: supervisor read access in kernel mode
[ 6288.413160] #PF: error_code(0x0000) - not-present page
[ 6288.413992] PGD 0 P4D 0
[ 6288.414603] Oops: 0000 [#1] PREEMPT SMP PTI
[ 6288.415419] CPU: 0 PID: 340798 Comm: kworker/u18:8 Not tainted 6.6.0-rc1-gdcf620ceebac #95
[ 6288.416538] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[ 6288.417701] Workqueue: nfsiod rpc_async_release [sunrpc]
[ 6288.418676] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
[ 6288.419836] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85 ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc cc 48 8b
[ 6288.422389] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
[ 6288.423234] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX: 0000000000000000
[ 6288.424351] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI: ffffef41448e3840
[ 6288.425345] RBP: ffffef41448e3840 R08: 0000000000000038 R09: ffffffffffffffff
[ 6288.426334] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12: ffff9a29093f98c4
[ 6288.427353] R13: 0000000000000000 R14: ffff9a29230f62e0 R15: ffff9a29230f62d0
[ 6288.428358] FS: 0000000000000000(0000) GS:ffff9a2a77c00000(0000) knlGS:0000000000000000
[ 6288.429513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6288.430427] CR2: 0000000000000000 CR3: 0000000264748002 CR4: 0000000000770ef0
[ 6288.431553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6288.432715] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6288.433698] PKRU: 55555554
[ 6288.434196] Call Trace:
[ 6288.434667] <TASK>
[ 6288.435132] ? __die+0x1f/0x70
[ 6288.435723] ? page_fault_oops+0x159/0x450
[ 6288.436389] ? try_to_wake_up+0x98/0x5d0
[ 6288.437044] ? do_user_addr_fault+0x65/0x660
[ 6288.437728] ? exc_page_fault+0x7a/0x180
[ 6288.438368] ? asm_exc_page_fault+0x22/0x30
[ 6288.439137] ? nfs_inode_remove_request+0xc8/0x150 [nfs]
[ 6288.440112] ? nfs_inode_remove_request+0xa0/0x150 [nfs]
[ 6288.440924] nfs_commit_release_pages+0x16e/0x340 [nfs]
[ 6288.441700] ? __pfx_call_transmit+0x10/0x10 [sunrpc]
[ 6288.442475] ? _raw_spin_lock_irqsave+0x23/0x50
[ 6288.443161] nfs_commit_release+0x15/0x40 [nfs]
[ 6288.443926] rpc_free_task+0x36/0x60 [sunrpc]
[ 6288.444741] rpc_async_release+0x29/0x40 [sunrpc]
[ 6288.445509] process_one_work+0x171/0x340
[ 6288.446135] worker_thread+0x277/0x3a0
[ 6288.446724] ? __pfx_worker_thread+0x10/0x10
[ 6288.447376] kthread+0xf0/0x120
[ 6288.447903] ? __pfx_kthread+0x10/0x10
[ 6288.448500] ret_from_fork+0x2d/0x50
[ 6288.449078] ? __pfx_kthread+0x10/0x10
[ 6288.449665] ret_from_fork_asm+0x1b/0x30
[ 6288.450283] </TASK>
[ 6288.450688] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nls_iso8859_1 nls_cp437 vfat fat 9p netfs ext4 kvm_intel crc16 mbcache jbd2 joydev kvm xfs irqbypass virtio_net pcspkr net_failover psmouse failover 9pnet_virtio cirrus drm_shmem_helper virtio_balloon drm_kms_helper button evdev drm loop dm_mod zram zsmalloc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 sha512_generic virtio_blk nvme aesni_intel crypto_simd cryptd nvme_core t10_pi i6300esb crc64_rocksoft_generic crc64_rocksoft crc64 virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring serio_raw btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq autofs4
[ 6288.460211] CR2: 0000000000000000
[ 6288.460787] ---[ end trace 0000000000000000 ]---
[ 6288.461571] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
[ 6288.462500] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85 ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc cc 48 8b
[ 6288.465136] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
[ 6288.465963] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX: 0000000000000000
[ 6288.467035] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI: ffffef41448e3840
[ 6288.468093] RBP: ffffef41448e3840 R08: 0000000000000038 R09: ffffffffffffffff
[ 6288.469121] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12: ffff9a29093f98c4
[ 6288.470109] R13: 0000000000000000 R14: ffff9a29230f62e0 R15: ffff9a29230f62d0
[ 6288.471106] FS: 0000000000000000(0000) GS:ffff9a2a77c00000(0000) knlGS:0000000000000000
[ 6288.472216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6288.473059] CR2: 0000000000000000 CR3: 0000000264748002 CR4: 0000000000770ef0
[ 6288.474096] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6288.475097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6288.476148] PKRU: 55555554
[ 6288.476665] note: kworker/u18:8[340798] exited with irqs disabled
Once we've released "req", it's not safe to dereference it anymore.
Decrement the nrequests counter before dropping the reference.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.
However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.
Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d75215555 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit 4dc73c679114 reintroduces the deadlock that was fixed by commit
aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock") because it
prevents the setup of new threads to handle reboot recovery, while the
older recovery thread is stuck returning delegations.
Fixes: 4dc73c679114 ("NFSv4: keep state manager thread active if swap is enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
nfs4_schedule_state_manager(), and are responsible for handling the
recovery situation.
Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- delayed refs fixes:
- fix race when refilling delayed refs block reserve
- prevent transaction block reserve underflow when starting
transaction
- error message and value adjustments
- fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and
-Wmaybe-uninitialized
- fix for smatch report where uninitialized data from invalid extent
buffer range could be returned to the caller
- fix numeric overflow in statfs when calculating lower threshold
for a full filesystem
* tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize start_slot in btrfs_log_prealloc_extents
btrfs: make sure to initialize start and len in find_free_dev_extent
btrfs: reset destination buffer when read_extent_buffer() gets invalid range
btrfs: properly report 0 avail for very full file systems
btrfs: log message if extent item not found when running delayed extent op
btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
btrfs: prevent transaction block reserve underflow when starting transaction
btrfs: fix race when refilling delayed refs block reserve
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains the usual miscellaneous fixes and cleanups for vfs and
individual fses:
Fixes:
- Revert ki_pos on error from buffered writes for direct io fallback
- Add missing documentation for block device and superblock handling
for changes merged this cycle
- Fix reiserfs flexible array usage
- Ensure that overlayfs sets ctime when setting mtime and atime
- Disable deferred caller completions with overlayfs writes until
proper support exists
Cleanups:
- Remove duplicate initialization in pipe code
- Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
overlayfs: set ctime when setting mtime and atime
ntfs3: put resources during ntfs_fill_super()
ovl: disable IOCB_DIO_CALLER_COMP
porting: document superblock as block device holder
porting: document new block device opening order
fs/pipe: remove duplicate "offset" initializer
fs-writeback: do not requeue a clean inode having skipped pages
aio: Annotate struct kioctx_table with __counted_by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
reiserfs: Replace 1-element array with C99 style flex-array
|
|
A szybot reproducer that does write I/O while truncating the size of a
block device can end up in clean_bdev_aliases, which tries to clean the
bdev aliases that it uses. This is because iomap_to_bh automatically
sets the BH_New flag when outside of i_size. For block devices updates
to i_size are racy and we can hit this case in a tiny race window,
leading to the eventual clean_bdev_aliases call. Fix this by erroring
out of > i_size I/O on block devices.
Reported-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.
POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|