Age | Commit message (Collapse) | Author | Files | Lines |
|
Merge updates from Andrew Morton:
- A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs
- Most the MM patches which precede the patches in Willy's tree: kasan,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
Docs/ABI/testing: add DAMON sysfs interface ABI document
Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
selftests/damon: add a test for DAMON sysfs interface
mm/damon/sysfs: support DAMOS stats
mm/damon/sysfs: support DAMOS watermarks
mm/damon/sysfs: support schemes prioritization
mm/damon/sysfs: support DAMOS quotas
mm/damon/sysfs: support DAMON-based Operation Schemes
mm/damon/sysfs: support the physical address space monitoring
mm/damon/sysfs: link DAMON for virtual address spaces monitoring
mm/damon: implement a minimal stub for sysfs-based DAMON interface
mm/damon/core: add number of each enum type values
mm/damon/core: allow non-exclusive DAMON start/stop
Docs/damon: update outdated term 'regions update interval'
Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
Docs/vm/damon: call low level monitoring primitives the operations
mm/damon: remove unnecessary CONFIG_DAMON option
mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
mm/damon/dbgfs-test: fix is_target_id() change
...
|
|
PF_SWAPWRITE has been redundant since v3.2 commit ee72886d8ed5 ("mm:
vmscan: do not writeback filesystem pages in direct reclaim").
Coincidentally, NeilBrown's current patch "remove inode_congested()"
deletes may_write_to_inode(), which appeared to be the one function which
took notice of PF_SWAPWRITE. But if you study the old logic, and the
conditions under which may_write_to_inode() was called, you discover that
flag and function have been pointless for a decade.
Link: https://lkml.kernel.org/r/75e80e7-742d-e3bd-531-614db8961e4@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Userfaultfd is supposed to provide the full address (i.e., unmasked) of
the faulting access back to userspace. However, that is not the case for
quite some time.
Even running "userfaultfd_demo" from the userfaultfd man page provides the
wrong output (and contradicts the man page). Notice that
"UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and
not the first read address (0x7fc5e30b300f).
Address returned by mmap() = 0x7fc5e30b3000
fault_handler_thread():
poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
(uffdio_copy.copy returned 4096)
Read address 0x7fc5e30b300f in main(): A
Read address 0x7fc5e30b340f in main(): A
Read address 0x7fc5e30b380f in main(): A
Read address 0x7fc5e30b3c0f in main(): A
The exact address is useful for various reasons and specifically for
prefetching decisions. If it is known that the memory is populated by
certain objects whose size is not page-aligned, then based on the faulting
address, the uffd-monitor can decide whether to prefetch and prefault the
adjacent page.
This bug has been for quite some time in the kernel: since commit
1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address")
vmf->virtual_address"), which dates back to 2016. A concern has been
raised that existing userspace application might rely on the old/wrong
behavior in which the address is masked. Therefore, it was suggested to
provide the masked address unless the user explicitly asks for the exact
address.
Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
userfaultfd to provide the exact address. Add a new "real_address" field
to vmf to hold the unmasked address. Provide the address to userspace
accordingly.
Initialize real_address in various code-paths to be consistent with
address, even when it is not used, to be on the safe side.
[namit@vmware.com: initialize real_address on all code paths, per Jan]
Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com
[akpm@linux-foundation.org: fix typo in comment, per Jan]
Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Like inode cache, the dentry will also be added to its memcg list_lru. So
replace kmem_cache_alloc() with kmem_cache_alloc_lru() to allocate dentry.
Link: https://lkml.kernel.org/r/20220228122126.37293-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The allocated inode cache is supposed to be added to its memcg list_lru
which should be allocated as well in advance. That can be done by
kmem_cache_alloc_lru() which allocates object and list_lru. The file
systems is main user of it. So introduce alloc_inode_sb() to allocate
file system specific inodes and set up the inode reclaim context
properly. The file system is supposed to use alloc_inode_sb() to
allocate inodes.
In later patches, we will convert all users to the new API.
Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Check lru_cache_disabled under bh_lru_lock. Otherwise, it could introduce
race below and it fails to migrate pages containing buffer_head.
CPU 0 CPU 1
bh_lru_install
lru_cache_disable
lru_cache_disabled = false
atomic_inc(&lru_disable_count);
invalidate_bh_lrus_cpu of CPU 0
bh_lru_lock
__invalidate_bh_lrus
bh_lru_unlock
bh_lru_lock
install the bh
bh_lru_unlock
WHen this race happens a CMA allocation fails, which is critical for
the workload which depends on CMA.
Link: https://lkml.kernel.org/r/20220308180709.2017638-1-minchan@kernel.org
Fixes: 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp
expiry") introduced a mount warning regarding filesystem timestamp
limits, that is printed upon each writable mount or remount.
This can result in a lot of unnecessary messages in the kernel log in
setups where filesystems are being frequently remounted (or mounted
multiple times).
Avoid this by setting a superblock flag which indicates that the warning
has been emitted at least once for any particular mount, as suggested in
[1].
Link: https://lore.kernel.org/CAHk-=wim6VGnxQmjfK_tDg6fbHYKL4EFkmnTjVr9QnRqjDBAeA@mail.gmail.com/ [1]
Link: https://lkml.kernel.org/r/20220119202934.26495-1-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout().
So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE
and call that instead.
Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These functions are no longer useful as no BDIs report congestions any
more.
Removing the test on bdi_write_contested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.
So replace the calls by 'false' and simplify the code - and remove the
functions.
[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs]
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
inode_congested() reports if the backing-device for the inode is
congested. No bdi reports congestion any more, so this always returns
'false'.
So remove inode_congested() and related functions, and remove the call
sites, assuming that inode_congested() always returns 'false'.
Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The bdi congestion tracking in not widely used and will be removed.
CEPHfs is one of a small number of filesystems that uses it, setting just
the async (write) congestion flags at what it determines are appropriate
times.
The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flag, set an internal flag and change:
- .writepages to do nothing if WB_SYNC_NONE and the flag is set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
flag is set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will
be called on the page which (I think) wil further delay the next attempt
at writeout. This might be a good thing.
Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The bdi congestion tracking in not widely used and will be removed.
NFS is one of a small number of filesystems that uses it, setting just
the async (write) congestion flag at what it determines are appropriate
times.
The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flag, set an internal flag and change:
- .writepages to do nothing if WB_SYNC_NONE and the flag is set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
flag is set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) wil further delay the next attempt at
writeout. This might be a good thing.
Link: https://lkml.kernel.org/r/164549983738.9187.3972219847989393182.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The bdi congestion tracking in not widely used and will be removed.
Fuse is one of a small number of filesystems that uses it, setting both
the sync (read) and async (write) congestion flags at what it determines
are appropriate times.
The only remaining effect of the sync flag is to cause read-ahead to be
skipped. The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flags, change:
- .readahead to stop when it has submitted all non-async pages for
read.
- .writepages to do nothing if WB_SYNC_NONE and the flag would be set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
flag would be set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.
Link: https://lkml.kernel.org/r/164549983737.9187.2627117501000365074.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix
comments still mentioning i_mutex.
Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com
Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Simply return directly instead of assign the return value to another
variable.
Link: https://lkml.kernel.org/r/20220114021641.13927-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation
size. It triggers one BUG in the __ntfs_malloc function.
Fix this by adding sanity check on ni->attr_list_size.
Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn
Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This contains feature updates, performance improvements, preparatory
and core work and some related VFS updates:
Features:
- encoded read/write ioctls, allows user space to read or write raw
data directly to extents (now compressed, encrypted in the future),
will be used by send/receive v2 where it saves processing time
- zoned mode now works with metadata DUP (the mkfs.btrfs default)
- error message header updates:
- print error state: transaction abort, other error, log tree
errors
- print transient filesystem state: remount, device replace,
ignored checksum verifications
- tree-checker: verify the transaction id of the to-be-written dirty
extent buffer
Performance improvements for fsync:
- directory logging speedups (up to -90% run time)
- avoid logging all directory changes during renames (up to -60% run
time)
- avoid inode logging during rename and link when possible (up to
-60% run time)
- prepare extents to be logged before locking a log tree path
(throughput +7%)
- stop copying old file extents when doing a full fsync()
- improved logging of old extents after truncate
Core, fixes:
- improved stale device identification by dev_t and not just path
(for devices that are behind other layers like device mapper)
- continued extent tree v2 preparatory work
- disable features that won't work yet
- add wrappers and abstractions for new tree roots
- improved error handling
- add super block write annotations around background block group
reclaim
- fix device scanning messages potentially accessing stale pointer
- cleanups and refactoring
VFS:
- allow reflinks/deduplication from two different mounts of the same
filesystem
- export and add helpers for read/write range verification, for the
encoded ioctls"
* tag 'for-5.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (98 commits)
btrfs: zoned: put block group after final usage
btrfs: don't access possibly stale fs_info data in device_list_add
btrfs: add lockdep_assert_held to need_preemptive_reclaim
btrfs: verify the tranisd of the to-be-written dirty extent buffer
btrfs: unify the error handling of btrfs_read_buffer()
btrfs: unify the error handling pattern for read_tree_block()
btrfs: factor out do_free_extent_accounting helper
btrfs: remove last_ref from the extent freeing code
btrfs: add a alloc_reserved_extent helper
btrfs: remove BUG_ON(ret) in alloc_reserved_tree_block
btrfs: add and use helper for unlinking inode during log replay
btrfs: extend locking to all space_info members accesses
btrfs: zoned: mark relocation as writing
fs: allow cross-vfsmount reflink/dedupe
btrfs: remove the cross file system checks from remap
btrfs: pass btrfs_fs_info to btrfs_recover_relocation
btrfs: pass btrfs_fs_info for deleting snapshots and cleaner
btrfs: add filesystems state details to error messages
btrfs: deal with unexpected extent type during reflinking
btrfs: fix unexpected error path when reflinking an inline extent
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Fix some bugs in converting ext4 to use the new mount API, as well as
more bug fixes and clean ups in the ext4 fast_commit feature (most
notably, in the tracepoints).
In the jbd2 layer, the t_handle_lock spinlock has been removed, with
the last place where it was actually needed replaced with an atomic
cmpxchg"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (35 commits)
ext4: fix kernel doc warnings
ext4: fix remaining two trace events to use same printk convention
ext4: add commit tid info in ext4_fc_commit_start/stop trace events
ext4: add commit_tid info in jbd debug log
ext4: add transaction tid info in fc_track events
ext4: add new trace event in ext4_fc_cleanup
ext4: return early for non-eligible fast_commit track events
ext4: do not call FC trace event in ext4_fc_commit() if FS does not support FC
ext4: convert ext4_fc_track_dentry type events to use event class
ext4: fix ext4_fc_stats trace point
ext4: remove unused enum EXT4_FC_COMMIT_FAILED
ext4: warn when dirtying page w/o buffers in data=journal mode
doc: fixed a typo in ext4 documentation
ext4: make mb_optimize_scan performance mount option work with extents
ext4: make mb_optimize_scan option work with set/unset mount cmd
ext4: don't BUG if someone dirty pages without asking ext4 first
ext4: remove redundant assignment to variable split_flag1
ext4: fix underflow in ext4_max_bitmap_size()
ext4: fix ext4_mb_clear_bb() kernel-doc comment
ext4: fix fs corruption when tring to remove a non-empty directory with IO error
...
|
|
Pull nfsd updates from Chuck Lever:
"New features:
- NFSv3 support in NFSD is now always built
- Added NFSD support for the NFSv4 birth-time file attribute
- Added support for storing and displaying sockaddrs in trace points
- NFSD now recognizes RPC_AUTH_TLS probes
Performance improvements:
- Optimized the svc transport enqueuing mechanism
- Added micro-optimizations for the duplicate reply cache
Notable bug fixes:
- Allocation of the NFSD file cache hash table is more reliable"
* tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (30 commits)
nfsd: fix using the correct variable for sizeof()
nfsd: use correct format characters
NFSD: prevent integer overflow on 32 bit systems
NFSD: prevent underflow in nfssvc_decode_writeargs()
fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
NFSD: Fix nfsd_breaker_owns_lease() return values
NFSD: Clean up _lm_ operation names
arch: Remove references to CONFIG_NFSD_V3 in the default configs
NFSD: Remove CONFIG_NFSD_V3
nfsd: more robust allocation failure handling in nfsd_file_cache_init
SUNRPC: Teach server to recognize RPC_AUTH_TLS
NFSD: Move svc_serv_ops::svo_function into struct svc_serv
NFSD: Remove svc_serv_ops::svo_module
SUNRPC: Remove svc_shutdown_net()
SUNRPC: Rename svc_close_xprt()
SUNRPC: Rename svc_create_xprt()
SUNRPC: Remove svo_shutdown method
SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
SUNRPC: Remove the .svo_enqueue_xprt method
SUNRPC: Record endpoint information in trace log
...
|
|
Pull cfis updates from Steve French:
"Handlecache, unmount, fiemap and two reconnect fixes"
* tag '5.18-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
cifs: use a different reconnect helper for non-cifsd threads
cifs: we do not need a spinlock around the tree access during umount
Adjust cifssb maximum read size
cifs: truncate the inode and mapping when we simulate fcollapse
cifs: fix handlecache and multiuser
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, f2fs has some performance improvements for Android
workloads such as using read-unfair rwsems and adding some sysfs
entries to control GCs and discard commands in more details. In
addtiion, it has some tunings to improve the recovery speed after
sudden power-cut.
Enhancement:
- add reader-unfair rwsems with F2FS_UNFAIR_RWSEM: will replace with
generic API support
- adjust to make the readahead/recovery flow more efficiently
- sysfs entries to control issue speeds of GCs and Discard commands
- enable idmapped mounts
Bug fix:
- correct wrong error handling routines
- fix missing conditions in quota
- fix a potential deadlock between writeback and block plug routines
- fix a deadlock btween freezefs and evict_inode
We've added some boundary checks to avoid kernel panics on corrupted
images, and several minor code clean-ups"
* tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (27 commits)
f2fs: fix to do sanity check on .cp_pack_total_block_count
f2fs: make gc_urgent and gc_segment_mode sysfs node readable
f2fs: use aggressive GC policy during f2fs_disable_checkpoint()
f2fs: fix compressed file start atomic write may cause data corruption
f2fs: initialize sbi->gc_mode explicitly
f2fs: introduce gc_urgent_mid mode
f2fs: compress: fix to print raw data size in error path of lz4 decompression
f2fs: remove redundant parameter judgment
f2fs: use spin_lock to avoid hang
f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
f2fs: remove unnecessary read for F2FS_FITS_IN_INODE
f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem
f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes
f2fs: fix to do sanity check on curseg->alloc_type
f2fs: fix to avoid potential deadlock
f2fs: quota: fix loop condition at f2fs_quota_sync()
f2fs: Restore rwsem lockdep support
f2fs: fix missing free nid in f2fs_handle_failed_inode
f2fs: support idmapped mounts
f2fs: add a way to limit roll forward recovery time
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, we continue converting to use meta buffers for all
remaining uncompressed paths to prepare for the upcoming subpage,
folio and fscache features.
We also fixed a double-free issue when sysfs initialization fails,
which was reported by syzbot.
Besides, in order for the userspace to control per-file timestamp
easier, we now switch to record mtime instead of ctime with a
compatible feature marked. And there are also some code cleanups and
documentation update as usual.
Summary:
- Avoid using page structure directly for all uncompressed paths
- Fix a double-free issue when sysfs initialization fails
- Complete DAX description for erofs
- Use mtime instead since there's no (easy) way for users to control
ctime
- Several code cleanups"
* tag 'erofs-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: rename ctime to mtime
erofs: use meta buffers for inode lookup
erofs: use meta buffers for reading directories
fs: erofs: add sanity check for kobject in erofs_unregister_sysfs
erofs: refine managed inode stuffs
erofs: clean up z_erofs_extent_lookback
erofs: silence warnings related to impossible m_plen
Documentation/filesystem/dax: update DAX description on erofs
erofs: clean up preload_compressed_pages()
erofs: get rid of `struct z_erofs_collector'
erofs: use meta buffers for erofs_read_superblock()
|
|
Pull fscrypt updates from Eric Biggers:
"Add support for direct I/O on encrypted files when blk-crypto (inline
encryption) is being used for file contents encryption"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: update documentation for direct I/O support
f2fs: support direct I/O with fscrypt using blk-crypto
ext4: support direct I/O with fscrypt using blk-crypto
iomap: support direct I/O with fscrypt using blk-crypto
fscrypt: add functions for direct I/O support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Don't use semaphores in always-atomic-context code (Jann Horn)
- Add "ECC:" prefix to ECC messages (Vincent Whitchurch)
* tag 'pstore-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Don't use semaphores in always-atomic-context code
pstore: Add prefix to ECC messages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
"Execve and binfmt updates.
Eric and I have stepped up to be the active maintainers of this area,
so here's our first collection. The bulk of the work was in coredump
handling fixes; additional details are noted below:
- Handle unusual AT_PHDR offsets (Akira Kawata)
- Fix initial mapping size when PT_LOADs are not ordered (Alexey
Dobriyan)
- Move more code under CONFIG_COREDUMP (Alexey Dobriyan)
- Fix missing mmap_lock in file_files_note (Eric W. Biederman)
- Remove a.out support for alpha and m68k (Eric W. Biederman)
- Include first pages of non-exec ELF libraries in coredump (Jann
Horn)
- Don't write past end of notes for regset gap in coredump (Rick
Edgecombe)
- Comment clean-ups (Tom Rix)
- Force single empty string when argv is empty (Kees Cook)
- Add NULL argv selftest (Kees Cook)
- Properly redefine PT_GNU_* in terms of PT_LOOS (Kees Cook)
- MAINTAINERS: Update execve entry with tree (Kees Cook)
- Introduce initial KUnit testing for binfmt_elf (Kees Cook)"
* tag 'execve-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: Don't write past end of notes for regset gap
a.out: Stop building a.out/osf1 support on alpha and m68k
coredump: Don't compile flat_core_dump when coredumps are disabled
coredump: Use the vma snapshot in fill_files_note
coredump/elf: Pass coredump_params into fill_note_info
coredump: Remove the WARN_ON in dump_vma_snapshot
coredump: Snapshot the vmas in do_coredump
coredump: Move definition of struct coredump_params into coredump.h
binfmt_elf: Introduce KUnit test
ELF: Properly redefine PT_GNU_* in terms of PT_LOOS
MAINTAINERS: Update execve entry with more details
exec: cleanup comments
fs/binfmt_elf: Refactor load_elf_binary function
fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
binfmt: move more stuff undef CONFIG_COREDUMP
selftests/exec: Test for empty string on NULL argv
exec: Force single empty string when argv is empty
coredump: Also dump first pages of non-executable ELF libraries
ELF: fix overflow in total mapping size calculation
|
|
git://git.kernel.dk/linux-block
Pull bio_alloc() cleanups from Jens Axboe:
"Filesystem cleanups to pass the bio op to bio_alloc() instead of
setting it just before bio submission".
* tag 'for-5.18/alloc-cleanups-2022-03-18' of git://git.kernel.dk/linux-block:
f2fs: pass the bio operation to bio_alloc_bioset
f2fs: don't pass a bio to f2fs_target_device
nilfs2: pass the operation to bio_alloc
ext4: pass the operation to bio_alloc
mpage: pass the operation to bio_alloc
|
|
Pull block updates from Jens Axboe:
- BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)
- blk-rq-qos completion fix (Tejun)
- blk-cgroup merge fix (Tejun)
- Add offline error return value to distinguish it from an IO error on
the device (Song)
- IO stats fixes (Zhang, Christoph)
- blkcg refcount fixes (Ming, Yu)
- Fix for indefinite dispatch loop softlockup (Shin'ichiro)
- blk-mq hardware queue management improvements (Ming)
- sbitmap dead code removal (Ming, John)
- Plugging merge improvements (me)
- Show blk-crypto capabilities in sysfs (Eric)
- Multiple delayed queue run improvement (David)
- Block throttling fixes (Ming)
- Start deprecating auto module loading based on dev_t (Christoph)
- bio allocation improvements (Christoph, Chaitanya)
- Get rid of bio_devname (Christoph)
- bio clone improvements (Christoph)
- Block plugging improvements (Christoph)
- Get rid of genhd.h header (Christoph)
- Ensure drivers use appropriate flush helpers (Christoph)
- Refcounting improvements (Christoph)
- Queue initialization and teardown improvements (Ming, Christoph)
- Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
Lukas, Nian, Yang, Eric, Chengming)
* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
block: cancel all throttled bios in del_gendisk()
block: let blkcg_gq grab request queue's refcnt
block: avoid use-after-free on throttle data
block: limit request dispatch loop duration
block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
sr: simplify the local variable initialization in sr_block_open()
block: don't merge across cgroup boundaries if blkcg is enabled
block: fix rq-qos breakage from skipping rq_qos_done_bio()
block: flush plug based on hardware and software queue order
block: ensure plug merging checks the correct queue at least once
block: move rq_qos_exit() into disk_release()
block: do more work in elevator_exit
block: move blk_exit_queue into disk_release
block: move q_usage_counter release into blk_queue_release
block: don't remove hctx debugfs dir from blk_mq_exit_queue
block: move blkcg initialization/destroy into disk allocation/release handler
sr: implement ->free_disk to simplify refcounting
sd: implement ->free_disk to simplify refcounting
sd: delay calling free_opal_dev
sd: call sd_zbc_release_disk before releasing the scsi_device reference
...
|
|
git://git.kernel.dk/linux-block
Pull io_uring statx fixes from Jens Axboe:
"On top of the main io_uring branch, this is to ensure that the
filename component of statx is stable after submit.
That requires a few VFS related changes"
* tag 'for-5.18/io_uring-statx-2022-03-18' of git://git.kernel.dk/linux-block:
io-uring: Make statx API stable
|
|
Pull io_uring updates from Jens Axboe:
- Fixes for current file position. Still doesn't have the f_pos_lock
sorted, but it's a step in the right direction (Dylan)
- Tracing updates (Dylan, Stefan)
- Improvements to io-wq locking (Hao)
- Improvements for provided buffers (me, Pavel)
- Support for registered file descriptors (me, Xiaoguang)
- Support for ring messages (me)
- Poll improvements (me)
- Fix for fixed buffers and non-iterator reads/writes (me)
- Support for NAPI on sockets (Olivier)
- Ring quiesce improvements (Usama)
- Misc fixes (Olivier, Pavel)
* tag 'for-5.18/io_uring-2022-03-18' of git://git.kernel.dk/linux-block: (42 commits)
io_uring: terminate manual loop iterator loop correctly for non-vecs
io_uring: don't check unrelated req->open.how in accept request
io_uring: manage provided buffers strictly ordered
io_uring: fold evfd signalling under a slower path
io_uring: thin down io_commit_cqring()
io_uring: shuffle io_eventfd_signal() bits around
io_uring: remove extra barrier for non-sqpoll iopoll
io_uring: fix provided buffer return on failure for kiocb_done()
io_uring: extend provided buf return to fails
io_uring: refactor timeout cancellation cqe posting
io_uring: normilise naming for fill_cqe*
io_uring: cache poll/double-poll state with a request flag
io_uring: cache req->apoll->events in req->cflags
io_uring: move req->poll_refs into previous struct hole
io_uring: make tracing format consistent
io_uring: recycle apoll_poll entries
io_uring: remove duplicated member check for io_msg_ring_prep()
io_uring: allow submissions to continue on error
io_uring: recycle provided buffers if request goes async
io_uring: ensure reads re-import for selected buffers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- Support for including MTE tags in ELF coredumps
- Instruction encoder updates, including fixes to 64-bit immediate
generation and support for the LSE atomic instructions
- Improvements to kselftests for MTE and fpsimd
- Symbol aliasing and linker script cleanups
- Reduce instruction cache maintenance performed for user mappings
created using contiguous PTEs
- Support for the new "asymmetric" MTE mode, where stores are checked
asynchronously but loads are checked synchronously
- Support for the latest pointer authentication algorithm ("QARMA3")
- Support for the DDR PMU present in the Marvell CN10K platform
- Support for the CPU PMU present in the Apple M1 platform
- Use the RNDR instruction for arch_get_random_{int,long}()
- Update our copy of the Arm optimised string routines for str{n}cmp()
- Fix signal frame generation for CPUs which have foolishly elected to
avoid building in support for the fpsimd instructions
- Workaround for Marvell GICv3 erratum #38545
- Clarification to our Documentation (booting reqs. and MTE prctl())
- Miscellanous cleanups and minor fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
arm64/mte: Remove asymmetric mode from the prctl() interface
arm64: Add cavium_erratum_23154_cpus missing sentinel
perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
Documentation: vmcoreinfo: Fix htmldocs warning
kasan: fix a missing header include of static_keys.h
drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
drivers/perf: arm_pmu: Handle 47 bit counters
arm64: perf: Consistently make all event numbers as 16-bits
arm64: perf: Expose some Armv9 common events under sysfs
perf/marvell: cn10k DDR perf event core ownership
perf/marvell: cn10k DDR perfmon event overflow handling
perf/marvell: CN10k DDR performance monitor support
dt-bindings: perf: marvell: cn10k ddr performance monitor
arm64: clean up tools Makefile
perf/arm-cmn: Update watchpoint format
perf/arm-cmn: Hide XP PUB events for CMN-600
arm64: drop unused includes of <linux/personality.h>
arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
...
|
|
As bughunter reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215709
f2fs may hang when mounting a fuzzed image, the dmesg shows as below:
__filemap_get_folio+0x3a9/0x590
pagecache_get_page+0x18/0x60
__get_meta_page+0x95/0x460 [f2fs]
get_checkpoint_version+0x2a/0x1e0 [f2fs]
validate_checkpoint+0x8e/0x2a0 [f2fs]
f2fs_get_valid_checkpoint+0xd0/0x620 [f2fs]
f2fs_fill_super+0xc01/0x1d40 [f2fs]
mount_bdev+0x18a/0x1c0
f2fs_mount+0x15/0x20 [f2fs]
legacy_get_tree+0x28/0x50
vfs_get_tree+0x27/0xc0
path_mount+0x480/0xaa0
do_mount+0x7c/0xa0
__x64_sys_mount+0x8b/0xe0
do_syscall_64+0x38/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is cp_pack_total_block_count field in checkpoint was fuzzed
to one, as calcuated, two cp pack block locates in the same block address,
so then read latter cp pack block, it will block on the page lock due to
the lock has already held when reading previous cp pack block, fix it by
adding sanity check for cp_pack_total_block_count.
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Changed a way of showing values of them to use strings.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
While the original code is valid, it is not the obvious choice for the
sizeof() call and in preparation to limit the scope of the list iterator
variable the sizeof should be changed to the size of the destination.
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The cifs_demultiplexer_thread should only call cifs_reconnect.
If any other thread wants to trigger a reconnect, they can do
so by updating the server tcpStatus to CifsNeedReconnect.
The last patch attempted to use the same helper function for
both types of threads, but that causes other issues
with lock dependencies.
This patch creates a new helper for non-cifsd threads, that
will indicate to cifsd that the server needs reconnect.
Fixes: 2a05137a0575 ("cifs: mark sessions for reconnection in helper function")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Remove the spinlock around the tree traversal as we are calling possibly
sleeping functions.
We do not need a spinlock here as there will be no modifications to this
tree at this point.
This prevents warnings like this to occur in dmesg:
[ 653.774996] BUG: sleeping function called from invalid context at kernel/loc\
king/mutex.c:280
[ 653.775088] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1827, nam\
e: umount
[ 653.775152] preempt_count: 1, expected: 0
[ 653.775191] CPU: 0 PID: 1827 Comm: umount Tainted: G W OE 5.17.0\
-rc7-00006-g4eb628dd74df #135
[ 653.775195] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-\
1.fc33 04/01/2014
[ 653.775197] Call Trace:
[ 653.775199] <TASK>
[ 653.775202] dump_stack_lvl+0x34/0x44
[ 653.775209] __might_resched.cold+0x13f/0x172
[ 653.775213] mutex_lock+0x75/0xf0
[ 653.775217] ? __mutex_lock_slowpath+0x10/0x10
[ 653.775220] ? _raw_write_lock_irq+0xd0/0xd0
[ 653.775224] ? dput+0x6b/0x360
[ 653.775228] cifs_kill_sb+0xff/0x1d0 [cifs]
[ 653.775285] deactivate_locked_super+0x85/0x130
[ 653.775289] cleanup_mnt+0x32c/0x4d0
[ 653.775292] ? path_umount+0x228/0x380
[ 653.775296] task_work_run+0xd8/0x180
[ 653.775301] exit_to_user_mode_loop+0x152/0x160
[ 653.775306] exit_to_user_mode_prepare+0x89/0xd0
[ 653.775315] syscall_exit_to_user_mode+0x12/0x30
[ 653.775322] do_syscall_64+0x48/0x90
[ 653.775326] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 187af6e98b44e5d8f25e1d41a92db138eb54416f ("cifs: fix handlecache and multiuser")
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When session gets reconnected during mount then read size in super block fs context
gets set to zero and after negotiate, rsize is not modified which results in
incorrect read with requested bytes as zero. Fixes intermittent failure
of xfstest generic/240
Note that stable requires a different version of this patch which will be
sent to the stable mailing list.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
RHBZ:1997367
When we collapse a range in smb3_collapse_range() we must make sure
we update the inode size and pagecache accordingly.
If not, both inode size and pagecahce may be stale until it is refreshed.
This can be demonstrated for the inode size by running :
xfs_io -i -f -c "truncate 320k" -c "fcollapse 64k 128k" -c "fiemap -v" \
/mnt/testfile
where we can see the result of stale data in the fiemap output.
The third line of the output is wrong, all this data should be truncated.
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: hole 128
1: [128..383]: 128..383 256 0x1
2: [384..639]: hole 256
And the correct output, when the inode size has been updated correctly should
look like this:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: hole 128
1: [128..383]: 128..383 256 0x1
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In multiuser each individual user has their own tcon structure for the
share and thus their own handle for a cached directory.
When we umount such a share we much make sure to release the pinned down dentry
for each such tcon and not just the master tcon.
Otherwise we will get nasty warnings on umount that dentries are still in use:
[ 3459.590047] BUG: Dentry 00000000115c6f41{i=12000000019d95,n=/} still in use\
(2) [unmount of cifs cifs]
...
[ 3459.590492] Call Trace:
[ 3459.590500] d_walk+0x61/0x2a0
[ 3459.590518] ? shrink_lock_dentry.part.0+0xe0/0xe0
[ 3459.590526] shrink_dcache_for_umount+0x49/0x110
[ 3459.590535] generic_shutdown_super+0x1a/0x110
[ 3459.590542] kill_anon_super+0x14/0x30
[ 3459.590549] cifs_kill_sb+0xf5/0x104 [cifs]
[ 3459.590773] deactivate_locked_super+0x36/0xa0
[ 3459.590782] cleanup_mnt+0x131/0x190
[ 3459.590789] task_work_run+0x5c/0x90
[ 3459.590798] exit_to_user_mode_loop+0x151/0x160
[ 3459.590809] exit_to_user_mode_prepare+0x83/0xd0
[ 3459.590818] syscall_exit_to_user_mode+0x12/0x30
[ 3459.590828] do_syscall_64+0x48/0x90
[ 3459.590833] entry_SYSCALL_64_after_hwframe+0x44/0xae
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Pull cifs fix from Steve French:
"Small fix for regression in multiuser mounts.
The additional improvements suggested by Ronnie to make the server and
session status handling code easier to read can wait for the 5.18
merge window."
* tag '5.17-rc8-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix incorrect session setup check for multiuser mounts
|
|
The fix for not advancing the iterator if we're using fixed buffers is
broken in that it can hit a condition where we don't terminate the loop.
This results in io-wq looping forever, asking to read (or write) 0 bytes
for every subsequent loop.
Reported-by: Joel Jaeschke <joel.jaeschke@gmail.com>
Link: https://github.com/axboe/liburing/issues/549
Fixes: 16c8d2df7ec0 ("io_uring: ensure symmetry in handling iter types in loop_rw_iter()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In fill_thread_core_info() the ptrace accessible registers are collected
to be written out as notes in a core file. The note array is allocated
from a size calculated by iterating the user regset view, and counting the
regsets that have a non-zero core_note_type. However, this only allows for
there to be non-zero core_note_type at the end of the regset view. If
there are any gaps in the middle, fill_thread_core_info() will overflow the
note allocation, as it iterates over the size of the view and the
allocation would be smaller than that.
There doesn't appear to be any arch that has gaps such that they exceed
the notes allocation, but the code is brittle and tries to support
something it doesn't. It could be fixed by increasing the allocation size,
but instead just have the note collecting code utilize the array better.
This way the allocation can stay smaller.
Even in the case of no arch's that have gaps in their regset views, this
introduces a change in the resulting indicies of t->notes. It does not
introduce any changes to the core file itself, because any blank notes are
skipped in write_note_info().
In case, the allocation logic between fill_note_info() and
fill_thread_core_info() ever diverges from the usage logic, warn and skip
writing any notes that would overflow the array.
This fix is derrived from an earlier one[0] by Yu-cheng Yu.
[0] https://lore.kernel.org/lkml/20180717162502.32274-1-yu-cheng.yu@intel.com/
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220317192013.13655-4-rick.p.edgecombe@intel.com
|
|
Looks like a victim of too much copy/paste, we should not be looking
at req->open.how in accept. The point is to check CLOEXEC and error
out, which we don't invalid direct descriptors on exec. Hence any
attempt to get a direct descriptor with CLOEXEC is invalid.
No harm is done here, as req->open.how.flags overlaps with
req->accept.flags, but it's very confusing and might change if either of
those command structs are modified.
Fixes: aaa4db12ef7b ("io_uring: accept directly into fixed file table")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Let's enable GC_URGENT_HIGH mode during f2fs_disable_checkpoint(),
so that we can use SSR allocator for GCed data/node persistence,
it can improve the performance due to it avoiding migration of
data/node locates in selected target segment of SSR allocator.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When compressed file has blocks, f2fs_ioc_start_atomic_write will succeed,
but compressed flag will be remained in inode. If write partial compreseed
cluster and commit atomic write will cause data corruption.
This is the reproduction process:
Step 1:
create a compressed file ,write 64K data , call fsync(), then the blocks
are write as compressed cluster.
Step2:
iotcl(F2FS_IOC_START_ATOMIC_WRITE) --- this should be fail, but not.
write page 0 and page 3.
iotcl(F2FS_IOC_COMMIT_ATOMIC_WRITE) -- page 0 and 3 write as normal file,
Step3:
drop cache.
read page 0-4 -- Since page 0 has a valid block address, read as
non-compressed cluster, page 1 and 2 will be filled with compressed data
or zero.
The root cause is, after commit 7eab7a696827 ("f2fs: compress: remove
unneeded read when rewrite whole cluster"), in step 2, f2fs_write_begin()
only set target page dirty, and in f2fs_commit_inmem_pages(), we will write
partial raw pages into compressed cluster, result in corrupting compressed
cluster layout.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Fixes: 7eab7a696827 ("f2fs: compress: remove unneeded read when rewrite whole cluster")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to initialized sbi->gc_mode to GC_NORMAL explicitly.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When compiling with -Wformat, clang emits the following warnings:
fs/nfsd/flexfilelayout.c:120:27: warning: format specifies type 'unsigned
char' but the argument has type 'int' [-Wformat]
"%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
~~~~ ^~~~~~~~~
%d
fs/nfsd/flexfilelayout.c:120:38: warning: format specifies type 'unsigned
char' but the argument has type 'int' [-Wformat]
"%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
~~~~ ^~~~~~~~~~~
%d
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Haynes <loghyr@hammerspace.com>
|
|
Workloads using provided buffers benefit from using and returning buffers
in the right order, and so does TLBs for that matter. Manage the internal
buffer list in a straight list, rather than use the head buffer as the
insertion node. Use a hashed list for the buffer group IDs instead of
xarray, the overhead is much lower this way. xarray provides internal
locking and other trickery that is handy for some uses cases, but
io_uring already locks internally for the buffer manipulation and needs
none of that.
This is good for about a 2% reduction in overhead, combination of the
improved management and the fact that the workload has an easier time
bundling back provided buffers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Once s_root is set, genric_shutdown_super() will be called if
fill_super() fails. That means, we will call ocfs2_dismount_volume()
twice in such case, which can lead to kernel crash.
Fix this issue by initializing filecheck kobj before setting s_root.
Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com
Fixes: 5f483c4abb50 ("ocfs2: add kobject for online file check")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|