diff options
author | Christian Brauner <brauner@kernel.org> | 2023-11-04 15:00:13 +0100 |
---|---|---|
committer | Christian Brauner <brauner@kernel.org> | 2023-11-18 14:59:25 +0100 |
commit | 7366f8b6fc6aa21c4199cb5d337b023df69745b0 (patch) | |
tree | 6d16ad249783861ead62b0f34fb3675056446dee /include | |
parent | efa5d065b4a0790061921194019c321ad335b8db (diff) |
fs: handle freezing from multiple devices
Before [1] freezing a filesystems through the block layer only worked
for the main block device as the owning superblock of additional block
devices could not be found. Any filesystem that made use of multiple
block devices would only be freezable via it's main block device.
For example, consider xfs over device mapper with /dev/dm-0 as main
block device and /dev/dm-1 as external log device. Two freeze requests
before [1]:
(1) dmsetup suspend /dev/dm-0 on the main block device
bdev_freeze(dm-0)
-> dm-0->bd_fsfreeze_count++
-> freeze_super(xfs-sb)
The owning superblock is found and the filesystem gets frozen.
Returns 0.
(2) dmsetup suspend /dev/dm-1 on the log device
bdev_freeze(dm-1)
-> dm-1->bd_fsfreeze_count++
The owning superblock isn't found and only the block device freeze
count is incremented. Returns 0.
Two freeze requests after [1]:
(1') dmsetup suspend /dev/dm-0 on the main block device
bdev_freeze(dm-0)
-> dm-0->bd_fsfreeze_count++
-> freeze_super(xfs-sb)
The owning superblock is found and the filesystem gets frozen.
Returns 0.
(2') dmsetup suspend /dev/dm-1 on the log device
bdev_freeze(dm-0)
-> dm-0->bd_fsfreeze_count++
-> freeze_super(xfs-sb)
The owning superblock is found and the filesystem gets frozen.
Returns -EBUSY.
When (2') is called we initiate a freeze from another block device of
the same superblock. So we increment the bd_fsfreeze_count for that
additional block device. But we now also find the owning superblock for
additional block devices and call freeze_super() again which reports
-EBUSY.
This can be reproduced through xfstests via:
mkfs.xfs -f -m crc=1,reflink=1,rmapbt=1, -i sparse=1 -lsize=1g,logdev=/dev/nvme1n1p4 /dev/nvme1n1p3
mkfs.xfs -f -m crc=1,reflink=1,rmapbt=1, -i sparse=1 -lsize=1g,logdev=/dev/nvme1n1p6 /dev/nvme1n1p5
FSTYP=xfs
export TEST_DEV=/dev/nvme1n1p3
export TEST_DIR=/mnt/test
export TEST_LOGDEV=/dev/nvme1n1p4
export SCRATCH_DEV=/dev/nvme1n1p5
export SCRATCH_MNT=/mnt/scratch
export SCRATCH_LOGDEV=/dev/nvme1n1p6
export USE_EXTERNAL=yes
sudo ./check generic/311
Current semantics allow two concurrent freezers: one initiated from
userspace via FREEZE_HOLDER_USERSPACE and one initiated from the kernel
via FREEZE_HOLDER_KERNEL. If there are multiple concurrent freeze
requests from either FREEZE_HOLDER_USERSPACE or FREEZE_HOLDER_KERNEL
-EBUSY is returned.
We need to preserve these semantics because as they are uapi via
FIFREEZE and FITHAW ioctl()s. IOW, freezes don't nest for FIFREEZE and
FITHAW. Other kernels consumers rely on non-nesting freezes as well.
With freezes initiated from the block layer freezes need to nest if the
same superblock is frozen via multiple devices. So we need to start
counting the number of freeze requests.
If FREEZE_MAY_NEST is passed alongside FREEZE_HOLDER_KERNEL or
FREEZE_HOLDER_USERSPACE we allow the caller to nest freeze calls.
To accommodate the old semantics we split the freeze counter into two
counting kernel initiated and userspace initiated freezes separately. We
can then also stop recording FREEZE_HOLDER_* in struct sb_writers.
We also simplify freezing by making all concurrent freezers share a
single active superblock reference count instead of having separate
references for kernel and userspace. I don't see why we would need two
active reference counts. Neither FREEZE_HOLDER_KERNEL nor
FREEZE_HOLDER_USERSPACE can put the active reference as long as they are
concurrent freezers anwyay. That was already true before we allowed
nesting freezes.
Survives various fstests runs with different options including the
reproducer, online scrub, and online repair, fsfreze, and so on. Also
survives blktests.
Link: https://lore.kernel.org/linux-block/87bkccnwxc.fsf@debian-BULLSEYE-live-builder-AMD64
Link: https://lore.kernel.org/r/20231104-vfs-multi-device-freeze-v2-2-5b5b69626eac@kernel.org
Fixes: 288d8706abfc ("bdev: implement freeze and thaw holder operations") [1] # no backport needed
Tested-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/fs.h | 18 |
1 files changed, 17 insertions, 1 deletions
diff --git a/include/linux/fs.h b/include/linux/fs.h index 7dc6c1bf5f55..b2a3f1c61c19 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1185,7 +1185,8 @@ enum { struct sb_writers { unsigned short frozen; /* Is sb frozen? */ - unsigned short freeze_holders; /* Who froze fs? */ + int freeze_kcount; /* How many kernel freeze requests? */ + int freeze_ucount; /* How many userspace freeze requests? */ struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; }; @@ -2051,9 +2052,24 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, struct file *dst_file, loff_t dst_pos, loff_t len, unsigned int remap_flags); +/** + * enum freeze_holder - holder of the freeze + * @FREEZE_HOLDER_KERNEL: kernel wants to freeze or thaw filesystem + * @FREEZE_HOLDER_USERSPACE: userspace wants to freeze or thaw filesystem + * @FREEZE_MAY_NEST: whether nesting freeze and thaw requests is allowed + * + * Indicate who the owner of the freeze or thaw request is and whether + * the freeze needs to be exclusive or can nest. + * Without @FREEZE_MAY_NEST, multiple freeze and thaw requests from the + * same holder aren't allowed. It is however allowed to hold a single + * @FREEZE_HOLDER_USERSPACE and a single @FREEZE_HOLDER_KERNEL freeze at + * the same time. This is relied upon by some filesystems during online + * repair or similar. + */ enum freeze_holder { FREEZE_HOLDER_KERNEL = (1U << 0), FREEZE_HOLDER_USERSPACE = (1U << 1), + FREEZE_MAY_NEST = (1U << 2), }; struct super_operations { |