Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/autofs.rst | 2
-rw-r--r-- | Documentation/filesystems/dlmfs.rst | 2
-rw-r--r-- | Documentation/filesystems/f2fs.rst | 44
-rw-r--r-- | Documentation/filesystems/fsverity.rst | 2
-rw-r--r-- | Documentation/filesystems/index.rst | 1
-rw-r--r-- | Documentation/filesystems/iomap/operations.rst | 15
-rw-r--r-- | Documentation/filesystems/mount_api.rst | 3
-rw-r--r-- | Documentation/filesystems/multigrain-ts.rst | 125
-rw-r--r-- | Documentation/filesystems/nfs/exporting.rst | 7
-rw-r--r-- | Documentation/filesystems/overlayfs.rst | 17
-rw-r--r-- | Documentation/filesystems/path-lookup.rst | 2
-rw-r--r-- | Documentation/filesystems/path-lookup.txt | 2
-rw-r--r-- | Documentation/filesystems/porting.rst | 2
-rw-r--r-- | Documentation/filesystems/proc.rst | 2
-rw-r--r-- | Documentation/filesystems/ramfs-rootfs-initramfs.rst | 2
-rw-r--r-- | Documentation/filesystems/tmpfs.rst | 24
16 files changed, 236 insertions, 16 deletions
diff --git a/Documentation/filesystems/autofs.rst b/Documentation/filesystems/autofs.rst index 1ac576458c69..5eb02394fcc3 100644 --- a/Documentation/filesystems/autofs.rst +++ b/Documentation/filesystems/autofs.rst @@ -442,7 +442,7 @@ which can be used to communicate directly with the autofs filesystem. It requires CAP_SYS_ADMIN for access. The 'ioctl's that can be used on this device are described in a separate -document `autofs-mount-control.txt`, and are summarised briefly here. +document `autofs-mount-control.rst`, and are summarised briefly here. Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:: struct autofs_dev_ioctl { diff --git a/Documentation/filesystems/dlmfs.rst b/Documentation/filesystems/dlmfs.rst index 7e2b1fd471d7..70d4e48242c3 100644 --- a/Documentation/filesystems/dlmfs.rst +++ b/Documentation/filesystems/dlmfs.rst @@ -36,7 +36,7 @@ None Usage ===== -If you're just interested in OCFS2, then please see ocfs2.txt. The +If you're just interested in OCFS2, then please see ocfs2.rst. The rest of this document will be geared towards those who want to use dlmfs for easy to setup and easy to use clustered locking in userspace. diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 68a0885fb5e6..fb7d2ee022bc 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -943,3 +943,47 @@ NVMe Zoned Namespace devices can start before the zone-capacity and span across zone-capacity boundary. Such spanning segments are also considered as usable segments. All blocks past the zone-capacity are considered unusable in these segments. + +Device aliasing feature +----------------------- + +f2fs can utilize a special file called a "device aliasing file." This file allows +the entire storage device to be mapped with a single, large extent, not using +the usual f2fs node structures. This mapped area is pinned and primarily intended +for holding the space. 
+ +Essentially, this mechanism allows a portion of the f2fs area to be temporarily +reserved and used by another filesystem or for different purposes. Once that +external usage is complete, the device aliasing file can be deleted, releasing +the reserved space back to F2FS for its own use. + +<use-case> + +# ls /dev/vd* +/dev/vdb (32GB) /dev/vdc (32GB) +# mkfs.ext4 /dev/vdc +# mkfs.f2fs -c /dev/vdc@vdc.file /dev/vdb +# mount /dev/vdb /mnt/f2fs +# ls -l /mnt/f2fs +vdc.file +# df -h +/dev/vdb 64G 33G 32G 52% /mnt/f2fs + +# mount -o loop /dev/vdc /mnt/ext4 +# df -h +/dev/vdb 64G 33G 32G 52% /mnt/f2fs +/dev/loop7 32G 24K 30G 1% /mnt/ext4 +# umount /mnt/ext4 + +# f2fs_io getflags /mnt/f2fs/vdc.file +get a flag on /mnt/f2fs/vdc.file ret=0, flags=nocow(pinned),immutable +# f2fs_io setflags noimmutable /mnt/f2fs/vdc.file +get a flag on noimmutable ret=0, flags=800010 +set a flag on /mnt/f2fs/vdc.file ret=0, flags=noimmutable +# rm /mnt/f2fs/vdc.file +# df -h +/dev/vdb 64G 753M 64G 2% /mnt/f2fs + +So, the key idea is, user can do any file operations on /dev/vdc, and +reclaim the space after the use, while the space is counted as /data. +That doesn't require modifying partition size and filesystem format. diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst index 0e2fac7a16da..76e538217868 100644 --- a/Documentation/filesystems/fsverity.rst +++ b/Documentation/filesystems/fsverity.rst @@ -16,7 +16,7 @@ btrfs filesystems. Like fscrypt, not too much filesystem-specific code is needed to support fs-verity. fs-verity is similar to `dm-verity -<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_ +<https://www.kernel.org/doc/Documentation/admin-guide/device-mapper/verity.rst>`_ but works on files rather than block devices. 
On regular files on filesystems supporting fs-verity, userspace can execute an ioctl that causes the filesystem to build a Merkle tree for the file and persist diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index e8e496d23e1d..44e9e77ffe0d 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -29,6 +29,7 @@ algorithms work. fiemap files locks + multigrain-ts mount_api quota seq_file diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst index b93115ab8748..ef082e5a4e0c 100644 --- a/Documentation/filesystems/iomap/operations.rst +++ b/Documentation/filesystems/iomap/operations.rst @@ -513,6 +513,21 @@ IOMAP_WRITE`` with any combination of the following enhancements: if the mapping is unwritten and the filesystem cannot handle zeroing the unaligned regions without exposing stale contents. + * ``IOMAP_ATOMIC``: This write is being issued with torn-write + protection. + Only a single bio can be created for the write, and the write must + not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be + set. + The file range to write must be aligned to satisfy the requirements + of both the filesystem and the underlying block device's atomic + commit capabilities. + If filesystem metadata updates are required (e.g. unwritten extent + conversion or copy on write), all updates for the entire file range + must be committed atomically as well. + Only one space mapping is allowed per untorn write. + Untorn writes must be aligned to, and must not be longer than, a + single file block. + Callers commonly hold ``i_rwsem`` in shared or exclusive mode before calling this function. 
diff --git a/Documentation/filesystems/mount_api.rst b/Documentation/filesystems/mount_api.rst index 317934c9e8fc..d92c276f1575 100644 --- a/Documentation/filesystems/mount_api.rst +++ b/Documentation/filesystems/mount_api.rst @@ -770,7 +770,8 @@ process the parameters it is given. * :: - bool fs_validate_description(const struct fs_parameter_description *desc); + bool fs_validate_description(const char *name, + const struct fs_parameter_description *desc); This performs some validation checks on a parameter description. It returns true if the description is good and false if it is not. It will diff --git a/Documentation/filesystems/multigrain-ts.rst b/Documentation/filesystems/multigrain-ts.rst new file mode 100644 index 000000000000..c779e47284e8 --- /dev/null +++ b/Documentation/filesystems/multigrain-ts.rst @@ -0,0 +1,125 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +Multigrain Timestamps +===================== + +Introduction +============ +Historically, the kernel has always used coarse time values to stamp inodes. +This value is updated every jiffy, so any change that happens within that jiffy +will end up with the same timestamp. + +When the kernel goes to stamp an inode (due to a read or write), it first gets +the current time and then compares it to the existing timestamp(s) to see +whether anything will change. If nothing changed, then it can avoid updating +the inode's metadata. + +Coarse timestamps are therefore good from a performance standpoint, since they +reduce the need for metadata updates, but bad from the standpoint of +determining whether anything has changed, since a lot of things can happen in a +jiffy. + +They are particularly troublesome with NFSv3, where unchanging timestamps can +make it difficult to tell whether to invalidate caches. 
NFSv4 provides a +dedicated change attribute that should always show a visible change, but not +all filesystems implement this properly, causing the NFS server to substitute +the ctime in many cases. + +Multigrain timestamps aim to remedy this by selectively using fine-grained +timestamps when a file has had its timestamps queried recently, and the current +coarse-grained time does not cause a change. + +Inode Timestamps +================ +There are currently 3 timestamps in the inode that are updated to the current +wallclock time on different activity: + +ctime: + The inode change time. This is stamped with the current time whenever + the inode's metadata is changed. Note that this value is not settable + from userland. + +mtime: + The inode modification time. This is stamped with the current time + any time a file's contents change. + +atime: + The inode access time. This is stamped whenever an inode's contents are + read. Widely considered to be a terrible mistake. Usually avoided with + options like noatime or relatime. + +Updating the mtime always implies a change to the ctime, but updating the +atime due to a read request does not. + +Multigrain timestamps are only tracked for the ctime and the mtime. atimes are +not affected and always use the coarse-grained value (subject to the floor). + +Inode Timestamp Ordering +======================== + +In addition to just providing info about changes to individual files, file +timestamps also serve an important purpose in applications like "make". These +programs measure timestamps in order to determine whether source files might be +newer than cached objects. + +Userland applications like make can only determine ordering based on +operational boundaries. For a syscall those are the syscall entry and exit +points. For io_uring or nfsd operations, that's the request submission and +response. In the case of concurrent operations, userland can make no +determination about the order in which things will occur. 
+ +For instance, if a single thread modifies one file, and then another file in +sequence, the second file must show an equal or later mtime than the first. The +same is true if two threads are issuing similar operations that do not overlap +in time. + +If however, two threads have racing syscalls that overlap in time, then there +is no such guarantee, and the second file may appear to have been modified +before, after or at the same time as the first, regardless of which one was +submitted first. + +Note that the above assumes that the system doesn't experience a backward jump +of the realtime clock. If that occurs at an inopportune time, then timestamps +can appear to go backward, even on a properly functioning system. + +Multigrain Timestamp Implementation +=================================== +Multigrain timestamps are aimed at ensuring that changes to a single file are +always recognizable, without violating the ordering guarantees when multiple +different files are modified. This affects the mtime and the ctime, but the +atime will always use coarse-grained timestamps. + +It uses an unused bit in the i_ctime_nsec field to indicate whether the mtime +or ctime has been queried. If either or both have, then the kernel takes +special care to ensure the next timestamp update will display a visible change. +This ensures tight cache coherency for use-cases like NFS, without sacrificing +the benefits of reduced metadata updates when files aren't being watched. + +The Ctime Floor Value +===================== +It's not sufficient to simply use fine or coarse-grained timestamps based on +whether the mtime or ctime has been queried. A file could get a fine grained +timestamp, and then a second file modified later could get a coarse-grained one +that appears earlier than the first, which would break the kernel's timestamp +ordering guarantees. + +To mitigate this problem, maintain a global floor value that ensures that +this can't happen. 
The two files in the above example may appear to have been +modified at the same time in such a case, but they will never show the reverse +order. To avoid problems with realtime clock jumps, the floor is managed as a +monotonic ktime_t, and the values are converted to realtime clock values as +needed. + +Implementation Notes +==================== +Multigrain timestamps are intended for use by local filesystems that get +ctime values from the local clock. This is in contrast to network filesystems +and the like that just mirror timestamp values from a server. + +For most filesystems, it's sufficient to just set the FS_MGTIME flag in the +fstype->fs_flags in order to opt-in, providing the ctime is only ever set via +inode_set_ctime_current(). If the filesystem has a ->getattr routine that +doesn't call generic_fillattr, then it should call fill_mg_cmtime() to +fill those values. For setattr, it should use setattr_copy() to update the +timestamps, or otherwise mimic its behavior. diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index f04ce1215a03..de64d2d002a2 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -238,10 +238,3 @@ following flags are defined: all of an inode's dirty data on last close. Exports that behave this way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip waiting for writeback when closing such files. - - EXPORT_OP_ASYNC_LOCK - Indicates a capable filesystem to do async lock - requests from lockd. Only set EXPORT_OP_ASYNC_LOCK if the filesystem has - it's own ->lock() functionality as core posix_lock_file() implementation - has no async lock request handling yet. For more information about how to - indicate an async lock request from a ->lock() file_operations struct, see - fs/locks.c and comment for the function vfs_lock_file(). 
diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index 343644712340..4c8387e1c880 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -440,6 +440,23 @@ For example:: fsconfig(fs_fd, FSCONFIG_SET_STRING, "datadir+", "/do2", 0); +Specifying layers via file descriptors +-------------------------------------- + +Since kernel v6.13, overlayfs supports specifying layers via file descriptors in +addition to specifying them as paths. This feature is available for the +"datadir+", "lowerdir+", "upperdir", and "workdir+" mount options with the +fsconfig syscall from the new mount api:: + + fsconfig(fs_fd, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1); + fsconfig(fs_fd, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2); + fsconfig(fs_fd, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower3); + fsconfig(fs_fd, FSCONFIG_SET_FD, "datadir+", NULL, fd_data1); + fsconfig(fs_fd, FSCONFIG_SET_FD, "datadir+", NULL, fd_data2); + fsconfig(fs_fd, FSCONFIG_SET_FD, "workdir", NULL, fd_work); + fsconfig(fs_fd, FSCONFIG_SET_FD, "upperdir", NULL, fd_upper); + + fs-verity support ----------------- diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst index 2b2df6aa5432..9ced1135608e 100644 --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -531,7 +531,7 @@ this retry process in the next article. Automount points are locations in the filesystem where an attempt to lookup a name can trigger changes to how that lookup should be handled, in particular by mounting a filesystem there. These are -covered in greater detail in autofs.txt in the Linux documentation +covered in greater detail in autofs.rst in the Linux documentation tree, but a few notes specifically related to path lookup are in order here. 
diff --git a/Documentation/filesystems/path-lookup.txt b/Documentation/filesystems/path-lookup.txt index 1aa7ce099f6f..d2cf2852e1f8 100644 --- a/Documentation/filesystems/path-lookup.txt +++ b/Documentation/filesystems/path-lookup.txt @@ -379,4 +379,4 @@ Papers and other documentation on dcache locking 2. http://lse.sourceforge.net/locking/dcache/dcache.html -3. path-lookup.md in this directory. +3. path-lookup.rst in this directory. diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 92bffcc6747a..9ab2a3d6f2b4 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -177,7 +177,7 @@ settles down a bit. **mandatory** s_export_op is now required for exporting a filesystem. -isofs, ext2, ext3, reiserfs, fat +isofs, ext2, ext3, fat can be used as examples of very different filesystems. --- diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index e834779d9611..6a882c57a7e7 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -579,7 +579,7 @@ encoded manner. The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking - ss shadow stack page + ss shadow/guarded control stack page sl sealed == ======================================= diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.rst b/Documentation/filesystems/ramfs-rootfs-initramfs.rst index 447f767c6462..fa4f81099cb4 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.rst +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst @@ -315,7 +315,7 @@ the above threads) is: 2) The cpio archive format chosen by the kernel is simpler and cleaner (and thus easier to create and parse) than any of the (literally dozens of) various tar archive formats. 
The complete initramfs archive format is - explained in buffer-format.txt, created in usr/gen_init_cpio.c, and + explained in buffer-format.rst, created in usr/gen_init_cpio.c, and extracted in init/initramfs.c. All three together come to less than 26k total of human-readable text. diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index 56a26c843dbe..d677e0428c3f 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -241,6 +241,28 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' will give you tmpfs instance on /mytmpfs which can allocate 10GB RAM/SWAP in 10240 inodes and it is only accessible by root. +tmpfs has the following mounting options for case-insensitive lookup support: + +================= ============================================================== +casefold Enable casefold support at this mount point using the given + argument as the encoding standard. Currently only UTF-8 + encodings are supported. If no argument is used, it will load + the latest UTF-8 encoding available. +strict_encoding Enable strict encoding at this mount point (disabled by + default). In this mode, the filesystem refuses to create file + and directory with names containing invalid UTF-8 characters. +================= ============================================================== + +This option doesn't render the entire filesystem case-insensitive. One needs to +still set the casefold flag per directory, by flipping the +F attribute in an +empty directory. Nevertheless, new directories will inherit the attribute. The +mountpoint itself cannot be made case-insensitive. + +Example:: + + $ mount -t tmpfs -o casefold=utf8-12.1.0,strict_encoding fs_name /mytmpfs + $ mount -t tmpfs -o casefold fs_name /mytmpfs + :Author: Christoph Rohland <cr@sap.com>, 1.12.01 @@ -250,3 +272,5 @@ RAM/SWAP in 10240 inodes and it is only accessible by root. 
KOSAKI Motohiro, 16 Mar 2010 :Updated: Chris Down, 13 July 2020 +:Updated: + André Almeida, 23 Aug 2024