iomap: don't skip reading in !uptodate folios when unsharing a range

Prior to commit a01b8f225248e, we would always read in the contents of a !uptodate folio before writing userspace data into the folio, allocate a folio state object, etc. Ritesh introduced an optimization that skips all of that if the write would cover the entire folio.

Unfortunately, the optimization misses the unshare case, where we always have to read in the folio contents since there isn't a data buffer supplied by userspace. This can result in stale kernel memory exposure if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared file that isn't already cached.

This was caught by observing fstests regressions in the "unshare around" mechanism that is used for unaligned writes to a reflinked realtime volume when the realtime extent size is larger than 1FSB, though I think it applies to any shared file.

Cc: ritesh.list@gmail.com, willy@infradead.org
Fixes: a01b8f225248e ("iomap: Allocate ifs in ->write_begin() early")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
author: Darrick J. Wong <djwong@kernel.org> 2023-09-18 15:57:39 -0700
committer: Darrick J. Wong <djwong@kernel.org> 2023-09-18 15:57:39 -0700
commit: 35d30c9cf12730a1e37053dfde4007c7cc452d1a
tree: 17d3c34b71470b2e6c647e494e1c07b251f8416b /fs/iomap
parent: 4aa8cdd5e523d2d8ec8df29dcd696bf207d7a494
1 file changed, 4 insertions, 2 deletions
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ae8673ce08b1..0350830fc989 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -640,11 +640,13 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	size_t poff, plen;
 
 	/*
-	 * If the write completely overlaps the current folio, then
+	 * If the write or zeroing completely overlaps the current folio, then
 	 * entire folio will be dirtied so there is no need for
 	 * per-block state tracking structures to be attached to this folio.
+	 * For the unshare case, we must read in the ondisk contents because we
+	 * are not changing pagecache contents.
 	 */
-	if (pos <= folio_pos(folio) &&
+	if (!(iter->flags & IOMAP_UNSHARE) && pos <= folio_pos(folio) &&
 	    pos + len >= folio_pos(folio) + folio_size(folio))
 		return 0;