btrfs: dynamic block_group reclaim threshold

We can currently recover allocated block_groups by: - explicitly starting balance operations - "auto reclaim" via bg_reclaim_threshold The latter works by checking against a fixed threshold on frees. If we pass from above the threshold to below, relocation triggers and the block group will get reclaimed by the cleaner thread (assuming it is still eligible) Picking a threshold is challenging. Too high, and you end up trying to reclaim very full block_groups which is quite costly, and you don't do reclaim on block_groups that don't get quite THAT full, but could still be quite fragmented and stranding a lot of space. Too low, and you similarly miss out on reclaim even if you badly need it to avoid running out of unallocated space, if you have heavily fragmented block groups living above the threshold. No matter the threshold, it suffers from a workload that happens to bounce around that threshold, which can introduce arbitrary amounts of reclaim waste. To improve this situation, introduce a dynamic threshold. The basic idea behind this threshold is that it should be very lax when there is plenty of unallocated space, and increasingly aggressive as we approach zero unallocated space. To that end, it sets a target for unallocated space (10 chunks) and then linearly increases the threshold as the amount of space short of the target we are increases. The formula is: (target - unalloc) / target I tested this by running it on three interesting workloads: 1. bounce allocations around X% full. 2. fill up all the way and introduce full fragmentation. 3. write in a fragmented way until the filesystem is just about full. 1. and 2. attack the weaknesses of a fixed threshold; fixed either works perfectly or fully falls apart, depending on the threshold. Dynamic always handles these cases well. 3. attacks dynamic by checking whether it is too zealous to reclaim in conditions with low unallocated and low unused. It tends to claw back 1GiB of unallocated fairly aggressively, but not much more. Early versions of dynamic threshold struggled on this test. Additional work could be done to intelligently ratchet up the urgency of reclaim in very low unallocated conditions. Existing mechanisms are already useless in that case anyway. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
author: Boris Burkov <boris@bur.io> 2024-02-02 11:52:16 -0800
committer: David Sterba <dsterba@suse.com> 2024-07-11 15:33:27 +0200
commit: f5ff64ccf7bb7274ed66b0d835b2f6ae10af5d7a (patch)
tree: 0baaff38ff8a3587ef8dbdec13796a5986723559 /fs/btrfs/space-info.h
parent: 42f620aec182f62ee72e3fce41cb3353951b3508 (diff)
1 files changed, 8 insertions, 0 deletions
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 25edfd453b27..2cac771321c7 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -184,6 +184,12 @@ struct btrfs_space_info {
 	 * Exposed in /sys/fs/<uuid>/allocation/<type>/reclaim_errors
 	 */
 	u64 reclaim_errors;
+
+	/*
+	 * If true, use the dynamic relocation threshold, instead of the
+	 * fixed bg_reclaim_threshold.
+	 */
+	bool dynamic_reclaim;
 };
 
 struct reserve_ticket {
@@ -266,4 +272,6 @@ void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
+int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
+
 #endif /* BTRFS_SPACE_INFO_H */
author	Boris Burkov <boris@bur.io>	2024-02-02 11:52:16 -0800
committer	David Sterba <dsterba@suse.com>	2024-07-11 15:33:27 +0200
commit	f5ff64ccf7bb7274ed66b0d835b2f6ae10af5d7a (patch)
tree	0baaff38ff8a3587ef8dbdec13796a5986723559 /fs/btrfs/space-info.h
parent	42f620aec182f62ee72e3fce41cb3353951b3508 (diff)