diff options
author | Filipe Manana <fdmanana@suse.com> | 2014-11-27 21:14:15 +0000 |
---|---|---|
committer | Chris Mason <clm@fb.com> | 2014-12-02 18:35:09 -0800 |
commit | 04216820fe83d5e27322065ba989de27dbfc104d (patch) | |
tree | febfcea25770f1298cbb7470dbeba5d473751530 /fs/btrfs/volumes.c | |
parent | ae0ab003f2840e7ed66ed762ec0d4e298ad1e725 (diff) |
Btrfs: fix race between fs trimming and block group remove/allocation
Our fs trim operation, which is completely transactionless (doesn't start
or joins an existing transaction) consists of visiting all block groups
and then for each one to iterate its free space entries and perform a
discard operation against the space range represented by the free space
entries. However before performing a discard, the corresponding free space
entry is removed from the free space rbtree, and when the discard completes
it is added back to the free space rbtree.
If a block group remove operation happens while the discard is ongoing (or
before it starts and after a free space entry is hidden), we end up not
waiting for the discard to complete, remove the extent map that maps
logical address to physical addresses and the corresponding chunk metadata
from the the chunk and device trees. After that and before the discard
completes, the current running transaction can finish and a new one start,
allowing for new block groups that map to the same physical addresses to
be allocated and written to.
So fix this by keeping the extent map in memory until the discard completes
so that the same physical addresses aren't reused before it completes.
If the physical locations that are under a discard operation end up being
used for a new metadata block group for example, and dirty metadata extents
are written before the discard finishes (the VM might call writepages() of
our btree inode's i_mapping for example, or an fsync log commit happens) we
end up overwriting metadata with zeroes, which leads to errors from fsck
like the following:
checking extents
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
read block failed check_tree_block
owner ref check failed [833912832 16384]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
Check tree block failed, want=833912832, have=0
read block failed check_tree_block
root 5 root dir 256 error
root 5 inode 260 errors 2001, no inode item, link count wrong
unresolved ref dir 256 index 0 namelen 8 name foobar_3 filetype 1 errors 6, no dir index, no inode ref
root 5 inode 262 errors 2001, no inode item, link count wrong
unresolved ref dir 256 index 0 namelen 8 name foobar_5 filetype 1 errors 6, no dir index, no inode ref
root 5 inode 263 errors 2001, no inode item, link count wrong
(...)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Diffstat (limited to 'fs/btrfs/volumes.c')
-rw-r--r-- | fs/btrfs/volumes.c | 26 |
1 files changed, 8 insertions, 18 deletions
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 01920515f90d..588f37e0a564 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -53,16 +53,6 @@ static void btrfs_dev_stat_print_on_load(struct btrfs_device *device); DEFINE_MUTEX(uuid_mutex); static LIST_HEAD(fs_uuids); -static void lock_chunks(struct btrfs_root *root) -{ - mutex_lock(&root->fs_info->chunk_mutex); -} - -static void unlock_chunks(struct btrfs_root *root) -{ - mutex_unlock(&root->fs_info->chunk_mutex); -} - static struct btrfs_fs_devices *__alloc_fs_devices(void) { struct btrfs_fs_devices *fs_devs; @@ -1068,9 +1058,11 @@ static int contains_pending_extent(struct btrfs_trans_handle *trans, u64 *start, u64 len) { struct extent_map *em; + struct list_head *search_list = &trans->transaction->pending_chunks; int ret = 0; - list_for_each_entry(em, &trans->transaction->pending_chunks, list) { +again: + list_for_each_entry(em, search_list, list) { struct map_lookup *map; int i; @@ -1087,6 +1079,10 @@ static int contains_pending_extent(struct btrfs_trans_handle *trans, ret = 1; } } + if (search_list == &trans->transaction->pending_chunks) { + search_list = &trans->root->fs_info->pinned_chunks; + goto again; + } return ret; } @@ -2653,18 +2649,12 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, } } - ret = btrfs_remove_block_group(trans, extent_root, chunk_offset); + ret = btrfs_remove_block_group(trans, extent_root, chunk_offset, em); if (ret) { btrfs_abort_transaction(trans, extent_root, ret); goto out; } - write_lock(&em_tree->lock); - remove_extent_mapping(em_tree, em); - write_unlock(&em_tree->lock); - - /* once for the tree */ - free_extent_map(em); out: /* once for us */ free_extent_map(em); |