Age | Commit message (Collapse) | Author | Files | Lines |
|
We could delete directories transactionally on rmdir()/unlink(), but we
don't; instead, like with regular files we wait for the VFS to call
evict().
That means that our check for directories in the deleted inodes btree is
wrong - the check should be for non-empty directories.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In may_delete_deleted_inode(), there's a corner case when a snapshot was
taken while we had an unlinked inode: we don't want to delete the inode
in the internal (shared) snapshot node, since it might have been
reattached in a descendent snapshot.
Instead we propagate the key to any snapshot leaves it doesn't exist in,
so that it can be deleted there if necessary, and then clear the
unlinked flag in the internal node.
But we forgot to commit after clearing the unlinked flag, causing us to
go into an infinite loop.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
may_delete_deleted_inode() was returning without exiting a btree
iterator, eventually causing propagate_key_to_snaphot_leaves() to go
into an infinite loop hitting btree_trans_too_many_iters().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This lets us use bch2_prt_bitflags to print them out.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- the fsck_err() check for the filesystem being clean was incorrect,
causing us to always fail to delete unlinked inodes
- if a snapshot had been taken, the unlinked inode needs to be
propagated to snapshot leaves so the unlink can happen there - fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.
Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helper for new rebalance code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since compression options now include compression level, proper
validation is a bit more involved.
This adds bch2_compression_opt_valid(), and plumbs it around
appropriately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_inode_delete_keys() was using BTREE_ITER_NOT_EXTENTS, on the
assumption that it would never need to split extents.
But that caused a race with extents being split by other threads -
specifically, the data move path. Extents iterators have the iterator
position pointing to the start of the extent, which avoids the race.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These errors aren't actual errors, and should never be printed - do this
in the common helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_quota_read(), when scanning for inodes, may attempt to look up
inodes that have been deleted in the main subvolume - this is not an
error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add error messages when we fail to lookup an inode, and also add a few
missing bch2_err_class() calls.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new bitset btree for inodes pending deletion; this means we no
longer have to scan the full inodes btree after an unclean shutdown.
Specifically, this adds:
- a trigger to update the deleted_inodes btree based on changes to the
inodes btree
- a new recovery pass
- and check_inodes is now only a fsck pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for the new deleted inodes btree
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bit of reorg
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.
This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
As with previous conversions, replace -ENOENT uses with more informative
private error codes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Introduce new helpers for a common pattern:
bch2_trans_iter_init();
bch2_btree_iter_peek_slot();
- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new field to bkey_ops for the minimum size of the value,
which standardizes that check and also enforces the new rule (previously
done somewhat ad-hoc) that we can extend value types by adding new
fields on to the end.
To make that work we do _not_ initialize min_val_size with sizeof,
instead we initialize it to the size of the first version of those
values.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache()
doesn't handle the case where i_count is already 0, we need to grab and
put the inode in order for it to be dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds support for nocow mode, where we do writes in-place when
possible. Patch components:
- New boolean filesystem and inode option, nocow: note that when nocow
is enabled, data checksumming and compression are implicitly disabled
- To prevent in-place writes from racing with data moves
(data_update.c) or bucket reuse (i.e. a bucket being reused and
re-allocated while a nocow write is in flight, we have a new locking
mechanism.
Buckets can be locked for either data update or data move, using a
fixed size hash table of two_state_shared locks. We don't have any
chaining, meaning updates and moves to different buckets that hash to
the same lock will wait unnecessarily - we'll want to watch for this
becoming an issue.
- The allocator path also needs to check for in-place writes in flight
to a given bucket before giving it out: thus we add another counter
to bucket_alloc_state so we can track this.
- Fsync now may need to issue cache flushes to block devices instead of
flushing the journal. We add a device bitmask to bch_inode_info,
ei_devs_need_flush, which tracks devices that need to have flushes
issued - note that this will lead to unnecessary flushes when other
codepaths have already issued flushes, we may want to replace this with
a sequence number.
- New nocow write path: look up extents, and if they're writable write
to them - otherwise fall back to the normal COW write path.
XXX: switch to sequence numbers instead of bitmask for devs needing
journal flush
XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to
run in process context - see if we can improve this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Move bi_size and bi_sectors into the non-varint portion of the inode, so
that the write path can update them without going through the relatively
expensive unpack/pack operations.
Other changes:
- Add a field for the offset of the varint section, so we can add new
non-varint fields without needing a new inode type, like alloc_v3
- Move bi_mode into the flags field, so that the varint section can be
u64 aligned
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This improves io_opts() and makes it a non-inline function - it's big
enough that it probably shouldn't be.
Also, bch_io_opts no longer needs fields for whether options are
defined, so we can slim it down a bit.
We'd like to stop passing around the full bch_io_opts, but that'll be
tricky because of bch2_rebalance_add_key().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Old inode formats don't have all the fields of the current inode format:
when unpacking inodes in the current format we can thus skip zeroing out
the destination buffer, but that doesn't work on for the old formats.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We shouldn't be overloading standard error codes now that we have
provisions for bcachefs-specific errorcodes: this patch converts super.c
and super-io.c to per error site errcodes, with a bit of cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This switches btree_key_cache_fill() to use a btree iterator, not a
btree path, so that it can search for keys in previous snapshots.
We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid
recursion back into the key cache.
This will allow us to re-enable the key cache for inodes in the next
patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It turns out the *_defined entries of bch_io_opts are only used in one
place - in the xattr get path - and there we immediately convert to a
bch_opts struct, which also has the *_defined entries.
This patch changes bch2_inode_opts_to_opts() to go directly from
bch_inode_unpacked to bch_opts, which is a minor simplification and will
also let us slim down struct bch_io_opts in another patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It turns out we need bch2_extent_trim_atomi() even when we're deleting
extents one at a time because it's possible for one reflink_p to
reference arbitrarily many reflink_v extents. This doesn't normally
happen, but the data move path can fragment existing extents in the
background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's mainly used from bch2_inode_write(), so inline it there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This switches that assertion to a bch2_trans_inconsistent() call, as it
should be.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
In BTREE_ITER_FILTER_SNAPHOTS mode, we skip over keys in unrelated
snapshots. When we hit the end of an inode, if the next inode(s) are in
a different subvolume, we could potentially have to skip past many keys
before finding a key we can return to the caller, so they can terminate
the iteration.
This adds a peek_upto() variant to solve this problem, to be used when
we know the range we're searching within.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Now that we have full key cache coherency, we can simplify
bch2_inode_create().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new helper that returns true if the given btree ID uses the btree
key cache. This enables some new cleanups, since the helper can check
the options for whether caching is enabled on a given btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Had a bug report that implies bch2_inode_delete_keys() returned -EINTR
before it completed, so this patch simplifies it and makes the flow
control a little more conventional.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Similarly to bch2_btree_delete_range_trans(), bch2_inode_delete_keys()
may sometimes split compressed extents, and needs to pass in a disk
reservation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This fixes some compiler warnings that only trigger in userspace - dead
code, a maybe uninitialed variable, a maybe null ptr passed to printk.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
When unpacking v1 inodes, we were failing to initialize the journal_seq
field, leading to a BUG_ON() when fsync tries to flush a garbage journal
sequence number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|