Age | Commit message (Collapse) | Author | Files | Lines |
|
Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting
in giving up roll-forward recovery due to wrong dentry mark.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We'd better allocate atomic writes to hot_data zone.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If all the threads are waiting for checkpoint, we have no chance to flush
required dirty pages.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes the following scenario.
- f2fs_create/f2fs_mkdir - write_checkpoint
- f2fs_mark_inode_dirty_sync - block_operations
- f2fs_lock_all
- f2fs_sync_inode_meta
- f2fs_unlock_all
- sync_inode_metadata
- f2fs_lock_op
- f2fs_write_inode
- update_inode_page
- get_node_page
return -ENOENT
- new_inode_page
- fill_node_footer
- f2fs_mark_inode_dirty_sync
- ...
- f2fs_unlock_op
- f2fs_inode_synced
- f2fs_lock_all
- do_checkpoint
In this checkpoint, we can get an inode page which contains zeros having valid
node footer only.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't use blk plug covering area where there won't be any IOs being issued.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
rb-tree lookup/update functions are deeply coupled into extent cache
codes, it's very hard to reuse these basic functions, this patch
extracts common rb-tree operation infrastructure for latter reusing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now we're doing SSR aggressively more than ever before, so once we reach to
the reserved_segment, f2fs_balance_fs will call f2fs_gc, which triggers
checkpoint everytime. We actually must avoid that.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch cleans several macros by introducing:
- BLKS_PER_SEC
- GET_SEC_FROM_SEG
- GET_SEG_FROM_SEC
- GET_ZONE_FROM_SEC
- GET_ZONE_FROM_SEG
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch cleans up get_valid_blocks, which has no functional change.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes to submit a segment number for get_valid_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add braces around variables used within macros for those make sense
to do it. Many of the macros in f2fs already do this. What this commit
doesn't do is anything that changes line# as a result of adding braces,
which usually affects the binary via __LINE__.
Confirmed no diff in fs/f2fs/f2fs.ko before/after this commit on x86_64,
to make sure this has no functional change as well as there's been no
unexpected side effect due to callers' arithmetics within the existing
code.
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Callers are to unlock the page on failure after 86531d6b.
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_submit_discard_endio, we will wake up waiter before setting
discard command states, so waiter may use incorrect states. Change
the order between complete() and states setting to fix this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Split f2fs_wait_discard_bios from f2fs_wait_discard_bio, just for cleanup,
no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Split discard_cmd_list to discard_{pend,wait}_list, so while sending/waiting
discard command, we can avoid traversing unneeded entries in original list.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This reverts commit 3436c4bdb30de421d46f58c9174669fbcfd40ce0.
This makes a leak to register dirty segments. I reproduced the issue by
modified postmark which injects a lot of file create/delete/update and
finally triggers huge number of SSR allocations.
Cc: <stable@vger.kernel.org> # v4.10+
[Jaegeuk Kim: Change missing incorrect comment]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Since callers statically know which type to use, make_dentry_ptr()
can simply be splitted into two inline functions. This way, the code
has less inlined, fewer arguments, and no cast.
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch tries to split in-place-update bios from sequential bios.
Suggested-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The variable 'i' has been defined before, so here we can
use it directly.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If two threads try to flush dirty pages in different inodes respectively,
f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time,
resulting in a lot of 4KB seperated IOs.
So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write
IOs with a big WRITE_SYNC'ed bio.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It would better split small and large IOs separately in order to get more
consecutive big writes.
The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch changes to use bitmap instead of extent in struct discard_entry
to indicate discard range in one segment, for fragmented space, this
implementation can save memory footprint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove unneeded parameter and simply change flow in
destroy_discard_cmd_control.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Adds to count discard command entry and show the number in debugfs,
also fix to add cost of discard command cache into total comsumed
memory footprint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Show historical count of flush command and discard command.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
- has_not_enough_free_secs
node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
- f2fs_gc
- get_victim_by_default
alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
- do_garbage_collect
start_segno 3976, end_segno 3977 type 0
- is_alive
nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
- gc_data_segment 766, segno 3976, block 512/426 not alive
So, this patch fixes subtle corrupted case where node version does not match
to summary version which results in infinite loop by gc.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch initiates SSR much eariler, resulting in less FG_GC.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to give more spatial locality, this patch changes the block allocation
policy which assigns beginning of partition for small and hot data/node blocks.
In order to do this, we set noheap allocation by default and introduce another
mount option, heap, to reset it back.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.
Cc: <stable@vger.kernel.org> # v4.10+
Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch allow write data to normal file when writting
new checkpoint.
We relax three limitations for write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In this patch, we change xattr block disk layout as below:
Before:
xattr node block layout
+---------------------------------------------+---------------+-------------+
| node block xattr entries | reserved | node footer |
| 4068 Bytes | 4 Bytes | 24 Bytes |
In memory layout
+--------------------+---------------------------------+--------------------+
| inline xattr | node block xattr entries | reserved |
| 200 Bytes | 4068 Bytes | 4 Bytes |
After:
xattr node block layout
+-------------------------------------------------------------+-------------+
| node block xattr entries | node footer |
| 4072 Bytes | 24 Bytes |
In memory layout
+--------------------+---------------------------------+--------------------+
| inline xattr | node block xattr entries | reserved |
| 200 Bytes | 4072 Bytes | 4 Bytes |
With this change, we don't need to reserve additional space in node block,
just keep reserved space in logical in-memory layout. So that it would help
to enlarge valid free space of xattr node block.
As tested, generic/026 shows max stored xattr entires number increases from
531 to 532 when inline_xattr option is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
1. don't allocate redundant memory in read_all_xattrs.
2. introduce RESERVED_XATTR_SIZE for cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't track volatile file in dirty inode list, otherwise with data_flush
option, background thread will entry into endless loop for flushing
journal file's pages.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to show the max number of volatile operations which are
conducting concurrently.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In below concurrent case, allocated nid can be loaded into free nid cache
and be allocated again.
Thread A Thread B
- f2fs_create
- f2fs_new_inode
- alloc_nid
- __insert_nid_to_list(ALLOC_NID_LIST)
- f2fs_balance_fs_bg
- build_free_nids
- __build_free_nids
- scan_nat_page
- add_free_nid
- __lookup_nat_cache
- f2fs_add_link
- init_inode_metadata
- new_inode_page
- new_node_page
- set_node_addr
- alloc_nid_done
- __remove_nid_from_list(ALLOC_NID_LIST)
- __insert_nid_to_list(FREE_NID_LIST)
This patch makes nat cache lookup and free nid list operation being atomical
to avoid this race condition.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use set_page_private marcro instead of operte page struct
directly
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When doing garbage collection, we try to record segment offset which
locates at next one of last victim, using it as the start offset in
next searching.
But in some corner cases, recorded offset may cross the end of main
segment area, it will cause incorrectly searching in dirty_segmap
bitmap. This patch adds modular operation to avoid this issue.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The memory size of f2fs_stat_info also should be calculated.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
After filemap_write_and_wait_range fail, the FI_ATOMIC_FILE flags is removed,
so that f2fs should not increase the stat of atomic_write.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The crc_offset towards or beyond the end of block is wrong,
sanity check it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As discuss with Jaegeuk and Chao,
"Once checkpoint is done, f2fs doesn't need to update there-in filename at all."
The disk-level filename is used only one case,
1. create a file A under a dir
2. sync A
3. godown
4. umount
5. mount (roll_forward)
Only the rename/cross_rename changes the filename, if it happens,
a. between step 1 and 2, the sync A will caused checkpoint, so that,
the roll_forward at step 5 never happens.
b. after step 2, the roll_forward happens, file A will roll forward
to the result as after step 1.
So that, any updating the disk filename is useless, just cleanup it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be
updated atomically, use nid_list_lock cover them to avoid race in
concurrent scenario.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL",
so that, the prefetchw(&page->flags) is operated on NULL.
Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Clear FI_DATA_EXIST flag atomically in truncate_inline_inode, and
the return value from truncate_inline_inode isn't used, remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It's needless of mnt_want_write_file for arguments checking.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The inode_newsize_ok is better than only checking the maxbytes,
eg. the rlimit etc.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If move file range return error, the data copied to user-space is duplicate.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
use a slightly simpler expression to calculate nat block with nid.
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|