[1/1] f2fs: move fiemap to use iomap framework
Commit Message
This patch has been tested with xfstests by running 'kvm-xfstests -c
f2fs -g auto' with and without this patch; no regressions were seen.
Some tests fail both before and after, and the test results are:
f2fs/default: 683 tests, 9 failures, 226 skipped, 30297 seconds
Failures: generic/050 generic/064 generic/250 generic/252 generic/459
generic/506 generic/563 generic/634 generic/635
Signed-off-by: Wu Bo <bo.wu@vivo.com>
---
fs/f2fs/data.c | 238 ++++++++++++++++++++---------------------------
fs/f2fs/f2fs.h | 8 +-
fs/f2fs/inline.c | 20 ++--
3 files changed, 120 insertions(+), 146 deletions(-)
Comments
On 2023/7/31 9:26, Wu Bo wrote:
> This patch has been tested with xfstests by running 'kvm-xfstests -c
> f2fs -g auto' with and without this patch; no regressions were seen.
>
> Some tests fail both before and after, and the test results are:
> f2fs/default: 683 tests, 9 failures, 226 skipped, 30297 seconds
> Failures: generic/050 generic/064 generic/250 generic/252 generic/459
> generic/506 generic/563 generic/634 generic/635
Can you please take a look at generic/473?
generic/473 1s ... - output mismatch (see /media/fstests/results//generic/473.out.bad)
--- tests/generic/473.out 2022-11-10 08:42:19.231395230 +0000
+++ /media/fstests/results//generic/473.out.bad 2023-08-04 02:02:01.000000000 +0000
@@ -6,7 +6,7 @@
1: [256..287]: hole
Hole + Data
0: [0..127]: hole
-1: [128..255]: data
+1: [128..135]: data
Hole + Data + Hole
0: [0..127]: hole
...
(Run 'diff -u /media/fstests/tests/generic/473.out /media/fstests/results//generic/473.out.bad' to see the entire diff)
Another concern: this implementation needs to be tested on compressed files,
since the logic is a little bit complicated.
+Cc Daeho Jeong
Thanks,
On 2023/8/6 10:05, Chao Yu wrote:
> On 2023/7/31 9:26, Wu Bo wrote:
>> This patch has been tested with xfstests by running 'kvm-xfstests -c
>> f2fs -g auto' with and without this patch; no regressions were seen.
>>
>> Some tests fail both before and after, and the test results are:
>> f2fs/default: 683 tests, 9 failures, 226 skipped, 30297 seconds
>> Failures: generic/050 generic/064 generic/250 generic/252 generic/459
>> generic/506 generic/563 generic/634 generic/635
>
> Can you please take a look at generic/473?
generic/473 also fails on xfs. It's an iomap issue.
>
> generic/473 1s ... - output mismatch (see
> /media/fstests/results//generic/473.out.bad)
> --- tests/generic/473.out 2022-11-10 08:42:19.231395230 +0000
> +++ /media/fstests/results//generic/473.out.bad 2023-08-04
> 02:02:01.000000000 +0000
> @@ -6,7 +6,7 @@
> 1: [256..287]: hole
> Hole + Data
> 0: [0..127]: hole
> -1: [128..255]: data
> +1: [128..135]: data
> Hole + Data + Hole
> 0: [0..127]: hole
> ...
> (Run 'diff -u /media/fstests/tests/generic/473.out
> /media/fstests/results//generic/473.out.bad' to see the entire diff)
The layout of the test file is:
fiemap.473:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: hole 128
1: [128..255]: 5283840..5283967 128 0x1000
2: [256..383]: hole 128
3: [384..511]: 5283968..5284095 128 0x1000
And the test command is:
xfs_io -c "fiemap -v 0 65k" fiemap.473
So the difference is in when the extent traversal stops.
iomap stops once it reaches the end of the range requested by the fiemap command:
...
xfs_io-7399 [001] ..... 1385.656328: f2fs_map_blocks: dev = (254,48), ino = 5, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
xfs_io-7399 [001] ..... 1385.656328: f2fs_map_blocks: dev = (254,48), ino = 5, file offset = 16, start blkaddr = 0x3400, len = 0x1, flags = 2, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
The previous logic, by contrast, keeps traversing until the next data extent is found:
...
xfs_io-2194 [000] ..... 116.046690: f2fs_map_blocks: dev = (254,48), ino = 5, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
xfs_io-2194 [000] ..... 116.046690: f2fs_map_blocks: dev = (254,48), ino = 5, file offset = 16, start blkaddr = 0xa1400, len = 0x10, flags = 2, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
xfs_io-2194 [000] ..... 116.046691: f2fs_map_blocks: dev = (254,48), ino = 5, file offset = 32, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
...
xfs_io-2194 [000] ..... 116.046706: f2fs_map_blocks: dev = (254,48), ino = 5, file offset = 48, start blkaddr = 0xa1410, len = 0x10, flags = 2, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
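
As a rough check of the traces above, the short user-space sketch below
redoes the range arithmetic for 'fiemap -v 0 65k'. It assumes 4 KiB f2fs
blocks and 512-byte sectors, and it assumes that after the leading hole the
remaining request is the 1024 bytes from 65536 to 66559. The helper names
(model_bytes_to_blks, request_blocks, last_sector) are hypothetical and only
mirror the m_len calculation in f2fs_iomap_begin_report from the patch; they
are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSIZE_BITS 12u	/* assumption: 4 KiB f2fs blocks */
#define SECTOR_SIZE  512u	/* unit used in the xfs_io fiemap output */

/* Hypothetical user-space stand-in for the kernel's bytes_to_blks(). */
static inline uint64_t model_bytes_to_blks(uint64_t bytes)
{
	return bytes >> BLKSIZE_BITS;
}

/*
 * Number of blocks touched by the byte range [offset, offset + length),
 * following the m_len computation in f2fs_iomap_begin_report().
 */
static inline uint64_t request_blocks(uint64_t offset, uint64_t length)
{
	assert(length > 0);
	return model_bytes_to_blks(offset + length - 1) -
	       model_bytes_to_blks(offset) + 1;
}

/* Last 512-byte sector of the last block touched by the byte range. */
static inline uint64_t last_sector(uint64_t offset, uint64_t length)
{
	uint64_t last_blk = model_bytes_to_blks(offset + length - 1);

	return ((last_blk + 1) << BLKSIZE_BITS) / SECTOR_SIZE - 1;
}
```

Under these assumptions, request_blocks(65536, 1024) is 1, which matches
'len = 0x1' in the iomap trace, and last_sector(65536, 1024) is 135, which
matches the '[128..135]: data' line in 473.out.bad.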
>
> Other concern is, it needs to test this implementation on compressed
> file,
> since the logic is a little bit complicated.
To be honest, all of the complex logic is there to handle compressed files.
I used the enwiki8 dataset to test a compressed file:
mkfs.f2fs -f -O extra_attr,compression f2fs.img
mount f2fs.img f2fs -o compress_algorithm=lz4,compress_log_size=3,compress_mode=user
touch compressed_file
f2fs_io setflags compression compressed_file
cat enwiki8 > compressed_file
f2fs_io compress compressed_file
f2fs_io release_cblocks compressed_file
xfs_io -c fiemap compressed_file | awk '{print $2 $3}'
enwiki8 download url: http://mattmahoney.net/dc/enwik8.zip
And the result is:
--- a/orig
+++ b/new
@@ -1750,8 +1750,8 @@
[111872..111935]:323448..323511
[111936..111999]:323488..323551
[112000..112063]:323520..323583
-[112064..112087]:323560..323583
-[112088..112127]:53248..53287
+[112064..112095]:323560..323591
+[112096..112127]:53248..53279
[112128..112191]:53256..53319
[112192..112255]:53288..53351
[112256..112319]:53328..53391
@@ -2078,10 +2078,8 @@
[132800..132863]:65408..65471
[132864..132927]:65448..65511
[132928..132991]:65488..65551
-[132992..132999]:65528..65535
-[133000..133007]:65528..65535
-[133008..133039]:69632..69663
-[133040..133055]:hole
+[132992..133007]:65528..65543
+[133008..133055]:69632..69679
[133056..133119]:69664..69727
[133120..133183]:69704..69767
[133184..133247]:69744..69807
The first diff arises because I count the COMPRESS_ADDR block at the head of a
compressed cluster, while the previous code counted it at the rear of the cluster.
The second diff shows that the previous code printed a 'hole' inside a cluster.
I think a compressed cluster should not contain a 'hole', so this may have been
a bug in the old code.
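
The second diff can be checked against the cluster-tail arithmetic the patch
adds to f2fs_map_blocks ('map->m_len += cluster_size - ofs_in_cluster'). The
sketch below is a user-space model, not the kernel code. It assumes 4 KiB
blocks (8 sectors per block), a cluster size of 8 blocks from
compress_log_size=3, and that the traversal stops at the first unmapped slot
of the cluster. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORS_PER_BLOCK 8u	/* assumption: 4 KiB blocks, 512-byte sectors */

/*
 * Blocks from stop_ofs up to the end of its cluster. cluster_size must be a
 * power of two, so the offset inside the cluster is a simple mask.
 * Note: like the patch arithmetic, an aligned stop_ofs yields cluster_size.
 */
static inline uint32_t cluster_tail_blocks(uint32_t stop_ofs,
					   uint32_t cluster_size)
{
	assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
	return cluster_size - (stop_ofs & (cluster_size - 1));
}

/* Extent length in sectors once the cluster tail is folded into the extent. */
static inline uint32_t padded_extent_sectors(uint32_t mapped_blocks,
					     uint32_t stop_ofs,
					     uint32_t cluster_size)
{
	return (mapped_blocks + cluster_tail_blocks(stop_ofs, cluster_size)) *
	       SECTORS_PER_BLOCK;
}
```

In the old output the extent [133008..133039] covers 4 blocks and is followed
by a 2-block hole starting at sector 133040, i.e. block 16630. With the
assumptions above, cluster_tail_blocks(16630, 8) is 2 and
padded_extent_sectors(4, 16630, 8) is 48, which is the length of the new
extent [133008..133055].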
Also, as discussed in this thread:
https://lore.kernel.org/linux-f2fs-devel/ZJmBmt3WmUpWR3+2@casper.infradead.org/T/#t
If f2fs can support async buffered writes, performance can be greatly improved
when using io_uring.
I think it's time to move f2fs to the iomap framework, and I'm really looking
forward to hearing your opinion on this.
Thanks
@@ -1599,12 +1599,14 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
unsigned int maxblocks = map->m_len;
struct dnode_of_data dn;
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ unsigned int cluster_size = F2FS_I(inode)->i_cluster_size;
+ unsigned int cluster_mask = cluster_size - 1;
int mode = map->m_may_create ? ALLOC_NODE : LOOKUP_NODE;
pgoff_t pgofs, end_offset, end;
- int err = 0, ofs = 1;
- unsigned int ofs_in_node, last_ofs_in_node;
+ int err = 0, ofs = 1, append = 0;
+ unsigned int ofs_in_node, last_ofs_in_node, ofs_in_cluster;
blkcnt_t prealloc;
- block_t blkaddr;
+ block_t blkaddr, start_addr;
unsigned int start_pgofs;
int bidx = 0;
bool is_hole;
@@ -1691,6 +1693,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
map->m_flags |= F2FS_MAP_NEW;
} else if (is_hole) {
if (f2fs_compressed_file(inode) &&
+ blkaddr == COMPRESS_ADDR &&
f2fs_sanity_check_cluster(&dn) &&
(flag != F2FS_GET_BLOCK_FIEMAP ||
IS_ENABLED(CONFIG_F2FS_CHECK_FS))) {
@@ -1712,6 +1715,18 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
*map->m_next_pgofs = pgofs + 1;
goto sync_out;
}
+ if (f2fs_compressed_file(inode) &&
+ blkaddr == COMPRESS_ADDR) {
+ /* split consecutive cluster */
+ if (map->m_len) {
+ dn.ofs_in_node--;
+ goto sync_out;
+ }
+ pgofs++;
+ dn.ofs_in_node++;
+ append = 1;
+ goto next_block;
+ }
break;
default:
/* for defragment case */
@@ -1750,6 +1765,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
goto sync_out;
}
+ /* 1 cluster 1 extent, split consecutive cluster */
+ if (append && !((dn.ofs_in_node + 1) & cluster_mask))
+ goto sync_out;
+
skip:
dn.ofs_in_node++;
pgofs++;
@@ -1832,6 +1851,20 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
if (map->m_next_extent)
*map->m_next_extent = pgofs + 1;
}
+
+ if (flag == F2FS_GET_BLOCK_FIEMAP && f2fs_compressed_file(inode)) {
+ ofs_in_node = round_down(dn.ofs_in_node, cluster_size);
+ ofs_in_cluster = dn.ofs_in_node & cluster_mask;
+ start_addr = data_blkaddr(dn.inode, dn.node_page, ofs_in_node);
+ if (start_addr == COMPRESS_ADDR) {
+ map->m_flags |= F2FS_MAP_ENCODED;
+ map->m_len += append;
+ /* End of a cluster */
+ if (blkaddr == NULL_ADDR || blkaddr == NEW_ADDR)
+ map->m_len += cluster_size - ofs_in_cluster;
+ }
+ }
+
f2fs_put_dnode(&dn);
unlock_out:
if (map->m_may_create) {
@@ -1952,37 +1985,10 @@ static int f2fs_xattr_fiemap(struct inode *inode,
return (err < 0 ? err : 0);
}
-static loff_t max_inode_blocks(struct inode *inode)
-{
- loff_t result = ADDRS_PER_INODE(inode);
- loff_t leaf_count = ADDRS_PER_BLOCK(inode);
-
- /* two direct node blocks */
- result += (leaf_count * 2);
-
- /* two indirect node blocks */
- leaf_count *= NIDS_PER_BLOCK;
- result += (leaf_count * 2);
-
- /* one double indirect node block */
- leaf_count *= NIDS_PER_BLOCK;
- result += leaf_count;
-
- return result;
-}
-
int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
{
- struct f2fs_map_blocks map;
- sector_t start_blk, last_blk;
- pgoff_t next_pgofs;
- u64 logical = 0, phys = 0, size = 0;
- u32 flags = 0;
- int ret = 0;
- bool compr_cluster = false, compr_appended;
- unsigned int cluster_size = F2FS_I(inode)->i_cluster_size;
- unsigned int count_in_cluster = 0;
+ int ret;
loff_t maxbytes;
if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
@@ -1991,10 +1997,6 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
return ret;
}
- ret = fiemap_prep(inode, fieinfo, start, &len, FIEMAP_FLAG_XATTR);
- if (ret)
- return ret;
-
inode_lock(inode);
maxbytes = max_file_blocks(inode) << F2FS_BLKSIZE_BITS;
@@ -2011,110 +2013,9 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
goto out;
}
- if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode)) {
- ret = f2fs_inline_data_fiemap(inode, fieinfo, start, len);
- if (ret != -EAGAIN)
- goto out;
- }
-
- if (bytes_to_blks(inode, len) == 0)
- len = blks_to_bytes(inode, 1);
-
- start_blk = bytes_to_blks(inode, start);
- last_blk = bytes_to_blks(inode, start + len - 1);
-
-next:
- memset(&map, 0, sizeof(map));
- map.m_lblk = start_blk;
- map.m_len = bytes_to_blks(inode, len);
- map.m_next_pgofs = &next_pgofs;
- map.m_seg_type = NO_CHECK_TYPE;
-
- if (compr_cluster) {
- map.m_lblk += 1;
- map.m_len = cluster_size - count_in_cluster;
- }
-
- ret = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_FIEMAP);
- if (ret)
- goto out;
-
- /* HOLE */
- if (!compr_cluster && !(map.m_flags & F2FS_MAP_FLAGS)) {
- start_blk = next_pgofs;
-
- if (blks_to_bytes(inode, start_blk) < blks_to_bytes(inode,
- max_inode_blocks(inode)))
- goto prep_next;
-
- flags |= FIEMAP_EXTENT_LAST;
- }
-
- compr_appended = false;
- /* In a case of compressed cluster, append this to the last extent */
- if (compr_cluster && ((map.m_flags & F2FS_MAP_DELALLOC) ||
- !(map.m_flags & F2FS_MAP_FLAGS))) {
- compr_appended = true;
- goto skip_fill;
- }
-
- if (size) {
- flags |= FIEMAP_EXTENT_MERGED;
- if (IS_ENCRYPTED(inode))
- flags |= FIEMAP_EXTENT_DATA_ENCRYPTED;
-
- ret = fiemap_fill_next_extent(fieinfo, logical,
- phys, size, flags);
- trace_f2fs_fiemap(inode, logical, phys, size, flags, ret);
- if (ret)
- goto out;
- size = 0;
- }
-
- if (start_blk > last_blk)
- goto out;
-
-skip_fill:
- if (map.m_pblk == COMPRESS_ADDR) {
- compr_cluster = true;
- count_in_cluster = 1;
- } else if (compr_appended) {
- unsigned int appended_blks = cluster_size -
- count_in_cluster + 1;
- size += blks_to_bytes(inode, appended_blks);
- start_blk += appended_blks;
- compr_cluster = false;
- } else {
- logical = blks_to_bytes(inode, start_blk);
- phys = __is_valid_data_blkaddr(map.m_pblk) ?
- blks_to_bytes(inode, map.m_pblk) : 0;
- size = blks_to_bytes(inode, map.m_len);
- flags = 0;
-
- if (compr_cluster) {
- flags = FIEMAP_EXTENT_ENCODED;
- count_in_cluster += map.m_len;
- if (count_in_cluster == cluster_size) {
- compr_cluster = false;
- size += blks_to_bytes(inode, 1);
- }
- } else if (map.m_flags & F2FS_MAP_DELALLOC) {
- flags = FIEMAP_EXTENT_UNWRITTEN;
- }
-
- start_blk += bytes_to_blks(inode, size);
- }
+ ret = iomap_fiemap(inode, fieinfo, start, len, &f2fs_iomap_report_ops);
-prep_next:
- cond_resched();
- if (fatal_signal_pending(current))
- ret = -EINTR;
- else
- goto next;
out:
- if (ret == 1)
- ret = 0;
-
inode_unlock(inode);
return ret;
}
@@ -4266,3 +4167,66 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
const struct iomap_ops f2fs_iomap_ops = {
.iomap_begin = f2fs_iomap_begin,
};
+
+static int f2fs_iomap_begin_report(struct inode *inode, loff_t offset,
+ loff_t length, unsigned int flags,
+ struct iomap *iomap, struct iomap *srcmap)
+{
+ struct f2fs_map_blocks map = {0};
+ pgoff_t next_pgofs = 0;
+ int err;
+
+ if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode)) {
+ err = f2fs_inline_data_fiemap(inode, iomap, offset, length);
+ if (err != -EAGAIN)
+ return err;
+ }
+
+ map.m_lblk = bytes_to_blks(inode, offset);
+ map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
+ map.m_next_pgofs = &next_pgofs;
+ map.m_seg_type = NO_CHECK_TYPE;
+ err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_FIEMAP);
+ if (err)
+ return err;
+ /*
+ * When inline encryption is enabled, sometimes I/O to an encrypted file
+ * has to be broken up to guarantee DUN contiguity. Handle this by
+ * limiting the length of the mapping returned.
+ */
+ map.m_len = fscrypt_limit_io_blocks(inode, map.m_lblk, map.m_len);
+
+ if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+ return -EINVAL;
+
+ iomap->offset = blks_to_bytes(inode, map.m_lblk);
+ if (map.m_flags & F2FS_MAP_FLAGS)
+ iomap->length = blks_to_bytes(inode, map.m_len);
+ else
+ iomap->length = blks_to_bytes(inode, next_pgofs) -
+ iomap->offset;
+
+ if (map.m_pblk == NEW_ADDR) {
+ /* f2fs treat pre-alloc & delay-alloc blocks the same way */
+ iomap->type = IOMAP_UNWRITTEN;
+ iomap->addr = IOMAP_NULL_ADDR;
+ } else if (map.m_pblk == NULL_ADDR) {
+ iomap->type = IOMAP_HOLE;
+ iomap->addr = IOMAP_NULL_ADDR;
+ } else {
+ iomap->type = IOMAP_MAPPED;
+ iomap->flags |= IOMAP_F_MERGED;
+ iomap->bdev = map.m_bdev;
+ iomap->addr = blks_to_bytes(inode, map.m_pblk);
+ }
+
+ cond_resched();
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ else
+ return 0;
+}
+
+const struct iomap_ops f2fs_iomap_report_ops = {
+ .iomap_begin = f2fs_iomap_begin_report,
+};
@@ -25,6 +25,7 @@
#include <linux/quotaops.h>
#include <linux/part_stat.h>
#include <crypto/hash.h>
+#include <linux/iomap.h>
#include <linux/fscrypt.h>
#include <linux/fsverity.h>
@@ -680,8 +681,9 @@ struct extent_tree_info {
#define F2FS_MAP_NEW (1U << 0)
#define F2FS_MAP_MAPPED (1U << 1)
#define F2FS_MAP_DELALLOC (1U << 2)
+#define F2FS_MAP_ENCODED (1U << 3)
#define F2FS_MAP_FLAGS (F2FS_MAP_NEW | F2FS_MAP_MAPPED |\
- F2FS_MAP_DELALLOC)
+ F2FS_MAP_DELALLOC | F2FS_MAP_ENCODED)
struct f2fs_map_blocks {
struct block_device *m_bdev; /* for multi-device dio */
@@ -4109,6 +4111,7 @@ extern const struct inode_operations f2fs_symlink_inode_operations;
extern const struct inode_operations f2fs_encrypted_symlink_inode_operations;
extern const struct inode_operations f2fs_special_inode_operations;
extern struct kmem_cache *f2fs_inode_entry_slab;
+extern const struct iomap_ops f2fs_iomap_report_ops;
/*
* inline.c
@@ -4139,8 +4142,7 @@ bool f2fs_empty_inline_dir(struct inode *dir);
int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
struct fscrypt_str *fstr);
int f2fs_inline_data_fiemap(struct inode *inode,
- struct fiemap_extent_info *fieinfo,
- __u64 start, __u64 len);
+ struct iomap *iomap, __u64 start, __u64 len);
/*
* shrinker.c
@@ -767,11 +767,9 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
}
int f2fs_inline_data_fiemap(struct inode *inode,
- struct fiemap_extent_info *fieinfo, __u64 start, __u64 len)
+ struct iomap *iomap, __u64 start, __u64 len)
{
__u64 byteaddr, ilen;
- __u32 flags = FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED |
- FIEMAP_EXTENT_LAST;
struct node_info ni;
struct page *ipage;
int err = 0;
@@ -792,8 +790,14 @@ int f2fs_inline_data_fiemap(struct inode *inode,
}
ilen = min_t(size_t, MAX_INLINE_DATA(inode), i_size_read(inode));
- if (start >= ilen)
+ if (start >= ilen) {
+ /* stop iomap iterator */
+ iomap->offset = start;
+ iomap->length = len;
+ iomap->addr = IOMAP_NULL_ADDR;
+ iomap->type = IOMAP_HOLE;
goto out;
+ }
if (start + len < ilen)
ilen = start + len;
ilen -= start;
@@ -805,8 +809,12 @@ int f2fs_inline_data_fiemap(struct inode *inode,
byteaddr = (__u64)ni.blk_addr << inode->i_sb->s_blocksize_bits;
byteaddr += (char *)inline_data_addr(inode, ipage) -
(char *)F2FS_INODE(ipage);
- err = fiemap_fill_next_extent(fieinfo, start, byteaddr, ilen, flags);
- trace_f2fs_fiemap(inode, start, byteaddr, ilen, flags, err);
+ iomap->addr = byteaddr;
+ iomap->type = IOMAP_INLINE;
+ iomap->flags = 0;
+ iomap->offset = start;
+ iomap->length = ilen;
+
out:
f2fs_put_page(ipage, 1);
return err;