[RFC] ext4:record error information when insert extent failed in 'ext4_split_extent_at'

Message ID 20221024122725.3083432-1-yebin@huaweicloud.com
State New
Headers
Series [RFC] ext4:record error information when insert extent failed in 'ext4_split_extent_at' |

Commit Message

Ye Bin Oct. 24, 2022, 12:27 p.m. UTC
  From: Ye Bin <yebin10@huawei.com>

There's issue as follows when do test with memory fault injection:
[localhost]# fsck.ext4 -a image
image: clean, 45595/655360 files, 466841/2621440 blocks
[localhost]# fsck.ext4 -fn image
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(1457230--1457256)
Fix? no

image: ********** WARNING: Filesystem still has errors **********

image: 45595/655360 files (12.4% non-contiguous), 466841/2621440 blocks

Inject context:
 -----------------------------------------------------------
 Inject function:kmem_cache_alloc (pid:177858) (return: 0)
 Calltrace Context:
 mem_cache_allock+0x73/0xcc
 ext4_mb_new_blocks+0x32e/0x540 [ext4]
 ext4_new_meta_blocks+0xc4/0x110 [ext4]
 ext4_ext_grow_indepth+0x68/0x250 [ext4]
 ext4_ext_create_new_leaf+0xc5/0x120 [ext4]
 ext4_ext_insert_extent+0x1bf/0x670 [ext4]
 ext4_split_extent_at+0x212/0x530 [ext4]
 ext4_split_extent+0x13a/0x1a0 [ext4]
 ext4_ext_handle_unwritten_extents+0x13d/0x240 [ext4]
 ext4_ext_map_blocks+0x459/0x8f0 [ext4]
 ext4_map_blocks+0x18e/0x5a0 [ext4]
 ext4_iomap_alloc+0xb0/0x1b0 [ext4]
 ext4_iomap_begin+0xb0/0x130 [ext4]
 iomap_apply+0x95/0x2e0
 __iomap_dio_rw+0x1cc/0x4b0
 iomap_dio_rw+0xe/0x40
 ext4_dio_write_iter+0x1a9/0x390 [ext4]
 new_sync_write+0x113/0x1b0
 vfs_write+0x1b7/0x250
 ksys_write+0x5f/0xe0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

Compare extent change in journal:
Start:
ee_block      ee_len        ee_start
75            32798         1457227  -> unwritten len=30
308           12            434489
355           5             442492
=>
ee_block      ee_len        ee_start
11            2             951584
74            32769         951647   -> unwritten  len=1
75            32771         1457227  -> unwritten  len=3, length decreased 27
211           15            960906
308           12            434489
355           5             442492

Acctually, above issue can repaired by 'fsck -fa'. But file system is 'clean',
'fsck' will not do deep repair.
Obviously, final lost 27 blocks. Above issue may happens as follows:
ext4_split_extent_at
...
err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); -> return -ENOMEM
if (err != -ENOSPC && err != -EDQUOT)
	goto out; -> goto 'out' will not fix extent length, will
...
fix_extent_len:
        ex->ee_len = orig_ex.ee_len;
        /*
         * Ignore ext4_ext_dirty return value since we are already in error path
         * and err is a non-zero error code.
         */
        ext4_ext_dirty(handle, inode, path + path->p_depth);
        return err;
out:
        ext4_ext_show_leaf(inode, path);
        return err;
If 'ext4_ext_insert_extent' return '-ENOMEM' which will not fix 'ex->ee_len' by
old length. 'ext4_ext_insert_extent' will trigger extent tree merge, fix like
'ex->ee_len = orig_ex.ee_len' may lead to new issues.
To solve above issue, record error messages when 'ext4_ext_insert_extent' return
'err' not equal '(-ENOSPC && -EDQUOT)'. If filesysten is mounted with 'errors=continue'
as filesystem is not clean 'fsck' will repair issue. If filesystem is mounted with
'errors=remount-ro' filesystem will be remounted by read-only.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/ext4/extents.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
  

Comments

zhanchengbin Oct. 28, 2022, 1:41 a.m. UTC | #1
There have been a lot of problems here before, but the problem has not 
been fundamentally solved.

On 2022/10/24 20:27, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> There's issue as follows when do test with memory fault injection:
> [localhost]# fsck.ext4 -a image
> image: clean, 45595/655360 files, 466841/2621440 blocks
> [localhost]# fsck.ext4 -fn image
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences:  -(1457230--1457256)
> Fix? no
> 
> image: ********** WARNING: Filesystem still has errors **********
> 
> image: 45595/655360 files (12.4% non-contiguous), 466841/2621440 blocks
> 
> Inject context:
>   -----------------------------------------------------------
>   Inject function:kmem_cache_alloc (pid:177858) (return: 0)
>   Calltrace Context:
>   mem_cache_allock+0x73/0xcc
>   ext4_mb_new_blocks+0x32e/0x540 [ext4]
>   ext4_new_meta_blocks+0xc4/0x110 [ext4]
>   ext4_ext_grow_indepth+0x68/0x250 [ext4]
>   ext4_ext_create_new_leaf+0xc5/0x120 [ext4]
>   ext4_ext_insert_extent+0x1bf/0x670 [ext4]
>   ext4_split_extent_at+0x212/0x530 [ext4]
>   ext4_split_extent+0x13a/0x1a0 [ext4]
>   ext4_ext_handle_unwritten_extents+0x13d/0x240 [ext4]
>   ext4_ext_map_blocks+0x459/0x8f0 [ext4]
>   ext4_map_blocks+0x18e/0x5a0 [ext4]
>   ext4_iomap_alloc+0xb0/0x1b0 [ext4]
>   ext4_iomap_begin+0xb0/0x130 [ext4]
>   iomap_apply+0x95/0x2e0
>   __iomap_dio_rw+0x1cc/0x4b0
>   iomap_dio_rw+0xe/0x40
>   ext4_dio_write_iter+0x1a9/0x390 [ext4]
>   new_sync_write+0x113/0x1b0
>   vfs_write+0x1b7/0x250
>   ksys_write+0x5f/0xe0
>   do_syscall_64+0x33/0x40
>   entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> Compare extent change in journal:
> Start:
> ee_block      ee_len        ee_start
> 75            32798         1457227  -> unwritten len=30
> 308           12            434489
> 355           5             442492
> =>
> ee_block      ee_len        ee_start
> 11            2             951584
> 74            32769         951647   -> unwritten  len=1
> 75            32771         1457227  -> unwritten  len=3, length decreased 27
> 211           15            960906
> 308           12            434489
> 355           5             442492
> 
> Acctually, above issue can repaired by 'fsck -fa'. But file system is 'clean',
> 'fsck' will not do deep repair.
> Obviously, final lost 27 blocks. Above issue may happens as follows:
> ext4_split_extent_at
> ...
> err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); -> return -ENOMEM
> if (err != -ENOSPC && err != -EDQUOT)
> 	goto out; -> goto 'out' will not fix extent length, will
> ...
> fix_extent_len:
>          ex->ee_len = orig_ex.ee_len;
>          /*
>           * Ignore ext4_ext_dirty return value since we are already in error path
>           * and err is a non-zero error code.
>           */
>          ext4_ext_dirty(handle, inode, path + path->p_depth);
>          return err;
> out:
>          ext4_ext_show_leaf(inode, path);
>          return err;
> If 'ext4_ext_insert_extent' return '-ENOMEM' which will not fix 'ex->ee_len' by
> old length. 'ext4_ext_insert_extent' will trigger extent tree merge, fix like
> 'ex->ee_len = orig_ex.ee_len' may lead to new issues.
> To solve above issue, record error messages when 'ext4_ext_insert_extent' return
> 'err' not equal '(-ENOSPC && -EDQUOT)'. If filesysten is mounted with 'errors=continue'
> as filesystem is not clean 'fsck' will repair issue. If filesystem is mounted with
> 'errors=remount-ro' filesystem will be remounted by read-only.
> 
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>   fs/ext4/extents.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f1956288307f..582a7d59d6e3 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3252,8 +3252,13 @@ static int ext4_split_extent_at(handle_t *handle,
>   		ext4_ext_mark_unwritten(ex2);
>   
>   	err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
> -	if (err != -ENOSPC && err != -EDQUOT)
> +	if (err != -ENOSPC && err != -EDQUOT) {
> +		if (err)
> +			EXT4_ERROR_INODE_ERR(inode, -err,
> +			"insert extent failed block = %d len = %d",
> +			ex2->ee_block, ex2->ee_len);
>   		goto out;
> +	}
>   
>   	if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
>   		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
>
  
Theodore Ts'o Dec. 5, 2022, 5:48 p.m. UTC | #2
On Fri, Oct 28, 2022 at 09:41:55AM +0800, zhanchengbin wrote:
> There have been a lot of problems here before, but the problem has not been
> fundamentally solved.

Indeed, this only papers only the problem.  The commit description is
pretty good, in that it describes the the root cause of the problem:

> > If 'ext4_ext_insert_extent' return '-ENOMEM' which will not fix 'ex->ee_len' by
> > old length. 'ext4_ext_insert_extent' will trigger extent tree merge, fix like
> > 'ex->ee_len = orig_ex.ee_len' may lead to new issues.
> > To solve above issue, record error messages when 'ext4_ext_insert_extent' return
> > 'err' not equal '(-ENOSPC && -EDQUOT)'. If filesysten is mounted with 'errors=continue'
> > as filesystem is not clean 'fsck' will repair issue. If filesystem is mounted with
> > 'errors=remount-ro' filesystem will be remounted by read-only.

What should actually happen here is that we undo the change in orig_ex
if ext4_ext_insert_extent fails.  As you point out, in an earlier part
of the code path, this gets handled by a "goto fix_extent_len".    

The problem is that it's possible that the shape of the extent tree
may have changed before ext4_insert_extent() fails with the ENOMEM.
So the simplistic fix of just jumping to fix_extent_len isn't going to
work.  But fixing it by much marking the file system is corrupt is a
bit of a cop-out.  Consider that if you were running ext4 in a Huawei
Cloud, and you run into the memory allocation failure, what would you
prefer?  For the individual system call to fail, and propagating the
failure to userspace?  Or to leave the file system corrupted?  (And
then either forcing a reboot so the file system to be fixed, or allow
the system to stumble along, with further unknown unexpected behaviour
from userspace?)

The real correct fix is that ext4_ext_insert_extent() needs to
rollback any changes to the extent tree that was made, *or* it needs
to make sure that the operation will succeed before starting to make
any changes, *or* we need to look up the orig_extent via
orig_ex->ee_block, and then undo the change.

It might be that we can't always reliably rollback the change, or we
might think that the operation will succeed, but then it fail due to
an I/O error.  If it's due to an I/O error, then it's fine to bail and
mark the file system as corrupted.  But if the failure is caused by an
ENOMEM, we should be able to handle this case more gracefully.

Can you look into a better fix?

Thanks,

						- Ted
  

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f1956288307f..582a7d59d6e3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3252,8 +3252,13 @@  static int ext4_split_extent_at(handle_t *handle,
 		ext4_ext_mark_unwritten(ex2);
 
 	err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
-	if (err != -ENOSPC && err != -EDQUOT)
+	if (err != -ENOSPC && err != -EDQUOT) {
+		if (err)
+			EXT4_ERROR_INODE_ERR(inode, -err,
+			"insert extent failed block = %d len = %d",
+			ex2->ee_block, ex2->ee_len);
 		goto out;
+	}
 
 	if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
 		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {