[RFC] ext4:record error information when insert extent failed in 'ext4_split_extent_at'
Message ID | 20221024122725.3083432-1-yebin@huaweicloud.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp715574wru; Mon, 24 Oct 2022 17:08:03 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4qxSAeN8kXSVCSD+hOydhA1C9UunAGsFafia4h3Nvxxkwuw8sTSESHZayRhcw80AzdWscf X-Received: by 2002:aa7:de81:0:b0:461:2122:a807 with SMTP id j1-20020aa7de81000000b004612122a807mr21683105edv.382.1666656483498; Mon, 24 Oct 2022 17:08:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666656483; cv=none; d=google.com; s=arc-20160816; b=dMZK5V1cvJP/EOqCgPUmhaLtCGzHxkPWd/6GslZp+djFlq+AkXmsFhxmQsXorliinD j0qFKJyYvrJY1sOOZtQ7oBaO1ILD5cLEQ+FR2I9ImCA/RMEohFnnYQQXOHgWeMVeX+g/ tFOillCcr9ddJbjZH+cJyG6ysUsF/NQy2+CEOdepBYgrbQ40FMD6u6zJpa6fOrkrmIAY Jx014j/0mgq4KCaXat9l4QZxs3Q2hezV1jrG0JCh3EkYEUkLUTC7/EVY10pdlia97Z1j 1aL0MEjJcTxfHR2tY6rDbmz+xFODtEexwhAx2ijWm/I3jwxsfF27vXEQQJf8n0seGH7z U/IQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=2SXbHCnN1XJgmWg8Mwe1+OUqnjrduDYw2iJk53YFsSQ=; b=yB1QuFgj0mTQG8/cyBrSYDfCwQYfK8iT4hZK8QyMuZ1DAXaTJyBbeGn4qcGxr+DtWr rxEGIQze4nwIucVJdPs+5OE+rLF92b6qpaHrzip/sGBAulFSULFSngso8HAe3gShv3jA Z1aVfnNFXGYChBnlaJFNbBoWd1OjRm/CMWFr99GjB64PzQaaN2vCXF2pOkMEnL4wkFjM 4+OaKseke3sZ3W7tETHUwCjPVX/jXl6N4uybcBQPiB+m57ldchfk+S0oHAPnmLQxHwig aVCFA5WFmKL9L3Xu1eWX+wKJCjJx7H5tnDvtAvJq5NUWmtuS+Jcn802tPYL7zGWIdCyj ZUrQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ga4-20020a1709070c0400b0073de493b83esi943389ejc.147.2022.10.24.17.07.37; Mon, 24 Oct 2022 17:08:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229779AbiJXXs2 (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Mon, 24 Oct 2022 19:48:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229866AbiJXXrx (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 24 Oct 2022 19:47:53 -0400 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24E8C2FC5A3; Mon, 24 Oct 2022 15:06:05 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Mwtxh3dNszKFN2; Mon, 24 Oct 2022 20:03:04 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.127.227]) by APP4 (Coremail) with SMTP id gCh0CgC3xuiJf1ZjjtV9AA--.11645S4; Mon, 24 Oct 2022 20:05:31 +0800 (CST) From: Ye Bin <yebin@huaweicloud.com> To: tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org Cc: linux-kernel@vger.kernel.org, jack@suse.cz, Ye Bin <yebin10@huawei.com> Subject: [PATCH RFC] ext4:record error information when insert extent failed in 'ext4_split_extent_at' Date: Mon, 24 Oct 2022 20:27:25 +0800 Message-Id: <20221024122725.3083432-1-yebin@huaweicloud.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-CM-TRANSID: gCh0CgC3xuiJf1ZjjtV9AA--.11645S4 X-Coremail-Antispam: 1UD129KBjvJXoWxZFWfWr13Jr1UKr4UGr4DArb_yoWrWrWrpr Z3Cr1xGr15J3WUCrZ7AFs2gryI9a17Gw1UJFyfGr1fJFyUZrWUWFn8KF1FvFy8WrW8Ga45 XFW8trW5WF1DCa7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUgKb4IE77IF4wAFF20E14v26r4j6ryUM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6r1F6r1fM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4 vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7Cj xVAFwI0_Cr0_Gr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I 0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40E x7xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x 0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Y z7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zV AF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF4l IxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK8VAvwI8IcIk0rVWrZr1j6s 0DMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBI daVFxhVjvjDU0xZFpf9x07UWE__UUUUU= X-CM-SenderInfo: p1hex046kxt4xhlfz01xgou0bp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747615988776071856?= X-GMAIL-MSGID: =?utf-8?q?1747615988776071856?= |
Series |
[RFC] ext4:record error information when insert extent failed in 'ext4_split_extent_at'
|
|
Commit Message
Ye Bin
Oct. 24, 2022, 12:27 p.m. UTC
From: Ye Bin <yebin10@huawei.com> There's issue as follows when do test with memory fault injection: [localhost]# fsck.ext4 -a image image: clean, 45595/655360 files, 466841/2621440 blocks [localhost]# fsck.ext4 -fn image Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: -(1457230--1457256) Fix? no image: ********** WARNING: Filesystem still has errors ********** image: 45595/655360 files (12.4% non-contiguous), 466841/2621440 blocks Inject context: ----------------------------------------------------------- Inject function:kmem_cache_alloc (pid:177858) (return: 0) Calltrace Context: mem_cache_allock+0x73/0xcc ext4_mb_new_blocks+0x32e/0x540 [ext4] ext4_new_meta_blocks+0xc4/0x110 [ext4] ext4_ext_grow_indepth+0x68/0x250 [ext4] ext4_ext_create_new_leaf+0xc5/0x120 [ext4] ext4_ext_insert_extent+0x1bf/0x670 [ext4] ext4_split_extent_at+0x212/0x530 [ext4] ext4_split_extent+0x13a/0x1a0 [ext4] ext4_ext_handle_unwritten_extents+0x13d/0x240 [ext4] ext4_ext_map_blocks+0x459/0x8f0 [ext4] ext4_map_blocks+0x18e/0x5a0 [ext4] ext4_iomap_alloc+0xb0/0x1b0 [ext4] ext4_iomap_begin+0xb0/0x130 [ext4] iomap_apply+0x95/0x2e0 __iomap_dio_rw+0x1cc/0x4b0 iomap_dio_rw+0xe/0x40 ext4_dio_write_iter+0x1a9/0x390 [ext4] new_sync_write+0x113/0x1b0 vfs_write+0x1b7/0x250 ksys_write+0x5f/0xe0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Compare extent change in journal: Start: ee_block ee_len ee_start 75 32798 1457227 -> unwritten len=30 308 12 434489 355 5 442492 => ee_block ee_len ee_start 11 2 951584 74 32769 951647 -> unwritten len=1 75 32771 1457227 -> unwritten len=3, length decreased 27 211 15 960906 308 12 434489 355 5 442492 Acctually, above issue can repaired by 'fsck -fa'. But file system is 'clean', 'fsck' will not do deep repair. Obviously, final lost 27 blocks. Above issue may happens as follows: ext4_split_extent_at ... err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); -> return -ENOMEM if (err != -ENOSPC && err != -EDQUOT) goto out; -> goto 'out' will not fix extent length, will ... fix_extent_len: ex->ee_len = orig_ex.ee_len; /* * Ignore ext4_ext_dirty return value since we are already in error path * and err is a non-zero error code. */ ext4_ext_dirty(handle, inode, path + path->p_depth); return err; out: ext4_ext_show_leaf(inode, path); return err; If 'ext4_ext_insert_extent' return '-ENOMEM' which will not fix 'ex->ee_len' by old length. 'ext4_ext_insert_extent' will trigger extent tree merge, fix like 'ex->ee_len = orig_ex.ee_len' may lead to new issues. To solve above issue, record error messages when 'ext4_ext_insert_extent' return 'err' not equal '(-ENOSPC && -EDQUOT)'. If filesysten is mounted with 'errors=continue' as filesystem is not clean 'fsck' will repair issue. If filesystem is mounted with 'errors=remount-ro' filesystem will be remounted by read-only. Signed-off-by: Ye Bin <yebin10@huawei.com> --- fs/ext4/extents.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
Comments
There have been a lot of problems here before, but the problem has not been fundamentally solved. On 2022/10/24 20:27, Ye Bin wrote: > From: Ye Bin <yebin10@huawei.com> > > There's issue as follows when do test with memory fault injection: > [localhost]# fsck.ext4 -a image > image: clean, 45595/655360 files, 466841/2621440 blocks > [localhost]# fsck.ext4 -fn image > Pass 1: Checking inodes, blocks, and sizes > Pass 2: Checking directory structure > Pass 3: Checking directory connectivity > Pass 4: Checking reference counts > Pass 5: Checking group summary information > Block bitmap differences: -(1457230--1457256) > Fix? no > > image: ********** WARNING: Filesystem still has errors ********** > > image: 45595/655360 files (12.4% non-contiguous), 466841/2621440 blocks > > Inject context: > ----------------------------------------------------------- > Inject function:kmem_cache_alloc (pid:177858) (return: 0) > Calltrace Context: > mem_cache_allock+0x73/0xcc > ext4_mb_new_blocks+0x32e/0x540 [ext4] > ext4_new_meta_blocks+0xc4/0x110 [ext4] > ext4_ext_grow_indepth+0x68/0x250 [ext4] > ext4_ext_create_new_leaf+0xc5/0x120 [ext4] > ext4_ext_insert_extent+0x1bf/0x670 [ext4] > ext4_split_extent_at+0x212/0x530 [ext4] > ext4_split_extent+0x13a/0x1a0 [ext4] > ext4_ext_handle_unwritten_extents+0x13d/0x240 [ext4] > ext4_ext_map_blocks+0x459/0x8f0 [ext4] > ext4_map_blocks+0x18e/0x5a0 [ext4] > ext4_iomap_alloc+0xb0/0x1b0 [ext4] > ext4_iomap_begin+0xb0/0x130 [ext4] > iomap_apply+0x95/0x2e0 > __iomap_dio_rw+0x1cc/0x4b0 > iomap_dio_rw+0xe/0x40 > ext4_dio_write_iter+0x1a9/0x390 [ext4] > new_sync_write+0x113/0x1b0 > vfs_write+0x1b7/0x250 > ksys_write+0x5f/0xe0 > do_syscall_64+0x33/0x40 > entry_SYSCALL_64_after_hwframe+0x61/0xc6 > > Compare extent change in journal: > Start: > ee_block ee_len ee_start > 75 32798 1457227 -> unwritten len=30 > 308 12 434489 > 355 5 442492 > => > ee_block ee_len ee_start > 11 2 951584 > 74 32769 951647 -> unwritten len=1 > 75 32771 1457227 -> unwritten len=3, length decreased 27 > 211 15 960906 > 308 12 434489 > 355 5 442492 > > Acctually, above issue can repaired by 'fsck -fa'. But file system is 'clean', > 'fsck' will not do deep repair. > Obviously, final lost 27 blocks. Above issue may happens as follows: > ext4_split_extent_at > ... > err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); -> return -ENOMEM > if (err != -ENOSPC && err != -EDQUOT) > goto out; -> goto 'out' will not fix extent length, will > ... > fix_extent_len: > ex->ee_len = orig_ex.ee_len; > /* > * Ignore ext4_ext_dirty return value since we are already in error path > * and err is a non-zero error code. > */ > ext4_ext_dirty(handle, inode, path + path->p_depth); > return err; > out: > ext4_ext_show_leaf(inode, path); > return err; > If 'ext4_ext_insert_extent' return '-ENOMEM' which will not fix 'ex->ee_len' by > old length. 'ext4_ext_insert_extent' will trigger extent tree merge, fix like > 'ex->ee_len = orig_ex.ee_len' may lead to new issues. > To solve above issue, record error messages when 'ext4_ext_insert_extent' return > 'err' not equal '(-ENOSPC && -EDQUOT)'. If filesysten is mounted with 'errors=continue' > as filesystem is not clean 'fsck' will repair issue. If filesystem is mounted with > 'errors=remount-ro' filesystem will be remounted by read-only. > > Signed-off-by: Ye Bin <yebin10@huawei.com> > --- > fs/ext4/extents.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index f1956288307f..582a7d59d6e3 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -3252,8 +3252,13 @@ static int ext4_split_extent_at(handle_t *handle, > ext4_ext_mark_unwritten(ex2); > > err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); > - if (err != -ENOSPC && err != -EDQUOT) > + if (err != -ENOSPC && err != -EDQUOT) { > + if (err) > + EXT4_ERROR_INODE_ERR(inode, -err, > + "insert extent failed block = %d len = %d", > + ex2->ee_block, ex2->ee_len); > goto out; > + } > > if (EXT4_EXT_MAY_ZEROOUT & split_flag) { > if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { >
On Fri, Oct 28, 2022 at 09:41:55AM +0800, zhanchengbin wrote: > There have been a lot of problems here before, but the problem has not been > fundamentally solved. Indeed, this only papers only the problem. The commit description is pretty good, in that it describes the the root cause of the problem: > > If 'ext4_ext_insert_extent' return '-ENOMEM' which will not fix 'ex->ee_len' by > > old length. 'ext4_ext_insert_extent' will trigger extent tree merge, fix like > > 'ex->ee_len = orig_ex.ee_len' may lead to new issues. > > To solve above issue, record error messages when 'ext4_ext_insert_extent' return > > 'err' not equal '(-ENOSPC && -EDQUOT)'. If filesysten is mounted with 'errors=continue' > > as filesystem is not clean 'fsck' will repair issue. If filesystem is mounted with > > 'errors=remount-ro' filesystem will be remounted by read-only. What should actually happen here is that we undo the change in orig_ex if ext4_ext_insert_extent fails. As you point out, in an earlier part of the code path, this gets handled by a "goto fix_extent_len". The problem is that it's possible that the shape of the extent tree may have changed before ext4_insert_extent() fails with the ENOMEM. So the simplistic fix of just jumping to fix_extent_len isn't going to work. But fixing it by much marking the file system is corrupt is a bit of a cop-out. Consider that if you were running ext4 in a Huawei Cloud, and you run into the memory allocation failure, what would you prefer? For the individual system call to fail, and propagating the failure to userspace? Or to leave the file system corrupted? (And then either forcing a reboot so the file system to be fixed, or allow the system to stumble along, with further unknown unexpected behaviour from userspace?) The real correct fix is that ext4_ext_insert_extent() needs to rollback any changes to the extent tree that was made, *or* it needs to make sure that the operation will succeed before starting to make any changes, *or* we need to look up the orig_extent via orig_ex->ee_block, and then undo the change. It might be that we can't always reliably rollback the change, or we might think that the operation will succeed, but then it fail due to an I/O error. If it's due to an I/O error, then it's fine to bail and mark the file system as corrupted. But if the failure is caused by an ENOMEM, we should be able to handle this case more gracefully. Can you look into a better fix? Thanks, - Ted
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index f1956288307f..582a7d59d6e3 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3252,8 +3252,13 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_mark_unwritten(ex2); err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); - if (err != -ENOSPC && err != -EDQUOT) + if (err != -ENOSPC && err != -EDQUOT) { + if (err) + EXT4_ERROR_INODE_ERR(inode, -err, + "insert extent failed block = %d len = %d", + ex2->ee_block, ex2->ee_len); goto out; + } if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {