Message ID | 20230310125206.2867822-2-chengzhihao1@huawei.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp859774wrd; Fri, 10 Mar 2023 05:06:45 -0800 (PST) X-Google-Smtp-Source: AK7set8xZfjbn6uT3Z1RSl4E9ZvMr6c7CV0kWhDHy3tG/x+yScFcvZrd2L0kwVR9LJieO6wH+DSX X-Received: by 2002:a17:902:864b:b0:19e:7889:f9fb with SMTP id y11-20020a170902864b00b0019e7889f9fbmr22730014plt.68.1678453605011; Fri, 10 Mar 2023 05:06:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678453604; cv=none; d=google.com; s=arc-20160816; b=nzLmbOz2PV9WMkI+SxZO88GFjtfnvHk9JOLszEOKcUhb4LBk31LUelG370yPcsXKtE cJUhj4iW11M2I5F0dYtbT6Bprrp5KQgciqoOJ+RhpYjYMCFELxgQAQMe7quc3hoP119i fWTitnKF5yPZ9xUupbvlfD8eesez2yU6E2CLncmGIG/taxy7wdp6/3OpVSGYuNZY6vlG 67RM0WO3wBjqf+X6cBy38epHoxPySZH58OiVDxWvlRUGsOifxlD1fMjbV3wNaNr9YRKq 0PHRUtp/t6EQmAr3HZ1yrZxwC8AehXYVJcwTZ8v6uXXk9x+3au0V6gb50+N4MIcEHP5X drhA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=JrUpRWsprgMjBkZ/u2uTZ6UX0v586TCo2g+0+MKDG3I=; b=om0dlmFl/0Ga57UsLlC11J7eWBLbHumr2wCI2V0ARisxb8CvYeM9NdRIjLtZ6DQ8TP D4ssMust1L2eTPsxsQVhogmtrIAwtoYUWiRPHugWXfHnBqCHKROgBIAhV5k3yjugN0ii i9M00GRaqx2gVI2O1tDMD6i2H6Di0i47iEiA+7N7qGQrCHzEgiRLGWrzONCFdjx78kAS JlNHHCJs/B0+KdHdzIxa15Qj5XIipYYiO6VV3q5Sv+3+PP2mXPPBsltJOqsuHyZcsvxq Z74H0yXYMD1gru1YhLI4UzoQBY474XH/GhejV4pS5MAuyKVU5h3X1t+IWIZ2yaUHVr1l Q8fw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q19-20020a170902b11300b0019266fcbd20si1949390plr.623.2023.03.10.05.06.32; Fri, 10 Mar 2023 05:06:44 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230296AbjCJMxD (ORCPT <rfc822;carlos.wei.hk@gmail.com> + 99 others); Fri, 10 Mar 2023 07:53:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230165AbjCJMwz (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 10 Mar 2023 07:52:55 -0500 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3F8C10C721; Fri, 10 Mar 2023 04:52:52 -0800 (PST) Received: from kwepemm600013.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4PY5WR5MZhzHwTv; Fri, 10 Mar 2023 20:50:43 +0800 (CST) Received: from huawei.com (10.175.127.227) by kwepemm600013.china.huawei.com (7.193.23.68) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.21; Fri, 10 Mar 2023 20:52:50 +0800 From: Zhihao Cheng <chengzhihao1@huawei.com> To: <tytso@mit.edu>, <adilger.kernel@dilger.ca>, <jack@suse.com> CC: <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <chengzhihao1@huawei.com>, <yi.zhang@huawei.com> Subject: [PATCH 1/5] ext4: Fix reusing stale buffer heads from last failed mounting Date: Fri, 10 Mar 2023 20:52:02 +0800 Message-ID: <20230310125206.2867822-2-chengzhihao1@huawei.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230310125206.2867822-1-chengzhihao1@huawei.com> References: <20230310125206.2867822-1-chengzhihao1@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [10.175.127.227] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To kwepemm600013.china.huawei.com (7.193.23.68) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759986167356689441?= X-GMAIL-MSGID: =?utf-8?q?1759986167356689441?= |
Series |
ext4: Fix stale buffer loading from last failed
|
|
Commit Message
Zhihao Cheng
March 10, 2023, 12:52 p.m. UTC
Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
ext4_fill_super
| ext4_load_and_init_journal
| ext4_load_journal
| jbd2_journal_load
| load_superblock
| journal_get_superblock
| set_buffer_verified(bh) // buffer head is verified
| jbd2_journal_recover // failed caused by EIO
| goto failed_mount3a // skip 'sb->s_root' initialization
deactivate_locked_super
kill_block_super
generic_shutdown_super
if (sb->s_root)
// false, skip ext4_put_super->invalidate_bdev->
// invalidate_mapping_pages->mapping_evict_folio->
// filemap_release_folio->try_to_free_buffers, which
// cannot drop buffer head.
blkdev_put
blkdev_put_whole
if (atomic_dec_and_test(&bdev->bd_openers))
// false, systemd-udev happens to open the device. Then
// blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
// truncate_inode_folio->truncate_cleanup_folio->
// folio_invalidate->block_invalidate_folio->
// filemap_release_folio->try_to_free_buffers will be skipped,
// dropping buffer head is missed again.
Second mount:
ext4_fill_super
ext4_load_and_init_journal
ext4_load_journal
ext4_get_journal
jbd2_journal_init_inode
journal_init_common
bh = getblk_unmovable
bh = __find_get_block // Found stale bh in last failed mounting
journal->j_sb_buffer = bh
jbd2_journal_load
load_superblock
journal_get_superblock
if (buffer_verified(bh))
// true, skip journal->j_format_version = 2, value is 0
jbd2_journal_recover
do_one_pass
next_log_block += count_tags(journal, bh)
// According to journal_tag_bytes(), 'tag_bytes' calculating is
// affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
// returns false because 'j->j_format_version >= 2' is not true,
// then we get wrong next_log_block. The do_one_pass may exit
// early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.
The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.
The invalidate_bdev() can drop all buffer heads even racing with bare
reading block device(eg. systemd-udev), so we can fix it by invalidating
bdev in error handling path in __ext4_fill_super().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Cc: <stable@kernel.org>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
fs/ext4/super.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
Comments
On Fri 10-03-23 20:52:02, Zhihao Cheng wrote: > Following process makes ext4 load stale buffer heads from last failed > mounting in a new mounting operation: > mount_bdev > ext4_fill_super > | ext4_load_and_init_journal > | ext4_load_journal > | jbd2_journal_load > | load_superblock > | journal_get_superblock > | set_buffer_verified(bh) // buffer head is verified > | jbd2_journal_recover // failed caused by EIO > | goto failed_mount3a // skip 'sb->s_root' initialization > deactivate_locked_super > kill_block_super > generic_shutdown_super > if (sb->s_root) > // false, skip ext4_put_super->invalidate_bdev-> > // invalidate_mapping_pages->mapping_evict_folio-> > // filemap_release_folio->try_to_free_buffers, which > // cannot drop buffer head. > blkdev_put > blkdev_put_whole > if (atomic_dec_and_test(&bdev->bd_openers)) > // false, systemd-udev happens to open the device. Then > // blkdev_flush_mapping->kill_bdev->truncate_inode_pages-> > // truncate_inode_folio->truncate_cleanup_folio-> > // folio_invalidate->block_invalidate_folio-> > // filemap_release_folio->try_to_free_buffers will be skipped, > // dropping buffer head is missed again. > > Second mount: > ext4_fill_super > ext4_load_and_init_journal > ext4_load_journal > ext4_get_journal > jbd2_journal_init_inode > journal_init_common > bh = getblk_unmovable > bh = __find_get_block // Found stale bh in last failed mounting > journal->j_sb_buffer = bh > jbd2_journal_load > load_superblock > journal_get_superblock > if (buffer_verified(bh)) > // true, skip journal->j_format_version = 2, value is 0 > jbd2_journal_recover > do_one_pass > next_log_block += count_tags(journal, bh) > // According to journal_tag_bytes(), 'tag_bytes' calculating is > // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3() > // returns false because 'j->j_format_version >= 2' is not true, > // then we get wrong next_log_block. The do_one_pass may exit > // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'. > > The filesystem is corrupted here, journal is partially replayed, and > new journal sequence number actually is already used by last mounting. > > The invalidate_bdev() can drop all buffer heads even racing with bare > reading block device(eg. systemd-udev), so we can fix it by invalidating > bdev in error handling path in __ext4_fill_super(). > > Fetch a reproducer in [Link]. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171 > Cc: <stable@kernel.org> > Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> The fix looks good to me. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/super.c | 15 ++++++++------- > 1 file changed, 8 insertions(+), 7 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 88f7b8a88c76..7e990637bc48 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -1126,6 +1126,12 @@ static void ext4_blkdev_remove(struct ext4_sb_info *sbi) > struct block_device *bdev; > bdev = sbi->s_journal_bdev; > if (bdev) { > + /* > + * Invalidate the journal device's buffers. We don't want them > + * floating about in memory - the physical journal device may > + * hotswapped, and it breaks the `ro-after' testing code. > + */ > + invalidate_bdev(bdev); > ext4_blkdev_put(bdev); > sbi->s_journal_bdev = NULL; > } > @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb) > > sync_blockdev(sb->s_bdev); > invalidate_bdev(sb->s_bdev); > - if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) { > - /* > - * Invalidate the journal device's buffers. We don't want them > - * floating about in memory - the physical journal device may > - * hotswapped, and it breaks the `ro-after' testing code. > - */ > + if (sbi->s_journal_bdev) { > sync_blockdev(sbi->s_journal_bdev); > - invalidate_bdev(sbi->s_journal_bdev); > ext4_blkdev_remove(sbi); > } > > @@ -5610,6 +5610,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) > brelse(sbi->s_sbh); > ext4_blkdev_remove(sbi); > out_fail: > + invalidate_bdev(sb->s_bdev); > sb->s_fs_info = NULL; > return err ? err : ret; > } > -- > 2.31.1 >
Hi, Zhihao, On 3/10/23 12:52, Zhihao Cheng wrote: cut > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171 > Cc: <stable@kernel.org> Shouldn't have been stable@vger.kernel.org instead? That's what is advertised at: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html A Fixes tag would be helpful. It would assist the stable kernel team, or people that have to backport your patch, in determining which stable versions should receive your fix. Same suggestion is made in https://www.kernel.org/doc/html/latest/process/submitting-patches.html Thanks! ta
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 88f7b8a88c76..7e990637bc48 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1126,6 +1126,12 @@ static void ext4_blkdev_remove(struct ext4_sb_info *sbi) struct block_device *bdev; bdev = sbi->s_journal_bdev; if (bdev) { + /* + * Invalidate the journal device's buffers. We don't want them + * floating about in memory - the physical journal device may + * hotswapped, and it breaks the `ro-after' testing code. + */ + invalidate_bdev(bdev); ext4_blkdev_put(bdev); sbi->s_journal_bdev = NULL; } @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb) sync_blockdev(sb->s_bdev); invalidate_bdev(sb->s_bdev); - if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) { - /* - * Invalidate the journal device's buffers. We don't want them - * floating about in memory - the physical journal device may - * hotswapped, and it breaks the `ro-after' testing code. - */ + if (sbi->s_journal_bdev) { sync_blockdev(sbi->s_journal_bdev); - invalidate_bdev(sbi->s_journal_bdev); ext4_blkdev_remove(sbi); } @@ -5610,6 +5610,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) brelse(sbi->s_sbh); ext4_blkdev_remove(sbi); out_fail: + invalidate_bdev(sb->s_bdev); sb->s_fs_info = NULL; return err ? err : ret; }