Message ID | 20230313132021.672134-2-chengzhihao1@huawei.com |
---|---|
State | New |
Headers |
From: Zhihao Cheng <chengzhihao1@huawei.com>
To: <tytso@mit.edu>, <adilger.kernel@dilger.ca>, <jack@suse.com>, <tudor.ambarus@linaro.org>
CC: <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <chengzhihao1@huawei.com>, <yi.zhang@huawei.com>
Subject: [PATCH v2 1/5] ext4: Fix reusing stale buffer heads from last failed mounting
Date: Mon, 13 Mar 2023 21:20:17 +0800
Message-ID: <20230313132021.672134-2-chengzhihao1@huawei.com>
In-Reply-To: <20230313132021.672134-1-chengzhihao1@huawei.com>
References: <20230313132021.672134-1-chengzhihao1@huawei.com> |
Series |
ext4: Fix stale buffer loading from last failed
|
|
Commit Message
Zhihao Cheng
March 13, 2023, 1:20 p.m. UTC
The following sequence makes ext4 load stale buffer heads, left over from
the last failed mount, in a new mount operation:
mount_bdev
 ext4_fill_super
 | ext4_load_and_init_journal
 |  ext4_load_journal
 |   jbd2_journal_load
 |    load_superblock
 |     journal_get_superblock
 |      set_buffer_verified(bh) // buffer head is verified
 |    jbd2_journal_recover // failed caused by EIO
 | goto failed_mount3a // skip 'sb->s_root' initialization
 deactivate_locked_super
  kill_block_super
   generic_shutdown_super
    if (sb->s_root)
    // false, skip ext4_put_super->invalidate_bdev->
    // invalidate_mapping_pages->mapping_evict_folio->
    // filemap_release_folio->try_to_free_buffers, which
    // cannot drop buffer head.
   blkdev_put
    blkdev_put_whole
     if (atomic_dec_and_test(&bdev->bd_openers))
     // false, systemd-udev happens to open the device. Then
     // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
     // truncate_inode_folio->truncate_cleanup_folio->
     // folio_invalidate->block_invalidate_folio->
     // filemap_release_folio->try_to_free_buffers will be skipped,
     // dropping buffer head is missed again.

Second mount:
ext4_fill_super
 ext4_load_and_init_journal
  ext4_load_journal
   ext4_get_journal
    jbd2_journal_init_inode
     journal_init_common
      bh = getblk_unmovable
       bh = __find_get_block // Found stale bh from last failed mounting
      journal->j_sb_buffer = bh
   jbd2_journal_load
    load_superblock
     journal_get_superblock
      if (buffer_verified(bh))
      // true, skip journal->j_format_version = 2, value is 0
    jbd2_journal_recover
     do_one_pass
      next_log_block += count_tags(journal, bh)
      // According to journal_tag_bytes(), the 'tag_bytes' calculation
      // depends on jbd2_has_feature_csum3(). jbd2_has_feature_csum3()
      // returns false because 'j->j_format_version >= 2' is not true,
      // so we get a wrong next_log_block. do_one_pass may then exit
      // early when it finds a non-JBD2_MAGIC_NUMBER block at
      // 'next_log_block'.

At this point the filesystem is corrupted: the journal is only partially
replayed, and the new journal sequence number has already been used by
the previous mount.

invalidate_bdev() can drop all buffer heads even when racing with a bare
read of the block device (e.g. systemd-udev), so fix this by invalidating
the bdev in the error handling path of __ext4_fill_super().

A reproducer is available at [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming")
Cc: stable@vger.kernel.org # v3.5
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
fs/ext4/super.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
Comments
On Mon 13-03-23 21:20:17, Zhihao Cheng wrote:
> Following process makes ext4 load stale buffer heads from last failed
> mounting in a new mounting operation:
...
> @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
>
>  	sync_blockdev(sb->s_bdev);
>  	invalidate_bdev(sb->s_bdev);
> -	if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
> -		/*
> -		 * Invalidate the journal device's buffers.  We don't want them
> -		 * floating about in memory - the physical journal device may
> -		 * hotswapped, and it breaks the `ro-after' testing code.
> -		 */
> +	if (sbi->s_journal_bdev) {
>  		sync_blockdev(sbi->s_journal_bdev);
> -		invalidate_bdev(sbi->s_journal_bdev);
>  		ext4_blkdev_remove(sbi);
>  	}

Hum, but this will invalidate bhs only if journal is stored on a block
device. If journal is in the inode (the common case), we won't invalidate
anything (sbi->s_journal_bdev is NULL) and the same problem can happen?

								Honza

On 2023/3/14 19:33, Jan Kara wrote:

Hi Jan,

>> @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
>>
>>  	sync_blockdev(sb->s_bdev);
>>  	invalidate_bdev(sb->s_bdev);

In the journal-in-inode case, the journal bhs come from the block device,
which means the buffers are dropped once this line,
'invalidate_bdev(sb->s_bdev)', has executed.

>> -	if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
>> -		/*
>> -		 * Invalidate the journal device's buffers.  We don't want them
>> -		 * floating about in memory - the physical journal device may
>> -		 * hotswapped, and it breaks the `ro-after' testing code.
>> -		 */
>> +	if (sbi->s_journal_bdev) {
>>  		sync_blockdev(sbi->s_journal_bdev);
>> -		invalidate_bdev(sbi->s_journal_bdev);
>>  		ext4_blkdev_remove(sbi);
>>  	}
>
> Hum, but this will invalidate bhs only if journal is stored on a block
> device. If journal is in the inode (the common case), we won't invalidate
> anything (sbi->s_journal_bdev is NULL) and the same problem can happen?

On Tue 14-03-23 20:01:46, Zhihao Cheng wrote:
> On 2023/3/14 19:33, Jan Kara wrote:
> Hi Jan,
>
> > > @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
> > >  	sync_blockdev(sb->s_bdev);
> > >  	invalidate_bdev(sb->s_bdev);
>
> For journal in the inode case, journal bhs come from block device, which
> means buffers will be dropped after this line 'invalidate_bdev(sb->s_bdev)'
> being executed.

Right, I've missed that. But then why do you remove the sbi->s_journal_bdev
!= sb->s_bdev condition below?

> > > -	if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
> > > -		/*
> > > -		 * Invalidate the journal device's buffers.  We don't want them
> > > -		 * floating about in memory - the physical journal device may
> > > -		 * hotswapped, and it breaks the `ro-after' testing code.
> > > -		 */
> > > +	if (sbi->s_journal_bdev) {
> > >  		sync_blockdev(sbi->s_journal_bdev);
> > > -		invalidate_bdev(sbi->s_journal_bdev);
> > >  		ext4_blkdev_remove(sbi);
> > >  	}
> >
> > Hum, but this will invalidate bhs only if journal is stored on a block
> > device. If journal is in the inode (the common case), we won't invalidate
> > anything (sbi->s_journal_bdev is NULL) and the same problem can happen?

								Honza

> On Tue 14-03-23 20:01:46, Zhihao Cheng wrote:
>> On 2023/3/14 19:33, Jan Kara wrote:
>> Hi Jan,
>>
>>>> @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
>>>>  	sync_blockdev(sb->s_bdev);
>>>>  	invalidate_bdev(sb->s_bdev);
>>
>> For journal in the inode case, journal bhs come from block device, which
>> means buffers will be dropped after this line 'invalidate_bdev(sb->s_bdev)'
>> being executed.
>
> Right, I've missed that. But then why do you remove the sbi->s_journal_bdev
> != sb->s_bdev condition below?

I think 'sbi->s_journal_bdev != sb->s_bdev' is always true if
sbi->s_journal_bdev exists.

mount_bdev
 fmode_t mode = FMODE_READ | FMODE_EXCL
 bdev_a = blkdev_get_by_path(dev_name, mode, fs_type)

mount_bdev->ext4_fill_super->ext4_load_and_init_journal->ext4_load_journal->ext4_get_dev_journal:
 bdev_b = ext4_blkdev_get(j_dev, sb)
  bdev_b = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb)
 EXT4_SB(sb)->s_journal_bdev = bdev_b

bdev_a cannot be bdev_b, because bd_prepare_to_claim() makes sure the same
block device cannot be opened twice with mode 'FMODE_EXCL'.

On Tue 14-03-23 20:31:43, Zhihao Cheng wrote:
> > On Tue 14-03-23 20:01:46, Zhihao Cheng wrote:
> > > On 2023/3/14 19:33, Jan Kara wrote:
> > > Hi Jan,
> > >
> > > > > @@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
> > > > >  	sync_blockdev(sb->s_bdev);
> > > > >  	invalidate_bdev(sb->s_bdev);
> > >
> > > For journal in the inode case, journal bhs come from block device, which
> > > means buffers will be dropped after this line 'invalidate_bdev(sb->s_bdev)'
> > > being executed.
> >
> > Right, I've missed that. But then why do you remove the sbi->s_journal_bdev
> > != sb->s_bdev condition below?
>
> I think 'sbi->s_journal_bdev != sb->s_bdev' always becomes true if
> sbi->s_journal_bdev exists.

OK, fair point. But please move this cleanup into a separate commit with
this justification. Thanks!

								Honza

> OK, fair point. But please move this cleanup into a separate commit with
> this justification. Thanks!

OK, I will split it from patch 1 in v3.

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 88f7b8a88c76..7e990637bc48 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1126,6 +1126,12 @@ static void ext4_blkdev_remove(struct ext4_sb_info *sbi)
 	struct block_device *bdev;
 	bdev = sbi->s_journal_bdev;
 	if (bdev) {
+		/*
+		 * Invalidate the journal device's buffers.  We don't want them
+		 * floating about in memory - the physical journal device may
+		 * hotswapped, and it breaks the `ro-after' testing code.
+		 */
+		invalidate_bdev(bdev);
 		ext4_blkdev_put(bdev);
 		sbi->s_journal_bdev = NULL;
 	}
@@ -1271,14 +1277,8 @@ static void ext4_put_super(struct super_block *sb)
 
 	sync_blockdev(sb->s_bdev);
 	invalidate_bdev(sb->s_bdev);
-	if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
-		/*
-		 * Invalidate the journal device's buffers.  We don't want them
-		 * floating about in memory - the physical journal device may
-		 * hotswapped, and it breaks the `ro-after' testing code.
-		 */
+	if (sbi->s_journal_bdev) {
 		sync_blockdev(sbi->s_journal_bdev);
-		invalidate_bdev(sbi->s_journal_bdev);
 		ext4_blkdev_remove(sbi);
 	}
 
@@ -5610,6 +5610,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	brelse(sbi->s_sbh);
 	ext4_blkdev_remove(sbi);
 out_fail:
+	invalidate_bdev(sb->s_bdev);
 	sb->s_fs_info = NULL;
 	return err ? err : ret;
 }