[v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
| Message ID | 20230919012525.1783108-1-chengzhihao1@huawei.com |
|---|---|
| State | New |
| Headers | From: Zhihao Cheng <chengzhihao1@huawei.com>; To: tytso@mit.edu, jack@suse.com; Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, chengzhihao1@huawei.com, yi.zhang@huawei.com; Date: Tue, 19 Sep 2023 09:25:25 +0800 |
| Series | [v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev |
Commit Message
Zhihao Cheng
Sept. 19, 2023, 1:25 a.m. UTC
JBD2 makes sure journal data has been flushed to the fs device by
sync_blockdev(); however, another process can consume the EIO recorded in
the bdev's mapping, which lets journal recovery succeed even though EIO
occurred while the data was being written back to the fs device.

We hit this problem in our product, where iSCSI + multipath is used as the
block device for ext4. An unstable network can trigger kpartx to rescan
partitions in the device-mapper layer. The detailed sequence is as follows:

mount                                kpartx                   irq
jbd2_journal_recover
 do_one_pass
  memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
  mark_buffer_dirty // mark bh dirty
                                     vfs_read
                                      generic_file_read_iter // dio
                                       filemap_write_and_wait_range
                                        __filemap_fdatawrite_range
                                         do_writepages
                                          block_write_full_folio
                                           submit_bh_wbc
                                                              >> EIO occurs in disk <<
                                                              end_buffer_async_write
                                                               mark_buffer_write_io_error
                                                                mapping_set_error
                                                                 set_bit(AS_EIO, &mapping->flags) // set!
                                       filemap_check_errors
                                        test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
 err2 = sync_blockdev
  filemap_write_and_wait
   filemap_check_errors
    test_and_clear_bit(AS_EIO, &mapping->flags) // false
 err2 = 0

The filesystem mounts successfully even though data from the journal failed
to be written to disk, and ext4/ocfs2 can become corrupted.

Fix it by comparing the wb_err state of the fs block device before and
after recovery.

A reproducer is available at [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
Cc: stable@vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
v1->v3: Initialize wb_err. An uninitialized wb_err could equal
        mapping->wb_err (e.g. EIO without ERRSEQ_SEEN). When EIO occurs
        again, mapping->wb_err is not changed, and wb_err still equals
        mapping->wb_err.

 fs/jbd2/recovery.c | 8 ++++++++
 1 file changed, 8 insertions(+)
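Editorial note (not part of the patch): the race hinges on AS_EIO being a single flag that filemap_check_errors() consumes with test_and_clear_bit(), so whichever caller checks first "steals" the error from everyone else. The following minimal user-space sketch models that behaviour; the names are hypothetical stand-ins for the kernel helpers named in the trace above, not kernel code.

```c
/*
 * Minimal model of the race: AS_EIO is one bit, and checking it also
 * clears it, so a concurrent reader can hide the error from jbd2.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* stand-in for the AS_EIO bit in mapping->flags */
static atomic_bool as_eio;

/* mark_buffer_write_io_error() -> mapping_set_error() analogue */
static void report_write_error(void)
{
	atomic_store(&as_eio, true);
}

/* filemap_check_errors() analogue: test-and-clear semantics */
static bool check_and_clear_error(void)
{
	return atomic_exchange(&as_eio, false);
}

int main(void)
{
	report_write_error();                           /* EIO while writing back replayed data */

	bool seen_by_kpartx = check_and_clear_error();  /* concurrent dio read path checks first */
	bool seen_by_jbd2   = check_and_clear_error();  /* sync_blockdev() then sees no error    */

	printf("kpartx saw error: %d, jbd2 saw error: %d\n",
	       seen_by_kpartx, seen_by_jbd2);           /* prints 1, 0 -> recovery "succeeds"    */
	return 0;
}
```

Running it prints `kpartx saw error: 1, jbd2 saw error: 0`, which mirrors how `err2 = sync_blockdev()` ends up as 0 in the trace above even though the writeback failed.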
Comments
On Tue 19-09-23 09:25:25, Zhihao Cheng wrote:
> JBD2 makes sure journal data has been flushed to the fs device by
> sync_blockdev(); however, another process can consume the EIO recorded in
> the bdev's mapping, which lets journal recovery succeed even though EIO
> occurred while the data was being written back to the fs device.
>
> We hit this problem in our product, where iSCSI + multipath is used as the
> block device for ext4. An unstable network can trigger kpartx to rescan
> partitions in the device-mapper layer. The detailed sequence is as follows:
>
> mount                                kpartx                   irq
> jbd2_journal_recover
>  do_one_pass
>   memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
>   mark_buffer_dirty // mark bh dirty
>                                      vfs_read
>                                       generic_file_read_iter // dio
>                                        filemap_write_and_wait_range
>                                         __filemap_fdatawrite_range
>                                          do_writepages
>                                           block_write_full_folio
>                                            submit_bh_wbc
>                                                               >> EIO occurs in disk <<
>                                                               end_buffer_async_write
>                                                                mark_buffer_write_io_error
>                                                                 mapping_set_error
>                                                                  set_bit(AS_EIO, &mapping->flags) // set!
>                                        filemap_check_errors
>                                         test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
>  err2 = sync_blockdev
>   filemap_write_and_wait
>    filemap_check_errors
>     test_and_clear_bit(AS_EIO, &mapping->flags) // false
>  err2 = 0
>
> The filesystem mounts successfully even though data from the journal failed
> to be written to disk, and ext4/ocfs2 can become corrupted.
>
> Fix it by comparing the wb_err state of the fs block device before and
> after recovery.
>
> A reproducer is available at [Link].
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> v1->v3: Initialize wb_err. An uninitialized wb_err could equal
>         mapping->wb_err (e.g. EIO without ERRSEQ_SEEN). When EIO occurs
>         again, mapping->wb_err is not changed, and wb_err still equals
>         mapping->wb_err.

Yeah, good catch. The patch still looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

                                                                Honza

>  fs/jbd2/recovery.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index c269a7d29a46..5b771a3d8d9a 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -289,6 +289,8 @@ int jbd2_journal_recover(journal_t *journal)
>  	journal_superblock_t *	sb;
>
>  	struct recovery_info	info;
> +	errseq_t		wb_err;
> +	struct address_space	*mapping;
>
>  	memset(&info, 0, sizeof(info));
>  	sb = journal->j_superblock;
> @@ -306,6 +308,9 @@ int jbd2_journal_recover(journal_t *journal)
>  		return 0;
>  	}
>
> +	wb_err = 0;
> +	mapping = journal->j_fs_dev->bd_inode->i_mapping;
> +	errseq_check_and_advance(&mapping->wb_err, &wb_err);
>  	err = do_one_pass(journal, &info, PASS_SCAN);
>  	if (!err)
>  		err = do_one_pass(journal, &info, PASS_REVOKE);
> @@ -327,6 +332,9 @@ int jbd2_journal_recover(journal_t *journal)
>
>  	jbd2_journal_clear_revoke(journal);
>  	err2 = sync_blockdev(journal->j_fs_dev);
> +	if (!err)
> +		err = err2;
> +	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
>  	if (!err)
>  		err = err2;
>  	/* Make sure all replayed data is on permanent storage */
> --
> 2.39.2
>
On Tue, 19 Sep 2023 09:25:25 +0800, Zhihao Cheng wrote:
> JBD2 makes sure journal data has been flushed to the fs device by
> sync_blockdev(); however, another process can consume the EIO recorded in
> the bdev's mapping, which lets journal recovery succeed even though EIO
> occurred while the data was being written back to the fs device.
>
> We hit this problem in our product, where iSCSI + multipath is used as the
> block device for ext4. An unstable network can trigger kpartx to rescan
> partitions in the device-mapper layer. The detailed sequence is as follows:
>
> [...]

Applied, thanks!

[1/1] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
      commit: 61187fce8600e8ef90e601be84f9d0f3222c1206

Best regards,
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index c269a7d29a46..5b771a3d8d9a 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -289,6 +289,8 @@ int jbd2_journal_recover(journal_t *journal)
 	journal_superblock_t *	sb;
 
 	struct recovery_info	info;
+	errseq_t		wb_err;
+	struct address_space	*mapping;
 
 	memset(&info, 0, sizeof(info));
 	sb = journal->j_superblock;
@@ -306,6 +308,9 @@ int jbd2_journal_recover(journal_t *journal)
 		return 0;
 	}
 
+	wb_err = 0;
+	mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	errseq_check_and_advance(&mapping->wb_err, &wb_err);
 	err = do_one_pass(journal, &info, PASS_SCAN);
 	if (!err)
 		err = do_one_pass(journal, &info, PASS_REVOKE);
@@ -327,6 +332,9 @@ int jbd2_journal_recover(journal_t *journal)
 
 	jbd2_journal_clear_revoke(journal);
 	err2 = sync_blockdev(journal->j_fs_dev);
+	if (!err)
+		err = err2;
+	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
 	if (!err)
 		err = err2;
 	/* Make sure all replayed data is on permanent storage */
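Editorial note on the mechanism the fix relies on (illustration only): errseq_t packs an error code, a "seen" flag and a counter into one 32-bit word, and every interested caller keeps its own errseq_t cursor, so one observer consuming the error cannot hide it from another, unlike the single AS_EIO bit. The sketch below is a rough user-space model of that sample-and-compare pattern; the struct and helpers are simplified stand-ins for lib/errseq.c, not the real implementation. It also shows why v3 advances wb_err before recovery: marking the pre-existing error as seen guarantees that a repeated EIO changes mapping->wb_err.

```c
/*
 * Rough model of the errseq_t idea: per-observer cursors plus a counter
 * that is bumped only when an already-seen error value recurs.
 */
#include <stdio.h>
#include <stdbool.h>

struct errseq {
	int err;        /* last error code, 0 if none            */
	bool seen;      /* has any observer sampled this error?  */
	unsigned cnt;   /* bumped when a seen error occurs again */
};

/* mapping_set_error() analogue */
static void errseq_set(struct errseq *e, int err)
{
	if (e->seen) {
		e->cnt++;        /* make the change visible to old cursors */
		e->seen = false;
	}
	e->err = err;
}

/* errseq_check_and_advance() analogue: report error if the cursor is stale */
static int errseq_check_and_advance(struct errseq *e, struct errseq *cursor)
{
	int ret = 0;

	if (cursor->err != e->err || cursor->cnt != e->cnt)
		ret = e->err;
	*cursor = *e;           /* advance the caller's cursor ...          */
	e->seen = true;         /* ... and mark the current value as seen   */
	return ret;
}

int main(void)
{
	struct errseq mapping_wb_err = { .err = -5 };  /* old, never-seen EIO */
	struct errseq wb_err = { 0 };

	/* v3 behaviour: sample and mark seen before replaying the journal */
	errseq_check_and_advance(&mapping_wb_err, &wb_err);

	/* EIO hits again while the replayed blocks are written back */
	errseq_set(&mapping_wb_err, -5);

	/* after sync_blockdev(): the new error is still detected */
	printf("recovery error: %d\n",
	       errseq_check_and_advance(&mapping_wb_err, &wb_err));  /* -5 */
	return 0;
}
```

In this model, if wb_err were instead initialized by plainly copying mapping->wb_err without marking it seen (roughly the situation the v1->v3 note describes), the repeated EIO would not change the stored value and the final check would return 0, i.e. the error would be missed.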