[v3,0/2] fix error flag covered by journal recovery

Message ID

20230214022905.765088-1-yebin@huaweicloud.com

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
From: Ye Bin <yebin@huaweicloud.com>
To: tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, jack@suse.cz,
        Ye Bin <yebin10@huawei.com>
Subject: [PATCH v3 0/2] fix error flag covered by journal recovery
Date: Tue, 14 Feb 2023 10:29:03 +0800
Message-Id: <20230214022905.765088-1-yebin@huaweicloud.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

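fix error flag covered by journal recovery
  [v3,0/2] fix error flag covered by journal recovery
  [v3,1/2] ext4: commit super block if fs record error when journal record without error
  [v3,2/2] ext4: make sure fs error flag setted before clear journal error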

Message

Ye Bin Feb. 14, 2023, 2:29 a.m. UTC

  From: Ye Bin <yebin10@huawei.com>

Diff v3 vs v2:
Only fix the case where the fs error flag is lost because the previous
journal errno was not recorded on disk. In that case the orphan list may be
dropped while the fs records no error flag, so fsck will not do a deep
repair.

Diff v2 vs v1:
Move the calls to 'j_replay_prepare_callback' and 'j_replay_end_callback'
from ext4_load_journal() to jbd2_journal_recover().

During fault-injection testing, the following issue was observed:
EXT4-fs (dm-5): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (dm-5): Errors on filesystem, clearing orphan list.
EXT4-fs (dm-5): recovery complete
EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: data_err=abort,errors=remount-ro

EXT4-fs (dm-5): recovery complete
EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: data_err=abort,errors=remount-ro

Without any file system check, the file system is clean on the second mount.
In theory, the kernel should never clear the fs error flag. In
errors=remount-ro mode the last super block is committed directly, so the
super block copy in the journal is not up to date. During journal recovery,
the up-to-date super block is overwritten by the journal data. If all super
block writes fail after journal recovery, the file system error flag is
lost, and "fsck -a" then cannot repair the file system deeply.
To solve this issue, the super block needs extra handling during journal
recovery.
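
The sequence above can be sketched as a toy model. This is only an
illustration under simplifying assumptions, not kernel code and not what the
patches implement: the function name and the dict-based super block are
hypothetical, and only the EXT4_ERROR_FS bit of s_state is modelled.

```python
EXT4_ERROR_FS = 0x0002  # "errors detected" bit in the superblock's s_state field


def second_mount_sees_error(later_sb_writes_succeed: bool) -> bool:
    """Toy model of the scenario described above; this is not kernel code."""
    # errors=remount-ro: the error flag was written straight to the on-disk
    # superblock, bypassing the journal.
    disk_sb = {"s_state": EXT4_ERROR_FS}
    # The superblock copy logged in the journal predates the error.
    journal_sb = {"s_state": 0}

    # First mount: remember that the filesystem had errors, then replay the
    # journal, which overwrites the on-disk superblock with the stale copy.
    in_memory_state = disk_sb["s_state"]
    disk_sb = dict(journal_sb)

    # The flag only reaches the disk again if a later superblock write works.
    if later_sb_writes_succeed:
        disk_sb["s_state"] |= in_memory_state

    # Second mount: only the on-disk state is visible.
    return bool(disk_sb["s_state"] & EXT4_ERROR_FS)
```

With later_sb_writes_succeed=False the model reproduces the log above: the
second mount finds no error flag even though no fsck was run in between.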


Ye Bin (2):
  ext4: commit super block if fs record error when journal record
    without error
  ext4: make sure fs error flag setted before clear journal error

 fs/ext4/super.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Baokun Li Feb. 16, 2023, 7:18 a.m. UTC | #1

When we follow the flow (umount after the injected fault triggers an error ->
mount, kernel replays the journal -> umount to view the fsck info), there are
three cases:

1. The injected fault prevents the ERROR_FS flag from being saved to disk, but
j_errno is saved successfully. PATCH 2/2 effectively ensures that ERROR_FS is
saved to disk, so fsck performs a forced check and discovers the error.

2. j_errno is lost but the ERROR_FS flag is saved. After the journal replay:
     a. The ext4_super_block on disk has neither error info nor the ERROR_FS
        flag;
     b. The ext4_super_block in memory contains error info but no ERROR_FS
        flag, because the error info is copied separately during journal
        replay;
     c. The ext4_sb_info in memory contains both error info and the ERROR_FS
        flag.
This means that the in-memory ext4_super_block is written to disk the next
time ext4_commit_super is executed, while the ERROR_FS flag in ext4_sb_info is
not written to disk until ext4_put_super is called. So if the disk is removed,
loses power or goes offline, we lose the ERROR_FS flag or even the error info.

(In this case, repairing directly with e2fsck does not do a forced check
either, because it relies on j_errno to restore the ERROR_FS flag after the
journal replay. It also reloads the information from disk into memory after
the journal replay, so the ERROR_FS flag and error info are completely lost.)

3. If neither the ERROR_FS flag nor j_errno is saved to disk, there seems to
be no way to determine whether a deep check is needed. But I think that
needing a journal replay already means the file system exited abnormally.
*Is it possible to have e2fsck do a forced check after the journal replay?*
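
The three cases above can be read as a small decision table. The sketch below
only summarises this reply; the function name and the result strings are
hypothetical, and the outcome when both values were saved is not discussed in
the thread, so treating it like case 1 is an assumption.

```python
def outcome_after_replay(error_fs_saved: bool, j_errno_saved: bool) -> str:
    """Summarise the cases from the reply above (a model, not kernel code)."""
    if j_errno_saved:
        # Case 1: the journal's error number survived, so the ERROR_FS flag
        # can be re-established from it. The "both saved" combination is
        # assumed to behave the same way.
        return "detectable"
    if error_fs_saved:
        # Case 2: replay overwrites the on-disk superblock, so the flag lives
        # only in memory until a later write puts it back on disk.
        return "at risk"
    # Case 3: nothing on disk records that an error ever happened.
    return "undetectable"
```

In this reading, only the value of j_errno makes the error reliably visible
once the journal has been replayed.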

Ye Bin Feb. 16, 2023, 8:12 a.m. UTC | #2

On 2023/2/16 15:18, Baokun Li wrote:
> *Is it possible to consider e2fsck to do a force check after the journal replay?*
Perhaps e2fsck could provide a command-line parameter for this, because an
unconditional forced check is unacceptable in scenarios with strict
startup-time requirements.