[v3,1/2] ext4: commit super block if fs record error when journal record without error

Message ID 20230214022905.765088-2-yebin@huaweicloud.com
State New
Headers
Series fix error flag covered by journal recovery |

Commit Message

Ye Bin Feb. 14, 2023, 2:29 a.m. UTC
  From: Ye Bin <yebin10@huawei.com>

Now, 'es->s_state' maybe covered by recover journal. And journal errno
maybe not recorded in journal sb as IO error. ext4_update_super() only
update error information when 'sbi->s_add_error_count' large than zero.
Then 'EXT4_ERROR_FS' flag maybe lost.
To solve above issue commit error information after recover journal.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/ext4/super.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
  

Comments

Baokun Li Feb. 16, 2023, 7:17 a.m. UTC | #1
On 2023/2/14 10:29, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> Now, 'es->s_state' maybe covered by recover journal. And journal errno
> maybe not recorded in journal sb as IO error. ext4_update_super() only
> update error information when 'sbi->s_add_error_count' large than zero.
> Then 'EXT4_ERROR_FS' flag maybe lost.
> To solve above issue commit error information after recover journal.
>
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>   fs/ext4/super.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index dc3907dff13a..b94754ba8556 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
>   		goto err_out;
>   	}
>   
> +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
> +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
> +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
> +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> +		err = ext4_commit_super(sb);
> +		if (err) {
> +			ext4_msg(sb, KERN_ERR,
> +				 "Failed to commit error information, please repair fs force!");
> +			goto err_out;
> +		}
> +	}
> +
>   	EXT4_SB(sb)->s_journal = journal;
>   	err = ext4_clear_journal_err(sb, es);
>   	if (err) {
I think we don't need such a complicated judgment, after the journal 
replay and saving the error info,
if there is EXT4_ERROR_FS flag in ext4_sb_info->s_mount_state, just add 
this flag directly to es->s_state.
This way the EXT4_ERROR_FS flag and the error message will be written to 
disk the next time
ext4_commit_super() is executed. The code change is as follows:

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 260c1b3e3ef2..341b11c589b3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5935,6 +5935,7 @@ static int ext4_load_journal(struct super_block *sb,
                         memcpy(((char *) es) + EXT4_S_ERR_START,
                                save, EXT4_S_ERR_LEN);
                 kfree(save);
+               es->s_state |= cpu_to_le16(EXT4_SB(sb)->s_mount_state & 
EXT4_ERROR_FS);
         }

         if (err) {
  
Ye Bin Feb. 16, 2023, 7:44 a.m. UTC | #2
On 2023/2/16 15:17, Baokun Li wrote:
> On 2023/2/14 10:29, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>> update error information when 'sbi->s_add_error_count' large than zero.
>> Then 'EXT4_ERROR_FS' flag maybe lost.
>> To solve above issue commit error information after recover journal.
>>
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>> ---
>>   fs/ext4/super.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index dc3907dff13a..b94754ba8556 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct 
>> super_block *sb,
>>           goto err_out;
>>       }
>>   +    if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
>> +             !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
>> +        EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
>> +        es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>> +        err = ext4_commit_super(sb);
>> +        if (err) {
>> +            ext4_msg(sb, KERN_ERR,
>> +                 "Failed to commit error information, please repair 
>> fs force!");
>> +            goto err_out;
>> +        }
>> +    }
>> +
>>       EXT4_SB(sb)->s_journal = journal;
>>       err = ext4_clear_journal_err(sb, es);
>>       if (err) {
> I think we don't need such a complicated judgment, after the journal 
> replay and saving the error info,
> if there is EXT4_ERROR_FS flag in ext4_sb_info->s_mount_state, just 
> add this flag directly to es->s_state.
> This way the EXT4_ERROR_FS flag and the error message will be written 
> to disk the next time

Thanks for your suggestion. There are two reasons for this:
1. We want to write the error mark to the disk as soon as possible.
2. Here we deal with the case where there is no error mark bit but there 
is an error record.
In this case, the file system should be marked with an error and the 
user should be prompted.
> ext4_commit_super() is executed. The code change is as follows:
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 260c1b3e3ef2..341b11c589b3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5935,6 +5935,7 @@ static int ext4_load_journal(struct super_block 
> *sb,
>                         memcpy(((char *) es) + EXT4_S_ERR_START,
>                                save, EXT4_S_ERR_LEN);
>                 kfree(save);
> +               es->s_state |= cpu_to_le16(EXT4_SB(sb)->s_mount_state 
> & EXT4_ERROR_FS);
>         }
>
>         if (err) {
>
  
Baokun Li Feb. 16, 2023, 9:17 a.m. UTC | #3
On 2023/2/16 15:44, yebin (H) wrote:
>
>
> On 2023/2/16 15:17, Baokun Li wrote:
>> On 2023/2/14 10:29, Ye Bin wrote:
>>> From: Ye Bin <yebin10@huawei.com>
>>>
>>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>>> update error information when 'sbi->s_add_error_count' large than zero.
>>> Then 'EXT4_ERROR_FS' flag maybe lost.
>>> To solve above issue commit error information after recover journal.
>>>
>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>> ---
>>>   fs/ext4/super.c | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index dc3907dff13a..b94754ba8556 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct 
>>> super_block *sb,
>>>           goto err_out;
>>>       }
>>>   +    if (unlikely(es->s_error_count && 
>>> !jbd2_journal_errno(journal) &&
>>> +             !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
>>> +        EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
>>> +        es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>>> +        err = ext4_commit_super(sb);
>>> +        if (err) {
>>> +            ext4_msg(sb, KERN_ERR,
>>> +                 "Failed to commit error information, please repair 
>>> fs force!");
>>> +            goto err_out;
>>> +        }
>>> +    }
>>> +
>>>       EXT4_SB(sb)->s_journal = journal;
>>>       err = ext4_clear_journal_err(sb, es);
>>>       if (err) {
>> I think we don't need such a complicated judgment, after the journal 
>> replay and saving the error info,
>> if there is EXT4_ERROR_FS flag in ext4_sb_info->s_mount_state, just 
>> add this flag directly to es->s_state.
>> This way the EXT4_ERROR_FS flag and the error message will be written 
>> to disk the next time
>
> Thanks for your suggestion. There are two reasons for this:
> 1. We want to write the error mark to the disk as soon as possible.
> 2. Here we deal with the case where there is no error mark bit but 
> there is an error record.
> In this case, the file system should be marked with an error and the 
> user should be prompted.
The EXT4_ERROR_FS flag is always written to disk at the same time as the 
error info,
except when the journal is replayed. During journal replay the error 
info is additionally
copied to disk first, and the EXT4_ERROR_FS flag in the sbi is not 
written to disk until
the ext4_put_super() is called. It is only when a failure occurs during 
this time that
there is an error info but no EXT4_ERROR_FS flag. So we just need to 
make sure that
the EXT4_ERROR_FS flag is also written to disk at the same time as the 
error info
after the journal replay.
>> ext4_commit_super() is executed. The code change is as follows:
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 260c1b3e3ef2..341b11c589b3 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -5935,6 +5935,7 @@ static int ext4_load_journal(struct super_block 
>> *sb,
>>                         memcpy(((char *) es) + EXT4_S_ERR_START,
>>                                save, EXT4_S_ERR_LEN);
>>                 kfree(save);
>> +               es->s_state |= cpu_to_le16(EXT4_SB(sb)->s_mount_state 
>> & EXT4_ERROR_FS);
>>         }
>>
>>         if (err) {
>>
>
>
>
  
Ye Bin Feb. 16, 2023, 9:29 a.m. UTC | #4
On 2023/2/16 17:17, Baokun Li wrote:
> On 2023/2/16 15:44, yebin (H) wrote:
>>
>>
>> On 2023/2/16 15:17, Baokun Li wrote:
>>> On 2023/2/14 10:29, Ye Bin wrote:
>>>> From: Ye Bin <yebin10@huawei.com>
>>>>
>>>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>>>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>>>> update error information when 'sbi->s_add_error_count' large than 
>>>> zero.
>>>> Then 'EXT4_ERROR_FS' flag maybe lost.
>>>> To solve above issue commit error information after recover journal.
>>>>
>>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>>> ---
>>>>   fs/ext4/super.c | 12 ++++++++++++
>>>>   1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>>> index dc3907dff13a..b94754ba8556 100644
>>>> --- a/fs/ext4/super.c
>>>> +++ b/fs/ext4/super.c
>>>> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct 
>>>> super_block *sb,
>>>>           goto err_out;
>>>>       }
>>>>   +    if (unlikely(es->s_error_count && 
>>>> !jbd2_journal_errno(journal) &&
>>>> +             !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
>>>> +        EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
>>>> +        es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>>>> +        err = ext4_commit_super(sb);
>>>> +        if (err) {
>>>> +            ext4_msg(sb, KERN_ERR,
>>>> +                 "Failed to commit error information, please 
>>>> repair fs force!");
>>>> +            goto err_out;
>>>> +        }
>>>> +    }
>>>> +
>>>>       EXT4_SB(sb)->s_journal = journal;
>>>>       err = ext4_clear_journal_err(sb, es);
>>>>       if (err) {
>>> I think we don't need such a complicated judgment, after the journal 
>>> replay and saving the error info,
>>> if there is EXT4_ERROR_FS flag in ext4_sb_info->s_mount_state, just 
>>> add this flag directly to es->s_state.
>>> This way the EXT4_ERROR_FS flag and the error message will be 
>>> written to disk the next time
>>
>> Thanks for your suggestion. There are two reasons for this:
>> 1. We want to write the error mark to the disk as soon as possible.
>> 2. Here we deal with the case where there is no error mark bit but 
>> there is an error record.
>> In this case, the file system should be marked with an error and the 
>> user should be prompted.
> The EXT4_ERROR_FS flag is always written to disk at the same time as 
> the error info,
> except when the journal is replayed. During journal replay the error 
> info is additionally
> copied to disk first, and the EXT4_ERROR_FS flag in the sbi is not 
> written to disk until
> the ext4_put_super() is called. It is only when a failure occurs 
> during this time that
> there is an error info but no EXT4_ERROR_FS flag. So we just need to 
> make sure that
> the EXT4_ERROR_FS flag is also written to disk at the same time as the 
> error info
> after the journal replay.
The situation you said is based on the situation after the repair. What 
about the existing
image with such inconsistency?
>>> ext4_commit_super() is executed. The code change is as follows:
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index 260c1b3e3ef2..341b11c589b3 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -5935,6 +5935,7 @@ static int ext4_load_journal(struct 
>>> super_block *sb,
>>>                         memcpy(((char *) es) + EXT4_S_ERR_START,
>>>                                save, EXT4_S_ERR_LEN);
>>>                 kfree(save);
>>> +               es->s_state |= 
>>> cpu_to_le16(EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS);
>>>         }
>>>
>>>         if (err) {
>>>
>>
>>
>>
  
Jan Kara Feb. 16, 2023, 5:31 p.m. UTC | #5
On Tue 14-02-23 10:29:04, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> Now, 'es->s_state' maybe covered by recover journal. And journal errno
> maybe not recorded in journal sb as IO error. ext4_update_super() only
> update error information when 'sbi->s_add_error_count' large than zero.
> Then 'EXT4_ERROR_FS' flag maybe lost.
> To solve above issue commit error information after recover journal.
> 
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>  fs/ext4/super.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index dc3907dff13a..b94754ba8556 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
>  		goto err_out;
>  	}
>  
> +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
> +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
> +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
> +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> +		err = ext4_commit_super(sb);
> +		if (err) {
> +			ext4_msg(sb, KERN_ERR,
> +				 "Failed to commit error information, please repair fs force!");
> +			goto err_out;
> +		}
> +	}
> +

Hum, I'm not sure I follow here. If journal replay has overwritten the
superblock (and thus the stored error info), then I'd expect
es->s_error_count got overwritten (possibly to 0) as well. And this is
actually relatively realistic scenario with errors=remount-ro behavior when
the first fs error happens.

What I intended in my original suggestion was to save es->s_error_count,
es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
doing journal replay in ext4_load_journal() and then after journal replay
merge this info back to the superblock - if EXT4_ERROR_FS was set, set it
now as well, take max of old and new s_error_count, set s_first_error_* if
it is now unset, set s_last_error_* if stored timestamp is newer than
current timestamp.

Or am I overengineering it now? :)

								Honza
  
Baokun Li Feb. 17, 2023, 1:43 a.m. UTC | #6
On 2023/2/17 1:31, Jan Kara wrote:
> On Tue 14-02-23 10:29:04, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>> update error information when 'sbi->s_add_error_count' large than zero.
>> Then 'EXT4_ERROR_FS' flag maybe lost.
>> To solve above issue commit error information after recover journal.
>>
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>> ---
>>   fs/ext4/super.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
> Hum, I'm not sure I follow here. If journal replay has overwritten the
> superblock (and thus the stored error info), then I'd expect
> es->s_error_count got overwritten (possibly to 0) as well. And this is
> actually relatively realistic scenario with errors=remount-ro behavior when
> the first fs error happens.
>
> What I intended in my original suggestion was to save es->s_error_count,
> es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
> doing journal replay in ext4_load_journal() and then after journal replay
> merge this info back to the superblock - if EXT4_ERROR_FS was set, set it
> now as well, take max of old and new s_error_count, set s_first_error_* if
> it is now unset, set s_last_error_* if stored timestamp is newer than
> current timestamp.
>
> Or am I overengineering it now? :)
>
> 								Honza
This is exactly how the code is designed now!
The code has now saved all the above information except EXT4_ERROR_FS by
the following two pieces of logic, as follows:

---------------- In struct ext4_super_block ----------------
1412 #define EXT4_S_ERR_START offsetof(struct ext4_super_block, 
s_error_count)
1413         __le32  s_error_count;          /* number of fs errors */
1414         __le32  s_first_error_time;     /* first time an error 
happened */
1415         __le32  s_first_error_ino;      /* inode involved in first 
error */
1416         __le64  s_first_error_block;    /* block involved of first 
error */
1417         __u8    s_first_error_func[32] __nonstring;     /* function 
where the error happened */
1418         __le32  s_first_error_line;     /* line number where error 
happened */
1419         __le32  s_last_error_time;      /* most recent time of an 
error */
1420         __le32  s_last_error_ino;       /* inode involved in last 
error */
1421         __le32  s_last_error_line;      /* line number where error 
happened */
1422         __le64  s_last_error_block;     /* block involved of last 
error */
1423         __u8    s_last_error_func[32] __nonstring;      /* function 
where the error happened */
1424 #define EXT4_S_ERR_END offsetof(struct ext4_super_block, s_mount_opts)
-----------------------------------------------------------

---------------- In ext4_load_journal() ----------------
5929                 char *save = kmalloc(EXT4_S_ERR_LEN, GFP_KERNEL);
5930                 if (save)
5931                         memcpy(save, ((char *) es) +
5932                                EXT4_S_ERR_START, EXT4_S_ERR_LEN);
5933                 err = jbd2_journal_load(journal);
5934                 if (save)
5935                         memcpy(((char *) es) + EXT4_S_ERR_START,
5936                                save, EXT4_S_ERR_LEN);
5937                 kfree(save);
--------------------------------------------------------

As you said, we should also save EXT4_ERROR_FS to es->s_state.
But we are not saving this now, so I think we just need to add

  `es->s_state |= cpu_to_le16(EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS);`

to save the possible EXT4_ERROR_FS flag after copying the error area 
data to es.
  
Ye Bin Feb. 17, 2023, 1:44 a.m. UTC | #7
On 2023/2/17 1:31, Jan Kara wrote:
> On Tue 14-02-23 10:29:04, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>> update error information when 'sbi->s_add_error_count' large than zero.
>> Then 'EXT4_ERROR_FS' flag maybe lost.
>> To solve above issue commit error information after recover journal.
>>
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>> ---
>>   fs/ext4/super.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index dc3907dff13a..b94754ba8556 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
>>   		goto err_out;
>>   	}
>>   
>> +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
>> +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
>> +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
>> +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>> +		err = ext4_commit_super(sb);
>> +		if (err) {
>> +			ext4_msg(sb, KERN_ERR,
>> +				 "Failed to commit error information, please repair fs force!");
>> +			goto err_out;
>> +		}
>> +	}
>> +
> Hum, I'm not sure I follow here. If journal replay has overwritten the
> superblock (and thus the stored error info), then I'd expect
> es->s_error_count got overwritten (possibly to 0) as well. And this is
> actually relatively realistic scenario with errors=remount-ro behavior when
> the first fs error happens.
>
> What I intended in my original suggestion was to save es->s_error_count,
> es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
> doing journal replay in ext4_load_journal() and then after journal replay
> merge this info back to the superblock
Actually,commit 1c13d5c08728 ("ext4: Save error information to the 
superblock for analysis")
already merged error info back to the superblock after journal replay 
except 'es->s_state'.
The problem I have now is that the error flag in the journal superblock 
was not recorded,
but the error message was recorded in the superblock. So it leads to 
ext4_clear_journal_err()
does not detect errors and marks the file system as an error. Because 
ext4_update_super() is
only set error flag  when 'sbi->s_add_error_count  > 0'. Although 
'sbi->s_mount_state' is
written to the super block when umount, but it is also conditional.
So I handle the scenario "es->s_error_count && 
!jbd2_journal_errno(journal) &&
!(le16_to_cpu(es->s_state) & EXT4_ERROR_FS)". Maybe we can just store
'EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS' back to the superblock. But i
prefer to mark fs as error if it contain detail error info without 
EXT4_ERROR_FS.
> - if EXT4_ERROR_FS was set, set it
> now as well, take max of old and new s_error_count, set s_first_error_* if
> it is now unset, set s_last_error_* if stored timestamp is newer than
> current timestamp.
>
> Or am I overengineering it now? :)
>
> 								Honza
  
Jan Kara Feb. 17, 2023, 10:56 a.m. UTC | #8
On Fri 17-02-23 09:44:57, yebin (H) wrote:
> On 2023/2/17 1:31, Jan Kara wrote:
> > On Tue 14-02-23 10:29:04, Ye Bin wrote:
> > > From: Ye Bin <yebin10@huawei.com>
> > > 
> > > Now, 'es->s_state' maybe covered by recover journal. And journal errno
> > > maybe not recorded in journal sb as IO error. ext4_update_super() only
> > > update error information when 'sbi->s_add_error_count' large than zero.
> > > Then 'EXT4_ERROR_FS' flag maybe lost.
> > > To solve above issue commit error information after recover journal.
> > > 
> > > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > > ---
> > >   fs/ext4/super.c | 12 ++++++++++++
> > >   1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index dc3907dff13a..b94754ba8556 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
> > >   		goto err_out;
> > >   	}
> > > +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
> > > +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
> > > +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
> > > +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> > > +		err = ext4_commit_super(sb);
> > > +		if (err) {
> > > +			ext4_msg(sb, KERN_ERR,
> > > +				 "Failed to commit error information, please repair fs force!");
> > > +			goto err_out;
> > > +		}
> > > +	}
> > > +
> > Hum, I'm not sure I follow here. If journal replay has overwritten the
> > superblock (and thus the stored error info), then I'd expect
> > es->s_error_count got overwritten (possibly to 0) as well. And this is
> > actually relatively realistic scenario with errors=remount-ro behavior when
> > the first fs error happens.
> > 
> > What I intended in my original suggestion was to save es->s_error_count,
> > es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
> > doing journal replay in ext4_load_journal() and then after journal replay
> > merge this info back to the superblock
> Actually,commit 1c13d5c08728 ("ext4: Save error information to the
> superblock for analysis")
> already merged error info back to the superblock after journal replay except
> 'es->s_state'.
> The problem I have now is that the error flag in the journal superblock was
> not recorded,
> but the error message was recorded in the superblock. So it leads to
> ext4_clear_journal_err()
> does not detect errors and marks the file system as an error. Because
> ext4_update_super() is
> only set error flag  when 'sbi->s_add_error_count  > 0'. Although
> 'sbi->s_mount_state' is
> written to the super block when umount, but it is also conditional.
> So I handle the scenario "es->s_error_count && !jbd2_journal_errno(journal)
> &&
> !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS)". Maybe we can just store
> 'EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS' back to the superblock. But i
> prefer to mark fs as error if it contain detail error info without
> EXT4_ERROR_FS.

Aha, thanks for explanation! So now I finally understand what the problem
exactly is. I'm not sure relying on es->s_error_count is too good. Probably
it works but I'd be calmer if when saving error info we also did:

	bool error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);

	copy other info
	err = jbd2_journal_load(journal);
	restore other info
	if (error_fs)
		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
	/* Write out restored error information to the superblock */
	err2 = ext4_commit_super(sb);

and be done with this. I don't think trying to optimize away the committing
of the superblock when we had to replay the journal is really worth it.

Does this solve your concerns?

								Honza
  
Ye Bin Feb. 18, 2023, 2:18 a.m. UTC | #9
On 2023/2/17 18:56, Jan Kara wrote:
> On Fri 17-02-23 09:44:57, yebin (H) wrote:
>> On 2023/2/17 1:31, Jan Kara wrote:
>>> On Tue 14-02-23 10:29:04, Ye Bin wrote:
>>>> From: Ye Bin <yebin10@huawei.com>
>>>>
>>>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>>>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>>>> update error information when 'sbi->s_add_error_count' large than zero.
>>>> Then 'EXT4_ERROR_FS' flag maybe lost.
>>>> To solve above issue commit error information after recover journal.
>>>>
>>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>>> ---
>>>>    fs/ext4/super.c | 12 ++++++++++++
>>>>    1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>>> index dc3907dff13a..b94754ba8556 100644
>>>> --- a/fs/ext4/super.c
>>>> +++ b/fs/ext4/super.c
>>>> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
>>>>    		goto err_out;
>>>>    	}
>>>> +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
>>>> +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
>>>> +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
>>>> +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>>>> +		err = ext4_commit_super(sb);
>>>> +		if (err) {
>>>> +			ext4_msg(sb, KERN_ERR,
>>>> +				 "Failed to commit error information, please repair fs force!");
>>>> +			goto err_out;
>>>> +		}
>>>> +	}
>>>> +
>>> Hum, I'm not sure I follow here. If journal replay has overwritten the
>>> superblock (and thus the stored error info), then I'd expect
>>> es->s_error_count got overwritten (possibly to 0) as well. And this is
>>> actually relatively realistic scenario with errors=remount-ro behavior when
>>> the first fs error happens.
>>>
>>> What I intended in my original suggestion was to save es->s_error_count,
>>> es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
>>> doing journal replay in ext4_load_journal() and then after journal replay
>>> merge this info back to the superblock
>> Actually,commit 1c13d5c08728 ("ext4: Save error information to the
>> superblock for analysis")
>> already merged error info back to the superblock after journal replay except
>> 'es->s_state'.
>> The problem I have now is that the error flag in the journal superblock was
>> not recorded,
>> but the error message was recorded in the superblock. So it leads to
>> ext4_clear_journal_err()
>> does not detect errors and marks the file system as an error. Because
>> ext4_update_super() is
>> only set error flag  when 'sbi->s_add_error_count  > 0'. Although
>> 'sbi->s_mount_state' is
>> written to the super block when umount, but it is also conditional.
>> So I handle the scenario "es->s_error_count && !jbd2_journal_errno(journal)
>> &&
>> !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS)". Maybe we can just store
>> 'EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS' back to the superblock. But i
>> prefer to mark fs as error if it contain detail error info without
>> EXT4_ERROR_FS.
> Aha, thanks for explanation! So now I finally understand what the problem
> exactly is. I'm not sure relying on es->s_error_count is too good. Probably
> it works but I'd be calmer if when saving error info we also did:
>
> 	bool error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);
>
> 	copy other info
> 	err = jbd2_journal_load(journal);
> 	restore other info
> 	if (error_fs)
> 		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> 	/* Write out restored error information to the superblock */
> 	err2 = ext4_commit_super(sb);
>
> and be done with this. I don't think trying to optimize away the committing
> of the superblock when we had to replay the journal is really worth it.
>
> Does this solve your concerns?
>
> 								Honza
Thanks for your suggestion.

I think if journal super block has 'j_errno' ext4_clear_journal_err() 
will commit error info.
The scenario we need to deal with is:(1) journal super block has no 
'j_errno'; (2) super
block has detail error info, but 'es->s_state' has no 'EXT4_ERROR_FS', 
It means super
block in journal has no error flag and the newest super block has error 
flag. so we
need to store error flag to 'es->s_state', and commit it to disk.If 
'es->s_state' has
  'EXT4_ERROR_FS', it means super block in journal has error flag, so we 
do not need
to store error flag in super block.

I don't know if I can explain my idea of repair.
  
Jan Kara Feb. 27, 2023, 11:20 a.m. UTC | #10
On Sat 18-02-23 10:18:42, yebin (H) wrote:
> On 2023/2/17 18:56, Jan Kara wrote:
> > On Fri 17-02-23 09:44:57, yebin (H) wrote:
> > > On 2023/2/17 1:31, Jan Kara wrote:
> > > > On Tue 14-02-23 10:29:04, Ye Bin wrote:
> > > > > From: Ye Bin <yebin10@huawei.com>
> > > > > 
> > > > > Now, 'es->s_state' maybe covered by recover journal. And journal errno
> > > > > maybe not recorded in journal sb as IO error. ext4_update_super() only
> > > > > update error information when 'sbi->s_add_error_count' large than zero.
> > > > > Then 'EXT4_ERROR_FS' flag maybe lost.
> > > > > To solve above issue commit error information after recover journal.
> > > > > 
> > > > > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > > > > ---
> > > > >    fs/ext4/super.c | 12 ++++++++++++
> > > > >    1 file changed, 12 insertions(+)
> > > > > 
> > > > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > > > index dc3907dff13a..b94754ba8556 100644
> > > > > --- a/fs/ext4/super.c
> > > > > +++ b/fs/ext4/super.c
> > > > > @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
> > > > >    		goto err_out;
> > > > >    	}
> > > > > +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
> > > > > +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
> > > > > +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
> > > > > +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> > > > > +		err = ext4_commit_super(sb);
> > > > > +		if (err) {
> > > > > +			ext4_msg(sb, KERN_ERR,
> > > > > +				 "Failed to commit error information, please repair fs force!");
> > > > > +			goto err_out;
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > Hum, I'm not sure I follow here. If journal replay has overwritten the
> > > > superblock (and thus the stored error info), then I'd expect
> > > > es->s_error_count got overwritten (possibly to 0) as well. And this is
> > > > actually relatively realistic scenario with errors=remount-ro behavior when
> > > > the first fs error happens.
> > > > 
> > > > What I intended in my original suggestion was to save es->s_error_count,
> > > > es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
> > > > doing journal replay in ext4_load_journal() and then after journal replay
> > > > merge this info back to the superblock
> > > Actually,commit 1c13d5c08728 ("ext4: Save error information to the
> > > superblock for analysis")
> > > already merged error info back to the superblock after journal replay except
> > > 'es->s_state'.
> > > The problem I have now is that the error flag in the journal superblock was
> > > not recorded,
> > > but the error message was recorded in the superblock. So it leads to
> > > ext4_clear_journal_err()
> > > does not detect errors and marks the file system as an error. Because
> > > ext4_update_super() is
> > > only set error flag  when 'sbi->s_add_error_count  > 0'. Although
> > > 'sbi->s_mount_state' is
> > > written to the super block when umount, but it is also conditional.
> > > So I handle the scenario "es->s_error_count && !jbd2_journal_errno(journal)
> > > &&
> > > !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS)". Maybe we can just store
> > > 'EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS' back to the superblock. But i
> > > prefer to mark fs as error if it contain detail error info without
> > > EXT4_ERROR_FS.
> > Aha, thanks for explanation! So now I finally understand what the problem
> > exactly is. I'm not sure relying on es->s_error_count is too good. Probably
> > it works but I'd be calmer if when saving error info we also did:
> > 
> > 	bool error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);
> > 
> > 	copy other info
> > 	err = jbd2_journal_load(journal);
> > 	restore other info
> > 	if (error_fs)
> > 		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> > 	/* Write out restored error information to the superblock */
> > 	err2 = ext4_commit_super(sb);
> > 
> > and be done with this. I don't think trying to optimize away the committing
> > of the superblock when we had to replay the journal is really worth it.
> > 
> > Does this solve your concerns?
> Thanks for your suggestion.
> 
> I think if journal super block has 'j_errno' ext4_clear_journal_err()
> will commit error info.  The scenario we need to deal with is:(1) journal
> super block has no 'j_errno'; (2) super block has detail error info, but
> 'es->s_state' has no 'EXT4_ERROR_FS', It means super block in journal has
> no error flag and the newest super block has error flag.

But my code above should be handling this situation you describe - the
check:

error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);

will check the newest state in the superblock before journal replay. Then
we replay the journal - es->s_state may loose the EXT4_ERROR_FS flag if the
superblock version in the journal didn't have it. So we do:

if (error_fs)
	es->s_state |= cpu_to_le16(EXT4_ERROR_FS);

which makes sure EXT4_ERROR_FS is set either if it was set in the journal
or in the newest superblock version before journal replay.

> so we need to
> store error flag to 'es->s_state', and commit it to disk.If 'es->s_state'
> has 'EXT4_ERROR_FS', it means super block in journal has error flag, so
> we do not need to store error flag in super block.

Why do you think that if es->s_state has EXT4_ERROR_FS, the super block in
the journal has that flag? During mount, we load the superblock directly
from the first block in the filesystem and until we call
jbd2_journal_load(), es->s_state contains this newest value of the
superblock state. What am I missing?

								Honza
  
Ye Bin Feb. 28, 2023, 2:24 a.m. UTC | #11
On 2023/2/27 19:20, Jan Kara wrote:
> On Sat 18-02-23 10:18:42, yebin (H) wrote:
>> On 2023/2/17 18:56, Jan Kara wrote:
>>> On Fri 17-02-23 09:44:57, yebin (H) wrote:
>>>> On 2023/2/17 1:31, Jan Kara wrote:
>>>>> On Tue 14-02-23 10:29:04, Ye Bin wrote:
>>>>>> From: Ye Bin <yebin10@huawei.com>
>>>>>>
>>>>>> Now, 'es->s_state' maybe covered by recover journal. And journal errno
>>>>>> maybe not recorded in journal sb as IO error. ext4_update_super() only
>>>>>> update error information when 'sbi->s_add_error_count' large than zero.
>>>>>> Then 'EXT4_ERROR_FS' flag maybe lost.
>>>>>> To solve above issue commit error information after recover journal.
>>>>>>
>>>>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>>>>> ---
>>>>>>     fs/ext4/super.c | 12 ++++++++++++
>>>>>>     1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>>>>> index dc3907dff13a..b94754ba8556 100644
>>>>>> --- a/fs/ext4/super.c
>>>>>> +++ b/fs/ext4/super.c
>>>>>> @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
>>>>>>     		goto err_out;
>>>>>>     	}
>>>>>> +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
>>>>>> +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
>>>>>> +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
>>>>>> +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>>>>>> +		err = ext4_commit_super(sb);
>>>>>> +		if (err) {
>>>>>> +			ext4_msg(sb, KERN_ERR,
>>>>>> +				 "Failed to commit error information, please repair fs force!");
>>>>>> +			goto err_out;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>> Hum, I'm not sure I follow here. If journal replay has overwritten the
>>>>> superblock (and thus the stored error info), then I'd expect
>>>>> es->s_error_count got overwritten (possibly to 0) as well. And this is
>>>>> actually relatively realistic scenario with errors=remount-ro behavior when
>>>>> the first fs error happens.
>>>>>
>>>>> What I intended in my original suggestion was to save es->s_error_count,
>>>>> es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
>>>>> doing journal replay in ext4_load_journal() and then after journal replay
>>>>> merge this info back to the superblock
>>>> Actually,commit 1c13d5c08728 ("ext4: Save error information to the
>>>> superblock for analysis")
>>>> already merged error info back to the superblock after journal replay except
>>>> 'es->s_state'.
>>>> The problem I have now is that the error flag in the journal superblock was
>>>> not recorded,
>>>> but the error message was recorded in the superblock. So it leads to
>>>> ext4_clear_journal_err()
>>>> does not detect errors and marks the file system as an error. Because
>>>> ext4_update_super() is
>>>> only set error flag  when 'sbi->s_add_error_count  > 0'. Although
>>>> 'sbi->s_mount_state' is
>>>> written to the super block when umount, but it is also conditional.
>>>> So I handle the scenario "es->s_error_count && !jbd2_journal_errno(journal)
>>>> &&
>>>> !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS)". Maybe we can just store
>>>> 'EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS' back to the superblock. But i
>>>> prefer to mark fs as error if it contain detail error info without
>>>> EXT4_ERROR_FS.
>>> Aha, thanks for explanation! So now I finally understand what the problem
>>> exactly is. I'm not sure relying on es->s_error_count is too good. Probably
>>> it works but I'd be calmer if when saving error info we also did:
>>>
>>> 	bool error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);
>>>
>>> 	copy other info
>>> 	err = jbd2_journal_load(journal);
>>> 	restore other info
>>> 	if (error_fs)
>>> 		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>>> 	/* Write out restored error information to the superblock */
>>> 	err2 = ext4_commit_super(sb);
>>>
>>> and be done with this. I don't think trying to optimize away the committing
>>> of the superblock when we had to replay the journal is really worth it.
>>>
>>> Does this solve your concerns?
>> Thanks for your suggestion.
>>
>> I think if journal super block has 'j_errno' ext4_clear_journal_err()
>> will commit error info.  The scenario we need to deal with is:(1) journal
>> super block has no 'j_errno'; (2) super block has detail error info, but
>> 'es->s_state' has no 'EXT4_ERROR_FS', It means super block in journal has
>> no error flag and the newest super block has error flag.
> But my code above should be handling this situation you describe - the
> check:
>
> error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);
Here, we do not need to backup 'error_fs', as 
'EXT4_SB(sb)->s_mount_state' already
record this flag when fs has error flag before do journal replay.
> will check the newest state in the superblock before journal replay. Then
> we replay the journal - es->s_state may loose the EXT4_ERROR_FS flag if the
> superblock version in the journal didn't have it. So we do:
>
> if (error_fs)
> 	es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
>
> which makes sure EXT4_ERROR_FS is set either if it was set in the journal
> or in the newest superblock version before journal replay.
My modification is to deal with the situation we missed, and I don't 
want to introduce
an invalid super block submission.
If you think my judgment is too obscure, I can send another version 
according to your
suggestion.
>> so we need to
>> store error flag to 'es->s_state', and commit it to disk.If 'es->s_state'
>> has 'EXT4_ERROR_FS', it means super block in journal has error flag, so
>> we do not need to store error flag in super block.
> Why do you think that if es->s_state has EXT4_ERROR_FS, the super block in
> the journal has that flag? During mount, we load the superblock directly
> from the first block in the filesystem and until we call
> jbd2_journal_load(), es->s_state contains this newest value of the
> superblock state. What am I missing?
After jbd2_journal_load() 'es->s_state' is already covered by the super 
block in journal.
If there is super block in journal and 'es->s_state' has error flag this 
means super block
in journal has error flag. If there is no super block in journal and  
'es->s_state' has error
flag, this means super block already has error flag.In both cases we can 
do nothing.
>
> 								Honza
  
Jan Kara March 1, 2023, 9:07 a.m. UTC | #12
On Tue 28-02-23 10:24:26, yebin (H) wrote:
> On 2023/2/27 19:20, Jan Kara wrote:
> > On Sat 18-02-23 10:18:42, yebin (H) wrote:
> > > On 2023/2/17 18:56, Jan Kara wrote:
> > > > On Fri 17-02-23 09:44:57, yebin (H) wrote:
> > > > > On 2023/2/17 1:31, Jan Kara wrote:
> > > > > > On Tue 14-02-23 10:29:04, Ye Bin wrote:
> > > > > > > From: Ye Bin <yebin10@huawei.com>
> > > > > > > 
> > > > > > > Now, 'es->s_state' maybe covered by recover journal. And journal errno
> > > > > > > maybe not recorded in journal sb as IO error. ext4_update_super() only
> > > > > > > update error information when 'sbi->s_add_error_count' large than zero.
> > > > > > > Then 'EXT4_ERROR_FS' flag maybe lost.
> > > > > > > To solve above issue commit error information after recover journal.
> > > > > > > 
> > > > > > > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > > > > > > ---
> > > > > > >     fs/ext4/super.c | 12 ++++++++++++
> > > > > > >     1 file changed, 12 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > > > > > index dc3907dff13a..b94754ba8556 100644
> > > > > > > --- a/fs/ext4/super.c
> > > > > > > +++ b/fs/ext4/super.c
> > > > > > > @@ -5932,6 +5932,18 @@ static int ext4_load_journal(struct super_block *sb,
> > > > > > >     		goto err_out;
> > > > > > >     	}
> > > > > > > +	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
> > > > > > > +		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
> > > > > > > +		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
> > > > > > > +		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> > > > > > > +		err = ext4_commit_super(sb);
> > > > > > > +		if (err) {
> > > > > > > +			ext4_msg(sb, KERN_ERR,
> > > > > > > +				 "Failed to commit error information, please repair fs force!");
> > > > > > > +			goto err_out;
> > > > > > > +		}
> > > > > > > +	}
> > > > > > > +
> > > > > > Hum, I'm not sure I follow here. If journal replay has overwritten the
> > > > > > superblock (and thus the stored error info), then I'd expect
> > > > > > es->s_error_count got overwritten (possibly to 0) as well. And this is
> > > > > > actually relatively realistic scenario with errors=remount-ro behavior when
> > > > > > the first fs error happens.
> > > > > > 
> > > > > > What I intended in my original suggestion was to save es->s_error_count,
> > > > > > es->s_state & EXT4_ERROR_FS, es->s_first_error_*, es->s_last_error_* before
> > > > > > doing journal replay in ext4_load_journal() and then after journal replay
> > > > > > merge this info back to the superblock
> > > > > Actually,commit 1c13d5c08728 ("ext4: Save error information to the
> > > > > superblock for analysis")
> > > > > already merged error info back to the superblock after journal replay except
> > > > > 'es->s_state'.
> > > > > The problem I have now is that the error flag in the journal superblock was
> > > > > not recorded,
> > > > > but the error message was recorded in the superblock. So it leads to
> > > > > ext4_clear_journal_err()
> > > > > does not detect errors and marks the file system as an error. Because
> > > > > ext4_update_super() is
> > > > > only set error flag  when 'sbi->s_add_error_count  > 0'. Although
> > > > > 'sbi->s_mount_state' is
> > > > > written to the super block when umount, but it is also conditional.
> > > > > So I handle the scenario "es->s_error_count && !jbd2_journal_errno(journal)
> > > > > &&
> > > > > !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS)". Maybe we can just store
> > > > > 'EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS' back to the superblock. But i
> > > > > prefer to mark fs as error if it contain detail error info without
> > > > > EXT4_ERROR_FS.
> > > > Aha, thanks for explanation! So now I finally understand what the problem
> > > > exactly is. I'm not sure relying on es->s_error_count is too good. Probably
> > > > it works but I'd be calmer if when saving error info we also did:
> > > > 
> > > > 	bool error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);
> > > > 
> > > > 	copy other info
> > > > 	err = jbd2_journal_load(journal);
> > > > 	restore other info
> > > > 	if (error_fs)
> > > > 		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> > > > 	/* Write out restored error information to the superblock */
> > > > 	err2 = ext4_commit_super(sb);
> > > > 
> > > > and be done with this. I don't think trying to optimize away the committing
> > > > of the superblock when we had to replay the journal is really worth it.
> > > > 
> > > > Does this solve your concerns?
> > > Thanks for your suggestion.
> > > 
> > > I think if journal super block has 'j_errno' ext4_clear_journal_err()
> > > will commit error info.  The scenario we need to deal with is:(1) journal
> > > super block has no 'j_errno'; (2) super block has detail error info, but
> > > 'es->s_state' has no 'EXT4_ERROR_FS', It means super block in journal has
> > > no error flag and the newest super block has error flag.
> > But my code above should be handling this situation you describe - the
> > check:
> > 
> > error_fs = es->s_state & cpu_to_le16(EXT4_ERROR_FS);
>
> Here, we do not need to backup 'error_fs', as
> 'EXT4_SB(sb)->s_mount_state' already record this flag when fs has error
> flag before do journal replay.

Yeah, right. We don't need error_fs variable and can just look at
EXT4_SB(sb)->s_mount_state.

> > will check the newest state in the superblock before journal replay. Then
> > we replay the journal - es->s_state may loose the EXT4_ERROR_FS flag if the
> > superblock version in the journal didn't have it. So we do:
> > 
> > if (error_fs)
> > 	es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
> > 
> > which makes sure EXT4_ERROR_FS is set either if it was set in the journal
> > or in the newest superblock version before journal replay.
>
> My modification is to deal with the situation we missed, and I don't want
> to introduce an invalid super block submission.  If you think my judgment
> is too obscure, I can send another version according to your suggestion.

So is the extra superblock write your only concern? Honestly, I prefer code
simplicity over saved one superblock write in case we had to replay the
journal (which should be rare anyway). If you look at the code, we can
writeout superblock several times in that mount path anyway.

> > > so we need to
> > > store error flag to 'es->s_state', and commit it to disk.If 'es->s_state'
> > > has 'EXT4_ERROR_FS', it means super block in journal has error flag, so
> > > we do not need to store error flag in super block.
> > Why do you think that if es->s_state has EXT4_ERROR_FS, the super block in
> > the journal has that flag? During mount, we load the superblock directly
> > from the first block in the filesystem and until we call
> > jbd2_journal_load(), es->s_state contains this newest value of the
> > superblock state. What am I missing?
> After jbd2_journal_load() 'es->s_state' is already covered by the super
> block in journal.  If there is super block in journal and 'es->s_state'
> has error flag this means super block in journal has error flag. If there
> is no super block in journal and 'es->s_state' has error flag, this means
> super block already has error flag.In both cases we can do nothing.

If what you wanted to say: "It is not necessary to write the superblock if
EXT4_ERROR_FS is already set." I tend to agree although not 100% because
journal replay could result in rewinding 's_last_error_*' fields and so
writing superblock would still make sense even if EXT4_ERROR_FS is set in
es->s_state after journal reply. That's why I wouldn't complicate things
and just always write the superblock after journal replay.

								Honza
  

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dc3907dff13a..b94754ba8556 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5932,6 +5932,18 @@  static int ext4_load_journal(struct super_block *sb,
 		goto err_out;
 	}
 
+	if (unlikely(es->s_error_count && !jbd2_journal_errno(journal) &&
+		     !(le16_to_cpu(es->s_state) & EXT4_ERROR_FS))) {
+		EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
+		es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
+		err = ext4_commit_super(sb);
+		if (err) {
+			ext4_msg(sb, KERN_ERR,
+				 "Failed to commit error information, please repair fs force!");
+			goto err_out;
+		}
+	}
+
 	EXT4_SB(sb)->s_journal = journal;
 	err = ext4_clear_journal_err(sb, es);
 	if (err) {