[2/3] md/raid10: fix incorrect done of recovery

Message ID 20230522115449.2203939-3-linan666@huaweicloud.com
State New
Headers
Series raid10 bugfix |

Commit Message

Li Nan May 22, 2023, 11:54 a.m. UTC
  From: Li Nan <linan122@huawei.com>

Recovery will go to giveup and let chunks_skipped++ in
raid10_sync_request() if there are some bad_blocks, and it will return
max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
data is inconsistent but user think recovery is done, it is wrong.

Fix it by set mirror's recovery_disabled and spare device shouln't be
added to here.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
  

Comments

Yu Kuai May 22, 2023, 1:54 p.m. UTC | #1
Hi,

在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> Recovery will go to giveup and let chunks_skipped++ in
> raid10_sync_request() if there are some bad_blocks, and it will return
> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> data is inconsistent but user think recovery is done, it is wrong.
> 
> Fix it by set mirror's recovery_disabled and spare device shouln't be
> added to here.
> 
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index e21502c03b45..70cc87c7ee57 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   	int chunks_skipped = 0;
>   	sector_t chunk_mask = conf->geo.chunk_mask;
>   	int page_idx = 0;
> +	int error_disk = -1;
>   
>   	/*
>   	 * Allow skipping a full rebuild for incremental assembly
> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   		return reshape_request(mddev, sector_nr, skipped);
>   
>   	if (chunks_skipped >= conf->geo.raid_disks) {
> -		/* if there has been nothing to do on any drive,
> +		pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> +			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");

Line exceed 80 columns, and following.
> +		if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {

Resync has the same problem, right?

Thanks,
Kuai
> +			/*
> +			 * recovery fail, set mirrors.recovory_disabled,
> +			 * device shouldn't be added to there.
> +			 */
> +			conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
> +			return 0;
> +		}
> +		/*
> +		 * if there has been nothing to do on any drive,
>   		 * then there is nothing to do at all..
>   		 */
>   		*skipped = 1;
> @@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   						       mdname(mddev));
>   					mirror->recovery_disabled
>   						= mddev->recovery_disabled;
> +				} else {
> +					error_disk = i;
>   				}
>   				put_buf(r10_bio);
>   				if (rb2)
>
  
Song Liu May 22, 2023, 4:03 p.m. UTC | #2
On Mon, May 22, 2023 at 6:54 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> > From: Li Nan <linan122@huawei.com>
> >
> > Recovery will go to giveup and let chunks_skipped++ in
> > raid10_sync_request() if there are some bad_blocks, and it will return
> > max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> > data is inconsistent but user think recovery is done, it is wrong.
> >
> > Fix it by set mirror's recovery_disabled and spare device shouln't be
> > added to here.
> >
> > Signed-off-by: Li Nan <linan122@huawei.com>
> > ---
> >   drivers/md/raid10.c | 16 +++++++++++++++-
> >   1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index e21502c03b45..70cc87c7ee57 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >       int chunks_skipped = 0;
> >       sector_t chunk_mask = conf->geo.chunk_mask;
> >       int page_idx = 0;
> > +     int error_disk = -1;
> >
> >       /*
> >        * Allow skipping a full rebuild for incremental assembly
> > @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >               return reshape_request(mddev, sector_nr, skipped);
> >
> >       if (chunks_skipped >= conf->geo.raid_disks) {
> > -             /* if there has been nothing to do on any drive,
> > +             pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> > +                     test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
>
> Line exceed 80 columns, and following.

If it makes the code look better, such as in this case, it is OK to have
lines longer than 80 columns.

Thanks,
Song

> > +             if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>
  
Li Nan May 25, 2023, 2 p.m. UTC | #3
在 2023/5/22 21:54, Yu Kuai 写道:
> Hi,
> 
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>> From: Li Nan <linan122@huawei.com>
>>
>> Recovery will go to giveup and let chunks_skipped++ in
>> raid10_sync_request() if there are some bad_blocks, and it will return
>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>> data is inconsistent but user think recovery is done, it is wrong.
>>
>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>> added to here.
>>
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>>   drivers/md/raid10.c | 16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index e21502c03b45..70cc87c7ee57 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev 
>> *mddev, sector_t sector_nr,
>>       int chunks_skipped = 0;
>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>       int page_idx = 0;
>> +    int error_disk = -1;
>>       /*
>>        * Allow skipping a full rebuild for incremental assembly
>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct 
>> mddev *mddev, sector_t sector_nr,
>>           return reshape_request(mddev, sector_nr, skipped);
>>       if (chunks_skipped >= conf->geo.raid_disks) {
>> -        /* if there has been nothing to do on any drive,
>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" 
>> : "recovery");
> 
> Line exceed 80 columns, and following.
>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, 
>> &mddev->recovery)) {
> 
> Resync has the same problem, right?
> 

Yes. But I have no idea to fix it. md_error disk nor set 
recovery_disabled is a good solution. So, just print error message now.
Do you have any ideas?
  
Yu Kuai May 26, 2023, 2:55 a.m. UTC | #4
Hi,

在 2023/05/25 22:00, Li Nan 写道:
> 
> 
> 在 2023/5/22 21:54, Yu Kuai 写道:
>> Hi,
>>
>> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>>> From: Li Nan <linan122@huawei.com>
>>>
>>> Recovery will go to giveup and let chunks_skipped++ in
>>> raid10_sync_request() if there are some bad_blocks, and it will return
>>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>>> data is inconsistent but user think recovery is done, it is wrong.
>>>
>>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>>> added to here.
>>>
>>> Signed-off-by: Li Nan <linan122@huawei.com>
>>> ---
>>>   drivers/md/raid10.c | 16 +++++++++++++++-
>>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index e21502c03b45..70cc87c7ee57 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct 
>>> mddev *mddev, sector_t sector_nr,
>>>       int chunks_skipped = 0;
>>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>>       int page_idx = 0;
>>> +    int error_disk = -1;
>>>       /*
>>>        * Allow skipping a full rebuild for incremental assembly
>>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct 
>>> mddev *mddev, sector_t sector_nr,
>>>           return reshape_request(mddev, sector_nr, skipped);
>>>       if (chunks_skipped >= conf->geo.raid_disks) {
>>> -        /* if there has been nothing to do on any drive,
>>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" 
>>> : "recovery");
>>
>> Line exceed 80 columns, and following.
>>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, 
>>> &mddev->recovery)) {
>>
>> Resync has the same problem, right?
>>
> 
> Yes. But I have no idea to fix it. md_error disk nor set 
> recovery_disabled is a good solution. So, just print error message now.
> Do you have any ideas?

I'll look into this, in the meadtime, I don't suggest to apply this
patch because this is just temporary solution that only fix half of
the problem.

Thanks,
Kuai
  

Patch

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e21502c03b45..70cc87c7ee57 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int chunks_skipped = 0;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
+	int error_disk = -1;
 
 	/*
 	 * Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,18 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		return reshape_request(mddev, sector_nr, skipped);
 
 	if (chunks_skipped >= conf->geo.raid_disks) {
-		/* if there has been nothing to do on any drive,
+		pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
+		if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+			/*
+			 * recovery fail, set mirrors.recovory_disabled,
+			 * device shouldn't be added to there.
+			 */
+			conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
+			return 0;
+		}
+		/*
+		 * if there has been nothing to do on any drive,
 		 * then there is nothing to do at all..
 		 */
 		*skipped = 1;
@@ -3640,6 +3652,8 @@  static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 						       mdname(mddev));
 					mirror->recovery_disabled
 						= mddev->recovery_disabled;
+				} else {
+					error_disk = i;
 				}
 				put_buf(r10_bio);
 				if (rb2)