[-next] md/raid5: don't allow concurrent reshape with recovery

Message ID 20230529031045.1760883-1-yukuai1@huaweicloud.com
State New
Headers
Series [-next] md/raid5: don't allow concurrent reshape with recovery |

Commit Message

Yu Kuai May 29, 2023, 3:10 a.m. UTC
  From: Yu Kuai <yukuai3@huawei.com>

commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
is in progress") fix that replacement can be set if reshape is
interrupted, which will cause that array can't be assemebled.

There is a similar on the other side, if recovery is interrupted, then
reshape can start, which will cause the same problem.

Fix the prblem by don't start reshape is recovery is still in progress.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 8 ++++++++
 1 file changed, 8 insertions(+)
  

Comments

Paul Menzel May 29, 2023, 6:44 a.m. UTC | #1
Dear Yu,


Thank you for your patch. As always some minor commons, you can also ignore.

Am 29.05.23 um 05:10 schrieb Yu Kuai:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape

I’d start with a capital letter: Commit …

> is in progress") fix that replacement can be set if reshape is

fixes

> interrupted, which will cause that array can't be assemebled.

assembled

> There is a similar on the other side, if recovery is interrupted, then

similar *problem*?

> reshape can start, which will cause the same problem.
> 
> Fix the prblem by don't start reshape is recovery is still in progress.

•  problem
•  … by not starting to reshape while recovery is still in progress

> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/raid5.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 64865f9dd3f5..6db783ca71b7 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -8500,6 +8500,7 @@ static int raid5_start_reshape(struct mddev *mddev)
>   	struct r5conf *conf = mddev->private;
>   	struct md_rdev *rdev;
>   	int spares = 0;
> +	int i;

It won’t make a difference for the code I believe, but as the count 
variable can’t be negative, I’d use `unsigned int`.

>   	unsigned long flags;
>   
>   	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> @@ -8511,6 +8512,13 @@ static int raid5_start_reshape(struct mddev *mddev)
>   	if (has_failed(conf))
>   		return -EINVAL;
>   
> +	/* raid5 can't handle concurrent reshape and recovery */
> +	if (mddev->recovery_cp < MaxSector)
> +		return -EBUSY;
> +	for (i = 0; i < conf->raid_disks; i++)
> +		if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
> +			return -EBUSY;
> +
>   	rdev_for_each(rdev, mddev) {
>   		if (!test_bit(In_sync, &rdev->flags)
>   		    && !test_bit(Faulty, &rdev->flags))


Kind regards,

Paul
  
Yu Kuai May 29, 2023, 1:36 p.m. UTC | #2
Hi,

在 2023/05/29 14:44, Paul Menzel 写道:
> Dear Yu,
> 
> 
> Thank you for your patch. As always some minor commons, you can also 
> ignore.

Thanks for there comments, I'll update them in v2 except that 'unsigned
int i'.

Thanks,
Kuai
> 
> Am 29.05.23 um 05:10 schrieb Yu Kuai:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
> 
> I’d start with a capital letter: Commit …
> 
>> is in progress") fix that replacement can be set if reshape is
> 
> fixes
> 
>> interrupted, which will cause that array can't be assemebled.
> 
> assembled
> 
>> There is a similar on the other side, if recovery is interrupted, then
> 
> similar *problem*?
> 
>> reshape can start, which will cause the same problem.
>>
>> Fix the prblem by don't start reshape is recovery is still in progress.
> 
> •  problem
> •  … by not starting to reshape while recovery is still in progress
> 
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/raid5.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 64865f9dd3f5..6db783ca71b7 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -8500,6 +8500,7 @@ static int raid5_start_reshape(struct mddev *mddev)
>>       struct r5conf *conf = mddev->private;
>>       struct md_rdev *rdev;
>>       int spares = 0;
>> +    int i;
> 
> It won’t make a difference for the code I believe, but as the count 
> variable can’t be negative, I’d use `unsigned int`.
> 
>>       unsigned long flags;
>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>> @@ -8511,6 +8512,13 @@ static int raid5_start_reshape(struct mddev 
>> *mddev)
>>       if (has_failed(conf))
>>           return -EINVAL;
>> +    /* raid5 can't handle concurrent reshape and recovery */
>> +    if (mddev->recovery_cp < MaxSector)
>> +        return -EBUSY;
>> +    for (i = 0; i < conf->raid_disks; i++)
>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
>> +            return -EBUSY;
>> +
>>       rdev_for_each(rdev, mddev) {
>>           if (!test_bit(In_sync, &rdev->flags)
>>               && !test_bit(Faulty, &rdev->flags))
> 
> 
> Kind regards,
> 
> Paul
> 
> .
>
  

Patch

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 64865f9dd3f5..6db783ca71b7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8500,6 +8500,7 @@  static int raid5_start_reshape(struct mddev *mddev)
 	struct r5conf *conf = mddev->private;
 	struct md_rdev *rdev;
 	int spares = 0;
+	int i;
 	unsigned long flags;
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
@@ -8511,6 +8512,13 @@  static int raid5_start_reshape(struct mddev *mddev)
 	if (has_failed(conf))
 		return -EINVAL;
 
+	/* raid5 can't handle concurrent reshape and recovery */
+	if (mddev->recovery_cp < MaxSector)
+		return -EBUSY;
+	for (i = 0; i < conf->raid_disks; i++)
+		if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
+			return -EBUSY;
+
 	rdev_for_each(rdev, mddev) {
 		if (!test_bit(In_sync, &rdev->flags)
 		    && !test_bit(Faulty, &rdev->flags))