diff mbox series

[v2] md/raid5: don't allow concurrent reshape with recovery

Message ID	20230529133410.2125914-1-yukuai1@huaweicloud.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Yu Kuai <yukuai1@huaweicloud.com> To: song@kernel.org, pmenzel@molgen.mpg.de Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v2] md/raid5: don't allow concurrent reshape with recovery Date: Mon, 29 May 2023 21:34:10 +0800 Message-Id: <20230529133410.2125914-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] md/raid5: don't allow concurrent reshape with recovery \| [v2] md/raid5: don't allow concurrent reshape with recovery

Commit Message

Yu Kuai May 29, 2023, 1:34 p.m. UTC

  From: Yu Kuai <yukuai3@huawei.com>

Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
is in progress") fixes that replacement can be set if reshape is
interrupted, which will cause that array can't be assembled.

There is a similar problem on the other side, if recovery is
interrupted, then reshape can start, which will cause the same problem.

Fix the problem by not starting to reshape while recovery is still in
progress.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
Changes in v2:
 - fix some typo in commit message.

 drivers/md/raid5.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Song Liu May 30, 2023, 9:19 p.m. UTC | #1

On Mon, May 29, 2023 at 6:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
> is in progress") fixes that replacement can be set if reshape is
> interrupted, which will cause that array can't be assembled.
>
> There is a similar problem on the other side, if recovery is
> interrupted, then reshape can start, which will cause the same problem.
>
> Fix the problem by not starting to reshape while recovery is still in
> progress.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Applied to md-next.

Thanks,
Song

> ---
> Changes in v2:
>  - fix some typo in commit message.
>
>  drivers/md/raid5.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8686d629e3f2..6615abf54d3f 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev)
>         struct r5conf *conf = mddev->private;
>         struct md_rdev *rdev;
>         int spares = 0;
> +       int i;
>         unsigned long flags;
>
>         if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev *mddev)
>         if (has_failed(conf))
>                 return -EINVAL;
>
> +       /* raid5 can't handle concurrent reshape and recovery */
> +       if (mddev->recovery_cp < MaxSector)
> +               return -EBUSY;
> +       for (i = 0; i < conf->raid_disks; i++)
> +               if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
> +                       return -EBUSY;
> +
>         rdev_for_each(rdev, mddev) {
>                 if (!test_bit(In_sync, &rdev->flags)
>                     && !test_bit(Faulty, &rdev->flags))
> --
> 2.39.2
>

Guoqing Jiang May 31, 2023, 1:06 a.m. UTC | #2

On 5/29/23 21:34, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
> is in progress") fixes that replacement can be set if reshape is
> interrupted, which will cause that array can't be assembled.
>
> There is a similar problem on the other side, if recovery is
> interrupted, then reshape can start, which will cause the same problem.
>
> Fix the problem by not starting to reshape while recovery is still in
> progress.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> Changes in v2:
>   - fix some typo in commit message.
>
>   drivers/md/raid5.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8686d629e3f2..6615abf54d3f 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev)
>   	struct r5conf *conf = mddev->private;
>   	struct md_rdev *rdev;
>   	int spares = 0;
> +	int i;
>   	unsigned long flags;
>   
>   	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev *mddev)
>   	if (has_failed(conf))
>   		return -EINVAL;
>   
> +	/* raid5 can't handle concurrent reshape and recovery */
> +	if (mddev->recovery_cp < MaxSector)
> +		return -EBUSY;
> +	for (i = 0; i < conf->raid_disks; i++)
> +		if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
> +			return -EBUSY;
> +

Does it mean reshape and recovery  can happen in parallel without the 
change?
I really doubt about it given any kind of internal io (resync, reshape 
and recovery)
is handled by resync thread. And IIUC either md_do_sync or md_check_recovery
should avoid it, no need to do it in personality layer.

Thanks,
Guoqing

Yu Kuai May 31, 2023, 1:22 a.m. UTC | #3

Hi,

在 2023/05/31 9:06, Guoqing Jiang 写道:
> 
> 
> On 5/29/23 21:34, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
>> is in progress") fixes that replacement can be set if reshape is
>> interrupted, which will cause that array can't be assembled.
>>
>> There is a similar problem on the other side, if recovery is
>> interrupted, then reshape can start, which will cause the same problem.
>>
>> Fix the problem by not starting to reshape while recovery is still in
>> progress.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>> Changes in v2:
>>   - fix some typo in commit message.
>>
>>   drivers/md/raid5.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 8686d629e3f2..6615abf54d3f 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev)
>>       struct r5conf *conf = mddev->private;
>>       struct md_rdev *rdev;
>>       int spares = 0;
>> +    int i;
>>       unsigned long flags;
>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev 
>> *mddev)
>>       if (has_failed(conf))
>>           return -EINVAL;
>> +    /* raid5 can't handle concurrent reshape and recovery */
>> +    if (mddev->recovery_cp < MaxSector)
>> +        return -EBUSY;
>> +    for (i = 0; i < conf->raid_disks; i++)
>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
>> +            return -EBUSY;
>> +
> 
> Does it mean reshape and recovery  can happen in parallel without the 
> change?
> I really doubt about it given any kind of internal io (resync, reshape 
> and recovery)
> is handled by resync thread. And IIUC either md_do_sync or 
> md_check_recovery
> should avoid it, no need to do it in personality layer.
> 

They can't, in this case recovery is interrupted, then recovery can't
make progress, and md_check_recovery() will start reshape, and after
reshape is done, recovery will continue, and data will be corrupted
because raid456 reshape doesn't handle replacement.

And by the way in raid456 is that if system reboot, this array can't be
assembled, raid5_run() will fail if reshape and replacement are both
set.

Thanks,
Kuai

Yu Kuai May 31, 2023, 1:29 a.m. UTC | #4

Hi,

在 2023/05/31 9:22, Yu Kuai 写道:
> Hi,
> 
> 在 2023/05/31 9:06, Guoqing Jiang 写道:
>>
>>
>> On 5/29/23 21:34, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
>>> is in progress") fixes that replacement can be set if reshape is
>>> interrupted, which will cause that array can't be assembled.
>>>
>>> There is a similar problem on the other side, if recovery is
>>> interrupted, then reshape can start, which will cause the same problem.
>>>
>>> Fix the problem by not starting to reshape while recovery is still in
>>> progress.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>> Changes in v2:
>>>   - fix some typo in commit message.
>>>
>>>   drivers/md/raid5.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>> index 8686d629e3f2..6615abf54d3f 100644
>>> --- a/drivers/md/raid5.c
>>> +++ b/drivers/md/raid5.c
>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev 
>>> *mddev)
>>>       struct r5conf *conf = mddev->private;
>>>       struct md_rdev *rdev;
>>>       int spares = 0;
>>> +    int i;
>>>       unsigned long flags;
>>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev 
>>> *mddev)
>>>       if (has_failed(conf))
>>>           return -EINVAL;
>>> +    /* raid5 can't handle concurrent reshape and recovery */
>>> +    if (mddev->recovery_cp < MaxSector)
>>> +        return -EBUSY;
>>> +    for (i = 0; i < conf->raid_disks; i++)
>>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
>>> +            return -EBUSY;
>>> +
>>
>> Does it mean reshape and recovery  can happen in parallel without the 
>> change?
>> I really doubt about it given any kind of internal io (resync, reshape 
>> and recovery)
>> is handled by resync thread. And IIUC either md_do_sync or 
>> md_check_recovery
>> should avoid it, no need to do it in personality layer.
>>
> 
> They can't, in this case recovery is interrupted, then recovery can't
> make progress, and md_check_recovery() will start reshape, and after
> reshape is done, recovery will continue, and data will be corrupted
> because raid456 reshape doesn't handle replacement.
> 
> And by the way in raid456 is that if system reboot, this array can't be
> assembled, raid5_run() will fail if reshape and replacement are both
> set.

And someone reported this reboot case, I also add new test for this
case, you can take a look.

Thanks,
Kuai

Guoqing Jiang May 31, 2023, 1:49 a.m. UTC | #5

On 5/31/23 09:22, Yu Kuai wrote:
> Hi,
>
> 在 2023/05/31 9:06, Guoqing Jiang 写道:
>>
>>
>> On 5/29/23 21:34, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
>>> is in progress") fixes that replacement can be set if reshape is
>>> interrupted, which will cause that array can't be assembled.

I just pulled md tree, but can't find the commit id either in md-next or 
md-fixes .
gjiang@pc:~/storage/md> git branch
  master
  md-fixes
* md-next
gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249
error: malformed object name 0aecb06e2249

>>> There is a similar problem on the other side, if recovery is
>>> interrupted, then reshape can start, which will cause the same problem.
>>>
>>> Fix the problem by not starting to reshape while recovery is still in
>>> progress.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>> Changes in v2:
>>>   - fix some typo in commit message.
>>>
>>>   drivers/md/raid5.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>> index 8686d629e3f2..6615abf54d3f 100644
>>> --- a/drivers/md/raid5.c
>>> +++ b/drivers/md/raid5.c
>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev 
>>> *mddev)
>>>       struct r5conf *conf = mddev->private;
>>>       struct md_rdev *rdev;
>>>       int spares = 0;
>>> +    int i;
>>>       unsigned long flags;
>>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev 
>>> *mddev)
>>>       if (has_failed(conf))
>>>           return -EINVAL;
>>> +    /* raid5 can't handle concurrent reshape and recovery */
>>> +    if (mddev->recovery_cp < MaxSector)
>>> +        return -EBUSY;
>>> +    for (i = 0; i < conf->raid_disks; i++)
>>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
>>> +            return -EBUSY;
>>> +
>>
>> Does it mean reshape and recovery  can happen in parallel without the 
>> change?
>> I really doubt about it given any kind of internal io (resync, 
>> reshape and recovery)
>> is handled by resync thread. And IIUC either md_do_sync or 
>> md_check_recovery
>> should avoid it, no need to do it in personality layer.
>>
>
> They can't, in this case recovery is interrupted, then recovery can't
> make progress, and md_check_recovery() will start reshape, and after
> reshape is done, recovery will continue, and data will be corrupted
> because raid456 reshape doesn't handle replacement.

So, do reshape first then recovery, right? I don't see concurrent 
reshape and recovery
happen based on your description, if concurrent reshape and recovery is 
possible
then I believe we really have big trouble.

> And by the way in raid456 is that if system reboot, this array can't be
> assembled, raid5_run() will fail if reshape and replacement are both
> set.

Assemble an array need to read data from sb, I don't know which place 
record replacement,
I probably misunderstand something.

Thanks,
Guoqing

Yu Kuai May 31, 2023, 3:20 a.m. UTC | #6

Hi,

在 2023/05/31 9:49, Guoqing Jiang 写道:
> 
> 
> On 5/31/23 09:22, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/05/31 9:06, Guoqing Jiang 写道:
>>>
>>>
>>> On 5/29/23 21:34, Yu Kuai wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
>>>> is in progress") fixes that replacement can be set if reshape is
>>>> interrupted, which will cause that array can't be assembled.
> 
> I just pulled md tree, but can't find the commit id either in md-next or 
> md-fixes .
> gjiang@pc:~/storage/md> git branch
>   master
>   md-fixes
> * md-next
> gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249
> error: malformed object name 0aecb06e2249
> 
>>>> There is a similar problem on the other side, if recovery is
>>>> interrupted, then reshape can start, which will cause the same problem.
>>>>
>>>> Fix the problem by not starting to reshape while recovery is still in
>>>> progress.
>>>>
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>> Changes in v2:
>>>>   - fix some typo in commit message.
>>>>
>>>>   drivers/md/raid5.c | 8 ++++++++
>>>>   1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>>> index 8686d629e3f2..6615abf54d3f 100644
>>>> --- a/drivers/md/raid5.c
>>>> +++ b/drivers/md/raid5.c
>>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev 
>>>> *mddev)
>>>>       struct r5conf *conf = mddev->private;
>>>>       struct md_rdev *rdev;
>>>>       int spares = 0;
>>>> +    int i;
>>>>       unsigned long flags;
>>>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev 
>>>> *mddev)
>>>>       if (has_failed(conf))
>>>>           return -EINVAL;
>>>> +    /* raid5 can't handle concurrent reshape and recovery */
>>>> +    if (mddev->recovery_cp < MaxSector)
>>>> +        return -EBUSY;
>>>> +    for (i = 0; i < conf->raid_disks; i++)
>>>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
>>>> +            return -EBUSY;
>>>> +
>>>
>>> Does it mean reshape and recovery  can happen in parallel without the 
>>> change?
>>> I really doubt about it given any kind of internal io (resync, 
>>> reshape and recovery)
>>> is handled by resync thread. And IIUC either md_do_sync or 
>>> md_check_recovery
>>> should avoid it, no need to do it in personality layer.
>>>
>>
>> They can't, in this case recovery is interrupted, then recovery can't
>> make progress, and md_check_recovery() will start reshape, and after
>> reshape is done, recovery will continue, and data will be corrupted
>> because raid456 reshape doesn't handle replacement.
> 
> So, do reshape first then recovery, right? I don't see concurrent 
> reshape and recovery
> happen based on your description, if concurrent reshape and recovery is 
> possible
> then I believe we really have big trouble.

Yes, reshape first, and yes they can't concurrent.
> 
>> And by the way in raid456 is that if system reboot, this array can't be
>> assembled, raid5_run() will fail if reshape and replacement are both
>> set.
> 
> Assemble an array need to read data from sb, I don't know which place 
> record replacement,
> I probably misunderstand something.

It's in rdev->flags, if MD_FEATURE_REPLACEMENT is set in rdev
metadata(sb->feature_map), then this rdev will mark Replacement, and
later this rdev will set to mirros[]->replacement in setup_conf().

Thanks,
Kuai

Guoqing Jiang May 31, 2023, 7:33 a.m. UTC | #7

On 5/31/23 11:20, Yu Kuai wrote:
> Hi,
>
> 在 2023/05/31 9:49, Guoqing Jiang 写道:
>>
>>
>> On 5/31/23 09:22, Yu Kuai wrote:
>>> Hi,
>>>
>>> 在 2023/05/31 9:06, Guoqing Jiang 写道:
>>>>
>>>>
>>>> On 5/29/23 21:34, Yu Kuai wrote:
>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>
>>>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
>>>>> is in progress") fixes that replacement can be set if reshape is
>>>>> interrupted, which will cause that array can't be assembled.
>>
>> I just pulled md tree, but can't find the commit id either in md-next 
>> or md-fixes .
>> gjiang@pc:~/storage/md> git branch
>>   master
>>   md-fixes
>> * md-next
>> gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249
>> error: malformed object name 0aecb06e2249
>>
>>>>> There is a similar problem on the other side, if recovery is
>>>>> interrupted, then reshape can start, which will cause the same 
>>>>> problem.
>>>>>
>>>>> Fix the problem by not starting to reshape while recovery is still in
>>>>> progress.
>>>>>
>>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>>> ---
>>>>> Changes in v2:
>>>>>   - fix some typo in commit message.
>>>>>
>>>>>   drivers/md/raid5.c | 8 ++++++++
>>>>>   1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>>>> index 8686d629e3f2..6615abf54d3f 100644
>>>>> --- a/drivers/md/raid5.c
>>>>> +++ b/drivers/md/raid5.c
>>>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev 
>>>>> *mddev)
>>>>>       struct r5conf *conf = mddev->private;
>>>>>       struct md_rdev *rdev;
>>>>>       int spares = 0;
>>>>> +    int i;
>>>>>       unsigned long flags;
>>>>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev 
>>>>> *mddev)
>>>>>       if (has_failed(conf))
>>>>>           return -EINVAL;
>>>>> +    /* raid5 can't handle concurrent reshape and recovery */
>>>>> +    if (mddev->recovery_cp < MaxSector)
>>>>> +        return -EBUSY;
>>>>> +    for (i = 0; i < conf->raid_disks; i++)
>>>>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
>>>>> +            return -EBUSY;
>>>>> +
>>>>
>>>> Does it mean reshape and recovery  can happen in parallel without 
>>>> the change?
>>>> I really doubt about it given any kind of internal io (resync, 
>>>> reshape and recovery)
>>>> is handled by resync thread. And IIUC either md_do_sync or 
>>>> md_check_recovery
>>>> should avoid it, no need to do it in personality layer.
>>>>
>>>
>>> They can't, in this case recovery is interrupted, then recovery can't
>>> make progress, and md_check_recovery() will start reshape, and after
>>> reshape is done, recovery will continue, and data will be corrupted
>>> because raid456 reshape doesn't handle replacement.
>>
>> So, do reshape first then recovery, right? I don't see concurrent 
>> reshape and recovery
>> happen based on your description, if concurrent reshape and recovery 
>> is possible
>> then I believe we really have big trouble.
>
> Yes, reshape first, and yes they can't concurrent.

Then the subject, commit message and above comment are confusing given 
they can't happen
even without your change.

>
>>
>>> And by the way in raid456 is that if system reboot, this array can't be
>>> assembled, raid5_run() will fail if reshape and replacement are both
>>> set.
>>
>> Assemble an array need to read data from sb, I don't know which place 
>> record replacement,
>> I probably misunderstand something.
>
> It's in rdev->flags, if MD_FEATURE_REPLACEMENT is set in rdev
> metadata(sb->feature_map), then this rdev will mark Replacement, and
> later this rdev will set to mirros[]->replacement in setup_conf().

Yes, I missed that.

Thanks,
Guoqing

Song Liu June 5, 2023, 11:59 p.m. UTC | #8

On Wed, May 31, 2023 at 12:34 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 5/31/23 11:20, Yu Kuai wrote:
> > Hi,
> >
> > 在 2023/05/31 9:49, Guoqing Jiang 写道:
> >>
> >>
> >> On 5/31/23 09:22, Yu Kuai wrote:
> >>> Hi,
> >>>
> >>> 在 2023/05/31 9:06, Guoqing Jiang 写道:
> >>>>
> >>>>
> >>>> On 5/29/23 21:34, Yu Kuai wrote:
> >>>>> From: Yu Kuai <yukuai3@huawei.com>
> >>>>>
> >>>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape
> >>>>> is in progress") fixes that replacement can be set if reshape is
> >>>>> interrupted, which will cause that array can't be assembled.
> >>
> >> I just pulled md tree, but can't find the commit id either in md-next
> >> or md-fixes .
> >> gjiang@pc:~/storage/md> git branch
> >>   master
> >>   md-fixes
> >> * md-next
> >> gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249
> >> error: malformed object name 0aecb06e2249
> >>
> >>>>> There is a similar problem on the other side, if recovery is
> >>>>> interrupted, then reshape can start, which will cause the same
> >>>>> problem.
> >>>>>
> >>>>> Fix the problem by not starting to reshape while recovery is still in
> >>>>> progress.
> >>>>>
> >>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >>>>> ---
> >>>>> Changes in v2:
> >>>>>   - fix some typo in commit message.
> >>>>>
> >>>>>   drivers/md/raid5.c | 8 ++++++++
> >>>>>   1 file changed, 8 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >>>>> index 8686d629e3f2..6615abf54d3f 100644
> >>>>> --- a/drivers/md/raid5.c
> >>>>> +++ b/drivers/md/raid5.c
> >>>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev
> >>>>> *mddev)
> >>>>>       struct r5conf *conf = mddev->private;
> >>>>>       struct md_rdev *rdev;
> >>>>>       int spares = 0;
> >>>>> +    int i;
> >>>>>       unsigned long flags;
> >>>>>       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> >>>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev
> >>>>> *mddev)
> >>>>>       if (has_failed(conf))
> >>>>>           return -EINVAL;
> >>>>> +    /* raid5 can't handle concurrent reshape and recovery */
> >>>>> +    if (mddev->recovery_cp < MaxSector)
> >>>>> +        return -EBUSY;
> >>>>> +    for (i = 0; i < conf->raid_disks; i++)
> >>>>> +        if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
> >>>>> +            return -EBUSY;
> >>>>> +
> >>>>
> >>>> Does it mean reshape and recovery  can happen in parallel without
> >>>> the change?
> >>>> I really doubt about it given any kind of internal io (resync,
> >>>> reshape and recovery)
> >>>> is handled by resync thread. And IIUC either md_do_sync or
> >>>> md_check_recovery
> >>>> should avoid it, no need to do it in personality layer.
> >>>>
> >>>
> >>> They can't, in this case recovery is interrupted, then recovery can't
> >>> make progress, and md_check_recovery() will start reshape, and after
> >>> reshape is done, recovery will continue, and data will be corrupted
> >>> because raid456 reshape doesn't handle replacement.
> >>
> >> So, do reshape first then recovery, right? I don't see concurrent
> >> reshape and recovery
> >> happen based on your description, if concurrent reshape and recovery
> >> is possible
> >> then I believe we really have big trouble.
> >
> > Yes, reshape first, and yes they can't concurrent.
>
> Then the subject, commit message and above comment are confusing given
> they can't happen
> even without your change.
>

I updated the commit message as:

    md/raid5: don't start reshape when recovery or replace is in progress

    When recovery is interrupted (reboot, etc.) check for MD_RECOVERY_RUNNING
    is not enough to tell recovery is in progress. Also check recovery_cp
   before starting reshape.

Please let me know if this doesn't make sense.

Thanks,
Song

diff mbox series

Patch

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8686d629e3f2..6615abf54d3f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8525,6 +8525,7 @@  static int raid5_start_reshape(struct mddev *mddev)
 	struct r5conf *conf = mddev->private;
 	struct md_rdev *rdev;
 	int spares = 0;
+	int i;
 	unsigned long flags;
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
@@ -8536,6 +8537,13 @@  static int raid5_start_reshape(struct mddev *mddev)
 	if (has_failed(conf))
 		return -EINVAL;
 
+	/* raid5 can't handle concurrent reshape and recovery */
+	if (mddev->recovery_cp < MaxSector)
+		return -EBUSY;
+	for (i = 0; i < conf->raid_disks; i++)
+		if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
+			return -EBUSY;
+
 	rdev_for_each(rdev, mddev) {
 		if (!test_bit(In_sync, &rdev->flags)
 		    && !test_bit(Faulty, &rdev->flags))