Message ID | 20230529133410.2125914-1-yukuai1@huaweicloud.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1542060vqr; Mon, 29 May 2023 07:09:12 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4IccphnecezuOVkKEgXQi/NrHsx4w6VjwDccbmaf9dtnHJ02CxV4OslsSwbGWl6hPgLse/ X-Received: by 2002:a05:6a00:190e:b0:63b:8afe:a4dc with SMTP id y14-20020a056a00190e00b0063b8afea4dcmr13430411pfi.30.1685369352549; Mon, 29 May 2023 07:09:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685369352; cv=none; d=google.com; s=arc-20160816; b=hOumqHolJEVocLWNW7LUgxZcAuokhhbJ8tqPMwkt6CUnS9UzKGHQrHn7obURH9p/WH 14N9DkV4hR6CXFQdOQVJftzaYOHnnsQLi9KyZOhHFFBFiB2tGZctPVJOyfDrZORr+4nm gxPbYT/hzLA3zQGQLdtG8BhiKUSsK5Hu+BWD0wEpUKoNwFd8ml1FoySxVpW99SQsyvJr 4frRAeJP6m5hhDKEIn8Ad+FWBVQT3t0Izjdf1DDDb8up+3vZMx90POWqEt/uVvJQ41ge DiNlqBbIxNFWQwdghOvqb9ZDc28hnGKtJ6HSXOjhEDjRQZ3GArfoCfIr+LxtvnQttsAg 9kiw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=mRNIJmxsQEgbIhDI69eR/504bG3+f1kBLaz9aoVio7Q=; b=BCu5ucl162xpUwl9iophHZwfq1EpEfzee4XiSSq/gXTTRbjMimm478EXAnUb8rkh1k zm6rBIBd6DFjziRwniw/YpXe64I0arUwDYSntS/lLw/FmlgkTpJTesR58RYwpv0WbQk/ k+cSOW3D+VMv86NRpcRs0DhAWlW+stL+Rc0jFc73JDwUU9ZGVw8ho/zdSYMFDT3al+Sq FXrsCvoluXZnhTY/AihYIoJYW4Dg8n7kzscp05Agp5BopoIThcEEgBXI1e4JE0U4eAa7 HZEWSdFwS8lzHObtaxZfktcc3dGRbqkHdzLFLStJ7r3a3cC/Z9FND4t+96QtBegkSEA2 Bc9A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s15-20020aa78bcf000000b0063d5cfd0867si9356930pfd.351.2023.05.29.07.08.58; Mon, 29 May 2023 07:09:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229871AbjE2Nhw (ORCPT <rfc822;callmefire3@gmail.com> + 99 others); Mon, 29 May 2023 09:37:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229536AbjE2Nhv (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 29 May 2023 09:37:51 -0400 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D802C94; Mon, 29 May 2023 06:37:49 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4QVGmn2zSLz4f3jY8; Mon, 29 May 2023 21:37:45 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgD3X7OqqnRk+gDpKQ--.35039S4; Mon, 29 May 2023 21:37:47 +0800 (CST) From: Yu Kuai <yukuai1@huaweicloud.com> To: song@kernel.org, pmenzel@molgen.mpg.de Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH v2] md/raid5: don't allow concurrent reshape with recovery Date: Mon, 29 May 2023 21:34:10 +0800 Message-Id: <20230529133410.2125914-1-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: gCh0CgD3X7OqqnRk+gDpKQ--.35039S4 X-Coremail-Antispam: 1UD129KBjvJXoW7Cry3uFykXrykGr4rtF4DCFg_yoW8XF1fpa 93KFs8ur4UZw1akF4DA34DCFyY9FWDtrWrtFy3X34Fy3ZIqrWxuayrGFW5KryUJFWSqw4Y vw45JryDCry2kaDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUyK14x267AKxVW8JVW5JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK02 1l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4U JVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Gc CE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E 2Ix0cI8IcVAFwI0_Jrv_JF1lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJV W8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1l42xK82IYc2Ij64vI r41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8Gjc xK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r1q6r43MIIYrxkI7VAKI48JMIIF0xvE2Ix0 cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcVCF04 k26cxKx2IYs7xG6r4j6FyUMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIEc7Cj xVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjfUouWlDUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767198046027577124?= X-GMAIL-MSGID: =?utf-8?q?1767237853927086163?= |
Series |
[v2] md/raid5: don't allow concurrent reshape with recovery
|
|
Commit Message
Yu Kuai
May 29, 2023, 1:34 p.m. UTC
From: Yu Kuai <yukuai3@huawei.com> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape is in progress") fixes that replacement can be set if reshape is interrupted, which will cause that array can't be assembled. There is a similar problem on the other side, if recovery is interrupted, then reshape can start, which will cause the same problem. Fix the problem by not starting to reshape while recovery is still in progress. Signed-off-by: Yu Kuai <yukuai3@huawei.com> --- Changes in v2: - fix some typo in commit message. drivers/md/raid5.c | 8 ++++++++ 1 file changed, 8 insertions(+)
Comments
On Mon, May 29, 2023 at 6:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > From: Yu Kuai <yukuai3@huawei.com> > > Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape > is in progress") fixes that replacement can be set if reshape is > interrupted, which will cause that array can't be assembled. > > There is a similar problem on the other side, if recovery is > interrupted, then reshape can start, which will cause the same problem. > > Fix the problem by not starting to reshape while recovery is still in > progress. > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> Applied to md-next. Thanks, Song > --- > Changes in v2: > - fix some typo in commit message. > > drivers/md/raid5.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 8686d629e3f2..6615abf54d3f 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev) > struct r5conf *conf = mddev->private; > struct md_rdev *rdev; > int spares = 0; > + int i; > unsigned long flags; > > if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) > @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev *mddev) > if (has_failed(conf)) > return -EINVAL; > > + /* raid5 can't handle concurrent reshape and recovery */ > + if (mddev->recovery_cp < MaxSector) > + return -EBUSY; > + for (i = 0; i < conf->raid_disks; i++) > + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) > + return -EBUSY; > + > rdev_for_each(rdev, mddev) { > if (!test_bit(In_sync, &rdev->flags) > && !test_bit(Faulty, &rdev->flags)) > -- > 2.39.2 >
On 5/29/23 21:34, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape > is in progress") fixes that replacement can be set if reshape is > interrupted, which will cause that array can't be assembled. > > There is a similar problem on the other side, if recovery is > interrupted, then reshape can start, which will cause the same problem. > > Fix the problem by not starting to reshape while recovery is still in > progress. > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > Changes in v2: > - fix some typo in commit message. > > drivers/md/raid5.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 8686d629e3f2..6615abf54d3f 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev) > struct r5conf *conf = mddev->private; > struct md_rdev *rdev; > int spares = 0; > + int i; > unsigned long flags; > > if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) > @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev *mddev) > if (has_failed(conf)) > return -EINVAL; > > + /* raid5 can't handle concurrent reshape and recovery */ > + if (mddev->recovery_cp < MaxSector) > + return -EBUSY; > + for (i = 0; i < conf->raid_disks; i++) > + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) > + return -EBUSY; > + Does it mean reshape and recovery can happen in parallel without the change? I really doubt about it given any kind of internal io (resync, reshape and recovery) is handled by resync thread. And IIUC either md_do_sync or md_check_recovery should avoid it, no need to do it in personality layer. Thanks, Guoqing
Hi, 在 2023/05/31 9:06, Guoqing Jiang 写道: > > > On 5/29/23 21:34, Yu Kuai wrote: >> From: Yu Kuai <yukuai3@huawei.com> >> >> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape >> is in progress") fixes that replacement can be set if reshape is >> interrupted, which will cause that array can't be assembled. >> >> There is a similar problem on the other side, if recovery is >> interrupted, then reshape can start, which will cause the same problem. >> >> Fix the problem by not starting to reshape while recovery is still in >> progress. >> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- >> Changes in v2: >> - fix some typo in commit message. >> >> drivers/md/raid5.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c >> index 8686d629e3f2..6615abf54d3f 100644 >> --- a/drivers/md/raid5.c >> +++ b/drivers/md/raid5.c >> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev) >> struct r5conf *conf = mddev->private; >> struct md_rdev *rdev; >> int spares = 0; >> + int i; >> unsigned long flags; >> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) >> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev >> *mddev) >> if (has_failed(conf)) >> return -EINVAL; >> + /* raid5 can't handle concurrent reshape and recovery */ >> + if (mddev->recovery_cp < MaxSector) >> + return -EBUSY; >> + for (i = 0; i < conf->raid_disks; i++) >> + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) >> + return -EBUSY; >> + > > Does it mean reshape and recovery can happen in parallel without the > change? > I really doubt about it given any kind of internal io (resync, reshape > and recovery) > is handled by resync thread. And IIUC either md_do_sync or > md_check_recovery > should avoid it, no need to do it in personality layer. > They can't, in this case recovery is interrupted, then recovery can't make progress, and md_check_recovery() will start reshape, and after reshape is done, recovery will continue, and data will be corrupted because raid456 reshape doesn't handle replacement. And by the way in raid456 is that if system reboot, this array can't be assembled, raid5_run() will fail if reshape and replacement are both set. Thanks, Kuai
Hi, 在 2023/05/31 9:22, Yu Kuai 写道: > Hi, > > 在 2023/05/31 9:06, Guoqing Jiang 写道: >> >> >> On 5/29/23 21:34, Yu Kuai wrote: >>> From: Yu Kuai <yukuai3@huawei.com> >>> >>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape >>> is in progress") fixes that replacement can be set if reshape is >>> interrupted, which will cause that array can't be assembled. >>> >>> There is a similar problem on the other side, if recovery is >>> interrupted, then reshape can start, which will cause the same problem. >>> >>> Fix the problem by not starting to reshape while recovery is still in >>> progress. >>> >>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>> --- >>> Changes in v2: >>> - fix some typo in commit message. >>> >>> drivers/md/raid5.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c >>> index 8686d629e3f2..6615abf54d3f 100644 >>> --- a/drivers/md/raid5.c >>> +++ b/drivers/md/raid5.c >>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev >>> *mddev) >>> struct r5conf *conf = mddev->private; >>> struct md_rdev *rdev; >>> int spares = 0; >>> + int i; >>> unsigned long flags; >>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) >>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev >>> *mddev) >>> if (has_failed(conf)) >>> return -EINVAL; >>> + /* raid5 can't handle concurrent reshape and recovery */ >>> + if (mddev->recovery_cp < MaxSector) >>> + return -EBUSY; >>> + for (i = 0; i < conf->raid_disks; i++) >>> + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) >>> + return -EBUSY; >>> + >> >> Does it mean reshape and recovery can happen in parallel without the >> change? >> I really doubt about it given any kind of internal io (resync, reshape >> and recovery) >> is handled by resync thread. And IIUC either md_do_sync or >> md_check_recovery >> should avoid it, no need to do it in personality layer. >> > > They can't, in this case recovery is interrupted, then recovery can't > make progress, and md_check_recovery() will start reshape, and after > reshape is done, recovery will continue, and data will be corrupted > because raid456 reshape doesn't handle replacement. > > And by the way in raid456 is that if system reboot, this array can't be > assembled, raid5_run() will fail if reshape and replacement are both > set. And someone reported this reboot case, I also add new test for this case, you can take a look. Thanks, Kuai
On 5/31/23 09:22, Yu Kuai wrote: > Hi, > > 在 2023/05/31 9:06, Guoqing Jiang 写道: >> >> >> On 5/29/23 21:34, Yu Kuai wrote: >>> From: Yu Kuai <yukuai3@huawei.com> >>> >>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape >>> is in progress") fixes that replacement can be set if reshape is >>> interrupted, which will cause that array can't be assembled. I just pulled md tree, but can't find the commit id either in md-next or md-fixes . gjiang@pc:~/storage/md> git branch master md-fixes * md-next gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249 error: malformed object name 0aecb06e2249 >>> There is a similar problem on the other side, if recovery is >>> interrupted, then reshape can start, which will cause the same problem. >>> >>> Fix the problem by not starting to reshape while recovery is still in >>> progress. >>> >>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>> --- >>> Changes in v2: >>> - fix some typo in commit message. >>> >>> drivers/md/raid5.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c >>> index 8686d629e3f2..6615abf54d3f 100644 >>> --- a/drivers/md/raid5.c >>> +++ b/drivers/md/raid5.c >>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev >>> *mddev) >>> struct r5conf *conf = mddev->private; >>> struct md_rdev *rdev; >>> int spares = 0; >>> + int i; >>> unsigned long flags; >>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) >>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev >>> *mddev) >>> if (has_failed(conf)) >>> return -EINVAL; >>> + /* raid5 can't handle concurrent reshape and recovery */ >>> + if (mddev->recovery_cp < MaxSector) >>> + return -EBUSY; >>> + for (i = 0; i < conf->raid_disks; i++) >>> + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) >>> + return -EBUSY; >>> + >> >> Does it mean reshape and recovery can happen in parallel without the >> change? >> I really doubt about it given any kind of internal io (resync, >> reshape and recovery) >> is handled by resync thread. And IIUC either md_do_sync or >> md_check_recovery >> should avoid it, no need to do it in personality layer. >> > > They can't, in this case recovery is interrupted, then recovery can't > make progress, and md_check_recovery() will start reshape, and after > reshape is done, recovery will continue, and data will be corrupted > because raid456 reshape doesn't handle replacement. So, do reshape first then recovery, right? I don't see concurrent reshape and recovery happen based on your description, if concurrent reshape and recovery is possible then I believe we really have big trouble. > And by the way in raid456 is that if system reboot, this array can't be > assembled, raid5_run() will fail if reshape and replacement are both > set. Assemble an array need to read data from sb, I don't know which place record replacement, I probably misunderstand something. Thanks, Guoqing
Hi, 在 2023/05/31 9:49, Guoqing Jiang 写道: > > > On 5/31/23 09:22, Yu Kuai wrote: >> Hi, >> >> 在 2023/05/31 9:06, Guoqing Jiang 写道: >>> >>> >>> On 5/29/23 21:34, Yu Kuai wrote: >>>> From: Yu Kuai <yukuai3@huawei.com> >>>> >>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape >>>> is in progress") fixes that replacement can be set if reshape is >>>> interrupted, which will cause that array can't be assembled. > > I just pulled md tree, but can't find the commit id either in md-next or > md-fixes . > gjiang@pc:~/storage/md> git branch > master > md-fixes > * md-next > gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249 > error: malformed object name 0aecb06e2249 > >>>> There is a similar problem on the other side, if recovery is >>>> interrupted, then reshape can start, which will cause the same problem. >>>> >>>> Fix the problem by not starting to reshape while recovery is still in >>>> progress. >>>> >>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>>> --- >>>> Changes in v2: >>>> - fix some typo in commit message. >>>> >>>> drivers/md/raid5.c | 8 ++++++++ >>>> 1 file changed, 8 insertions(+) >>>> >>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c >>>> index 8686d629e3f2..6615abf54d3f 100644 >>>> --- a/drivers/md/raid5.c >>>> +++ b/drivers/md/raid5.c >>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev >>>> *mddev) >>>> struct r5conf *conf = mddev->private; >>>> struct md_rdev *rdev; >>>> int spares = 0; >>>> + int i; >>>> unsigned long flags; >>>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) >>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev >>>> *mddev) >>>> if (has_failed(conf)) >>>> return -EINVAL; >>>> + /* raid5 can't handle concurrent reshape and recovery */ >>>> + if (mddev->recovery_cp < MaxSector) >>>> + return -EBUSY; >>>> + for (i = 0; i < conf->raid_disks; i++) >>>> + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) >>>> + return -EBUSY; >>>> + >>> >>> Does it mean reshape and recovery can happen in parallel without the >>> change? >>> I really doubt about it given any kind of internal io (resync, >>> reshape and recovery) >>> is handled by resync thread. And IIUC either md_do_sync or >>> md_check_recovery >>> should avoid it, no need to do it in personality layer. >>> >> >> They can't, in this case recovery is interrupted, then recovery can't >> make progress, and md_check_recovery() will start reshape, and after >> reshape is done, recovery will continue, and data will be corrupted >> because raid456 reshape doesn't handle replacement. > > So, do reshape first then recovery, right? I don't see concurrent > reshape and recovery > happen based on your description, if concurrent reshape and recovery is > possible > then I believe we really have big trouble. Yes, reshape first, and yes they can't concurrent. > >> And by the way in raid456 is that if system reboot, this array can't be >> assembled, raid5_run() will fail if reshape and replacement are both >> set. > > Assemble an array need to read data from sb, I don't know which place > record replacement, > I probably misunderstand something. It's in rdev->flags, if MD_FEATURE_REPLACEMENT is set in rdev metadata(sb->feature_map), then this rdev will mark Replacement, and later this rdev will set to mirros[]->replacement in setup_conf(). Thanks, Kuai
On 5/31/23 11:20, Yu Kuai wrote: > Hi, > > 在 2023/05/31 9:49, Guoqing Jiang 写道: >> >> >> On 5/31/23 09:22, Yu Kuai wrote: >>> Hi, >>> >>> 在 2023/05/31 9:06, Guoqing Jiang 写道: >>>> >>>> >>>> On 5/29/23 21:34, Yu Kuai wrote: >>>>> From: Yu Kuai <yukuai3@huawei.com> >>>>> >>>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape >>>>> is in progress") fixes that replacement can be set if reshape is >>>>> interrupted, which will cause that array can't be assembled. >> >> I just pulled md tree, but can't find the commit id either in md-next >> or md-fixes . >> gjiang@pc:~/storage/md> git branch >> master >> md-fixes >> * md-next >> gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249 >> error: malformed object name 0aecb06e2249 >> >>>>> There is a similar problem on the other side, if recovery is >>>>> interrupted, then reshape can start, which will cause the same >>>>> problem. >>>>> >>>>> Fix the problem by not starting to reshape while recovery is still in >>>>> progress. >>>>> >>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>>>> --- >>>>> Changes in v2: >>>>> - fix some typo in commit message. >>>>> >>>>> drivers/md/raid5.c | 8 ++++++++ >>>>> 1 file changed, 8 insertions(+) >>>>> >>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c >>>>> index 8686d629e3f2..6615abf54d3f 100644 >>>>> --- a/drivers/md/raid5.c >>>>> +++ b/drivers/md/raid5.c >>>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev >>>>> *mddev) >>>>> struct r5conf *conf = mddev->private; >>>>> struct md_rdev *rdev; >>>>> int spares = 0; >>>>> + int i; >>>>> unsigned long flags; >>>>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) >>>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev >>>>> *mddev) >>>>> if (has_failed(conf)) >>>>> return -EINVAL; >>>>> + /* raid5 can't handle concurrent reshape and recovery */ >>>>> + if (mddev->recovery_cp < MaxSector) >>>>> + return -EBUSY; >>>>> + for (i = 0; i < conf->raid_disks; i++) >>>>> + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) >>>>> + return -EBUSY; >>>>> + >>>> >>>> Does it mean reshape and recovery can happen in parallel without >>>> the change? >>>> I really doubt about it given any kind of internal io (resync, >>>> reshape and recovery) >>>> is handled by resync thread. And IIUC either md_do_sync or >>>> md_check_recovery >>>> should avoid it, no need to do it in personality layer. >>>> >>> >>> They can't, in this case recovery is interrupted, then recovery can't >>> make progress, and md_check_recovery() will start reshape, and after >>> reshape is done, recovery will continue, and data will be corrupted >>> because raid456 reshape doesn't handle replacement. >> >> So, do reshape first then recovery, right? I don't see concurrent >> reshape and recovery >> happen based on your description, if concurrent reshape and recovery >> is possible >> then I believe we really have big trouble. > > Yes, reshape first, and yes they can't concurrent. Then the subject, commit message and above comment are confusing given they can't happen even without your change. > >> >>> And by the way in raid456 is that if system reboot, this array can't be >>> assembled, raid5_run() will fail if reshape and replacement are both >>> set. >> >> Assemble an array need to read data from sb, I don't know which place >> record replacement, >> I probably misunderstand something. > > It's in rdev->flags, if MD_FEATURE_REPLACEMENT is set in rdev > metadata(sb->feature_map), then this rdev will mark Replacement, and > later this rdev will set to mirros[]->replacement in setup_conf(). Yes, I missed that. Thanks, Guoqing
On Wed, May 31, 2023 at 12:34 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > > > > On 5/31/23 11:20, Yu Kuai wrote: > > Hi, > > > > 在 2023/05/31 9:49, Guoqing Jiang 写道: > >> > >> > >> On 5/31/23 09:22, Yu Kuai wrote: > >>> Hi, > >>> > >>> 在 2023/05/31 9:06, Guoqing Jiang 写道: > >>>> > >>>> > >>>> On 5/29/23 21:34, Yu Kuai wrote: > >>>>> From: Yu Kuai <yukuai3@huawei.com> > >>>>> > >>>>> Commit 0aecb06e2249 ("md/raid5: don't allow replacement while reshape > >>>>> is in progress") fixes that replacement can be set if reshape is > >>>>> interrupted, which will cause that array can't be assembled. > >> > >> I just pulled md tree, but can't find the commit id either in md-next > >> or md-fixes . > >> gjiang@pc:~/storage/md> git branch > >> master > >> md-fixes > >> * md-next > >> gjiang@pc:~/storage/md> git branch --contain 0aecb06e2249 > >> error: malformed object name 0aecb06e2249 > >> > >>>>> There is a similar problem on the other side, if recovery is > >>>>> interrupted, then reshape can start, which will cause the same > >>>>> problem. > >>>>> > >>>>> Fix the problem by not starting to reshape while recovery is still in > >>>>> progress. > >>>>> > >>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> > >>>>> --- > >>>>> Changes in v2: > >>>>> - fix some typo in commit message. > >>>>> > >>>>> drivers/md/raid5.c | 8 ++++++++ > >>>>> 1 file changed, 8 insertions(+) > >>>>> > >>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>> index 8686d629e3f2..6615abf54d3f 100644 > >>>>> --- a/drivers/md/raid5.c > >>>>> +++ b/drivers/md/raid5.c > >>>>> @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev > >>>>> *mddev) > >>>>> struct r5conf *conf = mddev->private; > >>>>> struct md_rdev *rdev; > >>>>> int spares = 0; > >>>>> + int i; > >>>>> unsigned long flags; > >>>>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) > >>>>> @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev > >>>>> *mddev) > >>>>> if (has_failed(conf)) > >>>>> return -EINVAL; > >>>>> + /* raid5 can't handle concurrent reshape and recovery */ > >>>>> + if (mddev->recovery_cp < MaxSector) > >>>>> + return -EBUSY; > >>>>> + for (i = 0; i < conf->raid_disks; i++) > >>>>> + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) > >>>>> + return -EBUSY; > >>>>> + > >>>> > >>>> Does it mean reshape and recovery can happen in parallel without > >>>> the change? > >>>> I really doubt about it given any kind of internal io (resync, > >>>> reshape and recovery) > >>>> is handled by resync thread. And IIUC either md_do_sync or > >>>> md_check_recovery > >>>> should avoid it, no need to do it in personality layer. > >>>> > >>> > >>> They can't, in this case recovery is interrupted, then recovery can't > >>> make progress, and md_check_recovery() will start reshape, and after > >>> reshape is done, recovery will continue, and data will be corrupted > >>> because raid456 reshape doesn't handle replacement. > >> > >> So, do reshape first then recovery, right? I don't see concurrent > >> reshape and recovery > >> happen based on your description, if concurrent reshape and recovery > >> is possible > >> then I believe we really have big trouble. > > > > Yes, reshape first, and yes they can't concurrent. > > Then the subject, commit message and above comment are confusing given > they can't happen > even without your change. > I updated the commit message as: md/raid5: don't start reshape when recovery or replace is in progress When recovery is interrupted (reboot, etc.) check for MD_RECOVERY_RUNNING is not enough to tell recovery is in progress. Also check recovery_cp before starting reshape. Please let me know if this doesn't make sense. Thanks, Song
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 8686d629e3f2..6615abf54d3f 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -8525,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev) struct r5conf *conf = mddev->private; struct md_rdev *rdev; int spares = 0; + int i; unsigned long flags; if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) @@ -8536,6 +8537,13 @@ static int raid5_start_reshape(struct mddev *mddev) if (has_failed(conf)) return -EINVAL; + /* raid5 can't handle concurrent reshape and recovery */ + if (mddev->recovery_cp < MaxSector) + return -EBUSY; + for (i = 0; i < conf->raid_disks; i++) + if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) + return -EBUSY; + rdev_for_each(rdev, mddev) { if (!test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags))