Message ID | 20240201092559.910982-10-yukuai1@huaweicloud.com |
---|---|
State | New |
Headers |
From: Yu Kuai <yukuai1@huaweicloud.com>
To: mpatocka@redhat.com, heinzm@redhat.com, xni@redhat.com, blazej.kucman@linux.intel.com, agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev, song@kernel.org, yukuai3@huawei.com, jbrassow@f14.redhat.com, neilb@suse.de, shli@fb.com, akpm@osdl.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH v5 09/14] dm-raid: really frozen sync_thread during suspend
Date: Thu, 1 Feb 2024 17:25:54 +0800
Message-Id: <20240201092559.910982-10-yukuai1@huaweicloud.com>
In-Reply-To: <20240201092559.910982-1-yukuai1@huaweicloud.com>
References: <20240201092559.910982-1-yukuai1@huaweicloud.com>
List-Id: <linux-kernel.vger.kernel.org>
Series | [v5,01/14] md: don't ignore suspended array in md_check_recovery() |
Commit Message
Yu Kuai
Feb. 1, 2024, 9:25 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

1) The flag MD_RECOVERY_FROZEN doesn't mean that the sync thread is frozen; it only prevents a new sync_thread from starting, and it can't stop a running sync thread.

2) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, so using it as the condition for md_stop_writes() in raid_postsuspend() doesn't look correct.

3) raid_message() can set/clear the flag MD_RECOVERY_FROZEN at any time, and if MD_RECOVERY_FROZEN is cleared while the array is suspended, a new sync_thread can start unexpectedly.

Fix the above problems by using the new helper to freeze the sync_thread during suspend, and also disallow raid_message() from changing sync_thread status during suspend.

Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the test shell/lvconvert-raid-reshape.sh started to hang in stop_sync_thread(). With the previous fixes the test no longer hangs there; however, it still fails and complains that ext4 is corrupted. With this patch, the test neither hangs in stop_sync_thread() nor fails with ext4 corruption. However, there is still a deadlock related to dm-raid456 that will be fixed in following patches.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
Fixes: 1af2048a3e87 ("dm raid: fix deadlock caused by premature md_stop_writes()")
Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-raid.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)
Comments
Hi Kuai

On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> 1) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
> it only prevent new sync_thread to start, and it can't stop the
> running sync thread;

Agree with this.

> 2) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
> it as condition for md_stop_writes() in raid_postsuspend() doesn't
> look correct.

I don't agree with this. __md_stop_writes() stops the sync thread, so it needs to check this flag. It also looks like the name __md_stop_writes is not right: does it really stop write I/O? mddev_suspend() should be the function that stops write requests. From my understanding, raid_postsuspend() does two jobs: one is stopping the sync thread, the other is suspending the array.

> 3) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
> and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
> new sync_thread can start unexpected.

md_action_store() doesn't check this either. If the array is suspended and MD_RECOVERY_FROZEN is cleared, before patch 01 a sync thread couldn't start. So it looks like patch 01 breaks that logic.

Regards
Xiao

> Fix above problems by using the new helper to suspend the array during
> suspend, also disallow raid_message() to change sync_thread status
> during suspend.
>
> Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the
> test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
> and with previous fixes, the test won't hang there anymore, however, the
> test will still fail and complain that ext4 is corrupted. And with this
> patch, the test won't hang due to stop_sync_thread() or fail due to ext4
> is corrupted anymore. However, there is still a deadlock related to
> dm-raid456 that will be fixed in following patches.
>
> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
> Fixes: 1af2048a3e87 ("dm raid: fix deadlock caused by premature md_stop_writes()")
> Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target")
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/dm-raid.c | 38 +++++++++++++++++++++++++++++---------
>  1 file changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index eb009d6bb03a..5ce3c6020b1b 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3240,11 +3240,12 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	rs->md.ro = 1;
>  	rs->md.in_sync = 1;
>
> -	/* Keep array frozen until resume. */
> -	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
> -
>  	/* Has to be held on running the array */
>  	mddev_suspend_and_lock_nointr(&rs->md);
> +
> +	/* Keep array frozen until resume. */
> +	md_frozen_sync_thread(&rs->md);
> +
>  	r = md_run(&rs->md);
>  	rs->md.in_sync = 0; /* Assume already marked dirty */
>  	if (r) {
> @@ -3722,6 +3723,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>  	if (!mddev->pers || !mddev->pers->sync_request)
>  		return -EINVAL;
>
> +	if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
> +		return -EBUSY;
> +
>  	if (!strcasecmp(argv[0], "frozen"))
>  		set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>  	else
> @@ -3791,15 +3795,31 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  	blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
>  }
>
> +static void raid_presuspend(struct dm_target *ti)
> +{
> +	struct raid_set *rs = ti->private;
> +
> +	mddev_lock_nointr(&rs->md);
> +	md_frozen_sync_thread(&rs->md);
> +	mddev_unlock(&rs->md);
> +}
> +
> +static void raid_presuspend_undo(struct dm_target *ti)
> +{
> +	struct raid_set *rs = ti->private;
> +
> +	mddev_lock_nointr(&rs->md);
> +	md_unfrozen_sync_thread(&rs->md);
> +	mddev_unlock(&rs->md);
> +}
> +
>  static void raid_postsuspend(struct dm_target *ti)
>  {
>  	struct raid_set *rs = ti->private;
>
>  	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>  		/* Writes have to be stopped before suspending to avoid deadlocks. */
> -		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> -			md_stop_writes(&rs->md);
> -
> +		md_stop_writes(&rs->md);
>  		mddev_suspend(&rs->md, false);
>  	}
>  }
> @@ -4012,8 +4032,6 @@ static int raid_preresume(struct dm_target *ti)
>  	}
>
>  	/* Check for any resize/reshape on @rs and adjust/initiate */
> -	/* Be prepared for mddev_resume() in raid_resume() */
> -	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>  	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
>  		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
>  		mddev->resync_min = mddev->recovery_cp;
> @@ -4056,9 +4074,9 @@ static void raid_resume(struct dm_target *ti)
>  		rs_set_capacity(rs);
>
>  		mddev_lock_nointr(mddev);
> -		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>  		mddev->ro = 0;
>  		mddev->in_sync = 0;
> +		md_unfrozen_sync_thread(mddev);
>  		mddev_unlock_and_resume(mddev);
>  	}
>  }
> @@ -4074,6 +4092,8 @@ static struct target_type raid_target = {
>  	.message = raid_message,
>  	.iterate_devices = raid_iterate_devices,
>  	.io_hints = raid_io_hints,
> +	.presuspend = raid_presuspend,
> +	.presuspend_undo = raid_presuspend_undo,
>  	.postsuspend = raid_postsuspend,
>  	.preresume = raid_preresume,
>  	.resume = raid_resume,
> --
> 2.39.2
Hi,

On 2024/02/18 12:53, Xiao Ni wrote:
> Hi Kuai
>
> On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> 1) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
>> it only prevent new sync_thread to start, and it can't stop the
>> running sync thread;
>
> Agree with this
>
>> 2) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
>> it as condition for md_stop_writes() in raid_postsuspend() doesn't
>> look correct.
>
> I don't agree with it. __md_stop_writes stops sync thread, so it needs
> to check this flag. And It looks like the name __md_stop_writes is not
> right. Does it really stop write io? mddev_suspend should be the
> function that stop write request. From my understanding,
> raid_postsuspend does two jobs. One is stopping sync thread. Two is
> suspending array.

MD_RECOVERY_FROZEN is not just used in __md_stop_writes(), so I think it's not correct to check it here. For example, if MD_RECOVERY_FROZEN is set by raid_message(), then __md_stop_writes() will be skipped.

>> 3) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
>> and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
>> new sync_thread can start unexpected.
>
> md_action_store doesn't check this either. If the array is suspended
> and MD_RECOVERY_FROZEN is cleared, before patch01, sync thread can't
> happen. So it looks like patch01 breaks the logic.

The difference is that md/raid doesn't need to freeze the sync_thread while suspending the array for now. And I don't understand at all why a sync thread can't happen before patch 01.

Thanks,
Kuai

> [...]
On Sun, Feb 18, 2024 at 2:34 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/02/18 12:53, Xiao Ni wrote:
> [...]
>> I don't agree with it. __md_stop_writes stops sync thread, so it needs
>> to check this flag.
>
> MD_RECOVERY_FROZEN is not just used in __md_stop_writes(), so I think
> it's not correct to check this. For example, if MD_RECOVERY_FROZEN is
> set by raid_message(), then __md_stop_writes() will be skipped.

Hi Kuai

raid_message() sets MD_RECOVERY_FROZEN and stops the sync thread synchronously, so it doesn't need __md_stop_writes(). So md and dm-raid share a rule: if you set MD_RECOVERY_FROZEN, you're in the process of stopping the sync thread.

>> md_action_store doesn't check this either. If the array is suspended
>> and MD_RECOVERY_FROZEN is cleared, before patch01, sync thread can't
>> happen. So it looks like patch01 breaks the logic.
>
> The difference is that md/raid doesn't need to freeze the sync_thread while
> suspending the array for now. And I don't understand at all why a sync
> thread can't happen before patch01.

There is a condition you mentioned above -- the array is suspended. Before patch 01, if an array is suspended, the sync thread can't happen. Even if raid_message() clears MD_RECOVERY_FROZEN, the sync thread can't start. After resuming the array, the sync thread can start again.

Regards
Xiao

> [...]
Hi,

On 2024/02/19 15:27, Xiao Ni wrote:
> [...]
> raid_message sets MD_RECOVERY_FROZEN and it stops the sync thread
> synchronously. So it doesn't need __md_stop_writes. So from md and
> dmraid, it has a rule. If you set MD_RECOVERY_FROZEN, you're in the
> process of stopping sync thread.

There are quite a few problems here; I'm not sure you really walked through all the patches:

1) Stopping the sync_thread synchronously is problematic, and raid_message() doesn't even hold 'reconfig_mutex' for md_reap_sync_thread();
2) Skipping __md_stop_writes() because the sync_thread is stopped is wrong; __md_stop_writes() does more work than that.

> There is a condition you mentioned above -- the array is suspended.
> Before patch01, if one array is suspended, the sync thread can't
> happen. Even raid_messages clears MD_RECOVERY_FROZEN, the sync thread
> can't start. After resume the array, the sync thread can start again.

3) Before patch 1, the sync_thread can still be running even if the array is suspended. And even without patch 1, raid_message() can still start a new sync_thread:

// assume sync_thread is not registered
raid_postsuspend                raid_message
 md_stop_writes
                                 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery)
                                 if (!mddev->suspended)
                                         md_wakeup_thread
                                 // new sync_thread is registered
 mddev_suspend

4) I think I don't need to explain again why 'suspended' should not be used to prevent starting a new sync_thread.

Thanks,
Kuai

> [...]
On Mon, Feb 19, 2024 at 3:53 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/02/19 15:27, Xiao Ni wrote:
> > On Sun, Feb 18, 2024 at 2:34 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2024/02/18 12:53, Xiao Ni wrote:
> >>> Hi Kuai
> >>>
> >>> On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> From: Yu Kuai <yukuai3@huawei.com>
> >>>>
> >>>> 1) The flag MD_RECOVERY_FROZEN doesn't mean that the sync thread is
> >>>> frozen, it only prevents a new sync_thread from starting, and it
> >>>> can't stop the running sync thread;
> >>>
> >>> Agree with this
> >>>
> >>>> 2) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped;
> >>>> using it as the condition for md_stop_writes() in raid_postsuspend()
> >>>> doesn't look correct.
> >>>
> >>> I don't agree with it. __md_stop_writes stops the sync thread, so it
> >>> needs to check this flag. And it looks like the name __md_stop_writes
> >>> is not right. Does it really stop write io? mddev_suspend should be
> >>> the function that stops write requests. From my understanding,
> >>> raid_postsuspend does two jobs. One is stopping the sync thread. Two
> >>> is suspending the array.
> >>
> >> MD_RECOVERY_FROZEN is not just used in __md_stop_writes(), so I think
> >> it's not correct to check this. For example, if MD_RECOVERY_FROZEN is
> >> set by raid_message(), then __md_stop_writes() will be skipped.
> >
> > Hi Kuai
> >
> > raid_message sets MD_RECOVERY_FROZEN and it stops the sync thread
> > synchronously. So it doesn't need __md_stop_writes. So from md and
> > dmraid, there is a rule: if you set MD_RECOVERY_FROZEN, you're in the
> > process of stopping the sync thread.
>
> There are so many problems here, I'm not sure if you really walked
> through all the patches here.

I haven't read all of them. But as you mentioned, the following patches
are based on patch01. They work together. I want to narrow the change to
fix these regression problems. But it depends on Song's decision.
>
> 1) stopping the sync_thread synchronously is problematic, and
> raid_message() doesn't even hold 'reconfig_mutex' for
> md_reap_sync_thread();
> 2) skipping __md_stop_writes() because the sync_thread is stopped is
> wrong, __md_stop_writes() does more work.

Agree with this. We can use the same way as action_store does. But we
can do this later, not in this patch set.

>
> >>
> >>>
> >>>> 3) raid_message can set/clear the flag MD_RECOVERY_FROZEN at any
> >>>> time, and if MD_RECOVERY_FROZEN is cleared while the array is
> >>>> suspended, a new sync_thread can start unexpectedly.
> >>>
> >>> md_action_store doesn't check this either. If the array is suspended
> >>> and MD_RECOVERY_FROZEN is cleared, before patch01, the sync thread
> >>> can't happen. So it looks like patch01 breaks the logic.
> >>
> >> The difference is that md/raid doesn't need to freeze the sync_thread
> >> while suspending the array for now. And I don't understand at all why
> >> the sync thread can't happen before patch01.
> >
> > There is a condition you mentioned above -- the array is suspended.
> > Before patch01, if one array is suspended, the sync thread can't
>
> 3) before patch 1, the sync_thread can still be running even if the
> array is suspended;
> And even without patch 1, raid_message() can still start a new
> sync_thread:
>
> // assume sync_thread is not registered
> raid_postsuspend              raid_message
>  md_stop_writes
>                               set_bit(MD_RECOVERY_NEEDED, &mddev->recovery)
>                               if (!mddev->suspended)
>                                md_wakeup_thread
>                                // new sync_thread is registered
>  mddev_suspend

The array is not suspended in the above case. Before patch01, after
mddev_suspend, the sync thread can't start. But this looks like a
problem. I'm not sure if dm has a way to handle the concurrency. In md,
we have a new lock sync_mutex to protect this, right? If dm doesn't do
this, dm-raid can do the same thing as md does.

> > happen. Even if raid_message clears MD_RECOVERY_FROZEN, the sync
> > thread can't start. After resuming the array, the sync thread can
> > start again.
>
> 4) I think I don't need to explain again why suspended should not be
> used to prevent starting a new sync_thread;

Yes, I understand you. But I only follow the existing logic. It has been
there for many years. Especially for dmraid/lvmraid, maybe there is some
code that depends on this logic. For such a change, I don't reject it
100%. I just want to say we need to be more careful.

Best Regards
Xiao

> Thanks,
> Kuai
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..5ce3c6020b1b 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3240,11 +3240,12 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	rs->md.ro = 1;
 	rs->md.in_sync = 1;
 
-	/* Keep array frozen until resume. */
-	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
-
 	/* Has to be held on running the array */
 	mddev_suspend_and_lock_nointr(&rs->md);
+
+	/* Keep array frozen until resume. */
+	md_frozen_sync_thread(&rs->md);
+
 	r = md_run(&rs->md);
 	rs->md.in_sync = 0; /* Assume already marked dirty */
 	if (r) {
@@ -3722,6 +3723,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 	if (!mddev->pers || !mddev->pers->sync_request)
 		return -EINVAL;
 
+	if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+		return -EBUSY;
+
 	if (!strcasecmp(argv[0], "frozen"))
 		set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	else
@@ -3791,15 +3795,31 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
 }
 
+static void raid_presuspend(struct dm_target *ti)
+{
+	struct raid_set *rs = ti->private;
+
+	mddev_lock_nointr(&rs->md);
+	md_frozen_sync_thread(&rs->md);
+	mddev_unlock(&rs->md);
+}
+
+static void raid_presuspend_undo(struct dm_target *ti)
+{
+	struct raid_set *rs = ti->private;
+
+	mddev_lock_nointr(&rs->md);
+	md_unfrozen_sync_thread(&rs->md);
+	mddev_unlock(&rs->md);
+}
+
 static void raid_postsuspend(struct dm_target *ti)
 {
 	struct raid_set *rs = ti->private;
 
 	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
 		/* Writes have to be stopped before suspending to avoid deadlocks. */
-		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
-			md_stop_writes(&rs->md);
-
+		md_stop_writes(&rs->md);
 		mddev_suspend(&rs->md, false);
 	}
 }
@@ -4012,8 +4032,6 @@ static int raid_preresume(struct dm_target *ti)
 	}
 
 	/* Check for any resize/reshape on @rs and adjust/initiate */
-	/* Be prepared for mddev_resume() in raid_resume() */
-	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 		mddev->resync_min = mddev->recovery_cp;
@@ -4056,9 +4074,9 @@ static void raid_resume(struct dm_target *ti)
 			rs_set_capacity(rs);
 
 		mddev_lock_nointr(mddev);
-		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
+		md_unfrozen_sync_thread(mddev);
 		mddev_unlock_and_resume(mddev);
 	}
 }
@@ -4074,6 +4092,8 @@ static struct target_type raid_target = {
 	.message = raid_message,
 	.iterate_devices = raid_iterate_devices,
 	.io_hints = raid_io_hints,
+	.presuspend = raid_presuspend,
+	.presuspend_undo = raid_presuspend_undo,
 	.postsuspend = raid_postsuspend,
 	.preresume = raid_preresume,
 	.resume = raid_resume,