From patchwork Mon May 29 13:20:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 100269 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1519145vqr; Mon, 29 May 2023 06:37:20 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5xmyXKbUONJpdJNjkIBjKGzLGZo/4sW+ZSFILOCsNaibfEj6w91iiDhMF6/8wluAk3kCSH X-Received: by 2002:a05:6a21:999f:b0:10b:b166:8836 with SMTP id ve31-20020a056a21999f00b0010bb1668836mr10332970pzb.47.1685367440232; Mon, 29 May 2023 06:37:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685367440; cv=none; d=google.com; s=arc-20160816; b=aVJXk2w0EASE8yTRgyHw72wQubcAhSO9MZn4960Vc8fRPDi5Jzb36a2bDXv6t8fRA5 lPj41Awx0Nj7P24t3ONRPx9pGYHoVadp/F1DBrs6e/hbWa3gbg1RUN1KBdju+jyesb2x 03TZuVC9zP5zsA6XXnv+u0XfcQ02q/TWVc5SUqQ1dDtgXwf/OF4ezDK6KcsWnK4UrJg/ Zxc/5TYZb/XOOfUEqGWkWDd2y6aHmKQ7uiHRGVyzkKpxAsRQbumxOIAmsGPnUcA2W0Il xs2ru85UAlHbOHtvQm+03BCzr5EUwKBeNoJ7KUmqA1bw0r5Ui4mIxO8DLaQsOZa7TL5+ dPnw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=IV+qDQeo9RuChir7ckE4pBVsZgdyxz9ZNt+FuTGV4hQ=; b=VsXCTvbESe2pOgOvVV28QmtW5vRP99KLqPe6717HftNslBLSbt4B3xS0PHr0xkPpX/ 58VC0f21TFvobLHkS4HN2i+f+NFdD/wZ0eEDOLaQgxg4YRxT1JVHV7SaqlAoib1K2fsO LgTLzYWlh1yQj424fr3WtfD5YQlZhlP267JwpS7HCQU4+q4+ehwBEKG9WyNhzUX9eQhk 2cHy4zla0AMqIsqyDO1ouv6qCEEW18+Xr3nPIcrQqVltF1Mpw29rHgCT0DSedqQAeIj8 cZfXqc5YnmLmBioGYHxaK/OgGWxnRwBDiQpWKeYNWEr77SUn8Z1akeeHsNTlPOodMwzm Dppg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id j63-20020a638042000000b0052c73367c13si9183302pgd.871.2023.05.29.06.37.05; Mon, 29 May 2023 06:37:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229813AbjE2NY0 (ORCPT + 99 others); Mon, 29 May 2023 09:24:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229569AbjE2NYU (ORCPT ); Mon, 29 May 2023 09:24:20 -0400 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A8EDB7; Mon, 29 May 2023 06:24:18 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4QVGT967V1z4f454c; Mon, 29 May 2023 21:24:13 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAHvbB9p3RkWVPoKQ--.23397S5; Mon, 29 May 2023 21:24:14 +0800 (CST) From: Yu Kuai To: guoqing.jiang@linux.dev, agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH -next v2 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Date: Mon, 29 May 2023 21:20:32 +0800 Message-Id: <20230529132037.2124527-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230529132037.2124527-1-yukuai1@huaweicloud.com> References: 
<20230529132037.2124527-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: gCh0CgAHvbB9p3RkWVPoKQ--.23397S5 X-Coremail-Antispam: 1UD129KBjvJXoWxWr1ftF18AF45XF15ZF18Krg_yoWrKr43p3 yfJF9xJrWUArW3ZrWUJa4DXay5Zw1jq3yqyrWfW34fJw1fKr43G345uFyUZFyDJas5Zw4a qayrJFWrZFW09r7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPY14x267AKxVW5JVWrJwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r4j6ryUM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr 1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0JUTHqxU UUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,KHOP_HELO_FCRDNS, MAY_BE_FORGED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767235848800115242?= X-GMAIL-MSGID: =?utf-8?q?1767235848800115242?= From: Yu Kuai This reverts commit 9dfbdafda3b34e262e43e786077bab8e476a89d1. 
The reverted commit introduces a defect: sync_thread can still be running
while MD_RECOVERY_RUNNING is cleared, which causes unexpected problems,
for example:

list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
Call trace:
 __list_add_valid+0xfc/0x140
 insert_work+0x78/0x1a0
 __queue_work+0x500/0xcf4
 queue_work_on+0xe8/0x12c
 md_check_recovery+0xa34/0xf30
 raid10d+0xb8/0x900 [raid10]
 md_thread+0x16c/0x2cc
 kthread+0x1a4/0x1ec
 ret_from_fork+0x10/0x18

This is because the work is requeued while it is still inside the workqueue:

t1:                             t2:
action_store
 mddev_lock
  if (mddev->sync_thread)
   mddev_unlock
   md_unregister_thread
   // first sync_thread is done
                                md_check_recovery
                                 mddev_trylock
                                 /*
                                  * once MD_RECOVERY_DONE is set, new sync_thread
                                  * can start.
                                  */
                                 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
                                 INIT_WORK(&mddev->del_work, md_start_sync)
                                 queue_work(md_misc_wq, &mddev->del_work)
                                  test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
                                  // set pending bit
                                  insert_work
                                   list_add_tail
                                 mddev_unlock
   mddev_lock_nointr
   md_reap_sync_thread
   // MD_RECOVERY_RUNNING is cleared
 mddev_unlock

t3:

// before queued work started from t2
md_check_recovery
 // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
 INIT_WORK(&mddev->del_work, md_start_sync)
  work->data = 0
  // work pending bit is cleared
 queue_work(md_misc_wq, &mddev->del_work)
  insert_work
   list_add_tail
   // list is corrupted

Revert that commit to fix the problem; the deadlock it tried to fix will
be fixed in the following patches.

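As a rough userspace illustration of the t2/t3 sequence above (a sketch only,
not kernel code: sketch_list, sketch_work, sketch_init_work() and
sketch_queue_work() are hypothetical stand-ins for list_head, work_struct,
INIT_WORK() and queue_work()), re-initializing a work item that is still
linked on a queue clears its pending flag and makes its list node point at
itself, so the next queue attempt trips the same kind of prev->next check
that __list_add_valid() reported:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

/* Hypothetical stand-in for struct work_struct: a pending flag plus a node. */
struct sketch_work {
	bool pending;
	struct sketch_list entry;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Add at the tail, but refuse if the neighbours are inconsistent. */
static bool sketch_list_add_tail(struct sketch_list *node,
				 struct sketch_list *head)
{
	struct sketch_list *prev = head->prev;
	struct sketch_list *next = head;

	if (next->prev != prev || prev->next != next)
		return false;	/* "prev->next should be next" */

	next->prev = node;
	node->next = next;
	node->prev = prev;
	prev->next = node;
	return true;
}

/* Like INIT_WORK() in this sketch: clears pending and resets the node. */
static void sketch_init_work(struct sketch_work *work)
{
	work->pending = false;
	sketch_list_init(&work->entry);
}

/* Returns 1 if queued, 0 if already pending, -1 if corruption is detected. */
static int sketch_queue_work(struct sketch_list *queue,
			     struct sketch_work *work)
{
	if (work->pending)
		return 0;
	work->pending = true;
	return sketch_list_add_tail(&work->entry, queue) ? 1 : -1;
}

/* Replays t2 followed by t3; returns the result of the last queue attempt. */
int sketch_requeue_after_reinit(void)
{
	struct sketch_list queue;
	struct sketch_work work;
	int first, second;

	sketch_list_init(&queue);
	sketch_init_work(&work);

	first = sketch_queue_work(&queue, &work);	/* t2 queues the work */
	second = sketch_queue_work(&queue, &work);	/* pending flag protects it */
	assert(first == 1);
	assert(second == 0);

	sketch_init_work(&work);	/* t3: re-init while still queued */
	return sketch_queue_work(&queue, &work);
}
```

In this sketch the pending flag alone is enough to reject a second queue
attempt; it is only the re-initialization of a still-queued item that lets
the inconsistent insert through.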
Signed-off-by: Yu Kuai
---
 drivers/md/dm-raid.c |  1 -
 drivers/md/md.c      | 19 ++-----------------
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 8846bf510a35..1f22bef27841 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3725,7 +3725,6 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_unregister_thread(&mddev->sync_thread);
 			md_reap_sync_thread(mddev);
 		}
 	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index a5a7af2f4e59..9b97731e1fe4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4772,19 +4772,6 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 			if (work_pending(&mddev->del_work))
 				flush_workqueue(md_misc_wq);
 			if (mddev->sync_thread) {
-				sector_t save_rp = mddev->reshape_position;
-
-				mddev_unlock(mddev);
-				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_unregister_thread(&mddev->sync_thread);
-				mddev_lock_nointr(mddev);
-				/*
-				 * set RECOVERY_INTR again and restore reshape
-				 * position in case others changed them after
-				 * got lock, eg, reshape_position_store and
-				 * md_check_recovery.
-				 */
-				mddev->reshape_position = save_rp;
 				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 				md_reap_sync_thread(mddev);
 			}
@@ -6184,7 +6171,6 @@ static void __md_stop_writes(struct mddev *mddev)
 		flush_workqueue(md_misc_wq);
 	if (mddev->sync_thread) {
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_unregister_thread(&mddev->sync_thread);
 		md_reap_sync_thread(mddev);
 	}
 
@@ -9336,7 +9322,6 @@ void md_check_recovery(struct mddev *mddev)
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_unregister_thread(&mddev->sync_thread);
 			md_reap_sync_thread(mddev);
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -9372,7 +9357,6 @@ void md_check_recovery(struct mddev *mddev)
 			goto unlock;
 		}
 		if (mddev->sync_thread) {
-			md_unregister_thread(&mddev->sync_thread);
 			md_reap_sync_thread(mddev);
 			goto unlock;
 		}
@@ -9452,7 +9436,8 @@ void md_reap_sync_thread(struct mddev *mddev)
 	sector_t old_dev_sectors = mddev->dev_sectors;
 	bool is_reshaped = false;
 
-	/* sync_thread should be unregistered, collect result */
+	/* resync has finished, collect result */
+	md_unregister_thread(&mddev->sync_thread);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {

From patchwork Mon May 29 13:20:33 2023
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 100265
From: Yu Kuai
To: guoqing.jiang@linux.dev, agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 2/6] md: refactor action_store() for 'idle' and 'frozen'
Date: Mon, 29 May 2023 21:20:33 +0800
Message-Id: <20230529132037.2124527-3-yukuai1@huaweicloud.com>
In-Reply-To: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
References: <20230529132037.2124527-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Prepare to handle 'idle' and 'frozen' differently to fix a deadlock. There
are no functional changes, except that MD_RECOVERY_RUNNING is checked again
after 'reconfig_mutex' is held.

Signed-off-by: Yu Kuai
---
 drivers/md/md.c | 61 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9b97731e1fe4..23e8e7eae062 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4755,6 +4755,46 @@ action_show(struct mddev *mddev, char *page)
 	return sprintf(page, "%s\n", type);
 }
 
+static void stop_sync_thread(struct mddev *mddev)
+{
+	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		return;
+
+	if (mddev_lock(mddev))
+		return;
+
+	/*
+	 * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
+	 * held.
+	 */
+	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+		mddev_unlock(mddev);
+		return;
+	}
+
+	if (work_pending(&mddev->del_work))
+		flush_workqueue(md_misc_wq);
+
+	if (mddev->sync_thread) {
+		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+		md_reap_sync_thread(mddev);
+	}
+
+	mddev_unlock(mddev);
+}
+
+static void idle_sync_thread(struct mddev *mddev)
+{
+	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	stop_sync_thread(mddev);
+}
+
+static void frozen_sync_thread(struct mddev *mddev)
+{
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	stop_sync_thread(mddev);
+}
+
 static ssize_t
 action_store(struct mddev *mddev, const char *page, size_t len)
 {
@@ -4762,22 +4802,11 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 		return -EINVAL;
 
 
-	if (cmd_match(page, "idle") || cmd_match(page, "frozen")) {
-		if (cmd_match(page, "frozen"))
-			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-		else
-			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    mddev_lock(mddev) == 0) {
-			if (work_pending(&mddev->del_work))
-				flush_workqueue(md_misc_wq);
-			if (mddev->sync_thread) {
-				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_reap_sync_thread(mddev);
-			}
-			mddev_unlock(mddev);
-		}
-	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+	if (cmd_match(page, "idle"))
+		idle_sync_thread(mddev);
+	else if (cmd_match(page, "frozen"))
+		frozen_sync_thread(mddev);
+	else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
 	else if (cmd_match(page, "resync"))
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);

From patchwork Mon May 29 13:20:34 2023
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 100266
From: Yu Kuai
To: guoqing.jiang@linux.dev, agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store()
Date: Mon, 29 May 2023 21:20:34 +0800
Message-Id: <20230529132037.2124527-4-yukuai1@huaweicloud.com>
In-Reply-To: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
References: <20230529132037.2124527-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Currently,
for idle and frozen, action_store() holds 'reconfig_mutex' and calls
md_reap_sync_thread() to stop the sync thread. However, this can cause a
deadlock (explained in the next patch). To fix the problem, the following
patch will release 'reconfig_mutex' and wait on 'resync_wait', as
md_set_readonly() and do_md_stop() do.

Note that action_store() sets/clears 'MD_RECOVERY_FROZEN' unconditionally,
which might cause unexpected problems. For example, 'frozen' has just set
'MD_RECOVERY_FROZEN' and is still in progress when 'idle' clears
'MD_RECOVERY_FROZEN' and a new sync thread is started, which might starve
the in-progress 'frozen'. Add a mutex to synchronize idle and frozen from
action_store().

Signed-off-by: Yu Kuai
---
 drivers/md/md.c | 5 +++++
 drivers/md/md.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 23e8e7eae062..63a993b52cd7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -644,6 +644,7 @@ void mddev_init(struct mddev *mddev)
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->delete_mutex);
+	mutex_init(&mddev->sync_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
@@ -4785,14 +4786,18 @@ static void stop_sync_thread(struct mddev *mddev)
 
 static void idle_sync_thread(struct mddev *mddev)
 {
+	mutex_lock(&mddev->sync_mutex);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+	mutex_unlock(&mddev->sync_mutex);
 }
 
 static void frozen_sync_thread(struct mddev *mddev)
 {
+	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+	mutex_unlock(&mddev->sync_mutex);
 }
 
 static ssize_t
diff --git a/drivers/md/md.h b/drivers/md/md.h
index bfd2306bc750..2fa903de5bd0 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -537,6 +537,9 @@ struct mddev {
 	/* Protect the deleting list */
 	struct mutex			delete_mutex;
 
+	/* Used to synchronize idle and frozen for action_store() */
+	struct mutex			sync_mutex;
+
 	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
 	bool	serialize_policy:1;

From patchwork Mon May 29 13:20:35 2023
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 100267
From: Yu Kuai
To: guoqing.jiang@linux.dev, agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock
Date: Mon, 29 May 2023 21:20:35 +0800
Message-Id: <20230529132037.2124527-5-yukuai1@huaweicloud.com>
In-Reply-To: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
References: <20230529132037.2124527-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Our test found the following deadlock in raid10:

1) Issue a normal write, and such write failed:

raid10_end_write_request
 set_bit(R10BIO_WriteError, &r10_bio->state)
 one_write_done
  reschedule_retry

// later from md thread
raid10d
 handle_write_completed
  list_add(&r10_bio->retry_list, &conf->bio_end_io_list)

// later from md thread
raid10d
 if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
  list_move(conf->bio_end_io_list.prev, &tmp)
  r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
  raid_end_bio_io(r10_bio)

Dependency chain 1: normal io is waiting for the superblock update

2) Trigger a recovery:

raid10_sync_request
 raise_barrier

Dependency chain 2: sync thread is waiting for normal io

3) echo idle/frozen to sync_action:

action_store
 mddev_lock
  md_unregister_thread
   kthread_stop

Dependency chain 3: dropping 'reconfig_mutex' is waiting for the sync thread

4) md thread can't update superblock:

raid10d
 md_check_recovery
  if (mddev_trylock(mddev))
   md_update_sb

Dependency chain 4: the superblock update is waiting for 'reconfig_mutex'

Hence a cyclic dependency exists. To fix the problem, we must break one of
the links. Dependencies 1 and 2 can't be broken because they are fundamental
to the design. Breaking dependency 4 may be possible if it can be guaranteed
that no io is inflight; however, this requires a new mechanism, which seems
complex. Dependency 3 is a good choice, because idle/frozen only requires
the sync thread to finish, which can be done asynchronously using what is
already implemented, and 'reconfig_mutex' is no longer needed.

This patch switches 'idle' and 'frozen' to wait for the sync thread to
finish asynchronously. It also adds a sequence counter that records how
many times the sync thread has finished, so that 'idle' won't keep waiting
on a newly started sync thread.

Note that raid456 has a similar deadlock [1], and it has been verified [2]
that this patch fixes that deadlock as well.

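To illustrate why the sequence counter described above matters, here is a
minimal userspace sketch of the two wake-up conditions. It makes several
assumptions: struct sketch_mddev and every sketch_*() helper are
hypothetical, C11 atomics stand in for atomic_t and the recovery bit, and
no real waiting or locking is modelled. It only shows that a condition based
on the running flag alone stays false if a new sync thread starts before
the waiter is woken, while the counter-based condition is already satisfied:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of mddev state that matter here. */
struct sketch_mddev {
	atomic_int sync_seq;	/* bumped every time a sync thread is reaped */
	atomic_bool running;	/* stands in for MD_RECOVERY_RUNNING */
};

/* A sync thread is started. */
static void sketch_start_sync(struct sketch_mddev *m)
{
	atomic_store(&m->running, true);
}

/* The sync thread is reaped: count it, then clear the running flag. */
static void sketch_reap_sync(struct sketch_mddev *m)
{
	atomic_fetch_add(&m->sync_seq, 1);
	atomic_store(&m->running, false);
}

/* Wake-up condition for 'idle': the thread seen at entry has been reaped. */
static bool sketch_idle_done(struct sketch_mddev *m, int seq_at_entry)
{
	return seq_at_entry != atomic_load(&m->sync_seq) ||
	       !atomic_load(&m->running);
}

/* Wake-up condition without the counter, for comparison. */
static bool sketch_idle_done_no_seq(struct sketch_mddev *m)
{
	return !atomic_load(&m->running);
}

/*
 * Returns true when the counter-based condition holds although a new sync
 * thread was started right after the old one was reaped, while the
 * flag-only condition does not hold.
 */
bool sketch_idle_sees_restart(void)
{
	struct sketch_mddev m;
	int seq_at_entry;

	atomic_init(&m.sync_seq, 0);
	atomic_init(&m.running, false);

	sketch_start_sync(&m);
	seq_at_entry = atomic_load(&m.sync_seq);	/* snapshot at 'idle' entry */
	assert(!sketch_idle_done(&m, seq_at_entry));	/* must wait for now */

	sketch_reap_sync(&m);	/* old sync thread finishes */
	sketch_start_sync(&m);	/* a new one starts before 'idle' is woken */

	return sketch_idle_done(&m, seq_at_entry) &&
	       !sketch_idle_done_no_seq(&m);
}
```

The same reasoning suggests why 'frozen' can wait on the flag alone:
with MD_RECOVERY_FROZEN set, no new sync thread is expected to start.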
[1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
[2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/

Signed-off-by: Yu Kuai
---
 drivers/md/md.c | 23 +++++++++++++++++++----
 drivers/md/md.h |  2 ++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 63a993b52cd7..7912de0e4d12 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -652,6 +652,7 @@ void mddev_init(struct mddev *mddev)
 	timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
 	atomic_set(&mddev->active, 1);
 	atomic_set(&mddev->openers, 0);
+	atomic_set(&mddev->sync_seq, 0);
 	spin_lock_init(&mddev->lock);
 	atomic_set(&mddev->flush_pending, 0);
 	init_waitqueue_head(&mddev->sb_wait);
@@ -4776,19 +4777,27 @@ static void stop_sync_thread(struct mddev *mddev)
 	if (work_pending(&mddev->del_work))
 		flush_workqueue(md_misc_wq);

-	if (mddev->sync_thread) {
-		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_reap_sync_thread(mddev);
-	}
+	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+	/*
+	 * Thread might be blocked waiting for metadata update which will now
+	 * never happen
+	 */
+	md_wakeup_thread_directly(mddev->sync_thread);

 	mddev_unlock(mddev);
 }

 static void idle_sync_thread(struct mddev *mddev)
 {
+	int sync_seq = atomic_read(&mddev->sync_seq);
+
 	mutex_lock(&mddev->sync_mutex);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+
+	wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
+			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+
 	mutex_unlock(&mddev->sync_mutex);
 }

@@ -4797,6 +4806,10 @@ static void frozen_sync_thread(struct mddev *mddev)
 	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+
+	wait_event(resync_wait, mddev->sync_thread == NULL &&
+			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+
 	mutex_unlock(&mddev->sync_mutex);
 }

@@ -9472,6 +9485,8 @@ void
md_reap_sync_thread(struct mddev *mddev)

 	/* resync has finished, collect result */
 	md_unregister_thread(&mddev->sync_thread);
+	atomic_inc(&mddev->sync_seq);
+
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 2fa903de5bd0..7cab9c7c45b8 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -539,6 +539,8 @@ struct mddev {

 	/* Used to synchronize idle and frozen for action_store() */
 	struct mutex sync_mutex;
+	/* The sequence number for sync thread */
+	atomic_t sync_seq;

 	bool has_superblocks:1;
 	bool fail_last_dev:1;

From patchwork Mon May 29 13:20:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 100271
From: Yu Kuai
To: guoqing.jiang@linux.dev, agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread()
Date: Mon, 29 May 2023 21:20:36 +0800
Message-Id: <20230529132037.2124527-6-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
References: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

In action_store(), md_reap_sync_thread() has just been replaced with
wait_event(resync_wait, ...). Move wake_up(&resync_wait) to the end of
md_reap_sync_thread() so that action_store() still waits for everything in
md_reap_sync_thread() to be done.

Signed-off-by: Yu Kuai
---
 drivers/md/md.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7912de0e4d12..f90226e6ddf8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9531,7 +9531,6 @@ void md_reap_sync_thread(struct mddev *mddev)
 	if (mddev_is_clustered(mddev) && is_reshaped
 			&& !test_bit(MD_CLOSING, &mddev->flags))
 		md_cluster_ops->update_size(mddev, old_dev_sectors);
-	wake_up(&resync_wait);
 	/* flag recovery needed just to double check */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	sysfs_notify_dirent_safe(mddev->sysfs_completed);
@@ -9539,6 +9538,7 @@ void md_reap_sync_thread(struct mddev *mddev)
 	md_new_event();
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
+	wake_up(&resync_wait);
 }
 EXPORT_SYMBOL(md_reap_sync_thread);

From patchwork Mon May 29 13:20:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 100270
From: Yu Kuai
To: guoqing.jiang@linux.dev, agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 6/6] md: enhance checking in md_check_recovery()
Date: Mon, 29 May 2023 21:20:37 +0800
Message-Id: <20230529132037.2124527-7-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
References: <20230529132037.2124527-1-yukuai1@huaweicloud.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

For md_check_recovery():

1) if 'MD_RECOVERY_RUNNING' is not set, register new sync_thread.
2) if 'MD_RECOVERY_RUNNING' is set:
 a) if 'MD_RECOVERY_DONE' is not set, don't do anything, wait for
    md_do_sync() to be done.
 b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. The current code
    expects that sync_thread is not NULL; otherwise a new sync_thread will
    be registered, which will corrupt the array.

Make sure md_check_recovery() won't register a new sync_thread if
'MD_RECOVERY_RUNNING' is still set, and add a new WARN_ON_ONCE() for the
corruption case above.

Signed-off-by: Yu Kuai
Reviewed-by: Xiao Ni
---
 drivers/md/md.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f90226e6ddf8..9da0fc906bbd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9397,16 +9397,24 @@ void md_check_recovery(struct mddev *mddev)
 		if (mddev->sb_flags)
 			md_update_sb(mddev, 0);

-		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
-			/* resync/recovery still happening */
-			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-			goto unlock;
-		}
-		if (mddev->sync_thread) {
+		/*
+		 * Never start a new sync thread if MD_RECOVERY_RUNNING is
+		 * still set.
+		 */
+		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+			if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+				/* resync/recovery still happening */
+				clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+				goto unlock;
+			}
+
+			if (WARN_ON_ONCE(!mddev->sync_thread))
+				goto unlock;
+
 			md_reap_sync_thread(mddev);
 			goto unlock;
 		}
+
 		/* Set RUNNING before clearing NEEDED to avoid
 		 * any transients in the value of "sync_action".
 		 */
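
The cases listed in the commit message above can be modelled as a small
decision function. This is a hedged userspace sketch, not kernel code: the
toy_action enum and toy_check_recovery() are hypothetical names, and the
three booleans stand in for MD_RECOVERY_RUNNING, MD_RECOVERY_DONE and a
non-NULL mddev->sync_thread.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the recovery check, one per case. */
enum toy_action {
	TOY_START_NEW,	/* RUNNING clear: a new sync thread may be registered */
	TOY_WAIT,	/* RUNNING set, DONE clear: md_do_sync() still working */
	TOY_REAP,	/* RUNNING and DONE set: reap the finished sync thread */
	TOY_WARN_SKIP,	/* RUNNING and DONE set, no thread: warn, do nothing */
};

/* Mirrors the order of the checks in the patched md_check_recovery(). */
static enum toy_action toy_check_recovery(bool running, bool done,
					  bool has_thread)
{
	if (!running)
		return TOY_START_NEW;
	if (!done)
		return TOY_WAIT;
	if (!has_thread)
		return TOY_WARN_SKIP;
	return TOY_REAP;
}
```

The point of the ordering is that TOY_START_NEW is reachable only when the
running flag is clear, so a missing thread pointer alone can no longer lead
to a second sync thread being registered.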