Message ID | 20240120103734.4155446-6-yukuai1@huaweicloud.com |
---|---|
State | New |
Headers |
From: Yu Kuai <yukuai1@huaweicloud.com> To: mpatocka@redhat.com, dm-devel@lists.linux.dev, msnitzer@redhat.com, heinzm@redhat.com, song@kernel.org, yukuai3@huawei.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC 5/5] md: use md_reap_sync_thread() directly for dm-raid Date: Sat, 20 Jan 2024 18:37:34 +0800 Message-Id: <20240120103734.4155446-6-yukuai1@huaweicloud.com> In-Reply-To: <20240120103734.4155446-1-yukuai1@huaweicloud.com> References: <20240120103734.4155446-1-yukuai1@huaweicloud.com> |
Series | md: fix/prevent dm-raid regressions |
Commit Message
Yu Kuai
Jan. 20, 2024, 10:37 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Now that the previous patch makes sure that stop_sync_thread() can successfully stop sync_thread, lvm2 tests won't hang anymore. However, the test lvconvert-raid-reshape.sh still fails and complains that ext4 is corrupted.

The root cause is still unclear; however, let's convert dm-raid back to using md_reap_sync_thread() directly. This is not safe, but at least there won't be new regressions. We can decide what to do after figuring out the root cause.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 8 ++++++++
 1 file changed, 8 insertions(+)
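To make the "stop" step concrete, here is a minimal userspace sketch in C of the interrupt-then-wait handshake that stop_sync_thread() relies on when it calls wait_event(resync_wait, ...) in the hunk of this patch. It is a model, not kernel code: struct model_mddev, model_start_sync() and model_stop_sync() are hypothetical; a pthread condition variable stands in for the resync_wait queue; and the running/intr booleans stand in for MD_RECOVERY_RUNNING and MD_RECOVERY_INTR. It assumes POSIX threads are available.

```c
/*
 * Userspace model, not kernel code. All names are hypothetical stand-ins:
 * running ~ MD_RECOVERY_RUNNING, intr ~ MD_RECOVERY_INTR,
 * resync_wait ~ md's resync_wait wait queue.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_mddev {
	pthread_mutex_t lock;       /* protects running and intr */
	pthread_cond_t resync_wait; /* broadcast whenever a flag changes */
	bool running;
	bool intr;
};

static void model_init(struct model_mddev *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->resync_wait, NULL);
	m->running = false;
	m->intr = false;
}

/* Fake sync thread: idles until interrupted, then clears running and wakes waiters. */
static void *model_sync_thread(void *arg)
{
	struct model_mddev *m = arg;

	pthread_mutex_lock(&m->lock);
	while (!m->intr)
		pthread_cond_wait(&m->resync_wait, &m->lock);
	m->running = false;
	pthread_cond_broadcast(&m->resync_wait);
	pthread_mutex_unlock(&m->lock);
	return NULL;
}

/* Marks the array as resyncing and spawns the fake sync thread; returns 0 on success. */
static int model_start_sync(struct model_mddev *m, pthread_t *tid)
{
	int rc;

	pthread_mutex_lock(&m->lock);
	m->running = true;
	m->intr = false;
	pthread_mutex_unlock(&m->lock);

	rc = pthread_create(tid, NULL, model_sync_thread, m);
	if (rc != 0) {
		pthread_mutex_lock(&m->lock);
		m->running = false; /* no thread was started */
		pthread_mutex_unlock(&m->lock);
	}
	return rc;
}

/* Wait-based stop: request interruption, then sleep until running is clear. */
static void model_stop_sync(struct model_mddev *m, pthread_t tid)
{
	pthread_mutex_lock(&m->lock);
	m->intr = true;
	pthread_cond_broadcast(&m->resync_wait); /* wake the sync thread */
	while (m->running)                       /* ~ wait_event(resync_wait, !RUNNING) */
		pthread_cond_wait(&m->resync_wait, &m->lock);
	assert(!m->running);
	pthread_mutex_unlock(&m->lock);
	pthread_join(tid, NULL);
}
```

Typical use: call model_init(), start the fake thread with model_start_sync(), then call model_stop_sync(); once it returns, running is false and the thread has been joined. In this model, setting intr and broadcasting before waiting is what keeps the stopper from sleeping forever on a thread that is itself waiting to be told to stop.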
Comments
Hi,

On 2024/01/20 18:37, Yu Kuai wrote:
> The root cause is still not clear yet, however, let's convert dm-raid
> back to use md_reap_sync_thread() directly. This is not safe but at
> least there won't be new regressions. We can decide what to do after
> figuring out the root cause.

I think I've finally figured out the root cause here. This patch is no longer needed with the following patch. I have already verified three times in my VM that lvconvert-raid-reshape.sh no longer fails (with raid6 patch 2c265ac5ffde reverted). I'll run more tests in case there are new regressions. Meanwhile, I'll try to locate the root cause of the problem described in patch 4.

Thanks,
Kuai

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..108e7e313631 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3241,7 +3241,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	rs->md.in_sync = 1;
 
 	/* Keep array frozen until resume. */
-	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
+	md_frozen_sync_thread(&rs->md);
 
 	/* Has to be held on running the array */
 	mddev_suspend_and_lock_nointr(&rs->md);
@@ -3722,6 +3722,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 	if (!mddev->pers || !mddev->pers->sync_request)
 		return -EINVAL;
 
+	if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+		return -EBUSY;
+
 	if (!strcasecmp(argv[0], "frozen"))
 		set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	else
@@ -3796,10 +3799,8 @@ static void raid_postsuspend(struct dm_target *ti)
 	struct raid_set *rs = ti->private;
 
 	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
-		/* Writes have to be stopped before suspending to avoid deadlocks. */
-		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
-			md_stop_writes(&rs->md);
-
+		md_frozen_sync_thread(&rs->md);
+		md_stop_writes(&rs->md);
 		mddev_suspend(&rs->md, false);
 	}
 }
@@ -4011,9 +4012,6 @@ static int raid_preresume(struct dm_target *ti)
 			DMERR("Failed to resize bitmap");
 	}
 
-	/* Check for any resize/reshape on @rs and adjust/initiate */
-	/* Be prepared for mddev_resume() in raid_resume() */
-	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 		mddev->resync_min = mddev->recovery_cp;
@@ -4056,10 +4054,11 @@ static void raid_resume(struct dm_target *ti)
 			rs_set_capacity(rs);
 
 		mddev_lock_nointr(mddev);
-		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
 		mddev_unlock_and_resume(mddev);
+
+		md_unfrozen_sync_thread(mddev);
 	}
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9ef17a769cc2..0638d104fe26 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4939,7 +4939,7 @@ static void idle_sync_thread(struct mddev *mddev)
 	mutex_unlock(&mddev->sync_mutex);
 }
 
-static void frozen_sync_thread(struct mddev *mddev)
+void md_frozen_sync_thread(struct mddev *mddev)
 {
 	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -4952,6 +4952,18 @@ static void frozen_sync_thread(struct mddev *mddev)
 	stop_sync_thread(mddev, false, false);
 	mutex_unlock(&mddev->sync_mutex);
 }
+EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
+
+void md_unfrozen_sync_thread(struct mddev *mddev)
+{
+	mutex_lock(&mddev->sync_mutex);
+	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	md_wakeup_thread(mddev->thread);
+	sysfs_notify_dirent_safe(mddev->sysfs_action);
+	mutex_unlock(&mddev->sync_mutex);
+}
+EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);
 
 static ssize_t
 action_store(struct mddev *mddev, const char *page, size_t len)
@@ -4963,7 +4975,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 	if (cmd_match(page, "idle"))
 		idle_sync_thread(mddev);
 	else if (cmd_match(page, "frozen"))
-		frozen_sync_thread(mddev);
+		md_frozen_sync_thread(mddev);
 	else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
 	else if (cmd_match(page, "resync"))
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8d881cc59799..332520595ed8 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -781,6 +781,8 @@ extern void md_rdev_clear(struct md_rdev *rdev);
 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
 extern int mddev_suspend(struct mddev *mddev, bool interruptible);
 extern void mddev_resume(struct mddev *mddev);
+extern void md_frozen_sync_thread(struct mddev *mddev);
+extern void md_unfrozen_sync_thread(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
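The freeze/unfreeze pairing proposed in the reply can be sketched as a small userspace C model. Everything here is hypothetical: struct model_rs, the R_* bit values and all model_* functions are illustrative names; a pthread mutex stands in for mddev->sync_mutex; stopping the sync thread is reduced to clearing a RUNNING bit; and the suspend/resume/message hooks are simplified from the raid_postsuspend(), raid_resume() and raid_message() hunks. Clearing the suspended flag on resume is an assumption, since that part of raid_resume() is not shown in the diff.

```c
/*
 * Userspace model, not kernel code. Bit names mirror md's recovery
 * flags, but the values, struct and functions are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

enum {
	R_FROZEN  = 1u << 0, /* stand-in for MD_RECOVERY_FROZEN */
	R_NEEDED  = 1u << 1, /* stand-in for MD_RECOVERY_NEEDED */
	R_RUNNING = 1u << 2, /* stand-in for MD_RECOVERY_RUNNING */
};

struct model_rs {
	pthread_mutex_t sync_mutex; /* stand-in for mddev->sync_mutex */
	unsigned int recovery;      /* stand-in for mddev->recovery */
	bool suspended;             /* stand-in for RT_FLAG_RS_SUSPENDED */
};

/* Like md_frozen_sync_thread(): freeze first, then stop any running sync. */
static void model_frozen_sync_thread(struct model_rs *rs)
{
	pthread_mutex_lock(&rs->sync_mutex);
	rs->recovery |= R_FROZEN;
	rs->recovery &= ~(unsigned int)R_RUNNING; /* stop_sync_thread() reduced to a bit clear */
	/* Invariant the fix aims for: frozen implies no sync thread running. */
	assert((rs->recovery & R_FROZEN) && !(rs->recovery & R_RUNNING));
	pthread_mutex_unlock(&rs->sync_mutex);
}

/* Like md_unfrozen_sync_thread(): unfreeze and request a new recovery check. */
static void model_unfrozen_sync_thread(struct model_rs *rs)
{
	pthread_mutex_lock(&rs->sync_mutex);
	rs->recovery &= ~(unsigned int)R_FROZEN;
	rs->recovery |= R_NEEDED; /* the real code also wakes mddev->thread */
	pthread_mutex_unlock(&rs->sync_mutex);
}

/* Simplified raid_postsuspend(): freeze before stopping writes and suspending. */
static void model_postsuspend(struct model_rs *rs)
{
	if (!rs->suspended) {
		rs->suspended = true;
		model_frozen_sync_thread(rs);
		/* md_stop_writes() and mddev_suspend() would follow here */
	}
}

/* Simplified raid_resume(): unfreeze only after the array is resumed. */
static void model_resume(struct model_rs *rs)
{
	if (rs->suspended) {
		rs->suspended = false; /* assumed, not shown in the diff */
		model_unfrozen_sync_thread(rs);
	}
}

/* Simplified raid_message("frozen"): rejected while suspended. */
static int model_message_frozen(struct model_rs *rs)
{
	if (rs->suspended)
		return -EBUSY;
	rs->recovery |= R_FROZEN;
	return 0;
}
```

In this model a suspended set is always frozen with no sync running, sync actions sent while suspended get -EBUSY, and resuming clears FROZEN and sets NEEDED so a new sync can be considered.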
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7db749ba7e60..3e8dd020bf9f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4909,6 +4909,14 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
 	if (work_pending(&mddev->sync_work))
 		flush_work(&mddev->sync_work);
 
+	if (!mddev->gendisk) {
+		mddev_lock_nointr(mddev);
+		md_reap_sync_thread(mddev);
+		if (!locked)
+			mddev_unlock(mddev);
+		return;
+	}
+
 	wait_event(resync_wait,
 		   !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
 		   (check_seq && sync_seq != atomic_read(&mddev->sync_seq)));
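The lock hand-off in the new `!mddev->gendisk` branch is easy to get wrong, so here is a small userspace C model of it. Assumptions: the `lock_held` flag and the hypothetical model_lock()/model_unlock() helpers stand in for mddev_lock_nointr()/mddev_unlock(); the hypothetical model_reap() stands in for md_reap_sync_thread() and is run under the lock, as the hunk does; the lock is not held when execution reaches the hunk (it is re-taken there); and the non-dm path re-taking the lock after the wait when `locked` is true is an assumption about code outside this hunk.

```c
/*
 * Userspace model, not kernel code. Hypothetical stand-ins:
 * lock_held ~ holding the mddev lock, model_reap() ~ md_reap_sync_thread().
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mddev {
	void *gendisk;     /* NULL for an array driven by dm-raid */
	bool lock_held;    /* stand-in for the mddev lock */
	bool sync_running; /* stand-in for MD_RECOVERY_RUNNING */
	bool reaped;       /* true once the sync thread has been reaped */
};

static void model_lock(struct model_mddev *m)
{
	assert(!m->lock_held); /* the lock is not recursive */
	m->lock_held = true;
}

static void model_unlock(struct model_mddev *m)
{
	assert(m->lock_held);
	m->lock_held = false;
}

/* Reaping is done under the lock, as in the hunk. */
static void model_reap(struct model_mddev *m)
{
	assert(m->lock_held);
	m->sync_running = false;
	m->reaped = true;
}

/*
 * Models stop_sync_thread() from the point of the hunk onward: the lock
 * is NOT held on entry, and on return it is held exactly when @locked
 * is true.
 */
static void model_stop_tail(struct model_mddev *m, bool locked)
{
	if (m->gendisk == NULL) {
		/* dm-raid: reap synchronously instead of waiting */
		model_lock(m);
		model_reap(m);
		if (!locked)
			model_unlock(m);
		return;
	}

	/* md array: the real code waits on resync_wait; here we pretend it exited */
	m->sync_running = false;
	if (locked)
		model_lock(m); /* assumed re-lock after the wait */
}
```

With `gendisk == NULL` (the dm-raid case) the sync thread is reaped synchronously and the function returns holding the lock exactly when `locked` is true; the gendisk path only pretends the thread exited and never reaps it here.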