From patchwork Sat Oct 21 10:20:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 156385
From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 1/6] md: remove useless debug code to print configuration
Date: Sat, 21 Oct 2023 18:20:54 +0800
Message-Id: <20231021102059.3198284-2-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com>
References: <20231021102059.3198284-1-yukuai1@huaweicloud.com>

From: Yu Kuai

On the one hand, print_conf() can be called without holding
'reconfig_mutex', and the current rcu protection for accessing rdev
through 'conf' is not safe. Fortunately, there is a separate rcu
protection for accessing rdev from 'mddev->disks', and rdev is always
removed from 'conf' before 'mddev->disks'.

On the other hand, print_conf() is only used for debugging, and users can
always get the same information from /proc/mdstat and mdadm.

There is no need to keep this debug code always enabled and to fix its
misuse of rcu protection for accessing rdev from 'conf', hence remove
print_conf().

Signed-off-by: Yu Kuai
---
 drivers/md/raid1.c  | 28 ----------------------------
 drivers/md/raid10.c | 29 -----------------------------
 drivers/md/raid5.c  | 34 ----------------------------------
 3 files changed, 91 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 35d12948e0a9..c13088eae401 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1679,30 +1679,6 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 		mdname(mddev), conf->raid_disks - mddev->degraded);
 }
 
-static void print_conf(struct r1conf *conf)
-{
-	int i;
-
-	pr_debug("RAID1 conf printout:\n");
-	if (!conf) {
-		pr_debug("(!conf)\n");
-		return;
-	}
-	pr_debug(" --- wd:%d rd:%d\n", conf->raid_disks - conf->mddev->degraded,
-		 conf->raid_disks);
-
-	rcu_read_lock();
-	for (i = 0; i < conf->raid_disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
-		if (rdev)
-			pr_debug(" disk %d, wo:%d, o:%d, dev:%pg\n",
-				 i, !test_bit(In_sync, &rdev->flags),
-				 !test_bit(Faulty, &rdev->flags),
-				 rdev->bdev);
-	}
-	rcu_read_unlock();
-}
-
 static void close_sync(struct r1conf *conf)
 {
 	int idx;
@@ -1763,7 +1739,6 @@ static int raid1_spare_active(struct mddev *mddev)
 	mddev->degraded -= count;
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 
-	print_conf(conf);
 	return count;
 }
 
@@ -1829,7 +1804,6 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		rcu_assign_pointer(p[conf->raid_disks].rdev, rdev);
 	}
 
-	print_conf(conf);
 	return err;
 }
 
@@ -1846,7 +1820,6 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	if (rdev != p->rdev)
 		p = conf->mirrors + conf->raid_disks + number;
 
-	print_conf(conf);
 	if (rdev == p->rdev) {
 		if (test_bit(In_sync, &rdev->flags) ||
 		    atomic_read(&rdev->nr_pending)) {
@@ -1902,7 +1875,6 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	}
 abort:
 
-	print_conf(conf);
 	return err;
 }
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5927e98dc67..4b5f34f320c8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2059,31 +2059,6 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 		mdname(mddev), conf->geo.raid_disks - mddev->degraded);
 }
 
-static void print_conf(struct r10conf *conf)
-{
-	int i;
-	struct md_rdev *rdev;
-
-	pr_debug("RAID10 conf printout:\n");
-	if (!conf) {
-		pr_debug("(!conf)\n");
-		return;
-	}
-	pr_debug(" --- wd:%d rd:%d\n", conf->geo.raid_disks - conf->mddev->degraded,
-		 conf->geo.raid_disks);
-
-	/* This is only called with ->reconfix_mutex held, so
-	 * rcu protection of rdev is not needed */
-	for (i = 0; i < conf->geo.raid_disks; i++) {
-		rdev = conf->mirrors[i].rdev;
-		if (rdev)
-			pr_debug(" disk %d, wo:%d, o:%d, dev:%pg\n",
-				 i, !test_bit(In_sync, &rdev->flags),
-				 !test_bit(Faulty, &rdev->flags),
-				 rdev->bdev);
-	}
-}
-
 static void close_sync(struct r10conf *conf)
 {
 	wait_barrier(conf, false);
@@ -2136,7 +2111,6 @@ static int raid10_spare_active(struct mddev *mddev)
 	mddev->degraded -= count;
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 
-	print_conf(conf);
 	return count;
 }
 
@@ -2207,7 +2181,6 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		rcu_assign_pointer(p->replacement, rdev);
 	}
 
-	print_conf(conf);
 	return err;
 }
 
@@ -2219,7 +2192,6 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	struct md_rdev **rdevp;
 	struct raid10_info *p;
 
-	print_conf(conf);
 	if (unlikely(number >= mddev->raid_disks))
 		return 0;
 	p = conf->mirrors + number;
@@ -2271,7 +2243,6 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 
 abort:
 
-	print_conf(conf);
 	return err;
 }
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4207e945e8c8..27a4dce51c92 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -156,8 +156,6 @@ static int raid6_idx_to_slot(int idx, struct stripe_head *sh,
 	return slot;
 }
 
-static void print_raid5_conf (struct r5conf *conf);
-
 static int stripe_operations_active(struct stripe_head *sh)
 {
 	return sh->check_state || sh->reconstruct_state ||
@@ -7983,8 +7981,6 @@ static int raid5_run(struct mddev *mddev)
 		mddev->raid_disks-mddev->degraded, mddev->raid_disks,
 		mddev->new_layout);
 
-	print_raid5_conf(conf);
-
 	if (conf->reshape_progress != MaxSector) {
 		conf->reshape_safe = conf->reshape_progress;
 		atomic_set(&conf->reshape_stripes, 0);
@@ -8075,7 +8071,6 @@ static int raid5_run(struct mddev *mddev)
 	return 0;
 abort:
 	md_unregister_thread(mddev, &mddev->thread);
-	print_raid5_conf(conf);
 	free_conf(conf);
 	mddev->private = NULL;
 	pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
@@ -8107,31 +8102,6 @@ static void raid5_status(struct seq_file *seq, struct mddev *mddev)
 	seq_printf (seq, "]");
 }
 
-static void print_raid5_conf (struct r5conf *conf)
-{
-	struct md_rdev *rdev;
-	int i;
-
-	pr_debug("RAID conf printout:\n");
-	if (!conf) {
-		pr_debug("(conf==NULL)\n");
-		return;
-	}
-	pr_debug(" --- level:%d rd:%d wd:%d\n", conf->level,
-	       conf->raid_disks,
-	       conf->raid_disks - conf->mddev->degraded);
-
-	rcu_read_lock();
-	for (i = 0; i < conf->raid_disks; i++) {
-		rdev = rcu_dereference(conf->disks[i].rdev);
-		if (rdev)
-			pr_debug(" disk %d, o:%d, dev:%pg\n",
-			       i, !test_bit(Faulty, &rdev->flags),
-			       rdev->bdev);
-	}
-	rcu_read_unlock();
-}
-
 static int raid5_spare_active(struct mddev *mddev)
 {
 	int i;
@@ -8173,7 +8143,6 @@ static int raid5_spare_active(struct mddev *mddev)
 	spin_lock_irqsave(&conf->device_lock, flags);
 	mddev->degraded = raid5_calc_degraded(conf);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
-	print_raid5_conf(conf);
 	return count;
 }
 
@@ -8186,7 +8155,6 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	struct disk_info *p;
 	struct md_rdev *tmp;
 
-	print_raid5_conf(conf);
 	if (test_bit(Journal, &rdev->flags) && conf->log) {
 		/*
 		 * we can't wait pending write here, as this is called in
@@ -8266,7 +8234,6 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	clear_bit(WantReplacement, &rdev->flags);
 abort:
 
-	print_raid5_conf(conf);
 	return err;
 }
 
@@ -8348,7 +8315,6 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		}
 	}
 out:
-	print_raid5_conf(conf);
 	return err;
 }
 

From patchwork Sat Oct 21 10:20:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 156389
From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 2/6] md: remove flag RemoveSynchronized
Date: Sat, 21 Oct 2023 18:20:55 +0800
Message-Id: <20231021102059.3198284-3-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com>
References: <20231021102059.3198284-1-yukuai1@huaweicloud.com>

From: Yu Kuai

rcu is not used correctly here, because synchronize_rcu() is called before
the old value is replaced, for example:

remove_and_add_spares   // other path
 synchronize_rcu
 // called before replacing old value
 set_bit(RemoveSynchronized)
                        rcu_read_lock()
                        rdev = conf->mirrors[].rdev
 pers->hot_remove_disk
  conf->mirrors[].rdev = NULL;
  if (!test_bit(RemoveSynchronized))
   synchronize_rcu
   /*
    * won't be called, and won't wait
    * for concurrent readers to be done.
    */
                        // access rdev after remove_and_add_spares()
                        rcu_read_unlock()

Fortunately, there is a separate rcu protection that prevents such an rdev
from being freed:

md_kick_rdev_from_array         //other path
                                rcu_read_lock()
                                rdev = conf->mirrors[].rdev
list_del_rcu(&rdev->same_set)

                                rcu_read_unlock()
                                /*
                                 * rdev can be removed from conf, but
                                 * rdev won't be freed.
                                 */
synchronize_rcu()
free rdev

Hence remove this useless flag and prepare to remove rcu protection for
accessing rdev from 'conf'.
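
The ordering problem described above can be sketched in a toy,
single-threaded userspace model. This is only an illustration under stated
assumptions, not kernel code: toy_rdev, toy_state, toy_read_lock(),
toy_read_unlock(), toy_grace_period_done() and the two scenario functions
are hypothetical names, and a plain reader counter stands in for real RCU
grace-period tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev. */
struct toy_rdev {
	int id;
};

/* Hypothetical stand-in for one conf->mirrors[] slot plus reader tracking. */
struct toy_state {
	struct toy_rdev *slot;	/* models conf->mirrors[i].rdev */
	int active_readers;	/* readers inside a read-side section */
};

/* Models rcu_read_lock() followed by loading the published pointer. */
static struct toy_rdev *toy_read_lock(struct toy_state *s)
{
	s->active_readers++;
	return s->slot;
}

/* Models rcu_read_unlock(). */
static void toy_read_unlock(struct toy_state *s)
{
	s->active_readers--;
}

/* True if a grace period started now would have no reader to wait for. */
static bool toy_grace_period_done(const struct toy_state *s)
{
	return s->active_readers == 0;
}

/*
 * Order used with RemoveSynchronized: wait first, unpublish later.
 * Returns true if a reader still holds the old pointer although the
 * updater believes it has already synchronized.
 */
bool toy_wait_then_unpublish(void)
{
	struct toy_rdev dev = { .id = 0 };
	struct toy_state s = { .slot = &dev, .active_readers = 0 };
	struct toy_rdev *seen;
	bool synced_early;
	bool stale_reader;

	synced_early = toy_grace_period_done(&s);	/* nothing to wait for yet */
	seen = toy_read_lock(&s);			/* reader arrives afterwards */
	s.slot = NULL;					/* removal, no second wait */
	stale_reader = synced_early && seen == &dev &&
		       !toy_grace_period_done(&s);
	toy_read_unlock(&s);
	return stale_reader;
}

/*
 * Usual RCU order: unpublish first, then wait for readers.
 * Returns true if the updater had to wait for the reader and could
 * proceed only after the reader left its read-side section.
 */
bool toy_unpublish_then_wait(void)
{
	struct toy_rdev dev = { .id = 0 };
	struct toy_state s = { .slot = &dev, .active_readers = 0 };
	struct toy_rdev *seen;
	bool had_to_wait;
	bool done_after_unlock;

	seen = toy_read_lock(&s);			/* reader is already active */
	s.slot = NULL;					/* unpublish the pointer */
	had_to_wait = !toy_grace_period_done(&s);	/* reader blocks the wait */
	toy_read_unlock(&s);
	done_after_unlock = toy_grace_period_done(&s);
	return seen == &dev && had_to_wait && done_after_unlock;
}
```

In this model both scenario functions return true when called from a small
main(): waiting before the pointer is cleared leaves a reader holding the
old pointer, while clearing the pointer first forces the wait to cover that
reader.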

Signed-off-by: Yu Kuai
---
 drivers/md/md-multipath.c |  9 ---------
 drivers/md/md.c           | 37 ++++++-------------------------------
 drivers/md/raid1.c        |  9 ---------
 drivers/md/raid10.c       |  9 ---------
 drivers/md/raid5.c        |  9 ---------
 5 files changed, 6 insertions(+), 67 deletions(-)

diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
index d22276870283..aa77133f3188 100644
--- a/drivers/md/md-multipath.c
+++ b/drivers/md/md-multipath.c
@@ -258,15 +258,6 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			goto abort;
 		}
 		p->rdev = NULL;
-		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-			synchronize_rcu();
-			if (atomic_read(&rdev->nr_pending)) {
-				/* lost the race, try later */
-				err = -EBUSY;
-				p->rdev = rdev;
-				goto abort;
-			}
-		}
 		err = md_integrity_register(mddev);
 	}
 abort:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 09686d8db983..68f3bb6e89cb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9250,44 +9250,19 @@ static int remove_and_add_spares(struct mddev *mddev,
 	struct md_rdev *rdev;
 	int spares = 0;
 	int removed = 0;
-	bool remove_some = false;
 
 	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		/* Mustn't remove devices when resync thread is running */
 		return 0;
 
 	rdev_for_each(rdev, mddev) {
-		if ((this == NULL || rdev == this) &&
-		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
-		    test_bit(Faulty, &rdev->flags) &&
-		    atomic_read(&rdev->nr_pending)==0) {
-			/* Faulty non-Blocked devices with nr_pending == 0
-			 * never get nr_pending incremented,
-			 * never get Faulty cleared, and never get Blocked set.
-			 * So we can synchronize_rcu now rather than once per device
-			 */
-			remove_some = true;
-			set_bit(RemoveSynchronized, &rdev->flags);
-		}
-	}
-
-	if (remove_some)
-		synchronize_rcu();
-	rdev_for_each(rdev, mddev) {
-		if ((this == NULL || rdev == this) &&
-		    (test_bit(RemoveSynchronized, &rdev->flags) ||
-		     rdev_removeable(rdev))) {
-			if (mddev->pers->hot_remove_disk(
-				    mddev, rdev) == 0) {
-				sysfs_unlink_rdev(mddev, rdev);
-				rdev->saved_raid_disk = rdev->raid_disk;
-				rdev->raid_disk = -1;
-				removed++;
-			}
+		if ((this == NULL || rdev == this) && rdev_removeable(rdev) &&
+		    !mddev->pers->hot_remove_disk(mddev, rdev)) {
+			sysfs_unlink_rdev(mddev, rdev);
+			rdev->saved_raid_disk = rdev->raid_disk;
+			rdev->raid_disk = -1;
+			removed++;
 		}
-		if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
-			clear_bit(RemoveSynchronized, &rdev->flags);
 	}
 
 	if (removed && mddev->kobj.sd)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c13088eae401..4348d670439d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1836,15 +1836,6 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			goto abort;
 		}
 		p->rdev = NULL;
-		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-			synchronize_rcu();
-			if (atomic_read(&rdev->nr_pending)) {
-				/* lost the race, try later */
-				err = -EBUSY;
-				p->rdev = rdev;
-				goto abort;
-			}
-		}
 		if (conf->mirrors[conf->raid_disks + number].rdev) {
 			/* We just removed a device that is being replaced.
 			 * Move down the replacement.  We drain all IO before
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4b5f34f320c8..33ab00323cae 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2219,15 +2219,6 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		goto abort;
 	}
 	*rdevp = NULL;
-	if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-		synchronize_rcu();
-		if (atomic_read(&rdev->nr_pending)) {
-			/* lost the race, try later */
-			err = -EBUSY;
-			*rdevp = rdev;
-			goto abort;
-		}
-	}
 	if (p->replacement) {
 		/* We must have just cleared 'rdev' */
 		p->rdev = p->replacement;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 27a4dce51c92..a80be51b4825 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8202,15 +8202,6 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		goto abort;
 	}
 	*rdevp = NULL;
-	if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-		lockdep_assert_held(&mddev->reconfig_mutex);
-		synchronize_rcu();
-		if (atomic_read(&rdev->nr_pending)) {
-			/* lost the race, try later */
-			err = -EBUSY;
-			rcu_assign_pointer(*rdevp, rdev);
-		}
-	}
 	if (!err) {
 		err = log_modify(conf, rdev, false);
 		if (err)

From patchwork Sat Oct 21 10:20:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 156387
From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 3/6] md/raid1: remove rcu protection to access rdev from conf
Date: Sat, 21 Oct 2023 18:20:56 +0800
Message-Id: <20231021102059.3198284-4-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com>
References: <20231021102059.3198284-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Because it's safe to access rdev from conf:
 - If any spinlock is held, because synchronize_rcu() from
   md_kick_rdev_from_array() will prevent 'rdev' from being freed until
   the spinlock is released;
 - If 'reconfig_mutex' is held, because rdev can't be added to or removed
   from the array;
 - If there is normal IO inflight, because mddev_suspend() will prevent
   rdev from being added to or removed from the array;
 - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked
   in remove_and_add_spares().

And these cover all the scenarios in raid1.

Signed-off-by: Yu Kuai
---
 drivers/md/raid1.c | 57 +++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4348d670439d..5c647036663d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -609,7 +609,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 	int choose_first;
 	int choose_next_idle;
 
-	rcu_read_lock();
 	/*
 	 * Check if we can balance. We can balance on the whole
 	 * device if no resync is going on, or below the resync window.
@@ -642,7 +641,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		unsigned int pending;
 		bool nonrot;
 
-		rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		rdev = conf->mirrors[disk].rdev;
 		if (r1_bio->bios[disk] == IO_BLOCKED
 		    || rdev == NULL
 		    || test_bit(Faulty, &rdev->flags))
@@ -773,7 +772,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 	}
 
 	if (best_disk >= 0) {
-		rdev = rcu_dereference(conf->mirrors[best_disk].rdev);
+		rdev = conf->mirrors[best_disk].rdev;
 		if (!rdev)
 			goto retry;
 		atomic_inc(&rdev->nr_pending);
@@ -784,7 +783,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 
 		conf->mirrors[best_disk].next_seq_sect = this_sector + sectors;
 	}
-	rcu_read_unlock();
 	*max_sectors = sectors;
 
 	return best_disk;
@@ -1235,14 +1233,12 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 
 	if (r1bio_existed) {
 		/* Need to get the block device name carefully */
-		struct md_rdev *rdev;
-		rcu_read_lock();
-		rdev = rcu_dereference(conf->mirrors[r1_bio->read_disk].rdev);
+		struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
+
 		if (rdev)
 			snprintf(b, sizeof(b), "%pg", rdev->bdev);
 		else
 			strcpy(b, "???");
-		rcu_read_unlock();
 	}
 
 	/*
@@ -1396,10 +1392,9 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 
 	disks = conf->raid_disks * 2;
 	blocked_rdev = NULL;
-	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
 	for (i = 0; i < disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+		struct md_rdev *rdev = conf->mirrors[i].rdev;
 
 		/*
 		 * The write-behind io is only attempted on drives marked as
@@ -1465,7 +1460,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		}
 		r1_bio->bios[i] = bio;
 	}
-	rcu_read_unlock();
 
 	if (unlikely(blocked_rdev)) {
 		/* Wait for this device to become unblocked */
@@ -1617,15 +1611,16 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
 	struct r1conf *conf = mddev->private;
 	int i;
 
+	lockdep_assert_held(&mddev->lock);
+
 	seq_printf(seq, " [%d/%d] [", conf->raid_disks,
 		   conf->raid_disks - mddev->degraded);
-	rcu_read_lock();
 	for (i = 0; i < conf->raid_disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+		struct md_rdev *rdev = READ_ONCE(conf->mirrors[i].rdev);
+
 		seq_printf(seq, "%s",
 			   rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
 	}
-	rcu_read_unlock();
 	seq_printf(seq, "]");
 }
 
@@ -1785,7 +1780,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			 */
 			if (rdev->saved_raid_disk < 0)
 				conf->fullsync = 1;
-			rcu_assign_pointer(p->rdev, rdev);
+			WRITE_ONCE(p->rdev, rdev);
 			break;
 		}
 		if (test_bit(WantReplacement, &p->rdev->flags) &&
@@ -1801,7 +1796,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		rdev->raid_disk = repl_slot;
 		err = 0;
 		conf->fullsync = 1;
-		rcu_assign_pointer(p[conf->raid_disks].rdev, rdev);
+		WRITE_ONCE(p[conf->raid_disks].rdev, rdev);
 	}
 
 	return err;
@@ -1835,7 +1830,7 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			err = -EBUSY;
 			goto abort;
 		}
-		p->rdev = NULL;
+		WRITE_ONCE(p->rdev, NULL);
 		if (conf->mirrors[conf->raid_disks + number].rdev) {
 			/* We just removed a device that is being replaced.
 			 * Move down the replacement.  We drain all IO before
@@ -1856,7 +1851,7 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 				goto abort;
 			}
 			clear_bit(Replacement, &repl->flags);
-			p->rdev = repl;
+			WRITE_ONCE(p->rdev, repl);
 			conf->mirrors[conf->raid_disks + number].rdev = NULL;
 			unfreeze_array(conf);
 		}
@@ -2253,8 +2248,7 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			sector_t first_bad;
 			int bad_sectors;
 
-			rcu_read_lock();
-			rdev = rcu_dereference(conf->mirrors[d].rdev);
+			rdev = conf->mirrors[d].rdev;
 			if (rdev &&
 			    (test_bit(In_sync, &rdev->flags) ||
 			     (!test_bit(Faulty, &rdev->flags) &&
@@ -2262,15 +2256,14 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			    is_badblock(rdev, sect, s,
 					&first_bad, &bad_sectors) == 0) {
 				atomic_inc(&rdev->nr_pending);
-				rcu_read_unlock();
 				if (sync_page_io(rdev, sect, s<<9,
 					 conf->tmppage, REQ_OP_READ, false))
 					success = 1;
 				rdev_dec_pending(rdev, mddev);
 				if (success)
 					break;
-			} else
-				rcu_read_unlock();
+			}
+
 			d++;
 			if (d == conf->raid_disks * 2)
 				d = 0;
@@ -2289,29 +2282,24 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			if (d==0)
 				d = conf->raid_disks * 2;
 			d--;
-			rcu_read_lock();
-			rdev = rcu_dereference(conf->mirrors[d].rdev);
+			rdev = conf->mirrors[d].rdev;
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
-				rcu_read_unlock();
 				r1_sync_page_io(rdev, sect, s,
 						conf->tmppage, WRITE);
 				rdev_dec_pending(rdev, mddev);
-			} else
-				rcu_read_unlock();
+			}
 		}
 		d = start;
 		while (d != read_disk) {
 			if (d==0)
 				d = conf->raid_disks * 2;
 			d--;
-			rcu_read_lock();
-			rdev = rcu_dereference(conf->mirrors[d].rdev);
+			rdev = conf->mirrors[d].rdev;
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
-				rcu_read_unlock();
 				if (r1_sync_page_io(rdev, sect, s,
 						    conf->tmppage, READ)) {
 					atomic_add(s, &rdev->corrected_errors);
@@ -2322,8 +2310,7 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 						rdev->bdev);
 				}
 				rdev_dec_pending(rdev, mddev);
-			} else
-				rcu_read_unlock();
+
} } sectors -= s; sect += s; @@ -2704,7 +2691,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, r1_bio = raid1_alloc_init_r1buf(conf); - rcu_read_lock(); /* * If we get a correctably read error during resync or recovery, * we might want to read from a different device. So we @@ -2725,7 +2711,7 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, struct md_rdev *rdev; bio = r1_bio->bios[i]; - rdev = rcu_dereference(conf->mirrors[i].rdev); + rdev = conf->mirrors[i].rdev; if (rdev == NULL || test_bit(Faulty, &rdev->flags)) { if (i < conf->raid_disks) @@ -2783,7 +2769,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, bio->bi_opf |= MD_FAILFAST; } } - rcu_read_unlock(); if (disk < 0) disk = wonly; r1_bio->read_disk = disk;

From patchwork Sat Oct 21 10:20:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 156388
From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 4/6] md/raid10: remove rcu protection to access rdev from conf
Date: Sat, 21 Oct 2023 18:20:57 +0800
Message-Id: <20231021102059.3198284-5-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com>
References: <20231021102059.3198284-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Because it's safe to access rdev from conf:
 - If any spinlock is held, because synchronize_rcu() from
   md_kick_rdev_from_array() will prevent 'rdev' from being freed until the
   spinlock is released;
 - If 'reconfig_lock' is held, because rdev can't be added to or removed
   from the array;
 - If there is normal IO inflight, because mddev_suspend() will prevent
   rdev from being added to or removed from the array;
 - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked
   in remove_and_add_spares().

These cover all the scenarios in raid10.

This patch also cleans up the code that handles the case where a
replacement replaces rdev while IO is still inflight.
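
The cleanup mentioned above can be pictured with a small userspace model.
This is only a sketch, not kernel code: 'rdev_model', 'mirror_model',
pick_rdev() and promote_replacement() are hypothetical names made up for
illustration. It assumes, as argued in the list above, that the slots
cannot change while IO is inflight, so a completion path can select the
slot from the 'repl' flag alone, without falling back from 'replacement'
to 'rdev' and without memory barriers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct md_rdev and struct raid10_info. */
struct rdev_model {
	int id;
};

struct mirror_model {
	struct rdev_model *rdev;
	struct rdev_model *replacement;
};

/*
 * Select the device an IO was issued to. The slots are assumed to be
 * stable while IO is inflight, so no fallback to the other slot is needed.
 */
static struct rdev_model *pick_rdev(const struct mirror_model *m, int repl)
{
	return repl ? m->replacement : m->rdev;
}

/*
 * Model of moving the replacement into the main slot after the original
 * device was removed. Assumed to run only when no IO is inflight.
 */
static void promote_replacement(struct mirror_model *m)
{
	if (m->replacement) {
		/* The main slot must have been cleared just before this. */
		assert(m->rdev == NULL);
		m->rdev = m->replacement;
		m->replacement = NULL;
	}
}
```

In this model pick_rdev(&m, 1) returns the replacement slot and
pick_rdev(&m, 0) the main slot; after promote_replacement() the former
replacement is found in the main slot and the replacement slot is empty.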
Signed-off-by: Yu Kuai
---
 drivers/md/raid10.c | 210 ++++++++++++--------------------------------
 1 file changed, 57 insertions(+), 153 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 33ab00323cae..806a7fe2f74a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -743,7 +743,6 @@ static struct md_rdev *read_balance(struct r10conf *conf, struct geom *geo = &conf->geo; raid10_find_phys(conf, r10_bio); - rcu_read_lock(); best_dist_slot = -1; min_pending = UINT_MAX; best_dist_rdev = NULL; @@ -775,18 +774,11 @@ static struct md_rdev *read_balance(struct r10conf *conf, if (r10_bio->devs[slot].bio == IO_BLOCKED) continue; disk = r10_bio->devs[slot].devnum; - rdev = rcu_dereference(conf->mirrors[disk].replacement); + rdev = conf->mirrors[disk].replacement; if (rdev == NULL || test_bit(Faulty, &rdev->flags) || r10_bio->devs[slot].addr + sectors > - rdev->recovery_offset) { - /* - * Read replacement first to prevent reading both rdev - * and replacement as NULL during replacement replace - * rdev.
- */ - smp_mb(); - rdev = rcu_dereference(conf->mirrors[disk].rdev); - } + rdev->recovery_offset) + rdev = conf->mirrors[disk].rdev; if (rdev == NULL || test_bit(Faulty, &rdev->flags)) continue; @@ -876,7 +868,6 @@ static struct md_rdev *read_balance(struct r10conf *conf, r10_bio->read_slot = slot; } else rdev = NULL; - rcu_read_unlock(); *max_sectors = best_good_sectors; return rdev; @@ -1198,9 +1189,8 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio, */ gfp = GFP_NOIO | __GFP_HIGH; - rcu_read_lock(); disk = r10_bio->devs[slot].devnum; - err_rdev = rcu_dereference(conf->mirrors[disk].rdev); + err_rdev = conf->mirrors[disk].rdev; if (err_rdev) snprintf(b, sizeof(b), "%pg", err_rdev->bdev); else { @@ -1208,7 +1198,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio, /* This never gets dereferenced */ err_rdev = r10_bio->devs[slot].rdev; } - rcu_read_unlock(); } if (!regular_request_wait(mddev, conf, bio, r10_bio->sectors)) @@ -1279,15 +1268,8 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio, int devnum = r10_bio->devs[n_copy].devnum; struct bio *mbio; - if (replacement) { - rdev = conf->mirrors[devnum].replacement; - if (rdev == NULL) { - /* Replacement just got moved to main 'rdev' */ - smp_mb(); - rdev = conf->mirrors[devnum].rdev; - } - } else - rdev = conf->mirrors[devnum].rdev; + rdev = replacement ? conf->mirrors[devnum].replacement : + conf->mirrors[devnum].rdev; mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO, &mddev->bio_set); if (replacement) @@ -1321,25 +1303,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio, } } -static struct md_rdev *dereference_rdev_and_rrdev(struct raid10_info *mirror, - struct md_rdev **prrdev) -{ - struct md_rdev *rdev, *rrdev; - - rrdev = rcu_dereference(mirror->replacement); - /* - * Read replacement first to prevent reading both rdev and - * replacement as NULL during replacement replace rdev. 
- */ - smp_mb(); - rdev = rcu_dereference(mirror->rdev); - if (rdev == rrdev) - rrdev = NULL; - - *prrdev = rrdev; - return rdev; -} - static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio) { int i; @@ -1348,11 +1311,11 @@ static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio) retry_wait: blocked_rdev = NULL; - rcu_read_lock(); for (i = 0; i < conf->copies; i++) { struct md_rdev *rdev, *rrdev; - rdev = dereference_rdev_and_rrdev(&conf->mirrors[i], &rrdev); + rdev = conf->mirrors[i].rdev; + rrdev = conf->mirrors[i].replacement; if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) { atomic_inc(&rdev->nr_pending); blocked_rdev = rdev; @@ -1391,7 +1354,6 @@ static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio) } } } - rcu_read_unlock(); if (unlikely(blocked_rdev)) { /* Have to wait for this device to get unblocked, then retry */ @@ -1474,14 +1436,14 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio, wait_blocked_dev(mddev, r10_bio); - rcu_read_lock(); max_sectors = r10_bio->sectors; for (i = 0; i < conf->copies; i++) { int d = r10_bio->devs[i].devnum; struct md_rdev *rdev, *rrdev; - rdev = dereference_rdev_and_rrdev(&conf->mirrors[d], &rrdev); + rdev = conf->mirrors[d].rdev; + rrdev = conf->mirrors[d].replacement; if (rdev && (test_bit(Faulty, &rdev->flags))) rdev = NULL; if (rrdev && (test_bit(Faulty, &rrdev->flags))) @@ -1535,7 +1497,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio, atomic_inc(&rrdev->nr_pending); } } - rcu_read_unlock(); if (max_sectors < r10_bio->sectors) r10_bio->sectors = max_sectors; @@ -1625,17 +1586,8 @@ static void raid10_end_discard_request(struct bio *bio) set_bit(R10BIO_Uptodate, &r10_bio->state); dev = find_bio_disk(conf, r10_bio, bio, &slot, &repl); - if (repl) - rdev = conf->mirrors[dev].replacement; - if (!rdev) { - /* - * raid10_remove_disk uses smp_mb to make sure rdev is set to - * replacement before setting replacement to 
NULL. It can read - * rdev first without barrier protect even replacement is NULL - */ - smp_rmb(); - rdev = conf->mirrors[dev].rdev; - } + rdev = repl ? conf->mirrors[dev].replacement : + conf->mirrors[dev].rdev; raid_end_discard_bio(r10_bio); rdev_dec_pending(rdev, conf->mddev); @@ -1785,11 +1737,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) * inc refcount on their rdev. Record them by setting * bios[x] to bio */ - rcu_read_lock(); for (disk = 0; disk < geo->raid_disks; disk++) { struct md_rdev *rdev, *rrdev; - rdev = dereference_rdev_and_rrdev(&conf->mirrors[disk], &rrdev); + rdev = conf->mirrors[disk].rdev; + rrdev = conf->mirrors[disk].replacement; r10_bio->devs[disk].bio = NULL; r10_bio->devs[disk].repl_bio = NULL; @@ -1809,7 +1761,6 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) atomic_inc(&rrdev->nr_pending); } } - rcu_read_unlock(); atomic_set(&r10_bio->remaining, 1); for (disk = 0; disk < geo->raid_disks; disk++) { @@ -1939,6 +1890,8 @@ static void raid10_status(struct seq_file *seq, struct mddev *mddev) struct r10conf *conf = mddev->private; int i; + lockdep_assert_held(&mddev->lock); + if (conf->geo.near_copies < conf->geo.raid_disks) seq_printf(seq, " %dK chunks", mddev->chunk_sectors / 2); if (conf->geo.near_copies > 1) @@ -1953,12 +1906,11 @@ static void raid10_status(struct seq_file *seq, struct mddev *mddev) } seq_printf(seq, " [%d/%d] [", conf->geo.raid_disks, conf->geo.raid_disks - mddev->degraded); - rcu_read_lock(); for (i = 0; i < conf->geo.raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); + struct md_rdev *rdev = READ_ONCE(conf->mirrors[i].rdev); + seq_printf(seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? 
"U" : "_"); } - rcu_read_unlock(); seq_printf(seq, "]"); } @@ -1980,7 +1932,6 @@ static int _enough(struct r10conf *conf, int previous, int ignore) ncopies = conf->geo.near_copies; } - rcu_read_lock(); do { int n = conf->copies; int cnt = 0; @@ -1988,7 +1939,7 @@ static int _enough(struct r10conf *conf, int previous, int ignore) while (n--) { struct md_rdev *rdev; if (this != ignore && - (rdev = rcu_dereference(conf->mirrors[this].rdev)) && + (rdev = conf->mirrors[this].rdev) && test_bit(In_sync, &rdev->flags)) cnt++; this = (this+1) % disks; @@ -1999,7 +1950,6 @@ static int _enough(struct r10conf *conf, int previous, int ignore) } while (first != 0); has_enough = 1; out: - rcu_read_unlock(); return has_enough; } @@ -2164,7 +2114,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev) err = 0; if (rdev->saved_raid_disk != mirror) conf->fullsync = 1; - rcu_assign_pointer(p->rdev, rdev); + WRITE_ONCE(p->rdev, rdev); break; } @@ -2178,7 +2128,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev) disk_stack_limits(mddev->gendisk, rdev->bdev, rdev->data_offset << 9); conf->fullsync = 1; - rcu_assign_pointer(p->replacement, rdev); + WRITE_ONCE(p->replacement, rdev); } return err; @@ -2218,15 +2168,12 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev) err = -EBUSY; goto abort; } - *rdevp = NULL; + WRITE_ONCE(*rdevp, NULL); if (p->replacement) { /* We must have just cleared 'rdev' */ - p->rdev = p->replacement; + WRITE_ONCE(p->rdev, p->replacement); clear_bit(Replacement, &p->replacement->flags); - smp_mb(); /* Make sure other CPUs may see both as identical - * but will never see neither -- if they are careful. 
- */ - p->replacement = NULL; + WRITE_ONCE(p->replacement, NULL); } clear_bit(WantReplacement, &rdev->flags); @@ -2725,20 +2672,18 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 if (s > (PAGE_SIZE>>9)) s = PAGE_SIZE >> 9; - rcu_read_lock(); do { sector_t first_bad; int bad_sectors; d = r10_bio->devs[sl].devnum; - rdev = rcu_dereference(conf->mirrors[d].rdev); + rdev = conf->mirrors[d].rdev; if (rdev && test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags) && is_badblock(rdev, r10_bio->devs[sl].addr + sect, s, &first_bad, &bad_sectors) == 0) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); success = sync_page_io(rdev, r10_bio->devs[sl].addr + sect, @@ -2746,7 +2691,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 conf->tmppage, REQ_OP_READ, false); rdev_dec_pending(rdev, mddev); - rcu_read_lock(); if (success) break; } @@ -2754,7 +2698,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 if (sl == conf->copies) sl = 0; } while (sl != slot); - rcu_read_unlock(); if (!success) { /* Cannot read from anywhere, just mark the block @@ -2778,20 +2721,18 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 start = sl; /* write it back and re-read */ - rcu_read_lock(); while (sl != slot) { if (sl==0) sl = conf->copies; sl--; d = r10_bio->devs[sl].devnum; - rdev = rcu_dereference(conf->mirrors[d].rdev); + rdev = conf->mirrors[d].rdev; if (!rdev || test_bit(Faulty, &rdev->flags) || !test_bit(In_sync, &rdev->flags)) continue; atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); if (r10_sync_page_io(rdev, r10_bio->devs[sl].addr + sect, @@ -2810,7 +2751,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 rdev->bdev); } rdev_dec_pending(rdev, mddev); - rcu_read_lock(); } sl = start; while (sl != slot) { @@ -2818,14 +2758,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, 
struct r10 sl = conf->copies; sl--; d = r10_bio->devs[sl].devnum; - rdev = rcu_dereference(conf->mirrors[d].rdev); + rdev = conf->mirrors[d].rdev; if (!rdev || test_bit(Faulty, &rdev->flags) || !test_bit(In_sync, &rdev->flags)) continue; atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); switch (r10_sync_page_io(rdev, r10_bio->devs[sl].addr + sect, @@ -2853,9 +2792,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 } rdev_dec_pending(rdev, mddev); - rcu_read_lock(); } - rcu_read_unlock(); sectors -= s; sect += s; @@ -3329,14 +3266,13 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, /* Completed a full sync so the replacements * are now fully recovered. */ - rcu_read_lock(); for (i = 0; i < conf->geo.raid_disks; i++) { struct md_rdev *rdev = - rcu_dereference(conf->mirrors[i].replacement); + conf->mirrors[i].replacement; + if (rdev) rdev->recovery_offset = MaxSector; } - rcu_read_unlock(); } conf->fullsync = 0; } @@ -3417,9 +3353,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, struct raid10_info *mirror = &conf->mirrors[i]; struct md_rdev *mrdev, *mreplace; - rcu_read_lock(); - mrdev = rcu_dereference(mirror->rdev); - mreplace = rcu_dereference(mirror->replacement); + mrdev = mirror->rdev; + mreplace = mirror->replacement; if (mrdev && (test_bit(Faulty, &mrdev->flags) || test_bit(In_sync, &mrdev->flags))) @@ -3427,22 +3362,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, if (mreplace && test_bit(Faulty, &mreplace->flags)) mreplace = NULL; - if (!mrdev && !mreplace) { - rcu_read_unlock(); + if (!mrdev && !mreplace) continue; - } still_degraded = 0; /* want to reconstruct this device */ rb2 = r10_bio; sect = raid10_find_virt(conf, sector_nr, i); - if (sect >= mddev->resync_max_sectors) { + if (sect >= mddev->resync_max_sectors) /* last stripe is not complete - don't * try to recover this sector. 
*/ - rcu_read_unlock(); continue; - } /* Unless we are doing a full sync, or a replacement * we only need to recover the block if it is set in * the bitmap @@ -3458,14 +3389,12 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, * that there will never be anything to do here */ chunks_skipped = -1; - rcu_read_unlock(); continue; } if (mrdev) atomic_inc(&mrdev->nr_pending); if (mreplace) atomic_inc(&mreplace->nr_pending); - rcu_read_unlock(); r10_bio = raid10_alloc_init_r10buf(conf); r10_bio->state = 0; @@ -3484,10 +3413,9 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, /* Need to check if the array will still be * degraded */ - rcu_read_lock(); for (j = 0; j < conf->geo.raid_disks; j++) { - struct md_rdev *rdev = rcu_dereference( - conf->mirrors[j].rdev); + struct md_rdev *rdev = conf->mirrors[j].rdev; + if (rdev == NULL || test_bit(Faulty, &rdev->flags)) { still_degraded = 1; break; @@ -3502,8 +3430,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, int k; int d = r10_bio->devs[j].devnum; sector_t from_addr, to_addr; - struct md_rdev *rdev = - rcu_dereference(conf->mirrors[d].rdev); + struct md_rdev *rdev = conf->mirrors[d].rdev; sector_t sector, first_bad; int bad_sectors; if (!rdev || @@ -3582,7 +3509,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, atomic_inc(&r10_bio->remaining); break; } - rcu_read_unlock(); if (j == conf->copies) { /* Cannot recover, so abort the recovery or * record a bad block */ @@ -3709,12 +3635,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, bio = r10_bio->devs[i].bio; bio->bi_status = BLK_STS_IOERR; - rcu_read_lock(); - rdev = rcu_dereference(conf->mirrors[d].rdev); - if (rdev == NULL || test_bit(Faulty, &rdev->flags)) { - rcu_read_unlock(); + rdev = conf->mirrors[d].rdev; + if (rdev == NULL || test_bit(Faulty, &rdev->flags)) continue; - } + sector = r10_bio->devs[i].addr; if 
(is_badblock(rdev, sector, max_sync, &first_bad, &bad_sectors)) { @@ -3724,7 +3648,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, bad_sectors -= (sector - first_bad); if (max_sync > bad_sectors) max_sync = bad_sectors; - rcu_read_unlock(); continue; } } @@ -3740,11 +3663,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, bio_set_dev(bio, rdev->bdev); count++; - rdev = rcu_dereference(conf->mirrors[d].replacement); - if (rdev == NULL || test_bit(Faulty, &rdev->flags)) { - rcu_read_unlock(); + rdev = conf->mirrors[d].replacement; + if (rdev == NULL || test_bit(Faulty, &rdev->flags)) continue; - } + atomic_inc(&rdev->nr_pending); /* Need to set up for writing to the replacement */ @@ -3761,7 +3683,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, bio->bi_iter.bi_sector = sector + rdev->data_offset; bio_set_dev(bio, rdev->bdev); count++; - rcu_read_unlock(); } if (count < 2) { @@ -4471,11 +4392,11 @@ static int calc_degraded(struct r10conf *conf) int degraded, degraded2; int i; - rcu_read_lock(); degraded = 0; /* 'prev' section first */ for (i = 0; i < conf->prev.raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); + struct md_rdev *rdev = conf->mirrors[i].rdev; + if (!rdev || test_bit(Faulty, &rdev->flags)) degraded++; else if (!test_bit(In_sync, &rdev->flags)) @@ -4485,13 +4406,12 @@ static int calc_degraded(struct r10conf *conf) */ degraded++; } - rcu_read_unlock(); if (conf->geo.raid_disks == conf->prev.raid_disks) return degraded; - rcu_read_lock(); degraded2 = 0; for (i = 0; i < conf->geo.raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); + struct md_rdev *rdev = conf->mirrors[i].rdev; + if (!rdev || test_bit(Faulty, &rdev->flags)) degraded2++; else if (!test_bit(In_sync, &rdev->flags)) { @@ -4504,7 +4424,6 @@ static int calc_degraded(struct r10conf *conf) degraded2++; } } - rcu_read_unlock(); if 
(degraded2 > degraded) return degraded2; return degraded; @@ -4936,16 +4855,15 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, blist = read_bio; read_bio->bi_next = NULL; - rcu_read_lock(); for (s = 0; s < conf->copies*2; s++) { struct bio *b; int d = r10_bio->devs[s/2].devnum; struct md_rdev *rdev2; if (s&1) { - rdev2 = rcu_dereference(conf->mirrors[d].replacement); + rdev2 = conf->mirrors[d].replacement; b = r10_bio->devs[s/2].repl_bio; } else { - rdev2 = rcu_dereference(conf->mirrors[d].rdev); + rdev2 = conf->mirrors[d].rdev; b = r10_bio->devs[s/2].bio; } if (!rdev2 || test_bit(Faulty, &rdev2->flags)) @@ -4979,7 +4897,6 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, sector_nr += len >> 9; nr_sectors += len >> 9; } - rcu_read_unlock(); r10_bio->sectors = nr_sectors; /* Now submit the read */ @@ -5032,20 +4949,17 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio) struct bio *b; int d = r10_bio->devs[s/2].devnum; struct md_rdev *rdev; - rcu_read_lock(); if (s&1) { - rdev = rcu_dereference(conf->mirrors[d].replacement); + rdev = conf->mirrors[d].replacement; b = r10_bio->devs[s/2].repl_bio; } else { - rdev = rcu_dereference(conf->mirrors[d].rdev); + rdev = conf->mirrors[d].rdev; b = r10_bio->devs[s/2].bio; } - if (!rdev || test_bit(Faulty, &rdev->flags)) { - rcu_read_unlock(); + if (!rdev || test_bit(Faulty, &rdev->flags)) continue; - } + atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); md_sync_acct_bio(b, r10_bio->sectors); atomic_inc(&r10_bio->remaining); b->bi_next = NULL; @@ -5116,10 +5030,9 @@ static int handle_reshape_read_error(struct mddev *mddev, if (s > (PAGE_SIZE >> 9)) s = PAGE_SIZE >> 9; - rcu_read_lock(); while (!success) { int d = r10b->devs[slot].devnum; - struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev); + struct md_rdev *rdev = conf->mirrors[d].rdev; sector_t addr; if (rdev == NULL || test_bit(Faulty, &rdev->flags) || @@ -5128,14 +5041,12 @@ 
static int handle_reshape_read_error(struct mddev *mddev, addr = r10b->devs[slot].addr + idx * PAGE_SIZE; atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); success = sync_page_io(rdev, addr, s << 9, pages[idx], REQ_OP_READ, false); rdev_dec_pending(rdev, mddev); - rcu_read_lock(); if (success) break; failed: @@ -5145,7 +5056,6 @@ static int handle_reshape_read_error(struct mddev *mddev, if (slot == first_slot) break; } - rcu_read_unlock(); if (!success) { /* couldn't read this block, must give up */ set_bit(MD_RECOVERY_INTR, @@ -5171,12 +5081,8 @@ static void end_reshape_write(struct bio *bio) struct md_rdev *rdev = NULL; d = find_bio_disk(conf, r10_bio, bio, &slot, &repl); - if (repl) - rdev = conf->mirrors[d].replacement; - if (!rdev) { - smp_mb(); - rdev = conf->mirrors[d].rdev; - } + rdev = repl ? conf->mirrors[d].replacement : + conf->mirrors[d].rdev; if (bio->bi_status) { /* FIXME should record badblock */ @@ -5211,18 +5117,16 @@ static void raid10_finish_reshape(struct mddev *mddev) mddev->resync_max_sectors = mddev->array_sectors; } else { int d; - rcu_read_lock(); for (d = conf->geo.raid_disks ; d < conf->geo.raid_disks - mddev->delta_disks; d++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev); + struct md_rdev *rdev = conf->mirrors[d].rdev; if (rdev) clear_bit(In_sync, &rdev->flags); - rdev = rcu_dereference(conf->mirrors[d].replacement); + rdev = conf->mirrors[d].replacement; if (rdev) clear_bit(In_sync, &rdev->flags); } - rcu_read_unlock(); } mddev->layout = mddev->new_layout; mddev->chunk_sectors = 1 << conf->geo.chunk_shift;

From patchwork Sat Oct 21 10:20:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 156386
From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 5/6] md/raid5: remove rcu protection to access rdev from conf
Date: Sat, 21 Oct 2023 18:20:58
+0800 Message-Id: <20231021102059.3198284-6-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com> References: <20231021102059.3198284-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: gCh0CgAnt9aHNjNlZ+cUDg--.5642S9 X-Coremail-Antispam: 1UD129KBjvAXoWfCFW8Aryxtw4kZr1DJFy3CFg_yoW8tFW8Wo Z7Zwsxta1xJryvg3y7trn3tr47uayrAw1fCr15WrZ5Za92gw4Fgw13Cr45XF1UXF1fKFy7 Xr93Xw4vqF15CrZ3n29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOb7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l87I20VAvwVAaII0Ic2I_JFv_Gryl82xGYIkIc2 x26280x7IE14v26r126s0DM28IrcIa0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Y j41l84x0c7CEw4AK67xGY2AK021l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwV C0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0 Y4vEx4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64 xvF2IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j 6r4UMcvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI 8I648v4I1l42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG 67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMI IYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E 14v26r4j6F4UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJV W8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjTRKfOw UUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00,DATE_IN_FUTURE_06_12, KHOP_HELO_FCRDNS,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Fri, 20 Oct 2023 19:26:03 -0700 (PDT) 
X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780330227203121852 X-GMAIL-MSGID: 1780330227203121852 From: Yu Kuai

RCU protection is not needed here, because it is safe to access rdev from conf in all of the following cases:

- If any spinlock is held, because synchronize_rcu() in md_kick_rdev_from_array() prevents 'rdev' from being freed until the spinlock is released;
- If 'reconfig_mutex' is held, because rdev can't be added to or removed from the array;
- If there is normal IO inflight, because mddev_suspend() prevents rdev from being added to or removed from the array;
- If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares().

These cover all the scenarios in raid456.

Signed-off-by: Yu Kuai
--- drivers/md/raid5-cache.c | 11 +-- drivers/md/raid5-ppl.c | 16 +--- drivers/md/raid5.c | 182 +++++++++++++-------------------------- drivers/md/raid5.h | 4 +- 4 files changed, 69 insertions(+), 144 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 6157f5beb9fe..874874fe4fa1 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -1890,28 +1890,22 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf, continue; /* in case device is broken */ - rcu_read_lock(); - rdev = rcu_dereference(conf->disks[disk_index].rdev); + rdev = conf->disks[disk_index].rdev; if (rdev) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); sync_page_io(rdev, sh->sector, PAGE_SIZE, sh->dev[disk_index].page, REQ_OP_WRITE, false); rdev_dec_pending(rdev, rdev->mddev); - rcu_read_lock(); } - rrdev = rcu_dereference(conf->disks[disk_index].replacement); + rrdev = conf->disks[disk_index].replacement; if (rrdev) { atomic_inc(&rrdev->nr_pending); - rcu_read_unlock(); sync_page_io(rrdev, sh->sector, PAGE_SIZE, sh->dev[disk_index].page, REQ_OP_WRITE, false); rdev_dec_pending(rrdev, rrdev->mddev); - rcu_read_lock(); } - rcu_read_unlock(); } ctx->data_parity_stripes++; out: @@ -2948,7 +2942,6 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect) if (!log) return false; -
WARN_ON_ONCE(!rcu_read_lock_held()); tree_index = r5c_tree_index(conf, sect); slot = radix_tree_lookup(&log->big_stripe_tree, tree_index); return slot != NULL; diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c index eaea57aee602..da4ba736c4f0 100644 --- a/drivers/md/raid5-ppl.c +++ b/drivers/md/raid5-ppl.c @@ -620,11 +620,9 @@ static void ppl_do_flush(struct ppl_io_unit *io) struct md_rdev *rdev; struct block_device *bdev = NULL; - rcu_read_lock(); - rdev = rcu_dereference(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; if (rdev && !test_bit(Faulty, &rdev->flags)) bdev = rdev->bdev; - rcu_read_unlock(); if (bdev) { struct bio *bio; @@ -882,9 +880,7 @@ static int ppl_recover_entry(struct ppl_log *log, struct ppl_header_entry *e, (unsigned long long)r_sector, dd_idx, (unsigned long long)sector); - /* Array has not started so rcu dereference is safe */ - rdev = rcu_dereference_protected( - conf->disks[dd_idx].rdev, 1); + rdev = conf->disks[dd_idx].rdev; if (!rdev || (!test_bit(In_sync, &rdev->flags) && sector >= rdev->recovery_offset)) { pr_debug("%s:%*s data member disk %d missing\n", @@ -936,9 +932,7 @@ static int ppl_recover_entry(struct ppl_log *log, struct ppl_header_entry *e, 0, &disk, &sh); BUG_ON(sh.pd_idx != le32_to_cpu(e->parity_disk)); - /* Array has not started so rcu dereference is safe */ - parity_rdev = rcu_dereference_protected( - conf->disks[sh.pd_idx].rdev, 1); + parity_rdev = conf->disks[sh.pd_idx].rdev; BUG_ON(parity_rdev->bdev->bd_dev != log->rdev->bdev->bd_dev); pr_debug("%s:%*s write parity at sector %llu, disk %pg\n", @@ -1404,9 +1398,7 @@ int ppl_init_log(struct r5conf *conf) for (i = 0; i < ppl_conf->count; i++) { struct ppl_log *log = &ppl_conf->child_logs[i]; - /* Array has not started so rcu dereference is safe */ - struct md_rdev *rdev = - rcu_dereference_protected(conf->disks[i].rdev, 1); + struct md_rdev *rdev = conf->disks[i].rdev; mutex_init(&log->io_mutex); spin_lock_init(&log->io_list_lock); diff --git 
a/drivers/md/raid5.c b/drivers/md/raid5.c index a80be51b4825..ad6d5138a6bd 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -692,12 +692,12 @@ int raid5_calc_degraded(struct r5conf *conf) int degraded, degraded2; int i; - rcu_read_lock(); degraded = 0; for (i = 0; i < conf->previous_raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev = READ_ONCE(conf->disks[i].rdev); + if (rdev && test_bit(Faulty, &rdev->flags)) - rdev = rcu_dereference(conf->disks[i].replacement); + rdev = READ_ONCE(conf->disks[i].replacement); if (!rdev || test_bit(Faulty, &rdev->flags)) degraded++; else if (test_bit(In_sync, &rdev->flags)) @@ -715,15 +715,14 @@ int raid5_calc_degraded(struct r5conf *conf) if (conf->raid_disks >= conf->previous_raid_disks) degraded++; } - rcu_read_unlock(); if (conf->raid_disks == conf->previous_raid_disks) return degraded; - rcu_read_lock(); degraded2 = 0; for (i = 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev = READ_ONCE(conf->disks[i].rdev); + if (rdev && test_bit(Faulty, &rdev->flags)) - rdev = rcu_dereference(conf->disks[i].replacement); + rdev = READ_ONCE(conf->disks[i].replacement); if (!rdev || test_bit(Faulty, &rdev->flags)) degraded2++; else if (test_bit(In_sync, &rdev->flags)) @@ -737,7 +736,6 @@ int raid5_calc_degraded(struct r5conf *conf) if (conf->raid_disks <= conf->previous_raid_disks) degraded2++; } - rcu_read_unlock(); if (degraded2 > degraded) return degraded2; return degraded; @@ -1175,14 +1173,8 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) bi = &dev->req; rbi = &dev->rreq; /* For writing to replacement */ - rcu_read_lock(); - rrdev = rcu_dereference(conf->disks[i].replacement); - smp_mb(); /* Ensure that if rrdev is NULL, rdev won't be */ - rdev = rcu_dereference(conf->disks[i].rdev); - if (!rdev) { - rdev = rrdev; - rrdev = NULL; - } + rdev = conf->disks[i].rdev; + rrdev = 
conf->disks[i].replacement; if (op_is_write(op)) { if (replace_only) rdev = NULL; @@ -1203,7 +1195,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) rrdev = NULL; if (rrdev) atomic_inc(&rrdev->nr_pending); - rcu_read_unlock(); /* We have already checked bad blocks for reads. Now * need to check for writes. We never accept write errors @@ -2722,28 +2713,6 @@ static void shrink_stripes(struct r5conf *conf) conf->slab_cache = NULL; } -/* - * This helper wraps rcu_dereference_protected() and can be used when - * it is known that the nr_pending of the rdev is elevated. - */ -static struct md_rdev *rdev_pend_deref(struct md_rdev __rcu *rdev) -{ - return rcu_dereference_protected(rdev, - atomic_read(&rcu_access_pointer(rdev)->nr_pending)); -} - -/* - * This helper wraps rcu_dereference_protected() and should be used - * when it is known that the mddev_lock() is held. This is safe - * seeing raid5_remove_disk() has the same lock held. - */ -static struct md_rdev *rdev_mdlock_deref(struct mddev *mddev, - struct md_rdev __rcu *rdev) -{ - return rcu_dereference_protected(rdev, - lockdep_is_held(&mddev->reconfig_mutex)); -} - static void raid5_end_read_request(struct bio * bi) { struct stripe_head *sh = bi->bi_private; @@ -2769,9 +2738,9 @@ static void raid5_end_read_request(struct bio * bi) * In that case it moved down to 'rdev'. * rdev is not removed until all requests are finished. 
*/ - rdev = rdev_pend_deref(conf->disks[i].replacement); + rdev = conf->disks[i].replacement; if (!rdev) - rdev = rdev_pend_deref(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; if (use_new_offset(conf, sh)) s = sh->sector + rdev->new_data_offset; @@ -2884,11 +2853,11 @@ static void raid5_end_write_request(struct bio *bi) for (i = 0 ; i < disks; i++) { if (bi == &sh->dev[i].req) { - rdev = rdev_pend_deref(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; break; } if (bi == &sh->dev[i].rreq) { - rdev = rdev_pend_deref(conf->disks[i].replacement); + rdev = conf->disks[i].replacement; if (rdev) replacement = 1; else @@ -2896,7 +2865,7 @@ static void raid5_end_write_request(struct bio *bi) * replaced it. rdev is not removed * until all requests are finished. */ - rdev = rdev_pend_deref(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; break; } } @@ -3658,15 +3627,13 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, int bitmap_end = 0; if (test_bit(R5_ReadError, &sh->dev[i].flags)) { - struct md_rdev *rdev; - rcu_read_lock(); - rdev = rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev = conf->disks[i].rdev; + if (rdev && test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags)) atomic_inc(&rdev->nr_pending); else rdev = NULL; - rcu_read_unlock(); if (rdev) { if (!rdev_set_badblocks( rdev, @@ -3784,16 +3751,17 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh, /* During recovery devices cannot be removed, so * locking and refcounting of rdevs is not needed */ - rcu_read_lock(); for (i = 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev = conf->disks[i].rdev; + if (rdev && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && !rdev_set_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0)) abort = 1; - rdev = rcu_dereference(conf->disks[i].replacement); + rdev = conf->disks[i].replacement; + if (rdev && !test_bit(Faulty, 
&rdev->flags) && !test_bit(In_sync, &rdev->flags) @@ -3801,7 +3769,6 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh, RAID5_STRIPE_SECTORS(conf), 0)) abort = 1; } - rcu_read_unlock(); if (abort) conf->recovery_disabled = conf->mddev->recovery_disabled; @@ -3814,15 +3781,13 @@ static int want_replace(struct stripe_head *sh, int disk_idx) struct md_rdev *rdev; int rv = 0; - rcu_read_lock(); - rdev = rcu_dereference(sh->raid_conf->disks[disk_idx].replacement); + rdev = sh->raid_conf->disks[disk_idx].replacement; if (rdev && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && (rdev->recovery_offset <= sh->sector || rdev->mddev->recovery_cp <= sh->sector)) rv = 1; - rcu_read_unlock(); return rv; } @@ -4699,7 +4664,6 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) s->log_failed = r5l_log_disk_error(conf); /* Now to look around and see what can be done */ - rcu_read_lock(); for (i=disks; i--; ) { struct md_rdev *rdev; sector_t first_bad; @@ -4744,7 +4708,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) /* Prefer to use the replacement for reads, but only * if it is recovered enough and has no bad blocks. 
*/ - rdev = rcu_dereference(conf->disks[i].replacement); + rdev = conf->disks[i].replacement; if (rdev && !test_bit(Faulty, &rdev->flags) && rdev->recovery_offset >= sh->sector + RAID5_STRIPE_SECTORS(conf) && !is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), @@ -4755,7 +4719,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) set_bit(R5_NeedReplace, &dev->flags); else clear_bit(R5_NeedReplace, &dev->flags); - rdev = rcu_dereference(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; clear_bit(R5_ReadRepl, &dev->flags); } if (rdev && test_bit(Faulty, &rdev->flags)) @@ -4802,8 +4766,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) if (test_bit(R5_WriteError, &dev->flags)) { /* This flag does not apply to '.replacement' * only to .rdev, so make sure to check that*/ - struct md_rdev *rdev2 = rcu_dereference( - conf->disks[i].rdev); + struct md_rdev *rdev2 = conf->disks[i].rdev; + if (rdev2 == rdev) clear_bit(R5_Insync, &dev->flags); if (rdev2 && !test_bit(Faulty, &rdev2->flags)) { @@ -4815,8 +4779,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) if (test_bit(R5_MadeGood, &dev->flags)) { /* This flag does not apply to '.replacement' * only to .rdev, so make sure to check that*/ - struct md_rdev *rdev2 = rcu_dereference( - conf->disks[i].rdev); + struct md_rdev *rdev2 = conf->disks[i].rdev; + if (rdev2 && !test_bit(Faulty, &rdev2->flags)) { s->handle_bad_blocks = 1; atomic_inc(&rdev2->nr_pending); @@ -4824,8 +4788,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) clear_bit(R5_MadeGood, &dev->flags); } if (test_bit(R5_MadeGoodRepl, &dev->flags)) { - struct md_rdev *rdev2 = rcu_dereference( - conf->disks[i].replacement); + struct md_rdev *rdev2 = conf->disks[i].replacement; + if (rdev2 && !test_bit(Faulty, &rdev2->flags)) { s->handle_bad_blocks = 1; atomic_inc(&rdev2->nr_pending); @@ -4846,8 +4810,7 @@ static void 
analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) if (rdev && !test_bit(Faulty, &rdev->flags)) do_recovery = 1; else if (!rdev) { - rdev = rcu_dereference( - conf->disks[i].replacement); + rdev = conf->disks[i].replacement; if (rdev && !test_bit(Faulty, &rdev->flags)) do_recovery = 1; } @@ -4874,7 +4837,6 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) else s->replacing = 1; } - rcu_read_unlock(); } /* @@ -5331,23 +5293,23 @@ static void handle_stripe(struct stripe_head *sh) struct r5dev *dev = &sh->dev[i]; if (test_and_clear_bit(R5_WriteError, &dev->flags)) { /* We own a safe reference to the rdev */ - rdev = rdev_pend_deref(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; if (!rdev_set_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0)) md_error(conf->mddev, rdev); rdev_dec_pending(rdev, conf->mddev); } if (test_and_clear_bit(R5_MadeGood, &dev->flags)) { - rdev = rdev_pend_deref(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; rdev_clear_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0); rdev_dec_pending(rdev, conf->mddev); } if (test_and_clear_bit(R5_MadeGoodRepl, &dev->flags)) { - rdev = rdev_pend_deref(conf->disks[i].replacement); + rdev = conf->disks[i].replacement; if (!rdev) /* rdev have been moved down */ - rdev = rdev_pend_deref(conf->disks[i].rdev); + rdev = conf->disks[i].rdev; rdev_clear_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0); rdev_dec_pending(rdev, conf->mddev); @@ -5506,24 +5468,22 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio) &dd_idx, NULL); end_sector = sector + bio_sectors(raid_bio); - rcu_read_lock(); if (r5c_big_stripe_cached(conf, sector)) - goto out_rcu_unlock; + return 0; - rdev = rcu_dereference(conf->disks[dd_idx].replacement); + rdev = conf->disks[dd_idx].replacement; if (!rdev || test_bit(Faulty, &rdev->flags) || rdev->recovery_offset < end_sector) { - rdev = rcu_dereference(conf->disks[dd_idx].rdev); + rdev = 
conf->disks[dd_idx].rdev; if (!rdev) - goto out_rcu_unlock; + return 0; if (test_bit(Faulty, &rdev->flags) || !(test_bit(In_sync, &rdev->flags) || rdev->recovery_offset >= end_sector)) - goto out_rcu_unlock; + return 0; } atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad, &bad_sectors)) { @@ -5567,10 +5527,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio) raid_bio->bi_iter.bi_sector); submit_bio_noacct(align_bio); return 1; - -out_rcu_unlock: - rcu_read_unlock(); - return 0; } static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio) @@ -6573,14 +6529,12 @@ static inline sector_t raid5_sync_request(struct mddev *mddev, sector_t sector_n * Note in case of > 1 drive failures it's possible we're rebuilding * one drive while leaving another faulty drive in array. */ - rcu_read_lock(); for (i = 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev = conf->disks[i].rdev; if (rdev == NULL || test_bit(Faulty, &rdev->flags)) still_degraded = 1; } - rcu_read_unlock(); md_bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degraded); @@ -7898,18 +7852,10 @@ static int raid5_run(struct mddev *mddev) for (i = 0; i < conf->raid_disks && conf->previous_raid_disks; i++) { - rdev = rdev_mdlock_deref(mddev, conf->disks[i].rdev); - if (!rdev && conf->disks[i].replacement) { - /* The replacement is all we have yet */ - rdev = rdev_mdlock_deref(mddev, - conf->disks[i].replacement); - conf->disks[i].replacement = NULL; - clear_bit(Replacement, &rdev->flags); - rcu_assign_pointer(conf->disks[i].rdev, rdev); - } + rdev = conf->disks[i].rdev; if (!rdev) continue; - if (rcu_access_pointer(conf->disks[i].replacement) && + if (conf->disks[i].replacement && conf->reshape_progress != MaxSector) { /* replacements and reshape simply do not mix. 
*/ pr_warn("md: cannot handle concurrent replacement and reshape.\n"); @@ -8090,15 +8036,16 @@ static void raid5_status(struct seq_file *seq, struct mddev *mddev) struct r5conf *conf = mddev->private; int i; + lockdep_assert_held(&mddev->lock); + seq_printf(seq, " level %d, %dk chunk, algorithm %d", mddev->level, conf->chunk_sectors / 2, mddev->layout); seq_printf (seq, " [%d/%d] [", conf->raid_disks, conf->raid_disks - mddev->degraded); - rcu_read_lock(); for (i = 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev = READ_ONCE(conf->disks[i].rdev); + seq_printf (seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_"); } - rcu_read_unlock(); seq_printf (seq, "]"); } @@ -8111,9 +8058,8 @@ static int raid5_spare_active(struct mddev *mddev) unsigned long flags; for (i = 0; i < conf->raid_disks; i++) { - rdev = rdev_mdlock_deref(mddev, conf->disks[i].rdev); - replacement = rdev_mdlock_deref(mddev, - conf->disks[i].replacement); + rdev = conf->disks[i].rdev; + replacement = conf->disks[i].replacement; if (replacement && replacement->recovery_offset == MaxSector && !test_bit(Faulty, &replacement->flags) @@ -8151,7 +8097,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev) struct r5conf *conf = mddev->private; int err = 0; int number = rdev->raid_disk; - struct md_rdev __rcu **rdevp; + struct md_rdev **rdevp; struct disk_info *p; struct md_rdev *tmp; @@ -8173,9 +8119,9 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev) if (unlikely(number >= conf->pool_size)) return 0; p = conf->disks + number; - if (rdev == rcu_access_pointer(p->rdev)) + if (rdev == p->rdev) rdevp = &p->rdev; - else if (rdev == rcu_access_pointer(p->replacement)) + else if (rdev == p->replacement) rdevp = &p->replacement; else return 0; @@ -8195,28 +8141,24 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev) if (!test_bit(Faulty, &rdev->flags) && 
mddev->recovery_disabled != conf->recovery_disabled && !has_failed(conf) && - (!rcu_access_pointer(p->replacement) || - rcu_access_pointer(p->replacement) == rdev) && + (!p->replacement || p->replacement == rdev) && number < conf->raid_disks) { err = -EBUSY; goto abort; } - *rdevp = NULL; + WRITE_ONCE(*rdevp, NULL); if (!err) { err = log_modify(conf, rdev, false); if (err) goto abort; } - tmp = rcu_access_pointer(p->replacement); + tmp = p->replacement; if (tmp) { /* We must have just cleared 'rdev' */ - rcu_assign_pointer(p->rdev, tmp); + WRITE_ONCE(p->rdev, tmp); clear_bit(Replacement, &tmp->flags); - smp_mb(); /* Make sure other CPUs may see both as identical - * but will never see neither - if they are careful - */ - rcu_assign_pointer(p->replacement, NULL); + WRITE_ONCE(p->replacement, NULL); if (!err) err = log_modify(conf, tmp, true); @@ -8283,7 +8225,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev) rdev->raid_disk = disk; if (rdev->saved_raid_disk != disk) conf->fullsync = 1; - rcu_assign_pointer(p->rdev, rdev); + WRITE_ONCE(p->rdev, rdev); err = log_modify(conf, rdev, true); @@ -8292,7 +8234,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev) } for (disk = first; disk <= last; disk++) { p = conf->disks + disk; - tmp = rdev_mdlock_deref(mddev, p->rdev); + tmp = p->rdev; if (test_bit(WantReplacement, &tmp->flags) && mddev->reshape_position == MaxSector && p->replacement == NULL) { @@ -8301,7 +8243,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev) rdev->raid_disk = disk; err = 0; conf->fullsync = 1; - rcu_assign_pointer(p->replacement, rdev); + WRITE_ONCE(p->replacement, rdev); break; } } @@ -8433,7 +8375,7 @@ static int raid5_start_reshape(struct mddev *mddev) if (mddev->recovery_cp < MaxSector) return -EBUSY; for (i = 0; i < conf->raid_disks; i++) - if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) + if (conf->disks[i].replacement) return -EBUSY; rdev_for_each(rdev, mddev) { @@ 
-8604,12 +8546,10 @@ static void raid5_finish_reshape(struct mddev *mddev) for (d = conf->raid_disks ; d < conf->raid_disks - mddev->delta_disks; d++) { - rdev = rdev_mdlock_deref(mddev, - conf->disks[d].rdev); + rdev = conf->disks[d].rdev; if (rdev) clear_bit(In_sync, &rdev->flags); - rdev = rdev_mdlock_deref(mddev, - conf->disks[d].replacement); + rdev = conf->disks[d].replacement; if (rdev) clear_bit(In_sync, &rdev->flags); } diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 97a795979a35..9163c8cefb3f 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -473,8 +473,8 @@ enum { */ struct disk_info { - struct md_rdev __rcu *rdev; - struct md_rdev __rcu *replacement; + struct md_rdev *rdev; + struct md_rdev *replacement; struct page *extra_page; /* extra page to use in prexor */ }; From patchwork Sat Oct 21 10:20:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 156384 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce89:0:b0:403:3b70:6f57 with SMTP id p9csp83698vqx; Fri, 20 Oct 2023 19:26:16 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHLgmM8UvZS4vWjeNGyq5IBX3EVctQpCYdZRvZHDY2aLiWhySYIdUDBZpmKfwwfxf7KflxH X-Received: by 2002:a05:6a00:4c86:b0:6bf:15fb:4b32 with SMTP id eb6-20020a056a004c8600b006bf15fb4b32mr6644691pfb.8.1697855175964; Fri, 20 Oct 2023 19:26:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697855175; cv=none; d=google.com; s=arc-20160816; b=i0sNhAOSbZj5XhRpUsp00Ti91x39NHNJLkptt0wq69TRj0u7MkC7sw+Cx8LsI7e/DN UT2gN3HKy4Kt49KV1R2E4yoBIf9wIfuvirIm/xPJCeCeMZXLayySPIOkBAXLHvakU4rz Qnd1cJimwl3oeolLuEyyH9MoNt+hWBmabyyt5gnJiD6I2ao1aJmi5gG51XBKfM59UbXK q2+HatqnOBB50K2x3SJdmYqfWh+JdICqssxfWnzonIYAVnP+wG/y20+CNQGgmZR2N2D/ ZOAZXOWt1yzPwJJruVG9lUJ8BNaE6dqC94fBqTshHeLixK7SCSq30AXWKrHD+4L7UTLL smzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=75tQ2zZudm29Ad7dA9n8sTEU+NJbUgh0Fh9U+0WKGE0=; fh=d9c5gHOb3LccGp7KhO2PTidd9oOuOSBo1OJ/8DRRGxA=; b=Lyaw3IFAIk2el9a9MXP8jCtQkZPeidAZBE3nU7jhuDd5m7Ifl+McCdSbaGK9wyrMxf EG4Ua9wQBKrPPaOLpZoDhHO1rNOlW/Ul/OYJ9YuMy3k/WoYdqvr+WM2nn8ZExvYiyt4J GAmgoE58wwcDdSyMHJVCw5aP81+fiHjgfspJ2mkm18Qw5cTK5CJTEB6vu1A1/50xqaYr oMO0n+cjtMZj61k/NibFYeH35fY5t8gpNol0Ia06LevbJ+4HYdhwBTuHtDWUjeI6Cxma XF8mu1q0hkjh7hWkeEQfhLIRzRfFP6+4zVXOWREQqYiuuBCHAgdoVbWgZtG/kPeUsyOf 1hBg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from agentk.vger.email (agentk.vger.email. [2620:137:e000::3:2]) by mx.google.com with ESMTPS id ch1-20020a056a0208c100b005b4ef9f02desi2761743pgb.796.2023.10.20.19.26.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Oct 2023 19:26:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) client-ip=2620:137:e000::3:2; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id C0D7581D2B7A; Fri, 20 Oct 2023 19:26:11 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231184AbjJUCZe (ORCPT + 26 others); Fri, 20 Oct 2023 22:25:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35660 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231201AbjJUCZV 
(ORCPT ); Fri, 20 Oct 2023 22:25:21 -0400 Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 91A4DD7D; Fri, 20 Oct 2023 19:25:18 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4SC4zn4G3Rz4f3kG1; Sat, 21 Oct 2023 10:25:09 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAnt9aHNjNlZ+cUDg--.5642S10; Sat, 21 Oct 2023 10:25:15 +0800 (CST) From: Yu Kuai To: song@kernel.org Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH -next v2 6/6] md/md-multipath: remove rcu protection to access rdev from conf Date: Sat, 21 Oct 2023 18:20:59 +0800 Message-Id: <20231021102059.3198284-7-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com> References: <20231021102059.3198284-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: gCh0CgAnt9aHNjNlZ+cUDg--.5642S10 X-Coremail-Antispam: 1UD129KBjvJXoWxXw1kJF4UZr43uw17Wr17ZFb_yoW5Jw4kpa yaqasxtr4UXryakrnFka1Uua4Skw43tFWIkryfC3yIva15Gry5XF1rtryUXFn5AFZ5AF45 XFn8Kw4DAFyxGaUanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPY14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xv wVC2z280aVCY1x0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFc xC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_ Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2 IErcIFxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 
14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_JF0_Jw1lIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r 4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRvJPtU UUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-0.7 required=5.0 tests=DATE_IN_FUTURE_06_12, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Fri, 20 Oct 2023 19:26:11 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780330189156287810 X-GMAIL-MSGID: 1780330189156287810 From: Yu Kuai

RCU protection is not needed here, because it is safe to access rdev from conf in both of the following cases:

- If any spinlock is held, because synchronize_rcu() in md_kick_rdev_from_array() prevents 'rdev' from being freed until the spinlock is released;
- If there is normal IO inflight, because mddev_suspend() prevents rdev from being added to or removed from the array.

These cover all the scenarios in md-multipath.

Signed-off-by: Yu Kuai
--- drivers/md/md-multipath.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c index aa77133f3188..51c5390c517d 100644 --- a/drivers/md/md-multipath.c +++ b/drivers/md/md-multipath.c @@ -32,17 +32,15 @@ static int multipath_map (struct mpconf *conf) * now we use the first available disk.
*/ - rcu_read_lock(); for (i = 0; i < disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->multipaths[i].rdev); + struct md_rdev *rdev = conf->multipaths[i].rdev; + if (rdev && test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags)) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); return i; } } - rcu_read_unlock(); pr_crit_ratelimited("multipath_map(): no more operational IO paths?\n"); return (-1); @@ -137,14 +135,16 @@ static void multipath_status(struct seq_file *seq, struct mddev *mddev) struct mpconf *conf = mddev->private; int i; + lockdep_assert_held(&mddev->lock); + seq_printf (seq, " [%d/%d] [", conf->raid_disks, conf->raid_disks - mddev->degraded); - rcu_read_lock(); for (i = 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->multipaths[i].rdev); - seq_printf (seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_"); + struct md_rdev *rdev = READ_ONCE(conf->multipaths[i].rdev); + + seq_printf(seq, "%s", + rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_"); } - rcu_read_unlock(); seq_putc(seq, ']'); } @@ -231,7 +231,7 @@ static int multipath_add_disk(struct mddev *mddev, struct md_rdev *rdev) rdev->raid_disk = path; set_bit(In_sync, &rdev->flags); spin_unlock_irq(&conf->device_lock); - rcu_assign_pointer(p->rdev, rdev); + WRITE_ONCE(p->rdev, rdev); err = 0; break; } @@ -257,7 +257,7 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev) err = -EBUSY; goto abort; } - p->rdev = NULL; + WRITE_ONCE(p->rdev, NULL); err = md_integrity_register(mddev); } abort: