| Message ID | 20230322035926.1791317-1-yukuai1@huaweicloud.com |
|---|---|
| State | New |
| Headers | From: Yu Kuai <yukuai1@huaweicloud.com> To: ming.lei@redhat.com, jack@suse.cz, hch@infradead.org, axboe@kernel.dk, yukuai3@huawei.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH] block: don't set GD_NEED_PART_SCAN if scan partition failed Date: Wed, 22 Mar 2023 11:59:26 +0800 Message-Id: <20230322035926.1791317-1-yukuai1@huaweicloud.com> In-Reply-To: <ZBmYcuVzpDDTiaP+@ovpn-8-18.pek2.redhat.com> References: <ZBmYcuVzpDDTiaP+@ovpn-8-18.pek2.redhat.com> |
| Series | block: don't set GD_NEED_PART_SCAN if scan partition failed |
Commit Message
Yu Kuai
March 22, 2023, 3:59 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Currently, if disk_scan_partitions() fails, GD_NEED_PART_SCAN will still be set, and the partition scan will proceed again when blkdev_get_by_dev() is called. However, this causes a problem: re-assembling a partitioned raid device will create partitions for the underlying disks.

Test procedure:

mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb

Test result: the underlying disk partitions and the raid partition can be observed at the same time.

Note that this can still happen in some corner cases where GD_NEED_PART_SCAN is set for the underlying disk while re-assembling the raid device.

Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/genhd.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
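For context on the test procedure: blockdev --rereadpt is a thin wrapper around the BLKRRPART ioctl, which is what ends up in disk_scan_partitions(). A minimal C equivalent is sketched below (illustrative only; the device path is just an example and the program needs CAP_SYS_ADMIN):

/* Rough C equivalent of "blockdev --rereadpt /dev/sda" (illustrative only). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKRRPART */

int main(void)
{
        int fd = open("/dev/sda", O_RDONLY);

        if (fd < 0 || ioctl(fd, BLKRRPART) < 0) {
                perror("rereadpt");
                return 1;
        }
        close(fd);
        return 0;
}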
Comments
On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > set, and partition scan will be proceed again when blkdev_get_by_dev() > is called. However, this will cause a problem that re-assemble partitioned > raid device will creat partition for underlying disk. > > Test procedure: > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > sgdisk -n 0:0:+100MiB /dev/md0 > blockdev --rereadpt /dev/sda > blockdev --rereadpt /dev/sdb > mdadm -S /dev/md0 > mdadm -A /dev/md0 /dev/sda /dev/sdb > > Test result: underlying disk partition and raid partition can be > observed at the same time > > Note that this can still happen in come corner cases that > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > device. > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > Signed-off-by: Yu Kuai <yukuai3@huawei.com> The issue still can't be avoided completely, such as, after rebooting, /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one should be underlying partitions scanned before re-assembling raid, I guess it may not be easy to avoid. Also seems the following change added in e5cfefa97bcc isn't necessary: /* Make sure the first partition scan will be proceed */ if (get_capacity(disk) && !(disk->flags & GENHD_FL_NO_PART) && !test_bit(GD_SUPPRESS_PART_SCAN, &disk->state)) set_bit(GD_NEED_PART_SCAN, &disk->state); since the following disk_scan_partitions() in device_add_disk() should cover partitions scan. > --- > block/genhd.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 08bb1a9ec22c..a72e27d6779d 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -368,7 +368,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > if (disk->open_partitions) > return -EBUSY; > > - set_bit(GD_NEED_PART_SCAN, &disk->state); > /* > * If the device is opened exclusively by current thread already, it's > * safe to scan partitons, otherwise, use bd_prepare_to_claim() to > @@ -381,12 +380,19 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > return ret; > } > > + set_bit(GD_NEED_PART_SCAN, &disk->state); > bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL); > if (IS_ERR(bdev)) > ret = PTR_ERR(bdev); > else > blkdev_put(bdev, mode & ~FMODE_EXCL); > > + /* > + * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set, > + * and this will cause that re-assemble partitioned raid device will > + * creat partition for underlying disk. > + */ > + clear_bit(GD_NEED_PART_SCAN, &disk->state); I feel GD_NEED_PART_SCAN becomes a bit hard to follow. So far, it is only consumed by blkdev_get_whole(), and cleared in bdev_disk_changed(). That means partition scan can be retried if bdev_disk_changed() fails. Another mess is that more drivers start to touch this flag, such as nbd/sd, probably it is better to change them into one API of blk_disk_need_partition_scan(), and hide implementation detail to drivers. thanks, Ming
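A helper like the one Ming mentions does not exist at this point; purely as an illustration of the idea (the name and placement are hypothetical), drivers such as nbd/sd could go through one guarded entry point instead of setting the bit directly, roughly:

/*
 * Hypothetical sketch only -- not an existing kernel API.  It merely
 * illustrates hiding GD_NEED_PART_SCAN behind a single helper; the guard
 * mirrors the check from commit e5cfefa97bcc quoted above.
 */
static inline void blk_disk_request_partition_scan(struct gendisk *disk)
{
	if (get_capacity(disk) && !(disk->flags & GENHD_FL_NO_PART) &&
	    !test_bit(GD_SUPPRESS_PART_SCAN, &disk->state))
		set_bit(GD_NEED_PART_SCAN, &disk->state);
}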
Hi, Ming 在 2023/03/22 15:58, Ming Lei 写道: > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: >> From: Yu Kuai <yukuai3@huawei.com> >> >> Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still >> set, and partition scan will be proceed again when blkdev_get_by_dev() >> is called. However, this will cause a problem that re-assemble partitioned >> raid device will creat partition for underlying disk. >> >> Test procedure: >> >> mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 >> sgdisk -n 0:0:+100MiB /dev/md0 >> blockdev --rereadpt /dev/sda >> blockdev --rereadpt /dev/sdb >> mdadm -S /dev/md0 >> mdadm -A /dev/md0 /dev/sda /dev/sdb >> >> Test result: underlying disk partition and raid partition can be >> observed at the same time >> >> Note that this can still happen in come corner cases that >> GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid >> device. >> >> Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > The issue still can't be avoided completely, such as, after rebooting, > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > should be underlying partitions scanned before re-assembling raid, I > guess it may not be easy to avoid. Yes, this is possible and I don't know how to fix this yet... > > Also seems the following change added in e5cfefa97bcc isn't necessary: > > /* Make sure the first partition scan will be proceed */ > if (get_capacity(disk) && !(disk->flags & GENHD_FL_NO_PART) && > !test_bit(GD_SUPPRESS_PART_SCAN, &disk->state)) > set_bit(GD_NEED_PART_SCAN, &disk->state); > > since the following disk_scan_partitions() in device_add_disk() should cover > partitions scan. This can't be guaranteed if someone else open the device excl after bdev_add and before disk_scan_partitions: t1: t2: device_add_disk bdev_add insert_inode_hash // open device excl disk_scan_partitions // will fail However, this is just in theory, and it's unlikely to happen in practice. Thanks, Kuai > >> --- >> block/genhd.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/block/genhd.c b/block/genhd.c >> index 08bb1a9ec22c..a72e27d6779d 100644 >> --- a/block/genhd.c >> +++ b/block/genhd.c >> @@ -368,7 +368,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) >> if (disk->open_partitions) >> return -EBUSY; >> >> - set_bit(GD_NEED_PART_SCAN, &disk->state); >> /* >> * If the device is opened exclusively by current thread already, it's >> * safe to scan partitons, otherwise, use bd_prepare_to_claim() to >> @@ -381,12 +380,19 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) >> return ret; >> } >> >> + set_bit(GD_NEED_PART_SCAN, &disk->state); >> bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL); >> if (IS_ERR(bdev)) >> ret = PTR_ERR(bdev); >> else >> blkdev_put(bdev, mode & ~FMODE_EXCL); >> >> + /* >> + * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set, >> + * and this will cause that re-assemble partitioned raid device will >> + * creat partition for underlying disk. >> + */ >> + clear_bit(GD_NEED_PART_SCAN, &disk->state); > > I feel GD_NEED_PART_SCAN becomes a bit hard to follow. > > So far, it is only consumed by blkdev_get_whole(), and cleared in > bdev_disk_changed(). That means partition scan can be retried > if bdev_disk_changed() fails. 
> > Another mess is that more drivers start to touch this flag, such as > nbd/sd, probably it is better to change them into one API of > blk_disk_need_partition_scan(), and hide implementation detail > to drivers. > > > thanks, > Ming > > . >
On Wed 22-03-23 15:58:35, Ming Lei wrote: > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > > From: Yu Kuai <yukuai3@huawei.com> > > > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > > set, and partition scan will be proceed again when blkdev_get_by_dev() > > is called. However, this will cause a problem that re-assemble partitioned > > raid device will creat partition for underlying disk. > > > > Test procedure: > > > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > > sgdisk -n 0:0:+100MiB /dev/md0 > > blockdev --rereadpt /dev/sda > > blockdev --rereadpt /dev/sdb > > mdadm -S /dev/md0 > > mdadm -A /dev/md0 /dev/sda /dev/sdb > > > > Test result: underlying disk partition and raid partition can be > > observed at the same time > > > > Note that this can still happen in come corner cases that > > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > > device. > > > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > The issue still can't be avoided completely, such as, after rebooting, > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > should be underlying partitions scanned before re-assembling raid, I > guess it may not be easy to avoid. So this was always happening (before my patches, after my patches, and now after Yu's patches) and kernel does not have enough information to know that sda will become part of md0 device in the future. But mdadm actually deals with this as far as I remember and deletes partitions for all devices it is assembling the array from (and quick tracing experiment I did supports this). Honza
On Wed 22-03-23 11:59:26, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > set, and partition scan will be proceed again when blkdev_get_by_dev() > is called. However, this will cause a problem that re-assemble partitioned > raid device will creat partition for underlying disk. > > Test procedure: > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > sgdisk -n 0:0:+100MiB /dev/md0 > blockdev --rereadpt /dev/sda > blockdev --rereadpt /dev/sdb > mdadm -S /dev/md0 > mdadm -A /dev/md0 /dev/sda /dev/sdb > > Test result: underlying disk partition and raid partition can be > observed at the same time > > Note that this can still happen in come corner cases that > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > device. > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > Signed-off-by: Yu Kuai <yukuai3@huawei.com> This looks good to me. I've actually noticed this problem already when looking at the patch resulting in commit e5cfefa97bcc but Jens merged it before I got to checking it and then I've convinced myself it's not serious enough to redo the patch. Anyway, feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > block/genhd.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 08bb1a9ec22c..a72e27d6779d 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -368,7 +368,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > if (disk->open_partitions) > return -EBUSY; > > - set_bit(GD_NEED_PART_SCAN, &disk->state); > /* > * If the device is opened exclusively by current thread already, it's > * safe to scan partitons, otherwise, use bd_prepare_to_claim() to > @@ -381,12 +380,19 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > return ret; > } > > + set_bit(GD_NEED_PART_SCAN, &disk->state); > bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL); > if (IS_ERR(bdev)) > ret = PTR_ERR(bdev); > else > blkdev_put(bdev, mode & ~FMODE_EXCL); > > + /* > + * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set, > + * and this will cause that re-assemble partitioned raid device will > + * creat partition for underlying disk. > + */ > + clear_bit(GD_NEED_PART_SCAN, &disk->state); > if (!(mode & FMODE_EXCL)) > bd_abort_claiming(disk->part0, disk_scan_partitions); > return ret; > -- > 2.31.1 >
On Wed, Mar 22, 2023 at 10:47:07AM +0100, Jan Kara wrote: > On Wed 22-03-23 15:58:35, Ming Lei wrote: > > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > > > From: Yu Kuai <yukuai3@huawei.com> > > > > > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > > > set, and partition scan will be proceed again when blkdev_get_by_dev() > > > is called. However, this will cause a problem that re-assemble partitioned > > > raid device will creat partition for underlying disk. > > > > > > Test procedure: > > > > > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > > > sgdisk -n 0:0:+100MiB /dev/md0 > > > blockdev --rereadpt /dev/sda > > > blockdev --rereadpt /dev/sdb > > > mdadm -S /dev/md0 > > > mdadm -A /dev/md0 /dev/sda /dev/sdb > > > > > > Test result: underlying disk partition and raid partition can be > > > observed at the same time > > > > > > Note that this can still happen in come corner cases that > > > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > > > device. > > > > > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > > > The issue still can't be avoided completely, such as, after rebooting, > > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > > should be underlying partitions scanned before re-assembling raid, I > > guess it may not be easy to avoid. > > So this was always happening (before my patches, after my patches, and now > after Yu's patches) and kernel does not have enough information to know > that sda will become part of md0 device in the future. But mdadm actually > deals with this as far as I remember and deletes partitions for all devices > it is assembling the array from (and quick tracing experiment I did > supports this). I am testing on Fedora 37, so mdadm v4.2 doesn't delete underlying partitions before re-assemble. Also given mdadm or related userspace has to change for avoiding to scan underlying partitions, just wondering why not let userspace to tell kernel not do it explicitly? Thanks, Ming
On Wed 22-03-23 19:34:30, Ming Lei wrote: > On Wed, Mar 22, 2023 at 10:47:07AM +0100, Jan Kara wrote: > > On Wed 22-03-23 15:58:35, Ming Lei wrote: > > > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > > > > From: Yu Kuai <yukuai3@huawei.com> > > > > > > > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > > > > set, and partition scan will be proceed again when blkdev_get_by_dev() > > > > is called. However, this will cause a problem that re-assemble partitioned > > > > raid device will creat partition for underlying disk. > > > > > > > > Test procedure: > > > > > > > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > > > > sgdisk -n 0:0:+100MiB /dev/md0 > > > > blockdev --rereadpt /dev/sda > > > > blockdev --rereadpt /dev/sdb > > > > mdadm -S /dev/md0 > > > > mdadm -A /dev/md0 /dev/sda /dev/sdb > > > > > > > > Test result: underlying disk partition and raid partition can be > > > > observed at the same time > > > > > > > > Note that this can still happen in come corner cases that > > > > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > > > > device. > > > > > > > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > > > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > > > > > The issue still can't be avoided completely, such as, after rebooting, > > > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > > > should be underlying partitions scanned before re-assembling raid, I > > > guess it may not be easy to avoid. > > > > So this was always happening (before my patches, after my patches, and now > > after Yu's patches) and kernel does not have enough information to know > > that sda will become part of md0 device in the future. But mdadm actually > > deals with this as far as I remember and deletes partitions for all devices > > it is assembling the array from (and quick tracing experiment I did > > supports this). > > I am testing on Fedora 37, so mdadm v4.2 doesn't delete underlying > partitions before re-assemble. Strange, I'm on openSUSE Leap 15.4 and mdadm v4.1 deletes these partitions (at least I can see mdadm do BLKPG_DEL_PARTITION ioctls). And checking mdadm sources I can see calls to remove_partitions() from start_array() function in Assemble.c so I'm not sure why this is not working for you... > Also given mdadm or related userspace has to change for avoiding > to scan underlying partitions, just wondering why not let userspace > to tell kernel not do it explicitly? Well, those userspace changes are long deployed, now you would introduce new API that needs to proliferate again. Not very nice. Also how would that exactly work? I mean once mdadm has underlying device open, the current logic makes sure we do not create partitions anymore. But there's no way how mdadm could possibly prevent creation of partitions for devices it doesn't know about yet so it still has to delete existing partitions... Honza
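For reference, the BLKPG_DEL_PARTITION calls Jan traced are plain ioctls against the whole-disk node and need CAP_SYS_ADMIN. A standalone sketch of that ioctl is below (illustrative only, not mdadm's actual code; the device path and partition range are examples):

/* Illustrative sketch: delete kernel partition entries from a whole disk,
 * roughly what mdadm's remove_partitions() achieves via BLKPG_DEL_PARTITION.
 * Not mdadm's actual code; device path and partition range are examples. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkpg.h>

static int delete_partition(int fd, int pno)
{
	struct blkpg_partition part = { .pno = pno };
	struct blkpg_ioctl_arg arg = {
		.op = BLKPG_DEL_PARTITION,
		.datalen = sizeof(part),
		.data = &part,
	};

	return ioctl(fd, BLKPG, &arg);
}

int main(void)
{
	int fd = open("/dev/sda", O_RDWR);
	int pno;

	if (fd < 0)
		return 1;
	/* Drop any stale partition entries before assembling the array;
	 * errors for partition numbers that don't exist are harmless here. */
	for (pno = 1; pno <= 15; pno++)
		delete_partition(fd, pno);
	close(fd);
	return 0;
}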
On Wed, Mar 22, 2023 at 02:07:09PM +0100, Jan Kara wrote: > On Wed 22-03-23 19:34:30, Ming Lei wrote: > > On Wed, Mar 22, 2023 at 10:47:07AM +0100, Jan Kara wrote: > > > On Wed 22-03-23 15:58:35, Ming Lei wrote: > > > > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > > > > > From: Yu Kuai <yukuai3@huawei.com> > > > > > > > > > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > > > > > set, and partition scan will be proceed again when blkdev_get_by_dev() > > > > > is called. However, this will cause a problem that re-assemble partitioned > > > > > raid device will creat partition for underlying disk. > > > > > > > > > > Test procedure: > > > > > > > > > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > > > > > sgdisk -n 0:0:+100MiB /dev/md0 > > > > > blockdev --rereadpt /dev/sda > > > > > blockdev --rereadpt /dev/sdb > > > > > mdadm -S /dev/md0 > > > > > mdadm -A /dev/md0 /dev/sda /dev/sdb > > > > > > > > > > Test result: underlying disk partition and raid partition can be > > > > > observed at the same time > > > > > > > > > > Note that this can still happen in come corner cases that > > > > > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > > > > > device. > > > > > > > > > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > > > > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > > > > > > > The issue still can't be avoided completely, such as, after rebooting, > > > > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > > > > should be underlying partitions scanned before re-assembling raid, I > > > > guess it may not be easy to avoid. > > > > > > So this was always happening (before my patches, after my patches, and now > > > after Yu's patches) and kernel does not have enough information to know > > > that sda will become part of md0 device in the future. But mdadm actually > > > deals with this as far as I remember and deletes partitions for all devices > > > it is assembling the array from (and quick tracing experiment I did > > > supports this). > > > > I am testing on Fedora 37, so mdadm v4.2 doesn't delete underlying > > partitions before re-assemble. > > Strange, I'm on openSUSE Leap 15.4 and mdadm v4.1 deletes these partitions > (at least I can see mdadm do BLKPG_DEL_PARTITION ioctls). And checking > mdadm sources I can see calls to remove_partitions() from start_array() > function in Assemble.c so I'm not sure why this is not working for you... I added dump_stack() in delete_partition() for partition 1, not observe stack trace during booting. > > > Also given mdadm or related userspace has to change for avoiding > > to scan underlying partitions, just wondering why not let userspace > > to tell kernel not do it explicitly? > > Well, those userspace changes are long deployed, now you would introduce > new API that needs to proliferate again. Not very nice. Also how would that > exactly work? I mean once mdadm has underlying device open, the current > logic makes sure we do not create partitions anymore. But there's no way > how mdadm could possibly prevent creation of partitions for devices it > doesn't know about yet so it still has to delete existing partitions... I meant if mdadm has to change to delete existed partitions, why not add one ioctl to disable partition scan for this disk when deleting partitions/re-assemble, and re-enable scan after stopping array. 
But it looks like that isn't the case here: you mentioned that remove_partitions() is supposed to be called before starting the array, but I didn't observe this behavior.

I am also worried that the current approach may cause a regression. One concern is that ioctl(BLKRRPART) needs an exclusive open now, for example:

1) mount /dev/vdb1 /mnt

2) ioctl(BLKRRPART) may fail after removing /dev/vdb3

thanks,
Ming
On Thu 23-03-23 00:08:51, Ming Lei wrote: > On Wed, Mar 22, 2023 at 02:07:09PM +0100, Jan Kara wrote: > > On Wed 22-03-23 19:34:30, Ming Lei wrote: > > > On Wed, Mar 22, 2023 at 10:47:07AM +0100, Jan Kara wrote: > > > > On Wed 22-03-23 15:58:35, Ming Lei wrote: > > > > > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > > > > > > From: Yu Kuai <yukuai3@huawei.com> > > > > > > > > > > > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > > > > > > set, and partition scan will be proceed again when blkdev_get_by_dev() > > > > > > is called. However, this will cause a problem that re-assemble partitioned > > > > > > raid device will creat partition for underlying disk. > > > > > > > > > > > > Test procedure: > > > > > > > > > > > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > > > > > > sgdisk -n 0:0:+100MiB /dev/md0 > > > > > > blockdev --rereadpt /dev/sda > > > > > > blockdev --rereadpt /dev/sdb > > > > > > mdadm -S /dev/md0 > > > > > > mdadm -A /dev/md0 /dev/sda /dev/sdb > > > > > > > > > > > > Test result: underlying disk partition and raid partition can be > > > > > > observed at the same time > > > > > > > > > > > > Note that this can still happen in come corner cases that > > > > > > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > > > > > > device. > > > > > > > > > > > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > > > > > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > > > > > > > > > The issue still can't be avoided completely, such as, after rebooting, > > > > > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > > > > > should be underlying partitions scanned before re-assembling raid, I > > > > > guess it may not be easy to avoid. > > > > > > > > So this was always happening (before my patches, after my patches, and now > > > > after Yu's patches) and kernel does not have enough information to know > > > > that sda will become part of md0 device in the future. But mdadm actually > > > > deals with this as far as I remember and deletes partitions for all devices > > > > it is assembling the array from (and quick tracing experiment I did > > > > supports this). > > > > > > I am testing on Fedora 37, so mdadm v4.2 doesn't delete underlying > > > partitions before re-assemble. > > > > Strange, I'm on openSUSE Leap 15.4 and mdadm v4.1 deletes these partitions > > (at least I can see mdadm do BLKPG_DEL_PARTITION ioctls). And checking > > mdadm sources I can see calls to remove_partitions() from start_array() > > function in Assemble.c so I'm not sure why this is not working for you... > > I added dump_stack() in delete_partition() for partition 1, not observe > stack trace during booting. > > > > > > Also given mdadm or related userspace has to change for avoiding > > > to scan underlying partitions, just wondering why not let userspace > > > to tell kernel not do it explicitly? > > > > Well, those userspace changes are long deployed, now you would introduce > > new API that needs to proliferate again. Not very nice. Also how would that > > exactly work? I mean once mdadm has underlying device open, the current > > logic makes sure we do not create partitions anymore. But there's no way > > how mdadm could possibly prevent creation of partitions for devices it > > doesn't know about yet so it still has to delete existing partitions... 
> > I meant if mdadm has to change to delete existed partitions, why not add > one ioctl to disable partition scan for this disk when deleting > partitions/re-assemble, and re-enable scan after stopping array. > > But looks it isn't so, since you mentioned that remove_partitions is > supposed to be called before starting array, however I didn't observe this > behavior. Yeah, not sure what's happening on your system. > I am worrying if the current approach may cause regression, one concern is > that ioctl(BLKRRPART) needs exclusive open now, such as: > > 1) mount /dev/vdb1 /mnt > > 2) ioctl(BLKRRPART) may fail after removing /dev/vdb3 Well, but we always had some variant of: if (disk->open_partitions) return -EBUSY; in disk_scan_partitions(). So as long as any partition on the disk is used, EBUSY is the correct return value from BLKRRPART. Honza
On Thu, Mar 23, 2023 at 11:51:20AM +0100, Jan Kara wrote: > On Thu 23-03-23 00:08:51, Ming Lei wrote: > > On Wed, Mar 22, 2023 at 02:07:09PM +0100, Jan Kara wrote: > > > On Wed 22-03-23 19:34:30, Ming Lei wrote: > > > > On Wed, Mar 22, 2023 at 10:47:07AM +0100, Jan Kara wrote: > > > > > On Wed 22-03-23 15:58:35, Ming Lei wrote: > > > > > > On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > > > > > > > From: Yu Kuai <yukuai3@huawei.com> > > > > > > > > > > > > > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > > > > > > > set, and partition scan will be proceed again when blkdev_get_by_dev() > > > > > > > is called. However, this will cause a problem that re-assemble partitioned > > > > > > > raid device will creat partition for underlying disk. > > > > > > > > > > > > > > Test procedure: > > > > > > > > > > > > > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > > > > > > > sgdisk -n 0:0:+100MiB /dev/md0 > > > > > > > blockdev --rereadpt /dev/sda > > > > > > > blockdev --rereadpt /dev/sdb > > > > > > > mdadm -S /dev/md0 > > > > > > > mdadm -A /dev/md0 /dev/sda /dev/sdb > > > > > > > > > > > > > > Test result: underlying disk partition and raid partition can be > > > > > > > observed at the same time > > > > > > > > > > > > > > Note that this can still happen in come corner cases that > > > > > > > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > > > > > > > device. > > > > > > > > > > > > > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > > > > > > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > > > > > > > > > > > The issue still can't be avoided completely, such as, after rebooting, > > > > > > /dev/sda1 & /dev/md0p1 can be observed at the same time. And this one > > > > > > should be underlying partitions scanned before re-assembling raid, I > > > > > > guess it may not be easy to avoid. > > > > > > > > > > So this was always happening (before my patches, after my patches, and now > > > > > after Yu's patches) and kernel does not have enough information to know > > > > > that sda will become part of md0 device in the future. But mdadm actually > > > > > deals with this as far as I remember and deletes partitions for all devices > > > > > it is assembling the array from (and quick tracing experiment I did > > > > > supports this). > > > > > > > > I am testing on Fedora 37, so mdadm v4.2 doesn't delete underlying > > > > partitions before re-assemble. > > > > > > Strange, I'm on openSUSE Leap 15.4 and mdadm v4.1 deletes these partitions > > > (at least I can see mdadm do BLKPG_DEL_PARTITION ioctls). And checking > > > mdadm sources I can see calls to remove_partitions() from start_array() > > > function in Assemble.c so I'm not sure why this is not working for you... > > > > I added dump_stack() in delete_partition() for partition 1, not observe > > stack trace during booting. > > > > > > > > > Also given mdadm or related userspace has to change for avoiding > > > > to scan underlying partitions, just wondering why not let userspace > > > > to tell kernel not do it explicitly? > > > > > > Well, those userspace changes are long deployed, now you would introduce > > > new API that needs to proliferate again. Not very nice. Also how would that > > > exactly work? I mean once mdadm has underlying device open, the current > > > logic makes sure we do not create partitions anymore. 
But there's no way > > > how mdadm could possibly prevent creation of partitions for devices it > > > doesn't know about yet so it still has to delete existing partitions... > > > > I meant if mdadm has to change to delete existed partitions, why not add > > one ioctl to disable partition scan for this disk when deleting > > partitions/re-assemble, and re-enable scan after stopping array. > > > > But looks it isn't so, since you mentioned that remove_partitions is > > supposed to be called before starting array, however I didn't observe this > > behavior. > > Yeah, not sure what's happening on your system.

It looks like the issue doesn't show up on Fedora 38, but it does happen on Fedora 37.

> > > I am worrying if the current approach may cause regression, one concern is > > that ioctl(BLKRRPART) needs exclusive open now, such as: > > > > 1) mount /dev/vdb1 /mnt > > > > 2) ioctl(BLKRRPART) may fail after removing /dev/vdb3 > > Well, but we always had some variant of: > > if (disk->open_partitions) > return -EBUSY; > > in disk_scan_partitions(). So as long as any partition on the disk is used, > EBUSY is the correct return value from BLKRRPART.

OK, I missed that check. Then the change basically means that ioctl(BLKRRPART) now requires an exclusive open: if one application just keeps the disk open with O_EXCL for a while, ioctl(BLKRRPART) can't be done from another process. Hopefully there is no such case in practice.

Thanks,
Ming
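To make the concern concrete, here is a small illustrative sketch (the device path is an example and the program needs CAP_SYS_ADMIN): while one descriptor holds the whole disk open with O_EXCL, a BLKRRPART issued through a second, non-exclusive descriptor is expected to fail (typically with EBUSY) under this behaviour:

/* Illustrative sketch of the concern above: a long-lived O_EXCL opener
 * blocks partition rescans requested by other, non-exclusive openers.
 * Device path is an example only; run as root. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKRRPART */

int main(void)
{
	int excl = open("/dev/vdb", O_RDONLY | O_EXCL); /* long-lived exclusive opener */
	int fd = open("/dev/vdb", O_RDONLY);            /* e.g. blockdev --rereadpt */

	if (excl < 0 || fd < 0)
		return 1;
	if (ioctl(fd, BLKRRPART) < 0)
		printf("BLKRRPART failed: %s\n", strerror(errno));
	close(fd);
	close(excl);
	return 0;
}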
On Wed, Mar 22, 2023 at 11:59:26AM +0800, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > set, and partition scan will be proceed again when blkdev_get_by_dev() > is called. However, this will cause a problem that re-assemble partitioned > raid device will creat partition for underlying disk. > > Test procedure: > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > sgdisk -n 0:0:+100MiB /dev/md0 > blockdev --rereadpt /dev/sda > blockdev --rereadpt /dev/sdb > mdadm -S /dev/md0 > mdadm -A /dev/md0 /dev/sda /dev/sdb > > Test result: underlying disk partition and raid partition can be > observed at the same time > > Note that this can still happen in come corner cases that > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > device. That is why I suggest to touch this flag as less as possible, maybe replace it with one function parameter in future. > > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > Signed-off-by: Yu Kuai <yukuai3@huawei.com> So far, let's move on with the fix: Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming
Hi, Jens! 在 2023/03/22 11:59, Yu Kuai 写道: > From: Yu Kuai <yukuai3@huawei.com> > > Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > set, and partition scan will be proceed again when blkdev_get_by_dev() > is called. However, this will cause a problem that re-assemble partitioned > raid device will creat partition for underlying disk. > > Test procedure: > > mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > sgdisk -n 0:0:+100MiB /dev/md0 > blockdev --rereadpt /dev/sda > blockdev --rereadpt /dev/sdb > mdadm -S /dev/md0 > mdadm -A /dev/md0 /dev/sda /dev/sdb > > Test result: underlying disk partition and raid partition can be > observed at the same time > > Note that this can still happen in come corner cases that > GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > device. > Can you apply this patch? Thanks, Kuai > Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > block/genhd.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 08bb1a9ec22c..a72e27d6779d 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -368,7 +368,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > if (disk->open_partitions) > return -EBUSY; > > - set_bit(GD_NEED_PART_SCAN, &disk->state); > /* > * If the device is opened exclusively by current thread already, it's > * safe to scan partitons, otherwise, use bd_prepare_to_claim() to > @@ -381,12 +380,19 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > return ret; > } > > + set_bit(GD_NEED_PART_SCAN, &disk->state); > bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL); > if (IS_ERR(bdev)) > ret = PTR_ERR(bdev); > else > blkdev_put(bdev, mode & ~FMODE_EXCL); > > + /* > + * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set, > + * and this will cause that re-assemble partitioned raid device will > + * creat partition for underlying disk. > + */ > + clear_bit(GD_NEED_PART_SCAN, &disk->state); > if (!(mode & FMODE_EXCL)) > bd_abort_claiming(disk->part0, disk_scan_partitions); > return ret; >
On 4/5/23 9:42 PM, Yu Kuai wrote: > Hi, Jens! > > 在 2023/03/22 11:59, Yu Kuai 写道: >> From: Yu Kuai <yukuai3@huawei.com> >> >> Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still >> set, and partition scan will be proceed again when blkdev_get_by_dev() >> is called. However, this will cause a problem that re-assemble partitioned >> raid device will creat partition for underlying disk. >> >> Test procedure: >> >> mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 >> sgdisk -n 0:0:+100MiB /dev/md0 >> blockdev --rereadpt /dev/sda >> blockdev --rereadpt /dev/sdb >> mdadm -S /dev/md0 >> mdadm -A /dev/md0 /dev/sda /dev/sdb >> >> Test result: underlying disk partition and raid partition can be >> observed at the same time >> >> Note that this can still happen in come corner cases that >> GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid >> device. >> > > Can you apply this patch? None of them apply to my for-6.4/block branch...
On Thu, Apr 06, 2023 at 04:29:43PM -0600, Jens Axboe wrote: > On 4/5/23 9:42 PM, Yu Kuai wrote: > > Hi, Jens! > > > > 在 2023/03/22 11:59, Yu Kuai 写道: > >> From: Yu Kuai <yukuai3@huawei.com> > >> > >> Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still > >> set, and partition scan will be proceed again when blkdev_get_by_dev() > >> is called. However, this will cause a problem that re-assemble partitioned > >> raid device will creat partition for underlying disk. > >> > >> Test procedure: > >> > >> mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 > >> sgdisk -n 0:0:+100MiB /dev/md0 > >> blockdev --rereadpt /dev/sda > >> blockdev --rereadpt /dev/sdb > >> mdadm -S /dev/md0 > >> mdadm -A /dev/md0 /dev/sda /dev/sdb > >> > >> Test result: underlying disk partition and raid partition can be > >> observed at the same time > >> > >> Note that this can still happen in come corner cases that > >> GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid > >> device. > >> > > > > Can you apply this patch? > > None of them apply to my for-6.4/block branch... This patch is bug fix, and probably should aim at 6.3. Thanks, Ming
On 4/6/23 8:01 PM, Ming Lei wrote: > On Thu, Apr 06, 2023 at 04:29:43PM -0600, Jens Axboe wrote: >> On 4/5/23 9:42 PM, Yu Kuai wrote: >>> Hi, Jens! >>> >>> 在 2023/03/22 11:59, Yu Kuai 写道: >>>> From: Yu Kuai <yukuai3@huawei.com> >>>> >>>> Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still >>>> set, and partition scan will be proceed again when blkdev_get_by_dev() >>>> is called. However, this will cause a problem that re-assemble partitioned >>>> raid device will creat partition for underlying disk. >>>> >>>> Test procedure: >>>> >>>> mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 >>>> sgdisk -n 0:0:+100MiB /dev/md0 >>>> blockdev --rereadpt /dev/sda >>>> blockdev --rereadpt /dev/sdb >>>> mdadm -S /dev/md0 >>>> mdadm -A /dev/md0 /dev/sda /dev/sdb >>>> >>>> Test result: underlying disk partition and raid partition can be >>>> observed at the same time >>>> >>>> Note that this can still happen in come corner cases that >>>> GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid >>>> device. >>>> >>> >>> Can you apply this patch? >> >> None of them apply to my for-6.4/block branch... > > This patch is bug fix, and probably should aim at 6.3. Yeah I see now, but it's a bit of a mashup of 2 patches, and then a separate one. I've applied the single fixup for 6.3.
diff --git a/block/genhd.c b/block/genhd.c
index 08bb1a9ec22c..a72e27d6779d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -368,7 +368,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	if (disk->open_partitions)
 		return -EBUSY;
 
-	set_bit(GD_NEED_PART_SCAN, &disk->state);
 	/*
 	 * If the device is opened exclusively by current thread already, it's
 	 * safe to scan partitons, otherwise, use bd_prepare_to_claim() to
@@ -381,12 +380,19 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 		return ret;
 	}
 
+	set_bit(GD_NEED_PART_SCAN, &disk->state);
 	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
 	if (IS_ERR(bdev))
 		ret = PTR_ERR(bdev);
 	else
 		blkdev_put(bdev, mode & ~FMODE_EXCL);
 
+	/*
+	 * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set,
+	 * and this will cause that re-assemble partitioned raid device will
+	 * creat partition for underlying disk.
+	 */
+	clear_bit(GD_NEED_PART_SCAN, &disk->state);
 	if (!(mode & FMODE_EXCL))
 		bd_abort_claiming(disk->part0, disk_scan_partitions);
 	return ret;