Message ID | 20230906093720.1070929-1-linan122@huawei.com |
---|---|
State | New |
Headers |
From: Li Nan <linan122@huawei.com>
To: <song@kernel.org>
Cc: <linux-raid@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linan122@huawei.com>, <yukuai3@huawei.com>, <yi.zhang@huawei.com>, <houtao1@huawei.com>, <yangerkun@huawei.com>
Subject: [PATCH] md/raid1: only update stack limits with the device in use
Date: Wed, 6 Sep 2023 17:37:20 +0800
Message-ID: <20230906093720.1070929-1-linan122@huawei.com>
Series | md/raid1: only update stack limits with the device in use |
Commit Message
Li Nan
Sept. 6, 2023, 9:37 a.m. UTC
It is unreasonable for a spare device to affect the array's stack limits.
For example, create a raid1 with two devices whose logical block size is
512 bytes; the logical_block_size of the array will be 512. But after a 4k
device is added as a spare, the logical_block_size of the array changes as
follows.
mdadm -C /dev/md0 -n 2 -l 10 /dev/sd[ab] //sd[ab] is 512
//logical_block_size of md0: 512
mdadm --add /dev/md0 /dev/sdc //sdc is 4k
//logical_block_size of md0: 512
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sd[ab]
//logical_block_size of md0: 4k
This confuses users: nothing has been changed, so why did the
logical_block_size of the array change?
Now, only update the logical_block_size of the array with the devices
actually in use.
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/raid1.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
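
Editor's aside, not part of the original submission: the logical_block_size
values shown in the mdadm example above are the ones the block layer exports
through sysfs, and they can be read back programmatically. A minimal C
sketch, assuming the array is /dev/md0 and using the standard
queue/logical_block_size attribute:

/*
 * Editor's illustration: read the logical_block_size the block layer
 * reports for /dev/md0 via the standard sysfs attribute.
 */
#include <stdio.h>

int main(void)
{
	char buf[16];
	FILE *f = fopen("/sys/block/md0/queue/logical_block_size", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("md0 logical_block_size: %s", buf);	/* e.g. "512" or "4096" */
	fclose(f);
	return 0;
}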
Comments
On Wed, Sep 6, 2023 at 11:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/06 17:37, Li Nan wrote:
> > [...]
> > @@ -3140,19 +3140,16 @@ static int raid1_run(struct mddev *mddev)
>
> I'm not sure about this behaviour, 'logical_block_size' can be
> increased while adding new underlying disk, the key point is not when
> to increase 'logical_block_size'. If there is a mounted fs, or
> partition in the array, I think the array will be corrupted.

How common is such fs/partition corruption? I think some fs and partition
table can work properly with 512=>4096 change?

Thanks,
Song

> Perhaps once that array is started, logical_block_size should not be
> changed anymore, this will require 'logical_block_size' to be metadata
> inside raid superblock. And the array should deny any new disk with
> bigger logical_block_size.
>
> Thanks,
> Kuai
>
> > [...]
On Sat, Sep 9, 2023 at 7:24 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/09 4:42, Song Liu wrote:
> > [...]
> >
> > How common is such fs/partition corruption? I think some fs and partition
> > table can work properly with 512=>4096 change?
>
> For fs, that should depend on fs bs that is usually set in mkfs, if bs
> is less than 4096, then such fs can't be mounted.
>
> For partition, that is much worse, start sector and end sector will stay
> the same, while sector size is changed. And 4096 -> 512 change is the
> same.

Thanks for this information.

> > > Perhaps once that array is started, logical_block_size should not be
> > > changed anymore, this will require 'logical_block_size' to be metadata
> > > inside raid superblock. And the array should deny any new disk with
> > > bigger logical_block_size.

I really hope we can avoid adding this to the raid superblock. But I am not
sure what would be a better solution (that is also backward compatible).
Do we have real world reports of such issues?

Thanks,
Song
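
To make the partition concern above concrete (an editor's illustration, not
something from the thread): a partition table stores sector numbers, so the
byte offset a partition entry points at depends on the logical sector size.
A tiny sketch with a hypothetical start sector of 2048:

/*
 * Editor's illustration: the same partition start sector maps to a
 * different byte offset once the logical sector size changes, which is
 * why an existing partition table breaks after a 512 -> 4096 switch.
 */
#include <stdio.h>

int main(void)
{
	unsigned long long start_sector = 2048;	/* hypothetical partition start */

	printf("offset with 512-byte sectors:  %llu bytes\n", start_sector * 512ULL);	/* 1048576 */
	printf("offset with 4096-byte sectors: %llu bytes\n", start_sector * 4096ULL);	/* 8388608 */
	return 0;
}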
Hi,

On 2023/09/23 5:53, Song Liu wrote:
> On Sat, Sep 9, 2023 at 7:24 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > [...]
> >
> > For fs, that should depend on fs bs that is usually set in mkfs, if bs
> > is less than 4096, then such fs can't be mounted.
> >
> > For partition, that is much worse, start sector and end sector will stay
> > the same, while sector size is changed. And 4096 -> 512 change is the
> > same.
>
> Thanks for this information.
>
> > > > Perhaps once that array is started, logical_block_size should not be
> > > > changed anymore, this will require 'logical_block_size' to be metadata
> > > > inside raid superblock. And the array should deny any new disk with
> > > > bigger logical_block_size.
>
> I really hope we can avoid adding this to the raid superblock. But I am not
> sure what would be a better solution (that is also backward compatible).
> Do we have real world reports of such issues?

Yes, our customer is using raid1 with one 4k disk and another 512 disk as
the root device, and they reported that if the 4k disk is kicked out of the
array, the system can't reboot.

And for backward compatibility, I think it can be solved by continuing to
use the biggest block size from the underlying disks when the metadata
field is 0.

Thanks,
Kuai
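
A rough sketch of the kind of guard discussed above (editor's illustration
only; this is not code from the thread or from the kernel): if the array's
logical block size were recorded in the md superblock, a helper like the
hypothetical one below could refuse new members with a larger logical block
size. bdev_logical_block_size() is the existing block-layer helper; the
function name and the array_lbs parameter are assumptions.

/*
 * Hypothetical sketch, not a real kernel patch. 'array_lbs' stands for a
 * logical block size that would have to be recorded in the md superblock
 * (no such field exists today). struct mddev / struct md_rdev come from
 * drivers/md/md.h.
 */
#include <linux/blkdev.h>
#include <linux/errno.h>
#include "md.h"

static int md_check_new_rdev_lbs(struct md_rdev *rdev, unsigned int array_lbs)
{
	unsigned int lbs = bdev_logical_block_size(rdev->bdev);

	/* Deny any new disk with a bigger logical_block_size. */
	if (lbs > array_lbs)
		return -EINVAL;

	return 0;
}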
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 95504612b7e2..d75c5dd89e86 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3140,19 +3140,16 @@ static int raid1_run(struct mddev *mddev)
 	if (mddev->queue)
 		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
 
-	rdev_for_each(rdev, mddev) {
-		if (!mddev->gendisk)
-			continue;
-		disk_stack_limits(mddev->gendisk, rdev->bdev,
-				  rdev->data_offset << 9);
-	}
-
 	mddev->degraded = 0;
-	for (i = 0; i < conf->raid_disks; i++)
-		if (conf->mirrors[i].rdev == NULL ||
-		    !test_bit(In_sync, &conf->mirrors[i].rdev->flags) ||
-		    test_bit(Faulty, &conf->mirrors[i].rdev->flags))
+	for (i = 0; i < conf->raid_disks; i++) {
+		rdev = conf->mirrors[i].rdev;
+		if (rdev && mddev->gendisk)
+			disk_stack_limits(mddev->gendisk, rdev->bdev,
+					  rdev->data_offset << 9);
+		if (!rdev || !test_bit(In_sync, &rdev->flags) ||
+		    test_bit(Faulty, &rdev->flags))
 			mddev->degraded++;
+	}
 	/*
 	 * RAID1 needs at least one disk in active
 	 */