Message ID | 20231215013931.3329455-2-linan666@huaweicloud.com |
---|---|
State | New |
Headers |
From: linan666@huaweicloud.com
To: song@kernel.org, axboe@kernel.dk
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linan666@huaweicloud.com, yukuai3@huawei.com, yi.zhang@huawei.com, houtao1@huawei.com, yangerkun@huawei.com
Subject: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle
Date: Fri, 15 Dec 2023 09:39:30 +0800
Message-Id: <20231215013931.3329455-2-linan666@huaweicloud.com>
In-Reply-To: <20231215013931.3329455-1-linan666@huaweicloud.com>
References: <20231215013931.3329455-1-linan666@huaweicloud.com>
X-Mailer: git-send-email 2.39.2 |
Series | md: fix is_mddev_idle() |
Commit Message
Li Nan
Dec. 15, 2023, 1:39 a.m. UTC
From: Li Nan <linan122@huawei.com>

UBSAN reports this problem:

  UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
  signed integer overflow:
  -2147483291 - 2072033152 cannot be represented in type 'int'
  Call trace:
   dump_backtrace+0x0/0x310
   show_stack+0x28/0x38
   dump_stack+0xec/0x15c
   ubsan_epilogue+0x18/0x84
   handle_overflow+0x14c/0x19c
   __ubsan_handle_sub_overflow+0x34/0x44
   is_mddev_idle+0x338/0x3d8
   md_do_sync+0x1bb8/0x1cf8
   md_thread+0x220/0x288
   kthread+0x1d8/0x1e0
   ret_from_fork+0x10/0x18

'curr_events' will overflow when stat accum or 'sync_io' is greater than
INT_MAX.

Fix it by changing sync_io, last_events and curr_events to 64bit.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h        | 4 ++--
 include/linux/blkdev.h | 2 +-
 drivers/md/md.c        | 7 ++++---
 3 files changed, 7 insertions(+), 6 deletions(-)
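The arithmetic UBSAN complains about can be reproduced outside the kernel. Below is a minimal userspace sketch, not kernel code: the variable names and values are illustrative stand-ins for the accumulated sector count and 'sync_io'. Compiling with -fsanitize=signed-integer-overflow should produce a report analogous to the one above, while the 64-bit variant mirrors what the patch does.

#include <stdio.h>

int main(void)
{
	/* Stand-ins for the accumulated sector count and sync_io counter;
	 * both have grown past INT_MAX, as in the report above. */
	long long stat_accum = 2147484005LL; /* (int)stat_accum == -2147483291 */
	long long sync_io    = 2072033152LL;

	/* Old is_mddev_idle() style: operands forced into 'int'.
	 * -2147483291 - 2072033152 cannot be represented in 'int',
	 * i.e. signed integer overflow (undefined behaviour). */
	int curr_events_32 = (int)stat_accum - (int)sync_io;

	/* Patched style: keep the whole calculation in 64 bits. */
	long long curr_events_64 = stat_accum - sync_io;

	printf("32-bit: %d, 64-bit: %lld\n", curr_events_32, curr_events_64);
	return 0;
}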
Comments
On Thu, Dec 14, 2023 at 5:41 PM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> UBSAN reports this problem:
>
> UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
> signed integer overflow:
> -2147483291 - 2072033152 cannot be represented in type 'int'
> Call trace:
> dump_backtrace+0x0/0x310
> show_stack+0x28/0x38
> dump_stack+0xec/0x15c
> ubsan_epilogue+0x18/0x84
> handle_overflow+0x14c/0x19c
> __ubsan_handle_sub_overflow+0x34/0x44
> is_mddev_idle+0x338/0x3d8
> md_do_sync+0x1bb8/0x1cf8
> md_thread+0x220/0x288
> kthread+0x1d8/0x1e0
> ret_from_fork+0x10/0x18
>
> 'curr_events' will overflow when stat accum or 'sync_io' is greater than
> INT_MAX.
>
> Fix it by changing sync_io, last_events and curr_events to 64bit.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  drivers/md/md.h        | 4 ++--
>  include/linux/blkdev.h | 2 +-
>  drivers/md/md.c        | 7 ++++---
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ade83af123a2..1a4f976951c1 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -50,7 +50,7 @@ struct md_rdev {
>
>         sector_t sectors;       /* Device size (in 512bytes sectors) */
>         struct mddev *mddev;    /* RAID array if running */
> -       int last_events;        /* IO event timestamp */
> +       long long last_events;  /* IO event timestamp */
>
>         /*
>          * If meta_bdev is non-NULL, it means that a separate device is
> @@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);
>
>  static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
>  {
> -       atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
> +       atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
>  }
>
>  static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3f8a21cd9233..d28b98adf457 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -170,7 +170,7 @@ struct gendisk {
>         struct list_head slave_bdevs;
>  #endif
>         struct timer_rand_state *random;
> -       atomic_t sync_io;       /* RAID */
> +       atomic64_t sync_io;     /* RAID */
>         struct disk_events *ev;

As we are on this, I wonder whether we really need this.
AFAICT, is_mddev_idle() is the only consumer of sync_io.
We can probably do the same check in is_mddev_idle()
without sync_io.

Thanks,
Song

>
>  #ifdef CONFIG_BLK_DEV_ZONED
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c94373d64f2c..1d71b2a9af03 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>  {
>         struct md_rdev *rdev;
>         int idle;
> -       int curr_events;
> +       long long curr_events;
>
>         idle = 1;
>         rcu_read_lock();
>         rdev_for_each_rcu(rdev, mddev) {
>                 struct gendisk *disk = rdev->bdev->bd_disk;
> -               curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
> -                       atomic_read(&disk->sync_io);
> +               curr_events =
> +                       (long long)part_stat_read_accum(disk->part0, sectors) -
> +                       atomic64_read(&disk->sync_io);
>                 /* sync IO will cause sync_io to increase before the disk_stats
>                  * as sync_io is counted when a request starts, and
>                  * disk_stats is counted when it completes.
> --
> 2.39.2
>
>
Hi,

On 2023/12/16 7:12, Song Liu wrote:
> On Thu, Dec 14, 2023 at 5:41 PM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> UBSAN reports this problem:
>>
>> UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
>> signed integer overflow:
>> -2147483291 - 2072033152 cannot be represented in type 'int'
>> Call trace:
>> dump_backtrace+0x0/0x310
>> show_stack+0x28/0x38
>> dump_stack+0xec/0x15c
>> ubsan_epilogue+0x18/0x84
>> handle_overflow+0x14c/0x19c
>> __ubsan_handle_sub_overflow+0x34/0x44
>> is_mddev_idle+0x338/0x3d8
>> md_do_sync+0x1bb8/0x1cf8
>> md_thread+0x220/0x288
>> kthread+0x1d8/0x1e0
>> ret_from_fork+0x10/0x18
>>
>> 'curr_events' will overflow when stat accum or 'sync_io' is greater than
>> INT_MAX.
>>
>> Fix it by changing sync_io, last_events and curr_events to 64bit.
>>
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>>  drivers/md/md.h        | 4 ++--
>>  include/linux/blkdev.h | 2 +-
>>  drivers/md/md.c        | 7 ++++---
>>  3 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index ade83af123a2..1a4f976951c1 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -50,7 +50,7 @@ struct md_rdev {
>>
>>         sector_t sectors;       /* Device size (in 512bytes sectors) */
>>         struct mddev *mddev;    /* RAID array if running */
>> -       int last_events;        /* IO event timestamp */
>> +       long long last_events;  /* IO event timestamp */
>>
>>         /*
>>          * If meta_bdev is non-NULL, it means that a separate device is
>> @@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);
>>
>>  static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
>>  {
>> -       atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
>> +       atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
>>  }
>>
>>  static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 3f8a21cd9233..d28b98adf457 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -170,7 +170,7 @@ struct gendisk {
>>         struct list_head slave_bdevs;
>>  #endif
>>         struct timer_rand_state *random;
>> -       atomic_t sync_io;       /* RAID */
>> +       atomic64_t sync_io;     /* RAID */
>>         struct disk_events *ev;
>
> As we are on this, I wonder whether we really need this.
> AFAICT, is_mddev_idle() is the only consumer of sync_io.
> We can probably do the same check in is_mddev_idle()
> without sync_io.

After reviewing some code, my understanding is the following:

I think 'sync_io' is used to distinguish 'sync io' from raid (this can
come from different raids, for example, different partitions used for
different arrays) and other io (any io, even if it's not from raid). And
if there is any other IO, sync speed is limited to min, otherwise it's
limited to max.

If we want to keep this behaviour, I can't think of any other way for
now...

>
> Thanks,
> Song
>
>
>>
>>  #ifdef CONFIG_BLK_DEV_ZONED
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index c94373d64f2c..1d71b2a9af03 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>>  {
>>         struct md_rdev *rdev;
>>         int idle;
>> -       int curr_events;
>> +       long long curr_events;
>>
>>         idle = 1;
>>         rcu_read_lock();
>>         rdev_for_each_rcu(rdev, mddev) {
>>                 struct gendisk *disk = rdev->bdev->bd_disk;
>> -               curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
>> -                       atomic_read(&disk->sync_io);
>> +               curr_events =
>> +                       (long long)part_stat_read_accum(disk->part0, sectors) -
>> +                       atomic64_read(&disk->sync_io);

By the way, I don't think this overflow is problematic, assume that
sectors accumulate by 'A' and sync_io accumulate by 'B':

(sectors + A) - (sync_io + B) - (sectors - sync_io) = (A - B)

No matter whether overflow or truncation happened in the above addition
and subtraction, the result is correct.

And even if io accounting is disabled, which means sectors and A are
always 0, the result will always be -B that is <= 0, hence
is_mddev_idle() will always return true, and sync_speed will be limited
to max in this case.

Thanks,
Kuai

>>                 /* sync IO will cause sync_io to increase before the disk_stats
>>                  * as sync_io is counted when a request starts, and
>>                  * disk_stats is counted when it completes.
>> --
>> 2.39.2
>>
>>
> .
>
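The wraparound argument above can be illustrated with a small userspace sketch. It uses unsigned 32-bit counters, where wraparound is well defined in C, rather than the kernel's signed int, and the names and values are purely illustrative. The point is that the delta curr_events - last_events still equals A - B even when an intermediate value wraps.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Counters close to the 32-bit wrap point. */
	uint32_t sectors = 4294967000u;
	uint32_t sync_io = 4294966900u;

	uint32_t last_events = sectors - sync_io;   /* delta before new IO */

	/* New IO: sectors grows by A, sync_io by B; sectors wraps past 0. */
	uint32_t A = 500, B = 200;
	sectors += A;
	sync_io += B;

	uint32_t curr_events = sectors - sync_io;

	/* (sectors + A) - (sync_io + B) - (sectors - sync_io) == A - B
	 * holds in modular arithmetic regardless of the wraparound. */
	printf("curr - last = %u, A - B = %u\n",
	       curr_events - last_events, A - B);
	return 0;
}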
On Fri, Dec 15, 2023 at 6:24 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:

[...]

> >>  static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> >> index 3f8a21cd9233..d28b98adf457 100644
> >> --- a/include/linux/blkdev.h
> >> +++ b/include/linux/blkdev.h
> >> @@ -170,7 +170,7 @@ struct gendisk {
> >>         struct list_head slave_bdevs;
> >>  #endif
> >>         struct timer_rand_state *random;
> >> -       atomic_t sync_io;       /* RAID */
> >> +       atomic64_t sync_io;     /* RAID */
> >>         struct disk_events *ev;
> >
> > As we are on this, I wonder whether we really need this.
> > AFAICT, is_mddev_idle() is the only consumer of sync_io.
> > We can probably do the same check in is_mddev_idle()
> > without sync_io.
>
> After reviewing some code, my understanding is the following:
>
> I think 'sync_io' is used to distinguish 'sync io' from raid (this can
> come from different raids, for example, different partitions used for
> different arrays) and other io (any io, even if it's not from raid). And
> if there is any other IO, sync speed is limited to min, otherwise it's
> limited to max.
>
> If we want to keep this behaviour, I can't think of any other way for
> now...

Thanks for looking into this. To keep current behavior, we will need it
in gendisk.

[...]

> >> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
> >>  {
> >>         struct md_rdev *rdev;
> >>         int idle;
> >> -       int curr_events;
> >> +       long long curr_events;
> >>
> >>         idle = 1;
> >>         rcu_read_lock();
> >>         rdev_for_each_rcu(rdev, mddev) {
> >>                 struct gendisk *disk = rdev->bdev->bd_disk;
> >> -               curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
> >> -                       atomic_read(&disk->sync_io);
> >> +               curr_events =
> >> +                       (long long)part_stat_read_accum(disk->part0, sectors) -
> >> +                       atomic64_read(&disk->sync_io);
>
> By the way, I don't think this overflow is problematic, assume that
> sectors accumulate by 'A' and sync_io accumulate by 'B':
>
> (sectors + A) - (sync_io + B) - (sectors - sync_io) = (A - B)
>
> No matter whether overflow or truncation happened in the above addition
> and subtraction, the result is correct.
>
> And even if io accounting is disabled, which means sectors and A are
> always 0, the result will always be -B that is <= 0, hence
> is_mddev_idle() will always return true, and sync_speed will be limited
> to max in this case.

We only use this for idle or not check, the behavior is OK (I think).
However, this logic is error prone.

On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
just use it for atomic64_t so that we don't have to worry about overflow.

Thanks,
Song
Hi,

On 2023/12/17 8:28, Song Liu wrote:
> On Fri, Dec 15, 2023 at 6:24 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> [...]
>>>>  static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>> index 3f8a21cd9233..d28b98adf457 100644
>>>> --- a/include/linux/blkdev.h
>>>> +++ b/include/linux/blkdev.h
>>>> @@ -170,7 +170,7 @@ struct gendisk {
>>>>         struct list_head slave_bdevs;
>>>>  #endif
>>>>         struct timer_rand_state *random;
>>>> -       atomic_t sync_io;       /* RAID */
>>>> +       atomic64_t sync_io;     /* RAID */
>>>>         struct disk_events *ev;
>>>
>>> As we are on this, I wonder whether we really need this.
>>> AFAICT, is_mddev_idle() is the only consumer of sync_io.
>>> We can probably do the same check in is_mddev_idle()
>>> without sync_io.
>>
>> After reviewing some code, my understanding is the following:
>>
>> I think 'sync_io' is used to distinguish 'sync io' from raid (this can
>> come from different raids, for example, different partitions used for
>> different arrays) and other io (any io, even if it's not from raid). And
>> if there is any other IO, sync speed is limited to min, otherwise it's
>> limited to max.
>>
>> If we want to keep this behaviour, I can't think of any other way for
>> now...
>
> Thanks for looking into this. To keep current behavior, we will need it
> in gendisk.
>
> [...]
>
>>>> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>>>>  {
>>>>         struct md_rdev *rdev;
>>>>         int idle;
>>>> -       int curr_events;
>>>> +       long long curr_events;
>>>>
>>>>         idle = 1;
>>>>         rcu_read_lock();
>>>>         rdev_for_each_rcu(rdev, mddev) {
>>>>                 struct gendisk *disk = rdev->bdev->bd_disk;
>>>> -               curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
>>>> -                       atomic_read(&disk->sync_io);
>>>> +               curr_events =
>>>> +                       (long long)part_stat_read_accum(disk->part0, sectors) -
>>>> +                       atomic64_read(&disk->sync_io);
>>
>> By the way, I don't think this overflow is problematic, assume that
>> sectors accumulate by 'A' and sync_io accumulate by 'B':
>>
>> (sectors + A) - (sync_io + B) - (sectors - sync_io) = (A - B)
>>
>> No matter whether overflow or truncation happened in the above addition
>> and subtraction, the result is correct.
>>
>> And even if io accounting is disabled, which means sectors and A are
>> always 0, the result will always be -B that is <= 0, hence
>> is_mddev_idle() will always return true, and sync_speed will be limited
>> to max in this case.
>
> We only use this for idle or not check, the behavior is OK (I think).
> However, this logic is error prone.
>
> On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
> just use it for atomic64_t so that we don't have to worry about overflow.

I'm not sure about this, because other than this ubsan warning, this
overflow doesn't have any impact on functionality to me.

If we care about this 'hole', there are lots of holes in gendisk, and
they can be avoided, for example, by moving 'sync_io' near to 'node_id'.

Thanks,
Kuai

>
> Thanks,
> Song
> .
>
On Sun, Dec 17, 2023 at 5:39 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
[...]
> >
> > We only use this for idle or not check, the behavior is OK (I think).
> > However, this logic is error prone.
> >
> > On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
> > just use it for atomic64_t so that we don't have to worry about overflow.
>
> I'm not sure about this, because other than this ubsan warning, this
> overflow doesn't have any impact on functionality to me.

Fixing warnings for zero or low cost is always a good idea. It helps boost
the signal when UBSAN (and other debug features) detects real issues.

> If we care about this 'hole', there are lots of holes in gendisk, and
> they can be avoided, for example, by moving 'sync_io' near to 'node_id'.

The point was not "let's fill the hole", but "we can use atomic64_t
without extra memory cost". In general, I don't think we care too
much about holes in "struct gendisk".

Does this make sense?

Thanks,
Song
Hi,

On 2023/12/19 0:04, Song Liu wrote:
> On Sun, Dec 17, 2023 at 5:39 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
> [...]
>>>
>>> We only use this for idle or not check, the behavior is OK (I think).
>>> However, this logic is error prone.
>>>
>>> On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
>>> just use it for atomic64_t so that we don't have to worry about overflow.
>>
>> I'm not sure about this, because other than this ubsan warning, this
>> overflow doesn't have any impact on functionality to me.
>
> Fixing warnings for zero or low cost is always a good idea. It helps boost
> the signal when UBSAN (and other debug features) detects real issues.
>
>> If we care about this 'hole', there are lots of holes in gendisk, and
>> they can be avoided, for example, by moving 'sync_io' near to 'node_id'.
>
> The point was not "let's fill the hole", but "we can use atomic64_t
> without extra memory cost". In general, I don't think we care too
> much about holes in "struct gendisk".
>
> Does this make sense?

Of course, I don't have a strong preference on this. Because our syzkaller
reports lots of UBSAN warnings, we usually only fix the real issues. For
upstream, I'm fine with fixing this warning at zero or low cost.

Thanks,
Kuai

>
> Thanks,
> Song
> .
>
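The "no extra memory cost" point discussed above rests on struct layout: on a 64-bit build, a 4-byte atomic_t sandwiched between two pointers is followed by 4 bytes of padding, so widening it to 8 bytes does not grow the structure. The following is a simplified userspace sketch, with plain integer types standing in for atomic_t/atomic64_t and only the three relevant fields of struct gendisk; it is an illustration of the layout argument, not the real structure.

#include <stdio.h>
#include <stdint.h>

/* Simplified stand-in for the relevant slice of struct gendisk:
 * random / sync_io / ev. Pointers are 8 bytes on a 64-bit target. */
struct slice_atomic32 {
	void *random;
	int32_t sync_io;   /* 4 bytes used, 4 bytes of padding before 'ev' */
	void *ev;
};

struct slice_atomic64 {
	void *random;
	int64_t sync_io;   /* occupies the former padding; same total size */
	void *ev;
};

int main(void)
{
	printf("with 32-bit sync_io: %zu bytes\n", sizeof(struct slice_atomic32));
	printf("with 64-bit sync_io: %zu bytes\n", sizeof(struct slice_atomic64));
	/* Both print 24 on a typical LP64 system. */
	return 0;
}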
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ade83af123a2..1a4f976951c1 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -50,7 +50,7 @@ struct md_rdev {
 
 	sector_t sectors;		/* Device size (in 512bytes sectors) */
 	struct mddev *mddev;		/* RAID array if running */
-	int last_events;		/* IO event timestamp */
+	long long last_events;		/* IO event timestamp */
 
 	/*
 	 * If meta_bdev is non-NULL, it means that a separate device is
@@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);
 
 static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
 {
-	atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
+	atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
 }
 
 static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3f8a21cd9233..d28b98adf457 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -170,7 +170,7 @@ struct gendisk {
 	struct list_head slave_bdevs;
 #endif
 	struct timer_rand_state *random;
-	atomic_t sync_io;		/* RAID */
+	atomic64_t sync_io;		/* RAID */
 	struct disk_events *ev;
 
 #ifdef CONFIG_BLK_DEV_ZONED
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c94373d64f2c..1d71b2a9af03 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
 {
 	struct md_rdev *rdev;
 	int idle;
-	int curr_events;
+	long long curr_events;
 
 	idle = 1;
 	rcu_read_lock();
 	rdev_for_each_rcu(rdev, mddev) {
 		struct gendisk *disk = rdev->bdev->bd_disk;
-		curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
-			atomic_read(&disk->sync_io);
+		curr_events =
+			(long long)part_stat_read_accum(disk->part0, sectors) -
+			atomic64_read(&disk->sync_io);
 		/* sync IO will cause sync_io to increase before the disk_stats
 		 * as sync_io is counted when a request starts, and
 		 * disk_stats is counted when it completes.