Message ID | 20240222075806.1816400-7-yukuai1@huaweicloud.com |
---|---|
State | New |
Headers |
From: Yu Kuai <yukuai1@huaweicloud.com>
To: paul.e.luse@linux.intel.com, song@kernel.org, neilb@suse.com, shli@fb.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 06/10] md/raid1: factor out read_first_rdev() from read_balance()
Date: Thu, 22 Feb 2024 15:58:02 +0800
Message-Id: <20240222075806.1816400-7-yukuai1@huaweicloud.com>
In-Reply-To: <20240222075806.1816400-1-yukuai1@huaweicloud.com>
References: <20240222075806.1816400-1-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
List-Id: <linux-kernel.vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series | md/raid1: refactor read_balance() and some minor fix |
Commit Message
Yu Kuai
Feb. 22, 2024, 7:58 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

read_balance() is hard to understand because it is overlong and has too many states and branches.

This patch factors out the case of reading the first rdev from read_balance(); there are no functional changes.

Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 17 deletions(-)
Comments
On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > From: Yu Kuai <yukuai3@huawei.com> > > read_balance() is hard to understand because there are too many status > and branches, and it's overlong. > > This patch factor out the case to read the first rdev from > read_balance(), there are no functional changes. > > Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> > Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++------------- > 1 file changed, 46 insertions(+), 17 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 8089c569e84f..08c45ca55a7e 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector, > return len; > } > > +static void update_read_sectors(struct r1conf *conf, int disk, > + sector_t this_sector, int len) > +{ > + struct raid1_info *info = &conf->mirrors[disk]; > + > + atomic_inc(&info->rdev->nr_pending); > + if (info->next_seq_sect != this_sector) > + info->seq_start = this_sector; > + info->next_seq_sect = this_sector + len; > +} > + > +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, > + int *max_sectors) > +{ > + sector_t this_sector = r1_bio->sector; > + int len = r1_bio->sectors; > + int disk; > + > + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { > + struct md_rdev *rdev; > + int read_len; > + > + if (r1_bio->bios[disk] == IO_BLOCKED) > + continue; > + > + rdev = conf->mirrors[disk].rdev; > + if (!rdev || test_bit(Faulty, &rdev->flags)) > + continue; > + > + /* choose the first disk even if it has some bad blocks. 
*/ > + read_len = raid1_check_read_range(rdev, this_sector, &len); > + if (read_len > 0) { > + update_read_sectors(conf, disk, this_sector, read_len); > + *max_sectors = read_len; > + return disk; > + }

Hi Kuai,

It needs to update max_sectors even if the bad block starts before this_sector, because it can't read more than the bad-block length from the other member disks. If it reads more data than the bad blocks cover, it will cause data corruption. One rule here is: read from the primary disk (the first readable disk) if it has no bad blocks, and read the badblock-length data from the other disks.

Best Regards,
Xiao

> + } > + > + return -1; > +} > + > /* > * This routine returns the disk from which the requested read should > * be done. There is a per-array 'next expected sequential IO' sector > @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > sector_t best_dist; > unsigned int min_pending; > struct md_rdev *rdev; > - int choose_first; > > retry: > sectors = r1_bio->sectors; > @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > best_pending_disk = -1; > min_pending = UINT_MAX; > best_good_sectors = 0; > - choose_first = raid1_should_read_first(conf->mddev, this_sector, > - sectors); > clear_bit(R1BIO_FailFast, &r1_bio->state); > > + if (raid1_should_read_first(conf->mddev, this_sector, sectors)) > + return choose_first_rdev(conf, r1_bio, max_sectors); > + > for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { > sector_t dist; > sector_t first_bad; > @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > * bad_sectors from another device..
> */ > bad_sectors -= (this_sector - first_bad); > - if (choose_first && sectors > bad_sectors) > - sectors = bad_sectors; > if (best_good_sectors > sectors) > best_good_sectors = sectors; > > @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > best_good_sectors = good_sectors; > best_disk = disk; > } > - if (choose_first) > - break; > } > continue; > } else { > @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > pending = atomic_read(&rdev->nr_pending); > dist = abs(this_sector - conf->mirrors[disk].head_position); > - if (choose_first) { > - best_disk = disk; > - break; > - } > /* Don't change to another disk for sequential reads */ > if (conf->mirrors[disk].next_seq_sect == this_sector > || dist == 0) { > @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > rdev = conf->mirrors[best_disk].rdev; > if (!rdev) > goto retry; > - atomic_inc(&rdev->nr_pending); > - sectors = best_good_sectors; > - > - if (conf->mirrors[best_disk].next_seq_sect != this_sector) > - conf->mirrors[best_disk].seq_start = this_sector; > > - conf->mirrors[best_disk].next_seq_sect = this_sector + sectors; > + sectors = best_good_sectors; > + update_read_sectors(conf, disk, this_sector, sectors); > } > *max_sectors = sectors; > > -- > 2.39.2 > >
Hi, 在 2024/02/26 22:16, Xiao Ni 写道: > On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> From: Yu Kuai <yukuai3@huawei.com> >> >> read_balance() is hard to understand because there are too many status >> and branches, and it's overlong. >> >> This patch factor out the case to read the first rdev from >> read_balance(), there are no functional changes. >> >> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> >> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- >> drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++------------- >> 1 file changed, 46 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >> index 8089c569e84f..08c45ca55a7e 100644 >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector, >> return len; >> } >> >> +static void update_read_sectors(struct r1conf *conf, int disk, >> + sector_t this_sector, int len) >> +{ >> + struct raid1_info *info = &conf->mirrors[disk]; >> + >> + atomic_inc(&info->rdev->nr_pending); >> + if (info->next_seq_sect != this_sector) >> + info->seq_start = this_sector; >> + info->next_seq_sect = this_sector + len; >> +} >> + >> +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, >> + int *max_sectors) >> +{ >> + sector_t this_sector = r1_bio->sector; >> + int len = r1_bio->sectors; >> + int disk; >> + >> + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { >> + struct md_rdev *rdev; >> + int read_len; >> + >> + if (r1_bio->bios[disk] == IO_BLOCKED) >> + continue; >> + >> + rdev = conf->mirrors[disk].rdev; >> + if (!rdev || test_bit(Faulty, &rdev->flags)) >> + continue; >> + >> + /* choose the first disk even if it has some bad blocks. 
*/ >> + read_len = raid1_check_read_range(rdev, this_sector, &len); >> + if (read_len > 0) { >> + update_read_sectors(conf, disk, this_sector, read_len); >> + *max_sectors = read_len; >> + return disk; >> + } > > Hi Kuai > > It needs to update max_sectors even if the bad block starts before > this_sector. Because it can't read more than bad_blocks from other > member disks. If it reads more data than bad blocks, it will cause > data corruption. One rule here is read from the primary disk (the > first readable disk) if it has no bad block and read the > badblock-data-length data from other disks.

Note that raid1_check_read_range() returns the readable length from this rdev; hence if the bad block starts before this_sector, 0 is returned and 'len' is updated to the length of the bad blocks (if it does not exceed the read range), and the following iteration will find the first disk able to read the updated 'len' of data and will update max_sectors.

Thanks,
Kuai

> > Best Regards > Xiao > >> + } >> + >> + return -1; >> +} >> + >> /* >> * This routine returns the disk from which the requested read should >> * be done.
There is a per-array 'next expected sequential IO' sector >> @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect >> sector_t best_dist; >> unsigned int min_pending; >> struct md_rdev *rdev; >> - int choose_first; >> >> retry: >> sectors = r1_bio->sectors; >> @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect >> best_pending_disk = -1; >> min_pending = UINT_MAX; >> best_good_sectors = 0; >> - choose_first = raid1_should_read_first(conf->mddev, this_sector, >> - sectors); >> clear_bit(R1BIO_FailFast, &r1_bio->state); >> >> + if (raid1_should_read_first(conf->mddev, this_sector, sectors)) >> + return choose_first_rdev(conf, r1_bio, max_sectors); >> + >> for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { >> sector_t dist; >> sector_t first_bad; >> @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect >> * bad_sectors from another device.. >> */ >> bad_sectors -= (this_sector - first_bad); >> - if (choose_first && sectors > bad_sectors) >> - sectors = bad_sectors; >> if (best_good_sectors > sectors) >> best_good_sectors = sectors; >> >> @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect >> best_good_sectors = good_sectors; >> best_disk = disk; >> } >> - if (choose_first) >> - break; >> } >> continue; >> } else { >> @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect >> >> pending = atomic_read(&rdev->nr_pending); >> dist = abs(this_sector - conf->mirrors[disk].head_position); >> - if (choose_first) { >> - best_disk = disk; >> - break; >> - } >> /* Don't change to another disk for sequential reads */ >> if (conf->mirrors[disk].next_seq_sect == this_sector >> || dist == 0) { >> @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect >> rdev = conf->mirrors[best_disk].rdev; >> if 
(!rdev) >> goto retry; >> - atomic_inc(&rdev->nr_pending); >> - sectors = best_good_sectors; >> - >> - if (conf->mirrors[best_disk].next_seq_sect != this_sector) >> - conf->mirrors[best_disk].seq_start = this_sector; >> >> - conf->mirrors[best_disk].next_seq_sect = this_sector + sectors; >> + sectors = best_good_sectors; >> + update_read_sectors(conf, disk, this_sector, sectors); >> } >> *max_sectors = sectors; >> >> -- >> 2.39.2 >> >> > > . >
On Tue, Feb 27, 2024 at 9:06 AM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2024/02/26 22:16, Xiao Ni 写道: > > On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> > >> From: Yu Kuai <yukuai3@huawei.com> > >> > >> read_balance() is hard to understand because there are too many status > >> and branches, and it's overlong. > >> > >> This patch factor out the case to read the first rdev from > >> read_balance(), there are no functional changes. > >> > >> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> > >> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> > >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> > >> --- > >> drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++------------- > >> 1 file changed, 46 insertions(+), 17 deletions(-) > >> > >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > >> index 8089c569e84f..08c45ca55a7e 100644 > >> --- a/drivers/md/raid1.c > >> +++ b/drivers/md/raid1.c > >> @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector, > >> return len; > >> } > >> > >> +static void update_read_sectors(struct r1conf *conf, int disk, > >> + sector_t this_sector, int len) > >> +{ > >> + struct raid1_info *info = &conf->mirrors[disk]; > >> + > >> + atomic_inc(&info->rdev->nr_pending); > >> + if (info->next_seq_sect != this_sector) > >> + info->seq_start = this_sector; > >> + info->next_seq_sect = this_sector + len; > >> +} > >> + > >> +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, > >> + int *max_sectors) > >> +{ > >> + sector_t this_sector = r1_bio->sector; > >> + int len = r1_bio->sectors; > >> + int disk; > >> + > >> + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { > >> + struct md_rdev *rdev; > >> + int read_len; > >> + > >> + if (r1_bio->bios[disk] == IO_BLOCKED) > >> + continue; > >> + > >> + rdev = conf->mirrors[disk].rdev; > >> + if (!rdev || test_bit(Faulty, &rdev->flags)) > >> + continue; > >> + > >> + /* choose the 
first disk even if it has some bad blocks. */ > >> + read_len = raid1_check_read_range(rdev, this_sector, &len); > >> + if (read_len > 0) { > >> + update_read_sectors(conf, disk, this_sector, read_len); > >> + *max_sectors = read_len; > >> + return disk; > >> + } > > > > Hi Kuai > > > > It needs to update max_sectors even if the bad block starts before > > this_sector. Because it can't read more than bad_blocks from other > > member disks. If it reads more data than bad blocks, it will cause > > data corruption. One rule here is read from the primary disk (the > > first readable disk) if it has no bad block and read the > > badblock-data-length data from other disks. > > Noted that raid1_check_read_range() will return readable length from > this rdev, hence if bad block starts before this_sector, 0 is returned, > and 'len' is updated to the length of badblocks(if not exceed read > range), and following iteration will find the first disk to read updated > 'len' data and update max_sectors. Hi Kuai The problem is that choose_first_rdev doesn't return 'len' from max_sectors when bad blocks start before this_sector. In the following iteration, it can't read more than 'len' from other disks to avoid data corruption. I haven't read all the patches. To this patch, it resets best_good_sectors to sectors when it encounters a good member disk without bad blocks. Regards Xiao > > Thanks, > Kuai > > > > > Best Regards > > Xiao > > > >> + } > >> + > >> + return -1; > >> +} > >> + > >> /* > >> * This routine returns the disk from which the requested read should > >> * be done. 
There is a per-array 'next expected sequential IO' sector > >> @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> sector_t best_dist; > >> unsigned int min_pending; > >> struct md_rdev *rdev; > >> - int choose_first; > >> > >> retry: > >> sectors = r1_bio->sectors; > >> @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> best_pending_disk = -1; > >> min_pending = UINT_MAX; > >> best_good_sectors = 0; > >> - choose_first = raid1_should_read_first(conf->mddev, this_sector, > >> - sectors); > >> clear_bit(R1BIO_FailFast, &r1_bio->state); > >> > >> + if (raid1_should_read_first(conf->mddev, this_sector, sectors)) > >> + return choose_first_rdev(conf, r1_bio, max_sectors); > >> + > >> for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { > >> sector_t dist; > >> sector_t first_bad; > >> @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> * bad_sectors from another device.. 
> >> */ > >> bad_sectors -= (this_sector - first_bad); > >> - if (choose_first && sectors > bad_sectors) > >> - sectors = bad_sectors; > >> if (best_good_sectors > sectors) > >> best_good_sectors = sectors; > >> > >> @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> best_good_sectors = good_sectors; > >> best_disk = disk; > >> } > >> - if (choose_first) > >> - break; > >> } > >> continue; > >> } else { > >> @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> > >> pending = atomic_read(&rdev->nr_pending); > >> dist = abs(this_sector - conf->mirrors[disk].head_position); > >> - if (choose_first) { > >> - best_disk = disk; > >> - break; > >> - } > >> /* Don't change to another disk for sequential reads */ > >> if (conf->mirrors[disk].next_seq_sect == this_sector > >> || dist == 0) { > >> @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > >> rdev = conf->mirrors[best_disk].rdev; > >> if (!rdev) > >> goto retry; > >> - atomic_inc(&rdev->nr_pending); > >> - sectors = best_good_sectors; > >> - > >> - if (conf->mirrors[best_disk].next_seq_sect != this_sector) > >> - conf->mirrors[best_disk].seq_start = this_sector; > >> > >> - conf->mirrors[best_disk].next_seq_sect = this_sector + sectors; > >> + sectors = best_good_sectors; > >> + update_read_sectors(conf, disk, this_sector, sectors); > >> } > >> *max_sectors = sectors; > >> > >> -- > >> 2.39.2 > >> > >> > > > > . > > >
On Tue, Feb 27, 2024 at 9:23 AM Xiao Ni <xni@redhat.com> wrote: > > On Tue, Feb 27, 2024 at 9:06 AM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > > > Hi, > > > > 在 2024/02/26 22:16, Xiao Ni 写道: > > > On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > >> > > >> From: Yu Kuai <yukuai3@huawei.com> > > >> > > >> read_balance() is hard to understand because there are too many status > > >> and branches, and it's overlong. > > >> > > >> This patch factor out the case to read the first rdev from > > >> read_balance(), there are no functional changes. > > >> > > >> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> > > >> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> > > >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> > > >> --- > > >> drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++------------- > > >> 1 file changed, 46 insertions(+), 17 deletions(-) > > >> > > >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > >> index 8089c569e84f..08c45ca55a7e 100644 > > >> --- a/drivers/md/raid1.c > > >> +++ b/drivers/md/raid1.c > > >> @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector, > > >> return len; > > >> } > > >> > > >> +static void update_read_sectors(struct r1conf *conf, int disk, > > >> + sector_t this_sector, int len) > > >> +{ > > >> + struct raid1_info *info = &conf->mirrors[disk]; > > >> + > > >> + atomic_inc(&info->rdev->nr_pending); > > >> + if (info->next_seq_sect != this_sector) > > >> + info->seq_start = this_sector; > > >> + info->next_seq_sect = this_sector + len; > > >> +} > > >> + > > >> +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, > > >> + int *max_sectors) > > >> +{ > > >> + sector_t this_sector = r1_bio->sector; > > >> + int len = r1_bio->sectors; > > >> + int disk; > > >> + > > >> + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { > > >> + struct md_rdev *rdev; > > >> + int read_len; > > >> + > > >> + if (r1_bio->bios[disk] == 
IO_BLOCKED) > > >> + continue; > > >> + > > >> + rdev = conf->mirrors[disk].rdev; > > >> + if (!rdev || test_bit(Faulty, &rdev->flags)) > > >> + continue; > > >> + > > >> + /* choose the first disk even if it has some bad blocks. */ > > >> + read_len = raid1_check_read_range(rdev, this_sector, &len); > > >> + if (read_len > 0) { > > >> + update_read_sectors(conf, disk, this_sector, read_len); > > >> + *max_sectors = read_len; > > >> + return disk; > > >> + } > > > > > > Hi Kuai > > > > > > It needs to update max_sectors even if the bad block starts before > > > this_sector. Because it can't read more than bad_blocks from other > > > member disks. If it reads more data than bad blocks, it will cause > > > data corruption. One rule here is read from the primary disk (the > > > first readable disk) if it has no bad block and read the > > > badblock-data-length data from other disks. > > > > Noted that raid1_check_read_range() will return readable length from > > this rdev, hence if bad block starts before this_sector, 0 is returned, > > and 'len' is updated to the length of badblocks(if not exceed read > > range), and following iteration will find the first disk to read updated > > 'len' data and update max_sectors. > > Hi Kuai > > The problem is that choose_first_rdev doesn't return 'len' from > max_sectors when bad blocks start before this_sector. In the following > iteration, it can't read more than 'len' from other disks to avoid > data corruption. I haven't read all the patches. To this patch, it > resets best_good_sectors to sectors when it encounters a good member > disk without bad blocks.

Sorry, I missed one thing. Besides the point I mentioned above, it finally needs to update 'sectors' (the read region) to 'len'. It looks like raid1_check_read_range needs to be modified to achieve this.
Regards Xiao > > Regards > Xiao > > > > Thanks, > > Kuai > > > > > > > > Best Regards > > > Xiao > > > > > >> + } > > >> + > > >> + return -1; > > >> +} > > >> + > > >> /* > > >> * This routine returns the disk from which the requested read should > > >> * be done. There is a per-array 'next expected sequential IO' sector > > >> @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > >> sector_t best_dist; > > >> unsigned int min_pending; > > >> struct md_rdev *rdev; > > >> - int choose_first; > > >> > > >> retry: > > >> sectors = r1_bio->sectors; > > >> @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > >> best_pending_disk = -1; > > >> min_pending = UINT_MAX; > > >> best_good_sectors = 0; > > >> - choose_first = raid1_should_read_first(conf->mddev, this_sector, > > >> - sectors); > > >> clear_bit(R1BIO_FailFast, &r1_bio->state); > > >> > > >> + if (raid1_should_read_first(conf->mddev, this_sector, sectors)) > > >> + return choose_first_rdev(conf, r1_bio, max_sectors); > > >> + > > >> for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { > > >> sector_t dist; > > >> sector_t first_bad; > > >> @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > >> * bad_sectors from another device.. 
> > >> */ > > >> bad_sectors -= (this_sector - first_bad); > > >> - if (choose_first && sectors > bad_sectors) > > >> - sectors = bad_sectors; > > >> if (best_good_sectors > sectors) > > >> best_good_sectors = sectors; > > >> > > >> @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > >> best_good_sectors = good_sectors; > > >> best_disk = disk; > > >> } > > >> - if (choose_first) > > >> - break; > > >> } > > >> continue; > > >> } else { > > >> @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > >> > > >> pending = atomic_read(&rdev->nr_pending); > > >> dist = abs(this_sector - conf->mirrors[disk].head_position); > > >> - if (choose_first) { > > >> - best_disk = disk; > > >> - break; > > >> - } > > >> /* Don't change to another disk for sequential reads */ > > >> if (conf->mirrors[disk].next_seq_sect == this_sector > > >> || dist == 0) { > > >> @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect > > >> rdev = conf->mirrors[best_disk].rdev; > > >> if (!rdev) > > >> goto retry; > > >> - atomic_inc(&rdev->nr_pending); > > >> - sectors = best_good_sectors; > > >> - > > >> - if (conf->mirrors[best_disk].next_seq_sect != this_sector) > > >> - conf->mirrors[best_disk].seq_start = this_sector; > > >> > > >> - conf->mirrors[best_disk].next_seq_sect = this_sector + sectors; > > >> + sectors = best_good_sectors; > > >> + update_read_sectors(conf, disk, this_sector, sectors); > > >> } > > >> *max_sectors = sectors; > > >> > > >> -- > > >> 2.39.2 > > >> > > >> > > > > > > . > > > > >
Hi,

On 2024/02/27 9:23, Xiao Ni wrote:
> On Tue, Feb 27, 2024 at 9:06 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2024/02/26 22:16, Xiao Ni wrote:
>>> On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> read_balance() is hard to understand because there are too many states
>>>> and branches, and it's overlong.
>>>>
>>>> This patch factors out the case of reading the first rdev from
>>>> read_balance(); there are no functional changes.
>>>>
>>>> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
>>>> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>>  drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++-------------
>>>>  1 file changed, 46 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index 8089c569e84f..08c45ca55a7e 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector,
>>>>         return len;
>>>>  }
>>>>
>>>> +static void update_read_sectors(struct r1conf *conf, int disk,
>>>> +                               sector_t this_sector, int len)
>>>> +{
>>>> +       struct raid1_info *info = &conf->mirrors[disk];
>>>> +
>>>> +       atomic_inc(&info->rdev->nr_pending);
>>>> +       if (info->next_seq_sect != this_sector)
>>>> +               info->seq_start = this_sector;
>>>> +       info->next_seq_sect = this_sector + len;
>>>> +}
>>>> +
>>>> +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
>>>> +                            int *max_sectors)
>>>> +{
>>>> +       sector_t this_sector = r1_bio->sector;
>>>> +       int len = r1_bio->sectors;
>>>> +       int disk;
>>>> +
>>>> +       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
>>>> +               struct md_rdev *rdev;
>>>> +               int read_len;
>>>> +
>>>> +               if (r1_bio->bios[disk] == IO_BLOCKED)
>>>> +                       continue;
>>>> +
>>>> +               rdev = conf->mirrors[disk].rdev;
>>>> +               if (!rdev || test_bit(Faulty, &rdev->flags))
>>>> +                       continue;
>>>> +
>>>> +               /* choose the first disk even if it has some bad blocks. */
>>>> +               read_len = raid1_check_read_range(rdev, this_sector, &len);
>>>> +               if (read_len > 0) {
>>>> +                       update_read_sectors(conf, disk, this_sector, read_len);
>>>> +                       *max_sectors = read_len;
>>>> +                       return disk;
>>>> +               }
>>>
>>> Hi Kuai
>>>
>>> It needs to update max_sectors even if the bad block starts before
>>> this_sector, because it can't read more than bad_blocks from other
>>> member disks. If it reads more data than the bad blocks, it will cause
>>> data corruption. One rule here is to read from the primary disk (the
>>> first readable disk) if it has no bad blocks, and to read the
>>> badblock-data-length of data from other disks.
>>
>> Note that raid1_check_read_range() will return the readable length from
>> this rdev, hence if the bad block starts before this_sector, 0 is
>> returned and 'len' is updated to the length of the badblocks (if it
>> does not exceed the read range), and the following iterations will find
>> the first disk to read the updated 'len' of data and update max_sectors.
>
> Hi Kuai
>
> The problem is that choose_first_rdev doesn't return 'len' via
> max_sectors when bad blocks start before this_sector. In the following
> iteration, it can't read more than 'len' from other disks to avoid
> data corruption. I haven't read all the patches. As for this patch, it
> resets best_good_sectors to sectors when it encounters a good member
> disk without bad blocks.

In this case, 'len' is not supposed to be returned; the caller will
split the original IO based on 'max_sectors'. For example:

IO:    2, 4 |  ----
rdev1: BB: 0, 4 |xxxx
rdev2: no BB

Then choose_first_rdev() will set max_sectors to 2 and return rdev2,
and the caller will split and issue a new IO:

original IO: 4, 2 |    --
split IO:    2, 2 |  --

Finally, the split IO is issued to rdev2. Later, the original IO will
be handled by read_balance() again, and rdev1 will be returned.

Is this the case you were concerned about?

Thanks,
Kuai

>
> Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Best Regards
>>> Xiao
>>>
>>>> +       }
>>>> +
>>>> +       return -1;
>>>> +}
>>>> +
>>>>  /*
>>>>   * This routine returns the disk from which the requested read should
>>>>   * be done. There is a per-array 'next expected sequential IO' sector
>>>> @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>         sector_t best_dist;
>>>>         unsigned int min_pending;
>>>>         struct md_rdev *rdev;
>>>> -       int choose_first;
>>>>
>>>>  retry:
>>>>         sectors = r1_bio->sectors;
>>>> @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>         best_pending_disk = -1;
>>>>         min_pending = UINT_MAX;
>>>>         best_good_sectors = 0;
>>>> -       choose_first = raid1_should_read_first(conf->mddev, this_sector,
>>>> -                                              sectors);
>>>>         clear_bit(R1BIO_FailFast, &r1_bio->state);
>>>>
>>>> +       if (raid1_should_read_first(conf->mddev, this_sector, sectors))
>>>> +               return choose_first_rdev(conf, r1_bio, max_sectors);
>>>> +
>>>>         for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
>>>>                 sector_t dist;
>>>>                 sector_t first_bad;
>>>> @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>                          * bad_sectors from another device..
>>>>                          */
>>>>                         bad_sectors -= (this_sector - first_bad);
>>>> -                       if (choose_first && sectors > bad_sectors)
>>>> -                               sectors = bad_sectors;
>>>>                         if (best_good_sectors > sectors)
>>>>                                 best_good_sectors = sectors;
>>>>
>>>> @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>                                         best_good_sectors = good_sectors;
>>>>                                         best_disk = disk;
>>>>                                 }
>>>> -                               if (choose_first)
>>>> -                                       break;
>>>>                         }
>>>>                         continue;
>>>>                 } else {
>>>> @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>
>>>>                 pending = atomic_read(&rdev->nr_pending);
>>>>                 dist = abs(this_sector - conf->mirrors[disk].head_position);
>>>> -               if (choose_first) {
>>>> -                       best_disk = disk;
>>>> -                       break;
>>>> -               }
>>>>                 /* Don't change to another disk for sequential reads */
>>>>                 if (conf->mirrors[disk].next_seq_sect == this_sector
>>>>                     || dist == 0) {
>>>> @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>                 rdev = conf->mirrors[best_disk].rdev;
>>>>                 if (!rdev)
>>>>                         goto retry;
>>>> -               atomic_inc(&rdev->nr_pending);
>>>> -               sectors = best_good_sectors;
>>>> -
>>>> -               if (conf->mirrors[best_disk].next_seq_sect != this_sector)
>>>> -                       conf->mirrors[best_disk].seq_start = this_sector;
>>>>
>>>> -               conf->mirrors[best_disk].next_seq_sect = this_sector + sectors;
>>>> +               sectors = best_good_sectors;
>>>> +               update_read_sectors(conf, disk, this_sector, sectors);
>>>>         }
>>>>         *max_sectors = sectors;
>>>>
>>>> --
>>>> 2.39.2
>>>>
>>>>
>>>
>>> .
>>>
>>
>
> .
>
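[Editor's note] The splitting behaviour Kuai describes above can be modelled in a few lines of user-space C. This is a sketch under simplifying assumptions of my own (one bad range per device; the names `bad_range` and `check_read_range` are hypothetical), not the kernel's badblocks implementation:

```c
#include <assert.h>

/* Simplified stand-in for md's badblocks table: at most one bad
 * range per device, in sectors; len == 0 means no bad blocks. */
struct bad_range { long start; long len; };

/* Models the semantics of raid1_check_read_range() as discussed in
 * the thread: returns how many sectors are readable from 'sector' on
 * this device. If the read starts inside a bad range, 0 is returned
 * and *len is shrunk to the remaining bad sectors, so that a later
 * device serves exactly that span and the caller splits the bio at
 * the boundary. */
static int check_read_range(const struct bad_range *bb, long sector, int *len)
{
	long bad_end = bb->start + bb->len;

	if (bb->len == 0 || sector >= bad_end || sector + *len <= bb->start)
		return *len;			/* no overlap: all readable */

	if (sector >= bb->start) {		/* read starts in the bad range */
		if (bad_end - sector < *len)
			*len = bad_end - sector;	/* shrink to the bad span */
		return 0;
	}
	return (int)(bb->start - sector);	/* good prefix before the bad range */
}
```

With Kuai's example (IO at sector 2, length 4; rdev1 has bad blocks 0-4): check_read_range() returns 0 for rdev1 and shrinks len to 2, so a clean rdev2 then reports 2 readable sectors and max_sectors becomes 2.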
On Tue, Feb 27, 2024 at 9:44 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/02/27 9:23, Xiao Ni wrote:
> > On Tue, Feb 27, 2024 at 9:06 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2024/02/26 22:16, Xiao Ni wrote:
> >>> On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> From: Yu Kuai <yukuai3@huawei.com>
> >>>>
> >>>> read_balance() is hard to understand because there are too many states
> >>>> and branches, and it's overlong.
> >>>>
> >>>> This patch factors out the case of reading the first rdev from
> >>>> read_balance(); there are no functional changes.
> >>>>
> >>>> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
> >>>> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
> >>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >>>> ---
> >>>>  drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++-------------
> >>>>  1 file changed, 46 insertions(+), 17 deletions(-)
> >>>>
> >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >>>> index 8089c569e84f..08c45ca55a7e 100644
> >>>> --- a/drivers/md/raid1.c
> >>>> +++ b/drivers/md/raid1.c
> >>>> @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector,
> >>>>         return len;
> >>>>  }
> >>>>
> >>>> +static void update_read_sectors(struct r1conf *conf, int disk,
> >>>> +                               sector_t this_sector, int len)
> >>>> +{
> >>>> +       struct raid1_info *info = &conf->mirrors[disk];
> >>>> +
> >>>> +       atomic_inc(&info->rdev->nr_pending);
> >>>> +       if (info->next_seq_sect != this_sector)
> >>>> +               info->seq_start = this_sector;
> >>>> +       info->next_seq_sect = this_sector + len;
> >>>> +}
> >>>> +
> >>>> +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
> >>>> +                            int *max_sectors)
> >>>> +{
> >>>> +       sector_t this_sector = r1_bio->sector;
> >>>> +       int len = r1_bio->sectors;
> >>>> +       int disk;
> >>>> +
> >>>> +       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
> >>>> +               struct md_rdev *rdev;
> >>>> +               int read_len;
> >>>> +
> >>>> +               if (r1_bio->bios[disk] == IO_BLOCKED)
> >>>> +                       continue;
> >>>> +
> >>>> +               rdev = conf->mirrors[disk].rdev;
> >>>> +               if (!rdev || test_bit(Faulty, &rdev->flags))
> >>>> +                       continue;
> >>>> +
> >>>> +               /* choose the first disk even if it has some bad blocks. */
> >>>> +               read_len = raid1_check_read_range(rdev, this_sector, &len);
> >>>> +               if (read_len > 0) {
> >>>> +                       update_read_sectors(conf, disk, this_sector, read_len);
> >>>> +                       *max_sectors = read_len;
> >>>> +                       return disk;
> >>>> +               }
> >>>
> >>> Hi Kuai
> >>>
> >>> It needs to update max_sectors even if the bad block starts before
> >>> this_sector, because it can't read more than bad_blocks from other
> >>> member disks. If it reads more data than the bad blocks, it will cause
> >>> data corruption. One rule here is to read from the primary disk (the
> >>> first readable disk) if it has no bad blocks, and to read the
> >>> badblock-data-length of data from other disks.
> >>
> >> Note that raid1_check_read_range() will return the readable length from
> >> this rdev, hence if the bad block starts before this_sector, 0 is
> >> returned and 'len' is updated to the length of the badblocks (if it
> >> does not exceed the read range), and the following iterations will find
> >> the first disk to read the updated 'len' of data and update max_sectors.
> >
> > Hi Kuai
> >
> > The problem is that choose_first_rdev doesn't return 'len' via
> > max_sectors when bad blocks start before this_sector. In the following
> > iteration, it can't read more than 'len' from other disks to avoid
> > data corruption. I haven't read all the patches. As for this patch, it
> > resets best_good_sectors to sectors when it encounters a good member
> > disk without bad blocks.
>
> In this case, 'len' is not supposed to be returned; the caller will
> split the original IO based on 'max_sectors'. For example:
>
> IO:    2, 4 |  ----
> rdev1: BB: 0, 4 |xxxx
> rdev2: no BB
>
> Then choose_first_rdev() will set max_sectors to 2 and return rdev2,
> and the caller will split and issue a new IO:
>
> original IO: 4, 2 |    --
> split IO:    2, 2 |  --
>
> Finally, the split IO is issued to rdev2. Later, the original IO will
> be handled by read_balance() again, and rdev1 will be returned.
>
> Is this the case you were concerned about?

Ah, I was still thinking of the original logic and forgot that
choose_first_rdev() iterates over all disks. The case I wanted to
describe is:

bad block range:     1------8
first readable disk: 4-----------16

The IO starts at 4, but the bad block starts at 1 with length 8.
raid1_check_read_range returns 0 and len is updated to 5. In the loop,
it can split the IO as expected. Thanks for the explanation.

Best Regards
Xiao

>
> Thanks,
> Kuai
>
> >
> > Regards
> > Xiao
> >>
> >> Thanks,
> >> Kuai
> >>
> >>>
> >>> Best Regards
> >>> Xiao
> >>>
> >>>> +       }
> >>>> +
> >>>> +       return -1;
> >>>> +}
> >>>> +
> >>>>  /*
> >>>>   * This routine returns the disk from which the requested read should
> >>>>   * be done. There is a per-array 'next expected sequential IO' sector
> >>>> @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>         sector_t best_dist;
> >>>>         unsigned int min_pending;
> >>>>         struct md_rdev *rdev;
> >>>> -       int choose_first;
> >>>>
> >>>>  retry:
> >>>>         sectors = r1_bio->sectors;
> >>>> @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>         best_pending_disk = -1;
> >>>>         min_pending = UINT_MAX;
> >>>>         best_good_sectors = 0;
> >>>> -       choose_first = raid1_should_read_first(conf->mddev, this_sector,
> >>>> -                                              sectors);
> >>>>         clear_bit(R1BIO_FailFast, &r1_bio->state);
> >>>>
> >>>> +       if (raid1_should_read_first(conf->mddev, this_sector, sectors))
> >>>> +               return choose_first_rdev(conf, r1_bio, max_sectors);
> >>>> +
> >>>>         for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
> >>>>                 sector_t dist;
> >>>>                 sector_t first_bad;
> >>>> @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>                          * bad_sectors from another device..
> >>>>                          */
> >>>>                         bad_sectors -= (this_sector - first_bad);
> >>>> -                       if (choose_first && sectors > bad_sectors)
> >>>> -                               sectors = bad_sectors;
> >>>>                         if (best_good_sectors > sectors)
> >>>>                                 best_good_sectors = sectors;
> >>>>
> >>>> @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>                                         best_good_sectors = good_sectors;
> >>>>                                         best_disk = disk;
> >>>>                                 }
> >>>> -                               if (choose_first)
> >>>> -                                       break;
> >>>>                         }
> >>>>                         continue;
> >>>>                 } else {
> >>>> @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>
> >>>>                 pending = atomic_read(&rdev->nr_pending);
> >>>>                 dist = abs(this_sector - conf->mirrors[disk].head_position);
> >>>> -               if (choose_first) {
> >>>> -                       best_disk = disk;
> >>>> -                       break;
> >>>> -               }
> >>>>                 /* Don't change to another disk for sequential reads */
> >>>>                 if (conf->mirrors[disk].next_seq_sect == this_sector
> >>>>                     || dist == 0) {
> >>>> @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>>                 rdev = conf->mirrors[best_disk].rdev;
> >>>>                 if (!rdev)
> >>>>                         goto retry;
> >>>> -               atomic_inc(&rdev->nr_pending);
> >>>> -               sectors = best_good_sectors;
> >>>> -
> >>>> -               if (conf->mirrors[best_disk].next_seq_sect != this_sector)
> >>>> -                       conf->mirrors[best_disk].seq_start = this_sector;
> >>>>
> >>>> -               conf->mirrors[best_disk].next_seq_sect = this_sector + sectors;
> >>>> +               sectors = best_good_sectors;
> >>>> +               update_read_sectors(conf, disk, this_sector, sectors);
> >>>>         }
> >>>>         *max_sectors = sectors;
> >>>>
> >>>> --
> >>>> 2.39.2
> >>>>
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>
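[Editor's note] Xiao's corrected understanding — that choose_first_rdev() keeps iterating with the shrunken 'len' — can be checked mechanically with a small user-space model. All names here, and the one-bad-range-per-device simplification, are hypothetical rather than the kernel's actual data structures:

```c
#include <assert.h>

struct bad_range { long start; long len; };	/* len == 0: no bad blocks */
struct mirror_model { struct bad_range bb; int faulty; };

/* Same assumed semantics as raid1_check_read_range() in the patch:
 * readable length from 'sector', shrinking *len when the read starts
 * inside a bad range so the next device is asked for exactly the
 * bad span. */
static int check_read_range(const struct bad_range *bb, long sector, int *len)
{
	long bad_end = bb->start + bb->len;

	if (bb->len == 0 || sector >= bad_end || sector + *len <= bb->start)
		return *len;
	if (sector >= bb->start) {
		if (bad_end - sector < *len)
			*len = bad_end - sector;
		return 0;
	}
	return (int)(bb->start - sector);
}

/* Model of the choose_first_rdev() loop: the first non-faulty device
 * that can serve a non-empty prefix wins; *max_sectors tells the
 * caller where to split the bio. */
static int choose_first(const struct mirror_model *m, int ndisks,
			long sector, int len, int *max_sectors)
{
	for (int disk = 0; disk < ndisks; disk++) {
		int read_len;

		if (m[disk].faulty)
			continue;
		/* keep trying with the (possibly shrunken) 'len' */
		read_len = check_read_range(&m[disk].bb, sector, &len);
		if (read_len > 0) {
			*max_sectors = read_len;
			return disk;
		}
	}
	return -1;
}
```

For Xiao's case (bad blocks covering sectors 1-9 on disk 0, IO starting at sector 4): disk 0 shrinks len to 5 and disk 1 is returned with max_sectors = 5, so the caller splits the IO at that boundary.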
On Thu, Feb 22, 2024 at 4:04 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> read_balance() is hard to understand because there are too many states
> and branches, and it's overlong.
>
> This patch factors out the case of reading the first rdev from
> read_balance(); there are no functional changes.
>
> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 8089c569e84f..08c45ca55a7e 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector,
>         return len;
>  }
>
> +static void update_read_sectors(struct r1conf *conf, int disk,
> +                               sector_t this_sector, int len)
> +{
> +       struct raid1_info *info = &conf->mirrors[disk];
> +
> +       atomic_inc(&info->rdev->nr_pending);
> +       if (info->next_seq_sect != this_sector)
> +               info->seq_start = this_sector;
> +       info->next_seq_sect = this_sector + len;
> +}
> +
> +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
> +                            int *max_sectors)
> +{
> +       sector_t this_sector = r1_bio->sector;
> +       int len = r1_bio->sectors;
> +       int disk;
> +
> +       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
> +               struct md_rdev *rdev;
> +               int read_len;
> +
> +               if (r1_bio->bios[disk] == IO_BLOCKED)
> +                       continue;
> +
> +               rdev = conf->mirrors[disk].rdev;
> +               if (!rdev || test_bit(Faulty, &rdev->flags))
> +                       continue;
> +
> +               /* choose the first disk even if it has some bad blocks. */
> +               read_len = raid1_check_read_range(rdev, this_sector, &len);
> +               if (read_len > 0) {
> +                       update_read_sectors(conf, disk, this_sector, read_len);
> +                       *max_sectors = read_len;
> +                       return disk;
> +               }
> +       }
> +
> +       return -1;
> +}
> +
>  /*
>   * This routine returns the disk from which the requested read should
>   * be done. There is a per-array 'next expected sequential IO' sector
> @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>         sector_t best_dist;
>         unsigned int min_pending;
>         struct md_rdev *rdev;
> -       int choose_first;
>
>  retry:
>         sectors = r1_bio->sectors;
> @@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>         best_pending_disk = -1;
>         min_pending = UINT_MAX;
>         best_good_sectors = 0;
> -       choose_first = raid1_should_read_first(conf->mddev, this_sector,
> -                                              sectors);
>         clear_bit(R1BIO_FailFast, &r1_bio->state);
>
> +       if (raid1_should_read_first(conf->mddev, this_sector, sectors))
> +               return choose_first_rdev(conf, r1_bio, max_sectors);
> +
>         for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
>                 sector_t dist;
>                 sector_t first_bad;
> @@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>                          * bad_sectors from another device..
>                          */
>                         bad_sectors -= (this_sector - first_bad);
> -                       if (choose_first && sectors > bad_sectors)
> -                               sectors = bad_sectors;
>                         if (best_good_sectors > sectors)
>                                 best_good_sectors = sectors;
>
> @@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>                                         best_good_sectors = good_sectors;
>                                         best_disk = disk;
>                                 }
> -                               if (choose_first)
> -                                       break;
>                         }
>                         continue;
>                 } else {
> @@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>
>                 pending = atomic_read(&rdev->nr_pending);
>                 dist = abs(this_sector - conf->mirrors[disk].head_position);
> -               if (choose_first) {
> -                       best_disk = disk;
> -                       break;
> -               }
>                 /* Don't change to another disk for sequential reads */
>                 if (conf->mirrors[disk].next_seq_sect == this_sector
>                     || dist == 0) {
> @@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>                 rdev = conf->mirrors[best_disk].rdev;
>                 if (!rdev)
>                         goto retry;
> -               atomic_inc(&rdev->nr_pending);
> -               sectors = best_good_sectors;
> -
> -               if (conf->mirrors[best_disk].next_seq_sect != this_sector)
> -                       conf->mirrors[best_disk].seq_start = this_sector;
>
> -               conf->mirrors[best_disk].next_seq_sect = this_sector + sectors;
> +               sectors = best_good_sectors;
> +               update_read_sectors(conf, disk, this_sector, sectors);
>         }
>         *max_sectors = sectors;
>
> --
> 2.39.2
>
>

Hi

This patch looks good to me.

Reviewed-by: Xiao Ni <xni@redhat.com>
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8089c569e84f..08c45ca55a7e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector,
        return len;
 }
 
+static void update_read_sectors(struct r1conf *conf, int disk,
+                               sector_t this_sector, int len)
+{
+       struct raid1_info *info = &conf->mirrors[disk];
+
+       atomic_inc(&info->rdev->nr_pending);
+       if (info->next_seq_sect != this_sector)
+               info->seq_start = this_sector;
+       info->next_seq_sect = this_sector + len;
+}
+
+static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
+                            int *max_sectors)
+{
+       sector_t this_sector = r1_bio->sector;
+       int len = r1_bio->sectors;
+       int disk;
+
+       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
+               struct md_rdev *rdev;
+               int read_len;
+
+               if (r1_bio->bios[disk] == IO_BLOCKED)
+                       continue;
+
+               rdev = conf->mirrors[disk].rdev;
+               if (!rdev || test_bit(Faulty, &rdev->flags))
+                       continue;
+
+               /* choose the first disk even if it has some bad blocks. */
+               read_len = raid1_check_read_range(rdev, this_sector, &len);
+               if (read_len > 0) {
+                       update_read_sectors(conf, disk, this_sector, read_len);
+                       *max_sectors = read_len;
+                       return disk;
+               }
+       }
+
+       return -1;
+}
+
 /*
  * This routine returns the disk from which the requested read should
  * be done. There is a per-array 'next expected sequential IO' sector
@@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
        sector_t best_dist;
        unsigned int min_pending;
        struct md_rdev *rdev;
-       int choose_first;
 
 retry:
        sectors = r1_bio->sectors;
@@ -613,10 +653,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
        best_pending_disk = -1;
        min_pending = UINT_MAX;
        best_good_sectors = 0;
-       choose_first = raid1_should_read_first(conf->mddev, this_sector,
-                                              sectors);
        clear_bit(R1BIO_FailFast, &r1_bio->state);
 
+       if (raid1_should_read_first(conf->mddev, this_sector, sectors))
+               return choose_first_rdev(conf, r1_bio, max_sectors);
+
        for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
                sector_t dist;
                sector_t first_bad;
@@ -662,8 +703,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
                         * bad_sectors from another device..
                         */
                        bad_sectors -= (this_sector - first_bad);
-                       if (choose_first && sectors > bad_sectors)
-                               sectors = bad_sectors;
                        if (best_good_sectors > sectors)
                                best_good_sectors = sectors;
 
@@ -673,8 +712,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
                                        best_good_sectors = good_sectors;
                                        best_disk = disk;
                                }
-                               if (choose_first)
-                                       break;
                        }
                        continue;
                } else {
@@ -689,10 +726,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 
                pending = atomic_read(&rdev->nr_pending);
                dist = abs(this_sector - conf->mirrors[disk].head_position);
-               if (choose_first) {
-                       best_disk = disk;
-                       break;
-               }
                /* Don't change to another disk for sequential reads */
                if (conf->mirrors[disk].next_seq_sect == this_sector
                    || dist == 0) {
@@ -760,13 +793,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
                rdev = conf->mirrors[best_disk].rdev;
                if (!rdev)
                        goto retry;
-               atomic_inc(&rdev->nr_pending);
-               sectors = best_good_sectors;
-
-               if (conf->mirrors[best_disk].next_seq_sect != this_sector)
-                       conf->mirrors[best_disk].seq_start = this_sector;
 
-               conf->mirrors[best_disk].next_seq_sect = this_sector + sectors;
+               sectors = best_good_sectors;
+               update_read_sectors(conf, disk, this_sector, sectors);
        }
        *max_sectors = sectors;
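[Editor's note] The bookkeeping that update_read_sectors() centralizes (previously open-coded at the bottom of read_balance()) is sequential-run tracking. A user-space sketch, leaving out the nr_pending refcount and using a hypothetical `seq_state` struct in place of `struct raid1_info`:

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields of struct raid1_info that
 * update_read_sectors() touches: seq_start marks where the current
 * sequential run began; next_seq_sect is where the next read must
 * start for the run to continue. */
struct seq_state { long seq_start; long next_seq_sect; };

static void update_read_sectors(struct seq_state *s, long this_sector, int len)
{
	if (s->next_seq_sect != this_sector)
		s->seq_start = this_sector;	/* run broken: start a new one */
	s->next_seq_sect = this_sector + len;
}
```

read_balance() later compares next_seq_sect against the incoming sector so that sequential readers keep going to the same disk.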