From patchwork Tue Feb 27 12:03:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207185
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 01/10] md: add a new helper rdev_has_badblock()
Date: Tue, 27 Feb 2024 20:03:18 +0800
Message-Id: <20240227120327.1432511-2-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

The current API is_badblock() requires callers to pass in 'first_bad' and 'bad_sectors'. However, many callers only want to know whether there are bad blocks or not, and these callers must define two local variables that are never used. Add a new helper rdev_has_badblock() that only reports whether there are bad blocks (it returns the result of is_badblock() unchanged), remove the unnecessary local variables, and replace is_badblock() with the new helper in many places.
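As an illustration of the calling convention, here is a minimal userspace sketch (not kernel code). struct mock_rdev and the body of the mock is_badblock() are assumptions made up for this example; only the shape of rdev_has_badblock() follows the patch. The mock returns 0 for no bad blocks, 1 for acknowledged bad blocks and -1 for unacknowledged ones, which is why some callers in this patch still test the result with "< 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for struct md_rdev: at most one bad range. */
struct mock_rdev {
	sector_t bad_start;	/* first bad sector */
	int bad_len;		/* number of bad sectors, 0 if none */
	bool acked;		/* has the bad range been acknowledged? */
};

/*
 * Mock of is_badblock(): 0 if [s, s + sectors) contains no bad blocks,
 * 1 if it overlaps an acknowledged bad range, -1 if the overlapping
 * range is not yet acknowledged.
 */
static int is_badblock(struct mock_rdev *rdev, sector_t s, int sectors,
		       sector_t *first_bad, int *bad_sectors)
{
	if (rdev->bad_len == 0 ||
	    s + (sector_t)sectors <= rdev->bad_start ||
	    s >= rdev->bad_start + (sector_t)rdev->bad_len)
		return 0;

	*first_bad = rdev->bad_start;
	*bad_sectors = rdev->bad_len;
	return rdev->acked ? 1 : -1;
}

/* Same shape as the helper added by this patch: the outputs are dropped. */
static inline int rdev_has_badblock(struct mock_rdev *rdev, sector_t s,
				    int sectors)
{
	sector_t first_bad;
	int bad_sectors;

	return is_badblock(rdev, s, sectors, &first_bad, &bad_sectors);
}
```

A caller that needs the location or length of the bad range itself would keep calling is_badblock() directly.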
There are no functional changes, and the new helper will also be used later to refactor read_balance(). Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni --- drivers/md/md.h | 10 ++++++++++ drivers/md/raid1.c | 26 +++++++------------------- drivers/md/raid10.c | 45 ++++++++++++++------------------------------- drivers/md/raid5.c | 35 +++++++++++++---------------------- 4 files changed, 44 insertions(+), 72 deletions(-) diff --git a/drivers/md/md.h b/drivers/md/md.h index 8d881cc59799..a49ab04ab707 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -222,6 +222,16 @@ static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors, } return 0; } + +static inline int rdev_has_badblock(struct md_rdev *rdev, sector_t s, + int sectors) +{ + sector_t first_bad; + int bad_sectors; + + return is_badblock(rdev, s, sectors, &first_bad, &bad_sectors); +} + extern int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors, int is_new); extern int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors, diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 286f8b16c7bd..a145fe48b9ce 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -498,9 +498,6 @@ static void raid1_end_write_request(struct bio *bio) * to user-side. So if something waits for IO, then it * will wait for the 'master' bio. */ - sector_t first_bad; - int bad_sectors; - r1_bio->bios[mirror] = NULL; to_put = bio; /* @@ -516,8 +513,8 @@ static void raid1_end_write_request(struct bio *bio) set_bit(R1BIO_Uptodate, &r1_bio->state); /* Maybe we can clear some bad blocks. 
*/ - if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors, - &first_bad, &bad_sectors) && !discard_error) { + if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors) && + !discard_error) { r1_bio->bios[mirror] = IO_MADE_GOOD; set_bit(R1BIO_MadeGood, &r1_bio->state); } @@ -1944,8 +1941,6 @@ static void end_sync_write(struct bio *bio) struct r1bio *r1_bio = get_resync_r1bio(bio); struct mddev *mddev = r1_bio->mddev; struct r1conf *conf = mddev->private; - sector_t first_bad; - int bad_sectors; struct md_rdev *rdev = conf->mirrors[find_bio_disk(r1_bio, bio)].rdev; if (!uptodate) { @@ -1955,14 +1950,11 @@ static void end_sync_write(struct bio *bio) set_bit(MD_RECOVERY_NEEDED, & mddev->recovery); set_bit(R1BIO_WriteError, &r1_bio->state); - } else if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors, - &first_bad, &bad_sectors) && - !is_badblock(conf->mirrors[r1_bio->read_disk].rdev, - r1_bio->sector, - r1_bio->sectors, - &first_bad, &bad_sectors) - ) + } else if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors) && + !rdev_has_badblock(conf->mirrors[r1_bio->read_disk].rdev, + r1_bio->sector, r1_bio->sectors)) { set_bit(R1BIO_MadeGood, &r1_bio->state); + } put_sync_write_buf(r1_bio, uptodate); } @@ -2279,16 +2271,12 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) s = PAGE_SIZE >> 9; do { - sector_t first_bad; - int bad_sectors; - rdev = conf->mirrors[d].rdev; if (rdev && (test_bit(In_sync, &rdev->flags) || (!test_bit(Faulty, &rdev->flags) && rdev->recovery_offset >= sect + s)) && - is_badblock(rdev, sect, s, - &first_bad, &bad_sectors) == 0) { + rdev_has_badblock(rdev, sect, s) == 0) { atomic_inc(&rdev->nr_pending); if (sync_page_io(rdev, sect, s<<9, conf->tmppage, REQ_OP_READ, false)) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 7412066ea22c..d5a7a621f0f0 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -518,11 +518,7 @@ static void raid10_end_write_request(struct bio *bio) * The 'master' represents 
the composite IO operation to * user-side. So if something waits for IO, then it will * wait for the 'master' bio. - */ - sector_t first_bad; - int bad_sectors; - - /* + * * Do not set R10BIO_Uptodate if the current device is * rebuilding or Faulty. This is because we cannot use * such device for properly reading the data back (we could @@ -535,10 +531,9 @@ static void raid10_end_write_request(struct bio *bio) set_bit(R10BIO_Uptodate, &r10_bio->state); /* Maybe we can clear some bad blocks. */ - if (is_badblock(rdev, - r10_bio->devs[slot].addr, - r10_bio->sectors, - &first_bad, &bad_sectors) && !discard_error) { + if (rdev_has_badblock(rdev, r10_bio->devs[slot].addr, + r10_bio->sectors) && + !discard_error) { bio_put(bio); if (repl) r10_bio->devs[slot].repl_bio = IO_MADE_GOOD; @@ -1330,10 +1325,7 @@ static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio) } if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) { - sector_t first_bad; sector_t dev_sector = r10_bio->devs[i].addr; - int bad_sectors; - int is_bad; /* * Discard request doesn't care the write result @@ -1342,9 +1334,8 @@ static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio) if (!r10_bio->sectors) continue; - is_bad = is_badblock(rdev, dev_sector, r10_bio->sectors, - &first_bad, &bad_sectors); - if (is_bad < 0) { + if (rdev_has_badblock(rdev, dev_sector, + r10_bio->sectors) < 0) { /* * Mustn't write here until the bad block * is acknowledged @@ -2290,8 +2281,6 @@ static void end_sync_write(struct bio *bio) struct mddev *mddev = r10_bio->mddev; struct r10conf *conf = mddev->private; int d; - sector_t first_bad; - int bad_sectors; int slot; int repl; struct md_rdev *rdev = NULL; @@ -2312,11 +2301,10 @@ static void end_sync_write(struct bio *bio) &rdev->mddev->recovery); set_bit(R10BIO_WriteError, &r10_bio->state); } - } else if (is_badblock(rdev, - r10_bio->devs[slot].addr, - r10_bio->sectors, - &first_bad, &bad_sectors)) + } else if (rdev_has_badblock(rdev, 
r10_bio->devs[slot].addr, + r10_bio->sectors)) { set_bit(R10BIO_MadeGood, &r10_bio->state); + } rdev_dec_pending(rdev, mddev); @@ -2597,11 +2585,8 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio) static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector, int sectors, struct page *page, enum req_op op) { - sector_t first_bad; - int bad_sectors; - - if (is_badblock(rdev, sector, sectors, &first_bad, &bad_sectors) - && (op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags))) + if (rdev_has_badblock(rdev, sector, sectors) && + (op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags))) return -1; if (sync_page_io(rdev, sector, sectors << 9, page, op, false)) /* success */ @@ -2658,16 +2643,14 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 s = PAGE_SIZE >> 9; do { - sector_t first_bad; - int bad_sectors; - d = r10_bio->devs[sl].devnum; rdev = conf->mirrors[d].rdev; if (rdev && test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags) && - is_badblock(rdev, r10_bio->devs[sl].addr + sect, s, - &first_bad, &bad_sectors) == 0) { + rdev_has_badblock(rdev, + r10_bio->devs[sl].addr + sect, + s) == 0) { atomic_inc(&rdev->nr_pending); success = sync_page_io(rdev, r10_bio->devs[sl].addr + diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 14f2cf75abbd..9241e95ef55c 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1210,10 +1210,8 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) */ while (op_is_write(op) && rdev && test_bit(WriteErrorSeen, &rdev->flags)) { - sector_t first_bad; - int bad_sectors; - int bad = is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), - &first_bad, &bad_sectors); + int bad = rdev_has_badblock(rdev, sh->sector, + RAID5_STRIPE_SECTORS(conf)); if (!bad) break; @@ -2855,8 +2853,6 @@ static void raid5_end_write_request(struct bio *bi) struct r5conf *conf = sh->raid_conf; int disks = sh->disks, i; struct md_rdev 
*rdev; - sector_t first_bad; - int bad_sectors; int replacement = 0; for (i = 0 ; i < disks; i++) { @@ -2888,9 +2884,8 @@ static void raid5_end_write_request(struct bio *bi) if (replacement) { if (bi->bi_status) md_error(conf->mddev, rdev); - else if (is_badblock(rdev, sh->sector, - RAID5_STRIPE_SECTORS(conf), - &first_bad, &bad_sectors)) + else if (rdev_has_badblock(rdev, sh->sector, + RAID5_STRIPE_SECTORS(conf))) set_bit(R5_MadeGoodRepl, &sh->dev[i].flags); } else { if (bi->bi_status) { @@ -2900,9 +2895,8 @@ static void raid5_end_write_request(struct bio *bi) if (!test_and_set_bit(WantReplacement, &rdev->flags)) set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery); - } else if (is_badblock(rdev, sh->sector, - RAID5_STRIPE_SECTORS(conf), - &first_bad, &bad_sectors)) { + } else if (rdev_has_badblock(rdev, sh->sector, + RAID5_STRIPE_SECTORS(conf))) { set_bit(R5_MadeGood, &sh->dev[i].flags); if (test_bit(R5_ReadError, &sh->dev[i].flags)) /* That was a successful write so make @@ -4674,8 +4668,6 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) /* Now to look around and see what can be done */ for (i=disks; i--; ) { struct md_rdev *rdev; - sector_t first_bad; - int bad_sectors; int is_bad = 0; dev = &sh->dev[i]; @@ -4719,8 +4711,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) rdev = conf->disks[i].replacement; if (rdev && !test_bit(Faulty, &rdev->flags) && rdev->recovery_offset >= sh->sector + RAID5_STRIPE_SECTORS(conf) && - !is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), - &first_bad, &bad_sectors)) + !rdev_has_badblock(rdev, sh->sector, + RAID5_STRIPE_SECTORS(conf))) set_bit(R5_ReadRepl, &dev->flags); else { if (rdev && !test_bit(Faulty, &rdev->flags)) @@ -4733,8 +4725,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) if (rdev && test_bit(Faulty, &rdev->flags)) rdev = NULL; if (rdev) { - is_bad = is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 
- &first_bad, &bad_sectors); + is_bad = rdev_has_badblock(rdev, sh->sector, + RAID5_STRIPE_SECTORS(conf)); if (s->blocked_rdev == NULL && (test_bit(Blocked, &rdev->flags) || is_bad < 0)) { @@ -5463,8 +5455,8 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio) struct r5conf *conf = mddev->private; struct bio *align_bio; struct md_rdev *rdev; - sector_t sector, end_sector, first_bad; - int bad_sectors, dd_idx; + sector_t sector, end_sector; + int dd_idx; bool did_inc; if (!in_chunk_boundary(mddev, raid_bio)) { @@ -5493,8 +5485,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio) atomic_inc(&rdev->nr_pending); - if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad, - &bad_sectors)) { + if (rdev_has_badblock(rdev, sector, bio_sectors(raid_bio))) { rdev_dec_pending(rdev, mddev); return 0; }

From patchwork Tue Feb 27 12:03:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207166
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 02/10] md/raid1: record nonrot rdevs while adding/removing rdevs to conf
Date: Tue, 27 Feb 2024 20:03:19 +0800
Message-Id: <20240227120327.1432511-3-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

For raid1, each read iterates over all the rdevs in conf and checks whether any rdev is non-rotational. If so, it chooses the rdev with the fewest in-flight IOs; otherwise it chooses the rdev whose head position is closest to the requested sector.

Disk nonrot info can be changed through the sysfs entry:

/sys/block/[disk_name]/queue/rotational

However, this should only be used for testing, and users really shouldn't do this in real life. Record the number of non-rotational disks in conf, to avoid checking each rdev in the IO fast path and to simplify read_balance() a little bit.
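The bookkeeping described above can be sketched in userspace as follows. This is a simplified model, not the kernel code: struct demo_rdev, struct demo_conf and the demo_*() functions are hypothetical stand-ins, and the Nonrot bit in rdev->flags as well as the READ_ONCE()/WRITE_ONCE() annotations are replaced by plain fields and plain accesses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct md_rdev. */
struct demo_rdev {
	bool is_ssd;		/* models bdev_nonrot(rdev->bdev) */
	bool nonrot_flag;	/* models the Nonrot bit in rdev->flags */
};

/* Hypothetical stand-in for struct r1conf. */
struct demo_conf {
	int nonrot_disks;	/* number of non-rotational member disks */
};

/* Count the disk once, when it is successfully added to the array. */
static void demo_add_disk(struct demo_conf *conf, struct demo_rdev *rdev,
			  int err)
{
	if (!err && rdev->is_ssd) {
		rdev->nonrot_flag = true;
		conf->nonrot_disks++;
	}
}

/* Test-and-clear the flag so a disk is never uncounted twice. */
static void demo_remove_disk(struct demo_conf *conf, struct demo_rdev *rdev)
{
	if (rdev->nonrot_flag) {
		rdev->nonrot_flag = false;
		conf->nonrot_disks--;
	}
}

/*
 * Fast-path decision: prefer the least-busy disk when any member is
 * non-rotational or some disk is idle, otherwise the closest disk.
 */
static bool prefer_least_pending(const struct demo_conf *conf,
				 unsigned int min_pending)
{
	return conf->nonrot_disks || min_pending == 0;
}
```

The point of the design is that the read path only looks at one counter instead of querying every member device on each read.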
Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai --- drivers/md/md.h | 1 + drivers/md/raid1.c | 17 ++++++++++------- drivers/md/raid1.h | 1 + 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/md/md.h b/drivers/md/md.h index a49ab04ab707..b2076a165c10 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -207,6 +207,7 @@ enum flag_bits { * check if there is collision between raid1 * serial bios. */ + Nonrot, /* non-rotational device (SSD) */ }; static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors, diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index a145fe48b9ce..0fed01b06de9 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -599,7 +599,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect int sectors; int best_good_sectors; int best_disk, best_dist_disk, best_pending_disk; - int has_nonrot_disk; int disk; sector_t best_dist; unsigned int min_pending; @@ -620,7 +619,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect best_pending_disk = -1; min_pending = UINT_MAX; best_good_sectors = 0; - has_nonrot_disk = 0; choose_next_idle = 0; clear_bit(R1BIO_FailFast, &r1_bio->state); @@ -637,7 +635,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect sector_t first_bad; int bad_sectors; unsigned int pending; - bool nonrot; rdev = conf->mirrors[disk].rdev; if (r1_bio->bios[disk] == IO_BLOCKED @@ -703,8 +700,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect /* At least two disks to choose from so failfast is OK */ set_bit(R1BIO_FailFast, &r1_bio->state); - nonrot = bdev_nonrot(rdev->bdev); - has_nonrot_disk |= nonrot; pending = atomic_read(&rdev->nr_pending); dist = abs(this_sector - conf->mirrors[disk].head_position); if (choose_first) { @@ -731,7 +726,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect * small, but not a big 
deal since when the second disk * starts IO, the first disk is likely still busy. */ - if (nonrot && opt_iosize > 0 && + if (test_bit(Nonrot, &rdev->flags) && opt_iosize > 0 && mirror->seq_start != MaxSector && mirror->next_seq_sect > opt_iosize && mirror->next_seq_sect - opt_iosize >= @@ -763,7 +758,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect * mixed ratation/non-rotational disks depending on workload. */ if (best_disk == -1) { - if (has_nonrot_disk || min_pending == 0) + if (READ_ONCE(conf->nonrot_disks) || min_pending == 0) best_disk = best_pending_disk; else best_disk = best_dist_disk; @@ -1819,6 +1814,11 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev) WRITE_ONCE(p[conf->raid_disks].rdev, rdev); } + if (!err && bdev_nonrot(rdev->bdev)) { + set_bit(Nonrot, &rdev->flags); + WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks + 1); + } + print_conf(conf); return err; } @@ -1883,6 +1883,9 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev) } abort: + if (test_and_clear_bit(Nonrot, &rdev->flags)) + WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks - 1); + print_conf(conf); return err; } diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h index 14d4211a123a..5300cbaa58a4 100644 --- a/drivers/md/raid1.h +++ b/drivers/md/raid1.h @@ -71,6 +71,7 @@ struct r1conf { * allow for replacements. 
 	 */
 	int			raid_disks;
+	int			nonrot_disks;
 
 	spinlock_t		device_lock;

From patchwork Tue Feb 27 12:03:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207165
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 04/10] md/raid1-10: add a helper raid1_check_read_range()
Date: Tue, 27 Feb 2024 20:03:21 +0800
Message-Id: <20240227120327.1432511-5-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>

From: Yu Kuai

The checking and handling of bad blocks appears many times in
read_balance() in raid1 and raid10. This helper will be used in later
patches to simplify read_balance() considerably.
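
For illustration only, the outcomes documented for raid1_check_read_range()
can be modeled in userspace. This is a sketch under stated assumptions, not
kernel code: struct bad_range, model_is_badblock(), model_check_read_range()
and check_read_range_cases() are hypothetical names, and the model assumes a
single bad range per device, whereas the real is_badblock() consults the
rdev's bad block table.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

/* Hypothetical stand-in for an rdev's bad block table: one bad range. */
struct bad_range {
	sector_t start;	/* first bad sector */
	int sectors;	/* number of bad sectors, 0 means no bad blocks */
};

/*
 * Simplified model of is_badblock(): report whether [s, s + len) overlaps
 * the single bad range and, if so, where that range starts and how long
 * it is.
 */
static bool model_is_badblock(const struct bad_range *bb, sector_t s, int len,
			      sector_t *first_bad, int *bad_sectors)
{
	if (bb->sectors == 0 || s + len <= bb->start ||
	    bb->start + bb->sectors <= s)
		return false;

	*first_bad = bb->start;
	*bad_sectors = bb->sectors;
	return true;
}

/* Same decision sequence as raid1_check_read_range(), on the model above. */
static int model_check_read_range(const struct bad_range *bb,
				  sector_t this_sector, int *len)
{
	sector_t first_bad;
	int bad_sectors;

	/* no bad block overlap: the whole range is readable */
	if (!model_is_badblock(bb, this_sector, *len, &first_bad, &bad_sectors))
		return *len;

	/* bad range starts inside the read: return the leading good part */
	if (first_bad > this_sector)
		return (int)(first_bad - this_sector);

	/* read range is fully covered by the bad range */
	if (this_sector + *len <= first_bad + bad_sectors)
		return 0;

	/* bad range covers only the head: report how many sectors are bad */
	*len = (int)(first_bad + bad_sectors - this_sector);
	return 0;
}

/* Walk the documented cases for a read of 8 sectors at sector 100. */
static void check_read_range_cases(void)
{
	struct bad_range bb;
	int len;

	/* 1) bad range far away: @len is returned */
	bb = (struct bad_range){ .start = 200, .sectors = 4 };
	len = 8;
	assert(model_check_read_range(&bb, 100, &len) == 8 && len == 8);

	/* 2) bad range 96..111 covers the whole read: 0, @len untouched */
	bb = (struct bad_range){ .start = 96, .sectors = 16 };
	len = 8;
	assert(model_check_read_range(&bb, 100, &len) == 0 && len == 8);

	/* 3a) bad range starts at 104: first good region is 4 sectors */
	bb = (struct bad_range){ .start = 104, .sectors = 2 };
	len = 8;
	assert(model_check_read_range(&bb, 100, &len) == 4 && len == 8);

	/* 3b) bad range 98..102 covers the head: 0, @len becomes 3 */
	bb = (struct bad_range){ .start = 98, .sectors = 5 };
	len = 8;
	assert(model_check_read_range(&bb, 100, &len) == 0 && len == 3);
}
```

In case 3b the caller learns that the first 3 sectors of the request are bad
on this device, which is the information read_balance() needs to limit how
much it reads from another device.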

Co-developed-by: Paul Luse
Signed-off-by: Paul Luse
Signed-off-by: Yu Kuai
Reviewed-by: Xiao Ni
---
 drivers/md/raid1-10.c | 49 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 512746551f36..9bc0f0022a6c 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -227,3 +227,52 @@ static inline bool exceed_read_errors(struct mddev *mddev, struct md_rdev *rdev)
 
 	return false;
 }
+
+/**
+ * raid1_check_read_range() - check a given read range for bad blocks,
+ * available read length is returned;
+ * @rdev: the rdev to read;
+ * @this_sector: read position;
+ * @len: read length;
+ *
+ * helper function for read_balance()
+ *
+ * 1) If there are no bad blocks in the range, @len is returned;
+ * 2) If the range is all bad blocks, 0 is returned;
+ * 3) If there are partial bad blocks:
+ *  - If the bad block range starts after @this_sector, the length of first
+ *  good region is returned;
+ *  - If the bad block range starts before @this_sector, 0 is returned and
+ *  the @len is updated to the offset into the region before we get to the
+ *  good blocks;
+ */
+static inline int raid1_check_read_range(struct md_rdev *rdev,
+					 sector_t this_sector, int *len)
+{
+	sector_t first_bad;
+	int bad_sectors;
+
+	/* no bad block overlap */
+	if (!is_badblock(rdev, this_sector, *len, &first_bad, &bad_sectors))
+		return *len;
+
+	/*
+	 * bad block range starts offset into our range so we can return the
+	 * number of sectors before the bad blocks start.
+	 */
+	if (first_bad > this_sector)
+		return first_bad - this_sector;
+
+	/* read range is fully consumed by bad blocks. */
+	if (this_sector + *len <= first_bad + bad_sectors)
+		return 0;
+
+	/*
+	 * final case, bad block range starts before or at the start of our
+	 * range but does not cover our entire range so we still return 0 but
+	 * update the length with the number of sectors before we get to the
+	 * good ones.
+	 */
+	*len = first_bad + bad_sectors - this_sector;
+	return 0;
+}

From patchwork Tue Feb 27 12:03:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207167
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 05/10] md/raid1-10: factor out a new helper raid1_should_read_first()
Date: Tue, 27 Feb 2024 20:03:22 +0800
Message-Id: <20240227120327.1432511-6-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>

From: Yu Kuai

If resync is in progress, read_balance() should pick the first usable
disk; otherwise, data could be inconsistent after resync is done. raid1
and raid10 implement the same check, so factor it out to make the code
cleaner.
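
As a rough illustration of the non-clustered part of that check, the
comparison against the resync checkpoint can be modeled in userspace. This is
only a sketch: model_should_read_first() and should_read_first_cases() are
hypothetical names, the cluster branch (area_resyncing()) is left out, and
MaxSector is redefined locally to mirror the kernel's "all bits set" value
that recovery_cp holds when no resync is pending.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

/* Local copy of the kernel's MaxSector: recovery_cp when the array is in sync. */
#define MaxSector (~(sector_t)0)

/*
 * Hypothetical model of the non-clustered check: a read must take the
 * first usable disk when it extends past the resync checkpoint.
 */
static bool model_should_read_first(sector_t recovery_cp,
				    sector_t this_sector, int len)
{
	return recovery_cp < this_sector + len;
}

static void should_read_first_cases(void)
{
	/* array fully in sync: balancing is always allowed */
	assert(!model_should_read_first(MaxSector, 0, 8));

	/* resync checkpoint at sector 1000 */
	assert(!model_should_read_first(1000, 0, 8));	/* well below */
	assert(!model_should_read_first(1000, 992, 8));	/* ends at checkpoint */
	assert(model_should_read_first(1000, 996, 8));	/* crosses it */
	assert(model_should_read_first(1000, 2000, 8));	/* entirely above */
}
```

A read that ends exactly at the checkpoint is still balanced, because the
comparison is strict.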
Noted that raid1 is using 'mddev->recovery_cp', which is updated after all resync IO is done, while raid10 is using 'conf->next_resync', which is inaccurate because raid10 update it before submitting resync IO. Fortunately, raid10 read IO can't concurrent with resync IO, hence there is no problem. And this patch also switch raid10 to use 'mddev->recovery_cp'. Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni --- drivers/md/raid1-10.c | 20 ++++++++++++++++++++ drivers/md/raid1.c | 15 ++------------- drivers/md/raid10.c | 13 ++----------- 3 files changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c index 9bc0f0022a6c..2ea1710a3b70 100644 --- a/drivers/md/raid1-10.c +++ b/drivers/md/raid1-10.c @@ -276,3 +276,23 @@ static inline int raid1_check_read_range(struct md_rdev *rdev, *len = first_bad + bad_sectors - this_sector; return 0; } + +/* + * Check if read should choose the first rdev. + * + * Balance on the whole device if no resync is going on (recovery is ok) or + * below the resync window. Otherwise, take the first readable disk. + */ +static inline bool raid1_should_read_first(struct mddev *mddev, + sector_t this_sector, int len) +{ + if ((mddev->recovery_cp < this_sector + len)) + return true; + + if (mddev_is_clustered(mddev) && + md_cluster_ops->area_resyncing(mddev, READ, this_sector, + this_sector + len)) + return true; + + return false; +} diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index fc5899fb08c1..640d5d8f789a 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -605,11 +605,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect struct md_rdev *rdev; int choose_first; - /* - * Check if we can balance. We can balance on the whole - * device if no resync is going on, or below the resync window. - * We take the first readable disk when above the resync window. 
- */ retry: sectors = r1_bio->sectors; best_disk = -1; @@ -619,16 +614,10 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect best_pending_disk = -1; min_pending = UINT_MAX; best_good_sectors = 0; + choose_first = raid1_should_read_first(conf->mddev, this_sector, + sectors); clear_bit(R1BIO_FailFast, &r1_bio->state); - if ((conf->mddev->recovery_cp < this_sector + sectors) || - (mddev_is_clustered(conf->mddev) && - md_cluster_ops->area_resyncing(conf->mddev, READ, this_sector, - this_sector + sectors))) - choose_first = 1; - else - choose_first = 0; - for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { sector_t dist; sector_t first_bad; diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index d5a7a621f0f0..8aecdb1ccc16 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -748,17 +748,8 @@ static struct md_rdev *read_balance(struct r10conf *conf, best_good_sectors = 0; do_balance = 1; clear_bit(R10BIO_FailFast, &r10_bio->state); - /* - * Check if we can balance. We can balance on the whole - * device if no resync is going on (recovery is ok), or below - * the resync window. We take the first readable disk when - * above the resync window. 
- */ - if ((conf->mddev->recovery_cp < MaxSector - && (this_sector + sectors >= conf->next_resync)) || - (mddev_is_clustered(conf->mddev) && - md_cluster_ops->area_resyncing(conf->mddev, READ, this_sector, - this_sector + sectors))) + + if (raid1_should_read_first(conf->mddev, this_sector, sectors)) do_balance = 0; for (slot = 0; slot < conf->copies ; slot++) { From patchwork Tue Feb 27 12:03:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 207184 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:a81b:b0:108:e6aa:91d0 with SMTP id bq27csp2658324dyb; Tue, 27 Feb 2024 04:23:30 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCXPAztssCB13j028e92pIGQAy0CSdyk/b2XB2SrhE2zjj4XmsJM9RPx0YcTvFrlq/5u+AzHjUj+N1Q3Cco8e9jQATZuvw== X-Google-Smtp-Source: AGHT+IEqSqldS6ndDLAxKOPWBoxCa3JNOGysK9lMjjeEg2Zn+2lC+sruv4Q2KC2Stn3aSFuBB0t3 X-Received: by 2002:a17:903:4303:b0:1db:9ff1:b59b with SMTP id jz3-20020a170903430300b001db9ff1b59bmr9544773plb.23.1709036609884; Tue, 27 Feb 2024 04:23:29 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709036609; cv=pass; d=google.com; s=arc-20160816; b=vsa6yfj1HIkdkR1xPAagDOv6cmLaEJlIU8xnmiAG5LilgaWkTmXCstXpkiCEkNYG01 1DGC4VRy672ucwqlkXSDKymzgcRW82GAnLSmwytV3PxsJCG+3J1gcUQtWe3fVyNswVsc +WnN9SAdRe9ZWOq58qoCZqeNSivoBwtuQjRqxGCACI+NRpWrGgQoA/Hl7syoC1Dyto4O H+9rbyuXRTfRZBPr1MM48rvPBLA3bHnes0FIDAc0y01OHyEgFOG++xTcFfWR/hO4kuX7 MNibdeCP9LdttNpQ3Tg9o1M0FpBhOPsEowhJETu3vKI+DRvVJil4fJqM+YT9mxncOYR6 Nt7A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=yaXk+3IPhd7z3WBtl89l4QB6SrgH43q877Rt91dbsjM=; fh=Ytj0OQL2yyVgRSgKlJizbwNzVtV64xy+1TGtb7SiHpY=; b=wLSqhB8SVxJb7mRgJCQLVFOKhkay9LGNmRCoAVTqNIJYMYwdGN75hEkSRFGeWBFcAe 
YE4LCC82OdIueEIaMZZGCcB/Dit7kzJNakwCbJhkcu92VdP48oIrKbuA2GYYDUjX/GGk uDsHaom7nABAINWIhXb4FlM8OpvQPR5E7tpwXID3bfeaVf3jREzEA00UI/M/KGKYDLHs tpEJQoB2ulhdU69ugXUjqwYR9so1pljFL4LhZjASqkxmVay0VZ7L90ThJpp3DCiqupBv 7RzpOuWcB2lMV2hA+QaBafSy3y9DK3SVgkOqDCcgolzcWBPrKImAbXcyROEK2tUYEPo7 yy/A==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-83170-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-83170-ouuuleilei=gmail.com@vger.kernel.org" Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id im15-20020a170902bb0f00b001dc8ebc6229si1284646plb.536.2024.02.27.04.23.29 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 27 Feb 2024 04:23:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-83170-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-83170-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-83170-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id DE3F4B291E8 for ; Tue, 27 Feb 2024 12:11:28 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E0ED31420A2; Tue, 27 Feb 2024 12:09:33 +0000 (UTC) Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D21513A878; Tue, 27 Feb 2024 12:09:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709035771; cv=none; b=TBYD5AwCcp2cf9dZus/bQU8EHKzNx/fzMlzH/Uoe10gJLqgX4HXKPRhNvOCHUhoez+XDYp7xSFTiNLnCXCr5JMg9u+5ZBc8D64l4ZVOP6QijAzUO84uwaZFmnqt3ieRsl7mQxTZ/OfAbKcVmFs8Xy+LjETZXzS9Y9GFcHd9gmR0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709035771; c=relaxed/simple; bh=2c/lRR6BYe0jqzqLX94eTav/pOJl8LWew6XeDkC265s=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=iAPuyhSmggzDGSKvuCVly6sSP3W7cpQQpvWiDjS4i+sWauhVHYnH1hVZ9M/AalQFAKbHc1hXYqdqPuaIRlxpV4BRr4pwipiG21eu+0a19zsudtdhwz2IDz4yYNqMiL+Cb5hyv9CHvcPcAIAbecVGvQSBqhNgHvM0w0mk88BACtQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4TkbrK1tJMz4f3jrl; Tue, 27 Feb 2024 20:09:21 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.112]) by mail.maildlp.com (Postfix) with ESMTP id 761EC1A0232; Tue, 27 Feb 2024 20:09:26 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP1 (Coremail) with SMTP id cCh0CgAn+RHv0N1lpKNAFQ--.28259S10; Tue, 27 Feb 2024 20:09:26 +0800 (CST) From: Yu Kuai To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, 
yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH md-6.9 v2 06/10] md/raid1: factor out read_first_rdev() from read_balance() Date: Tue, 27 Feb 2024 20:03:23 +0800 Message-Id: <20240227120327.1432511-7-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com> References: <20240227120327.1432511-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: cCh0CgAn+RHv0N1lpKNAFQ--.28259S10 X-Coremail-Antispam: 1UD129KBjvJXoWxZFW8CrW5GFW3Jr1UAw13Arb_yoWrCw47pw 45AFZ3tryUX34rZws8J3yDWr93t34fJF48GrZ7Xwnagrn3KryqgFWUGrya9Fy5Crs8Jw1U Zw15Ar4ak3Z7KFDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUP214x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E 14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIx kGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAF wI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JV WxJwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7VUbmZ X7UUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1792054771889730174 X-GMAIL-MSGID: 1792054771889730174 From: Yu Kuai read_balance() is hard to understand because there are too many status and branches, and 
it's overlong. This patch factor out the case to read the first rdev from read_balance(), there are no functional changes. Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni --- drivers/md/raid1.c | 63 +++++++++++++++++++++++++++++++++------------- 1 file changed, 46 insertions(+), 17 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 640d5d8f789a..3eeaef7f8ded 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -579,6 +579,47 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector, return len; } +static void update_read_sectors(struct r1conf *conf, int disk, + sector_t this_sector, int len) +{ + struct raid1_info *info = &conf->mirrors[disk]; + + atomic_inc(&info->rdev->nr_pending); + if (info->next_seq_sect != this_sector) + info->seq_start = this_sector; + info->next_seq_sect = this_sector + len; +} + +static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, + int *max_sectors) +{ + sector_t this_sector = r1_bio->sector; + int len = r1_bio->sectors; + int disk; + + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { + struct md_rdev *rdev; + int read_len; + + if (r1_bio->bios[disk] == IO_BLOCKED) + continue; + + rdev = conf->mirrors[disk].rdev; + if (!rdev || test_bit(Faulty, &rdev->flags)) + continue; + + /* choose the first disk even if it has some bad blocks. */ + read_len = raid1_check_read_range(rdev, this_sector, &len); + if (read_len > 0) { + update_read_sectors(conf, disk, this_sector, read_len); + *max_sectors = read_len; + return disk; + } + } + + return -1; +} + /* * This routine returns the disk from which the requested read should * be done. 
There is a per-array 'next expected sequential IO' sector @@ -603,7 +644,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect sector_t best_dist; unsigned int min_pending; struct md_rdev *rdev; - int choose_first; retry: sectors = r1_bio->sectors; @@ -614,10 +654,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect best_pending_disk = -1; min_pending = UINT_MAX; best_good_sectors = 0; - choose_first = raid1_should_read_first(conf->mddev, this_sector, - sectors); clear_bit(R1BIO_FailFast, &r1_bio->state); + if (raid1_should_read_first(conf->mddev, this_sector, sectors)) + return choose_first_rdev(conf, r1_bio, max_sectors); + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { sector_t dist; sector_t first_bad; @@ -663,8 +704,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect * bad_sectors from another device.. */ bad_sectors -= (this_sector - first_bad); - if (choose_first && sectors > bad_sectors) - sectors = bad_sectors; if (best_good_sectors > sectors) best_good_sectors = sectors; @@ -674,8 +713,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect best_good_sectors = good_sectors; best_disk = disk; } - if (choose_first) - break; } continue; } else { @@ -690,10 +727,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect pending = atomic_read(&rdev->nr_pending); dist = abs(this_sector - conf->mirrors[disk].head_position); - if (choose_first) { - best_disk = disk; - break; - } /* Don't change to another disk for sequential reads */ if (conf->mirrors[disk].next_seq_sect == this_sector || dist == 0) { @@ -769,13 +802,9 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect rdev = conf->mirrors[best_disk].rdev; if (!rdev) goto retry; - atomic_inc(&rdev->nr_pending); - sectors = best_good_sectors; - - if (conf->mirrors[best_disk].next_seq_sect != this_sector) - 
conf->mirrors[best_disk].seq_start = this_sector; - conf->mirrors[best_disk].next_seq_sect = this_sector + sectors; + sectors = best_good_sectors; + update_read_sectors(conf, disk, this_sector, sectors); } *max_sectors = sectors;

From patchwork Tue Feb 27 12:03:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207187
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 08/10] md/raid1: factor out choose_bb_rdev() from read_balance()
Date: Tue, 27 Feb 2024 20:03:25 +0800
Message-Id: <20240227120327.1432511-9-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

read_balance() is hard to understand because it has too many states and branches, and it is overlong.

This patch factors out of read_balance() the case of reading from an rdev that has bad blocks. There are no functional changes.
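For illustration only, the selection rule that choose_bb_rdev() applies (keep the disk with the most readable sectors before the first bad block) can be modeled in userspace roughly as below. This is a sketch, not kernel code: struct mirror_model, readable_prefix() and pick_bb_mirror() are hypothetical names, and each mirror is assumed to have at most one bad range, which is simpler than the kernel's badblocks table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one mirror; not a kernel structure. */
struct mirror_model {
	bool usable;         /* false models a missing, Faulty or WriteMostly rdev */
	long long bad_start; /* first sector of the single modeled bad range */
	long long bad_len;   /* length of that range; 0 means no bad blocks */
};

/* Number of sectors readable from 'start' before the bad range is reached. */
static int readable_prefix(const struct mirror_model *m, long long start, int len)
{
	long long bad_end = m->bad_start + m->bad_len;

	assert(len > 0);
	if (m->bad_len == 0 || bad_end <= start || m->bad_start >= start + len)
		return len;	/* the request does not touch the bad range */
	if (m->bad_start <= start)
		return 0;	/* the request starts inside the bad range */
	return (int)(m->bad_start - start);
}

/* Return the usable mirror with the longest readable prefix, or -1 if none. */
static int pick_bb_mirror(const struct mirror_model *mirrors, int nr,
			  long long start, int len, int *max_sectors)
{
	int best = -1;
	int best_len = 0;

	for (int i = 0; i < nr; i++) {
		int read_len;

		if (!mirrors[i].usable)
			continue;

		read_len = readable_prefix(&mirrors[i], start, len);
		if (read_len > best_len) {
			best = i;
			best_len = read_len;
		}
	}

	/* Only report a length when a mirror was actually chosen. */
	if (best != -1)
		*max_sectors = best_len;
	return best;
}
```

In this model, a 64-sector read at sector 96 against one mirror with a bad range at 100 and another with a bad range at 120 picks the second mirror and limits the read to 24 sectors. A mirror whose bad range covers the first sector contributes 0 and is never chosen.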
Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni --- drivers/md/raid1.c | 79 ++++++++++++++++++++++++++++------------------ 1 file changed, 48 insertions(+), 31 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 407e2bf5c322..76bb59ad1485 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -620,6 +620,44 @@ static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, return -1; } +static int choose_bb_rdev(struct r1conf *conf, struct r1bio *r1_bio, + int *max_sectors) +{ + sector_t this_sector = r1_bio->sector; + int best_disk = -1; + int best_len = 0; + int disk; + + for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { + struct md_rdev *rdev; + int len; + int read_len; + + if (r1_bio->bios[disk] == IO_BLOCKED) + continue; + + rdev = conf->mirrors[disk].rdev; + if (!rdev || test_bit(Faulty, &rdev->flags) || + test_bit(WriteMostly, &rdev->flags)) + continue; + + /* keep track of the disk with the most readable sectors. */ + len = r1_bio->sectors; + read_len = raid1_check_read_range(rdev, this_sector, &len); + if (read_len > best_len) { + best_disk = disk; + best_len = read_len; + } + } + + if (best_disk != -1) { + *max_sectors = best_len; + update_read_sectors(conf, best_disk, this_sector, best_len); + } + + return best_disk; +} + static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio, int *max_sectors) { @@ -708,8 +746,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { sector_t dist; - sector_t first_bad; - int bad_sectors; unsigned int pending; rdev = conf->mirrors[disk].rdev; @@ -722,36 +758,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect continue; if (test_bit(WriteMostly, &rdev->flags)) continue; - /* This is a reasonable device to use. It might - * even be best. 
- */ - if (is_badblock(rdev, this_sector, sectors, - &first_bad, &bad_sectors)) { - if (best_dist < MaxSector) - /* already have a better device */ - continue; - if (first_bad <= this_sector) { - /* cannot read here. If this is the 'primary' - * device, then we must not read beyond - * bad_sectors from another device.. - */ - bad_sectors -= (this_sector - first_bad); - if (best_good_sectors > sectors) - best_good_sectors = sectors; - - } else { - sector_t good_sectors = first_bad - this_sector; - if (good_sectors > best_good_sectors) { - best_good_sectors = good_sectors; - best_disk = disk; - } - } + if (rdev_has_badblock(rdev, this_sector, sectors)) continue; - } else { - if ((sectors > best_good_sectors) && (best_disk >= 0)) - best_disk = -1; - best_good_sectors = sectors; - } if (best_disk >= 0) /* At least two disks to choose from so failfast is OK */ @@ -843,6 +851,15 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect if (best_disk >= 0) return best_disk; + /* + * If we are here it means we didn't find a perfectly good disk so + * now spend a bit more time trying to find one with the most good + * sectors. 
+ */ + disk = choose_bb_rdev(conf, r1_bio, max_sectors); + if (disk >= 0) + return disk; + return choose_slow_rdev(conf, r1_bio, max_sectors); }

From patchwork Tue Feb 27 12:03:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207188
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 09/10] md/raid1: factor out the code to manage sequential IO
Date: Tue, 27 Feb 2024 20:03:26 +0800
Message-Id: <20240227120327.1432511-10-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

There is no functional change for now. This makes read_balance() cleaner and prepares for fixing problems in, and refactoring, the handling of sequential IO.
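As a rough userspace illustration of the two checks this patch moves into is_sequential() and should_choose_next(), the sketch below restates their conditions on a plain struct. It is not kernel code: struct mirror_state, MODEL_MAX_SECTOR and the model_* functions are hypothetical stand-ins, the Nonrot flag is modeled as a bool, and the optimal IO size is assumed to be already converted to sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_SECTOR UINT64_MAX /* stands in for the kernel's MaxSector */

/* Hypothetical per-mirror state; field names follow those used in the diff. */
struct mirror_state {
	uint64_t seq_start;     /* first sector of the current sequential run */
	uint64_t next_seq_sect; /* sector expected next if the run continues */
	uint64_t head_position; /* last known head position */
	bool nonrot;            /* models the Nonrot rdev flag */
	uint64_t opt_iosize;    /* optimal IO size in sectors; 0 if unknown */
};

/* A read is sequential if it starts at the expected next sector or at the head. */
static bool model_is_sequential(const struct mirror_state *m, uint64_t sector)
{
	assert(m);
	return m->next_seq_sect == sector || m->head_position == sector;
}

/*
 * True once a non-rotational mirror has served at least opt_iosize sectors
 * of the current run, i.e. another idle mirror may now be preferred.
 */
static bool model_should_choose_next(const struct mirror_state *m)
{
	assert(m);
	if (!m->nonrot)
		return false;

	return m->opt_iosize > 0 && m->seq_start != MODEL_MAX_SECTOR &&
	       m->next_seq_sect > m->opt_iosize &&
	       m->next_seq_sect - m->opt_iosize >= m->seq_start;
}
```

With seq_start = 1000 and an optimal IO size of 256 sectors, the second check becomes true once next_seq_sect reaches 1256. A rotational mirror never triggers it, so sequential reads stay on that disk.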
Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni --- drivers/md/raid1.c | 71 ++++++++++++++++++++++++---------------------- 1 file changed, 37 insertions(+), 34 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 76bb59ad1485..d3e9a0157437 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -705,6 +705,31 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio, return bb_disk; } +static bool is_sequential(struct r1conf *conf, int disk, struct r1bio *r1_bio) +{ + /* TODO: address issues with this check and concurrency. */ + return conf->mirrors[disk].next_seq_sect == r1_bio->sector || + conf->mirrors[disk].head_position == r1_bio->sector; +} + +/* + * If buffered sequential IO size exceeds optimal iosize, check if there is idle + * disk. If yes, choose the idle disk. + */ +static bool should_choose_next(struct r1conf *conf, int disk) +{ + struct raid1_info *mirror = &conf->mirrors[disk]; + int opt_iosize; + + if (!test_bit(Nonrot, &mirror->rdev->flags)) + return false; + + opt_iosize = bdev_io_opt(mirror->rdev->bdev) >> 9; + return opt_iosize > 0 && mirror->seq_start != MaxSector && + mirror->next_seq_sect > opt_iosize && + mirror->next_seq_sect - opt_iosize >= mirror->seq_start; +} + /* * This routine returns the disk from which the requested read should * be done. There is a per-array 'next expected sequential IO' sector @@ -768,43 +793,21 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect pending = atomic_read(&rdev->nr_pending); dist = abs(this_sector - conf->mirrors[disk].head_position); /* Don't change to another disk for sequential reads */ - if (conf->mirrors[disk].next_seq_sect == this_sector - || dist == 0) { - int opt_iosize = bdev_io_opt(rdev->bdev) >> 9; - struct raid1_info *mirror = &conf->mirrors[disk]; - - /* - * If buffered sequential IO size exceeds optimal - * iosize, check if there is idle disk. 
If yes, choose - * the idle disk. read_balance could already choose an - * idle disk before noticing it's a sequential IO in - * this disk. This doesn't matter because this disk - * will idle, next time it will be utilized after the - * first disk has IO size exceeds optimal iosize. In - * this way, iosize of the first disk will be optimal - * iosize at least. iosize of the second disk might be - * small, but not a big deal since when the second disk - * starts IO, the first disk is likely still busy. - */ - if (test_bit(Nonrot, &rdev->flags) && opt_iosize > 0 && - mirror->seq_start != MaxSector && - mirror->next_seq_sect > opt_iosize && - mirror->next_seq_sect - opt_iosize >= - mirror->seq_start) { - /* - * Add 'pending' to avoid choosing this disk if - * there is other idle disk. - */ - pending++; - /* - * If there is no other idle disk, this disk - * will be chosen. - */ - sequential_disk = disk; - } else { + if (is_sequential(conf, disk, r1_bio)) { + if (!should_choose_next(conf, disk)) { best_disk = disk; break; } + /* + * Add 'pending' to avoid choosing this disk if + * there is other idle disk. + */ + pending++; + /* + * If there is no other idle disk, this disk + * will be chosen. 
+ */ + sequential_disk = disk; } if (min_pending > pending) {

From patchwork Tue Feb 27 12:03:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 207168
From: Yu Kuai
To: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, shli@fb.com, neilb@suse.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH md-6.9 v2 10/10] md/raid1: factor out helpers to choose the best rdev from read_balance()
Date: Tue, 27 Feb 2024 20:03:27 +0800
Message-Id: <20240227120327.1432511-11-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
References: <20240227120327.1432511-1-yukuai1@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

The way the best rdev is chosen:

1) If the read is sequential from one rdev:
 - if the rdev is rotational, use this rdev;
 - if the rdev is non-rotational, use this rdev until the total read
   length exceeds the disk's optimal IO size;

2) If the read is not sequential:
 - if there is an idle disk, use it, otherwise:
 - if the array has a non-rotational disk, choose the rdev with the
   fewest inflight IOs;
 - if all the underlying disks are rotational, choose the rdev with
   the closest IO;

There are no functional changes; this just makes the code cleaner and prepares for the following refactor.

Co-developed-by: Paul Luse Signed-off-by: Paul Luse Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni --- drivers/md/raid1.c | 175 +++++++++++++++++++++++++-------------------- 1 file changed, 98 insertions(+), 77 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index d3e9a0157437..1bdd59d9e6ba 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -730,74 +730,71 @@ static bool should_choose_next(struct r1conf *conf, int disk) mirror->next_seq_sect - opt_iosize >= mirror->seq_start; } -/* - * This routine returns the disk from which the requested read should - * be done. There is a per-array 'next expected sequential IO' sector - * number - if this matches on the next IO then we use the last disk. - * There is also a per-disk 'last know head position' sector that is - * maintained from IRQ contexts, both the normal and the resync IO - * completion handlers update this position correctly. If there is no - * perfect sequential match then we pick the disk whose head is closest. - * - * If there are 2 mirrors in the same 2 devices, performance degrades - * because position is mirror, not device based. - * - * The rdev for the device selected will have nr_pending incremented.
- */ -static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sectors) +static bool rdev_readable(struct md_rdev *rdev, struct r1bio *r1_bio) { - const sector_t this_sector = r1_bio->sector; - int sectors; - int best_good_sectors; - int best_disk, best_dist_disk, best_pending_disk, sequential_disk; - int disk; - sector_t best_dist; - unsigned int min_pending; - struct md_rdev *rdev; + if (!rdev || test_bit(Faulty, &rdev->flags)) + return false; - retry: - sectors = r1_bio->sectors; - best_disk = -1; - best_dist_disk = -1; - sequential_disk = -1; - best_dist = MaxSector; - best_pending_disk = -1; - min_pending = UINT_MAX; - best_good_sectors = 0; - clear_bit(R1BIO_FailFast, &r1_bio->state); + /* still in recovery */ + if (!test_bit(In_sync, &rdev->flags) && + rdev->recovery_offset < r1_bio->sector + r1_bio->sectors) + return false; - if (raid1_should_read_first(conf->mddev, this_sector, sectors)) - return choose_first_rdev(conf, r1_bio, max_sectors); + /* don't read from slow disk unless have to */ + if (test_bit(WriteMostly, &rdev->flags)) + return false; + + /* don't split IO for bad blocks unless have to */ + if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors)) + return false; + + return true; +} + +struct read_balance_ctl { + sector_t closest_dist; + int closest_dist_disk; + int min_pending; + int min_pending_disk; + int sequential_disk; + int readable_disks; +}; + +static int choose_best_rdev(struct r1conf *conf, struct r1bio *r1_bio) +{ + int disk; + struct read_balance_ctl ctl = { + .closest_dist_disk = -1, + .closest_dist = MaxSector, + .min_pending_disk = -1, + .min_pending = UINT_MAX, + .sequential_disk = -1, + }; for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) { + struct md_rdev *rdev; sector_t dist; unsigned int pending; - rdev = conf->mirrors[disk].rdev; - if (r1_bio->bios[disk] == IO_BLOCKED - || rdev == NULL - || test_bit(Faulty, &rdev->flags)) - continue; - if (!test_bit(In_sync, &rdev->flags) && - 
rdev->recovery_offset < this_sector + sectors) - continue; - if (test_bit(WriteMostly, &rdev->flags)) + if (r1_bio->bios[disk] == IO_BLOCKED) continue; - if (rdev_has_badblock(rdev, this_sector, sectors)) + + rdev = conf->mirrors[disk].rdev; + if (!rdev_readable(rdev, r1_bio)) continue; - if (best_disk >= 0) - /* At least two disks to choose from so failfast is OK */ + /* At least two disks to choose from so failfast is OK */ + if (ctl.readable_disks++ == 1) set_bit(R1BIO_FailFast, &r1_bio->state); pending = atomic_read(&rdev->nr_pending); - dist = abs(this_sector - conf->mirrors[disk].head_position); + dist = abs(r1_bio->sector - conf->mirrors[disk].head_position); + /* Don't change to another disk for sequential reads */ if (is_sequential(conf, disk, r1_bio)) { - if (!should_choose_next(conf, disk)) { - best_disk = disk; - break; - } + if (!should_choose_next(conf, disk)) + return disk; + /* * Add 'pending' to avoid choosing this disk if * there is other idle disk. @@ -807,17 +804,17 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect * If there is no other idle disk, this disk * will be chosen. */ - sequential_disk = disk; + ctl.sequential_disk = disk; } - if (min_pending > pending) { - min_pending = pending; - best_pending_disk = disk; + if (ctl.min_pending > pending) { + ctl.min_pending = pending; + ctl.min_pending_disk = disk; } - if (dist < best_dist) { - best_dist = dist; - best_dist_disk = disk; + if (ctl.closest_dist > dist) { + ctl.closest_dist = dist; + ctl.closest_dist_disk = disk; } } @@ -825,8 +822,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect * sequential IO size exceeds optimal iosize, however, there is no other * idle disk, so choose the sequential disk. */ - if (best_disk == -1 && min_pending != 0) - best_disk = sequential_disk; + if (ctl.sequential_disk != -1 && ctl.min_pending != 0) + return ctl.sequential_disk; /* * If all disks are rotational, choose the closest disk. 
If any disk is @@ -834,25 +831,49 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect * disk is rotational, which might/might not be optimal for raids with * mixed ratation/non-rotational disks depending on workload. */ - if (best_disk == -1) { - if (READ_ONCE(conf->nonrot_disks) || min_pending == 0) - best_disk = best_pending_disk; - else - best_disk = best_dist_disk; - } + if (ctl.min_pending_disk != -1 && + (READ_ONCE(conf->nonrot_disks) || ctl.min_pending == 0)) + return ctl.min_pending_disk; + else + return ctl.closest_dist_disk; +} - if (best_disk >= 0) { - rdev = conf->mirrors[best_disk].rdev; - if (!rdev) - goto retry; +/* + * This routine returns the disk from which the requested read should be done. + * + * 1) If resync is in progress, find the first usable disk and use it even if it + * has some bad blocks. + * + * 2) Now that there is no resync, loop through all disks and skipping slow + * disks and disks with bad blocks for now. Only pay attention to key disk + * choice. + * + * 3) If we've made it this far, now look for disks with bad blocks and choose + * the one with most number of sectors. + * + * 4) If we are all the way at the end, we have no choice but to use a disk even + * if it is write mostly. + * + * The rdev for the device selected will have nr_pending incremented. 
+ */ +static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, + int *max_sectors) +{ + int disk; - sectors = best_good_sectors; - update_read_sectors(conf, disk, this_sector, sectors); - } - *max_sectors = sectors; + clear_bit(R1BIO_FailFast, &r1_bio->state); + + if (raid1_should_read_first(conf->mddev, r1_bio->sector, + r1_bio->sectors)) + return choose_first_rdev(conf, r1_bio, max_sectors); - if (best_disk >= 0) - return best_disk; + disk = choose_best_rdev(conf, r1_bio); + if (disk >= 0) { + *max_sectors = r1_bio->sectors; + update_read_sectors(conf, disk, r1_bio->sector, + r1_bio->sectors); + return disk; + } /* * If we are here it means we didn't find a perfectly good disk so