Message ID | 20231110051950.21972-1-ed.tsai@mediatek.com |
---|---|
State | New |
Headers |
From: Ed Tsai <ed.tsai@mediatek.com>
To: ming.lei@redhat.com, hch@lst.de, Jens Axboe <axboe@kernel.dk>, Matthias Brugger <matthias.bgg@gmail.com>, AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: wsd_upstream@mediatek.com, chun-hung.wu@mediatek.com, casper.li@mediatek.com, will.shiu@mediatek.com, light.hsieh@mediatek.com, Ed Tsai <ed.tsai@mediatek.com>, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org
Subject: [PATCH v2] block: limit the extract size to align queue limit
Date: Fri, 10 Nov 2023 13:19:49 +0800
Message-ID: <20231110051950.21972-1-ed.tsai@mediatek.com> |
Series | [v2] block: limit the extract size to align queue limit |
Commit Message
Ed Tsai (蔡宗軒)
Nov. 10, 2023, 5:19 a.m. UTC
From: Ed Tsai <ed.tsai@mediatek.com>

When an application performs a large IO, it fills and submits multiple
full bios to the block layer. Referring to commit 07173c3ec276 ("block:
enable multipage bvecs"), the full bio size is no longer fixed at 1MB
but can vary based on the physical memory layout.

The size of the full bio no longer aligns with the maximum IO size of
the queue. Therefore, in a 64MB read, you may see many unaligned bios
being submitted.

Executing the command to perform a 64MB read:

dd if=/data/test_file of=/dev/null bs=64m count=1 iflag=direct

It demonstrates the submission of numerous unaligned bios:

block_bio_queue: 254,52 R 2933336 + 2136
block_bio_queue: 254,52 R 2935472 + 2152
block_bio_queue: 254,52 R 2937624 + 2128
block_bio_queue: 254,52 R 2939752 + 2160

This patch limits the number of extracted pages to ensure that we
submit the bio once we fill enough pages, preventing the block layer
from splitting small I/Os in between.

I performed the Antutu V10 Storage Test on a UFS 4.0 device, which
resulted in a significant improvement in the Sequential test:

Sequential Read (average of 5 rounds):
Original: 3033.7 MB/sec
Patched: 3520.9 MB/sec

Sequential Write (average of 5 rounds):
Original: 2225.4 MB/sec
Patched: 2800.3 MB/sec

Link: https://lore.kernel.org/linux-arm-kernel/20231025092255.27930-1-ed.tsai@mediatek.com/
Signed-off-by: Ed Tsai <ed.tsai@mediatek.com>
---
 block/bio.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
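[Editor's illustration] For readers who want to see the intended arithmetic in isolation, below is a minimal user-space sketch, not kernel code and not part of the patch. The 1 MiB queue limit is an assumed value, max_extract() and its arguments merely mirror the names used in the patch, and explicit parentheses are used around the masking step. The bi_size values come from the trace above.

#include <stdio.h>

/*
 * Sketch of the clamp the patch aims for: cap each page extraction at the
 * distance from the bio's current size to the next queue-limit boundary,
 * so a full bio ends on that boundary instead of slightly past it.
 */
static unsigned int max_extract(unsigned int max, unsigned int bi_size)
{
	/* assumes max is a power of two, as the patch's masking does */
	return bi_size ? max - (bi_size & (max - 1)) : max;
}

int main(void)
{
	unsigned int max = 1024 * 1024;	/* assumed queue_max_bytes */
	unsigned int sizes[] = { 0, 2136 * 512, 2152 * 512 };	/* bytes, from the trace */

	for (int i = 0; i < 3; i++)
		printf("bi_size=%7u -> extract at most %7u more bytes\n",
		       sizes[i], max_extract(max, sizes[i]));
	return 0;
}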
Comments
On Fri, Nov 10, 2023 at 01:19:49PM +0800, ed.tsai@mediatek.com wrote:
> +	if (bdev && blk_queue_pci_p2pdma(bdev->bd_disk->queue))
>  		extraction_flags |= ITER_ALLOW_P2PDMA;

As pointed out in reply to Ming, you really need to first figure out
if we can assume we have a valid bdev or not, and if not pass all the
relevant information separately.

> +	if (bdev && bio_op(bio) != REQ_OP_ZONE_APPEND) {
> +		unsigned int max = queue_max_bytes(bdev_get_queue(bdev));

The higher level code must not look at queue_max_bytes, that is only
used for splitting and might not even be initialized.
Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on hch-configfs/for-next linus/master v6.6 next-20231110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/ed-tsai-mediatek-com/block-limit-the-extract-size-to-align-queue-limit/20231110-142205
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20231110051950.21972-1-ed.tsai%40mediatek.com
patch subject: [PATCH v2] block: limit the extract size to align queue limit
config: arc-randconfig-002-20231110 (https://download.01.org/0day-ci/archive/20231110/202311101853.9N398fyj-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231110/202311101853.9N398fyj-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311101853.9N398fyj-lkp@intel.com/

All warnings (new ones prefixed by >>):

   block/bio.c: In function '__bio_iov_iter_get_pages':
>> block/bio.c:1261:29: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
    1261 |                         max - bio->bi_iter.bi_size & (max - 1) : max;
         |                         ~~~~^~~~~~~~~~~~~~~~~~~~~~

vim +1261 block/bio.c

  1214
  1215  /**
  1216   * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
  1217   * @bio: bio to add pages to
  1218   * @iter: iov iterator describing the region to be mapped
  1219   *
  1220   * Extracts pages from *iter and appends them to @bio's bvec array. The pages
  1221   * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
  1222   * For a multi-segment *iter, this function only adds pages from the next
  1223   * non-empty segment of the iov iterator.
  1224   */
  1225  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
  1226  {
  1227          iov_iter_extraction_t extraction_flags = 0;
  1228          unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
  1229          unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
  1230          struct block_device *bdev = bio->bi_bdev;
  1231          struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
  1232          struct page **pages = (struct page **)bv;
  1233          ssize_t max_extract = UINT_MAX - bio->bi_iter.bi_size;
  1234          ssize_t size, left;
  1235          unsigned len, i = 0;
  1236          size_t offset;
  1237          int ret = 0;
  1238
  1239          /*
  1240           * Move page array up in the allocated memory for the bio vecs as far as
  1241           * possible so that we can start filling biovecs from the beginning
  1242           * without overwriting the temporary page array.
  1243           */
  1244          BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
  1245          pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
  1246
  1247          if (bdev && blk_queue_pci_p2pdma(bdev->bd_disk->queue))
  1248                  extraction_flags |= ITER_ALLOW_P2PDMA;
  1249
  1250          /*
  1251           * Each segment in the iov is required to be a block size multiple.
  1252           * However, we may not be able to get the entire segment if it spans
  1253           * more pages than bi_max_vecs allows, so we have to ALIGN_DOWN the
  1254           * result to ensure the bio's total size is correct. The remainder of
  1255           * the iov data will be picked up in the next bio iteration.
  1256           */
  1257          if (bdev && bio_op(bio) != REQ_OP_ZONE_APPEND) {
  1258                  unsigned int max = queue_max_bytes(bdev_get_queue(bdev));
  1259
  1260                  max_extract = bio->bi_iter.bi_size ?
> 1261                          max - bio->bi_iter.bi_size & (max - 1) : max;
  1262          }
  1263          size = iov_iter_extract_pages(iter, &pages, max_extract,
  1264                                        nr_pages, extraction_flags, &offset);
  1265          if (unlikely(size <= 0))
  1266                  return size ? size : -EFAULT;
  1267
  1268          nr_pages = DIV_ROUND_UP(offset + size, PAGE_SIZE);
  1269
  1270          if (bdev) {
  1271                  size_t trim = size & (bdev_logical_block_size(bdev) - 1);
  1272                  iov_iter_revert(iter, trim);
  1273                  size -= trim;
  1274          }
  1275
  1276          if (unlikely(!size)) {
  1277                  ret = -EFAULT;
  1278                  goto out;
  1279          }
  1280
  1281          for (left = size, i = 0; left > 0; left -= len, i++) {
  1282                  struct page *page = pages[i];
  1283
  1284                  len = min_t(size_t, PAGE_SIZE - offset, left);
  1285                  if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
  1286                          ret = bio_iov_add_zone_append_page(bio, page, len,
  1287                                          offset);
  1288                          if (ret)
  1289                                  break;
  1290                  } else
  1291                          bio_iov_add_page(bio, page, len, offset);
  1292
  1293                  offset = 0;
  1294          }
  1295
  1296          iov_iter_revert(iter, left);
  1297  out:
  1298          while (i < nr_pages)
  1299                  bio_release_page(bio, pages[i++]);
  1300
  1301          return ret;
  1302  }
  1303
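[Editor's illustration] To make the warning concrete: in C the additive operators bind tighter than bitwise '&', so the unparenthesized expression on line 1261 groups as (max - bi_size) & (max - 1) rather than max - (bi_size & (max - 1)). The user-space check below is not kernel code; max and bi_size are stand-in variables mirroring the patch, and the values are chosen only to expose the difference.

#include <stdio.h>

int main(void)
{
	unsigned int max = 1 << 20;	/* assume a power-of-two queue limit */
	unsigned int bi_size = 2 << 20;	/* already a multiple of max */

	/* how the unparenthesized expression actually parses */
	unsigned int as_written = (max - bi_size) & (max - 1);
	/* the grouping the surrounding ternary suggests was intended */
	unsigned int with_parens = max - (bi_size & (max - 1));

	printf("as_written=%u with_parens=%u\n", as_written, with_parens);
	/*
	 * Prints as_written=0 with_parens=1048576: for a power-of-two max and
	 * unsigned arithmetic, the two groupings disagree exactly when bi_size
	 * is already a multiple of max.
	 */
	return 0;
}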
On Fri, Nov 10, 2023 at 01:19:49PM +0800, ed.tsai@mediatek.com wrote:
> From: Ed Tsai <ed.tsai@mediatek.com>
>
> When an application performs a large IO, it fills and submits multiple
> full bios to the block layer. Referring to commit 07173c3ec276
> ("block: enable multipage bvecs"), the full bio size is no longer fixed
> at 1MB but can vary based on the physical memory layout.
>
> The size of the full bio no longer aligns with the maximum IO size of
> the queue. Therefore, in a 64MB read, you may see many unaligned bios
> being submitted.
>
> Executing the command to perform a 64MB read:
>
> dd if=/data/test_file of=/dev/null bs=64m count=1 iflag=direct
>
> It demonstrates the submission of numerous unaligned bios:
>
> block_bio_queue: 254,52 R 2933336 + 2136
> block_bio_queue: 254,52 R 2935472 + 2152
> block_bio_queue: 254,52 R 2937624 + 2128
> block_bio_queue: 254,52 R 2939752 + 2160
>
> This patch limits the number of extracted pages to ensure that we
> submit the bio once we fill enough pages, preventing the block layer
> from splitting small I/Os in between.
>
> I performed the Antutu V10 Storage Test on a UFS 4.0 device, which
> resulted in a significant improvement in the Sequential test:
>
> Sequential Read (average of 5 rounds):
> Original: 3033.7 MB/sec
> Patched: 3520.9 MB/sec
>
> Sequential Write (average of 5 rounds):
> Original: 2225.4 MB/sec
> Patched: 2800.3 MB/sec
>
> Link: https://lore.kernel.org/linux-arm-kernel/20231025092255.27930-1-ed.tsai@mediatek.com/
> Signed-off-by: Ed Tsai <ed.tsai@mediatek.com>
>
> ---
>  block/bio.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 816d412c06e9..8d3a112e68da 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1227,8 +1227,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	iov_iter_extraction_t extraction_flags = 0;
>  	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
>  	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
> +	struct block_device *bdev = bio->bi_bdev;
>  	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
>  	struct page **pages = (struct page **)bv;
> +	ssize_t max_extract = UINT_MAX - bio->bi_iter.bi_size;
>  	ssize_t size, left;
>  	unsigned len, i = 0;
>  	size_t offset;
> @@ -1242,7 +1244,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
>  	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
>
> -	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
> +	if (bdev && blk_queue_pci_p2pdma(bdev->bd_disk->queue))
>  		extraction_flags |= ITER_ALLOW_P2PDMA;
>
>  	/*
> @@ -1252,16 +1254,21 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	 * result to ensure the bio's total size is correct. The remainder of
>  	 * the iov data will be picked up in the next bio iteration.
>  	 */
> -	size = iov_iter_extract_pages(iter, &pages,
> -				      UINT_MAX - bio->bi_iter.bi_size,
> +	if (bdev && bio_op(bio) != REQ_OP_ZONE_APPEND) {
> +		unsigned int max = queue_max_bytes(bdev_get_queue(bdev));
> +
> +		max_extract = bio->bi_iter.bi_size ?
> +			max - bio->bi_iter.bi_size & (max - 1) : max;
> +	}
> +	size = iov_iter_extract_pages(iter, &pages, max_extract,
>  				      nr_pages, extraction_flags, &offset);

The above is just what I did in the 'slow path' of patch v2 [1]. It does
not work well when applied to every page extraction, which is usually
slow; page extraction should always be batched. For example:

1) build one ublk disk (suppose it is /dev/ublkb0) with max sectors of 32k:

- rublk add null --io-buf-size=16384 -q 2 [2]

2) run 64KB IO

fio --direct=1 --size=230G --bsrange=64k-64k --runtime=20 --numjobs=2 --ioengine=libaio \
    --iodepth=64 --iodepth_batch_submit=64 --iodepth_batch_complete_min=64 --group_reporting=1 \
    --filename=/dev/ublkb0 --name=/dev/ublkb0-test-randread --rw=randread

In my local VM, read BW drops from 20GB/s to 3709MB/s in the above fio
test with this patch.

The point is that:

1) bio size alignment is only needed in case of multiple bios

2) bio size alignment is needed only when the current bio is approaching
full

3) with multiple bvecs, it is hard to know beforehand how many pages the
bvecs can hold

In short, performing the alignment on every extraction is much less
efficient.

[1] https://lore.kernel.org/linux-block/202311100354.HYfqOQ7o-lkp@intel.com/T/#u

[2] install rublk via `cargo install --version=^0.1 rublk`; CONFIG_BLK_DEV_UBLK
is required

Thanks,
Ming
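[Editor's illustration] To put rough numbers on Ming's first two points, here is a back-of-envelope sketch, not Ming's patch and not exact kernel behavior. The 64 KiB request comes from the fio job above; the 16 KiB queue limit and the 256 single-page bvecs are assumed values for illustration only.

#include <stdio.h>

/*
 * Compare how many bios (and hence extraction calls) one request is built
 * from when every extraction is capped at the queue limit, versus letting
 * the bio grow until its bvec array is full and leaving the limit to the
 * block layer's split path.
 */
static unsigned int bios_needed(unsigned int request_bytes, unsigned int cap)
{
	return (request_bytes + cap - 1) / cap;	/* round up */
}

int main(void)
{
	unsigned int request = 64 * 1024;	/* one 64 KiB read from the fio job */
	unsigned int queue_max = 16 * 1024;	/* assumed queue limit */
	unsigned int bvec_cap = 256 * 4096;	/* assumed 256 single-page bvecs */

	printf("capped at the queue limit: %u bios\n",
	       bios_needed(request, queue_max));
	printf("capped only by bvec space: %u bio (split later, once)\n",
	       bios_needed(request, bvec_cap < request ? bvec_cap : request));
	return 0;
}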
diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..8d3a112e68da 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1227,8 +1227,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_extraction_t extraction_flags = 0;
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
+	struct block_device *bdev = bio->bi_bdev;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
+	ssize_t max_extract = UINT_MAX - bio->bi_iter.bi_size;
 	ssize_t size, left;
 	unsigned len, i = 0;
 	size_t offset;
@@ -1242,7 +1244,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
+	if (bdev && blk_queue_pci_p2pdma(bdev->bd_disk->queue))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
 	/*
@@ -1252,16 +1254,21 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_extract_pages(iter, &pages,
-				      UINT_MAX - bio->bi_iter.bi_size,
+	if (bdev && bio_op(bio) != REQ_OP_ZONE_APPEND) {
+		unsigned int max = queue_max_bytes(bdev_get_queue(bdev));
+
+		max_extract = bio->bi_iter.bi_size ?
+			max - bio->bi_iter.bi_size & (max - 1) : max;
+	}
+	size = iov_iter_extract_pages(iter, &pages, max_extract,
 				      nr_pages, extraction_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
 	nr_pages = DIV_ROUND_UP(offset + size, PAGE_SIZE);
 
-	if (bio->bi_bdev) {
-		size_t trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
+	if (bdev) {
+		size_t trim = size & (bdev_logical_block_size(bdev) - 1);
 		iov_iter_revert(iter, trim);
 		size -= trim;
 	}