Message ID | 20231021154806.4019417-1-yukuai1@huaweicloud.com |
---|---|
Headers |
From: Yu Kuai <yukuai1@huaweicloud.com> To: bvanassche@acm.org, hch@lst.de, kbusch@kernel.org, ming.lei@redhat.com, axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH RFC v2 0/8] blk-mq: improve tag fair sharing Date: Sat, 21 Oct 2023 23:47:58 +0800 Message-Id: <20231021154806.4019417-1-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
blk-mq: improve tag fair sharing
|
|
Message
Yu Kuai
Oct. 21, 2023, 3:47 p.m. UTC
From: Yu Kuai <yukuai3@huawei.com>
Current implementation:
- a counter 'active_queues' records how many queues/hctxs are sharing
  tags; it is updated when new IO is issued and cleared in
  blk_mq_timeout_work();
- if active_queues is more than 1, the tags are shared fairly among the
  nodes;
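For reviewers who do not have the current check in mind, here is a minimal userspace sketch of the fair-share limit described above. It is modelled on my reading of hctx_may_queue() in block/blk-mq.h (ceiling division by the number of active queues, with a floor of 4). The names fair_share_depth(), may_queue() and MIN_SHARE are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed floor on the per-queue share; not stated in this cover letter. */
#define MIN_SHARE 4U

/* Per-queue tag limit when 'active_queues' queues share 'total_tags'. */
static unsigned int fair_share_depth(unsigned int total_tags,
				     unsigned int active_queues)
{
	unsigned int depth;

	assert(total_tags > 0);

	/* Nobody is marked active: no sharing limit applies. */
	if (active_queues == 0)
		return total_tags;

	/* Round up so the shares cover the whole tag space. */
	depth = (total_tags + active_queues - 1) / active_queues;
	return depth > MIN_SHARE ? depth : MIN_SHARE;
}

/* A queue may allocate another tag only while it is below its share. */
static bool may_queue(unsigned int total_tags, unsigned int active_queues,
		      unsigned int in_flight)
{
	return in_flight < fair_share_depth(total_tags, active_queues);
}
```

With 32 tags and two active queues, fair_share_depth(32, 2) gives 16, so a queue that already has 16 requests in flight is refused no matter how few tags the other queue is using.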
New implementation:
- a new field 'available_tags' is added to each node; it is calculated
  in the slow path, so the fast path is not affected (patch 5);
- a new counter 'busy_queues' is added to blk_mq_tags; it is updated
  when a node fails to get a driver tag and is also cleared in
  blk_mq_timeout_work(), and tag sharing is based on 'busy_queues'
  instead of 'active_queues' (patches 6 and 7);
- a new counter 'busy_count' is added to each node to record how many
  times the node failed to get a driver tag; it is used to judge whether
  the node is busy and needs more tags (patch 8);
- a new timer is added to blk_mq_tags; it is started when any node fails
  to get a driver tag, and the timer function is used to borrow tags and
  to return borrowed tags (patch 8);
A simple test, 32 tags with two shared nodes:
[global]
ioengine=libaio
iodepth=2
bs=4k
direct=1
rw=randrw
group_reporting
[sda]
numjobs=32
filename=/dev/sda
[sdb]
numjobs=1
filename=/dev/sdb
Test result (monitoring the new debugfs entry):
time    active          available
        sda     sdb     sda     sdb
0       0       0       32      32
1       16      2       16      16      -> start fair sharing
2       19      2       20      16
3       24      2       24      16
4       26      2       28      16      -> borrow 32/8=4 tags each round
5       28      2       28      16      -> save at least 4 tags for sdb
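The growth of the 'available' column for sda in the result above can be reproduced with a small model. This is only a sketch of the arithmetic implied by the annotations (a step of total/8 per round, and a cap that leaves 4 tags for the other node). It is not the code from patch 8, and borrow_round(), BORROW_DIVISOR and RESERVED_TAGS are hypothetical names.

```c
#include <assert.h>

/* Assumed from the annotation "borrow 32/8=4 tags each round". */
#define BORROW_DIVISOR 8U
/* Assumed from the annotation "save at least 4 tags for sdb". */
#define RESERVED_TAGS 4U

/*
 * One timer round for a busy node: grow its available tags by
 * total_tags / BORROW_DIVISOR, but never beyond
 * total_tags - RESERVED_TAGS.
 */
static unsigned int borrow_round(unsigned int total_tags,
				 unsigned int available)
{
	unsigned int step = total_tags / BORROW_DIVISOR;
	unsigned int cap;

	assert(total_tags > RESERVED_TAGS);
	cap = total_tags - RESERVED_TAGS;

	/* Already at or above the cap: nothing more to borrow. */
	if (available >= cap)
		return available;

	return available + step > cap ? cap : available + step;
}
```

Starting from the fair share of 16, repeated calls with total_tags = 32 give 20, 24, 28 and then stay at 28, which matches the sda values at times 2 to 5.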
Yu Kuai (8):
blk-mq: factor out a structure from blk_mq_tags
blk-mq: factor out a structure to store information for tag sharing
blk-mq: add a helper to initialize shared_tag_info
blk-mq: support to track active queues from blk_mq_tags
blk-mq: precalculate available tags for hctx_may_queue()
blk-mq: add new helpers blk_mq_driver_tag_busy/idle()
blk-mq-tag: delay tag sharing until fail to get driver tag
blk-mq-tag: allow shared queue/hctx to get more driver tags
block/blk-core.c | 2 -
block/blk-mq-debugfs.c | 30 +++++-
block/blk-mq-tag.c | 226 +++++++++++++++++++++++++++++++++++++++--
block/blk-mq.c | 12 ++-
block/blk-mq.h | 64 +++++++-----
include/linux/blk-mq.h | 36 +++++--
include/linux/blkdev.h | 11 +-
7 files changed, 328 insertions(+), 53 deletions(-)
Comments
Hello Yu Kuai,

On Sat, Oct 21, 2023 at 11:47:58PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Current implementation:
> - a counter active_queues record how many queue/hctx is sharing tags,
> and it's updated while issue new IO, and cleared in
> blk_mq_timeout_work().
> - if active_queues is more than 1, then tags is fair shared to each
> node;

Can you explain a bit what the problem is in current tag sharing?
And what is your basic approach for this problem?

Just mentioning the implementation is not too helpful for initial
review, because the problem and approach (correctness) need to be
understood first.

Thanks,
Ming
Hi,

On 2023/10/23 12:38, Ming Lei wrote:
> Hello Yu Kuai,
>
> On Sat, Oct 21, 2023 at 11:47:58PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Current implementation:
>> - a counter active_queues record how many queue/hctx is sharing tags,
>> and it's updated while issue new IO, and cleared in
>> blk_mq_timeout_work().
>> - if active_queues is more than 1, then tags is fair shared to each
>> node;
>
> Can you explain a bit what the problem is in current tag sharing?
> And what is your basic approach for this problem?
>
> Just mentioning the implementation is not too helpful for initial
> review, cause the problem and approach(correctness) need to be
> understood first.

Of course, I'll add the following if there is a v3:

Current problems:

If there are multiple active_queues, the tags are shared fairly among
the queues. If one queue is not busy (for example, it only issues one IO
once in a while), the tags reserved for that queue are wasted and can't
be used by other queues. Depending on the hardware, this might cause
performance problems in some use cases. For example, as reported by [1],
UFS devices have multiple logical units. One of these logical units
(WLUN) is used to submit control commands, e.g. START STOP UNIT. If any
request is submitted to the WLUN, the queue depth is reduced from 31 to
15 or lower for data LUNs.

This patchset first delays tag sharing from the point where IO is issued
to the point where getting a driver tag fails; then adds a counter that
records how many times a shared queue failed to get a driver tag, to
indicate whether the queue is busy; finally, it allows a busy queue to
borrow more tags from an idle queue.

Thanks,
Kuai

>
> Thanks,
> Ming