Message ID | 20230627120854.971475-2-chengming.zhou@linux.dev |
---|---|
State | New |
Headers |
From: chengming.zhou@linux.dev
To: axboe@kernel.dk, tj@kernel.org, hch@lst.de, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com
Subject: [PATCH 1/4] blk-mq: use percpu csd to remote complete instead of per-rq csd
Date: Tue, 27 Jun 2023 20:08:51 +0800
Message-Id: <20230627120854.971475-2-chengming.zhou@linux.dev>
In-Reply-To: <20230627120854.971475-1-chengming.zhou@linux.dev>
References: <20230627120854.971475-1-chengming.zhou@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit |
Series | blk-mq: optimize the size of struct request |
Commit Message
Chengming Zhou
June 27, 2023, 12:08 p.m. UTC
From: Chengming Zhou <zhouchengming@bytedance.com>

If a request needs to be completed remotely, we insert it into the percpu
llist, and call smp_call_function_single_async() only if the llist was
empty beforehand.

We don't need a per-rq csd; a percpu csd is enough, and the size of
struct request is decreased by 24 bytes.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 block/blk-mq.c         | 12 ++++++++----
 include/linux/blk-mq.h |  5 +----
 2 files changed, 9 insertions(+), 8 deletions(-)
Comments
On Tue, Jun 27, 2023 at 08:08:51PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> If request need to be completed remotely, we insert it into percpu llist,
> and smp_call_function_single_async() if llist is empty previously.
>
> We don't need to use per-rq csd, percpu csd is enough. And the size of
> struct request is decreased by 24 bytes.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  block/blk-mq.c         | 12 ++++++++----
>  include/linux/blk-mq.h |  5 +----
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index decb6ab2d508..a36822479b94 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -43,6 +43,7 @@
>  #include "blk-ioprio.h"
>
>  static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
> +static DEFINE_PER_CPU(struct __call_single_data, blk_cpu_csd);

It might be better to use call_single_data, given:

/* Use __aligned() to avoid to use 2 cache lines for 1 csd */
typedef struct __call_single_data call_single_data_t
	__aligned(sizeof(struct __call_single_data));

>
>  static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
>  static void blk_mq_request_bypass_insert(struct request *rq,
> @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq)
>  {
>  	struct llist_head *list;
>  	unsigned int cpu;
> +	struct __call_single_data *csd;
>
>  	cpu = rq->mq_ctx->cpu;
>  	list = &per_cpu(blk_cpu_done, cpu);
> -	if (llist_add(&rq->ipi_list, list)) {
> -		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
> -		smp_call_function_single_async(cpu, &rq->csd);
> -	}
> +	csd = &per_cpu(blk_cpu_csd, cpu);
> +	if (llist_add(&rq->ipi_list, list))
> +		smp_call_function_single_async(cpu, csd);
>  }

This way is cleaner, and looks correct, given block softirq is guaranteed
to be scheduled to consume the list if one new request is added to this
percpu list, either smp_call_function_single_async() returns -EBUSY or 0.

thanks
Ming
On 2023/6/28 10:20, Ming Lei wrote:
> On Tue, Jun 27, 2023 at 08:08:51PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> If request need to be completed remotely, we insert it into percpu llist,
>> and smp_call_function_single_async() if llist is empty previously.
>>
>> We don't need to use per-rq csd, percpu csd is enough. And the size of
>> struct request is decreased by 24 bytes.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  block/blk-mq.c         | 12 ++++++++----
>>  include/linux/blk-mq.h |  5 +----
>>  2 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index decb6ab2d508..a36822479b94 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -43,6 +43,7 @@
>>  #include "blk-ioprio.h"
>>
>>  static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
>> +static DEFINE_PER_CPU(struct __call_single_data, blk_cpu_csd);
>
> It might be better to use call_single_data, given:
>
> /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
> typedef struct __call_single_data call_single_data_t
> 	__aligned(sizeof(struct __call_single_data));
>

Good, I will change to use this.

>>
>>  static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
>>  static void blk_mq_request_bypass_insert(struct request *rq,
>> @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq)
>>  {
>>  	struct llist_head *list;
>>  	unsigned int cpu;
>> +	struct __call_single_data *csd;
>>
>>  	cpu = rq->mq_ctx->cpu;
>>  	list = &per_cpu(blk_cpu_done, cpu);
>> -	if (llist_add(&rq->ipi_list, list)) {
>> -		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
>> -		smp_call_function_single_async(cpu, &rq->csd);
>> -	}
>> +	csd = &per_cpu(blk_cpu_csd, cpu);
>> +	if (llist_add(&rq->ipi_list, list))
>> +		smp_call_function_single_async(cpu, csd);
>>  }
>
> This way is cleaner, and looks correct, given block softirq is guaranteed to be
> scheduled to consume the list if one new request is added to this percpu list,
> either smp_call_function_single_async() returns -EBUSY or 0.
>

If this llist_add() see the llist is empty, the consumer function in the
softirq on the remote CPU must have consumed the llist, so
smp_call_function_single_async() won't return -EBUSY ?

Thanks.
On Wed, Jun 28, 2023 at 11:28:20AM +0800, Chengming Zhou wrote:
> On 2023/6/28 10:20, Ming Lei wrote:
> > On Tue, Jun 27, 2023 at 08:08:51PM +0800, chengming.zhou@linux.dev wrote:
> >> From: Chengming Zhou <zhouchengming@bytedance.com>
> >>
> >> If request need to be completed remotely, we insert it into percpu llist,
> >> and smp_call_function_single_async() if llist is empty previously.
> >>
> >> We don't need to use per-rq csd, percpu csd is enough. And the size of
> >> struct request is decreased by 24 bytes.
> >>
> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >> ---
> >>  block/blk-mq.c         | 12 ++++++++----
> >>  include/linux/blk-mq.h |  5 +----
> >>  2 files changed, 9 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index decb6ab2d508..a36822479b94 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -43,6 +43,7 @@
> >>  #include "blk-ioprio.h"
> >>
> >>  static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
> >> +static DEFINE_PER_CPU(struct __call_single_data, blk_cpu_csd);
> >
> > It might be better to use call_single_data, given:
> >
> > /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
> > typedef struct __call_single_data call_single_data_t
> > 	__aligned(sizeof(struct __call_single_data));
> >
>
> Good, I will change to use this.
>
> >>
> >>  static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
> >>  static void blk_mq_request_bypass_insert(struct request *rq,
> >> @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq)
> >>  {
> >>  	struct llist_head *list;
> >>  	unsigned int cpu;
> >> +	struct __call_single_data *csd;
> >>
> >>  	cpu = rq->mq_ctx->cpu;
> >>  	list = &per_cpu(blk_cpu_done, cpu);
> >> -	if (llist_add(&rq->ipi_list, list)) {
> >> -		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
> >> -		smp_call_function_single_async(cpu, &rq->csd);
> >> -	}
> >> +	csd = &per_cpu(blk_cpu_csd, cpu);
> >> +	if (llist_add(&rq->ipi_list, list))
> >> +		smp_call_function_single_async(cpu, csd);
> >>  }
> >
> > This way is cleaner, and looks correct, given block softirq is guaranteed to be
> > scheduled to consume the list if one new request is added to this percpu list,
> > either smp_call_function_single_async() returns -EBUSY or 0.
> >
>
> If this llist_add() see the llist is empty, the consumer function in the softirq
> on the remote CPU must have consumed the llist, so smp_call_function_single_async()
> won't return -EBUSY ?

block softirq can be scheduled from other code path, such as
blk_mq_raise_softirq() for single queue's remote completion, where no percpu
csd schedule is needed, so two smp_call_function_single_async() could be
called, and the 2nd one may return -EBUSY.

Not mention csd_unlock() could be called after the callback returns, see
__flush_smp_call_function_queue().

But that is fine, if there is pending block softirq, the llist is guaranteed
to be consumed because the csd callback just raises block softirq, and
request/llist is consumed in softirq handler.

Thanks,
Ming
On 2023/6/28 12:50, Ming Lei wrote:
> On Wed, Jun 28, 2023 at 11:28:20AM +0800, Chengming Zhou wrote:
>> On 2023/6/28 10:20, Ming Lei wrote:
>>> On Tue, Jun 27, 2023 at 08:08:51PM +0800, chengming.zhou@linux.dev wrote:
>>>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>>>
>>>> If request need to be completed remotely, we insert it into percpu llist,
>>>> and smp_call_function_single_async() if llist is empty previously.
>>>>
>>>> We don't need to use per-rq csd, percpu csd is enough. And the size of
>>>> struct request is decreased by 24 bytes.
>>>>
>>>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>>>> ---
>>>>  block/blk-mq.c         | 12 ++++++++----
>>>>  include/linux/blk-mq.h |  5 +----
>>>>  2 files changed, 9 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index decb6ab2d508..a36822479b94 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -43,6 +43,7 @@
>>>>  #include "blk-ioprio.h"
>>>>
>>>>  static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
>>>> +static DEFINE_PER_CPU(struct __call_single_data, blk_cpu_csd);
>>>
>>> It might be better to use call_single_data, given:
>>>
>>> /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
>>> typedef struct __call_single_data call_single_data_t
>>> 	__aligned(sizeof(struct __call_single_data));
>>>
>>
>> Good, I will change to use this.
>>
>>>>
>>>>  static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
>>>>  static void blk_mq_request_bypass_insert(struct request *rq,
>>>> @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq)
>>>>  {
>>>>  	struct llist_head *list;
>>>>  	unsigned int cpu;
>>>> +	struct __call_single_data *csd;
>>>>
>>>>  	cpu = rq->mq_ctx->cpu;
>>>>  	list = &per_cpu(blk_cpu_done, cpu);
>>>> -	if (llist_add(&rq->ipi_list, list)) {
>>>> -		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
>>>> -		smp_call_function_single_async(cpu, &rq->csd);
>>>> -	}
>>>> +	csd = &per_cpu(blk_cpu_csd, cpu);
>>>> +	if (llist_add(&rq->ipi_list, list))
>>>> +		smp_call_function_single_async(cpu, csd);
>>>>  }
>>>
>>> This way is cleaner, and looks correct, given block softirq is guaranteed to be
>>> scheduled to consume the list if one new request is added to this percpu list,
>>> either smp_call_function_single_async() returns -EBUSY or 0.
>>>
>>
>> If this llist_add() see the llist is empty, the consumer function in the softirq
>> on the remote CPU must have consumed the llist, so smp_call_function_single_async()
>> won't return -EBUSY ?
>
> block softirq can be scheduled from other code path, such as blk_mq_raise_softirq()
> for single queue's remote completion, where no percpu csd schedule is needed, so
> two smp_call_function_single_async() could be called, and the 2nd one
> may return -EBUSY.

Thanks for your very clear explanation! I understand what you mean.

Yes, the 2nd smp_call_function_single_async() will return -EBUSY, but it's ok
since the 1st will do the right thing.

> Not mention csd_unlock() could be called after the callback returns, see
> __flush_smp_call_function_queue().

Ok, CSD_TYPE_SYNC will csd_unlock() after csd_do_func() returns, our
CSD_TYPE_ASYNC will csd_unlock() before csd_do_func().

> But that is fine, if there is pending block softirq, the llist is
> guaranteed to be consumed because the csd callback just raises block
> softirq, and request/llist is consumed in softirq handler.

Agree, it's fine even the 2nd return -EBUSY when the 1st function is raising
block softirq, our llist will be consumed in softirq handler.

Thanks!
diff --git a/block/blk-mq.c b/block/blk-mq.c
index decb6ab2d508..a36822479b94 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -43,6 +43,7 @@
 #include "blk-ioprio.h"
 
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
+static DEFINE_PER_CPU(struct __call_single_data, blk_cpu_csd);
 
 static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
 static void blk_mq_request_bypass_insert(struct request *rq,
@@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq)
 {
 	struct llist_head *list;
 	unsigned int cpu;
+	struct __call_single_data *csd;
 
 	cpu = rq->mq_ctx->cpu;
 	list = &per_cpu(blk_cpu_done, cpu);
-	if (llist_add(&rq->ipi_list, list)) {
-		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
-		smp_call_function_single_async(cpu, &rq->csd);
-	}
+	csd = &per_cpu(blk_cpu_csd, cpu);
+	if (llist_add(&rq->ipi_list, list))
+		smp_call_function_single_async(cpu, csd);
 }
 
 static void blk_mq_raise_softirq(struct request *rq)
@@ -4796,6 +4797,9 @@ static int __init blk_mq_init(void)
 	for_each_possible_cpu(i)
 		init_llist_head(&per_cpu(blk_cpu_done, i));
+	for_each_possible_cpu(i)
+		INIT_CSD(&per_cpu(blk_cpu_csd, i),
+			 __blk_mq_complete_request_remote, NULL);
 
 	open_softirq(BLOCK_SOFTIRQ, blk_done_softirq);
 	cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..070551197c0e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -182,10 +182,7 @@ struct request {
 		rq_end_io_fn *saved_end_io;
 	} flush;
 
-	union {
-		struct __call_single_data csd;
-		u64 fifo_time;
-	};
+	u64 fifo_time;
 
 	/*
 	 * completion callback.