Message ID | 20231010142216.1114752-1-ming.lei@redhat.com |
---|---|
State | New |
Headers |
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Tejun Heo <tj@kernel.org>, linux-kernel@vger.kernel.org, Ming Lei <ming.lei@redhat.com>, Juri Lelli <juri.lelli@redhat.com>, Andrew Theurer <atheurer@redhat.com>, Joe Mario <jmario@redhat.com>, Sebastian Jug <sejug@redhat.com>
Subject: [PATCH] blk-mq: add module parameter to not run block kworker on isolated CPUs
Date: Tue, 10 Oct 2023 22:22:16 +0800
Message-ID: <20231010142216.1114752-1-ming.lei@redhat.com>
Series |
blk-mq: add module parameter to not run block kworker on isolated CPUs
Commit Message
Ming Lei
Oct. 10, 2023, 2:22 p.m. UTC
The `isolcpus=` kernel parameter is used to isolate CPUs for specific
tasks, and users usually don't want block IO to disturb these CPUs. Long
IO latency may also result if a blk-mq kworker is scheduled on one of
these isolated CPUs.
The kernel workqueue code only respects this restriction for WQ_UNBOUND
workqueues; for bound workqueues, the responsibility lies with the
workqueue user.
Add a block layer module parameter to avoid running blk-mq kworkers on
isolated CPUs.
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Andrew Theurer <atheurer@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Sebastian Jug <sejug@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
Comments
(cc'ing Frederic)

On Tue, Oct 10, 2023 at 10:22:16PM +0800, Ming Lei wrote:
> [...]
> +static bool respect_cpu_isolation;
> +module_param(respect_cpu_isolation, bool, 0444);
> +MODULE_PARM_DESC(respect_cpu_isolation,
> +		"Don't schedule blk-mq worker on isolated CPUs passed in "
> +		"isolcpus= or nohz_full=. User need to guarantee to not run "
> +		"block IO on isolated CPUs (default: false)");

Any chance we can centralize these? It's no fun to try to hunt down module
params to opt in different subsystems and the housekeeping interface does
have some provisions for selecting different parts. I'd much prefer to see
these settings to be collected into a central place.

Thanks.
Hello,

On Tue, Oct 10, 2023 at 08:45:44AM -1000, Tejun Heo wrote:
> (cc'ing Frederic)
>
> On Tue, Oct 10, 2023 at 10:22:16PM +0800, Ming Lei wrote:
> > [...]
>
> Any chance we can centralize these? It's no fun to try to hunt down module
> params to opt in different subsystems and the housekeeping interface does
> have some provisions for selecting different parts. I'd much prefer to see
> these settings to be collected into a central place.
I guess this is hard to solve in a central place such as the workqueue code. Here is the workqueue API:

    /**
     * queue_work_on - queue work on specific cpu
     * @cpu: CPU number to execute work on
     * @wq: workqueue to use
     * @work: work to queue
     *
     * We queue the work to a specific CPU, the caller must ensure it
     * can't go away. Callers that fail to ensure that the specified
     * CPU cannot go away will execute on a randomly chosen CPU.
     * But note well that callers specifying a CPU that never has been
     * online will get a splat.
     *
     * Return: %false if @work was already on a queue, %true otherwise.
     */
    bool queue_work_on(int cpu, struct workqueue_struct *wq,
                       struct work_struct *work)

The caller specifies the CPU to queue the work on, so what can queue_work_on() do if that CPU is isolated? If the API were changed to deal with isolated CPUs, callers would have to be modified to adapt to the API change.

Secondly, CPU isolation can still be overridden with 'taskset -c $isolated_cpus', which is why I added a blk-mq module parameter. The module parameter could be removed, though, with just two side effects if block IOs are submitted from isolated CPUs:

- the driver's ->queue_rq() may be queued on another CPU or as unbound work, which looks fine
- an IO timeout may be triggered during CPU hotplug, but that has been the case for a long time, so it may not be a big deal either.

I appreciate that any specific suggestions about dealing with isolated CPUs generically for bound WQ can be shared.

Thanks,
Ming
Hello,

On Wed, Oct 11, 2023 at 08:39:05AM +0800, Ming Lei wrote:
> I appreciate that any specific suggestions about dealing with isolated CPUs
> generically for bound WQ can be shared.

Oh, all I meant was whether we can collect this into, or at least next to, the existing housekeeping / isolcpus parameters. Let's say there's someone who really wants to isolate some CPUs: how would they find out about the different parameters if they're scattered across different subsystems?

Thanks.
Hi Tejun,

On Thu, Oct 12, 2023 at 09:55:55AM -1000, Tejun Heo wrote:
> Oh, all I meant was whether we can collect this into, or at least next to,
> the existing housekeeping / isolcpus parameters. Let's say there's someone
> who really wants to isolate some CPUs: how would they find out about the
> different parameters if they're scattered across different subsystems?

AFAIK, the issue was reported in a Red Hat OpenShift environment and it is a real use case: some CPUs are isolated for dedicated tasks (such as network polling, ...) by passing "isolcpus=managed_irq nohz_full". But blk-mq still queues kworkers on these isolated CPUs, which causes very long latency in NVMe IO workloads.

Joe should know the story much better than me.

Thanks,
Ming
On Tue, Oct 10, 2023 at 08:45:44AM -1000, Tejun Heo wrote:
> > +static bool respect_cpu_isolation;
> > +module_param(respect_cpu_isolation, bool, 0444);
> > +MODULE_PARM_DESC(respect_cpu_isolation,
> > +		"Don't schedule blk-mq worker on isolated CPUs passed in "
> > +		"isolcpus= or nohz_full=. User need to guarantee to not run "
> > +		"block IO on isolated CPUs (default: false)");
>
> Any chance we can centralize these? It's no fun to try to hunt down module
> params to opt in different subsystems and the housekeeping interface does
> have some provisions for selecting different parts. I'd much prefer to see
> these settings to be collected into a central place.

Do we need this parameter in the first place? Shouldn't we avoid scheduling
blk-mq worker on isolated CPUs in any case?

Thanks.
On Fri, Oct 13, 2023 at 01:26:08PM +0200, Frederic Weisbecker wrote:
> On Tue, Oct 10, 2023 at 08:45:44AM -1000, Tejun Heo wrote:
> > [...]
>
> Do we need this parameter in the first place? Shouldn't we avoid scheduling
> blk-mq worker on isolated CPUs in any case?

Yeah, I think this parameter isn't necessary; I will remove it in V2.

Thanks,
Ming
    diff --git a/block/blk-mq.c b/block/blk-mq.c
    index ec922c6bccbe..c53b5b522053 100644
    --- a/block/blk-mq.c
    +++ b/block/blk-mq.c
    @@ -29,6 +29,7 @@
     #include <linux/prefetch.h>
     #include <linux/blk-crypto.h>
     #include <linux/part_stat.h>
    +#include <linux/sched/isolation.h>
     
     #include <trace/events/block.h>
     
    @@ -42,6 +43,13 @@
     #include "blk-rq-qos.h"
     #include "blk-ioprio.h"
     
    +static bool respect_cpu_isolation;
    +module_param(respect_cpu_isolation, bool, 0444);
    +MODULE_PARM_DESC(respect_cpu_isolation,
    +		"Don't schedule blk-mq worker on isolated CPUs passed in "
    +		"isolcpus= or nohz_full=. User need to guarantee to not run "
    +		"block IO on isolated CPUs (default: false)");
    +
     static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
     static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd);
     
    @@ -3926,6 +3934,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
     		 */
     		sbitmap_resize(&hctx->ctx_map, hctx->nr_ctx);
     
    +		if (respect_cpu_isolation) {
    +			cpumask_and(hctx->cpumask, hctx->cpumask,
    +				housekeeping_cpumask(HK_TYPE_DOMAIN));
    +			cpumask_and(hctx->cpumask, hctx->cpumask,
    +				housekeeping_cpumask(HK_TYPE_WQ));
    +		}
    +
     		/*
     		 * Initialize batch roundrobin counts
     		 */