Message ID: 20230729135334.566138-1-atomlin@atomlin.com
Headers:
  From: Aaron Tomlin <atomlin@atomlin.com>
  To: linux-kernel@vger.kernel.org
  Cc: tj@kernel.org, jiangshanlai@gmail.com, peterz@infradead.org
  Subject: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
  Date: Sat, 29 Jul 2023 14:53:32 +0100
  Message-Id: <20230729135334.566138-1-atomlin@atomlin.com>
Series: workqueue: Introduce PF_WQ_RESCUE_WORKER
Message
Aaron Tomlin
July 29, 2023, 1:53 p.m. UTC
The Linux kernel does not provide a way to differentiate between a kworker
and a rescue kworker from user-mode. From user-mode, one can establish
whether a task is a kworker by testing for PF_WQ_WORKER in the specified
task's flags bit mask via /proc/[PID]/stat. Admittedly, one can examine
/proc/[PID]/stack and search for the function "rescuer_thread", but this is
only available to the root user.

It can be useful to identify a rescue kworker since their CPU affinity
cannot be modified and their initial CPU assignment can be safely ignored.
Furthermore, for a workqueue created with WQ_MEM_RECLAIM and WQ_SYSFS, the
cpumask file is not applicable to the rescue kworker. By design, a rescue
kworker should run anywhere.

This patch series introduces PF_WQ_RESCUE_WORKER, ensures it is set and
cleared appropriately, and simplifies current_is_workqueue_rescuer().

Aaron Tomlin (2):
  workqueue: Introduce PF_WQ_RESCUE_WORKER
  workqueue: Simplify current_is_workqueue_rescuer()

 include/linux/sched.h |  2 +-
 kernel/workqueue.c    | 25 +++++++++++++++----------
 2 files changed, 16 insertions(+), 11 deletions(-)
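For readers who want to reproduce the user-mode check described above, here
is a minimal, hypothetical sketch (not part of the series) that reads
/proc/[PID]/stat and tests the flags field (field 9 per proc(5)) for
PF_WQ_WORKER (0x00000020). The same approach would extend to the proposed
PF_WQ_RESCUE_WORKER bit once the series defines its value.

/* check_kworker.c -- hypothetical user-mode sketch, not part of the series:
 * test a task's flags word in /proc/[PID]/stat for PF_WQ_WORKER. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PF_WQ_WORKER	0x00000020UL	/* task is a workqueue worker */

int main(int argc, char **argv)
{
	char path[64], buf[4096];
	unsigned long flags;
	FILE *fp;
	char *p;
	int i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/%s/stat", argv[1]);
	fp = fopen(path, "r");
	if (!fp || !fgets(buf, sizeof(buf), fp)) {
		perror(path);
		return 1;
	}
	fclose(fp);

	/* comm (field 2) may contain spaces, so skip to the last ')'. */
	p = strrchr(buf, ')');
	if (!p)
		return 1;

	/* Fields after comm: state ppid pgrp session tty_nr tpgid flags ... */
	for (i = 0; i < 7 && p; i++)
		p = strchr(p + 1, ' ');
	if (!p)
		return 1;
	flags = strtoul(p + 1, NULL, 10);

	printf("PF_WQ_WORKER is %s\n", (flags & PF_WQ_WORKER) ? "set" : "clear");
	return 0;
}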
Comments
Hello,

On Sat, Jul 29, 2023 at 02:53:32PM +0100, Aaron Tomlin wrote:
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.

You really shouldn't be setting affinities on kworkers manually. There's
no way of knowing which kworker is going to execute which workqueue.
Please use the attributes API and sysfs interface to modify per-workqueue
worker attributes. If that's not sufficient and you need finer grained
control, the right thing to do is using kthread_worker which gives you a
dedicated kthread that you can manipulate as appropriate.

Thanks.
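For reference, a minimal sketch of the kthread_worker route Tejun mentions;
the module, worker name, work function and the CPU pinned below are
illustrative assumptions, not anything from this series.

#include <linux/kthread.h>
#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/err.h>
#include <linux/module.h>

static struct kthread_worker *my_worker;	/* hypothetical example worker */
static struct kthread_work my_work;

static void my_work_fn(struct kthread_work *work)
{
	/* deferred work runs here, always on my_worker's dedicated kthread */
}

static int __init my_init(void)
{
	/* A dedicated kthread: unlike pooled kworkers, which carry
	 * PF_NO_SETAFFINITY, its affinity is ours to manage. */
	my_worker = kthread_create_worker(0, "my_kworker");
	if (IS_ERR(my_worker))
		return PTR_ERR(my_worker);

	/* e.g. restrict the backing task to CPU 0 (illustrative choice) */
	set_cpus_allowed_ptr(my_worker->task, cpumask_of(0));

	kthread_init_work(&my_work, my_work_fn);
	kthread_queue_work(my_worker, &my_work);
	return 0;
}

static void __exit my_exit(void)
{
	kthread_flush_worker(my_worker);
	kthread_destroy_worker(my_worker);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");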
> You really shouldn't be setting affinities on kworkers manually. There's
> no way of knowing which kworker is going to execute which workqueue.
> Please use the attributes API and sysfs interface to modify per-workqueue
> worker attributes. If that's not sufficient and you need finer grained
> control, the right thing to do is using kthread_worker which gives you a
> dedicated kthread that you can manipulate as appropriate.

Hi Tejun,

I completely agree. Each kworker has PF_NO_SETAFFINITY applied anyway.
If I understand correctly, only an unbound kworker can have their CPU
affinity modified via sysfs. The objective of this series was to easily
identify a rescuer kworker from user-mode.

Kind regards,
Hello,

On Tue, Aug 01, 2023 at 11:53:01AM +0100, Aaron Tomlin wrote:
> > You really shouldn't be setting affinities on kworkers manually. There's
> > no way of knowing which kworker is going to execute which workqueue.
> > Please use the attributes API and sysfs interface to modify per-workqueue
> > worker attributes. If that's not sufficient and you need finer grained
> > control, the right thing to do is using kthread_worker which gives you a
> > dedicated kthread that you can manipulate as appropriate.
>
> I completely agree. Each kworker has PF_NO_SETAFFINITY applied anyway.
> If I understand correctly, only an unbound kworker can have their CPU
> affinity modified via sysfs. The objective of this series was to easily
> identify a rescuer kworker from user-mode.

But why do you need to identify rescue workers? What are you trying to
achieve?

Thanks.
> But why do you need to identify rescue workers? What are you trying to
> achieve?

Hi Tejun,

I had a conversation with a colleague of mine. It can be useful to identify
and account for all kernel threads. From the perspective of user-mode, the
name given currently to the rescuer kworker is ambiguous. For instance,
"kworker/u16:9-kcryptd/253:0" is clearly identifiable as an unbound kworker
for the specified workqueue which can have their CPU affinity adjusted as
you mentioned before. I think if we followed the same naming convention
for a rescuer kworker then it would be more consistent. I'll send a patch
so it can be discussed further.

Kind regards,
On Thu, Aug 03, 2023 at 09:19:14PM +0100, Aaron Tomlin wrote:
> > But why do you need to identify rescue workers? What are you trying to
> > achieve?
>
> Hi Tejun,
>
> I had a conversation with a colleague of mine. It can be useful to identify
> and account for all kernel threads. From the perspective of user-mode, the
> name given currently to the rescuer kworker is ambiguous. For instance,
> "kworker/u16:9-kcryptd/253:0" is clearly identifiable as an unbound kworker
> for the specified workqueue which can have their CPU affinity adjusted as

Note that the name changes to the work item the worker is currently
executing. It won't stay that way. Workers are shared across the
workqueues, so I'm not sure "identify and account all kernel threads" is
working as well as you think it is.

> you mentioned before. I think if we followed the same naming convention
> for a rescuer kworker then it would be more consistent. I'll send a patch
> so it can be discussed further.

We can certainly rename them to indicate that they are rescuers - e.g.
maybe krescuer? But, at the moment, the proposed reason seems rather
dubious.

Thanks.
> Note that the name changes to the work item the worker is currently
> executing. It won't stay that way. Workers are shared across the
> workqueues, so I'm not sure "identify and account all kernel threads" is
> working as well as you think it is.

Hi Tejun,

Indeed. The point is that these kworker kthreads are easily identifiable.

> We can certainly rename them to indicate that they are rescuers - e.g.
> maybe krescuer? But, at the moment, the proposed reason seems rather
> dubious.

Personally, I would prefer "kworker/r-%s" and then include the specified
workqueue's name e.g. "kworker/r-ext4-rsv-conver". So the rescuer task's
name is more consistent with the current naming scheme. I will send a
follow up patch.

Kind regards,
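As an illustration only (this is not Aaron's actual follow-up patch, and the
surrounding init_rescuer() lines are paraphrased from memory), the rename
being proposed would amount to something like:

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ static int init_rescuer(struct workqueue_struct *wq)
 	rescuer->rescue_wq = wq;
-	rescuer->task = kthread_create(rescuer_thread, rescuer, "%s", wq->name);
+	rescuer->task = kthread_create(rescuer_thread, rescuer,
+				       "kworker/r-%s", wq->name);
 	if (IS_ERR(rescuer->task)) {

which would yield task names such as "kworker/r-ext4-rsv-conver".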
Hi,

Just stumbled upon this series while looking into rescuers myself. :)

On 29/07/23 14:53, Aaron Tomlin wrote:
> The Linux kernel does not provide a way to differentiate between a
> kworker and a rescue kworker from user-mode. From user-mode, one can
> establish whether a task is a kworker by testing for PF_WQ_WORKER in the
> specified task's flags bit mask via /proc/[PID]/stat. Admittedly, one can
> examine /proc/[PID]/stack and search for the function "rescuer_thread",
> but this is only available to the root user.
>
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.
> Furthermore, for a workqueue created with WQ_MEM_RECLAIM and WQ_SYSFS,
> the cpumask file is not applicable to the rescue kworker. By design, a
> rescue kworker should run anywhere.

Guess this is a requirement because, if workqueue processing is stuck
for some reason, getting rescuers to run on the same set of cpus
workqueues have been restricted to already doesn't really have good
chances of making any progress?

Wonder if we still might need some sort of fail hard/warn mode in case
strict isolation is in place? Or maybe we have that already?

Thanks!
Juri
Hello,

On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> Guess this is a requirement because, if workqueue processing is stuck
> for some reason, getting rescuers to run on the same set of cpus
> workqueues have been restricted to already doesn't really have good
> chances of making any progress?

The only problem rescuers try to solve is deadlocks caused by lack of
memory, so on the cpu side, it just follows whatever worker pool it's
trying to help.

> Wonder if we still might need some sort of fail hard/warn mode in case
> strict isolation is in place? Or maybe we have that already?

For both percpu and unbound workqueues, the rescuers just follow whatever
pool it's trying to help at the moment, so it shouldn't cause any surprises
in terms of isolation. It just temporarily joins the already active but
stuck pool.

Thanks.
Hello,

Thanks for the quick reply!

On 11/12/23 08:39, Tejun Heo wrote:
> Hello,
>
> On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> > Guess this is a requirement because, if workqueue processing is stuck
> > for some reason, getting rescuers to run on the same set of cpus
> > workqueues have been restricted to already doesn't really have good
> > chances of making any progress?
>
> The only problem rescuers try to solve is deadlocks caused by lack of
> memory, so on the cpu side, it just follows whatever worker pool it's
> trying to help.
>
> > Wonder if we still might need some sort of fail hard/warn mode in case
> > strict isolation is in place? Or maybe we have that already?
>
> For both percpu and unbound workqueues, the rescuers just follow whatever
> pool it's trying to help at the moment, so it shouldn't cause any surprises
> in terms of isolation. It just temporarily joins the already active but
> stuck pool.

Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
how are we making sure that the wake up is always happening on
housekeeping CPUs (assuming unbound workqueues have been restricted to
those)?

AFAICS, we have

  send_mayday ->
    wake_up_process(wq->rescuer->task)

which is not affined to the workqueue cpumask it's called to rescue, so
in theory can be woken up anywhere?

Thanks,
Juri
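For readers not following along in kernel/workqueue.c, here is a
paraphrased, simplified sketch of the mayday path Juri refers to (not
verbatim kernel source; locking and refcounting are elided). The point is
that the rescuer is simply woken, with no re-affining at that moment:

/* paraphrased sketch of send_mayday(); simplified, not verbatim */
static void send_mayday(struct work_struct *work)
{
	struct pool_workqueue *pwq = get_work_pwq(work);
	struct workqueue_struct *wq = pwq->wq;

	if (!wq->rescuer)
		return;

	/* mayday mayday mayday */
	if (list_empty(&pwq->mayday_node)) {
		list_add_tail(&pwq->mayday_node, &wq->maydays);
		/* no cpumask adjustment here -- the gap being discussed */
		wake_up_process(wq->rescuer->task);
	}
}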
Hello, Juri.

On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> how are we making sure that the wake up is always happening on
> housekeeping CPUs (assuming unbound workqueues have been restricted to
> those)?
>
> AFAICS, we have
>
>   send_mayday ->
>     wake_up_process(wq->rescuer->task)
>
> which is not affined to the workqueue cpumask it's called to rescue, so
> in theory can be woken up anywhere?

Ah, was only thinking about work item execution. Yeah, it's not following
the isolation rule there and we probably should affine it as we're waking
it up.

Thanks.
On Tue, Dec 12, 2023 at 07:14:48AM -1000, Tejun Heo wrote:
> Hello, Juri.
>
> On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> > Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> > how are we making sure that the wake up is always happening on
> > housekeeping CPUs (assuming unbound workqueues have been restricted to
> > those)?
> >
> > AFAICS, we have
> >
> >   send_mayday ->
> >     wake_up_process(wq->rescuer->task)
> >
> > which is not affined to the workqueue cpumask it's called to rescue, so
> > in theory can be woken up anywhere?
>
> Ah, was only thinking about work item execution. Yeah, it's not following
> the isolation rule there and we probably should affine it as we're waking
> it up.

Hi Tejun,

I am confused. I thought by design we want a rescuer kthread to execute on
any CPU, no?

Kind regards,
On Tue, Dec 12, 2023 at 07:06:48PM +0000, Aaron Tomlin wrote:
> I thought by design we want a rescuer kthread to execute on any CPU, no?
Well, it needs to be able to move around because it dynamically attaches to
the worker pool it's rescuing and needs to take on its cpumask, but it
doesn't have to be able to run on all cpus all the time.
Thanks.
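For context, a paraphrased, simplified sketch of how a rescuer takes on the
cpumask of the pool it attaches to (not verbatim kernel/workqueue.c; the
POOL_DISASSOCIATED and per-cpu handling are elided):

/* paraphrased sketch of worker_attach_to_pool(); simplified */
static void worker_attach_to_pool(struct worker *worker,
				  struct worker_pool *pool)
{
	mutex_lock(&wq_pool_attach_mutex);

	/* adopt the pool's cpumask for the duration of the attachment */
	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);

	list_add_tail(&worker->node, &pool->workers);
	worker->pool = pool;

	mutex_unlock(&wq_pool_attach_mutex);
}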
On 12/12/23 07:14, Tejun Heo wrote:
> Hello, Juri.
>
> On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> > Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> > how are we making sure that the wake up is always happening on
> > housekeeping CPUs (assuming unbound workqueues have been restricted to
> > those)?
> >
> > AFAICS, we have
> >
> >   send_mayday ->
> >     wake_up_process(wq->rescuer->task)
> >
> > which is not affined to the workqueue cpumask it's called to rescue, so
> > in theory can be woken up anywhere?
>
> Ah, was only thinking about work item execution. Yeah, it's not following
> the isolation rule there and we probably should affine it as we're waking
> it up.

Something like the following then maybe?

---
 kernel/workqueue.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2989b57e154a7..ed73f7f80d57d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
 	link_pwq(ctx->dfl_pwq);
 	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
 
+	/* rescuer needs to respect wq cpumask changes */
+	if (ctx->wq->rescuer) {
+		kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
+		wake_up_process(ctx->wq->rescuer->task);
+	}
+
 	mutex_unlock(&ctx->wq->mutex);
 }
Hello,

On Wed, Dec 13, 2023 at 09:59:42AM +0100, Juri Lelli wrote:
> Something like the following then maybe?
>
> ---
>  kernel/workqueue.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2989b57e154a7..ed73f7f80d57d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
>  	link_pwq(ctx->dfl_pwq);
>  	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
>
> +	/* rescuer needs to respect wq cpumask changes */
> +	if (ctx->wq->rescuer) {
> +		kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
> +		wake_up_process(ctx->wq->rescuer->task);
> +	}
> +
>  	mutex_unlock(&ctx->wq->mutex);
>  }

I'm not sure kthread_bind_mask() would be safe here. The rescuer might be
running a work item. wait_task_inactive() might fail and we don't want to
change cpumask while the rescuer is active anyway.

Maybe the easiest way to do this is making rescuer_thread() restore the wq's
cpumask right before going to sleep, and making apply_wqattrs_commit() just
wake up the rescuer.

Thanks.
On 13/12/23 05:35, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 13, 2023 at 09:59:42AM +0100, Juri Lelli wrote:
> > Something like the following then maybe?
> >
> > ---
> >  kernel/workqueue.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 2989b57e154a7..ed73f7f80d57d 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
> >  	link_pwq(ctx->dfl_pwq);
> >  	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
> >
> > +	/* rescuer needs to respect wq cpumask changes */
> > +	if (ctx->wq->rescuer) {
> > +		kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
> > +		wake_up_process(ctx->wq->rescuer->task);
> > +	}
> > +
> >  	mutex_unlock(&ctx->wq->mutex);
> >  }
>
> I'm not sure kthread_bind_mask() would be safe here. The rescuer might be
> running a work item. wait_task_inactive() might fail and we don't want to
> change cpumask while the rescuer is active anyway.
>
> Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> cpumask right before going to sleep, and making apply_wqattrs_commit() just
> wake up the rescuer.

Hummm, don't think we can call that either while the rescuer is actually
running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
in the above?

Thanks,
Juri
On Wed, Dec 13, 2023 at 07:32:10PM +0100, Juri Lelli wrote:
> > Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> > cpumask right before going to sleep, and making apply_wqattrs_commit() just
> > wake up the rescuer.
>
> Hummm, don't think we can call that either while the rescuer is actually
> running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
> in the above?

So, we have to use set_cpus_allowed_ptr() but we still don't want to change
the affinity of a rescuer which is already running a task for a pool.

Thanks.
On 13/12/23 08:38, Tejun Heo wrote:
> On Wed, Dec 13, 2023 at 07:32:10PM +0100, Juri Lelli wrote:
> > > Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> > > cpumask right before going to sleep, and making apply_wqattrs_commit() just
> > > wake up the rescuer.
> >
> > Hummm, don't think we can call that either while the rescuer is actually
> > running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
> > in the above?
>
> So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> the affinity of a rescuer which is already running a task for a pool.

But then, even today, a rescuer might keep handling work on a cpu outside
its wq cpumask if the associated wq cpumask change can proceed w/o waiting
for it to finish the iteration?

BTW, apologies for all the questions, but I'd like to make sure I can get
the implications hopefully right. :)

Thanks,
Juri
Hello,

On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > the affinity of a rescuer which is already running a task for a pool.
>
> But then, even today, a rescuer might keep handling work on a cpu
> outside its wq cpumask if the associated wq cpumask change can proceed
> w/o waiting for it to finish the iteration?

Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
cpumask that they're serving, your original approach likely isn't broken
either.

> BTW, apologies for all the questions, but I'd like to make sure I can
> get the implications hopefully right. :)

I obviously haven't thought through it very well, so thanks for the
questions. So, yeah, I think we actually need to set the rescuer's cpumask
when wq's cpumask changes and doing it where you were suggesting should
probably work.

Thanks.
On 14/12/23 09:47, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > > the affinity of a rescuer which is already running a task for a pool.
> >
> > But then, even today, a rescuer might keep handling work on a cpu
> > outside its wq cpumask if the associated wq cpumask change can proceed
> > w/o waiting for it to finish the iteration?
>
> Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
> cpumask that they're serving, your original approach likely isn't broken
> either.
>
> > BTW, apologies for all the questions, but I'd like to make sure I can
> > get the implications hopefully right. :)
>
> I obviously haven't thought through it very well, so thanks for the
> questions. So, yeah, I think we actually need to set the rescuer's cpumask
> when wq's cpumask changes and doing it where you were suggesting should
> probably work.

OK. Going to send a proper patch asap.

Thanks!
Juri
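For completeness, the variant the discussion converges on would look roughly
like the hunk below. This is only a sketch of Juri's earlier proposal with
kthread_bind_mask() swapped for set_cpus_allowed_ptr(), not the patch he
later posted:

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
 	link_pwq(ctx->dfl_pwq);
 	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
 
+	/* rescuer needs to respect wq cpumask changes */
+	if (ctx->wq->rescuer)
+		set_cpus_allowed_ptr(ctx->wq->rescuer->task,
+				     ctx->attrs->cpumask);
+
 	mutex_unlock(&ctx->wq->mutex);
 }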
Hello again,

On 15/12/23 07:50, Juri Lelli wrote:
> On 14/12/23 09:47, Tejun Heo wrote:
> > Hello,
> >
> > On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > > > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > > > the affinity of a rescuer which is already running a task for a pool.
> > >
> > > But then, even today, a rescuer might keep handling work on a cpu
> > > outside its wq cpumask if the associated wq cpumask change can proceed
> > > w/o waiting for it to finish the iteration?
> >
> > Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
> > cpumask that they're serving, your original approach likely isn't broken
> > either.
> >
> > > BTW, apologies for all the questions, but I'd like to make sure I can
> > > get the implications hopefully right. :)
> >
> > I obviously haven't thought through it very well, so thanks for the
> > questions. So, yeah, I think we actually need to set the rescuer's cpumask
> > when wq's cpumask changes and doing it where you were suggesting should
> > probably work.
>
> OK. Going to send a proper patch asap.

I actually didn't do that yet as it turns out the proposed approach doesn't
cover !WQ_SYSFS unbounded wqs. Well, I thought those should be covered as
well, since we have (initiated by echo <mask> into
/sys/devices/virtual/workqueue/cpumask)

  workqueue_apply_unbound_cpumask ->
    apply_wqattrs_commit

but for some reason the mask change is not reflected into rescuers affinity.

Trying to dig deeper I went ahead and extended the recent wq_dump.py
addition with the following

---
 tools/workqueue/wq_dump.py | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py
index d0df5833f2c18..6da621989e210 100644
--- a/tools/workqueue/wq_dump.py
+++ b/tools/workqueue/wq_dump.py
@@ -175,3 +175,32 @@ for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(
     if wq.flags & WQ_UNBOUND:
         print(f'  {wq.dfl_pwq.pool.id.value_():{max_pool_id_len}}', end='')
     print('')
+
+print('')
+print('Workqueue -> rescuer')
+print('=====================')
+print(f'wq_unbound_cpumask={cpumask_str(wq_unbound_cpumask)}')
+print('')
+print('[    workqueue    \     type   unbound_cpumask       rescuer   pid   cpumask]')
+
+for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(), 'list'):
+    print(f'{wq.name.string_().decode()[-24:]:24}', end='')
+    if wq.flags & WQ_UNBOUND:
+        if wq.flags & WQ_ORDERED:
+            print(' ordered   ', end='')
+        else:
+            print(' unbound', end='')
+            if wq.unbound_attrs.affn_strict:
+                print(',S ', end='')
+            else:
+                print('   ', end='')
+        print(f' {cpumask_str(wq.unbound_attrs.cpumask):24}', end='')
+    else:
+        print(' percpu    ', end='')
+        print('                         ', end='')
+
+    if wq.flags & WQ_MEM_RECLAIM:
+        print(f' {wq.rescuer.task.comm.string_().decode()[-24:]:24}', end='')
+        print(f' {wq.rescuer.task.pid.value_():5}', end='')
+        print(f' {cpumask_str(wq.rescuer.task.cpus_ptr)}', end='')
+    print('')
---

which shows the following situation after an

  # echo 00,00000003 > /sys/devices/virtual/workqueue/cpumask

on the system I'm testing with:

...
Workqueue -> rescuer
=====================
wq_unbound_cpumask=00000003

[    workqueue    \     type   unbound_cpumask       rescuer           pid   cpumask]
events                   percpu
events_highpri           percpu
events_long              percpu
events_unbound           unbound  0xffffffff 000000ff
events_freezable         percpu
events_power_efficient   percpu
events_freezable_power_  percpu
rcu_gp                   percpu                        kworker/R-rcu_g     4  0xffffffff 000000ff
rcu_par_gp               percpu                        kworker/R-rcu_p     5  0xffffffff 000000ff
slub_flushwq             percpu                        kworker/R-slub_     6  0xffffffff 000000ff
netns                    ordered  0xffffffff 000000ff  kworker/R-netns     7  0xffffffff 000000ff
mm_percpu_wq             percpu                        kworker/R-mm_pe    13  0xffffffff 000000ff
cpuset_migrate_mm        ordered  0xffffffff 000000ff
inet_frag_wq             percpu                        kworker/R-inet_   300  0xffffffff 000000ff
pm                       percpu
cgroup_destroy           percpu
cgroup_pidlist_destroy   percpu
writeback                unbound  0xffffffff 000000ff  kworker/R-write   308  0xffffffff 000000ff
cgwb_release             percpu
cryptd                   percpu                        kworker/R-crypt   314  0xffffffff 000000ff
kintegrityd              percpu                        kworker/R-kinte   315  0xffffffff 000000ff
kblockd                  percpu                        kworker/R-kbloc   316  0xffffffff 000000ff
kacpid                   percpu
kacpi_notify             percpu
kacpi_hotplug            ordered  0xffffffff 000000ff
kec                      ordered  0xffffffff 000000ff
kec_query                percpu
tpm_dev_wq               percpu                        kworker/R-tpm_d   352  0xffffffff 000000ff
usb_hub_wq               percpu
md                       percpu                        kworker/R-md      353  0xffffffff 000000ff
md_misc                  percpu
md_bitmap                unbound  0xffffffff 000000ff  kworker/R-md_bi   354  0xffffffff 000000ff
edac-poller              ordered  0xffffffff 000000ff  kworker/R-edac-   355  0xffffffff 000000ff
...

I guess I expected wq_unbound_cpumask and unbound_cpumask for each unbound
wq to be kept in sync, so I'm evidently missing details. :)

Can you please help me understand what I am missing?

Thanks!
Juri