From patchwork Fri May 19 00:16:46 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96135
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com
Subject: [PATCH 01/24] workqueue: Drop the special locking rule for worker->flags and worker_pool->flags
Date: Thu, 18 May 2023 14:16:46 -1000
Message-Id: <20230519001709.2563-2-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

worker->flags used to be accessed from scheduler hooks without grabbing
pool->lock for concurrency management. This is no longer true since
6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq
lock"). Also, it's unclear why worker_pool->flags was using the "X" rule.
All relevant users are accessing it under the pool lock.

Let's drop the special "X" rule and use the "L" rule for these flag fields
instead. While at it, replace the CONTEXT comment with
lockdep_assert_held().

This allows worker_set/clr_flags() to be used from a context which isn't
the worker itself. This will be used later to implement assigning work
items to workers before waking them up so that workqueue can have better
control over which worker executes which work item on which CPU.

The only actual changes are sanity checks. There shouldn't be any visible
behavior changes.
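To illustrate the relaxed rule, a minimal sketch (the caller below is
hypothetical and not part of this patch): under the "L" rule, any context
holding pool->lock may manipulate worker flags, and lockdep_assert_held()
turns the old comment-only requirement into a runtime-checked one on
lockdep-enabled kernels.

/*
 * Hypothetical caller, for illustration only. Under the old "X" rule this
 * would have tripped WARN_ON_ONCE(worker->task != current); under the "L"
 * rule it is fine because pool->lock is held.
 */
static void example_set_cpu_intensive(struct worker *worker)
{
	struct worker_pool *pool = worker->pool;

	raw_spin_lock_irq(&pool->lock);
	/* legal now: caller holds pool->lock but isn't @worker itself */
	worker_set_flags(worker, WORKER_CPU_INTENSIVE);
	raw_spin_unlock_irq(&pool->lock);
}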
Signed-off-by: Tejun Heo
---
 kernel/workqueue.c          | 17 +++--------------
 kernel/workqueue_internal.h |  2 +-
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ee16ddb0647c..9a97db94e1dc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -121,11 +121,6 @@ enum {
  *
  * L: pool->lock protected.  Access with pool->lock held.
  *
- * X: During normal operation, modification requires pool->lock and should
- *    be done only from local cpu.  Either disabling preemption on local
- *    cpu or grabbing pool->lock is enough for read access.  If
- *    POOL_DISASSOCIATED is set, it's identical to L.
- *
  * K: Only modified by worker while holding pool->lock. Can be safely read by
  *    self, while holding pool->lock or from IRQ context if %current is the
  *    kworker.
@@ -159,7 +154,7 @@ struct worker_pool {
 	int			cpu;		/* I: the associated cpu */
 	int			node;		/* I: the associated node ID */
 	int			id;		/* I: pool ID */
-	unsigned int		flags;		/* X: flags */
+	unsigned int		flags;		/* L: flags */
 
 	unsigned long		watchdog_ts;	/* L: watchdog timestamp */
 	bool			cpu_stall;	/* WD: stalled cpu bound pool */
@@ -901,15 +896,12 @@ static void wake_up_worker(struct worker_pool *pool)
  * @flags: flags to set
  *
  * Set @flags in @worker->flags and adjust nr_running accordingly.
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock)
  */
 static inline void worker_set_flags(struct worker *worker, unsigned int flags)
 {
 	struct worker_pool *pool = worker->pool;
 
-	WARN_ON_ONCE(worker->task != current);
+	lockdep_assert_held(&pool->lock);
 
 	/* If transitioning into NOT_RUNNING, adjust nr_running. */
 	if ((flags & WORKER_NOT_RUNNING) &&
@@ -926,16 +918,13 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags)
  * @flags: flags to clear
  *
  * Clear @flags in @worker->flags and adjust nr_running accordingly.
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock)
  */
 static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 {
 	struct worker_pool *pool = worker->pool;
 	unsigned int oflags = worker->flags;
 
-	WARN_ON_ONCE(worker->task != current);
+	lockdep_assert_held(&pool->lock);
 
 	worker->flags &= ~flags;
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 6b1d66e28269..f6275944ada7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -48,7 +48,7 @@ struct worker {
 						/* A: runs through worker->node */
 
 	unsigned long		last_active;	/* K: last active timestamp */
-	unsigned int		flags;		/* X: flags */
+	unsigned int		flags;		/* L: flags */
 	int			id;		/* I: worker id */
 
 	/*

From patchwork Fri May 19 00:16:47 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96123
From: Tejun Heo
Subject: [PATCH 02/24] workqueue: Cleanups around process_scheduled_works()
Date: Thu, 18 May 2023 14:16:47 -1000
Message-Id: <20230519001709.2563-3-tj@kernel.org>

* Drop the trivial optimization in worker_thread() where it bypasses
  calling process_scheduled_works() if the first work item isn't linked.
  This is a mostly pointless micro-optimization and gets in the way of
  improving the work processing path.

* Consolidate pool->watchdog_ts updates in the two callers into
  process_scheduled_works().
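A condensed sketch of the main worker loop after this change (the wrapper
function is hypothetical; the real hunks follow below): every work item now
flows through worker->scheduled and process_scheduled_works(), which stamps
the watchdog timestamp once per batch.

/* Condensed sketch of the post-patch flow; locking and details elided. */
static void worker_process_loop(struct worker *worker)
{
	struct worker_pool *pool = worker->pool;

	do {
		struct work_struct *work =
			list_first_entry(&pool->worklist,
					 struct work_struct, entry);

		/*
		 * Unconditionally splice @work and any WORK_STRUCT_LINKED
		 * followers onto worker->scheduled; the old fast path that
		 * called process_one_work() directly is gone.
		 */
		move_linked_works(work, &worker->scheduled, NULL);

		/* updates pool->watchdog_ts on its first iteration */
		process_scheduled_works(worker);
	} while (keep_working(pool));
}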
Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9a97db94e1dc..c1e56ba4a038 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2634,9 +2634,15 @@ __acquires(&pool->lock)
  */
 static void process_scheduled_works(struct worker *worker)
 {
-	while (!list_empty(&worker->scheduled)) {
-		struct work_struct *work = list_first_entry(&worker->scheduled,
-						struct work_struct, entry);
+	struct work_struct *work;
+	bool first = true;
+
+	while ((work = list_first_entry_or_null(&worker->scheduled,
+						struct work_struct, entry))) {
+		if (first) {
+			worker->pool->watchdog_ts = jiffies;
+			first = false;
+		}
 		process_one_work(worker, work);
 	}
 }
@@ -2717,17 +2723,8 @@ static int worker_thread(void *__worker)
 			list_first_entry(&pool->worklist,
 					 struct work_struct, entry);
 
-		pool->watchdog_ts = jiffies;
-
-		if (likely(!(*work_data_bits(work) & WORK_STRUCT_LINKED))) {
-			/* optimization path, not strictly necessary */
-			process_one_work(worker, work);
-			if (unlikely(!list_empty(&worker->scheduled)))
-				process_scheduled_works(worker);
-		} else {
-			move_linked_works(work, &worker->scheduled, NULL);
-			process_scheduled_works(worker);
-		}
+		move_linked_works(work, &worker->scheduled, NULL);
+		process_scheduled_works(worker);
 	} while (keep_working(pool));
 
 	worker_set_flags(worker, WORKER_PREP);
@@ -2802,7 +2799,6 @@ static int rescuer_thread(void *__rescuer)
 					struct pool_workqueue, mayday_node);
 		struct worker_pool *pool = pwq->pool;
 		struct work_struct *work, *n;
-		bool first = true;
 
 		__set_current_state(TASK_RUNNING);
 		list_del_init(&pwq->mayday_node);
@@ -2820,12 +2816,9 @@ static int rescuer_thread(void *__rescuer)
 		WARN_ON_ONCE(!list_empty(scheduled));
 		list_for_each_entry_safe(work, n, &pool->worklist, entry) {
 			if (get_work_pwq(work) == pwq) {
-				if (first)
-					pool->watchdog_ts = jiffies;
 				move_linked_works(work, scheduled, &n);
 				pwq->stats[PWQ_STAT_RESCUED]++;
 			}
-			first = false;
 		}
 
 		if (!list_empty(scheduled)) {

From patchwork Fri May 19 00:16:48 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96117
From: Tejun Heo
Subject: [PATCH 03/24] workqueue: Not all work insertion needs to wake up a worker
Date: Thu, 18 May 2023 14:16:48 -1000
Message-Id: <20230519001709.2563-4-tj@kernel.org>

insert_work() always tried to wake up a worker; however, a wakeup is only
needed when a new active work item is queued. When a work item goes on the
inactive list, or when a flush work item is being queued, there's no reason
to try to wake up a worker.

This patch moves the worker wakeup logic out of insert_work() and places it
in the active new work item queueing path in __queue_work().

While at it:

* __queue_work() is dereferencing pwq->pool repeatedly. Add a local
  variable, pool.

* Every caller of insert_work() calls debug_work_activate(). Consolidate
  the invocations into insert_work().

* In __queue_work(), the pool->watchdog_ts update is relocated slightly.
  This is to better accommodate future changes.

This makes wakeups more precise and will help the planned change to assign
work items to workers before waking them up. No behavior changes intended.
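A condensed sketch of the resulting queueing tail in __queue_work()
(abridged from the hunks below, not verbatim): only the active branch
considers a wakeup, while inactive insertions just place the item.

/* Condensed sketch; see the diff below for the full context. */
if (likely(pwq->nr_active < pwq->max_active)) {
	/* new active work item: queue on the pool, then wake if needed */
	pwq->nr_active++;
	insert_work(pwq, work, &pool->worklist, work_flags);
	if (__need_more_worker(pool))
		wake_up_worker(pool);
} else {
	/* over max_active: park on the pwq; nothing runnable was added */
	work_flags |= WORK_STRUCT_INACTIVE;
	insert_work(pwq, work, &pwq->inactive_works, work_flags);
}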
Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c1e56ba4a038..0d5eb436d31a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1523,7 +1523,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
 			struct list_head *head, unsigned int extra_flags)
 {
-	struct worker_pool *pool = pwq->pool;
+	debug_work_activate(work);
 
 	/* record the work call stack in order to print it in KASAN reports */
 	kasan_record_aux_stack_noalloc(work);
@@ -1532,9 +1532,6 @@ static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
 	set_work_pwq(work, pwq, extra_flags);
 	list_add_tail(&work->entry, head);
 	get_pwq(pwq);
-
-	if (__need_more_worker(pool))
-		wake_up_worker(pool);
 }
@@ -1588,8 +1585,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 			 struct work_struct *work)
 {
 	struct pool_workqueue *pwq;
-	struct worker_pool *last_pool;
-	struct list_head *worklist;
+	struct worker_pool *last_pool, *pool;
 	unsigned int work_flags;
 	unsigned int req_cpu = cpu;
@@ -1623,13 +1619,15 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 		pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
 	}
 
+	pool = pwq->pool;
+
 	/*
 	 * If @work was previously on a different pool, it might still be
 	 * running there, in which case the work needs to be queued on that
 	 * pool to guarantee non-reentrancy.
 	 */
 	last_pool = get_work_pool(work);
-	if (last_pool && last_pool != pwq->pool) {
+	if (last_pool && last_pool != pool) {
 		struct worker *worker;
 
 		raw_spin_lock(&last_pool->lock);
@@ -1638,13 +1636,14 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 		if (worker && worker->current_pwq->wq == wq) {
 			pwq = worker->current_pwq;
+			pool = pwq->pool;
 		} else {
 			/* meh... not running there, queue here */
 			raw_spin_unlock(&last_pool->lock);
-			raw_spin_lock(&pwq->pool->lock);
+			raw_spin_lock(&pool->lock);
 		}
 	} else {
-		raw_spin_lock(&pwq->pool->lock);
+		raw_spin_lock(&pool->lock);
 	}
@@ -1657,7 +1656,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	 */
 	if (unlikely(!pwq->refcnt)) {
 		if (wq->flags & WQ_UNBOUND) {
-			raw_spin_unlock(&pwq->pool->lock);
+			raw_spin_unlock(&pool->lock);
 			cpu_relax();
 			goto retry;
 		}
@@ -1676,21 +1675,22 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	work_flags = work_color_to_flags(pwq->work_color);
 
 	if (likely(pwq->nr_active < pwq->max_active)) {
+		if (list_empty(&pool->worklist))
+			pool->watchdog_ts = jiffies;
+
 		trace_workqueue_activate_work(work);
 		pwq->nr_active++;
-		worklist = &pwq->pool->worklist;
-		if (list_empty(worklist))
-			pwq->pool->watchdog_ts = jiffies;
+		insert_work(pwq, work, &pool->worklist, work_flags);
+
+		if (__need_more_worker(pool))
+			wake_up_worker(pool);
 	} else {
 		work_flags |= WORK_STRUCT_INACTIVE;
-		worklist = &pwq->inactive_works;
+		insert_work(pwq, work, &pwq->inactive_works, work_flags);
 	}
 
-	debug_work_activate(work);
-	insert_work(pwq, work, worklist, work_flags);
-
 out:
-	raw_spin_unlock(&pwq->pool->lock);
+	raw_spin_unlock(&pool->lock);
 	rcu_read_unlock();
 }
@@ -2994,7 +2994,6 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 	pwq->nr_in_flight[work_color]++;
 	work_flags |= work_color_to_flags(work_color);
 
-	debug_work_activate(&barr->work);
 	insert_work(pwq, &barr->work, head, work_flags);
 }

From patchwork Fri May 19 00:16:49 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96129
From: Tejun Heo
Subject: [PATCH 04/24] workqueue: Rename wq->cpu_pwqs to wq->cpu_pwq
Date: Thu, 18 May 2023 14:16:49 -1000
Message-Id: <20230519001709.2563-5-tj@kernel.org>

wq->cpu_pwqs is a percpu variable carrying one pointer to a pool_workqueue.
The field name being plural is unusual and confusing. Rename it to
singular.

This patch doesn't cause any functional changes.
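For context, a sketch of the percpu pattern behind the rename (variable
names and the init stand-in are illustrative, not from the patch): each
possible CPU owns exactly one pool_workqueue instance, so the field denotes
one pwq per CPU rather than a per-CPU array of them.

/* Illustrative sketch of the percpu pattern; not the patched code itself. */
struct pool_workqueue __percpu *cpu_pwq;
int cpu;

cpu_pwq = alloc_percpu(struct pool_workqueue);	/* one instance per CPU */

for_each_possible_cpu(cpu) {
	/* per_cpu_ptr() yields this CPU's single pool_workqueue */
	struct pool_workqueue *pwq = per_cpu_ptr(cpu_pwq, cpu);

	memset(pwq, 0, sizeof(*pwq));	/* stand-in for real pwq init */
}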
Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0d5eb436d31a..80b2bd01c718 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -320,7 +320,7 @@ struct workqueue_struct {
 	/* hot fields used during command issue, aligned to cacheline */
 	unsigned int		flags ____cacheline_aligned; /* WQ: WQ_* flags */
-	struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwqs */
+	struct pool_workqueue __percpu *cpu_pwq; /* I: per-cpu pwqs */
 	struct pool_workqueue __rcu *numa_pwq_tbl[]; /* PWR: unbound pwqs indexed by node */
 };
@@ -1616,7 +1616,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	} else {
 		if (req_cpu == WORK_CPU_UNBOUND)
 			cpu = raw_smp_processor_id();
-		pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
+		pwq = per_cpu_ptr(wq->cpu_pwq, cpu);
 	}
 
 	pool = pwq->pool;
@@ -3807,7 +3807,7 @@ static void rcu_free_wq(struct rcu_head *rcu)
 	wq_free_lockdep(wq);
 
 	if (!(wq->flags & WQ_UNBOUND))
-		free_percpu(wq->cpu_pwqs);
+		free_percpu(wq->cpu_pwq);
 	else
 		free_workqueue_attrs(wq->unbound_attrs);
@@ -4501,13 +4501,13 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 	int cpu, ret;
 
 	if (!(wq->flags & WQ_UNBOUND)) {
-		wq->cpu_pwqs = alloc_percpu(struct pool_workqueue);
-		if (!wq->cpu_pwqs)
+		wq->cpu_pwq = alloc_percpu(struct pool_workqueue);
+		if (!wq->cpu_pwq)
 			return -ENOMEM;
 
 		for_each_possible_cpu(cpu) {
 			struct pool_workqueue *pwq =
-				per_cpu_ptr(wq->cpu_pwqs, cpu);
+				per_cpu_ptr(wq->cpu_pwq, cpu);
 			struct worker_pool *cpu_pools =
 				per_cpu(cpu_worker_pools, cpu);
@@ -4888,7 +4888,7 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 		cpu = smp_processor_id();
 
 	if (!(wq->flags & WQ_UNBOUND))
-		pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
+		pwq = per_cpu_ptr(wq->cpu_pwq, cpu);
 	else
 		pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));

From patchwork Fri May 19 00:16:50 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96130
From: Tejun Heo
Subject: [PATCH 05/24] workqueue: Relocate worker and work management functions
Date: Thu, 18 May 2023 14:16:50 -1000
Message-Id: <20230519001709.2563-6-tj@kernel.org>

Collect first_idle_worker(), worker_enter/leave_idle(),
find_worker_executing_work(), move_linked_works() and wake_up_worker() into
one place. These functions will later be used to implement higher-level
worker management logic.

No functional changes.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 340 ++++++++++++++++++++++-----------------------
 1 file changed, 168 insertions(+), 172 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 80b2bd01c718..6ec22eb87283 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -860,36 +860,6 @@ static bool too_many_workers(struct worker_pool *pool)
 	return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
 }
 
-/*
- * Wake up functions.
- */
-
-/* Return the first idle worker.  Called with pool->lock held. */
-static struct worker *first_idle_worker(struct worker_pool *pool)
-{
-	if (unlikely(list_empty(&pool->idle_list)))
-		return NULL;
-
-	return list_first_entry(&pool->idle_list, struct worker, entry);
-}
-
-/**
- * wake_up_worker - wake up an idle worker
- * @pool: worker pool to wake worker from
- *
- * Wake up the first idle worker of @pool.
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock).
- */
-static void wake_up_worker(struct worker_pool *pool)
-{
-	struct worker *worker = first_idle_worker(pool);
-
-	if (likely(worker))
-		wake_up_process(worker->task);
-}
-
 /**
  * worker_set_flags - set worker flags and adjust nr_running accordingly
  * @worker: self
@@ -938,6 +908,174 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 		pool->nr_running++;
 }
 
+/* Return the first idle worker.  Called with pool->lock held. */
+static struct worker *first_idle_worker(struct worker_pool *pool)
+{
+	if (unlikely(list_empty(&pool->idle_list)))
+		return NULL;
+
+	return list_first_entry(&pool->idle_list, struct worker, entry);
+}
+
+/**
+ * worker_enter_idle - enter idle state
+ * @worker: worker which is entering idle state
+ *
+ * @worker is entering idle state.  Update stats and idle timer if
+ * necessary.
+ *
+ * LOCKING:
+ * raw_spin_lock_irq(pool->lock).
+ */
+static void worker_enter_idle(struct worker *worker)
+{
+	struct worker_pool *pool = worker->pool;
+
+	if (WARN_ON_ONCE(worker->flags & WORKER_IDLE) ||
+	    WARN_ON_ONCE(!list_empty(&worker->entry) &&
+			 (worker->hentry.next || worker->hentry.pprev)))
+		return;
+
+	/* can't use worker_set_flags(), also called from create_worker() */
+	worker->flags |= WORKER_IDLE;
+	pool->nr_idle++;
+	worker->last_active = jiffies;
+
+	/* idle_list is LIFO */
+	list_add(&worker->entry, &pool->idle_list);
+
+	if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
+		mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
+
+	/* Sanity check nr_running. */
+	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
+}
+
+/**
+ * worker_leave_idle - leave idle state
+ * @worker: worker which is leaving idle state
+ *
+ * @worker is leaving idle state.  Update stats.
+ *
+ * LOCKING:
+ * raw_spin_lock_irq(pool->lock).
+ */
+static void worker_leave_idle(struct worker *worker)
+{
+	struct worker_pool *pool = worker->pool;
+
+	if (WARN_ON_ONCE(!(worker->flags & WORKER_IDLE)))
+		return;
+	worker_clr_flags(worker, WORKER_IDLE);
+	pool->nr_idle--;
+	list_del_init(&worker->entry);
+}
+
+/**
+ * find_worker_executing_work - find worker which is executing a work
+ * @pool: pool of interest
+ * @work: work to find worker for
+ *
+ * Find a worker which is executing @work on @pool by searching
+ * @pool->busy_hash which is keyed by the address of @work.  For a worker
+ * to match, its current execution should match the address of @work and
+ * its work function.  This is to avoid unwanted dependency between
+ * unrelated work executions through a work item being recycled while still
+ * being executed.
+ *
+ * This is a bit tricky.  A work item may be freed once its execution
+ * starts and nothing prevents the freed area from being recycled for
+ * another work item.  If the same work item address ends up being reused
+ * before the original execution finishes, workqueue will identify the
+ * recycled work item as currently executing and make it wait until the
+ * current execution finishes, introducing an unwanted dependency.
+ *
+ * This function checks the work item address and work function to avoid
+ * false positives.  Note that this isn't complete as one may construct a
+ * work function which can introduce dependency onto itself through a
+ * recycled work item.  Well, if somebody wants to shoot oneself in the
+ * foot that badly, there's only so much we can do, and if such deadlock
+ * actually occurs, it should be easy to locate the culprit work function.
+ *
+ * CONTEXT:
+ * raw_spin_lock_irq(pool->lock).
+ *
+ * Return:
+ * Pointer to worker which is executing @work if found, %NULL
+ * otherwise.
+ */
+static struct worker *find_worker_executing_work(struct worker_pool *pool,
+						 struct work_struct *work)
+{
+	struct worker *worker;
+
+	hash_for_each_possible(pool->busy_hash, worker, hentry,
+			       (unsigned long)work)
+		if (worker->current_work == work &&
+		    worker->current_func == work->func)
+			return worker;
+
+	return NULL;
+}
+
+/**
+ * move_linked_works - move linked works to a list
+ * @work: start of series of works to be scheduled
+ * @head: target list to append @work to
+ * @nextp: out parameter for nested worklist walking
+ *
+ * Schedule linked works starting from @work to @head.  Work series to
+ * be scheduled starts at @work and includes any consecutive work with
+ * WORK_STRUCT_LINKED set in its predecessor.
+ *
+ * If @nextp is not NULL, it's updated to point to the next work of
+ * the last scheduled work.  This allows move_linked_works() to be
+ * nested inside outer list_for_each_entry_safe().
+ *
+ * CONTEXT:
+ * raw_spin_lock_irq(pool->lock).
+ */
+static void move_linked_works(struct work_struct *work, struct list_head *head,
+			      struct work_struct **nextp)
+{
+	struct work_struct *n;
+
+	/*
+	 * Linked worklist will always end before the end of the list,
+	 * use NULL for list head.
+	 */
+	list_for_each_entry_safe_from(work, n, NULL, entry) {
+		list_move_tail(&work->entry, head);
+		if (!(*work_data_bits(work) & WORK_STRUCT_LINKED))
+			break;
+	}
+
+	/*
+	 * If we're already inside safe list traversal and have moved
+	 * multiple works to the scheduled queue, the next position
+	 * needs to be updated.
+	 */
+	if (nextp)
+		*nextp = n;
+}
+
+/**
+ * wake_up_worker - wake up an idle worker
+ * @pool: worker pool to wake worker from
+ *
+ * Wake up the first idle worker of @pool.
+ *
+ * CONTEXT:
+ * raw_spin_lock_irq(pool->lock).
+ */
+static void wake_up_worker(struct worker_pool *pool)
+{
+	struct worker *worker = first_idle_worker(pool);
+
+	if (likely(worker))
+		wake_up_process(worker->task);
+}
+
 #ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
 
 /*
@@ -1183,94 +1321,6 @@ work_func_t wq_worker_last_func(struct task_struct *task)
 	return worker->last_func;
 }
 
-/**
- * find_worker_executing_work - find worker which is executing a work
- * @pool: pool of interest
- * @work: work to find worker for
- *
- * Find a worker which is executing @work on @pool by searching
- * @pool->busy_hash which is keyed by the address of @work.  For a worker
- * to match, its current execution should match the address of @work and
- * its work function.  This is to avoid unwanted dependency between
- * unrelated work executions through a work item being recycled while still
- * being executed.
- *
- * This is a bit tricky.  A work item may be freed once its execution
- * starts and nothing prevents the freed area from being recycled for
- * another work item.  If the same work item address ends up being reused
- * before the original execution finishes, workqueue will identify the
- * recycled work item as currently executing and make it wait until the
- * current execution finishes, introducing an unwanted dependency.
- *
- * This function checks the work item address and work function to avoid
- * false positives.  Note that this isn't complete as one may construct a
- * work function which can introduce dependency onto itself through a
- * recycled work item.  Well, if somebody wants to shoot oneself in the
- * foot that badly, there's only so much we can do, and if such deadlock
- * actually occurs, it should be easy to locate the culprit work function.
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock).
- *
- * Return:
- * Pointer to worker which is executing @work if found, %NULL
- * otherwise.
- */
-static struct worker *find_worker_executing_work(struct worker_pool *pool,
-						 struct work_struct *work)
-{
-	struct worker *worker;
-
-	hash_for_each_possible(pool->busy_hash, worker, hentry,
-			       (unsigned long)work)
-		if (worker->current_work == work &&
-		    worker->current_func == work->func)
-			return worker;
-
-	return NULL;
-}
-
-/**
- * move_linked_works - move linked works to a list
- * @work: start of series of works to be scheduled
- * @head: target list to append @work to
- * @nextp: out parameter for nested worklist walking
- *
- * Schedule linked works starting from @work to @head.  Work series to
- * be scheduled starts at @work and includes any consecutive work with
- * WORK_STRUCT_LINKED set in its predecessor.
- *
- * If @nextp is not NULL, it's updated to point to the next work of
- * the last scheduled work.  This allows move_linked_works() to be
- * nested inside outer list_for_each_entry_safe().
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock).
- */
-static void move_linked_works(struct work_struct *work, struct list_head *head,
-			      struct work_struct **nextp)
-{
-	struct work_struct *n;
-
-	/*
-	 * Linked worklist will always end before the end of the list,
-	 * use NULL for list head.
-	 */
-	list_for_each_entry_safe_from(work, n, NULL, entry) {
-		list_move_tail(&work->entry, head);
-		if (!(*work_data_bits(work) & WORK_STRUCT_LINKED))
-			break;
-	}
-
-	/*
-	 * If we're already inside safe list traversal and have moved
-	 * multiple works to the scheduled queue, the next position
-	 * needs to be updated.
-	 */
-	if (nextp)
-		*nextp = n;
-}
-
 /**
  * get_pwq - get an extra reference on the specified pool_workqueue
  * @pwq: pool_workqueue to get
@@ -1954,60 +2004,6 @@ bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork)
 }
 EXPORT_SYMBOL(queue_rcu_work);
 
-/**
- * worker_enter_idle - enter idle state
- * @worker: worker which is entering idle state
- *
- * @worker is entering idle state.  Update stats and idle timer if
- * necessary.
- *
- * LOCKING:
- * raw_spin_lock_irq(pool->lock).
- */
-static void worker_enter_idle(struct worker *worker)
-{
-	struct worker_pool *pool = worker->pool;
-
-	if (WARN_ON_ONCE(worker->flags & WORKER_IDLE) ||
-	    WARN_ON_ONCE(!list_empty(&worker->entry) &&
-			 (worker->hentry.next || worker->hentry.pprev)))
-		return;
-
-	/* can't use worker_set_flags(), also called from create_worker() */
-	worker->flags |= WORKER_IDLE;
-	pool->nr_idle++;
-	worker->last_active = jiffies;
-
-	/* idle_list is LIFO */
-	list_add(&worker->entry, &pool->idle_list);
-
-	if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
-		mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
-
-	/* Sanity check nr_running. */
-	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
-}
-
-/**
- * worker_leave_idle - leave idle state
- * @worker: worker which is leaving idle state
- *
- * @worker is leaving idle state.  Update stats.
- *
- * LOCKING:
- * raw_spin_lock_irq(pool->lock).
- */
-static void worker_leave_idle(struct worker *worker)
-{
-	struct worker_pool *pool = worker->pool;
-
-	if (WARN_ON_ONCE(!(worker->flags & WORKER_IDLE)))
-		return;
-	worker_clr_flags(worker, WORKER_IDLE);
-	pool->nr_idle--;
-	list_del_init(&worker->entry);
-}
-
 static struct worker *alloc_worker(int node)
 {
 	struct worker *worker;

From patchwork Fri May 19 00:16:51 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96136
[2620:137:e000::1:20]) by mx.google.com with ESMTP id 9-20020a170902c24900b001ac6430f68csi2541187plg.96.2023.05.18.17.30.00; Thu, 18 May 2023 17:30:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=eop8BSJf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229543AbjESASD (ORCPT + 99 others); Thu, 18 May 2023 20:18:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230329AbjESARy (ORCPT ); Thu, 18 May 2023 20:17:54 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 948681703 for ; Thu, 18 May 2023 17:17:48 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id d9443c01a7336-1ae4e49727eso26967695ad.1 for ; Thu, 18 May 2023 17:17:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684455467; x=1687047467; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=v0ZsecRYaEvYJeTDSDmHpi6RDj4O0XfHKhELiT004Zs=; b=eop8BSJfdrQRbEwHV/yxjHa0y/3t+6vbROqMroHcX9Bn8nJHDTRQ7MGWjUSYUc371o TvYCtLGvIk6AR4pvw3fFVChsdFFYqYjVxAzZGlBFQ4VdDCXffRB9L16h7sLmzeYFpFjE lWSIqCCqitwCmVZ6LYqN3RpmUOeyk7gd/4kjrMRMqd2LRxdGX79vP7+w1NZQQp2efAt6 knsBLgp/7WpwV7O9s1j7T7kqptEHRq+4663OKSICBA3Il8Sxj+7XhDFIKcSbNd3BH0F5 aiai7A6LnT1k1jpSRNQJewNuQNEiIc5C5GtDCrLjodwZx843nlqC9T/GqTT9oBr9mMUt hQmw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684455467; x=1687047467; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=v0ZsecRYaEvYJeTDSDmHpi6RDj4O0XfHKhELiT004Zs=; b=hHWl0XjkT++wePYQ8U3A1nhDo99gnp4r4AeAwwz3G6iWrFUuvrCijTC0oD60Zs/VGQ LI4cgDCoOqbnap4j1oS8zFBg0MBKuyENnUA41UIdtHy+ogNqpAF43nD+2wCt3aUiKS97 qL/6aOCPuUAZNBJIfuLSv4a9hjJj5ZDxSrZ+olCXPkNNgiQnGlASuMNvuv2ZhS3kUdkH WMmVkP14V4XGzn7K3/OxL/JxlW+LWymvPMilwU/kFj+ZyXtnk/P6K4ZV1oqb1u/Gp7Sx EawgYRhDnVy0ecpbhn4fApFhSswNWDkAhwVDhyd/i74x3w5ybVZ/UuwJMBuaaTOwUxua Hw1w== X-Gm-Message-State: AC+VfDy73BC/i2/xyMF0u5lcBmlcBkpmeBMcoR36QZhPAjyXmCRZ9ewz UjLMo9LlECDqSz80w/gAFk8= X-Received: by 2002:a17:903:2595:b0:1ad:1c22:1b53 with SMTP id jb21-20020a170903259500b001ad1c221b53mr806630plb.40.1684455467212; Thu, 18 May 2023 17:17:47 -0700 (PDT) Received: from localhost (2603-800c-1a02-1bae-a7fa-157f-969a-4cde.res6.spectrum.com. 
[2603:800c:1a02:1bae:a7fa:157f:969a:4cde]) by smtp.gmail.com with ESMTPSA id jh18-20020a170903329200b001ac741db80csm2074225plb.88.2023.05.18.17.17.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 May 2023 17:17:46 -0700 (PDT) Sender: Tejun Heo From: Tejun Heo To: jiangshanlai@gmail.com Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo Subject: [PATCH 06/24] workqueue: Remove module param disable_numa and sysfs knobs pool_ids and numa Date: Thu, 18 May 2023 14:16:51 -1000 Message-Id: <20230519001709.2563-7-tj@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230519001709.2563-1-tj@kernel.org> References: <20230519001709.2563-1-tj@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766280357678646975?= X-GMAIL-MSGID: =?utf-8?q?1766280357678646975?= Unbound workqueue CPU affinity is going to receive an overhaul and the NUMA specific knobs won't make sense anymore. Remove them. Also, the pool_ids knob was used for debugging and not really meaningful given that there is no visibility into the pools associated with those IDs. Remove it too. A future patch will improve overall visibility. Signed-off-by: Tejun Heo --- .../admin-guide/kernel-parameters.txt | 9 --- kernel/workqueue.c | 73 ------------------- 2 files changed, 82 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 3ed7dda4c994..042275425c32 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6943,15 +6943,6 @@ threshold repeatedly. They are likely good candidates for using WQ_UNBOUND workqueues instead. - workqueue.disable_numa - By default, all work items queued to unbound - workqueues are affine to the NUMA nodes they're - issued on, which results in better behavior in - general. If NUMA affinity needs to be disabled for - whatever reason, this option can be used. Note - that this also can be controlled per-workqueue for - workqueues visible under /sys/bus/workqueue/. 
- workqueue.power_efficient Per-cpu workqueues are generally preferred because they show better performance thanks to cache diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 6ec22eb87283..f39d04e7e5f9 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -337,9 +337,6 @@ static cpumask_var_t *wq_numa_possible_cpumask; static unsigned long wq_cpu_intensive_thresh_us = 10000; module_param_named(cpu_intensive_thresh_us, wq_cpu_intensive_thresh_us, ulong, 0644); -static bool wq_disable_numa; -module_param_named(disable_numa, wq_disable_numa, bool, 0444); - /* see the comment above the definition of WQ_POWER_EFFICIENT */ static bool wq_power_efficient = IS_ENABLED(CONFIG_WQ_POWER_EFFICIENT_DEFAULT); module_param_named(power_efficient, wq_power_efficient, bool, 0444); @@ -5777,10 +5774,8 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask) * * Unbound workqueues have the following extra attributes. * - * pool_ids RO int : the associated pool IDs for each node * nice RW int : nice value of the workers * cpumask RW mask : bitmask of allowed CPUs for the workers - * numa RW bool : whether enable NUMA affinity */ struct wq_device { struct workqueue_struct *wq; @@ -5833,28 +5828,6 @@ static struct attribute *wq_sysfs_attrs[] = { }; ATTRIBUTE_GROUPS(wq_sysfs); -static ssize_t wq_pool_ids_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct workqueue_struct *wq = dev_to_wq(dev); - const char *delim = ""; - int node, written = 0; - - cpus_read_lock(); - rcu_read_lock(); - for_each_node(node) { - written += scnprintf(buf + written, PAGE_SIZE - written, - "%s%d:%d", delim, node, - unbound_pwq_by_node(wq, node)->pool->id); - delim = " "; - } - written += scnprintf(buf + written, PAGE_SIZE - written, "\n"); - rcu_read_unlock(); - cpus_read_unlock(); - - return written; -} - static ssize_t wq_nice_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -5945,50 +5918,9 @@ static ssize_t wq_cpumask_store(struct device *dev, return ret ?: count; } -static ssize_t wq_numa_show(struct device *dev, struct device_attribute *attr, - char *buf) -{ - struct workqueue_struct *wq = dev_to_wq(dev); - int written; - - mutex_lock(&wq->mutex); - written = scnprintf(buf, PAGE_SIZE, "%d\n", - !wq->unbound_attrs->no_numa); - mutex_unlock(&wq->mutex); - - return written; -} - -static ssize_t wq_numa_store(struct device *dev, struct device_attribute *attr, - const char *buf, size_t count) -{ - struct workqueue_struct *wq = dev_to_wq(dev); - struct workqueue_attrs *attrs; - int v, ret = -ENOMEM; - - apply_wqattrs_lock(); - - attrs = wq_sysfs_prep_attrs(wq); - if (!attrs) - goto out_unlock; - - ret = -EINVAL; - if (sscanf(buf, "%d", &v) == 1) { - attrs->no_numa = !v; - ret = apply_workqueue_attrs_locked(wq, attrs); - } - -out_unlock: - apply_wqattrs_unlock(); - free_workqueue_attrs(attrs); - return ret ?: count; -} - static struct device_attribute wq_sysfs_unbound_attrs[] = { - __ATTR(pool_ids, 0444, wq_pool_ids_show, NULL), __ATTR(nice, 0644, wq_nice_show, wq_nice_store), __ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store), - __ATTR(numa, 0644, wq_numa_show, wq_numa_store), __ATTR_NULL, }; @@ -6362,11 +6294,6 @@ static void __init wq_numa_init(void) if (num_possible_nodes() <= 1) return; - if (wq_disable_numa) { - pr_info("workqueue: NUMA affinity support disabled\n"); - return; - } - for_each_possible_cpu(cpu) { if (WARN_ON(cpu_to_node(cpu) == NUMA_NO_NODE)) { pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu); 
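The following patch moves pool_workqueue releases onto a dedicated
kthread_worker. For reference, the general kthread_worker usage pattern
it relies on looks like this minimal sketch. This is illustrative only
and not part of the series; the demo_* names are invented.

#include <linux/kthread.h>
#include <linux/module.h>

/* a dedicated worker thread plus one work item queued to it */
static struct kthread_worker *demo_worker;
static struct kthread_work demo_work;

static void demo_workfn(struct kthread_work *work)
{
	pr_info("running in a dedicated kthread_worker\n");
}

static int __init demo_init(void)
{
	/* spawns a kthread that processes queued kthread_work items */
	demo_worker = kthread_create_worker(0, "demo_worker");
	if (IS_ERR(demo_worker))
		return PTR_ERR(demo_worker);

	kthread_init_work(&demo_work, demo_workfn);
	kthread_queue_work(demo_worker, &demo_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* flushes pending work and stops the thread */
	kthread_destroy_worker(demo_worker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Because the worker is its own kthread rather than a workqueue, queueing
to it never re-enters the workqueue locking paths, which is exactly why
the patch below uses it for pwq releases.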
From patchwork Fri May 19 00:16:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96124
Sender: Tejun Heo
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 07/24] workqueue: Use a kthread_worker to release pool_workqueues
Date: Thu, 18 May 2023 14:16:52 -1000
Message-Id: <20230519001709.2563-8-tj@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

The pool_workqueue release path is currently bounced to system_wq;
however, this is a bit tricky because the bouncing occurs while holding
a pool lock and thus risks causing an A-A deadlock. This is currently
addressed by the fact that only unbound workqueues use this bouncing
path and system_wq is a per-cpu workqueue.

While this works, it's brittle and requires a work-around like setting
the lockdep subclass for the lock of unbound pools. Besides, future
changes will use the bouncing path for per-cpu workqueues too, making
the current approach unusable.

Let's just use a dedicated kthread_worker to untangle the dependency.
This is just one more kthread for all workqueues and makes the pwq
release logic simpler and more robust.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f39d04e7e5f9..7addda9b37b9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -256,12 +256,12 @@ struct pool_workqueue {
 	u64 stats[PWQ_NR_STATS];
 
 	/*
-	 * Release of unbound pwq is punted to system_wq. See put_pwq()
-	 * and pwq_unbound_release_workfn() for details. pool_workqueue
-	 * itself is also RCU protected so that the first pwq can be
-	 * determined without grabbing wq->mutex.
+	 * Release of unbound pwq is punted to a kthread_worker. See put_pwq()
+	 * and pwq_unbound_release_workfn() for details. pool_workqueue itself
+	 * is also RCU protected so that the first pwq can be determined without
+	 * grabbing wq->mutex.
 	 */
-	struct work_struct unbound_release_work;
+	struct kthread_work unbound_release_work;
 	struct rcu_head rcu;
 } __aligned(1 << WORK_STRUCT_FLAG_BITS);
 
@@ -389,6 +389,13 @@ static struct workqueue_attrs *unbound_std_wq_attrs[NR_STD_WORKER_POOLS];
 /* I: attributes used when instantiating ordered pools on demand */
 static struct workqueue_attrs *ordered_wq_attrs[NR_STD_WORKER_POOLS];
 
+/*
+ * I: kthread_worker to release pwq's. pwq release needs to be bounced to a
+ * process context while holding a pool lock. Bounce to a dedicated kthread
+ * worker to avoid A-A deadlocks.
+ */
+static struct kthread_worker *pwq_release_worker;
+
 struct workqueue_struct *system_wq __read_mostly;
 EXPORT_SYMBOL(system_wq);
 struct workqueue_struct *system_highpri_wq __read_mostly;
@@ -1347,14 +1354,10 @@ static void put_pwq(struct pool_workqueue *pwq)
 	if (WARN_ON_ONCE(!(pwq->wq->flags & WQ_UNBOUND)))
 		return;
 	/*
-	 * @pwq can't be released under pool->lock, bounce to
-	 * pwq_unbound_release_workfn(). This never recurses on the same
-	 * pool->lock as this path is taken only for unbound workqueues and
-	 * the release work item is scheduled on a per-cpu workqueue. To
-	 * avoid lockdep warning, unbound pool->locks are given lockdep
-	 * subclass of 1 in get_unbound_pool().
+	 * @pwq can't be released under pool->lock, bounce to a dedicated
+	 * kthread_worker to avoid A-A deadlocks.
 	 */
-	schedule_work(&pwq->unbound_release_work);
+	kthread_queue_work(pwq_release_worker, &pwq->unbound_release_work);
 }
 
 /**
@@ -3948,7 +3951,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (!pool || init_worker_pool(pool) < 0)
 		goto fail;
 
-	lockdep_set_subclass(&pool->lock, 1);	/* see put_pwq() */
 	copy_workqueue_attrs(pool->attrs, attrs);
 	pool->node = target_node;
@@ -3982,10 +3984,10 @@ static void rcu_free_pwq(struct rcu_head *rcu)
 }
 
 /*
- * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
- * and needs to be destroyed.
+ * Scheduled on pwq_release_worker by put_pwq() when an unbound pwq hits zero
+ * refcnt and needs to be destroyed.
  */
-static void pwq_unbound_release_workfn(struct work_struct *work)
+static void pwq_unbound_release_workfn(struct kthread_work *work)
 {
 	struct pool_workqueue *pwq = container_of(work, struct pool_workqueue,
 						  unbound_release_work);
@@ -4093,7 +4095,8 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 	INIT_LIST_HEAD(&pwq->inactive_works);
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
-	INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);
+	kthread_init_work(&pwq->unbound_release_work,
+			  pwq_unbound_release_workfn);
 }
 
 /* sync @pwq with the current state of its associated wq and link it */
@@ -6419,6 +6422,9 @@ void __init workqueue_init(void)
 	struct worker_pool *pool;
 	int cpu, bkt;
 
+	pwq_release_worker = kthread_create_worker(0, "pool_workqueue_release");
+	BUG_ON(IS_ERR(pwq_release_worker));
+
 	/*
 	 * It'd be simpler to initialize NUMA in workqueue_init_early() but
 	 * CPU to node mapping may not be available that early on some

From patchwork Fri May 19 00:16:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96126
Sender: Tejun Heo
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 08/24] workqueue: Make per-cpu pool_workqueues allocated and released like unbound ones
Date: Thu, 18 May 2023 14:16:53 -1000
Message-Id: <20230519001709.2563-9-tj@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

Currently, all per-cpu pwq's (pool_workqueue's) are allocated directly
through a per-cpu allocation and thus, unlike unbound workqueues, not
reference counted. This difference in lifetime management between the
two types is a bit confusing.

Unbound workqueues are currently accessed through wq->numa_pwq_tbl[]
which isn't suitable for the planned CPU locality related improvements.
The plan is to unify pwq handling across per-cpu and unbound workqueues
so that they're always accessed through wq->cpu_pwq.

In preparation, this patch makes per-cpu pwq's allocated, reference
counted and released the same way as unbound pwq's. wq->cpu_pwq now
holds pointers to pwq's instead of containing them directly.

pwq_unbound_release_workfn() is renamed to pwq_release_workfn() as it's
now also used for per-cpu work items.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 74 +++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7addda9b37b9..574d2e12417d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -257,11 +257,11 @@ struct pool_workqueue {
 
 	/*
 	 * Release of unbound pwq is punted to a kthread_worker. See put_pwq()
pool_workqueue itself - * is also RCU protected so that the first pwq can be determined without + * and pwq_release_workfn() for details. pool_workqueue itself is also + * RCU protected so that the first pwq can be determined without * grabbing wq->mutex. */ - struct kthread_work unbound_release_work; + struct kthread_work release_work; struct rcu_head rcu; } __aligned(1 << WORK_STRUCT_FLAG_BITS); @@ -320,7 +320,7 @@ struct workqueue_struct { /* hot fields used during command issue, aligned to cacheline */ unsigned int flags ____cacheline_aligned; /* WQ: WQ_* flags */ - struct pool_workqueue __percpu *cpu_pwq; /* I: per-cpu pwqs */ + struct pool_workqueue __percpu **cpu_pwq; /* I: per-cpu pwqs */ struct pool_workqueue __rcu *numa_pwq_tbl[]; /* PWR: unbound pwqs indexed by node */ }; @@ -1351,13 +1351,11 @@ static void put_pwq(struct pool_workqueue *pwq) lockdep_assert_held(&pwq->pool->lock); if (likely(--pwq->refcnt)) return; - if (WARN_ON_ONCE(!(pwq->wq->flags & WQ_UNBOUND))) - return; /* * @pwq can't be released under pool->lock, bounce to a dedicated * kthread_worker to avoid A-A deadlocks. */ - kthread_queue_work(pwq_release_worker, &pwq->unbound_release_work); + kthread_queue_work(pwq_release_worker, &pwq->release_work); } /** @@ -1666,7 +1664,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, } else { if (req_cpu == WORK_CPU_UNBOUND) cpu = raw_smp_processor_id(); - pwq = per_cpu_ptr(wq->cpu_pwq, cpu); + pwq = *per_cpu_ptr(wq->cpu_pwq, cpu); } pool = pwq->pool; @@ -3987,31 +3985,30 @@ static void rcu_free_pwq(struct rcu_head *rcu) * Scheduled on pwq_release_worker by put_pwq() when an unbound pwq hits zero * refcnt and needs to be destroyed. */ -static void pwq_unbound_release_workfn(struct kthread_work *work) +static void pwq_release_workfn(struct kthread_work *work) { struct pool_workqueue *pwq = container_of(work, struct pool_workqueue, - unbound_release_work); + release_work); struct workqueue_struct *wq = pwq->wq; struct worker_pool *pool = pwq->pool; bool is_last = false; /* - * when @pwq is not linked, it doesn't hold any reference to the + * When @pwq is not linked, it doesn't hold any reference to the * @wq, and @wq is invalid to access. 
*/ if (!list_empty(&pwq->pwqs_node)) { - if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND))) - return; - mutex_lock(&wq->mutex); list_del_rcu(&pwq->pwqs_node); is_last = list_empty(&wq->pwqs); mutex_unlock(&wq->mutex); } - mutex_lock(&wq_pool_mutex); - put_unbound_pool(pool); - mutex_unlock(&wq_pool_mutex); + if (wq->flags & WQ_UNBOUND) { + mutex_lock(&wq_pool_mutex); + put_unbound_pool(pool); + mutex_unlock(&wq_pool_mutex); + } call_rcu(&pwq->rcu, rcu_free_pwq); @@ -4095,8 +4092,7 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq, INIT_LIST_HEAD(&pwq->inactive_works); INIT_LIST_HEAD(&pwq->pwqs_node); INIT_LIST_HEAD(&pwq->mayday_node); - kthread_init_work(&pwq->unbound_release_work, - pwq_unbound_release_workfn); + kthread_init_work(&pwq->release_work, pwq_release_workfn); } /* sync @pwq with the current state of its associated wq and link it */ @@ -4497,20 +4493,25 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq) int cpu, ret; if (!(wq->flags & WQ_UNBOUND)) { - wq->cpu_pwq = alloc_percpu(struct pool_workqueue); + wq->cpu_pwq = alloc_percpu(struct pool_workqueue *); if (!wq->cpu_pwq) - return -ENOMEM; + goto enomem; for_each_possible_cpu(cpu) { - struct pool_workqueue *pwq = + struct pool_workqueue **pwq_p = per_cpu_ptr(wq->cpu_pwq, cpu); - struct worker_pool *cpu_pools = - per_cpu(cpu_worker_pools, cpu); + struct worker_pool *pool = + &(per_cpu_ptr(cpu_worker_pools, cpu)[highpri]); - init_pwq(pwq, wq, &cpu_pools[highpri]); + *pwq_p = kmem_cache_alloc_node(pwq_cache, GFP_KERNEL, + pool->node); + if (!*pwq_p) + goto enomem; + + init_pwq(*pwq_p, wq, pool); mutex_lock(&wq->mutex); - link_pwq(pwq); + link_pwq(*pwq_p); mutex_unlock(&wq->mutex); } return 0; @@ -4529,6 +4530,15 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq) cpus_read_unlock(); return ret; + +enomem: + if (wq->cpu_pwq) { + for_each_possible_cpu(cpu) + kfree(*per_cpu_ptr(wq->cpu_pwq, cpu)); + free_percpu(wq->cpu_pwq); + wq->cpu_pwq = NULL; + } + return -ENOMEM; } static int wq_clamp_max_active(int max_active, unsigned int flags, @@ -4702,7 +4712,7 @@ static bool pwq_busy(struct pool_workqueue *pwq) void destroy_workqueue(struct workqueue_struct *wq) { struct pool_workqueue *pwq; - int node; + int cpu, node; /* * Remove it from sysfs first so that sanity check failure doesn't @@ -4762,12 +4772,8 @@ void destroy_workqueue(struct workqueue_struct *wq) mutex_unlock(&wq_pool_mutex); if (!(wq->flags & WQ_UNBOUND)) { - wq_unregister_lockdep(wq); - /* - * The base ref is never dropped on per-cpu pwqs. Directly - * schedule RCU free. - */ - call_rcu(&wq->rcu, rcu_free_wq); + for_each_possible_cpu(cpu) + put_pwq_unlocked(*per_cpu_ptr(wq->cpu_pwq, cpu)); } else { /* * We're the sole accessor of @wq at this point. 
Directly
@@ -4884,7 +4890,7 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 		cpu = smp_processor_id();
 
 	if (!(wq->flags & WQ_UNBOUND))
-		pwq = per_cpu_ptr(wq->cpu_pwq, cpu);
+		pwq = *per_cpu_ptr(wq->cpu_pwq, cpu);
 	else
 		pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));

From patchwork Fri May 19 00:16:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96132
Sender: Tejun Heo
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo,
    Dennis Dalessandro, Jason Gunthorpe, Leon Romanovsky, Karsten Graul,
    Wenjia Zhang, Jan Karcher
Subject: [PATCH 09/24] workqueue: Make unbound workqueues use per-cpu pool_workqueues
Date: Thu, 18 May 2023 14:16:54 -1000
Message-Id: <20230519001709.2563-10-tj@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

A pwq (pool_workqueue) represents an association between a workqueue and
a worker_pool. When a work item is queued, the workqueue selects the pwq
to use, which in turn determines the pool, and queues the work item to
the pool through the pwq. pwq is also what implements the maximum
concurrency limit - @max_active.

As a per-cpu workqueue should be associated with a different worker_pool
on each CPU, it always had per-cpu pwq's that are accessed through
wq->cpu_pwq. However, unbound workqueues were sharing a pwq within each
NUMA node by default. The sharing has several downsides:

* Because @max_active is per-pwq, the meaning of @max_active changes
  depending on the machine configuration and whether workqueue NUMA
  locality support is enabled.

* Makes per-cpu and unbound code deviate.

* Gets in the way of making workqueue CPU locality awareness more
  flexible.

This patch makes unbound workqueues use per-cpu pwq's the same way
per-cpu workqueues do by making the following changes:

* wq->numa_pwq_tbl[] is removed and unbound workqueues now use
  wq->cpu_pwq just like per-cpu workqueues. wq->cpu_pwq is now RCU
  protected for unbound workqueues.

* numa_pwq_tbl_install() is renamed to install_unbound_pwq() and
  installs the specified pwq to the target CPU's wq->cpu_pwq.

* apply_wqattrs_prepare() now always allocates a separate pwq for each
  CPU unless the workqueue is ordered. If ordered, all CPUs use
  wq->dfl_pwq. This makes the return value of wq_calc_node_cpumask()
  unnecessary. It now returns void.

* @max_active now means the same thing for both per-cpu and unbound
  workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and
  documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no
  longer used in the workqueue implementation and will be removed later.

* All unbound pwq operations which used to be per-numa-node are now
  per-cpu.

For most unbound workqueue users, this shouldn't cause noticeable
changes.
Work item issue and completion will be a bit faster, flush_workqueue()
would become a bit more expensive, and the total concurrency limit would
likely become higher. All @max_active==1 use cases are currently being
audited for conversion into alloc_ordered_workqueue() and they shouldn't
be affected once the audit and conversion are complete.

One area where the behavior change may be more noticeable is
workqueue_congested() as the reported congestion state is now per CPU
instead of NUMA node. There are only two users of this interface -
drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems
are cc'd. Inputs on the behavior change would be very much appreciated.

Signed-off-by: Tejun Heo
Cc: Dennis Dalessandro
Cc: Jason Gunthorpe
Cc: Leon Romanovsky
Cc: Karsten Graul
Cc: Wenjia Zhang
Cc: Jan Karcher
Acked-by: Dennis Dalessandro
---
 Documentation/core-api/workqueue.rst |  21 ++-
 include/linux/workqueue.h            |   8 +-
 kernel/workqueue.c                   | 218 ++++++++++-----------------
 3 files changed, 89 insertions(+), 158 deletions(-)

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index a4c9b9d1905f..8e541c5d8fa9 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -220,17 +220,16 @@ resources, scheduled and executed.
 ``max_active``
 --------------
 
-``@max_active`` determines the maximum number of execution contexts
-per CPU which can be assigned to the work items of a wq. For example,
-with ``@max_active`` of 16, at most 16 work items of the wq can be
-executing at the same time per CPU.
-
-Currently, for a bound wq, the maximum limit for ``@max_active`` is
-512 and the default value used when 0 is specified is 256. For an
-unbound wq, the limit is higher of 512 and 4 *
-``num_possible_cpus()``. These values are chosen sufficiently high
-such that they are not the limiting factor while providing protection
-in runaway cases.
+``@max_active`` determines the maximum number of execution contexts per
+CPU which can be assigned to the work items of a wq. For example, with
+``@max_active`` of 16, at most 16 work items of the wq can be executing
+at the same time per CPU. This is always a per-CPU attribute, even for
+unbound workqueues.
+
+The maximum limit for ``@max_active`` is 512 and the default value used
+when 0 is specified is 256. These values are chosen sufficiently high
+such that they are not the limiting factor while providing protection in
+runaway cases.
 
 The number of active work items of a wq is usually regulated by the
 users of the wq, more specifically, by how many work items the users
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 3992c994787f..d1b681f67985 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -342,14 +342,10 @@ enum {
 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
-	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
+	WQ_UNBOUND_MAX_ACTIVE	= WQ_MAX_ACTIVE,
 	WQ_DFL_ACTIVE		= WQ_MAX_ACTIVE / 2,
 };
 
-/* unbound wq's aren't per-cpu, scale max_active according to #cpus */
-#define WQ_UNBOUND_MAX_ACTIVE	\
-	max_t(int, WQ_MAX_ACTIVE, num_possible_cpus() * WQ_MAX_UNBOUND_PER_CPU)
-
 /*
  * System-wide workqueues which are always present.
* @@ -390,7 +386,7 @@ extern struct workqueue_struct *system_freezable_power_efficient_wq; * alloc_workqueue - allocate a workqueue * @fmt: printf format for the name of the workqueue * @flags: WQ_* flags - * @max_active: max in-flight work items, 0 for default + * @max_active: max in-flight work items per CPU, 0 for default * remaining args: args for @fmt * * Allocate a workqueue with the specified parameters. For detailed diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 574d2e12417d..43f3bb801bd9 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -320,8 +320,7 @@ struct workqueue_struct { /* hot fields used during command issue, aligned to cacheline */ unsigned int flags ____cacheline_aligned; /* WQ: WQ_* flags */ - struct pool_workqueue __percpu **cpu_pwq; /* I: per-cpu pwqs */ - struct pool_workqueue __rcu *numa_pwq_tbl[]; /* PWR: unbound pwqs indexed by node */ + struct pool_workqueue __percpu __rcu **cpu_pwq; /* I: per-cpu pwqs */ }; static struct kmem_cache *pwq_cache; @@ -602,35 +601,6 @@ static int worker_pool_assign_id(struct worker_pool *pool) return ret; } -/** - * unbound_pwq_by_node - return the unbound pool_workqueue for the given node - * @wq: the target workqueue - * @node: the node ID - * - * This must be called with any of wq_pool_mutex, wq->mutex or RCU - * read locked. - * If the pwq needs to be used beyond the locking in effect, the caller is - * responsible for guaranteeing that the pwq stays online. - * - * Return: The unbound pool_workqueue for @node. - */ -static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq, - int node) -{ - assert_rcu_or_wq_mutex_or_pool_mutex(wq); - - /* - * XXX: @node can be NUMA_NO_NODE if CPU goes offline while a - * delayed item is pending. The plan is to keep CPU -> NODE - * mapping valid and stable across CPU on/offlines. Once that - * happens, this workaround can be removed. - */ - if (unlikely(node == NUMA_NO_NODE)) - return wq->dfl_pwq; - - return rcu_dereference_raw(wq->numa_pwq_tbl[node]); -} - static unsigned int work_color_to_flags(int color) { return color << WORK_STRUCT_COLOR_SHIFT; @@ -1657,16 +1627,14 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, rcu_read_lock(); retry: /* pwq which will be used unless @work is executing elsewhere */ - if (wq->flags & WQ_UNBOUND) { - if (req_cpu == WORK_CPU_UNBOUND) + if (req_cpu == WORK_CPU_UNBOUND) { + if (wq->flags & WQ_UNBOUND) cpu = wq_select_unbound_cpu(raw_smp_processor_id()); - pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu)); - } else { - if (req_cpu == WORK_CPU_UNBOUND) + else cpu = raw_smp_processor_id(); - pwq = *per_cpu_ptr(wq->cpu_pwq, cpu); } + pwq = rcu_dereference(*per_cpu_ptr(wq->cpu_pwq, cpu)); pool = pwq->pool; /* @@ -1695,12 +1663,11 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, } /* - * pwq is determined and locked. For unbound pools, we could have - * raced with pwq release and it could already be dead. If its - * refcnt is zero, repeat pwq selection. Note that pwqs never die - * without another pwq replacing it in the numa_pwq_tbl or while - * work items are executing on it, so the retrying is guaranteed to - * make forward-progress. + * pwq is determined and locked. For unbound pools, we could have raced + * with pwq release and it could already be dead. If its refcnt is zero, + * repeat pwq selection. Note that unbound pwqs never die without + * another pwq replacing it in cpu_pwq or while work items are executing + * on it, so the retrying is guaranteed to make forward-progress. 
*/ if (unlikely(!pwq->refcnt)) { if (wq->flags & WQ_UNBOUND) { @@ -3799,12 +3766,8 @@ static void rcu_free_wq(struct rcu_head *rcu) container_of(rcu, struct workqueue_struct, rcu); wq_free_lockdep(wq); - - if (!(wq->flags & WQ_UNBOUND)) - free_percpu(wq->cpu_pwq); - else - free_workqueue_attrs(wq->unbound_attrs); - + free_percpu(wq->cpu_pwq); + free_workqueue_attrs(wq->unbound_attrs); kfree(wq); } @@ -4157,11 +4120,8 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq, * * The caller is responsible for ensuring that the cpumask of @node stays * stable. - * - * Return: %true if the resulting @cpumask is different from @attrs->cpumask, - * %false if equal. */ -static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node, +static void wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node, int cpu_going_down, cpumask_t *cpumask) { if (!wq_numa_enabled || attrs->no_numa) @@ -4178,23 +4138,18 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node, /* yeap, return possible CPUs in @node that @attrs wants */ cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]); - if (cpumask_empty(cpumask)) { + if (cpumask_empty(cpumask)) pr_warn_once("WARNING: workqueue cpumask: online intersect > " "possible intersect\n"); - return false; - } - - return !cpumask_equal(cpumask, attrs->cpumask); + return; use_dfl: cpumask_copy(cpumask, attrs->cpumask); - return false; } -/* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */ -static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq, - int node, - struct pool_workqueue *pwq) +/* install @pwq into @wq's cpu_pwq and return the old pwq */ +static struct pool_workqueue *install_unbound_pwq(struct workqueue_struct *wq, + int cpu, struct pool_workqueue *pwq) { struct pool_workqueue *old_pwq; @@ -4204,8 +4159,8 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq, /* link_pwq() can handle duplicate calls */ link_pwq(pwq); - old_pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]); - rcu_assign_pointer(wq->numa_pwq_tbl[node], pwq); + old_pwq = rcu_access_pointer(*per_cpu_ptr(wq->cpu_pwq, cpu)); + rcu_assign_pointer(*per_cpu_ptr(wq->cpu_pwq, cpu), pwq); return old_pwq; } @@ -4222,10 +4177,10 @@ struct apply_wqattrs_ctx { static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx) { if (ctx) { - int node; + int cpu; - for_each_node(node) - put_pwq_unlocked(ctx->pwq_tbl[node]); + for_each_possible_cpu(cpu) + put_pwq_unlocked(ctx->pwq_tbl[cpu]); put_pwq_unlocked(ctx->dfl_pwq); free_workqueue_attrs(ctx->attrs); @@ -4242,11 +4197,11 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, { struct apply_wqattrs_ctx *ctx; struct workqueue_attrs *new_attrs, *tmp_attrs; - int node; + int cpu; lockdep_assert_held(&wq_pool_mutex); - ctx = kzalloc(struct_size(ctx, pwq_tbl, nr_node_ids), GFP_KERNEL); + ctx = kzalloc(struct_size(ctx, pwq_tbl, nr_cpu_ids), GFP_KERNEL); new_attrs = alloc_workqueue_attrs(); tmp_attrs = alloc_workqueue_attrs(); @@ -4280,14 +4235,16 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, if (!ctx->dfl_pwq) goto out_free; - for_each_node(node) { - if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) { - ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs); - if (!ctx->pwq_tbl[node]) - goto out_free; - } else { + for_each_possible_cpu(cpu) { + if (new_attrs->no_numa) { ctx->dfl_pwq->refcnt++; - ctx->pwq_tbl[node] = ctx->dfl_pwq; + ctx->pwq_tbl[cpu] = ctx->dfl_pwq; + } else { + 
wq_calc_node_cpumask(new_attrs, cpu_to_node(cpu), -1, + tmp_attrs->cpumask); + ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, tmp_attrs); + if (!ctx->pwq_tbl[cpu]) + goto out_free; } } @@ -4310,7 +4267,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, /* set attrs and install prepared pwqs, @ctx points to old pwqs on return */ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx) { - int node; + int cpu; /* all pwqs have been created successfully, let's install'em */ mutex_lock(&ctx->wq->mutex); @@ -4318,9 +4275,9 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx) copy_workqueue_attrs(ctx->wq->unbound_attrs, ctx->attrs); /* save the previous pwq and install the new one */ - for_each_node(node) - ctx->pwq_tbl[node] = numa_pwq_tbl_install(ctx->wq, node, - ctx->pwq_tbl[node]); + for_each_possible_cpu(cpu) + ctx->pwq_tbl[cpu] = install_unbound_pwq(ctx->wq, cpu, + ctx->pwq_tbl[cpu]); /* @dfl_pwq might not have been used, ensure it's linked */ link_pwq(ctx->dfl_pwq); @@ -4448,20 +4405,13 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, cpumask = target_attrs->cpumask; copy_workqueue_attrs(target_attrs, wq->unbound_attrs); - pwq = unbound_pwq_by_node(wq, node); - /* - * Let's determine what needs to be done. If the target cpumask is - * different from the default pwq's, we need to compare it to @pwq's - * and create a new one if they don't match. If the target cpumask - * equals the default pwq's, the default pwq should be used. - */ - if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) { - if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask)) - return; - } else { - goto use_dfl_pwq; - } + /* nothing to do if the target cpumask matches the current pwq */ + wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask); + pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu), + lockdep_is_held(&wq_pool_mutex)); + if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask)) + return; /* create a new pwq */ pwq = alloc_unbound_pwq(wq, target_attrs); @@ -4473,7 +4423,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, /* Install the new pwq. */ mutex_lock(&wq->mutex); - old_pwq = numa_pwq_tbl_install(wq, node, pwq); + old_pwq = install_unbound_pwq(wq, cpu, pwq); goto out_unlock; use_dfl_pwq: @@ -4481,7 +4431,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, raw_spin_lock_irq(&wq->dfl_pwq->pool->lock); get_pwq(wq->dfl_pwq); raw_spin_unlock_irq(&wq->dfl_pwq->pool->lock); - old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq); + old_pwq = install_unbound_pwq(wq, cpu, wq->dfl_pwq); out_unlock: mutex_unlock(&wq->mutex); put_pwq_unlocked(old_pwq); @@ -4492,11 +4442,11 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq) bool highpri = wq->flags & WQ_HIGHPRI; int cpu, ret; - if (!(wq->flags & WQ_UNBOUND)) { - wq->cpu_pwq = alloc_percpu(struct pool_workqueue *); - if (!wq->cpu_pwq) - goto enomem; + wq->cpu_pwq = alloc_percpu(struct pool_workqueue *); + if (!wq->cpu_pwq) + goto enomem; + if (!(wq->flags & WQ_UNBOUND)) { for_each_possible_cpu(cpu) { struct pool_workqueue **pwq_p = per_cpu_ptr(wq->cpu_pwq, cpu); @@ -4544,13 +4494,11 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq) static int wq_clamp_max_active(int max_active, unsigned int flags, const char *name) { - int lim = flags & WQ_UNBOUND ? 
WQ_UNBOUND_MAX_ACTIVE : WQ_MAX_ACTIVE; - - if (max_active < 1 || max_active > lim) + if (max_active < 1 || max_active > WQ_MAX_ACTIVE) pr_warn("workqueue: max_active %d requested for %s is out of range, clamping between %d and %d\n", - max_active, name, 1, lim); + max_active, name, 1, WQ_MAX_ACTIVE); - return clamp_val(max_active, 1, lim); + return clamp_val(max_active, 1, WQ_MAX_ACTIVE); } /* @@ -4594,7 +4542,6 @@ struct workqueue_struct *alloc_workqueue(const char *fmt, unsigned int flags, int max_active, ...) { - size_t tbl_size = 0; va_list args; struct workqueue_struct *wq; struct pool_workqueue *pwq; @@ -4614,10 +4561,7 @@ struct workqueue_struct *alloc_workqueue(const char *fmt, flags |= WQ_UNBOUND; /* allocate wq and format name */ - if (flags & WQ_UNBOUND) - tbl_size = nr_node_ids * sizeof(wq->numa_pwq_tbl[0]); - - wq = kzalloc(sizeof(*wq) + tbl_size, GFP_KERNEL); + wq = kzalloc(sizeof(*wq), GFP_KERNEL); if (!wq) return NULL; @@ -4712,7 +4656,7 @@ static bool pwq_busy(struct pool_workqueue *pwq) void destroy_workqueue(struct workqueue_struct *wq) { struct pool_workqueue *pwq; - int cpu, node; + int cpu; /* * Remove it from sysfs first so that sanity check failure doesn't @@ -4771,29 +4715,23 @@ void destroy_workqueue(struct workqueue_struct *wq) list_del_rcu(&wq->list); mutex_unlock(&wq_pool_mutex); - if (!(wq->flags & WQ_UNBOUND)) { - for_each_possible_cpu(cpu) - put_pwq_unlocked(*per_cpu_ptr(wq->cpu_pwq, cpu)); - } else { - /* - * We're the sole accessor of @wq at this point. Directly - * access numa_pwq_tbl[] and dfl_pwq to put the base refs. - * @wq will be freed when the last pwq is released. - */ - for_each_node(node) { - pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]); - RCU_INIT_POINTER(wq->numa_pwq_tbl[node], NULL); - put_pwq_unlocked(pwq); - } + /* + * We're the sole accessor of @wq. Directly access cpu_pwq and dfl_pwq + * to put the base refs. @wq will be auto-destroyed from the last + * pwq_put. RCU read lock prevents @wq from going away from under us. + */ + rcu_read_lock(); - /* - * Put dfl_pwq. @wq may be freed any time after dfl_pwq is - * put. Don't access it afterwards. - */ - pwq = wq->dfl_pwq; - wq->dfl_pwq = NULL; + for_each_possible_cpu(cpu) { + pwq = rcu_access_pointer(*per_cpu_ptr(wq->cpu_pwq, cpu)); + RCU_INIT_POINTER(*per_cpu_ptr(wq->cpu_pwq, cpu), NULL); put_pwq_unlocked(pwq); } + + put_pwq_unlocked(wq->dfl_pwq); + wq->dfl_pwq = NULL; + + rcu_read_unlock(); } EXPORT_SYMBOL_GPL(destroy_workqueue); @@ -4870,10 +4808,11 @@ bool current_is_workqueue_rescuer(void) * unreliable and only useful as advisory hints or for debugging. * * If @cpu is WORK_CPU_UNBOUND, the test is performed on the local CPU. - * Note that both per-cpu and unbound workqueues may be associated with - * multiple pool_workqueues which have separate congested states. A - * workqueue being congested on one CPU doesn't mean the workqueue is also - * contested on other CPUs / NUMA nodes. + * + * With the exception of ordered workqueues, all workqueues have per-cpu + * pool_workqueues, each with its own congested state. A workqueue being + * congested on one CPU doesn't mean that the workqueue is contested on any + * other CPUs. * * Return: * %true if congested, %false otherwise. 
@@ -4889,12 +4828,9 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq) if (cpu == WORK_CPU_UNBOUND) cpu = smp_processor_id(); - if (!(wq->flags & WQ_UNBOUND)) - pwq = *per_cpu_ptr(wq->cpu_pwq, cpu); - else - pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu)); - + pwq = *per_cpu_ptr(wq->cpu_pwq, cpu); ret = !list_empty(&pwq->inactive_works); + preempt_enable(); rcu_read_unlock(); @@ -6399,7 +6335,7 @@ void __init workqueue_init_early(void) system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0); system_long_wq = alloc_workqueue("events_long", 0, 0); system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND, - WQ_UNBOUND_MAX_ACTIVE); + WQ_MAX_ACTIVE); system_freezable_wq = alloc_workqueue("events_freezable", WQ_FREEZABLE, 0); system_power_efficient_wq = alloc_workqueue("events_power_efficient",
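As a side note for readers following the conversion: after this change every workqueue, per-cpu or unbound, owns a percpu table of pool_workqueue pointers (wq->cpu_pwq). A minimal sketch of that allocation pattern, using only the standard percpu API; the function name and error handling here are illustrative, not taken from the patch:

	static struct pool_workqueue __percpu **cpu_pwq;

	static int example_init_cpu_pwq(void)
	{
		int cpu;

		/* one pool_workqueue pointer slot per possible CPU */
		cpu_pwq = alloc_percpu(struct pool_workqueue *);
		if (!cpu_pwq)
			return -ENOMEM;

		for_each_possible_cpu(cpu) {
			struct pool_workqueue **pwq_p = per_cpu_ptr(cpu_pwq, cpu);

			/* slot is filled in later, when the actual pwq is created */
			*pwq_p = NULL;
		}
		return 0;
	}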
From patchwork Fri May 19 00:16:55 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96131
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 10/24] workqueue: Rename workqueue_attrs->no_numa to ->ordered
Date: Thu, 18 May 2023 14:16:55 -1000
Message-Id: <20230519001709.2563-11-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

With the recent removal of the NUMA related module param and sysfs knob, workqueue_attrs->no_numa is now only used to implement ordered workqueues. Let's rename the field so that it's less confusing, especially with the planned CPU affinity awareness improvements.

Just a rename. No functional changes.

Signed-off-by: Tejun Heo
---
 include/linux/workqueue.h | 6 +++---
 kernel/workqueue.c | 19 +++++++++----------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index d1b681f67985..8cc9b86d3256 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -141,13 +141,13 @@ struct workqueue_attrs { cpumask_var_t cpumask; /** - * @no_numa: disable NUMA affinity + * @ordered: work items must be executed one by one in queueing order * - * Unlike other fields, ``no_numa`` isn't a property of a worker_pool. It + * Unlike other fields, ``ordered`` isn't a property of a worker_pool. It * only modifies how :c:func:`apply_workqueue_attrs` select pools and thus * doesn't participate in pool hash calculations or equality comparisons. */ - bool no_numa; + bool ordered; }; static inline struct delayed_work *to_delayed_work(struct work_struct *work) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 43f3bb801bd9..6a5d227949d9 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3653,10 +3653,10 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to, cpumask_copy(to->cpumask, from->cpumask); /* * Unlike hash and equality test, this function doesn't ignore - * ->no_numa as it is used for both pool and wq attrs. Instead, - * get_unbound_pool() explicitly clears ->no_numa after copying.
*/ - to->no_numa = from->no_numa; + to->ordered = from->ordered; } /* hash value of the content of @attr */ @@ -3916,10 +3916,10 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs) pool->node = target_node; /* - * no_numa isn't a worker_pool attribute, always clear it. See + * ordered isn't a worker_pool attribute, always clear it. See * 'struct workqueue_attrs' comments for detail. */ - pool->attrs->no_numa = false; + pool->attrs->ordered = false; if (worker_pool_assign_id(pool) < 0) goto fail; @@ -4124,7 +4124,7 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq, static void wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node, int cpu_going_down, cpumask_t *cpumask) { - if (!wq_numa_enabled || attrs->no_numa) + if (!wq_numa_enabled || attrs->ordered) goto use_dfl; /* does @node have any online CPUs @attrs wants? */ @@ -4236,7 +4236,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, goto out_free; for_each_possible_cpu(cpu) { - if (new_attrs->no_numa) { + if (new_attrs->ordered) { ctx->dfl_pwq->refcnt++; ctx->pwq_tbl[cpu] = ctx->dfl_pwq; } else { @@ -4393,7 +4393,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, lockdep_assert_held(&wq_pool_mutex); if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND) || - wq->unbound_attrs->no_numa) + wq->unbound_attrs->ordered) return; /* @@ -6323,11 +6323,10 @@ void __init workqueue_init_early(void) /* * An ordered wq should have only one pwq as ordering is * guaranteed by max_active which is enforced by pwqs. - * Turn off NUMA so that dfl_pwq is used for all nodes. */ BUG_ON(!(attrs = alloc_workqueue_attrs())); attrs->nice = std_nice[i]; - attrs->no_numa = true; + attrs->ordered = true; ordered_wq_attrs[i] = attrs; }
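For background on the rename: an ordered workqueue executes at most one work item at a time, in queueing order, and alloc_ordered_workqueue() is the canonical way to request that behavior. A minimal usage sketch; the workqueue name and the work items work_a/work_b are illustrative, not from the patch:

	struct workqueue_struct *ordered_wq;

	/* ordered: a single pwq with max_active == 1 serializes execution */
	ordered_wq = alloc_ordered_workqueue("example_ordered", 0);
	if (!ordered_wq)
		return -ENOMEM;

	queue_work(ordered_wq, &work_a);	/* executes first */
	queue_work(ordered_wq, &work_b);	/* starts only after work_a finishes */

This is why apply_wqattrs_prepare() above points every CPU's pwq_tbl slot at dfl_pwq for ->ordered attrs: with only one pool_workqueue, the max_active limit is what enforces the ordering guarantee.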
From patchwork Fri May 19 00:16:56 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96119
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 11/24] workqueue: Rename NUMA related names to use pod instead
Date: Thu, 18 May 2023 14:16:56 -1000
Message-Id: <20230519001709.2563-12-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

Workqueue is in the process of improving CPU affinity awareness. It will become more flexible and won't be tied to NUMA node boundaries. This patch renames all NUMA related names in workqueue.c to use "pod" instead. While "pod" isn't a very common term, it's short and captures the grouping of CPUs well enough. These names are only going to be used within the workqueue implementation proper, so the specific naming doesn't matter that much.

* wq_numa_possible_cpumask -> wq_pod_cpus
* wq_numa_enabled -> wq_pod_enabled
* wq_update_unbound_numa_attrs_buf -> wq_update_pod_attrs_buf
* workqueue_select_cpu_near -> select_numa_node_cpu

  This rename is different from the others. The function is only used by queue_work_node() and specifically tries to find a CPU in the specified NUMA node. As workqueue affinity will become more flexible and untied from NUMA, this function's name should specifically describe that it's for NUMA.

* wq_calc_node_cpumask -> wq_calc_pod_cpumask
* wq_update_unbound_numa -> wq_update_pod
* wq_numa_init -> wq_pod_init
* node -> pod in local variables

Only renames. No functional changes.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 162 +++++++++++++++++++++------------------
 1 file changed, 76 insertions(+), 86 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 6a5d227949d9..08ab40371697 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -325,8 +325,7 @@ struct workqueue_struct { static struct kmem_cache *pwq_cache; -static cpumask_var_t *wq_numa_possible_cpumask; /* possible CPUs of each node */ +static cpumask_var_t *wq_pod_cpus; /* possible CPUs of each node */ /* * Per-cpu work items which run for longer than the following threshold are @@ -342,10 +341,10 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444); static bool wq_online; /* can kworkers be created yet?
*/ -static bool wq_numa_enabled; /* unbound NUMA affinity enabled */ +static bool wq_pod_enabled; /* unbound CPU pod affinity enabled */ -/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */ -static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf; +/* buf for wq_update_unbound_pod_attrs(), protected by CPU hotplug exclusion */ +static struct workqueue_attrs *wq_update_pod_attrs_buf; static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */ static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */ @@ -1742,7 +1741,7 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq, EXPORT_SYMBOL(queue_work_on); /** - * workqueue_select_cpu_near - Select a CPU based on NUMA node + * select_numa_node_cpu - Select a CPU based on NUMA node * @node: NUMA node ID that we want to select a CPU from * * This function will attempt to find a "random" cpu available on a given @@ -1750,12 +1749,12 @@ EXPORT_SYMBOL(queue_work_on); * WORK_CPU_UNBOUND indicating that we should just schedule to any * available CPU if we need to schedule this work. */ -static int workqueue_select_cpu_near(int node) +static int select_numa_node_cpu(int node) { int cpu; /* No point in doing this if NUMA isn't enabled for workqueues */ - if (!wq_numa_enabled) + if (!wq_pod_enabled) return WORK_CPU_UNBOUND; /* Delay binding to CPU if node is not valid or online */ @@ -1814,7 +1813,7 @@ bool queue_work_node(int node, struct workqueue_struct *wq, local_irq_save(flags); if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) { - int cpu = workqueue_select_cpu_near(node); + int cpu = select_numa_node_cpu(node); __queue_work(cpu, wq, work); ret = true; @@ -3883,8 +3882,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs) { u32 hash = wqattrs_hash(attrs); struct worker_pool *pool; - int node; - int target_node = NUMA_NO_NODE; + int pod; + int target_pod = NUMA_NO_NODE; lockdep_assert_held(&wq_pool_mutex); @@ -3896,24 +3895,23 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs) } } - /* if cpumask is contained inside a NUMA node, we belong to that node */ - if (wq_numa_enabled) { - for_each_node(node) { - if (cpumask_subset(attrs->cpumask, - wq_numa_possible_cpumask[node])) { - target_node = node; + /* if cpumask is contained inside a pod, we belong to that pod */ + if (wq_pod_enabled) { + for_each_node(pod) { + if (cpumask_subset(attrs->cpumask, wq_pod_cpus[pod])) { + target_pod = pod; break; } } } /* nope, create a new one */ - pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, target_node); + pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, target_pod); if (!pool || init_worker_pool(pool) < 0) goto fail; copy_workqueue_attrs(pool->attrs, attrs); - pool->node = target_node; + pool->node = target_pod; /* * ordered isn't a worker_pool attribute, always clear it. See @@ -4103,40 +4101,38 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq, } /** - * wq_calc_node_cpumask - calculate a wq_attrs' cpumask for the specified node + * wq_calc_pod_cpumask - calculate a wq_attrs' cpumask for a pod * @attrs: the wq_attrs of the default pwq of the target workqueue - * @node: the target NUMA node + * @pod: the target CPU pod * @cpu_going_down: if >= 0, the CPU to consider as offline * @cpumask: outarg, the resulting cpumask * - * Calculate the cpumask a workqueue with @attrs should use on @node. If - * @cpu_going_down is >= 0, that cpu is considered offline during - * calculation. 
The result is stored in @cpumask. + * Calculate the cpumask a workqueue with @attrs should use on @pod. If + * @cpu_going_down is >= 0, that cpu is considered offline during calculation. + * The result is stored in @cpumask. * - * If NUMA affinity is not enabled, @attrs->cpumask is always used. If - * enabled and @node has online CPUs requested by @attrs, the returned - * cpumask is the intersection of the possible CPUs of @node and - * @attrs->cpumask. + * If pod affinity is not enabled, @attrs->cpumask is always used. If enabled + * and @pod has online CPUs requested by @attrs, the returned cpumask is the + * intersection of the possible CPUs of @pod and @attrs->cpumask. * - * The caller is responsible for ensuring that the cpumask of @node stays - * stable. + * The caller is responsible for ensuring that the cpumask of @pod stays stable. */ -static void wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node, +static void wq_calc_pod_cpumask(const struct workqueue_attrs *attrs, int pod, int cpu_going_down, cpumask_t *cpumask) { - if (!wq_numa_enabled || attrs->ordered) + if (!wq_pod_enabled || attrs->ordered) goto use_dfl; - /* does @node have any online CPUs @attrs wants? */ - cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask); + /* does @pod have any online CPUs @attrs wants? */ + cpumask_and(cpumask, cpumask_of_node(pod), attrs->cpumask); if (cpu_going_down >= 0) cpumask_clear_cpu(cpu_going_down, cpumask); if (cpumask_empty(cpumask)) goto use_dfl; - /* yeap, return possible CPUs in @node that @attrs wants */ - cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]); + /* yeap, return possible CPUs in @pod that @attrs wants */ + cpumask_and(cpumask, attrs->cpumask, wq_pod_cpus[pod]); if (cpumask_empty(cpumask)) pr_warn_once("WARNING: workqueue cpumask: online intersect > " @@ -4240,8 +4236,8 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, ctx->dfl_pwq->refcnt++; ctx->pwq_tbl[cpu] = ctx->dfl_pwq; } else { - wq_calc_node_cpumask(new_attrs, cpu_to_node(cpu), -1, - tmp_attrs->cpumask); + wq_calc_pod_cpumask(new_attrs, cpu_to_node(cpu), -1, + tmp_attrs->cpumask); ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, tmp_attrs); if (!ctx->pwq_tbl[cpu]) goto out_free; @@ -4332,12 +4328,11 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq, * @wq: the target workqueue * @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs() * - * Apply @attrs to an unbound workqueue @wq. Unless disabled, on NUMA - * machines, this function maps a separate pwq to each NUMA node with - * possibles CPUs in @attrs->cpumask so that work items are affine to the - * NUMA node it was issued on. Older pwqs are released as in-flight work - * items finish. Note that a work item which repeatedly requeues itself - * back-to-back will stay on its current pwq. + * Apply @attrs to an unbound workqueue @wq. Unless disabled, this function maps + * a separate pwq to each CPU pod with possibles CPUs in @attrs->cpumask so that + * work items are affine to the pod it was issued on. Older pwqs are released as + * in-flight work items finish. Note that a work item which repeatedly requeues + * itself back-to-back will stay on its current pwq. * * Performs GFP_KERNEL allocations. 
* @@ -4360,31 +4355,28 @@ int apply_workqueue_attrs(struct workqueue_struct *wq, } /** - * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug + * wq_update_pod - update pod affinity of a wq for CPU hot[un]plug * @wq: the target workqueue * @cpu: the CPU coming up or going down * @online: whether @cpu is coming up or going down * * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and - * %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update NUMA affinity of - * @wq accordingly. - * - * If NUMA affinity can't be adjusted due to memory allocation failure, it - * falls back to @wq->dfl_pwq which may not be optimal but is always - * correct. - * - * Note that when the last allowed CPU of a NUMA node goes offline for a - * workqueue with a cpumask spanning multiple nodes, the workers which were - * already executing the work items for the workqueue will lose their CPU - * affinity and may execute on any CPU. This is similar to how per-cpu - * workqueues behave on CPU_DOWN. If a workqueue user wants strict - * affinity, it's the user's responsibility to flush the work item from - * CPU_DOWN_PREPARE. + * %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update pod affinity of @wq + * accordingly. + * + * If pod affinity can't be adjusted due to memory allocation failure, it falls + * back to @wq->dfl_pwq which may not be optimal but is always correct. + * + * Note that when the last allowed CPU of a pod goes offline for a workqueue + * with a cpumask spanning multiple pods, the workers which were already + * executing the work items for the workqueue will lose their CPU affinity and + * may execute on any CPU. This is similar to how per-cpu workqueues behave on + * CPU_DOWN. If a workqueue user wants strict affinity, it's the user's + * responsibility to flush the work item from CPU_DOWN_PREPARE. */ -static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, - bool online) +static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online) { - int node = cpu_to_node(cpu); + int pod = cpu_to_node(cpu); int cpu_off = online ? -1 : cpu; struct pool_workqueue *old_pwq = NULL, *pwq; struct workqueue_attrs *target_attrs; @@ -4392,7 +4384,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, lockdep_assert_held(&wq_pool_mutex); - if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND) || + if (!wq_pod_enabled || !(wq->flags & WQ_UNBOUND) || wq->unbound_attrs->ordered) return; @@ -4401,13 +4393,13 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, * Let's use a preallocated one. The following buf is protected by * CPU hotplug exclusion.
*/ - target_attrs = wq_update_unbound_numa_attrs_buf; + target_attrs = wq_update_pod_attrs_buf; cpumask = target_attrs->cpumask; copy_workqueue_attrs(target_attrs, wq->unbound_attrs); /* nothing to do if the target cpumask matches the current pwq */ - wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask); + wq_calc_pod_cpumask(wq->dfl_pwq->pool->attrs, pod, cpu_off, cpumask); pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu), lockdep_is_held(&wq_pool_mutex)); if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask)) @@ -4416,7 +4408,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu, /* create a new pwq */ pwq = alloc_unbound_pwq(wq, target_attrs); if (!pwq) { - pr_warn("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n", + pr_warn("workqueue: allocation failed while updating CPU pod affinity of \"%s\"\n", wq->name); goto use_dfl_pwq; } @@ -4547,11 +4539,10 @@ struct workqueue_struct *alloc_workqueue(const char *fmt, struct pool_workqueue *pwq; /* - * Unbound && max_active == 1 used to imply ordered, which is no - * longer the case on NUMA machines due to per-node pools. While + * Unbound && max_active == 1 used to imply ordered, which is no longer + * the case on many machines due to per-pod pools. While * alloc_ordered_workqueue() is the right way to create an ordered - * workqueue, keep the previous behavior to avoid subtle breakages - * on NUMA. + * workqueue, keep the previous behavior to avoid subtle breakages. */ if ((flags & WQ_UNBOUND) && max_active == 1) flags |= __WQ_ORDERED; @@ -5432,9 +5423,9 @@ int workqueue_online_cpu(unsigned int cpu) mutex_unlock(&wq_pool_attach_mutex); } - /* update NUMA affinity of unbound workqueues */ + /* update pod affinity of unbound workqueues */ list_for_each_entry(wq, &workqueues, list) - wq_update_unbound_numa(wq, cpu, true); + wq_update_pod(wq, cpu, true); mutex_unlock(&wq_pool_mutex); return 0; @@ -5450,10 +5441,10 @@ int workqueue_offline_cpu(unsigned int cpu) unbind_workers(cpu); - /* update NUMA affinity of unbound workqueues */ + /* update pod affinity of unbound workqueues */ mutex_lock(&wq_pool_mutex); list_for_each_entry(wq, &workqueues, list) - wq_update_unbound_numa(wq, cpu, false); + wq_update_pod(wq, cpu, false); mutex_unlock(&wq_pool_mutex); return 0; @@ -6231,7 +6222,7 @@ static inline void wq_watchdog_init(void) { } #endif /* CONFIG_WQ_WATCHDOG */ -static void __init wq_numa_init(void) +static void __init wq_pod_init(void) { cpumask_var_t *tbl; int node, cpu; @@ -6246,8 +6237,8 @@ static void __init wq_numa_init(void) } } - wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(); - BUG_ON(!wq_update_unbound_numa_attrs_buf); + wq_update_pod_attrs_buf = alloc_workqueue_attrs(); + BUG_ON(!wq_update_pod_attrs_buf); /* * We want masks of possible CPUs of each node which isn't readily @@ -6266,8 +6257,8 @@ static void __init wq_numa_init(void) cpumask_set_cpu(cpu, tbl[node]); } - wq_numa_possible_cpumask = tbl; - wq_numa_enabled = true; + wq_pod_cpus = tbl; + wq_pod_enabled = true; } /** @@ -6367,15 +6358,14 @@ void __init workqueue_init(void) BUG_ON(IS_ERR(pwq_release_worker)); /* - * It'd be simpler to initialize NUMA in workqueue_init_early() but - * CPU to node mapping may not be available that early on some - * archs such as power and arm64. As per-cpu pools created - * previously could be missing node hint and unbound pools NUMA - * affinity, fix them up. 
+ * It'd be simpler to initialize pods in workqueue_init_early() but CPU + * to node mapping may not be available that early on some archs such as + * power and arm64. As per-cpu pools created previously could be missing + * node hint and unbound pool pod affinity, fix them up. * * Also, while iterating workqueues, create rescuers if requested. */ - wq_numa_init(); + wq_pod_init(); mutex_lock(&wq_pool_mutex); @@ -6386,7 +6376,7 @@ void __init workqueue_init(void) } list_for_each_entry(wq, &workqueues, list) { - wq_update_unbound_numa(wq, smp_processor_id(), true); + wq_update_pod(wq, smp_processor_id(), true); WARN(init_rescuer(wq), "workqueue: failed to create early rescuer for %s", wq->name);
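Since select_numa_node_cpu() deliberately keeps a NUMA-specific name, a brief reminder of its only caller: queue_work_node() requests execution on some CPU of a given NUMA node on an unbound workqueue, and falls back to WORK_CPU_UNBOUND (any available CPU) when the node has no usable CPU. A hedged usage sketch; dev, my_unbound_wq and my_work are placeholders, not from the patch:

	/* prefer a CPU on the same NUMA node as the device */
	int node = dev_to_node(dev);

	if (!queue_work_node(node, my_unbound_wq, &my_work))
		pr_debug("work item was already pending\n");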
From patchwork Fri May 19 00:16:57 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96134
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 12/24] workqueue: Move wq_pod_init() below workqueue_init()
Date: Thu, 18 May 2023 14:16:57 -1000
Message-Id: <20230519001709.2563-13-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

wq_pod_init() is called from workqueue_init() and is responsible for initializing unbound CPU pods according to NUMA node. Workqueue is in the process of improving affinity awareness and wants to use other topology information to initialize unbound CPU pods; however, unlike NUMA nodes, other topology information isn't yet available in workqueue_init().

The next patch will introduce a later stage init function for workqueue which will be responsible for initializing unbound CPU pods. Relocate wq_pod_init() below workqueue_init(), where the new init function is going to be located, so that the diff can show the content differences.

Just a relocation. No functional changes.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 78 ++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 08ab40371697..914a69f83d59 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -6222,44 +6222,7 @@ static inline void wq_watchdog_init(void) { } #endif /* CONFIG_WQ_WATCHDOG */ -static void __init wq_pod_init(void) -{ - cpumask_var_t *tbl; - int node, cpu; - - if (num_possible_nodes() <= 1) - return; - - for_each_possible_cpu(cpu) { - if (WARN_ON(cpu_to_node(cpu) == NUMA_NO_NODE)) { - pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu); - return; - } - } - - wq_update_pod_attrs_buf = alloc_workqueue_attrs(); - BUG_ON(!wq_update_pod_attrs_buf); - - /* - * We want masks of possible CPUs of each node which isn't readily - * available. Build one from cpu_to_node() which should have been - * fully initialized by now. - */ - tbl = kcalloc(nr_node_ids, sizeof(tbl[0]), GFP_KERNEL); - BUG_ON(!tbl); - - for_each_node(node) - BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL, - node_online(node) ?
node : NUMA_NO_NODE)); - - for_each_possible_cpu(cpu) { - node = cpu_to_node(cpu); - cpumask_set_cpu(cpu, tbl[node]); - } - - wq_pod_cpus = tbl; - wq_pod_enabled = true; -} +static void wq_pod_init(void); /** * workqueue_init_early - early init for workqueue subsystem @@ -6399,6 +6362,45 @@ void __init workqueue_init(void) wq_watchdog_init(); } +static void __init wq_pod_init(void) +{ + cpumask_var_t *tbl; + int node, cpu; + + if (num_possible_nodes() <= 1) + return; + + for_each_possible_cpu(cpu) { + if (WARN_ON(cpu_to_node(cpu) == NUMA_NO_NODE)) { + pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu); + return; + } + } + + wq_update_pod_attrs_buf = alloc_workqueue_attrs(); + BUG_ON(!wq_update_pod_attrs_buf); + + /* + * We want masks of possible CPUs of each node which isn't readily + * available. Build one from cpu_to_node() which should have been + * fully initialized by now. + */ + tbl = kcalloc(nr_node_ids, sizeof(tbl[0]), GFP_KERNEL); + BUG_ON(!tbl); + + for_each_node(node) + BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL, + node_online(node) ? node : NUMA_NO_NODE)); + + for_each_possible_cpu(cpu) { + node = cpu_to_node(cpu); + cpumask_set_cpu(cpu, tbl[node]); + } + + wq_pod_cpus = tbl; + wq_pod_enabled = true; +} + /* * Despite the naming, this is a no-op function which is here only for avoiding * link error. Since compile-time warning may fail to catch, we will need to
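The relocation leans on a plain C forward declaration so that code above the moved body keeps compiling; a minimal sketch of the pattern (illustrative, not the patch itself):

	/* forward declaration: the definition now lives below its caller */
	static void wq_pod_init(void);

	void __init workqueue_init(void)
	{
		/* ... */
		wq_pod_init();
	}

	static void __init wq_pod_init(void)
	{
		/* body relocated below workqueue_init() */
	}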
From patchwork Fri May 19 00:16:58 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96121
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 13/24] workqueue: Initialize unbound CPU pods later in the boot
Date: Thu, 18 May 2023 14:16:58 -1000
Message-Id: <20230519001709.2563-14-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

During boot, to initialize unbound CPU pods, wq_pod_init() was called from workqueue_init(). This is early enough for NUMA nodes to be set up but before SMP is brought up and CPU topology information is populated.

Workqueue is in the process of improving CPU locality for unbound workqueues and will need access to topology information during pod init. This adds a new init function, workqueue_init_topology(), which is called after CPU topology information is available and replaces wq_pod_init().

As unbound CPU pods are now initialized after workqueues are activated, we need to revisit the workqueues to apply the pod configuration. Workqueues which are created before workqueue_init_topology() are set up so that they always use the default worker pool. After pods are set up in workqueue_init_topology(), wq_update_pod() is called on all existing workqueues to update the pool associations accordingly.

Note that the wq_update_pod_attrs_buf allocation is moved to workqueue_init_early(). This isn't necessary right now but enables further generalization of pod handling in the future.

This patch changes the initialization sequence but the end result should be the same.
Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 1 + init/main.c | 1 + kernel/workqueue.c | 68 +++++++++++++++++++++++---------------- 3 files changed, 43 insertions(+), 27 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 8cc9b86d3256..b8961c8ea5b3 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -709,5 +709,6 @@ int workqueue_offline_cpu(unsigned int cpu); void __init workqueue_init_early(void); void __init workqueue_init(void); +void __init workqueue_init_topology(void); #endif diff --git a/init/main.c b/init/main.c index af50044deed5..6bd5fffce2e6 100644 --- a/init/main.c +++ b/init/main.c @@ -1565,6 +1565,7 @@ static noinline void __init kernel_init_freeable(void) smp_init(); sched_init_smp(); + workqueue_init_topology(); padata_init(); page_alloc_init_late(); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 914a69f83d59..add6f5fc799b 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -6222,17 +6222,15 @@ static inline void wq_watchdog_init(void) { } #endif /* CONFIG_WQ_WATCHDOG */ -static void wq_pod_init(void); - /** * workqueue_init_early - early init for workqueue subsystem * - * This is the first half of two-staged workqueue subsystem initialization - * and invoked as soon as the bare basics - memory allocation, cpumasks and - * idr are up. It sets up all the data structures and system workqueues - * and allows early boot code to create workqueues and queue/cancel work - * items. Actual work item execution starts only after kthreads can be - * created and scheduled right before early initcalls. + * This is the first step of three-staged workqueue subsystem initialization and + * invoked as soon as the bare basics - memory allocation, cpumasks and idr are + * up. It sets up all the data structures and system workqueues and allows early + * boot code to create workqueues and queue/cancel work items. Actual work item + * execution starts only after kthreads can be created and scheduled right + * before early initcalls. */ void __init workqueue_init_early(void) { @@ -6247,6 +6245,9 @@ void __init workqueue_init_early(void) pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC); + wq_update_pod_attrs_buf = alloc_workqueue_attrs(); + BUG_ON(!wq_update_pod_attrs_buf); + /* initialize CPU pools */ for_each_possible_cpu(cpu) { struct worker_pool *pool; @@ -6305,11 +6306,11 @@ void __init workqueue_init_early(void) /** * workqueue_init - bring workqueue subsystem fully online * - * This is the latter half of two-staged workqueue subsystem initialization - * and invoked as soon as kthreads can be created and scheduled. - * Workqueues have been created and work items queued on them, but there - * are no kworkers executing the work items yet. Populate the worker pools - * with the initial workers and enable future kworker creations. + * This is the second step of three-staged workqueue subsystem initialization + * and invoked as soon as kthreads can be created and scheduled. Workqueues have + * been created and work items queued on them, but there are no kworkers + * executing the work items yet. Populate the worker pools with the initial + * workers and enable future kworker creations. 
*/ void __init workqueue_init(void) { @@ -6320,18 +6321,12 @@ void __init workqueue_init(void) pwq_release_worker = kthread_create_worker(0, "pool_workqueue_release"); BUG_ON(IS_ERR(pwq_release_worker)); - /* - * It'd be simpler to initialize pods in workqueue_init_early() but CPU - * to node mapping may not be available that early on some archs such as - * power and arm64. As per-cpu pools created previously could be missing - * node hint and unbound pool pod affinity, fix them up. - * - * Also, while iterating workqueues, create rescuers if requested. - */ - wq_pod_init(); - mutex_lock(&wq_pool_mutex); + /* + * Per-cpu pools created earlier could be missing node hint. Fix them + * up. Also, create a rescuer for workqueues that requested it. + */ for_each_possible_cpu(cpu) { for_each_cpu_worker_pool(pool, cpu) { pool->node = cpu_to_node(cpu); @@ -6339,7 +6334,6 @@ void __init workqueue_init(void) } list_for_each_entry(wq, &workqueues, list) { - wq_update_pod(wq, smp_processor_id(), true); WARN(init_rescuer(wq), "workqueue: failed to create early rescuer for %s", wq->name); @@ -6362,8 +6356,16 @@ void __init workqueue_init(void) wq_watchdog_init(); } -static void __init wq_pod_init(void) +/** + * workqueue_init_topology - initialize CPU pods for unbound workqueues + * + * This is the third step of three-staged workqueue subsystem initialization and + * invoked after SMP and topology information are fully initialized. It + * initializes the unbound CPU pods accordingly. + */ +void __init workqueue_init_topology(void) { + struct workqueue_struct *wq; cpumask_var_t *tbl; int node, cpu; @@ -6377,8 +6379,7 @@ static void __init wq_pod_init(void) } } - wq_update_pod_attrs_buf = alloc_workqueue_attrs(); - BUG_ON(!wq_update_pod_attrs_buf); + mutex_lock(&wq_pool_mutex); /* * We want masks of possible CPUs of each node which isn't readily @@ -6399,6 +6400,19 @@ static void __init wq_pod_init(void) wq_pod_cpus = tbl; wq_pod_enabled = true; + + /* + * Workqueues allocated earlier would have all CPUs sharing the default + * worker pool. Explicitly call wq_update_pod() on all workqueue and CPU + * combinations to apply per-pod sharing.
+ */ + list_for_each_entry(wq, &workqueues, list) { + for_each_online_cpu(cpu) { + wq_update_pod(wq, cpu, true); + } + } + + mutex_unlock(&wq_pool_mutex); } /*
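Taken together with the init/main.c hunk above, the boot-time call order now has three stages. A sketch, paraphrased from init/main.c with unrelated init calls elided (not a verbatim quote):

	asmlinkage __visible void __init start_kernel(void)
	{
		/* ... */
		workqueue_init_early();		/* stage 1: structures and system workqueues */
		/* ... */
	}

	static noinline void __init kernel_init_freeable(void)
	{
		/* ... */
		workqueue_init();		/* stage 2: kworkers start executing work */
		/* ... */
		smp_init();
		sched_init_smp();
		workqueue_init_topology();	/* stage 3: unbound CPU pods (new) */
		/* ... */
	}

Only the workqueue_init_topology() call is introduced by this patch; the other two stages keep their existing positions.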
From patchwork Fri May 19 00:16:59 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96120
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Date: Thu, 18 May 2023 14:16:59 -1000
Message-Id: <20230519001709.2563-15-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>

While renamed to pod, the code still assumes that the pods are defined by
NUMA boundaries. Let's generalize it:

* workqueue_attrs->affn_scope is added. Each enum represents the type of
  boundaries that define the pods. There are currently two scopes -
  WQ_AFFN_NUMA and WQ_AFFN_SYSTEM. The former is the same behavior as
  before - one pod per NUMA node. The latter defines one global pod across
  the whole system.

* struct wq_pod_type is added which describes how pods are configured for
  each affinity scope. For each pod, it lists the member CPUs and the
  preferred NUMA node for memory allocations. The reverse mapping from CPU
  to pod is also available.

* wq_pod_enabled is dropped. Pod is now always enabled. The previously
  disabled behavior is now implemented through WQ_AFFN_SYSTEM.

* get_unbound_pool() wants to determine the NUMA node to allocate memory
  from for the new pool. The variables are renamed from node to pod but the
  logic still assumes they're one and the same. Clearly distinguish them -
  walk the WQ_AFFN_NUMA pods to find the matching pod and then use the
  pod's NUMA node.

* wq_calc_pod_cpumask() was taking @pod but assumed that it was the NUMA
  node. Take @cpu instead and determine the cpumask to use from the
  pod_type matching @attrs.

* apply_wqattrs_prepare() is updated to return ERR_PTR() on error instead
  of NULL so that it can indicate -EINVAL on invalid affinity scopes.

This patch allows CPUs to be grouped into pods however desired per type.
While this patch causes some internal behavior changes, nothing material
should change for workqueue users.
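The new tables reduce pod lookup to two array accesses plus a mask intersection. A simplified sketch of what wq_calc_pod_cpumask() boils down to after this change (online-mask filtering and the fallback warning elided; the real hunks follow):

/* Simplified: resolve @cpu to the cpumask its pod grants @attrs. */
static void sketch_pod_cpumask(const struct workqueue_attrs *attrs,
			       int cpu, cpumask_t *cpumask)
{
	const struct wq_pod_type *pt = wqattrs_pod_type(attrs);
	int pod = pt->cpu_pod[cpu];	/* reverse map: cpu -> pod */

	/* possible CPUs in @pod that @attrs also allows */
	cpumask_and(cpumask, pt->pod_cpus[pod], attrs->cpumask);
	if (cpumask_empty(cpumask))	/* no overlap: fall back to @attrs */
		cpumask_copy(cpumask, attrs->cpumask);
}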
Signed-off-by: Tejun Heo Signed-off-by: Tejun Heo Reported-by: K Prateek Nayak --- include/linux/workqueue.h | 31 +++++++- kernel/workqueue.c | 154 ++++++++++++++++++++++++-------------- 2 files changed, 125 insertions(+), 60 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index b8961c8ea5b3..a2f826b6ec9a 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -124,6 +124,15 @@ struct rcu_work { struct workqueue_struct *wq; }; +enum wq_affn_scope { + WQ_AFFN_NUMA, /* one pod per NUMA node */ + WQ_AFFN_SYSTEM, /* one pod across the whole system */ + + WQ_AFFN_NR_TYPES, + + WQ_AFFN_DFL = WQ_AFFN_NUMA, +}; + /** * struct workqueue_attrs - A struct for workqueue attributes. * @@ -140,12 +149,26 @@ struct workqueue_attrs { */ cpumask_var_t cpumask; + /* + * Below fields aren't properties of a worker_pool. They only modify how + * :c:func:`apply_workqueue_attrs` select pools and thus don't + * participate in pool hash calculations or equality comparisons. + */ + /** - * @ordered: work items must be executed one by one in queueing order + * @affn_scope: unbound CPU affinity scope * - * Unlike other fields, ``ordered`` isn't a property of a worker_pool. It - * only modifies how :c:func:`apply_workqueue_attrs` select pools and thus - * doesn't participate in pool hash calculations or equality comparisons. + * CPU pods are used to improve execution locality of unbound work + * items. There are multiple pod types, one for each wq_affn_scope, and + * every CPU in the system belongs to one pod in every pod type. CPUs + * that belong to the same pod share the worker pool. For example, + * selecting %WQ_AFFN_NUMA makes the workqueue use a separate worker + * pool for each NUMA node. + */ + enum wq_affn_scope affn_scope; + + /** + * @ordered: work items must be executed one by one in queueing order */ bool ordered; }; diff --git a/kernel/workqueue.c b/kernel/workqueue.c index add6f5fc799b..dae1787833cb 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -325,7 +325,18 @@ struct workqueue_struct { static struct kmem_cache *pwq_cache; -static cpumask_var_t *wq_pod_cpus; /* possible CPUs of each node */ +/* + * Each pod type describes how CPUs should be grouped for unbound workqueues. + * See the comment above workqueue_attrs->affn_scope. + */ +struct wq_pod_type { + int nr_pods; /* number of pods */ + cpumask_var_t *pod_cpus; /* pod -> cpus */ + int *pod_node; /* pod -> node */ + int *cpu_pod; /* cpu -> pod */ +}; + +static struct wq_pod_type wq_pod_types[WQ_AFFN_NR_TYPES]; /* * Per-cpu work items which run for longer than the following threshold are @@ -341,8 +352,6 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444); static bool wq_online; /* can kworkers be created yet? 
*/ -static bool wq_pod_enabled; /* unbound CPU pod affinity enabled */ - /* buf for wq_update_unbound_pod_attrs(), protected by CPU hotplug exclusion */ static struct workqueue_attrs *wq_update_pod_attrs_buf; @@ -1753,10 +1762,6 @@ static int select_numa_node_cpu(int node) { int cpu; - /* No point in doing this if NUMA isn't enabled for workqueues */ - if (!wq_pod_enabled) - return WORK_CPU_UNBOUND; - /* Delay binding to CPU if node is not valid or online */ if (node < 0 || node >= MAX_NUMNODES || !node_online(node)) return WORK_CPU_UNBOUND; @@ -3639,6 +3644,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(void) goto fail; cpumask_copy(attrs->cpumask, cpu_possible_mask); + attrs->affn_scope = WQ_AFFN_DFL; return attrs; fail: free_workqueue_attrs(attrs); @@ -3650,11 +3656,13 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to, { to->nice = from->nice; cpumask_copy(to->cpumask, from->cpumask); + /* - * Unlike hash and equality test, this function doesn't ignore - * ->ordered as it is used for both pool and wq attrs. Instead, - * get_unbound_pool() explicitly clears ->ordered after copying. + * Unlike hash and equality test, copying shouldn't ignore wq-only + * fields as copying is used for both pool and wq attrs. Instead, + * get_unbound_pool() explicitly clears the fields. */ + to->affn_scope = from->affn_scope; to->ordered = from->ordered; } @@ -3680,6 +3688,24 @@ static bool wqattrs_equal(const struct workqueue_attrs *a, return true; } +/* find wq_pod_type to use for @attrs */ +static const struct wq_pod_type * +wqattrs_pod_type(const struct workqueue_attrs *attrs) +{ + struct wq_pod_type *pt = &wq_pod_types[attrs->affn_scope]; + + if (likely(pt->nr_pods)) + return pt; + + /* + * Before workqueue_init_topology(), only SYSTEM is available which is + * initialized in workqueue_init_early(). + */ + pt = &wq_pod_types[WQ_AFFN_SYSTEM]; + BUG_ON(!pt->nr_pods); + return pt; +} + /** * init_worker_pool - initialize a newly zalloc'd worker_pool * @pool: worker_pool to initialize @@ -3880,10 +3906,10 @@ static void put_unbound_pool(struct worker_pool *pool) */ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs) { + struct wq_pod_type *pt = &wq_pod_types[WQ_AFFN_NUMA]; u32 hash = wqattrs_hash(attrs); struct worker_pool *pool; - int pod; - int target_pod = NUMA_NO_NODE; + int pod, node = NUMA_NO_NODE; lockdep_assert_held(&wq_pool_mutex); @@ -3895,28 +3921,24 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs) } } - /* if cpumask is contained inside a pod, we belong to that pod */ - if (wq_pod_enabled) { - for_each_node(pod) { - if (cpumask_subset(attrs->cpumask, wq_pod_cpus[pod])) { - target_pod = pod; - break; - } + /* If cpumask is contained inside a NUMA pod, that's our NUMA node */ + for (pod = 0; pod < pt->nr_pods; pod++) { + if (cpumask_subset(attrs->cpumask, pt->pod_cpus[pod])) { + node = pt->pod_node[pod]; + break; } } /* nope, create a new one */ - pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, target_pod); + pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, node); if (!pool || init_worker_pool(pool) < 0) goto fail; copy_workqueue_attrs(pool->attrs, attrs); - pool->node = target_pod; + pool->node = node; - /* - * ordered isn't a worker_pool attribute, always clear it. See - * 'struct workqueue_attrs' comments for detail. - */ + /* clear wq-only attr fields. 
See 'struct workqueue_attrs' comments */ + pool->attrs->affn_scope = WQ_AFFN_NR_TYPES; pool->attrs->ordered = false; if (worker_pool_assign_id(pool) < 0) @@ -4103,7 +4125,7 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq, /** * wq_calc_pod_cpumask - calculate a wq_attrs' cpumask for a pod * @attrs: the wq_attrs of the default pwq of the target workqueue - * @pod: the target CPU pod + * @cpu: the target CPU * @cpu_going_down: if >= 0, the CPU to consider as offline * @cpumask: outarg, the resulting cpumask * @@ -4117,30 +4139,29 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq, * * The caller is responsible for ensuring that the cpumask of @pod stays stable. */ -static void wq_calc_pod_cpumask(const struct workqueue_attrs *attrs, int pod, - int cpu_going_down, cpumask_t *cpumask) +static void wq_calc_pod_cpumask(const struct workqueue_attrs *attrs, int cpu, + int cpu_going_down, cpumask_t *cpumask) { - if (!wq_pod_enabled || attrs->ordered) - goto use_dfl; + const struct wq_pod_type *pt = wqattrs_pod_type(attrs); + int pod = pt->cpu_pod[cpu]; /* does @pod have any online CPUs @attrs wants? */ - cpumask_and(cpumask, cpumask_of_node(pod), attrs->cpumask); + cpumask_and(cpumask, pt->pod_cpus[pod], attrs->cpumask); + cpumask_and(cpumask, cpumask, cpu_online_mask); if (cpu_going_down >= 0) cpumask_clear_cpu(cpu_going_down, cpumask); - if (cpumask_empty(cpumask)) - goto use_dfl; + if (cpumask_empty(cpumask)) { + cpumask_copy(cpumask, attrs->cpumask); + return; + } /* yeap, return possible CPUs in @pod that @attrs wants */ - cpumask_and(cpumask, attrs->cpumask, wq_pod_cpus[pod]); + cpumask_and(cpumask, attrs->cpumask, pt->pod_cpus[pod]); if (cpumask_empty(cpumask)) pr_warn_once("WARNING: workqueue cpumask: online intersect > " "possible intersect\n"); - return; - -use_dfl: - cpumask_copy(cpumask, attrs->cpumask); } /* install @pwq into @wq's cpu_pwq and return the old pwq */ @@ -4197,6 +4218,10 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, lockdep_assert_held(&wq_pool_mutex); + if (WARN_ON(attrs->affn_scope < 0 || + attrs->affn_scope >= WQ_AFFN_NR_TYPES)) + return ERR_PTR(-EINVAL); + ctx = kzalloc(struct_size(ctx, pwq_tbl, nr_cpu_ids), GFP_KERNEL); new_attrs = alloc_workqueue_attrs(); @@ -4236,8 +4261,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, ctx->dfl_pwq->refcnt++; ctx->pwq_tbl[cpu] = ctx->dfl_pwq; } else { - wq_calc_pod_cpumask(new_attrs, cpu_to_node(cpu), -1, - tmp_attrs->cpumask); + wq_calc_pod_cpumask(new_attrs, cpu, -1, tmp_attrs->cpumask); ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, tmp_attrs); if (!ctx->pwq_tbl[cpu]) goto out_free; @@ -4257,7 +4281,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, free_workqueue_attrs(tmp_attrs); free_workqueue_attrs(new_attrs); apply_wqattrs_cleanup(ctx); - return NULL; + return ERR_PTR(-ENOMEM); } /* set attrs and install prepared pwqs, @ctx points to old pwqs on return */ @@ -4313,8 +4337,8 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq, } ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask); - if (!ctx) - return -ENOMEM; + if (IS_ERR(ctx)) + return PTR_ERR(ctx); /* the ctx has been prepared successfully, let's commit it */ apply_wqattrs_commit(ctx); @@ -4376,7 +4400,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq, */ static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online) { - int pod = cpu_to_node(cpu); int cpu_off = online ? 
-1 : cpu; struct pool_workqueue *old_pwq = NULL, *pwq; struct workqueue_attrs *target_attrs; @@ -4384,8 +4407,7 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online) lockdep_assert_held(&wq_pool_mutex); - if (!wq_pod_enabled || !(wq->flags & WQ_UNBOUND) || - wq->unbound_attrs->ordered) + if (!(wq->flags & WQ_UNBOUND) || wq->unbound_attrs->ordered) return; /* @@ -4399,7 +4421,7 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online) copy_workqueue_attrs(target_attrs, wq->unbound_attrs); /* nothing to do if the target cpumask matches the current pwq */ - wq_calc_pod_cpumask(wq->dfl_pwq->pool->attrs, pod, cpu_off, cpumask); + wq_calc_pod_cpumask(wq->dfl_pwq->pool->attrs, cpu, cpu_off, cpumask); pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu), lockdep_is_held(&wq_pool_mutex)); if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask)) @@ -5640,8 +5662,8 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask) continue; ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask); - if (!ctx) { - ret = -ENOMEM; + if (IS_ERR(ctx)) { + ret = PTR_ERR(ctx); break; } @@ -6234,6 +6256,7 @@ static inline void wq_watchdog_init(void) { } */ void __init workqueue_init_early(void) { + struct wq_pod_type *pt = &wq_pod_types[WQ_AFFN_SYSTEM]; int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL }; int i, cpu; @@ -6248,6 +6271,22 @@ void __init workqueue_init_early(void) wq_update_pod_attrs_buf = alloc_workqueue_attrs(); BUG_ON(!wq_update_pod_attrs_buf); + /* initialize WQ_AFFN_SYSTEM pods */ + pt->pod_cpus = kcalloc(1, sizeof(pt->pod_cpus[0]), GFP_KERNEL); + pt->pod_node = kcalloc(1, sizeof(pt->pod_node[0]), GFP_KERNEL); + pt->cpu_pod = kcalloc(nr_cpu_ids, sizeof(pt->cpu_pod[0]), GFP_KERNEL); + BUG_ON(!pt->pod_cpus || !pt->pod_node || !pt->cpu_pod); + + BUG_ON(!zalloc_cpumask_var_node(&pt->pod_cpus[0], GFP_KERNEL, NUMA_NO_NODE)); + + wq_update_pod_attrs_buf = alloc_workqueue_attrs(); + BUG_ON(!wq_update_pod_attrs_buf); + + pt->nr_pods = 1; + cpumask_copy(pt->pod_cpus[0], cpu_possible_mask); + pt->pod_node[0] = NUMA_NO_NODE; + pt->cpu_pod[0] = 0; + /* initialize CPU pools */ for_each_possible_cpu(cpu) { struct worker_pool *pool; @@ -6365,8 +6404,8 @@ void __init workqueue_init(void) */ void __init workqueue_init_topology(void) { + struct wq_pod_type *pt = &wq_pod_types[WQ_AFFN_NUMA]; struct workqueue_struct *wq; - cpumask_var_t *tbl; int node, cpu; if (num_possible_nodes() <= 1) @@ -6386,20 +6425,23 @@ void __init workqueue_init_topology(void) * available. Build one from cpu_to_node() which should have been * fully initialized by now. */ - tbl = kcalloc(nr_node_ids, sizeof(tbl[0]), GFP_KERNEL); - BUG_ON(!tbl); + pt->pod_cpus = kcalloc(nr_node_ids, sizeof(pt->pod_cpus[0]), GFP_KERNEL); + pt->pod_node = kcalloc(nr_node_ids, sizeof(pt->pod_node[0]), GFP_KERNEL); + pt->cpu_pod = kcalloc(nr_cpu_ids, sizeof(pt->cpu_pod[0]), GFP_KERNEL); + BUG_ON(!pt->pod_cpus || !pt->pod_node || !pt->cpu_pod); for_each_node(node) - BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL, + BUG_ON(!zalloc_cpumask_var_node(&pt->pod_cpus[node], GFP_KERNEL, node_online(node) ? 
node : NUMA_NO_NODE));
 
 	for_each_possible_cpu(cpu) {
 		node = cpu_to_node(cpu);
-		cpumask_set_cpu(cpu, tbl[node]);
+		cpumask_set_cpu(cpu, pt->pod_cpus[node]);
+		pt->pod_node[node] = node;
+		pt->cpu_pod[cpu] = node;
 	}
 
-	wq_pod_cpus = tbl;
-	wq_pod_enabled = true;
+	pt->nr_pods = nr_node_ids;
 
 	/*
 	 * Workqueues allocated earlier would have all CPUs sharing the default
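One side change in the hunks above deserves a note: apply_wqattrs_prepare() now reports the failure reason through the kernel's ERR_PTR() convention instead of returning bare NULL, which is what lets an invalid affinity scope surface as -EINVAL rather than -ENOMEM. A minimal sketch of the pattern (the sketch_* names are illustrative, not the real functions):

static struct apply_wqattrs_ctx *sketch_prepare(const struct workqueue_attrs *attrs)
{
	struct apply_wqattrs_ctx *ctx;

	if (attrs->affn_scope >= WQ_AFFN_NR_TYPES)
		return ERR_PTR(-EINVAL);	/* bad scope, not an OOM */

	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
	if (!ctx)
		return ERR_PTR(-ENOMEM);
	return ctx;
}

static int sketch_apply(const struct workqueue_attrs *attrs)
{
	struct apply_wqattrs_ctx *ctx = sketch_prepare(attrs);

	if (IS_ERR(ctx))
		return PTR_ERR(ctx);	/* propagates -EINVAL or -ENOMEM */
	/* ... commit and free ctx ... */
	return 0;
}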
From patchwork Fri May 19 00:17:00 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96128
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 15/24] workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration
Date: Thu, 18 May 2023 14:17:00 -1000
Message-Id: <20230519001709.2563-16-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>

Lack of visibility has always been a pain point for workqueues. While the
recently added wq_monitor.py improved the situation, it's still difficult
to understand what worker pools are active in the system, how workqueues
map to them and why. The lack of visibility into how workqueues are
configured is going to become more noticeable as workqueue improves
locality awareness and provides more mechanisms to customize locality
related behaviors.

Now that the basic framework for more flexible locality support is in
place, this is a good time to improve the situation. This patch adds
tools/workqueue/wq_dump.py which prints out the topology configuration,
worker pools and how workqueues are mapped to pools. Read the command's
help message for more details.

Signed-off-by: Tejun Heo
---
 Documentation/core-api/workqueue.rst |  59 ++++++++++
 tools/workqueue/wq_dump.py           | 166 +++++++++++++++++++++++++++
 2 files changed, 225 insertions(+)
 create mode 100644 tools/workqueue/wq_dump.py

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index 8e541c5d8fa9..c9e46acd339b 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -347,6 +347,65 @@ Guidelines
 level of locality in wq operations and work item execution.
+Examining Configuration +======================= + +Use tools/workqueue/wq_dump.py to examine unbound CPU affinity +configuration, worker pools and how workqueues map to the pools: :: + + $ tools/workqueue/wq_dump.py + Affinity Scopes + =============== + wq_unbound_cpumask=0000000f + + NUMA + nr_pods 2 + pod_cpus [0]=00000003 [1]=0000000c + pod_node [0]=0 [1]=1 + cpu_pod [0]=0 [1]=0 [2]=1 [3]=1 + + SYSTEM + nr_pods 1 + pod_cpus [0]=0000000f + pod_node [0]=-1 + cpu_pod [0]=0 [1]=0 [2]=0 [3]=0 + + Worker Pools + ============ + pool[00] ref= 1 nice= 0 idle/workers= 4/ 4 cpu= 0 + pool[01] ref= 1 nice=-20 idle/workers= 2/ 2 cpu= 0 + pool[02] ref= 1 nice= 0 idle/workers= 4/ 4 cpu= 1 + pool[03] ref= 1 nice=-20 idle/workers= 2/ 2 cpu= 1 + pool[04] ref= 1 nice= 0 idle/workers= 4/ 4 cpu= 2 + pool[05] ref= 1 nice=-20 idle/workers= 2/ 2 cpu= 2 + pool[06] ref= 1 nice= 0 idle/workers= 3/ 3 cpu= 3 + pool[07] ref= 1 nice=-20 idle/workers= 2/ 2 cpu= 3 + pool[08] ref=42 nice= 0 idle/workers= 6/ 6 cpus=0000000f + pool[09] ref=28 nice= 0 idle/workers= 3/ 3 cpus=00000003 + pool[10] ref=28 nice= 0 idle/workers= 17/ 17 cpus=0000000c + pool[11] ref= 1 nice=-20 idle/workers= 1/ 1 cpus=0000000f + pool[12] ref= 2 nice=-20 idle/workers= 1/ 1 cpus=00000003 + pool[13] ref= 2 nice=-20 idle/workers= 1/ 1 cpus=0000000c + + Workqueue CPU -> pool + ===================== + [ workqueue \ CPU 0 1 2 3 dfl] + events percpu 0 2 4 6 + events_highpri percpu 1 3 5 7 + events_long percpu 0 2 4 6 + events_unbound unbound 9 9 10 10 8 + events_freezable percpu 0 2 4 6 + events_power_efficient percpu 0 2 4 6 + events_freezable_power_ percpu 0 2 4 6 + rcu_gp percpu 0 2 4 6 + rcu_par_gp percpu 0 2 4 6 + slub_flushwq percpu 0 2 4 6 + netns ordered 8 8 8 8 8 + ... + +See the command's help message for more info. + + Monitoring ========== diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py new file mode 100644 index 000000000000..ddd0bb4395ea --- /dev/null +++ b/tools/workqueue/wq_dump.py @@ -0,0 +1,166 @@ +#!/usr/bin/env drgn +# +# Copyright (C) 2023 Tejun Heo +# Copyright (C) 2023 Meta Platforms, Inc. and affiliates. + +desc = """ +This is a drgn script to show the current workqueue configuration. For more +info on drgn, visit https://github.com/osandov/drgn. + +Affinity Scopes +=============== + +Shows the CPUs that can be used for unbound workqueues and how they will be +grouped by each available affinity type. For each type: + + nr_pods number of CPU pods in the affinity type + pod_cpus CPUs in each pod + pod_node NUMA node for memory allocation for each pod + cpu_pod pod that each CPU is associated to + +Worker Pools +============ + +Lists all worker pools indexed by their ID. For each pool: + + ref number of pool_workqueue's associated with this pool + nice nice value of the worker threads in the pool + idle number of idle workers + workers number of all workers + cpu CPU the pool is associated with (per-cpu pool) + cpus CPUs the workers in the pool can run on (unbound pool) + +Workqueue CPU -> pool +===================== + +Lists all workqueues along with their type and worker pool association. For +each workqueue: + + NAME TYPE POOL_ID... 
+ + NAME name of the workqueue + TYPE percpu, unbound or ordered + POOL_ID worker pool ID associated with each possible CPU +""" + +import sys + +import drgn +from drgn.helpers.linux.list import list_for_each_entry,list_empty +from drgn.helpers.linux.percpu import per_cpu_ptr +from drgn.helpers.linux.cpumask import for_each_cpu,for_each_possible_cpu +from drgn.helpers.linux.idr import idr_for_each + +import argparse +parser = argparse.ArgumentParser(description=desc, + formatter_class=argparse.RawTextHelpFormatter) +args = parser.parse_args() + +def err(s): + print(s, file=sys.stderr, flush=True) + sys.exit(1) + +def cpumask_str(cpumask): + output = "" + base = 0 + v = 0 + for cpu in for_each_cpu(cpumask[0]): + while cpu - base >= 32: + output += f'{hex(v)} ' + base += 32 + v = 0 + v |= 1 << (cpu - base) + if v > 0: + output += f'{v:08x}' + return output.strip() + +worker_pool_idr = prog['worker_pool_idr'] +workqueues = prog['workqueues'] +wq_unbound_cpumask = prog['wq_unbound_cpumask'] +wq_pod_types = prog['wq_pod_types'] + +WQ_UNBOUND = prog['WQ_UNBOUND'] +WQ_ORDERED = prog['__WQ_ORDERED'] +WQ_MEM_RECLAIM = prog['WQ_MEM_RECLAIM'] + +WQ_AFFN_NUMA = prog['WQ_AFFN_NUMA'] +WQ_AFFN_SYSTEM = prog['WQ_AFFN_SYSTEM'] + +print('Affinity Scopes') +print('===============') + +print(f'wq_unbound_cpumask={cpumask_str(wq_unbound_cpumask)}') + +def print_pod_type(pt): + print(f' nr_pods {pt.nr_pods.value_()}') + + print(' pod_cpus', end='') + for pod in range(pt.nr_pods): + print(f' [{pod}]={cpumask_str(pt.pod_cpus[pod])}', end='') + print('') + + print(' pod_node', end='') + for pod in range(pt.nr_pods): + print(f' [{pod}]={pt.pod_node[pod].value_()}', end='') + print('') + + print(f' cpu_pod ', end='') + for cpu in for_each_possible_cpu(prog): + print(f' [{cpu}]={pt.cpu_pod[cpu].value_()}', end='') + print('') + +print('') +print('NUMA') +print_pod_type(wq_pod_types[WQ_AFFN_NUMA]) +print('') +print('SYSTEM') +print_pod_type(wq_pod_types[WQ_AFFN_SYSTEM]) + +print('') +print('Worker Pools') +print('============') + +max_pool_id_len = 0 +max_ref_len = 0 +for pi, pool in idr_for_each(worker_pool_idr): + pool = drgn.Object(prog, 'struct worker_pool', address=pool) + max_pool_id_len = max(max_pool_id_len, len(f'{pi}')) + max_ref_len = max(max_ref_len, len(f'{pool.refcnt.value_()}')) + +for pi, pool in idr_for_each(worker_pool_idr): + pool = drgn.Object(prog, 'struct worker_pool', address=pool) + print(f'pool[{pi:0{max_pool_id_len}}] ref={pool.refcnt.value_():{max_ref_len}} nice={pool.attrs.nice.value_():3} ', end='') + print(f'idle/workers={pool.nr_idle.value_():3}/{pool.nr_workers.value_():3} ', end='') + if pool.cpu >= 0: + print(f'cpu={pool.cpu.value_():3}', end='') + else: + print(f'cpus={cpumask_str(pool.attrs.cpumask)}', end='') + print('') + +print('') +print('Workqueue CPU -> pool') +print('=====================') + +print('[ workqueue \ CPU ', end='') +for cpu in for_each_possible_cpu(prog): + print(f' {cpu:{max_pool_id_len}}', end='') +print(' dfl]') + +for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(), 'list'): + print(f'{wq.name.string_().decode()[-24:]:24}', end='') + if wq.flags & WQ_UNBOUND: + if wq.flags & WQ_ORDERED: + print(' ordered', end='') + else: + print(' unbound', end='') + else: + print(' percpu ', end='') + + for cpu in for_each_possible_cpu(prog): + pool_id = per_cpu_ptr(wq.cpu_pwq, cpu)[0].pool.id.value_() + field_len = max(len(str(cpu)), max_pool_id_len) + print(f' {pool_id:{field_len}}', end='') + + if wq.flags & WQ_UNBOUND: + print(f' 
{wq.dfl_pwq.pool.id.value_():{max_pool_id_len}}', end='')
+    print('')
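The pod_cpus and cpumask values in the dump are rendered as hex words covering 32 CPUs each, lowest CPUs in the first word. A standalone C equivalent of the script's cpumask_str() formatting (illustrative only; assumes the CPU list is sorted ascending, as for_each_cpu guarantees):

#include <stdio.h>

/* Format a sorted CPU list as 32-bit hex words, lowest CPUs first,
 * mirroring wq_dump.py's cpumask_str(). */
static void print_cpumask(const unsigned int *cpus, int nr_cpus)
{
	unsigned int v = 0, base = 0;
	int i;

	for (i = 0; i < nr_cpus; i++) {
		while (cpus[i] - base >= 32) {	/* flush full 32-CPU words */
			printf("%#x ", v);
			base += 32;
			v = 0;
		}
		v |= 1u << (cpus[i] - base);
	}
	if (v)
		printf("%08x", v);	/* final word, zero padded */
	printf("\n");
}

int main(void)
{
	unsigned int cpus[] = { 0, 1, 2, 3 };

	print_cpumask(cpus, 4);		/* prints 0000000f, as in the dump above */
	return 0;
}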
From patchwork Fri May 19 00:17:01 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96127
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 16/24] workqueue: Modularize wq_pod_type initialization
Date: Thu, 18 May 2023 14:17:01 -1000
Message-Id: <20230519001709.2563-17-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>

While wq_pod_type[] can now group CPUs in any arbitrary way, WQ_AFFN_NUMA
init is hard coded into workqueue_init_topology(). This patch modularizes
the init path by introducing init_pod_type() which takes a callback to
determine whether two CPUs should share a pod as an argument.

init_pod_type() first scans the CPU combinations testing for sharing to
assign consecutive pod IDs and initialize pod_type->cpu_pod[]. Once
->cpu_pod[] is determined, ->pod_cpus[] and ->pod_node[] are initialized
accordingly.

WQ_AFFN_NUMA is now initialized by calling init_pod_type() with
cpus_share_numa() which tests whether two CPUs belong to the same NUMA
node.

This patch may change the pod ID assigned to each NUMA node but that
shouldn't cause any behavior changes as the NUMA node to use for
allocations is tracked separately in pod_type->pod_node[]. This makes
adding new affinity types pretty easy.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 84 +++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 34 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dae1787833cb..1734b8a11a4c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6395,6 +6395,54 @@ void __init workqueue_init(void)
 	wq_watchdog_init();
 }
 
+/*
+ * Initialize @pt by first initializing @pt->cpu_pod[] with pod IDs according to
+ * @cpus_share_pod(). Each subset of CPUs that share a pod is assigned a unique
+ * and consecutive pod ID. The rest of @pt is initialized accordingly.
+ */ +static void __init init_pod_type(struct wq_pod_type *pt, + bool (*cpus_share_pod)(int, int)) +{ + int cur, pre, cpu, pod; + + pt->nr_pods = 0; + + /* init @pt->cpu_pod[] according to @cpus_share_pod() */ + pt->cpu_pod = kcalloc(nr_cpu_ids, sizeof(pt->cpu_pod[0]), GFP_KERNEL); + BUG_ON(!pt->cpu_pod); + + for_each_possible_cpu(cur) { + for_each_possible_cpu(pre) { + if (pre >= cur) { + pt->cpu_pod[cur] = pt->nr_pods++; + break; + } + if (cpus_share_pod(cur, pre)) { + pt->cpu_pod[cur] = pt->cpu_pod[pre]; + break; + } + } + } + + /* init the rest to match @pt->cpu_pod[] */ + pt->pod_cpus = kcalloc(pt->nr_pods, sizeof(pt->pod_cpus[0]), GFP_KERNEL); + pt->pod_node = kcalloc(pt->nr_pods, sizeof(pt->pod_node[0]), GFP_KERNEL); + BUG_ON(!pt->pod_cpus || !pt->pod_node); + + for (pod = 0; pod < pt->nr_pods; pod++) + BUG_ON(!zalloc_cpumask_var(&pt->pod_cpus[pod], GFP_KERNEL)); + + for_each_possible_cpu(cpu) { + cpumask_set_cpu(cpu, pt->pod_cpus[pt->cpu_pod[cpu]]); + pt->pod_node[pt->cpu_pod[cpu]] = cpu_to_node(cpu); + } +} + +static bool __init cpus_share_numa(int cpu0, int cpu1) +{ + return cpu_to_node(cpu0) == cpu_to_node(cpu1); +} + /** * workqueue_init_topology - initialize CPU pods for unbound workqueues * @@ -6404,45 +6452,13 @@ void __init workqueue_init(void) */ void __init workqueue_init_topology(void) { - struct wq_pod_type *pt = &wq_pod_types[WQ_AFFN_NUMA]; struct workqueue_struct *wq; - int node, cpu; - - if (num_possible_nodes() <= 1) - return; + int cpu; - for_each_possible_cpu(cpu) { - if (WARN_ON(cpu_to_node(cpu) == NUMA_NO_NODE)) { - pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu); - return; - } - } + init_pod_type(&wq_pod_types[WQ_AFFN_NUMA], cpus_share_numa); mutex_lock(&wq_pool_mutex); - /* - * We want masks of possible CPUs of each node which isn't readily - * available. Build one from cpu_to_node() which should have been - * fully initialized by now. - */ - pt->pod_cpus = kcalloc(nr_node_ids, sizeof(pt->pod_cpus[0]), GFP_KERNEL); - pt->pod_node = kcalloc(nr_node_ids, sizeof(pt->pod_node[0]), GFP_KERNEL); - pt->cpu_pod = kcalloc(nr_cpu_ids, sizeof(pt->cpu_pod[0]), GFP_KERNEL); - BUG_ON(!pt->pod_cpus || !pt->pod_node || !pt->cpu_pod); - - for_each_node(node) - BUG_ON(!zalloc_cpumask_var_node(&pt->pod_cpus[node], GFP_KERNEL, - node_online(node) ? node : NUMA_NO_NODE)); - - for_each_possible_cpu(cpu) { - node = cpu_to_node(cpu); - cpumask_set_cpu(cpu, pt->pod_cpus[node]); - pt->pod_node[node] = node; - pt->cpu_pod[cpu] = node; - } - - pt->nr_pods = nr_node_ids; - /* * Workqueues allocated earlier would have all CPUs sharing the default * worker pool. 
Explicitly call wq_update_pod() on all workqueue and CPU
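The pod-ID assignment in init_pod_type() is a pairwise-equivalence scan: each CPU either joins the pod of the first earlier CPU it shares with, or opens a new pod. A self-contained toy version of the same loop (userspace; the group-of-four share predicate is made up for the demo):

#include <stdio.h>

#define NR_CPUS 8

/* Toy stand-in for a cpus_share_pod() callback: pretend CPUs share a
 * pod when they sit in the same group of four (say, one L3 per four). */
static int toy_cpus_share_pod(int a, int b)
{
	return a / 4 == b / 4;
}

int main(void)
{
	int cpu_pod[NR_CPUS];
	int nr_pods = 0;
	int cur, pre, cpu;

	/* same scan as init_pod_type(): first match wins, else a new pod */
	for (cur = 0; cur < NR_CPUS; cur++) {
		for (pre = 0; pre <= cur; pre++) {
			if (pre >= cur) {
				cpu_pod[cur] = nr_pods++;
				break;
			}
			if (toy_cpus_share_pod(cur, pre)) {
				cpu_pod[cur] = cpu_pod[pre];
				break;
			}
		}
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d -> pod%d\n", cpu, cpu_pod[cpu]);
	/* prints pods 0,0,0,0,1,1,1,1 */
	return 0;
}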
From patchwork Fri May 19 00:17:02 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96118
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com, brho@google.com, briannorris@chromium.org, nhuck@google.com, agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 17/24] workqueue: Add multiple affinity scopes and interface to select them
Date: Thu, 18 May 2023 14:17:02 -1000
Message-Id: <20230519001709.2563-18-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>

Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make
CACHE the default. The code changes to actually add the additional scopes
are trivial.

Also add module parameter "workqueue.default_affinity_scope" to override
the default scope and "affinity_scope" sysfs file to configure it per
workqueue. wq_dump.py and the documentation are updated accordingly.

This enables significant flexibility in configuring how unbound workqueues
behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu
workqueue. On the other hand, "system" removes all locality boundaries.

Many modern machines often have multiple L3 caches while being mostly
uniform in terms of memory access. Thus, workqueue's previous behavior of
spreading work items in each NUMA node had negative performance
implications from unnecessarily crossing L3 boundaries between issue and
execution. However, picking a finer grained affinity scope also has a
downside in that an issuer in one group can't utilize CPUs in other
groups.

While dependent on the specifics of workload, there's usually a noticeable
penalty in crossing L3 boundaries, so let's default to CACHE. This issue
will be further addressed and documented with examples in future patches.
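Concretely, the default scope can be set at boot with workqueue.default_affinity_scope= (or later through /sys/module/workqueue/parameters/default_affinity_scope), and a WQ_SYSFS workqueue can be retargeted at runtime through its affinity_scope file. A userspace sketch of the latter; the path follows the sysfs comment in the diff below and "writeback" is only an example workqueue name:

#include <stdio.h>

/* Illustrative: switch a WQ_SYSFS workqueue's affinity scope at runtime. */
static int set_affinity_scope(const char *wq_name, const char *scope)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/workqueue/devices/%s/affinity_scope", wq_name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs(scope, f);	/* one of cpu, smt, cache, numa, system */
	return fclose(f);
}

int main(void)
{
	return set_affinity_scope("writeback", "numa");
}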
Signed-off-by: Tejun Heo
---
 .../admin-guide/kernel-parameters.txt |  12 ++
 Documentation/core-api/workqueue.rst  |  63 ++++++++++
 include/linux/workqueue.h             |   5 +-
 kernel/workqueue.c                    | 110 +++++++++++++++++-
 tools/workqueue/wq_dump.py            |  15 ++-
 5 files changed, 193 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 042275425c32..0aa7fd68a024 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6958,6 +6958,18 @@
 			The default value of this parameter is determined by
 			the config option CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
 
+	workqueue.default_affinity_scope=
+			Select the default affinity scope to use for unbound
+			workqueues. Can be one of "cpu", "smt", "cache",
+			"numa" and "system". Default is "cache". For more
+			information, see the Affinity Scopes section in
+			Documentation/core-api/workqueue.rst.
+
+			This can be updated after boot through the matching
+			file under /sys/module/workqueue/parameters.
+			However, the changed default will only apply to
+			unbound workqueues created afterwards.
+
 	workqueue.debug_force_rr_cpu
 			Workqueue used to implicitly guarantee that work
 			items queued without explicit CPU specified are put
diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index c9e46acd339b..56af317508c9 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -347,6 +347,51 @@ Guidelines
 level of locality in wq operations and work item execution.
 
+Affinity Scopes
+===============
+
+An unbound workqueue groups CPUs according to its affinity scope to improve
+cache locality. For example, if a workqueue is using the default affinity
+scope of "cache", it will group CPUs according to last level cache
+boundaries. A work item queued on the workqueue will be processed by a
+worker running on one of the CPUs which share the last level cache with the
+issuing CPU.
+
+Workqueue currently supports the following five affinity scopes.
+
+``cpu``
+  CPUs are not grouped. A work item issued on one CPU is processed by a
+  worker on the same CPU. This makes unbound workqueues behave as per-cpu
+  workqueues without concurrency management.
+
+``smt``
+  CPUs are grouped according to SMT boundaries. This usually means that the
+  logical threads of each physical CPU core are grouped together.
+
+``cache``
+  CPUs are grouped according to cache boundaries. Which specific cache
+  boundary is used is determined by the arch code. L3 is used in a lot of
+  cases. This is the default affinity scope.
+
+``numa``
+  CPUs are grouped according to NUMA boundaries.
+
+``system``
+  All CPUs are put in the same group. Workqueue makes no effort to process a
+  work item on a CPU close to the issuing CPU.
+
+The default affinity scope can be changed with the module parameter
+``workqueue.default_affinity_scope`` and a specific workqueue's affinity
+scope can be changed using ``apply_workqueue_attrs()``.
+
+If ``WQ_SYSFS`` is set, the workqueue will have the following affinity scope
+related interface files under its ``/sys/devices/virtual/WQ_NAME/``
+directory.
+
+``affinity_scope``
+  Read to see the current affinity scope. Write to change.
+ + Examining Configuration ======================= @@ -358,6 +403,24 @@ Use tools/workqueue/wq_dump.py to examine unbound CPU affinity =============== wq_unbound_cpumask=0000000f + CPU + nr_pods 4 + pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 + pod_node [0]=0 [1]=0 [2]=1 [3]=1 + cpu_pod [0]=0 [1]=1 [2]=2 [3]=3 + + SMT + nr_pods 4 + pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 + pod_node [0]=0 [1]=0 [2]=1 [3]=1 + cpu_pod [0]=0 [1]=1 [2]=2 [3]=3 + + CACHE (default) + nr_pods 2 + pod_cpus [0]=00000003 [1]=0000000c + pod_node [0]=0 [1]=1 + cpu_pod [0]=0 [1]=0 [2]=1 [3]=1 + NUMA nr_pods 2 pod_cpus [0]=00000003 [1]=0000000c diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index a2f826b6ec9a..a01b5dcbbeb9 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -125,12 +125,15 @@ struct rcu_work { }; enum wq_affn_scope { + WQ_AFFN_CPU, /* one pod per CPU */ + WQ_AFFN_SMT, /* one pod poer SMT */ + WQ_AFFN_CACHE, /* one pod per LLC */ WQ_AFFN_NUMA, /* one pod per NUMA node */ WQ_AFFN_SYSTEM, /* one pod across the whole system */ WQ_AFFN_NR_TYPES, - WQ_AFFN_DFL = WQ_AFFN_NUMA, + WQ_AFFN_DFL = WQ_AFFN_CACHE, }; /** diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 1734b8a11a4c..bb0900602408 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -337,6 +337,15 @@ struct wq_pod_type { }; static struct wq_pod_type wq_pod_types[WQ_AFFN_NR_TYPES]; +static enum wq_affn_scope wq_affn_dfl = WQ_AFFN_DFL; + +static const char *wq_affn_names[WQ_AFFN_NR_TYPES] = { + [WQ_AFFN_CPU] = "cpu", + [WQ_AFFN_SMT] = "smt", + [WQ_AFFN_CACHE] = "cache", + [WQ_AFFN_NUMA] = "numa", + [WQ_AFFN_SYSTEM] = "system", +}; /* * Per-cpu work items which run for longer than the following threshold are @@ -3644,7 +3653,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(void) goto fail; cpumask_copy(attrs->cpumask, cpu_possible_mask); - attrs->affn_scope = WQ_AFFN_DFL; + attrs->affn_scope = wq_affn_dfl; return attrs; fail: free_workqueue_attrs(attrs); @@ -5721,19 +5730,55 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask) return ret; } +static int parse_affn_scope(const char *val) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(wq_affn_names); i++) { + if (!strncasecmp(val, wq_affn_names[i], strlen(wq_affn_names[i]))) + return i; + } + return -EINVAL; +} + +static int wq_affn_dfl_set(const char *val, const struct kernel_param *kp) +{ + int affn; + + affn = parse_affn_scope(val); + if (affn < 0) + return affn; + + wq_affn_dfl = affn; + return 0; +} + +static int wq_affn_dfl_get(char *buffer, const struct kernel_param *kp) +{ + return scnprintf(buffer, PAGE_SIZE, "%s\n", wq_affn_names[wq_affn_dfl]); +} + +static const struct kernel_param_ops wq_affn_dfl_ops = { + .set = wq_affn_dfl_set, + .get = wq_affn_dfl_get, +}; + +module_param_cb(default_affinity_scope, &wq_affn_dfl_ops, NULL, 0644); + #ifdef CONFIG_SYSFS /* * Workqueues with WQ_SYSFS flag set is visible to userland via * /sys/bus/workqueue/devices/WQ_NAME. All visible workqueues have the * following attributes. * - * per_cpu RO bool : whether the workqueue is per-cpu or unbound - * max_active RW int : maximum number of in-flight work items + * per_cpu RO bool : whether the workqueue is per-cpu or unbound + * max_active RW int : maximum number of in-flight work items * * Unbound workqueues have the following extra attributes. 
  *
- * nice	RW int	: nice value of the workers
- * cpumask	RW mask	: bitmask of allowed CPUs for the workers
+ *  nice		RW int	: nice value of the workers
+ *  cpumask		RW mask	: bitmask of allowed CPUs for the workers
+ *  affinity_scope	RW str	: worker CPU affinity scope (cache, numa, none)
  */
 struct wq_device {
 	struct workqueue_struct		*wq;
@@ -5876,9 +5921,47 @@ static ssize_t wq_cpumask_store(struct device *dev,
 	return ret ?: count;
 }
 
+static ssize_t wq_affn_scope_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+	int written;
+
+	mutex_lock(&wq->mutex);
+	written = scnprintf(buf, PAGE_SIZE, "%s\n",
+			    wq_affn_names[wq->unbound_attrs->affn_scope]);
+	mutex_unlock(&wq->mutex);
+
+	return written;
+}
+
+static ssize_t wq_affn_scope_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+	struct workqueue_attrs *attrs;
+	int affn, ret = -ENOMEM;
+
+	affn = parse_affn_scope(buf);
+	if (affn < 0)
+		return affn;
+
+	apply_wqattrs_lock();
+	attrs = wq_sysfs_prep_attrs(wq);
+	if (attrs) {
+		attrs->affn_scope = affn;
+		ret = apply_workqueue_attrs_locked(wq, attrs);
+	}
+	apply_wqattrs_unlock();
+	free_workqueue_attrs(attrs);
+	return ret ?: count;
+}
+
 static struct device_attribute wq_sysfs_unbound_attrs[] = {
 	__ATTR(nice, 0644, wq_nice_show, wq_nice_store),
 	__ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
+	__ATTR(affinity_scope, 0644, wq_affn_scope_show, wq_affn_scope_store),
 	__ATTR_NULL,
 };
 
@@ -6438,6 +6521,20 @@ static void __init init_pod_type(struct wq_pod_type *pt,
 	}
 }
 
+static bool __init cpus_dont_share(int cpu0, int cpu1)
+{
+	return false;
+}
+
+static bool __init cpus_share_smt(int cpu0, int cpu1)
+{
+#ifdef CONFIG_SCHED_SMT
+	return cpumask_test_cpu(cpu0, cpu_smt_mask(cpu1));
+#else
+	return false;
+#endif
+}
+
 static bool __init cpus_share_numa(int cpu0, int cpu1)
 {
 	return cpu_to_node(cpu0) == cpu_to_node(cpu1);
@@ -6455,6 +6552,9 @@ void __init workqueue_init_topology(void)
 	struct workqueue_struct *wq;
 	int cpu;
 
+	init_pod_type(&wq_pod_types[WQ_AFFN_CPU], cpus_dont_share);
+	init_pod_type(&wq_pod_types[WQ_AFFN_SMT], cpus_share_smt);
+	init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);
 	init_pod_type(&wq_pod_types[WQ_AFFN_NUMA], cpus_share_numa);
 
 	mutex_lock(&wq_pool_mutex);

diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py
index ddd0bb4395ea..43ab71a193b8 100644
--- a/tools/workqueue/wq_dump.py
+++ b/tools/workqueue/wq_dump.py
@@ -78,11 +78,16 @@ worker_pool_idr         = prog['worker_pool_idr']
 workqueues              = prog['workqueues']
 wq_unbound_cpumask      = prog['wq_unbound_cpumask']
 wq_pod_types            = prog['wq_pod_types']
+wq_affn_dfl             = prog['wq_affn_dfl']
+wq_affn_names           = prog['wq_affn_names']
 
 WQ_UNBOUND              = prog['WQ_UNBOUND']
 WQ_ORDERED              = prog['__WQ_ORDERED']
 WQ_MEM_RECLAIM          = prog['WQ_MEM_RECLAIM']
 
+WQ_AFFN_CPU             = prog['WQ_AFFN_CPU']
+WQ_AFFN_SMT             = prog['WQ_AFFN_SMT']
+WQ_AFFN_CACHE           = prog['WQ_AFFN_CACHE']
 WQ_AFFN_NUMA            = prog['WQ_AFFN_NUMA']
 WQ_AFFN_SYSTEM          = prog['WQ_AFFN_SYSTEM']
 
@@ -109,12 +114,10 @@ print(f'wq_unbound_cpumask={cpumask_str(wq_unbound_cpumask)}')
         print(f' [{cpu}]={pt.cpu_pod[cpu].value_()}', end='')
     print('')
 
-print('')
-print('NUMA')
-print_pod_type(wq_pod_types[WQ_AFFN_NUMA])
-print('')
-print('SYSTEM')
-print_pod_type(wq_pod_types[WQ_AFFN_SYSTEM])
+for affn in [WQ_AFFN_CPU, WQ_AFFN_SMT, WQ_AFFN_CACHE, WQ_AFFN_NUMA, WQ_AFFN_SYSTEM]:
+    print('')
+    print(f'{wq_affn_names[affn].string_().decode().upper()}{" (default)" if affn == wq_affn_dfl else ""}')
+    print_pod_type(wq_pod_types[affn])
 
 print('')
 print('Worker Pools')
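To make the pod-building logic above concrete, here is a userspace toy model of init_pod_type()'s grouping idea. It is not from the series; the predicate and the 4-CPU, two-L3 topology are invented to match the CACHE example in the dump output above.

#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS 4

/* pretend CPUs 0-1 share one LLC and CPUs 2-3 share another */
static bool cpus_share_cache(int a, int b) { return a / 2 == b / 2; }

int main(void)
{
	int cpu_pod[NR_CPUS], nr_pods = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int pod = nr_pods;

		/* join the pod of the first earlier CPU we share the scope with */
		for (int prev = 0; prev < cpu; prev++) {
			if (cpus_share_cache(cpu, prev)) {
				pod = cpu_pod[prev];
				break;
			}
		}
		if (pod == nr_pods)	/* nobody shares: open a new pod */
			nr_pods++;
		cpu_pod[cpu] = pod;
	}

	/* prints nr_pods 2 and cpu_pod 0 0 1 1, matching the CACHE dump */
	printf("nr_pods %d\ncpu_pod", nr_pods);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf(" [%d]=%d", cpu, cpu_pod[cpu]);
	printf("\n");
	return 0;
}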
From patchwork Fri May 19 00:17:03 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96133
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 18/24] workqueue: Factor out work to worker assignment and
 collision handling
Date: Thu, 18 May 2023 14:17:03 -1000
Message-Id: <20230519001709.2563-19-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

The two work execution paths in worker_thread() and rescuer_thread() use
move_linked_works() to claim work items from @pool->worklist. Once claimed,
process_scheduled_works() is called, which invokes process_one_work() on
each work item. process_one_work() then uses find_worker_executing_work()
to detect and handle collisions - situations where the work item to be
executed is still running on another worker.

This works fine, but, to improve work execution locality, we want to
establish the work to worker association earlier and know for sure that the
worker is going to execute the work once assigned, which requires
performing collision handling earlier while trying to assign the work item
to the worker.

This patch introduces assign_work() which assigns a work item to a worker
using move_linked_works() and then performs collision handling. As
collision handling is performed earlier, process_one_work() no longer needs
to worry about collisions.

After this patch, collision checks for linked work items are skipped, which
should be fine as they can't be queued multiple times concurrently. For
work items running from rescuers, the timing of collision handling may
change but the invariant that the work items go through collision handling
before starting execution does not.

This patch shouldn't cause noticeable behavior changes, especially given
that worker_thread() behavior remains the same.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 80 ++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bb0900602408..a2e6c2be3a06 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1015,13 +1015,10 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
  * @head: target list to append @work to
  * @nextp: out parameter for nested worklist walking
  *
- * Schedule linked works starting from @work to @head.  Work series to
- * be scheduled starts at @work and includes any consecutive work with
- * WORK_STRUCT_LINKED set in its predecessor.
- *
- * If @nextp is not NULL, it's updated to point to the next work of
- * the last scheduled work.  This allows move_linked_works() to be
- * nested inside outer list_for_each_entry_safe().
+ * Schedule linked works starting from @work to @head. Work series to be
+ * scheduled starts at @work and includes any consecutive work with
+ * WORK_STRUCT_LINKED set in its predecessor. See assign_work() for details on
+ * @nextp.
  *
  * CONTEXT:
  * raw_spin_lock_irq(pool->lock).
@@ -1050,6 +1047,48 @@ static void move_linked_works(struct work_struct *work, struct list_head *head,
 		*nextp = n;
 }
 
+/**
+ * assign_work - assign a work item and its linked work items to a worker
+ * @work: work to assign
+ * @worker: worker to assign to
+ * @nextp: out parameter for nested worklist walking
+ *
+ * Assign @work and its linked work items to @worker. If @work is already being
+ * executed by another worker in the same pool, it'll be punted there.
+ *
+ * If @nextp is not NULL, it's updated to point to the next work of the last
+ * scheduled work. This allows assign_work() to be nested inside
+ * list_for_each_entry_safe().
+ *
+ * Returns %true if @work was successfully assigned to @worker. %false if @work
+ * was punted to another worker already executing it.
+ */
+static bool assign_work(struct work_struct *work, struct worker *worker,
+			struct work_struct **nextp)
+{
+	struct worker_pool *pool = worker->pool;
+	struct worker *collision;
+
+	lockdep_assert_held(&pool->lock);
+
+	/*
+	 * A single work shouldn't be executed concurrently by multiple workers.
+	 * __queue_work() ensures that @work doesn't jump to a different pool
+	 * while still running in the previous pool. Here, we should ensure that
+	 * @work is not executed concurrently by multiple workers from the same
+	 * pool. Check whether anyone is already processing the work. If so,
+	 * defer the work to the currently executing one.
+	 */
+	collision = find_worker_executing_work(pool, work);
+	if (unlikely(collision)) {
+		move_linked_works(work, &collision->scheduled, nextp);
+		return false;
+	}
+
+	move_linked_works(work, &worker->scheduled, nextp);
+	return true;
+}
+
 /**
  * wake_up_worker - wake up an idle worker
  * @pool: worker pool to wake worker from
@@ -2442,7 +2481,6 @@ __acquires(&pool->lock)
 	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct worker_pool *pool = worker->pool;
 	unsigned long work_data;
-	struct worker *collision;
 #ifdef CONFIG_LOCKDEP
 	/*
 	 * It is permissible to free the struct work_struct from
@@ -2459,18 +2497,6 @@ __acquires(&pool->lock)
 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
 		     raw_smp_processor_id() != pool->cpu);
 
-	/*
-	 * A single work shouldn't be executed concurrently by
-	 * multiple workers on a single cpu.  Check whether anyone is
-	 * already processing the work.  If so, defer the work to the
-	 * currently executing one.
-	 */
-	collision = find_worker_executing_work(pool, work);
-	if (unlikely(collision)) {
-		move_linked_works(work, &collision->scheduled, NULL);
-		return;
-	}
-
 	/* claim and dequeue */
 	debug_work_deactivate(work);
 	hash_add(pool->busy_hash, &worker->hentry, (unsigned long)work);
@@ -2697,8 +2723,8 @@ static int worker_thread(void *__worker)
 			list_first_entry(&pool->worklist,
 					 struct work_struct, entry);
 
-			move_linked_works(work, &worker->scheduled, NULL);
-			process_scheduled_works(worker);
+			if (assign_work(work, worker, NULL))
+				process_scheduled_works(worker);
 		} while (keep_working(pool));
 
 		worker_set_flags(worker, WORKER_PREP);
@@ -2742,7 +2768,6 @@ static int rescuer_thread(void *__rescuer)
 {
 	struct worker *rescuer = __rescuer;
 	struct workqueue_struct *wq = rescuer->rescue_wq;
-	struct list_head *scheduled = &rescuer->scheduled;
 	bool should_stop;
 
 	set_user_nice(current, RESCUER_NICE_LEVEL);
@@ -2787,15 +2812,14 @@ static int rescuer_thread(void *__rescuer)
 		 * Slurp in all works issued via this workqueue and
 		 * process'em.
 		 */
-		WARN_ON_ONCE(!list_empty(scheduled));
+		WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
 		list_for_each_entry_safe(work, n, &pool->worklist, entry) {
-			if (get_work_pwq(work) == pwq) {
-				move_linked_works(work, scheduled, &n);
+			if (get_work_pwq(work) == pwq &&
+			    assign_work(work, rescuer, &n))
 				pwq->stats[PWQ_STAT_RESCUED]++;
-			}
 		}
 
-		if (!list_empty(scheduled)) {
+		if (!list_empty(&rescuer->scheduled)) {
 			process_scheduled_works(rescuer);
 
 			/*
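A self-contained toy model of the collision rule that assign_work() centralizes in the patch above. This is illustrative userspace code, not kernel code: plain arrays stand in for the pool's busy hash, and the "executing" field is set at assignment time for brevity, whereas the kernel records it later in process_one_work().

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

struct worker { int id; const char *executing; int n_scheduled; };

static struct worker pool[3] = { {0}, {1}, {2} };

/* stand-in for find_worker_executing_work(): who already runs @work? */
static struct worker *find_worker_executing(const char *work)
{
	for (size_t i = 0; i < 3; i++)
		if (pool[i].executing == work)
			return &pool[i];
	return NULL;
}

static bool assign_work(const char *work, struct worker *worker)
{
	struct worker *collision = find_worker_executing(work);

	if (collision) {	/* punt: keep each work item single-threaded */
		collision->n_scheduled++;
		printf("%s punted to worker %d\n", work, collision->id);
		return false;
	}
	worker->n_scheduled++;
	worker->executing = work;
	printf("%s assigned to worker %d\n", work, worker->id);
	return true;
}

int main(void)
{
	const char *itemA = "itemA";

	assign_work(itemA, &pool[0]);	/* assigned to worker 0 */
	assign_work(itemA, &pool[1]);	/* collision: punted to worker 0 */
	return 0;
}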
From patchwork Fri May 19 00:17:04 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96137
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 19/24] workqueue: Factor out need_more_worker() check and
 worker wake-up
Date: Thu, 18 May 2023 14:17:04 -1000
Message-Id: <20230519001709.2563-20-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

Checking need_more_worker() and calling wake_up_worker() is a repeated
pattern. Let's add kick_pool(), which checks need_more_worker() and
open-codes wake_up_worker(), and replace the wake_up_worker() uses with it.

The following conversions aren't one-to-one:

* __queue_work() was using __need_more_work() because it knows that
  pool->worklist isn't empty. Switching to kick_pool() adds an extra
  list_empty() test.

* create_worker() always needs to wake up the newly minted worker whether
  there's more work to do or not to avoid triggering hung task check on the
  new task. Keep the current wake_up_process() and still add kick_pool().
  This may lead to an extra wakeup which isn't harmful.

* pwq_adjust_max_active() was explicitly checking whether it needs to wake
  up a worker or not to avoid spurious wakeups. As kick_pool() only wakes up
  a worker when necessary, this explicit check is no longer necessary and
  dropped.

* unbind_workers() now calls kick_pool() instead of wake_up_worker(),
  adding a need_more_worker() test. This avoids spurious wakeups and
  shouldn't break anything.

wake_up_worker() is dropped as kick_pool() replaces all its users. After
this patch, all paths that wake up a non-rescuer worker to initiate work
item execution use kick_pool(). This will enable future changes to improve
locality.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 87 ++++++++++++++++++++--------------------------
 1 file changed, 37 insertions(+), 50 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a2e6c2be3a06..58aec5cc5722 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -805,11 +805,6 @@ static bool work_is_canceling(struct work_struct *work)
  * they're being called with pool->lock held.
  */
 
-static bool __need_more_worker(struct worker_pool *pool)
-{
-	return !pool->nr_running;
-}
-
 /*
  * Need to wake up a worker?  Called from anything but currently
  * running workers.
@@ -820,7 +815,7 @@ static bool __need_more_worker(struct worker_pool *pool)
  */
 static bool need_more_worker(struct worker_pool *pool)
 {
-	return !list_empty(&pool->worklist) && __need_more_worker(pool);
+	return !list_empty(&pool->worklist) && !pool->nr_running;
 }
 
 /* Can I start working?  Called from busy but !running workers. */
@@ -1090,20 +1085,23 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 }
 
 /**
- * wake_up_worker - wake up an idle worker
- * @pool: worker pool to wake worker from
- *
- * Wake up the first idle worker of @pool.
+ * kick_pool - wake up an idle worker if necessary
+ * @pool: pool to kick
  *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock).
+ * @pool may have pending work items. Wake up worker if necessary. Returns
+ * whether a worker was woken up.
  */
-static void wake_up_worker(struct worker_pool *pool)
+static bool kick_pool(struct worker_pool *pool)
 {
 	struct worker *worker = first_idle_worker(pool);
 
-	if (likely(worker))
-		wake_up_process(worker->task);
+	lockdep_assert_held(&pool->lock);
+
+	if (!need_more_worker(pool) || !worker)
+		return false;
+
+	wake_up_process(worker->task);
+	return true;
 }
 
 #ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
@@ -1271,10 +1269,9 @@ void wq_worker_sleeping(struct task_struct *task)
 	}
 
 	pool->nr_running--;
-	if (need_more_worker(pool)) {
+	if (kick_pool(pool))
 		worker->current_pwq->stats[PWQ_STAT_CM_WAKEUP]++;
-		wake_up_worker(pool);
-	}
+
 	raw_spin_unlock_irq(&pool->lock);
 }
 
@@ -1312,10 +1309,8 @@ void wq_worker_tick(struct task_struct *task)
 		wq_cpu_intensive_report(worker->current_func);
 		pwq->stats[PWQ_STAT_CPU_INTENSIVE]++;
 
-		if (need_more_worker(pool)) {
+		if (kick_pool(pool))
 			pwq->stats[PWQ_STAT_CM_WAKEUP]++;
-			wake_up_worker(pool);
-		}
 
 		raw_spin_unlock(&pool->lock);
 	}
@@ -1752,9 +1747,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 		trace_workqueue_activate_work(work);
 		pwq->nr_active++;
 		insert_work(pwq, work, &pool->worklist, work_flags);
-
-		if (__need_more_worker(pool))
-			wake_up_worker(pool);
+		kick_pool(pool);
 	} else {
 		work_flags |= WORK_STRUCT_INACTIVE;
 		insert_work(pwq, work, &pwq->inactive_works, work_flags);
@@ -2160,9 +2153,18 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	/* start the newly created worker */
 	raw_spin_lock_irq(&pool->lock);
+
 	worker->pool->nr_workers++;
 	worker_enter_idle(worker);
+	kick_pool(pool);
+
+	/*
+	 * @worker is waiting on a completion in kthread() and will trigger hung
+	 * check if not woken up soon. As kick_pool() might not have woken it
+	 * up, wake it up explicitly once more.
+	 */
 	wake_up_process(worker->task);
+
 	raw_spin_unlock_irq(&pool->lock);
 
 	return worker;
@@ -2525,14 +2527,12 @@ __acquires(&pool->lock)
 		worker_set_flags(worker, WORKER_CPU_INTENSIVE);
 
 	/*
-	 * Wake up another worker if necessary.  The condition is always
-	 * false for normal per-cpu workers since nr_running would always
-	 * be >= 1 at this point.  This is used to chain execution of the
-	 * pending work items for WORKER_NOT_RUNNING workers such as the
-	 * UNBOUND and CPU_INTENSIVE ones.
+	 * Kick @pool if necessary. It's always noop for per-cpu worker pools
+	 * since nr_running would always be >= 1 at this point. This is used to
+	 * chain execution of the pending work items for WORKER_NOT_RUNNING
+	 * workers such as the UNBOUND and CPU_INTENSIVE ones.
 	 */
-	if (need_more_worker(pool))
-		wake_up_worker(pool);
+	kick_pool(pool);
 
 	/*
 	 * Record the last pool and clear PENDING which should be the last
@@ -2852,12 +2852,10 @@ static int rescuer_thread(void *__rescuer)
 			put_pwq(pwq);
 
 			/*
-			 * Leave this pool. If need_more_worker() is %true, notify a
-			 * regular worker; otherwise, we end up with 0 concurrency
-			 * and stalling the execution.
+			 * Leave this pool. Notify regular workers; otherwise, we end up
+			 * with 0 concurrency and stalling the execution.
 			 */
-			if (need_more_worker(pool))
-				wake_up_worker(pool);
+			kick_pool(pool);
 
 			raw_spin_unlock_irq(&pool->lock);
 
@@ -4068,24 +4066,13 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
 	 * is updated and visible.
 	 */
 	if (!freezable || !workqueue_freezing) {
-		bool kick = false;
-
 		pwq->max_active = wq->saved_max_active;
 
 		while (!list_empty(&pwq->inactive_works) &&
-		       pwq->nr_active < pwq->max_active) {
+		       pwq->nr_active < pwq->max_active)
 			pwq_activate_first_inactive(pwq);
-			kick = true;
-		}
 
-		/*
-		 * Need to kick a worker after thawed or an unbound wq's
-		 * max_active is bumped. In realtime scenarios, always kicking a
-		 * worker will cause interference on the isolated cpu cores, so
-		 * let's kick iff work items were activated.
-		 */
-		if (kick)
-			wake_up_worker(pwq->pool);
+		kick_pool(pwq->pool);
 	} else {
 		pwq->max_active = 0;
 	}
@@ -5351,7 +5338,7 @@ static void unbind_workers(int cpu)
 		 * worker blocking could lead to lengthy stalls.  Kick off
 		 * unbound chain execution of currently pending work items.
 		 */
-		wake_up_worker(pool);
+		kick_pool(pool);
 
 		raw_spin_unlock_irq(&pool->lock);
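The shape of the kick_pool() refactor, as a runnable userspace toy: fold the "need a worker? then wake one" pattern into a single helper that reports whether it woke anyone, so callers can bump their wakeup statistics off the return value. Stubbed types, not kernel code; the field names are invented.

#include <stdio.h>
#include <stdbool.h>

struct pool { int pending; int nr_running; int idle_workers; };

static bool need_more_worker(struct pool *p)
{
	return p->pending > 0 && p->nr_running == 0;
}

static bool kick_pool(struct pool *p)
{
	if (!need_more_worker(p) || p->idle_workers == 0)
		return false;
	p->idle_workers--;
	p->nr_running++;	/* stand-in for wake_up_process() */
	return true;
}

int main(void)
{
	struct pool p = { .pending = 1, .nr_running = 0, .idle_workers = 1 };
	int cm_wakeups = 0;

	if (kick_pool(&p))	/* caller pattern from the patch */
		cm_wakeups++;
	printf("cm_wakeups=%d\n", cm_wakeups);
	return 0;
}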
From patchwork Fri May 19 00:17:05 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96138
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 20/24] workqueue: Add workqueue_attrs->__pod_cpumask
Date: Thu, 18 May 2023 14:17:05 -1000
Message-Id: <20230519001709.2563-21-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

workqueue_attrs has two uses:

* to specify the required unbound workqueue properties by users

* to match worker_pool's properties to workqueues by core code

For example, if the user wants to restrict a workqueue to run only CPUs 0
and 2, and the two CPUs are on different affinity scopes, the workqueue's
attrs->cpumask would contain CPUs 0 and 2, and the workqueue would be
associated with two worker_pools, one with attrs->cpumask containing just
CPU 0 and the other CPU 2.

Workqueue wants to support non-strict affinity scopes where work items are
started in their matching affinity scopes but the scheduler is free to
migrate them outside the starting scopes, which can enable utilizing the
whole machine while maintaining most of the locality benefits from affinity
scopes. To enable that, worker_pools need to distinguish the strict
affinity that they have to follow (because that's the restriction coming
from the user) and the soft affinity that they want to apply when
dispatching work items. Note that two worker_pools with different soft
dispatching requirements have to be separate; otherwise, for example, we'd
be ping-ponging worker threads across NUMA boundaries constantly.

This patch adds workqueue_attrs->__pod_cpumask. The new field is double
underscored as it's only used internally to distinguish worker_pools. A
worker_pool's ->cpumask is now always the same as the online subset of
allowed CPUs of the associated workqueues, and ->__pod_cpumask is the pod's
subset of that ->cpumask. Going back to the example above, both
worker_pools would have ->cpumask containing both CPUs 0 and 2 but one's
->__pod_cpumask would contain 0 while the other's 2.

* pool_allowed_cpus() is added. It returns the worker_pool's strict cpumask
  that the pool's workers must stay within. This is currently always
  ->__pod_cpumask as all boundaries are still strict.

* As a workqueue_attrs can now track both the associated workqueues'
  cpumask and its per-pod subset, wq_calc_pod_cpumask() no longer needs an
  external out-argument. Drop @cpumask and instead store the result in
  ->__pod_cpumask.
* The above also simplifies apply_wqattrs_prepare() as the same
  workqueue_attrs can be used to create all pods associated with a
  workqueue. tmp_attrs is dropped.

The only user-visible behavior change is that two workqueues with different
cpumasks can no longer share worker_pools even when their pod subsets
coincide. Going back to the example, let's say there's another workqueue
with cpumask 0, 2, 3, where 2 and 3 are in the same pod. It would be mapped
to two worker_pools - one with CPU 0, the other with 2 and 3. The former
has the same cpumask as the first pod of the earlier example and would have
shared the same worker_pool but that's no longer the case after this patch.
The worker_pools would have the same ->__pod_cpumask but their ->cpumask's
wouldn't match.

While this is necessary to support non-strict affinity scopes, there can be
further optimizations to maintain sharing among strict affinity scopes.
However, non-strict affinity scopes are going to be preferable for most use
cases and we don't see very diverse mixture of unbound workqueue cpumasks
anyway, so the additional overhead doesn't seem to justify the extra
complexity.

Signed-off-by: Tejun Heo
---
 include/linux/workqueue.h | 16 +++++++++
 kernel/workqueue.c        | 74 ++++++++++++++++++++-------------------
 2 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a01b5dcbbeb9..7a0fc0919e0a 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -149,9 +149,25 @@ struct workqueue_attrs {
 
 	/**
 	 * @cpumask: allowed CPUs
+	 *
+	 * Work items in this workqueue are affine to these CPUs and not allowed
+	 * to execute on other CPUs. A pool serving a workqueue must have the
+	 * same @cpumask.
 	 */
 	cpumask_var_t cpumask;
 
+	/**
+	 * @__pod_cpumask: internal attribute used to create per-pod pools
+	 *
+	 * Internal use only.
+	 *
+	 * Per-pod unbound worker pools are used to improve locality. Always a
+	 * subset of ->cpumask. A workqueue can be associated with multiple
+	 * worker pools with disjoint @__pod_cpumask's. Whether the enforcement
+	 * of a pool's @__pod_cpumask is strict depends on @affn_strict.
+	 */
+	cpumask_var_t __pod_cpumask;
+
 	/*
 	 * Below fields aren't properties of a worker_pool. They only modify how
	 * :c:func:`apply_workqueue_attrs` select pools and thus don't

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 58aec5cc5722..daebc28d09ab 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2029,6 +2029,11 @@ static struct worker *alloc_worker(int node)
 	return worker;
 }
 
+static cpumask_t *pool_allowed_cpus(struct worker_pool *pool)
+{
+	return pool->attrs->__pod_cpumask;
+}
+
 /**
  * worker_attach_to_pool() - attach a worker to a pool
  * @worker: worker to be attached
@@ -2054,7 +2059,7 @@ static void worker_attach_to_pool(struct worker *worker,
 		kthread_set_per_cpu(worker->task, pool->cpu);
 
 	if (worker->rescue_wq)
-		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		set_cpus_allowed_ptr(worker->task, pool_allowed_cpus(pool));
 
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;
@@ -2146,7 +2151,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	}
 
 	set_user_nice(worker->task, pool->attrs->nice);
-	kthread_bind_mask(worker->task, pool->attrs->cpumask);
+	kthread_bind_mask(worker->task, pool_allowed_cpus(pool));
 
 	/* successful, attach the worker to the pool */
 	worker_attach_to_pool(worker, pool);
@@ -3652,6 +3657,7 @@ void free_workqueue_attrs(struct workqueue_attrs *attrs)
 {
 	if (attrs) {
 		free_cpumask_var(attrs->cpumask);
+		free_cpumask_var(attrs->__pod_cpumask);
 		kfree(attrs);
 	}
 }
@@ -3673,6 +3679,8 @@ struct workqueue_attrs *alloc_workqueue_attrs(void)
 		goto fail;
 	if (!alloc_cpumask_var(&attrs->cpumask, GFP_KERNEL))
 		goto fail;
+	if (!alloc_cpumask_var(&attrs->__pod_cpumask, GFP_KERNEL))
+		goto fail;
 
 	cpumask_copy(attrs->cpumask, cpu_possible_mask);
 	attrs->affn_scope = wq_affn_dfl;
@@ -3687,6 +3695,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 {
 	to->nice = from->nice;
 	cpumask_copy(to->cpumask, from->cpumask);
+	cpumask_copy(to->__pod_cpumask, from->__pod_cpumask);
 
 	/*
 	 * Unlike hash and equality test, copying shouldn't ignore wq-only
@@ -3705,6 +3714,8 @@ static u32 wqattrs_hash(const struct workqueue_attrs *attrs)
 	hash = jhash_1word(attrs->nice, hash);
 	hash = jhash(cpumask_bits(attrs->cpumask),
 		     BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long), hash);
+	hash = jhash(cpumask_bits(attrs->__pod_cpumask),
+		     BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long), hash);
 	return hash;
 }
 
@@ -3716,6 +3727,8 @@ static bool wqattrs_equal(const struct workqueue_attrs *a,
 		return false;
 	if (!cpumask_equal(a->cpumask, b->cpumask))
 		return false;
+	if (!cpumask_equal(a->__pod_cpumask, b->__pod_cpumask))
+		return false;
 	return true;
 }
 
@@ -3952,9 +3965,9 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 		}
 	}
 
-	/* If cpumask is contained inside a NUMA pod, that's our NUMA node */
+	/* If __pod_cpumask is contained inside a NUMA pod, that's our node */
 	for (pod = 0; pod < pt->nr_pods; pod++) {
-		if (cpumask_subset(attrs->cpumask, pt->pod_cpus[pod])) {
+		if (cpumask_subset(attrs->__pod_cpumask, pt->pod_cpus[pod])) {
 			node = pt->pod_node[pod];
 			break;
 		}
@@ -4147,11 +4160,10 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
  * @attrs: the wq_attrs of the default pwq of the target workqueue
  * @cpu: the target CPU
  * @cpu_going_down: if >= 0, the CPU to consider as offline
- * @cpumask: outarg, the resulting cpumask
  *
  * Calculate the cpumask a workqueue with @attrs should use on @pod. If
  * @cpu_going_down is >= 0, that cpu is considered offline during calculation.
- * The result is stored in @cpumask.
+ * The result is stored in @attrs->__pod_cpumask.
 *
 * If pod affinity is not enabled, @attrs->cpumask is always used. If enabled
 * and @pod has online CPUs requested by @attrs, the returned cpumask is the
@@ -4159,27 +4171,27 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
 *
 * The caller is responsible for ensuring that the cpumask of @pod stays stable.
 */
-static void wq_calc_pod_cpumask(const struct workqueue_attrs *attrs, int cpu,
-				int cpu_going_down, cpumask_t *cpumask)
+static void wq_calc_pod_cpumask(struct workqueue_attrs *attrs, int cpu,
+				int cpu_going_down)
 {
 	const struct wq_pod_type *pt = wqattrs_pod_type(attrs);
 	int pod = pt->cpu_pod[cpu];
 
 	/* does @pod have any online CPUs @attrs wants? */
-	cpumask_and(cpumask, pt->pod_cpus[pod], attrs->cpumask);
-	cpumask_and(cpumask, cpumask, cpu_online_mask);
+	cpumask_and(attrs->__pod_cpumask, pt->pod_cpus[pod], attrs->cpumask);
+	cpumask_and(attrs->__pod_cpumask, attrs->__pod_cpumask, cpu_online_mask);
 	if (cpu_going_down >= 0)
-		cpumask_clear_cpu(cpu_going_down, cpumask);
+		cpumask_clear_cpu(cpu_going_down, attrs->__pod_cpumask);
 
-	if (cpumask_empty(cpumask)) {
-		cpumask_copy(cpumask, attrs->cpumask);
+	if (cpumask_empty(attrs->__pod_cpumask)) {
+		cpumask_copy(attrs->__pod_cpumask, attrs->cpumask);
 		return;
 	}
 
 	/* yeap, return possible CPUs in @pod that @attrs wants */
-	cpumask_and(cpumask, attrs->cpumask, pt->pod_cpus[pod]);
+	cpumask_and(attrs->__pod_cpumask, attrs->cpumask, pt->pod_cpus[pod]);
 
-	if (cpumask_empty(cpumask))
+	if (cpumask_empty(attrs->__pod_cpumask))
 		pr_warn_once("WARNING: workqueue cpumask: online intersect > "
 			     "possible intersect\n");
 }
@@ -4233,7 +4245,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 			const cpumask_var_t unbound_cpumask)
 {
 	struct apply_wqattrs_ctx *ctx;
-	struct workqueue_attrs *new_attrs, *tmp_attrs;
+	struct workqueue_attrs *new_attrs;
 	int cpu;
 
 	lockdep_assert_held(&wq_pool_mutex);
@@ -4245,8 +4257,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	ctx = kzalloc(struct_size(ctx, pwq_tbl, nr_cpu_ids), GFP_KERNEL);
 
 	new_attrs = alloc_workqueue_attrs();
-	tmp_attrs = alloc_workqueue_attrs();
-	if (!ctx || !new_attrs || !tmp_attrs)
+	if (!ctx || !new_attrs)
 		goto out_free;
 
 	/*
@@ -4259,13 +4270,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbound_cpumask);
 	if (unlikely(cpumask_empty(new_attrs->cpumask)))
 		cpumask_copy(new_attrs->cpumask, unbound_cpumask);
-
-	/*
-	 * We may create multiple pwqs with differing cpumasks. Make a
-	 * copy of @new_attrs which will be modified and used to obtain
-	 * pools.
-	 */
-	copy_workqueue_attrs(tmp_attrs, new_attrs);
+	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 
 	/*
 	 * If something goes wrong during CPU up/down, we'll fall back to
@@ -4281,8 +4286,8 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 			ctx->dfl_pwq->refcnt++;
 			ctx->pwq_tbl[cpu] = ctx->dfl_pwq;
 		} else {
-			wq_calc_pod_cpumask(new_attrs, cpu, -1, tmp_attrs->cpumask);
-			ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, tmp_attrs);
+			wq_calc_pod_cpumask(new_attrs, cpu, -1);
+			ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, new_attrs);
 			if (!ctx->pwq_tbl[cpu])
 				goto out_free;
 		}
@@ -4291,14 +4296,13 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	/* save the user configured attrs and sanitize it. */
 	copy_workqueue_attrs(new_attrs, attrs);
 	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
+	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->attrs = new_attrs;
 
 	ctx->wq = wq;
-	free_workqueue_attrs(tmp_attrs);
 	return ctx;
 
 out_free:
-	free_workqueue_attrs(tmp_attrs);
 	free_workqueue_attrs(new_attrs);
 	apply_wqattrs_cleanup(ctx);
 	return ERR_PTR(-ENOMEM);
@@ -4423,7 +4427,6 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online)
 	int cpu_off = online ? -1 : cpu;
 	struct pool_workqueue *old_pwq = NULL, *pwq;
 	struct workqueue_attrs *target_attrs;
-	cpumask_t *cpumask;
 
 	lockdep_assert_held(&wq_pool_mutex);
 
@@ -4436,15 +4439,14 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online)
 	 * CPU hotplug exclusion.
 	 */
 	target_attrs = wq_update_pod_attrs_buf;
-	cpumask = target_attrs->cpumask;
-
-	copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
+	copy_workqueue_attrs(target_attrs, wq->dfl_pwq->pool->attrs);
 
 	/* nothing to do if the target cpumask matches the current pwq */
-	wq_calc_pod_cpumask(wq->dfl_pwq->pool->attrs, cpu, cpu_off, cpumask);
+	wq_calc_pod_cpumask(target_attrs, cpu, cpu_off);
 	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
 					lockdep_is_held(&wq_pool_mutex));
-	if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
+	if (cpumask_equal(target_attrs->__pod_cpumask,
+			  pwq->pool->attrs->cpumask))
 		return;
 
 	/* create a new pwq */
@@ -5371,7 +5373,7 @@ static void rebind_workers(struct worker_pool *pool)
 	for_each_pool_worker(worker, pool) {
 		kthread_set_per_cpu(worker->task, pool->cpu);
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+						  pool_allowed_cpus(pool)) < 0);
 	}
 
 	raw_spin_lock_irq(&pool->lock);
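A userspace toy model of the wq_calc_pod_cpumask() flow after this patch, where the per-pod mask is stored back into attrs->__pod_cpumask instead of an out-argument. uint64_t bitmasks stand in for cpumask_t; the masks are invented and this is illustrative only.

#include <stdio.h>
#include <stdint.h>

struct attrs { uint64_t cpumask, pod_cpumask; };

static void calc_pod_cpumask(struct attrs *a, uint64_t pod_cpus,
			     uint64_t online, int cpu_going_down)
{
	/* online CPUs in this pod that the workqueue wants */
	a->pod_cpumask = pod_cpus & a->cpumask & online;
	if (cpu_going_down >= 0)
		a->pod_cpumask &= ~(1ULL << cpu_going_down);

	/* no usable CPU in the pod: fall back to the full cpumask */
	if (!a->pod_cpumask) {
		a->pod_cpumask = a->cpumask;
		return;
	}

	/* otherwise use all possible CPUs of the pod that attrs wants */
	a->pod_cpumask = a->cpumask & pod_cpus;
}

int main(void)
{
	struct attrs a = { .cpumask = 0xf };	/* CPUs 0-3 allowed */

	calc_pod_cpumask(&a, 0x3 /* pod: CPUs 0-1 */, 0xf, -1);
	printf("__pod_cpumask=%#llx\n", (unsigned long long)a.pod_cpumask);
	return 0;
}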
From patchwork Fri May 19 00:17:06 2023
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 96122
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 21/24] workqueue: Implement non-strict affinity scope for
 unbound workqueues
Date: Thu, 18 May 2023 14:17:06 -1000
Message-Id: <20230519001709.2563-22-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

An unbound workqueue can be served by multiple worker_pools to improve
locality. The segmentation is achieved by grouping CPUs into pods. By
default, the cache boundaries according to cpus_share_cache() define how
the CPUs are grouped. Let's say a workqueue is allowed to run on all CPUs
and the system has two L3 caches. The workqueue would be mapped to two
worker_pools, each serving one L3 cache domain.

While this improves locality, because the pod boundaries are strict, it
limits the total bandwidth a given issuer can consume. For example, let's
say there is a thread pinned to a CPU issuing enough work items to saturate
the whole machine. With the machine segmented into two pods, no matter how
many work items it issues, it can only use half of the CPUs on the system.

While this limitation has existed for a very long time, it wasn't very
pronounced because the affinity grouping used to be always by NUMA nodes.
With cache boundaries as the default and support for even finer grained
scopes (smt and cpu), it is now a much more pressing problem.

This patch implements non-strict affinity scope where the pod boundaries
aren't enforced strictly. Going back to the previous example, the workqueue
would still be mapped to two worker_pools; however, the affinity
enforcement would be soft. The workers in both pools would have their
cpus_allowed set to the whole machine, thus allowing the scheduler to
migrate them anywhere on the machine. However, whenever an idle worker is
woken up, the workqueue code asks the scheduler to bring back the task
within the pod if the worker is outside. i.e. work items start executing
within their affinity scope but can be migrated outside as the scheduler
sees fit. This removes the hard cap on utilization while maintaining the
benefits of affinity scopes.

After the earlier ->__pod_cpumask changes, the implementation is pretty
simple.
When non-strict, which is the new default:

* pool_allowed_cpus() returns @pool->attrs->cpumask instead of
  ->__pod_cpumask so that the workers are allowed to run on any CPU that
  the associated workqueues allow.

* If the idle worker task's ->wake_cpu is outside the pod, kick_pool()
  sets the field to a CPU within the pod.

This would be the first use of task_struct->wake_cpu outside scheduler
proper, so it isn't clear whether this would be acceptable. However, other
methods of migrating tasks are significantly more expensive and are likely
prohibitively so if we want to do this on every work item. This needs
discussion with scheduler folks.

There is also a race window where setting ->wake_cpu wouldn't be effective
as the target task is still on CPU. However, the window is pretty small and
this being a best-effort optimization, it doesn't seem to warrant more
complexity at the moment.

While the non-strict cache affinity scopes seem to be the best option, the
performance picture interacts with the affinity scope and is a bit
complicated to fully discuss in this patch, so the behavior is made easily
selectable through wqattrs and sysfs, and the next patch will add
documentation to discuss the performance implications.

Signed-off-by: Tejun Heo
Cc: Peter Zijlstra
Cc: Linus Torvalds
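For reference, a minimal sketch (not part of the patch) of how kernel-side
code could request strict "cache" affinity programmatically; it assumes
kernel-internal context, since alloc_workqueue_attrs() and
apply_workqueue_attrs() are not exported to modules, and WQ_AFFN_CACHE is
the scope constant introduced earlier in this series: ::

    #include <linux/workqueue.h>

    /*
     * Sketch only: request strict last-level-cache affinity for an
     * existing WQ_UNBOUND workqueue. Freshly allocated attrs start from
     * defaults; a real user may want to copy the wq's current attrs
     * first.
     */
    static int example_set_strict_cache(struct workqueue_struct *wq)
    {
            struct workqueue_attrs *attrs;
            int ret;

            attrs = alloc_workqueue_attrs();
            if (!attrs)
                    return -ENOMEM;

            attrs->affn_scope = WQ_AFFN_CACHE;  /* group workers by L3 */
            attrs->affn_strict = true;          /* never leave the pod */
            ret = apply_workqueue_attrs(wq, attrs);
            free_workqueue_attrs(attrs);
            return ret;
    }

For WQ_SYSFS workqueues, the same toggle is available at runtime through
the affinity_strict file added below.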
---
 Documentation/core-api/workqueue.rst | 30 +++++++++---
 include/linux/workqueue.h            | 11 +++++
 kernel/workqueue.c                   | 73 +++++++++++++++++++++++++++-
 tools/workqueue/wq_dump.py           | 16 ++++--
 tools/workqueue/wq_monitor.py        | 21 +++++---
 5 files changed, 131 insertions(+), 20 deletions(-)

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index 56af317508c9..c73a6df6a118 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -353,9 +353,10 @@ Affinity Scopes
 An unbound workqueue groups CPUs according to its affinity scope to improve
 cache locality. For example, if a workqueue is using the default affinity
 scope of "cache", it will group CPUs according to last level cache
-boundaries. A work item queued on the workqueue will be processed by a
-worker running on one of the CPUs which share the last level cache with the
-issuing CPU.
+boundaries. A work item queued on the workqueue will be assigned to a worker
+on one of the CPUs which share the last level cache with the issuing CPU.
+Once started, the worker may or may not be allowed to move outside the scope
+depending on the ``affinity_strict`` setting of the scope.
 
 Workqueue currently supports the following five affinity scopes.
 
@@ -391,6 +392,21 @@ directory.
 ``affinity_scope``
   Read to see the current affinity scope. Write to change.
 
+``affinity_strict``
+  0 by default indicating that affinity scopes are not strict. When a work
+  item starts execution, workqueue makes a best-effort attempt to ensure
+  that the worker is inside its affinity scope, which is called
+  repatriation. Once started, the scheduler is free to move the worker
+  anywhere in the system as it sees fit. This enables benefiting from scope
+  locality while still being able to utilize other CPUs if necessary and
+  available.
+
+  If set to 1, all workers of the scope are guaranteed always to be in the
+  scope. This may be useful when crossing affinity scopes has other
+  implications, for example, in terms of power consumption or workload
+  isolation. Strict NUMA scope can also be used to match the workqueue
+  behavior of older kernels.
+
 Examining Configuration
 =======================
 
@@ -475,21 +491,21 @@ Monitoring
 Use tools/workqueue/wq_monitor.py to monitor workqueue operations: ::
 
   $ tools/workqueue/wq_monitor.py events
-                          total  infl CPUtime  CPUhog CMwake  mayday rescued
+                          total  infl CPUtime  CPUhog CMW/RPR mayday rescued
   events                  18545     0     6.1       0      5       -       -
   events_highpri              8     0     0.0       0      0       -       -
   events_long                 3     0     0.0       0      0       -       -
-  events_unbound          38306     0     0.1       -      -       -       -
+  events_unbound          38306     0     0.1       -      7       -       -
   events_freezable            0     0     0.0       0      0       -       -
   events_power_efficient  29598     0     0.2       0      0       -       -
   events_freezable_power_    10     0     0.0       0      0       -       -
   sock_diag_events            0     0     0.0       0      0       -       -
 
-                          total  infl CPUtime  CPUhog CMwake  mayday rescued
+                          total  infl CPUtime  CPUhog CMW/RPR mayday rescued
   events                  18548     0     6.1       0      5       -       -
   events_highpri              8     0     0.0       0      0       -       -
   events_long                 3     0     0.0       0      0       -       -
-  events_unbound          38322     0     0.1       -      -       -       -
+  events_unbound          38322     0     0.1       -      7       -       -
   events_freezable            0     0     0.0       0      0       -       -
   events_power_efficient  29603     0     0.2       0      0       -       -
   events_freezable_power_    10     0     0.0       0      0       -       -

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7a0fc0919e0a..751eb915e3f0 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -168,6 +168,17 @@ struct workqueue_attrs {
 	 */
 	cpumask_var_t __pod_cpumask;
 
+	/**
+	 * @affn_strict: affinity scope is strict
+	 *
+	 * If clear, workqueue will make a best-effort attempt at starting the
+	 * worker inside @__pod_cpumask but the scheduler is free to migrate it
+	 * outside.
+	 *
+	 * If set, workers are only allowed to run inside @__pod_cpumask.
+	 */
+	bool affn_strict;
+
 	/*
 	 * Below fields aren't properties of a worker_pool. They only modify how
 	 * :c:func:`apply_workqueue_attrs` select pools and thus don't

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index daebc28d09ab..3ce4c18e139c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -210,6 +210,7 @@ enum pool_workqueue_stats {
 	PWQ_STAT_CPU_TIME,	/* total CPU time consumed */
 	PWQ_STAT_CPU_INTENSIVE,	/* wq_cpu_intensive_thresh_us violations */
 	PWQ_STAT_CM_WAKEUP,	/* concurrency-management worker wakeups */
+	PWQ_STAT_REPATRIATED,	/* unbound workers brought back into scope */
 	PWQ_STAT_MAYDAY,	/* maydays to rescuer */
 	PWQ_STAT_RESCUED,	/* linked work items executed by rescuer */
 
@@ -1094,13 +1095,41 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 static bool kick_pool(struct worker_pool *pool)
 {
 	struct worker *worker = first_idle_worker(pool);
+	struct task_struct *p;
 
 	lockdep_assert_held(&pool->lock);
 
 	if (!need_more_worker(pool) || !worker)
 		return false;
 
-	wake_up_process(worker->task);
+	p = worker->task;
+
+#ifdef CONFIG_SMP
+	/*
+	 * Idle @worker is about to execute @work and waking up provides an
+	 * opportunity to migrate @worker at a lower cost by setting the task's
+	 * wake_cpu field. Let's see if we want to move @worker to improve
+	 * execution locality.
+	 *
+	 * We're waking the worker that went idle the latest and there's some
+	 * chance that @worker is marked idle but hasn't gone off CPU yet. If
+	 * so, setting the wake_cpu won't do anything. As this is a best-effort
+	 * optimization and the race window is narrow, let's leave as-is for
+	 * now. If this becomes pronounced, we can skip over workers which are
+	 * still on cpu when picking an idle worker.
+	 *
+	 * If @pool has non-strict affinity, @worker might have ended up outside
+	 * its affinity scope. Repatriate.
+	 */
+	if (!pool->attrs->affn_strict &&
+	    !cpumask_test_cpu(p->wake_cpu, pool->attrs->__pod_cpumask)) {
+		struct work_struct *work = list_first_entry(&pool->worklist,
+						struct work_struct, entry);
+		p->wake_cpu = cpumask_any_distribute(pool->attrs->__pod_cpumask);
+		get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
+	}
+#endif
+	wake_up_process(p);
 	return true;
 }
 
@@ -2031,7 +2060,10 @@ static struct worker *alloc_worker(int node)
 
 static cpumask_t *pool_allowed_cpus(struct worker_pool *pool)
 {
-	return pool->attrs->__pod_cpumask;
+	if (pool->cpu < 0 && pool->attrs->affn_strict)
+		return pool->attrs->__pod_cpumask;
+	else
+		return pool->attrs->cpumask;
 }
 
 /**
@@ -3696,6 +3728,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 	to->nice = from->nice;
 	cpumask_copy(to->cpumask, from->cpumask);
 	cpumask_copy(to->__pod_cpumask, from->__pod_cpumask);
+	to->affn_strict = from->affn_strict;
 
 	/*
 	 * Unlike hash and equality test, copying shouldn't ignore wq-only
@@ -3716,6 +3749,7 @@ static u32 wqattrs_hash(const struct workqueue_attrs *attrs)
 		     BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long), hash);
 	hash = jhash(cpumask_bits(attrs->__pod_cpumask),
 		     BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long), hash);
+	hash = jhash_1word(attrs->affn_strict, hash);
 	return hash;
 }
 
@@ -3729,6 +3763,8 @@ static bool wqattrs_equal(const struct workqueue_attrs *a,
 		return false;
 	if (!cpumask_equal(a->__pod_cpumask, b->__pod_cpumask))
 		return false;
+	if (a->affn_strict != b->affn_strict)
+		return false;
 	return true;
 }
 
@@ -5792,6 +5828,7 @@ module_param_cb(default_affinity_scope, &wq_affn_dfl_ops, NULL, 0644);
  *  nice		RW int	: nice value of the workers
  *  cpumask		RW mask	: bitmask of allowed CPUs for the workers
  *  affinity_scope	RW str	: worker CPU affinity scope (cache, numa, none)
+ *  affinity_strict	RW bool	: worker CPU affinity is strict
  */
 struct wq_device {
 	struct workqueue_struct		*wq;
@@ -5971,10 +6008,42 @@ static ssize_t wq_affn_scope_store(struct device *dev,
 	return ret ?: count;
 }
 
+static ssize_t wq_affinity_strict_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n",
+			 wq->unbound_attrs->affn_strict);
+}
+
+static ssize_t wq_affinity_strict_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+	struct workqueue_attrs *attrs;
+	int v, ret = -ENOMEM;
+
+	if (sscanf(buf, "%d", &v) != 1)
+		return -EINVAL;
+
+	apply_wqattrs_lock();
+	attrs = wq_sysfs_prep_attrs(wq);
+	if (attrs) {
+		attrs->affn_strict = (bool)v;
+		ret = apply_workqueue_attrs_locked(wq, attrs);
+	}
+	apply_wqattrs_unlock();
+	free_workqueue_attrs(attrs);
+	return ret ?: count;
+}
+
 static struct device_attribute wq_sysfs_unbound_attrs[] = {
 	__ATTR(nice, 0644, wq_nice_show, wq_nice_store),
 	__ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
 	__ATTR(affinity_scope, 0644, wq_affn_scope_show, wq_affn_scope_store),
+	__ATTR(affinity_strict, 0644, wq_affinity_strict_show, wq_affinity_strict_store),
 	__ATTR_NULL,
 };
 
diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py
index 43ab71a193b8..d0df5833f2c1 100644
--- a/tools/workqueue/wq_dump.py
+++ b/tools/workqueue/wq_dump.py
@@ -36,10 +36,11 @@ Workqueue CPU -> pool
 Lists all workqueues along with their type and worker pool association. For
 each workqueue:
 
-  NAME TYPE POOL_ID...
+  NAME TYPE[,FLAGS] POOL_ID...
 
   NAME	name of the workqueue
   TYPE	percpu, unbound or ordered
+  FLAGS	S: strict affinity scope
   POOL_ID	worker pool ID associated with each possible CPU
 """
 
@@ -138,13 +139,16 @@ max_ref_len = 0
         print(f'cpu={pool.cpu.value_():3}', end='')
     else:
         print(f'cpus={cpumask_str(pool.attrs.cpumask)}', end='')
+        print(f' pod_cpus={cpumask_str(pool.attrs.__pod_cpumask)}', end='')
+        if pool.attrs.affn_strict:
+            print(' strict', end='')
     print('')
 
 print('')
 print('Workqueue CPU -> pool')
 print('=====================')
 
-print('[    workqueue     \      CPU', end='')
+print('[    workqueue     \ type   CPU', end='')
 for cpu in for_each_possible_cpu(prog):
     print(f' {cpu:{max_pool_id_len}}', end='')
 print('  dfl]')
@@ -153,11 +157,15 @@ print('  dfl]')
     print(f'{wq.name.string_().decode()[-24:]:24}', end='')
     if wq.flags & WQ_UNBOUND:
         if wq.flags & WQ_ORDERED:
-            print(' ordered', end='')
+            print(' ordered   ', end='')
         else:
             print(' unbound', end='')
+            if wq.unbound_attrs.affn_strict:
+                print(',S ', end='')
+            else:
+                print('   ', end='')
     else:
-        print(' percpu ', end='')
+        print(' percpu    ', end='')
 
     for cpu in for_each_possible_cpu(prog):
         pool_id = per_cpu_ptr(wq.cpu_pwq, cpu)[0].pool.id.value_()

diff --git a/tools/workqueue/wq_monitor.py b/tools/workqueue/wq_monitor.py
index 6e258d123e8c..a8856a9c45dc 100644
--- a/tools/workqueue/wq_monitor.py
+++ b/tools/workqueue/wq_monitor.py
@@ -20,8 +20,11 @@ https://github.com/osandov/drgn.
             and got excluded from concurrency management to avoid stalling
             other work items.
 
-  CMwake    The number of concurrency-management wake-ups while executing a
-            work item of the workqueue.
+  CMW/RPR   For per-cpu workqueues, the number of concurrency-management
+            wake-ups while executing a work item of the workqueue. For
+            unbound workqueues, the number of times a worker was repatriated
+            to its affinity scope after being migrated to an off-scope CPU by
+            the scheduler.
 
   mayday    The number of times the rescuer was requested while waiting for
             new worker creation.
@@ -65,6 +68,7 @@ PWQ_STAT_COMPLETED     = prog['PWQ_STAT_COMPLETED']    # work items completed execution
 PWQ_STAT_CPU_TIME      = prog['PWQ_STAT_CPU_TIME']     # total CPU time consumed
 PWQ_STAT_CPU_INTENSIVE = prog['PWQ_STAT_CPU_INTENSIVE'] # wq_cpu_intensive_thresh_us violations
 PWQ_STAT_CM_WAKEUP     = prog['PWQ_STAT_CM_WAKEUP']    # concurrency-management worker wakeups
+PWQ_STAT_REPATRIATED   = prog['PWQ_STAT_REPATRIATED']  # unbound workers brought back into scope
 PWQ_STAT_MAYDAY        = prog['PWQ_STAT_MAYDAY']       # maydays to rescuer
 PWQ_STAT_RESCUED       = prog['PWQ_STAT_RESCUED']      # linked work items executed by rescuer
 PWQ_NR_STATS           = prog['PWQ_NR_STATS']
 
@@ -89,22 +93,25 @@ PWQ_NR_STATS           = prog['PWQ_NR_STATS']
                 'cpu_time'      : self.stats[PWQ_STAT_CPU_TIME],
                 'cpu_intensive' : self.stats[PWQ_STAT_CPU_INTENSIVE],
                 'cm_wakeup'     : self.stats[PWQ_STAT_CM_WAKEUP],
+                'repatriated'   : self.stats[PWQ_STAT_REPATRIATED],
                 'mayday'        : self.stats[PWQ_STAT_MAYDAY],
                 'rescued'       : self.stats[PWQ_STAT_RESCUED],
             }
 
     def table_header_str():
         return f'{"":>24} {"total":>8} {"infl":>5} {"CPUtime":>8} '\
-            f'{"CPUitsv":>7} {"CMwake":>7} {"mayday":>7} {"rescued":>7}'
+            f'{"CPUitsv":>7} {"CMW/RPR":>7} {"mayday":>7} {"rescued":>7}'
 
     def table_row_str(self):
         cpu_intensive = '-'
-        cm_wakeup = '-'
+        cmw_rpr = '-'
         mayday = '-'
         rescued = '-'
 
-        if not self.unbound:
+        if self.unbound:
+            cmw_rpr = str(self.stats[PWQ_STAT_REPATRIATED]);
+        else:
             cpu_intensive = str(self.stats[PWQ_STAT_CPU_INTENSIVE])
-            cm_wakeup = str(self.stats[PWQ_STAT_CM_WAKEUP])
+            cmw_rpr = str(self.stats[PWQ_STAT_CM_WAKEUP])
 
         if self.mem_reclaim:
             mayday = str(self.stats[PWQ_STAT_MAYDAY])
@@ -115,7 +122,7 @@ PWQ_NR_STATS           = prog['PWQ_NR_STATS']
               f'{max(self.stats[PWQ_STAT_STARTED] - self.stats[PWQ_STAT_COMPLETED], 0):5} ' \
               f'{self.stats[PWQ_STAT_CPU_TIME] / 1000000:8.1f} ' \
               f'{cpu_intensive:>7} ' \
-              f'{cm_wakeup:>7} ' \
+              f'{cmw_rpr:>7} ' \
               f'{mayday:>7} ' \
               f'{rescued:>7} '
 
         return out.rstrip(':')
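With the sysfs attribute added by this patch, the strictness of a WQ_SYSFS
workqueue can also be flipped from userspace at runtime. The following is a
hedged C sketch of doing so with plain sysfs writes; "example_wq" is a
placeholder name and the directory only exists for workqueues created with
WQ_SYSFS: ::

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Write @val to a workqueue sysfs attribute; returns 0 on success. */
    static int wq_write_attr(const char *file, const char *val)
    {
            char path[256];
            int fd, ok;

            snprintf(path, sizeof(path),
                     "/sys/devices/virtual/workqueue/example_wq/%s", file);
            fd = open(path, O_WRONLY);
            if (fd < 0)
                    return -1;
            ok = write(fd, val, strlen(val)) == (ssize_t)strlen(val) ? 0 : -1;
            close(fd);
            return ok;
    }

    int main(void)
    {
            /* select the "cache" scope, then make it strict */
            if (wq_write_attr("affinity_scope", "cache") ||
                wq_write_attr("affinity_strict", "1")) {
                    perror("workqueue sysfs attr");
                    return 1;
            }
            return 0;
    }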
From patchwork Fri May 19 00:17:07 2023
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 22/24] workqueue: Add "Affinity Scopes and Performance" section to documentation
Date: Thu, 18 May 2023 14:17:07 -1000
Message-Id: <20230519001709.2563-23-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

With affinity scopes and their strictness setting added, unbound workqueues
should now be able to cover a wide variety of configurations and use cases.
Unfortunately, the performance picture is not entirely straightforward due
to a trade-off between efficiency and work-conservation in some situations,
necessitating manual configuration.

This patch adds an "Affinity Scopes and Performance" section to
Documentation/core-api/workqueue.rst which illustrates the trade-off with a
set of experiments and provides some guidelines.

Signed-off-by: Tejun Heo
---
 Documentation/core-api/workqueue.rst | 184 ++++++++++++++++++++++++++-
 1 file changed, 179 insertions(+), 5 deletions(-)

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index c73a6df6a118..4a8e764f41ae 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -1,6 +1,6 @@
-====================================
-Concurrency Managed Workqueue (cmwq)
-====================================
+=========
+Workqueue
+=========
 
 :Date: September, 2010
 :Author: Tejun Heo
@@ -25,8 +25,8 @@
 there is no work item left on the workqueue the worker becomes idle. When a
 new work item gets queued, the worker begins executing again.
 
-Why cmwq?
-=========
+Why Concurrency Managed Workqueue?
+==================================
 
 In the original wq implementation, a multi threaded (MT) wq had one worker
 thread per CPU and a single threaded (ST) wq had one worker
@@ -408,6 +408,180 @@ directory.
   behavior of older kernels.
 
 
+Affinity Scopes and Performance
+===============================
+
+It'd be ideal if an unbound workqueue's behavior is optimal for the vast
+majority of use cases without further tuning. Unfortunately, in the current
+kernel, there exists a pronounced trade-off between locality and utilization
+necessitating explicit configuration when workqueues are heavily used.
+
+Higher locality leads to higher efficiency where more work is performed for
+the same number of consumed CPU cycles.
+However, higher locality may also cause lower overall system utilization if
+the work items are not spread enough across the affinity scopes by the
+issuers. The following performance testing with dm-crypt clearly
+illustrates this trade-off.
+
+The tests are run on a CPU with 12-cores/24-threads split across four L3
+caches (AMD Ryzen 9 3900x). CPU clock boost is turned off for consistency.
+``/dev/dm-0`` is a dm-crypt device created on an NVME SSD (Samsung 990 PRO)
+and opened with ``cryptsetup`` with default settings.
+
+
+Scenario 1: Enough issuers and work spread across the machine
+-------------------------------------------------------------
+
+The command used: ::
+
+  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k --ioengine=libaio \
+    --iodepth=64 --runtime=60 --numjobs=24 --time_based --group_reporting \
+    --name=iops-test-job --verify=sha512
+
+There are 24 issuers, each issuing 64 IOs concurrently. ``--verify=sha512``
+makes ``fio`` generate and read back the content each time which makes
+execution locality matter between the issuer and ``kcryptd``. The following
+are the read bandwidths and CPU utilizations depending on different affinity
+scope settings on ``kcryptd`` measured over five runs. Bandwidths are in
+MiBps, and CPU util in percent.
+
+.. list-table::
+   :widths: 16 20 20
+   :header-rows: 1
+
+   * - Affinity
+     - Bandwidth (MiBps)
+     - CPU util (%)
+
+   * - system
+     - 1159.40 ±1.34
+     - 99.31 ±0.02
+
+   * - cache
+     - 1166.40 ±0.89
+     - 99.34 ±0.01
+
+   * - cache (strict)
+     - 1166.00 ±0.71
+     - 99.35 ±0.01
+
+With enough issuers spread across the system, there is no downside to
+"cache", strict or otherwise. All three configurations saturate the whole
+machine but the cache-affine ones outperform by 0.6% thanks to improved
+locality.
+
+
+Scenario 2: Fewer issuers, enough work for saturation
+-----------------------------------------------------
+
+The command used: ::
+
+  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
+    --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=8 \
+    --time_based --group_reporting --name=iops-test-job --verify=sha512
+
+The only difference from the previous scenario is ``--numjobs=8``. There
+are a third as many issuers but still enough total work to saturate the
+system.
+
+.. list-table::
+   :widths: 16 20 20
+   :header-rows: 1
+
+   * - Affinity
+     - Bandwidth (MiBps)
+     - CPU util (%)
+
+   * - system
+     - 1155.40 ±0.89
+     - 97.41 ±0.05
+
+   * - cache
+     - 1154.40 ±1.14
+     - 96.15 ±0.09
+
+   * - cache (strict)
+     - 1112.00 ±4.64
+     - 93.26 ±0.35
+
+This is more than enough work to saturate the system. Both "system" and
+"cache" are nearly saturating the machine but not fully. "cache" is using
+less CPU but the better efficiency puts it at the same bandwidth as
+"system".
+
+Eight issuers moving around over four L3 cache scopes still allow "cache
+(strict)" to mostly saturate the machine but the loss of work conservation
+is now starting to hurt with a 3.7% bandwidth loss.
+
+
+Scenario 3: Even fewer issuers, not enough work to saturate
+-----------------------------------------------------------
+
+The command used: ::
+
+  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
+    --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=4 \
+    --time_based --group_reporting --name=iops-test-job --verify=sha512
+
+Again, the only difference is ``--numjobs=4``. With the number of issuers
+reduced to four, there now isn't enough work to saturate the whole system
+and the bandwidth becomes dependent on completion latencies.
+
+.. list-table::
+   :widths: 16 20 20
+   :header-rows: 1
+
+   * - Affinity
+     - Bandwidth (MiBps)
+     - CPU util (%)
+
+   * - system
+     - 993.60 ±1.82
+     - 75.49 ±0.06
+
+   * - cache
+     - 973.40 ±1.52
+     - 74.90 ±0.07
+
+   * - cache (strict)
+     - 828.20 ±4.49
+     - 66.84 ±0.29
+
+Now, the trade-off between locality and utilization is clearer. "cache"
+shows a 2% bandwidth loss compared to "system" and "cache (strict)" a
+whopping 20%.
+
+
+Conclusion and Recommendations
+------------------------------
+
+In the above experiments, the efficiency advantage of the "cache" affinity
+scope over "system" is, while consistent and noticeable, small. However, the
+impact is dependent on the distances between the scopes and may be more
+pronounced in processors with more complex topologies.
+
+While the loss of work-conservation in certain scenarios hurts, it is a lot
+better than "cache (strict)" and maximizing workqueue utilization is
+unlikely to be the common case anyway. As such, "cache" is the default
+affinity scope for unbound pools.
+
+* As there is no one option which is great for most cases, workqueue usages
+  that may consume a significant amount of CPU are recommended to configure
+  the workqueues using ``apply_workqueue_attrs()`` and/or enable
+  ``WQ_SYSFS``.
+
+* An unbound workqueue with strict "cpu" affinity scope behaves the same as
+  a ``WQ_CPU_INTENSIVE`` per-cpu workqueue. There is no real advantage to
+  the latter and an unbound workqueue provides a lot more flexibility.
+
+* Affinity scopes are introduced in Linux v6.5. To emulate the previous
+  behavior, use strict "numa" affinity scope.
+
+* The loss of work-conservation in non-strict affinity scopes is likely
+  originating from the scheduler. There is no theoretical reason why the
+  kernel wouldn't be able to do the right thing and maintain
+  work-conservation in most cases. As such, it is possible that future
+  scheduler improvements may make most of these tunables unnecessary.
+
+
 Examining Configuration
 =======================
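The first recommendation above presumes the knobs are reachable at all. As a
short sketch of the ``WQ_SYSFS`` route, a workqueue user can opt into the
sysfs interface at creation time so affinity_scope and affinity_strict
become tunable without further code changes ("example" is a placeholder
name): ::

    #include <linux/workqueue.h>

    static struct workqueue_struct *example_wq;

    static int __init example_init(void)
    {
            /*
             * WQ_SYSFS exposes the workqueue under
             * /sys/devices/virtual/workqueue/example/. A max_active of 0
             * selects the default limit.
             */
            example_wq = alloc_workqueue("example", WQ_UNBOUND | WQ_SYSFS, 0);
            if (!example_wq)
                    return -ENOMEM;
            return 0;
    }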
From patchwork Fri May 19 00:17:08 2023
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 23/24] workqueue: Add pool_workqueue->cpu
Date: Thu, 18 May 2023 14:17:08 -1000
Message-Id: <20230519001709.2563-24-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

For both per-cpu and unbound workqueues, pwq's (pool_workqueue's) are
per-cpu. For per-cpu workqueues, we can find out the associated CPU from
pwq->pool->cpu, but unbound pools don't have specific CPUs associated.
Let's add pwq->cpu so that, given an unbound work item, we can determine
which CPU it was queued on through get_work_pwq(work)->cpu. This will be
used to improve execution locality on unbound workqueues.

NOT_FOR_UPSTREAM
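To make the intended use concrete, here is an illustrative,
workqueue-internal sketch (get_work_pwq() is private to kernel/workqueue.c,
and pwq->cpu only exists with this patch applied) of checking whether a
work item is executing on its issuing CPU: ::

    /*
     * Sketch only: true if @work is starting execution on the CPU it was
     * queued from. pwq->cpu is -1 for the default pwq which isn't tied
     * to a specific CPU, so treat that case as non-local.
     */
    static bool example_work_started_locally(struct work_struct *work)
    {
            struct pool_workqueue *pwq = get_work_pwq(work);

            return pwq->cpu >= 0 && pwq->cpu == smp_processor_id();
    }

This is essentially the comparison the next patch performs when accounting
its CPU-locality statistic.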
---
 kernel/workqueue.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3ce4c18e139c..4efb0bd6f2e0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -226,6 +226,7 @@ enum pool_workqueue_stats {
 struct pool_workqueue {
 	struct worker_pool	*pool;		/* I: the associated pool */
 	struct workqueue_struct *wq;		/* I: the owning workqueue */
+	int			cpu;		/* I: the associated CPU */
 	int			work_color;	/* L: current color */
 	int			flush_color;	/* L: flushing color */
 	int			refcnt;		/* L: reference count */
@@ -4131,7 +4132,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
 
 /* initialize newly allocated @pwq which is associated with @wq and @pool */
 static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
-		     struct worker_pool *pool)
+		     struct worker_pool *pool, int cpu)
 {
 	BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);
 
@@ -4139,6 +4140,7 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 
 	pwq->pool = pool;
 	pwq->wq = wq;
+	pwq->cpu = cpu;
 	pwq->flush_color = -1;
 	pwq->refcnt = 1;
 	INIT_LIST_HEAD(&pwq->inactive_works);
@@ -4169,8 +4171,9 @@ static void link_pwq(struct pool_workqueue *pwq)
 }
 
 /* obtain a pool matching @attr and create a pwq associating the pool and @wq */
-static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
-						const struct workqueue_attrs *attrs)
+static struct pool_workqueue *
+alloc_unbound_pwq(struct workqueue_struct *wq,
+		  const struct workqueue_attrs *attrs, int cpu)
 {
 	struct worker_pool *pool;
 	struct pool_workqueue *pwq;
@@ -4187,7 +4190,7 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
 		return NULL;
 	}
 
-	init_pwq(pwq, wq, pool);
+	init_pwq(pwq, wq, pool, cpu);
 	return pwq;
 }
 
@@ -4313,7 +4316,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	 * the default pwq covering whole @attrs->cpumask. Always create
 	 * it even if we don't use it immediately.
	 */
-	ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
+	ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs, -1);
 	if (!ctx->dfl_pwq)
 		goto out_free;
 
@@ -4323,7 +4326,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 			ctx->pwq_tbl[cpu] = ctx->dfl_pwq;
 		} else {
 			wq_calc_pod_cpumask(new_attrs, cpu, -1);
-			ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, new_attrs);
+			ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, new_attrs, cpu);
 			if (!ctx->pwq_tbl[cpu])
 				goto out_free;
 		}
@@ -4486,7 +4489,7 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu, bool online)
 		return;
 
 	/* create a new pwq */
-	pwq = alloc_unbound_pwq(wq, target_attrs);
+	pwq = alloc_unbound_pwq(wq, target_attrs, cpu);
 	if (!pwq) {
 		pr_warn("workqueue: allocation failed while updating CPU pod affinity of \"%s\"\n",
 			wq->name);
@@ -4530,7 +4533,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 		if (!*pwq_p)
 			goto enomem;
 
-		init_pwq(*pwq_p, wq, pool);
+		init_pwq(*pwq_p, wq, pool, cpu);
 
 		mutex_lock(&wq->mutex);
 		link_pwq(*pwq_p);
From patchwork Fri May 19 00:17:09 2023
From: Tejun Heo
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com, joshdon@google.com,
    brho@google.com, briannorris@chromium.org, nhuck@google.com,
    agk@redhat.com, snitzer@kernel.org, void@manifault.com, Tejun Heo
Subject: [PATCH 24/24] workqueue: Implement localize-to-issuing-CPU for unbound workqueues
Date: Thu, 18 May 2023 14:17:09 -1000
Message-Id: <20230519001709.2563-25-tj@kernel.org>
In-Reply-To: <20230519001709.2563-1-tj@kernel.org>
References: <20230519001709.2563-1-tj@kernel.org>

The non-strict cache affinity scope provides a reasonable default behavior
for improving execution locality while avoiding strict utilization limits
and the overhead of too-fine-grained scopes. However, it ignores L1/2
locality which may benefit some workloads.

This patch implements workqueue_attrs->localize which, when turned on,
tries to put the worker on the work item's issuing CPU when starting
execution, in the same way non-strict cache affinity is implemented.

As it uses the same task_struct->wake_cpu, the same caveats apply. It isn't
clear whether this is an acceptable use of the scheduler property and there
is a small race window where the setting from position_worker() may be
ignored.

To locate a worker on the work item's issuing CPU, we need to pre-assign
the work item to the worker before waking it up; otherwise, we can't know
which exact worker the work item is going to be assigned to. For work items
that request localization, this patch updates kick_pool() to pre-assign
each work item to an idle worker and take the worker out of the idle state
before waking it up. In turn, worker_thread() proceeds directly to work
item execution if IDLE was already clear when it woke up.

Theoretically, localizing to the issuing CPU without any hard restrictions
should be the best option as it tells the scheduler the best CPU to use for
locality without any restrictions on future scheduler decisions. However,
in practice, this doesn't work out that way due to loss of work
conservation. As such, this patch isn't for upstream yet. See the cover
letter for further discussion.

NOT_FOR_UPSTREAM
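A sketch of enabling the new attribute from kernel code follows; it is
hypothetical, only meaningful with this RFC applied, and assumes
alloc/apply/free_workqueue_attrs() are reachable from the caller: ::

    #include <linux/workqueue.h>

    /*
     * Sketch only: ask an unbound workqueue to wake its workers on the
     * issuing CPU. ->localize is the wq-only attribute added by this
     * patch.
     */
    static int example_enable_localize(struct workqueue_struct *wq)
    {
            struct workqueue_attrs *attrs;
            int ret;

            attrs = alloc_workqueue_attrs();
            if (!attrs)
                    return -ENOMEM;

            attrs->localize = true;
            ret = apply_workqueue_attrs(wq, attrs);
            free_workqueue_attrs(attrs);
            return ret;
    }

The same bit is exposed to userspace as the localize sysfs file added below
for WQ_SYSFS workqueues.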
---
 Documentation/core-api/workqueue.rst |  38 +++---
 include/linux/workqueue.h            |  10 ++
 kernel/workqueue.c                   | 183 +++++++++++++++++++--------
 tools/workqueue/wq_dump.py           |   7 +-
 tools/workqueue/wq_monitor.py        |   8 +-
 5 files changed, 170 insertions(+), 76 deletions(-)

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index 4a8e764f41ae..3a7b3b0e7196 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -665,25 +665,25 @@ Monitoring
 Use tools/workqueue/wq_monitor.py to monitor workqueue operations: ::
 
   $ tools/workqueue/wq_monitor.py events
-                          total  infl CPUtime  CPUhog CMW/RPR mayday rescued
-  events                  18545     0     6.1       0      5       -       -
-  events_highpri              8     0     0.0       0      0       -       -
-  events_long                 3     0     0.0       0      0       -       -
-  events_unbound          38306     0     0.1       -      7       -       -
-  events_freezable            0     0     0.0       0      0       -       -
-  events_power_efficient  29598     0     0.2       0      0       -       -
-  events_freezable_power_    10     0     0.0       0      0       -       -
-  sock_diag_events            0     0     0.0       0      0       -       -
-
-                          total  infl CPUtime  CPUhog CMW/RPR mayday rescued
-  events                  18548     0     6.1       0      5       -       -
-  events_highpri              8     0     0.0       0      0       -       -
-  events_long                 3     0     0.0       0      0       -       -
-  events_unbound          38322     0     0.1       -      7       -       -
-  events_freezable            0     0     0.0       0      0       -       -
-  events_power_efficient  29603     0     0.2       0      0       -       -
-  events_freezable_power_    10     0     0.0       0      0       -       -
-  sock_diag_events            0     0     0.0       0      0       -       -
+                          total  infl CPUtime CPUlocal  CPUhog CMW/RPR mayday rescued
+  events                  18545     0     6.1    18545       0      5       -       -
+  events_highpri              8     0     0.0        8       0      0       -       -
+  events_long                 3     0     0.0        3       0      0       -       -
+  events_unbound          38306     0     0.1     9432       -      7       -       -
+  events_freezable            0     0     0.0        0       0      0       -       -
+  events_power_efficient  29598     0     0.2    29598       0      0       -       -
+  events_freezable_power_    10     0     0.0       10       0      0       -       -
+  sock_diag_events            0     0     0.0        0       0      0       -       -
+
+                          total  infl CPUtime CPUlocal  CPUhog CMW/RPR mayday rescued
+  events                  18548     0     6.1    18548       0      5       -       -
+  events_highpri              8     0     0.0        8       0      0       -       -
+  events_long                 3     0     0.0        3       0      0       -       -
+  events_unbound          38322     0     0.1     9440       -      7       -       -
+  events_freezable            0     0     0.0        0       0      0       -       -
+  events_power_efficient  29603     0     0.2    29063       0      0       -       -
+  events_freezable_power_    10     0     0.0       10       0      0       -       -
+  sock_diag_events            0     0     0.0        0       0      0       -       -
 
 ...

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 751eb915e3f0..d989f95f6646 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -197,6 +197,16 @@ struct workqueue_attrs {
 	 */
 	enum wq_affn_scope affn_scope;
 
+	/**
+	 * @localize: always put worker on work item's issuing CPU
+	 *
+	 * When starting execution of a work item, always move the assigned
+	 * worker to the CPU the work item was issued on. The scheduler is free
+	 * to move the worker around afterwards as allowed by the affinity
+	 * scope.
+	 */
+	bool localize;
+
 	/**
 	 * @ordered: work items must be executed one by one in queueing order
 	 */

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4efb0bd6f2e0..b2e914655f05 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -208,6 +208,7 @@ enum pool_workqueue_stats {
 	PWQ_STAT_STARTED,	/* work items started execution */
 	PWQ_STAT_COMPLETED,	/* work items completed execution */
 	PWQ_STAT_CPU_TIME,	/* total CPU time consumed */
+	PWQ_STAT_CPU_LOCAL,	/* work items started on the issuing CPU */
 	PWQ_STAT_CPU_INTENSIVE,	/* wq_cpu_intensive_thresh_us violations */
 	PWQ_STAT_CM_WAKEUP,	/* concurrency-management worker wakeups */
 	PWQ_STAT_REPATRIATED,	/* unbound workers brought back into scope */
 
@@ -1087,51 +1088,76 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 }
 
 /**
- * kick_pool - wake up an idle worker if necessary
+ * kick_pool - wake up workers and optionally assign work items to them
  * @pool: pool to kick
  *
- * @pool may have pending work items. Wake up worker if necessary. Returns
- * whether a worker was woken up.
+ * @pool may have pending work items. Either wake up one idle worker or
+ * multiple with work items pre-assigned. See the in-line comments.
  */
 static bool kick_pool(struct worker_pool *pool)
 {
-	struct worker *worker = first_idle_worker(pool);
-	struct task_struct *p;
+	bool woken_up = false;
+	struct worker *worker;
 
 	lockdep_assert_held(&pool->lock);
 
-	if (!need_more_worker(pool) || !worker)
-		return false;
-
-	p = worker->task;
-
+	while (need_more_worker(pool) && (worker = first_idle_worker(pool))) {
+		struct task_struct *p = worker->task;
 #ifdef CONFIG_SMP
-	/*
-	 * Idle @worker is about to execute @work and waking up provides an
-	 * opportunity to migrate @worker at a lower cost by setting the task's
-	 * wake_cpu field. Let's see if we want to move @worker to improve
-	 * execution locality.
-	 *
-	 * We're waking the worker that went idle the latest and there's some
-	 * chance that @worker is marked idle but hasn't gone off CPU yet. If
-	 * so, setting the wake_cpu won't do anything. As this is a best-effort
-	 * optimization and the race window is narrow, let's leave as-is for
-	 * now. If this becomes pronounced, we can skip over workers which are
-	 * still on cpu when picking an idle worker.
-	 *
-	 * If @pool has non-strict affinity, @worker might have ended up outside
-	 * its affinity scope. Repatriate.
-	 */
-	if (!pool->attrs->affn_strict &&
-	    !cpumask_test_cpu(p->wake_cpu, pool->attrs->__pod_cpumask)) {
 		struct work_struct *work = list_first_entry(&pool->worklist,
 						struct work_struct, entry);
-		p->wake_cpu = cpumask_any_distribute(pool->attrs->__pod_cpumask);
-		get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
-	}
+		struct pool_workqueue *pwq = get_work_pwq(work);
+		struct workqueue_struct *wq = pwq->wq;
+
+		/*
+		 * Idle @worker is about to execute @work and waking up provides
+		 * an opportunity to migrate @worker at a lower cost by setting
+		 * the task's wake_cpu field. Let's see if we want to move
+		 * @worker to improve execution locality.
+		 *
+		 * We're waking the worker that went idle the latest and there's
+		 * some chance that @worker is marked idle but hasn't gone off
+		 * CPU yet. If so, setting the wake_cpu won't do anything. As
+		 * this is a best-effort optimization and the race window is
+		 * narrow, let's leave as-is for now. If this becomes
+		 * pronounced, we can skip over workers which are still on cpu
+		 * when picking an idle worker.
+		 */
+
+		/*
+		 * If @work's workqueue requests localization, @work has CPU
+		 * assigned and there are enough idle workers, pre-assign @work
+		 * to @worker and tell the scheduler to try to wake up @worker
+		 * on @work's issuing CPU. Be careful that ->localize is a
+		 * workqueue attribute, not a pool one.
+		 */
+		if (wq->unbound_attrs && wq->unbound_attrs->localize &&
+		    pwq->cpu >= 0 && pool->nr_idle > 1) {
+			if (assign_work(work, worker, NULL)) {
+				worker_leave_idle(worker);
+				p->wake_cpu = pwq->cpu;
+				wake_up_process(worker->task);
+				woken_up = true;
+				continue;
+			}
+		}
+
+		/*
+		 * If @pool has non-strict affinity, @worker might have ended up
+		 * outside its affinity scope. Repatriate.
+		 */
+		if (!pool->attrs->affn_strict &&
+		    !cpumask_test_cpu(p->wake_cpu, pool->attrs->__pod_cpumask)) {
+			p->wake_cpu = cpumask_any_distribute(
+					pool->attrs->__pod_cpumask);
+			pwq->stats[PWQ_STAT_REPATRIATED]++;
+		}
 #endif
-	wake_up_process(p);
-	return true;
+		wake_up_process(p);
+		return true;
+	}
+
+	return woken_up;
 }
 
 #ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
@@ -2607,6 +2633,8 @@ __acquires(&pool->lock)
 	 */
 	lockdep_invariant_state(true);
 	pwq->stats[PWQ_STAT_STARTED]++;
+	if (pwq->cpu == smp_processor_id())
+		pwq->stats[PWQ_STAT_CPU_LOCAL]++;
 	trace_workqueue_execute_start(work);
 	worker->current_func(work);
 	/*
@@ -2730,22 +2758,26 @@ static int worker_thread(void *__worker)
 		return 0;
 	}
 
-	worker_leave_idle(worker);
-recheck:
-	/* no more worker necessary? */
-	if (!need_more_worker(pool))
-		goto sleep;
-
-	/* do we need to manage? */
-	if (unlikely(!may_start_working(pool)) && manage_workers(worker))
-		goto recheck;
-
 	/*
-	 * ->scheduled list can only be filled while a worker is
-	 * preparing to process a work or actually processing it.
-	 * Make sure nobody diddled with it while I was sleeping.
+	 * If kick_pool() assigned a work item to us, it made sure that there
+	 * are other idle workers to serve the manager role and moved us out of
+	 * the idle state already. If IDLE is clear, skip manager check and
+	 * start executing the work items on @worker->scheduled right away.
 	 */
-	WARN_ON_ONCE(!list_empty(&worker->scheduled));
+	if (worker->flags & WORKER_IDLE) {
+		WARN_ON_ONCE(!list_empty(&worker->scheduled));
+		worker_leave_idle(worker);
+
+		while (true) {
+			/* no more worker necessary? */
+			if (!need_more_worker(pool))
+				goto sleep;
+			/* do we need to manage? */
+			if (likely(may_start_working(pool)) ||
+			    !manage_workers(worker))
+				break;
+		}
+	}
 
 	/*
 	 * Finish PREP stage. We're guaranteed to have at least one idle
@@ -2756,14 +2788,31 @@ static int worker_thread(void *__worker)
 	 */
 	worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
 
-	do {
-		struct work_struct *work =
-			list_first_entry(&pool->worklist,
-					 struct work_struct, entry);
+	/*
+	 * If we woke up with IDLE cleared, there may already be work items on
+	 * ->scheduled. Always run process_scheduled_works() at least once. Note
+	 * that ->scheduled can be empty even after !IDLE wake-up as the
+	 * scheduled work item could have been canceled in-between.
+	 */
+	process_scheduled_works(worker);
 
-		if (assign_work(work, worker, NULL))
-			process_scheduled_works(worker);
-	} while (keep_working(pool));
+	/*
+	 * For unbound workqueues, the following keep_working() would be true
+	 * only when there are worker shortages. Otherwise, work items would
+	 * have been assigned to workers on queueing.
+	 */
+	while (keep_working(pool)) {
+		struct work_struct *work = list_first_entry(&pool->worklist,
+						struct work_struct, entry);
+		/*
+		 * An unbound @worker here might not be on the same CPU as @work
+		 * which is unfortunate if the workqueue has localization turned
+		 * on. However, it shouldn't be a problem in practice as this
+		 * path isn't taken often for unbound workqueues.
+		 */
+		assign_work(work, worker, NULL);
+		process_scheduled_works(worker);
+	}
 
 	worker_set_flags(worker, WORKER_PREP);
 sleep:
@@ -3737,6 +3786,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 	 * get_unbound_pool() explicitly clears the fields.
 	 */
 	to->affn_scope = from->affn_scope;
+	to->localize = from->localize;
 	to->ordered = from->ordered;
 }
 
@@ -4020,6 +4070,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 
 	/* clear wq-only attr fields. See 'struct workqueue_attrs' comments */
 	pool->attrs->affn_scope = WQ_AFFN_NR_TYPES;
+	pool->attrs->localize = false;
 	pool->attrs->ordered = false;
 
 	if (worker_pool_assign_id(pool) < 0)
@@ -5832,6 +5883,7 @@ module_param_cb(default_affinity_scope, &wq_affn_dfl_ops, NULL, 0644);
  *  cpumask		RW mask	: bitmask of allowed CPUs for the workers
  *  affinity_scope	RW str	: worker CPU affinity scope (cache, numa, none)
  *  affinity_strict	RW bool	: worker CPU affinity is strict
+ *  localize		RW bool	: localize worker to work's origin CPU
  */
 struct wq_device {
 	struct workqueue_struct		*wq;
@@ -6042,11 +6094,34 @@ static ssize_t wq_affinity_strict_store(struct device *dev,
 	return ret ?: count;
 }
 
+static ssize_t wq_localize_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n", wq->unbound_attrs->localize);
+}
+
+static ssize_t wq_localize_store(struct device *dev,
+				 struct device_attribute *attr, const char *buf,
+				 size_t count)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+	int v;
+
+	if (sscanf(buf, "%d", &v) != 1)
+		return -EINVAL;
+
+	wq->unbound_attrs->localize = v;
+	return count;
+}
+
 static struct device_attribute wq_sysfs_unbound_attrs[] = {
 	__ATTR(nice, 0644, wq_nice_show, wq_nice_store),
 	__ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
 	__ATTR(affinity_scope, 0644, wq_affn_scope_show, wq_affn_scope_store),
 	__ATTR(affinity_strict, 0644, wq_affinity_strict_show, wq_affinity_strict_store),
+	__ATTR(localize, 0644, wq_localize_show, wq_localize_store),
 	__ATTR_NULL,
 };
 
diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py
index d0df5833f2c1..036fb89260a3 100644
--- a/tools/workqueue/wq_dump.py
+++ b/tools/workqueue/wq_dump.py
@@ -41,6 +41,7 @@ Lists all workqueues along with their type and worker pool association. For
   NAME	name of the workqueue
   TYPE	percpu, unbound or ordered
   FLAGS	S: strict affinity scope
+	L: localize worker to work item's issuing CPU
   POOL_ID	worker pool ID associated with each possible CPU
 """
 
@@ -160,8 +161,10 @@ print('  dfl]')
             print(' ordered   ', end='')
         else:
             print(' unbound', end='')
-            if wq.unbound_attrs.affn_strict:
-                print(',S ', end='')
+            strict = wq.unbound_attrs.affn_strict
+            local = wq.unbound_attrs.localize
+            if strict or local:
+                print(f',{"S" if strict else "_"}{"L" if local else "_"}', end='')
             else:
                 print('   ', end='')
     else:

diff --git a/tools/workqueue/wq_monitor.py b/tools/workqueue/wq_monitor.py
index a8856a9c45dc..a0b0cd50b629 100644
--- a/tools/workqueue/wq_monitor.py
+++ b/tools/workqueue/wq_monitor.py
@@ -15,6 +15,9 @@ https://github.com/osandov/drgn.
             sampled from scheduler ticks and only provides ballpark
             measurement. "nohz_full=" CPUs are excluded from measurement.
 
+  CPUlocal  The number of times a work item starts executing on the same CPU
+            that the work item was issued on.
+
   CPUitsv   The number of times a concurrency-managed work item hogged CPU
             longer than the threshold (workqueue.cpu_intensive_thresh_us)
             and got excluded from concurrency management to avoid stalling
@@ -66,6 +69,7 @@ WQ_MEM_RECLAIM         = prog['WQ_MEM_RECLAIM']
 PWQ_STAT_STARTED       = prog['PWQ_STAT_STARTED']      # work items started execution
 PWQ_STAT_COMPLETED     = prog['PWQ_STAT_COMPLETED']    # work items completed execution
 PWQ_STAT_CPU_TIME      = prog['PWQ_STAT_CPU_TIME']     # total CPU time consumed
+PWQ_STAT_CPU_LOCAL     = prog['PWQ_STAT_CPU_LOCAL']    # work items started on the issuing CPU
 PWQ_STAT_CPU_INTENSIVE = prog['PWQ_STAT_CPU_INTENSIVE'] # wq_cpu_intensive_thresh_us violations
 PWQ_STAT_CM_WAKEUP     = prog['PWQ_STAT_CM_WAKEUP']    # concurrency-management worker wakeups
 PWQ_STAT_REPATRIATED   = prog['PWQ_STAT_REPATRIATED']  # unbound workers brought back into scope
 
@@ -91,6 +95,7 @@ PWQ_NR_STATS           = prog['PWQ_NR_STATS']
                 'started'       : self.stats[PWQ_STAT_STARTED],
                 'completed'     : self.stats[PWQ_STAT_COMPLETED],
                 'cpu_time'      : self.stats[PWQ_STAT_CPU_TIME],
+                'cpu_local'     : self.stats[PWQ_STAT_CPU_LOCAL],
                 'cpu_intensive' : self.stats[PWQ_STAT_CPU_INTENSIVE],
                 'cm_wakeup'     : self.stats[PWQ_STAT_CM_WAKEUP],
                 'repatriated'   : self.stats[PWQ_STAT_REPATRIATED],
@@ -98,7 +103,7 @@ PWQ_NR_STATS           = prog['PWQ_NR_STATS']
                 'rescued'       : self.stats[PWQ_STAT_RESCUED],
             }
 
     def table_header_str():
-        return f'{"":>24} {"total":>8} {"infl":>5} {"CPUtime":>8} '\
+        return f'{"":>24} {"total":>8} {"infl":>5} {"CPUtime":>8} {"CPUlocal":>8} '\
             f'{"CPUitsv":>7} {"CMW/RPR":>7} {"mayday":>7} {"rescued":>7}'
 
     def table_row_str(self):
@@ -121,6 +126,7 @@ PWQ_NR_STATS           = prog['PWQ_NR_STATS']
               f'{self.stats[PWQ_STAT_STARTED]:8} ' \
               f'{max(self.stats[PWQ_STAT_STARTED] - self.stats[PWQ_STAT_COMPLETED], 0):5} ' \
               f'{self.stats[PWQ_STAT_CPU_TIME] / 1000000:8.1f} ' \
+              f'{self.stats[PWQ_STAT_CPU_LOCAL]:8} ' \
               f'{cpu_intensive:>7} ' \
               f'{cmw_rpr:>7} ' \
               f'{mayday:>7} ' \