From patchwork Wed Dec 20 00:18:12 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181349
Date: Tue, 19 Dec 2023 16:18:12 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-2-jstultz@google.com>
Subject: [PATCH v7 01/23] sched: Unify runtime accounting across classes
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
    Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
    Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
    "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
    Thomas Gleixner, kernel-team@android.com, "Connor O'Brien", John Stultz

From: Peter Zijlstra

All classes use sched_entity::exec_start to track runtime and have
copies of the exact same code around to compute runtime. Collapse
all that.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
[fix conflicts, fold in update_current_exec_runtime]
Signed-off-by: Connor O'Brien
[jstultz: rebased, resolving minor conflicts]
Signed-off-by: John Stultz
---
v7:
* Minor typo fixup suggested by Metin Kaya
---
 include/linux/sched.h    |  2 +-
 kernel/sched/deadline.c  | 13 +++-------
 kernel/sched/fair.c      | 56 ++++++++++++++++++++++++++++++----------
 kernel/sched/rt.c        | 13 +++-------
 kernel/sched/sched.h     | 12 ++-------
 kernel/sched/stop_task.c | 13 +---------
 6 files changed, 52 insertions(+), 57 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..1e80c330f755 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -523,7 +523,7 @@ struct sched_statistics {
         u64             block_max;
         s64             sum_block_runtime;

-        u64             exec_max;
+        s64             exec_max;
         u64             slice_max;

         u64             nr_migrations_cold;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..6140f1f51da1 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1275,9 +1275,8 @@ static void update_curr_dl(struct rq *rq)
 {
         struct task_struct *curr = rq->curr;
         struct sched_dl_entity *dl_se = &curr->dl;
-        u64 delta_exec, scaled_delta_exec;
+        s64 delta_exec, scaled_delta_exec;
         int cpu = cpu_of(rq);
-        u64 now;

         if (!dl_task(curr) || !on_dl_rq(dl_se))
                 return;
@@ -1290,21 +1289,15 @@ static void update_curr_dl(struct rq *rq)
          * natural solution, but the full ramifications of this
          * approach need further study.
          */
-        now = rq_clock_task(rq);
-        delta_exec = now - curr->se.exec_start;
-        if (unlikely((s64)delta_exec <= 0)) {
+        delta_exec = update_curr_common(rq);
+        if (unlikely(delta_exec <= 0)) {
                 if (unlikely(dl_se->dl_yielded))
                         goto throttle;
                 return;
         }

-        schedstat_set(curr->stats.exec_max,
-                      max(curr->stats.exec_max, delta_exec));
-
-        trace_sched_stat_runtime(curr, delta_exec, 0);
-
-        update_current_exec_runtime(curr, now, delta_exec);
-
         if (dl_entity_is_special(dl_se))
                 return;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..1251fd01a555 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1129,23 +1129,17 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_SMP */

-/*
- * Update the current task's runtime statistics.
- */
-static void update_curr(struct cfs_rq *cfs_rq)
+static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 {
-        struct sched_entity *curr = cfs_rq->curr;
-        u64 now = rq_clock_task(rq_of(cfs_rq));
-        u64 delta_exec;
-
-        if (unlikely(!curr))
-                return;
+        u64 now = rq_clock_task(rq);
+        s64 delta_exec;

         delta_exec = now - curr->exec_start;
-        if (unlikely((s64)delta_exec <= 0))
-                return;
+        if (unlikely(delta_exec <= 0))
+                return delta_exec;

         curr->exec_start = now;
+        curr->sum_exec_runtime += delta_exec;

         if (schedstat_enabled()) {
                 struct sched_statistics *stats;
@@ -1155,9 +1149,43 @@ static void update_curr(struct cfs_rq *cfs_rq)
                         max(delta_exec, stats->exec_max));
         }

-        curr->sum_exec_runtime += delta_exec;
-        schedstat_add(cfs_rq->exec_clock, delta_exec);
+        return delta_exec;
+}
+
+/*
+ * Used by other classes to account runtime.
+ */
+s64 update_curr_common(struct rq *rq)
+{
+        struct task_struct *curr = rq->curr;
+        s64 delta_exec;
+
+        delta_exec = update_curr_se(rq, &curr->se);
+        if (unlikely(delta_exec <= 0))
+                return delta_exec;
+
+        account_group_exec_runtime(curr, delta_exec);
+        cgroup_account_cputime(curr, delta_exec);
+
+        return delta_exec;
+}
+
+/*
+ * Update the current task's runtime statistics.
+ */
+static void update_curr(struct cfs_rq *cfs_rq)
+{
+        struct sched_entity *curr = cfs_rq->curr;
+        s64 delta_exec;
+
+        if (unlikely(!curr))
+                return;
+
+        delta_exec = update_curr_se(rq_of(cfs_rq), curr);
+        if (unlikely(delta_exec <= 0))
+                return;
+
+        schedstat_add(cfs_rq->exec_clock, delta_exec);

         curr->vruntime += calc_delta_fair(delta_exec, curr);
         update_deadline(cfs_rq, curr);
         update_min_vruntime(cfs_rq);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..9cdea3ea47da 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1002,24 +1002,17 @@ static void update_curr_rt(struct rq *rq)
 {
         struct task_struct *curr = rq->curr;
         struct sched_rt_entity *rt_se = &curr->rt;
-        u64 delta_exec;
-        u64 now;
+        s64 delta_exec;

         if (curr->sched_class != &rt_sched_class)
                 return;

-        now = rq_clock_task(rq);
-        delta_exec = now - curr->se.exec_start;
-        if (unlikely((s64)delta_exec <= 0))
+        delta_exec = update_curr_common(rq);
+        if (unlikely(delta_exec < 0))
                 return;

-        schedstat_set(curr->stats.exec_max,
-                      max(curr->stats.exec_max, delta_exec));
-
-        trace_sched_stat_runtime(curr, delta_exec, 0);
-
-        update_current_exec_runtime(curr, now, delta_exec);
-
         if (!rt_bandwidth_enabled())
                 return;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..3e0e4fc8734b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2212,6 +2212,8 @@ struct affinity_context {
         unsigned int flags;
 };

+extern s64 update_curr_common(struct rq *rq);
+
 struct sched_class {

 #ifdef CONFIG_UCLAMP_TASK
@@ -3261,16 +3263,6 @@ extern int sched_dynamic_mode(const char *str);
 extern void sched_dynamic_update(int mode);
 #endif

-static inline void update_current_exec_runtime(struct task_struct *curr,
-                                               u64 now, u64 delta_exec)
-{
-        curr->se.sum_exec_runtime += delta_exec;
-        account_group_exec_runtime(curr, delta_exec);
-
-        curr->se.exec_start = now;
-        cgroup_account_cputime(curr, delta_exec);
-}
-
 #ifdef CONFIG_SCHED_MM_CID

 #define SCHED_MM_CID_PERIOD_NS  (100ULL * 1000000)      /* 100ms */
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 6cf7304e6449..b1b8fe61c532 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -70,18 +70,7 @@ static void yield_task_stop(struct rq *rq)

 static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 {
-        struct task_struct *curr = rq->curr;
-        u64 now, delta_exec;
-
-        now = rq_clock_task(rq);
-        delta_exec = now - curr->se.exec_start;
-        if (unlikely((s64)delta_exec < 0))
-                delta_exec = 0;
-
-        schedstat_set(curr->stats.exec_max,
-                      max(curr->stats.exec_max, delta_exec));
-
-        update_current_exec_runtime(curr, now, delta_exec);
+        update_curr_common(rq);
 }

 /*
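
[Illustration, not part of the series: the accounting pattern the patch
converges on, as a self-contained userspace sketch. struct entity,
now_ns() and main() below are invented stand-ins for sched_entity and
rq_clock_task(); only the signed-delta flow of update_curr_se() /
update_curr_common() is mirrored.]

/* Self-contained sketch of the unified accounting helper (not kernel code). */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

struct entity {
        uint64_t exec_start;       /* timestamp of last accounting */
        uint64_t sum_exec_runtime; /* total accounted runtime */
};

static uint64_t now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* One helper replaces the per-class copies: compute a *signed* delta,
 * bail on clock weirdness (delta <= 0), then roll it into the sum. */
static int64_t update_curr_common(struct entity *se)
{
        uint64_t now = now_ns();
        int64_t delta = (int64_t)(now - se->exec_start);

        if (delta <= 0)
                return delta;

        se->exec_start = now;
        se->sum_exec_runtime += delta;
        return delta;
}

int main(void)
{
        struct entity se = { .exec_start = now_ns() };

        for (volatile int i = 0; i < 1000000; i++)
                ; /* burn some time */

        /* Every "class" calls the same helper instead of open-coding
         * the now/delta/exec_start dance. */
        printf("accounted %lld ns\n", (long long)update_curr_common(&se));
        return 0;
}

[The signed delta is also why the patch flips several u64s to s64: a
single "<= 0" check then absorbs the (s64) casts each class used to
open-code.]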
From patchwork Wed Dec 20 00:18:13 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181350
Date: Tue, 19 Dec 2023 16:18:13 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-3-jstultz@google.com>
Subject: [PATCH v7 02/23] locking/mutex: Remove wakeups from under mutex::wait_lock
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
    Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
    Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
    "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
    Thomas Gleixner, kernel-team@android.com

In preparation to nest mutex::wait_lock under rq::lock we need to
remove wakeups from under it.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and
 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait
 mutexes")]
Signed-off-by: Juri Lelli
[jstultz: rebased to mainline, added extra wake_up_q & init
 to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: John Stultz
---
v5:
* Reverted back to an earlier version of this patch to undo
  the change that kept the wake_q in the ctx structure, as
  that broke the rule that the wake_q must always be on the
  stack, as it's not safe for concurrency.
v6:
* Made tweaks suggested by Waiman Long
v7:
* Fixups to pass wake_qs down for PREEMPT_RT logic
---
 kernel/locking/mutex.c       | 17 +++++++++++++----
 kernel/locking/rtmutex.c     | 26 +++++++++++++++++---------
 kernel/locking/rwbase_rt.c   |  4 +++-
 kernel/locking/rwsem.c       |  4 ++--
 kernel/locking/spinlock_rt.c |  3 ++-
 kernel/locking/ww_mutex.h    | 29 ++++++++++++++++++-----------
 6 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2deeeca3e71b..8337ed0dbf81 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -570,6 +570,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                     struct lockdep_map *nest_lock, unsigned long ip,
                     struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
+        DEFINE_WAKE_Q(wake_q);
         struct mutex_waiter waiter;
         struct ww_mutex *ww;
         int ret;
@@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
          */
         if (__mutex_trylock(lock)) {
                 if (ww_ctx)
-                        __ww_mutex_check_waiters(lock, ww_ctx);
+                        __ww_mutex_check_waiters(lock, ww_ctx, &wake_q);

                 goto skip_wait;
         }
@@ -640,7 +641,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                  * Add in stamp order, waking up waiters that must kill
                  * themselves.
                  */
-                ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+                ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q);
                 if (ret)
                         goto err_early_kill;
         }
@@ -676,6 +677,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                 }

                 raw_spin_unlock(&lock->wait_lock);
+                /* Make sure we do wakeups before calling schedule */
+                if (!wake_q_empty(&wake_q)) {
+                        wake_up_q(&wake_q);
+                        wake_q_init(&wake_q);
+                }
                 schedule_preempt_disabled();

                 first = __mutex_waiter_is_first(lock, &waiter);
@@ -709,7 +715,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                  */
                 if (!ww_ctx->is_wait_die &&
                     !__mutex_waiter_is_first(lock, &waiter))
-                        __ww_mutex_check_waiters(lock, ww_ctx);
+                        __ww_mutex_check_waiters(lock, ww_ctx, &wake_q);
         }

         __mutex_remove_waiter(lock, &waiter);
@@ -725,6 +731,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                 ww_mutex_lock_acquired(ww, ww_ctx);

         raw_spin_unlock(&lock->wait_lock);
+        wake_up_q(&wake_q);
         preempt_enable();
         return 0;

@@ -736,6 +743,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
         raw_spin_unlock(&lock->wait_lock);
         debug_mutex_free_waiter(&waiter);
         mutex_release(&lock->dep_map, ip);
+        wake_up_q(&wake_q);
         preempt_enable();
         return ret;
 }
@@ -929,6 +937,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
                 }
         }

+        preempt_disable();
         raw_spin_lock(&lock->wait_lock);
         debug_mutex_unlock(lock);
         if (!list_empty(&lock->wait_list)) {
@@ -947,8 +956,8 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
                 __mutex_handoff(lock, next);

         raw_spin_unlock(&lock->wait_lock);
-
         wake_up_q(&wake_q);
+        preempt_enable();
 }

 #ifndef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 4a10e8c16fd2..eaac8b196a69 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -34,13 +34,15 @@

 static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter,
                                         struct rt_mutex *lock,
-                                        struct ww_acquire_ctx *ww_ctx)
+                                        struct ww_acquire_ctx *ww_ctx,
+                                        struct wake_q_head *wake_q)
 {
         return 0;
 }

 static inline void __ww_mutex_check_waiters(struct rt_mutex *lock,
-                                            struct ww_acquire_ctx *ww_ctx)
+                                            struct ww_acquire_ctx *ww_ctx,
+                                            struct wake_q_head *wake_q)
 {
 }

@@ -1206,6 +1208,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
         struct rt_mutex_waiter *top_waiter = waiter;
         struct rt_mutex_base *next_lock;
         int chain_walk = 0, res;
+        DEFINE_WAKE_Q(wake_q);

         lockdep_assert_held(&lock->wait_lock);

@@ -1244,7 +1247,8 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,

         /* Check whether the waiter should back out immediately */
         rtm = container_of(lock, struct rt_mutex, rtmutex);
-        res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx);
+        res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx, &wake_q);
+        wake_up_q(&wake_q);
         if (res) {
                 raw_spin_lock(&task->pi_lock);
                 rt_mutex_dequeue(lock, waiter);
@@ -1677,7 +1681,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
                                        struct ww_acquire_ctx *ww_ctx,
                                        unsigned int state,
                                        enum rtmutex_chainwalk chwalk,
-                                       struct rt_mutex_waiter *waiter)
+                                       struct rt_mutex_waiter *waiter,
+                                       struct wake_q_head *wake_q)
 {
         struct rt_mutex *rtm = container_of(lock, struct rt_mutex, rtmutex);
         struct ww_mutex *ww = ww_container_of(rtm);
@@ -1688,7 +1693,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
         /* Try to acquire the lock again: */
         if (try_to_take_rt_mutex(lock, current, NULL)) {
                 if (build_ww_mutex() && ww_ctx) {
-                        __ww_mutex_check_waiters(rtm, ww_ctx);
+                        __ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
                         ww_mutex_lock_acquired(ww, ww_ctx);
                 }
                 return 0;
@@ -1706,7 +1711,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
                 /* acquired the lock */
                 if (build_ww_mutex() && ww_ctx) {
                         if (!ww_ctx->is_wait_die)
-                                __ww_mutex_check_waiters(rtm, ww_ctx);
+                                __ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
                         ww_mutex_lock_acquired(ww, ww_ctx);
                 }
         } else {
@@ -1728,7 +1733,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,

 static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
                                              struct ww_acquire_ctx *ww_ctx,
-                                             unsigned int state)
+                                             unsigned int state,
+                                             struct wake_q_head *wake_q)
 {
         struct rt_mutex_waiter waiter;
         int ret;
@@ -1737,7 +1743,7 @@ static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
         waiter.ww_ctx = ww_ctx;

         ret = __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK,
-                                  &waiter);
+                                  &waiter, wake_q);

         debug_rt_mutex_free_waiter(&waiter);
         return ret;
@@ -1753,6 +1759,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
                                      struct ww_acquire_ctx *ww_ctx,
                                      unsigned int state)
 {
+        DEFINE_WAKE_Q(wake_q);
         unsigned long flags;
         int ret;

@@ -1774,8 +1781,9 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
          * irqsave/restore variants.
          */
         raw_spin_lock_irqsave(&lock->wait_lock, flags);
-        ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
+        ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q);
         raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+        wake_up_q(&wake_q);
         rt_mutex_post_schedule();

         return ret;
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 34a59569db6b..e9d2f38b70f3 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
                                       unsigned int state)
 {
         struct rt_mutex_base *rtm = &rwb->rtmutex;
+        DEFINE_WAKE_Q(wake_q);
         int ret;

         rwbase_pre_schedule();
@@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
          * For rwlocks this returns 0 unconditionally, so the below
          * !ret conditionals are optimized out.
          */
-        ret = rwbase_rtmutex_slowlock_locked(rtm, state);
+        ret = rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q);

         /*
          * On success the rtmutex is held, so there can't be a writer
@@ -122,6 +123,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
         if (!ret)
                 atomic_inc(&rwb->readers);
         raw_spin_unlock_irq(&rtm->wait_lock);
+        wake_up_q(&wake_q);
         if (!ret)
                 rwbase_rtmutex_unlock(rtm);
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 2340b6d90ec6..74ebb2915d63 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1415,8 +1415,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 #define rwbase_rtmutex_lock_state(rtm, state)           \
         __rt_mutex_lock(rtm, state)

-#define rwbase_rtmutex_slowlock_locked(rtm, state)      \
-        __rt_mutex_slowlock_locked(rtm, NULL, state)
+#define rwbase_rtmutex_slowlock_locked(rtm, state, wq)  \
+        __rt_mutex_slowlock_locked(rtm, NULL, state, wq)

 #define rwbase_rtmutex_unlock(rtm)                      \
         __rt_mutex_unlock(rtm)
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 38e292454fcc..fb1810a14c9d 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, unsigned int state)
 }

 static __always_inline int
-rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state)
+rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state,
+                               struct wake_q_head *wake_q)
 {
         rtlock_slowlock_locked(rtm);
         return 0;
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 3ad2cc4823e5..7189c6631d90 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
  */
 static bool
 __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
-               struct ww_acquire_ctx *ww_ctx)
+               struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q)
 {
         if (!ww_ctx->is_wait_die)
                 return false;
@@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 #ifndef WW_RT
                 debug_mutex_wake_waiter(lock, waiter);
 #endif
-                wake_up_process(waiter->task);
+                wake_q_add(wake_q, waiter->task);
         }

         return true;
@@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
  */
 static bool __ww_mutex_wound(struct MUTEX *lock,
                              struct ww_acquire_ctx *ww_ctx,
-                             struct ww_acquire_ctx *hold_ctx)
+                             struct ww_acquire_ctx *hold_ctx,
+                             struct wake_q_head *wake_q)
 {
         struct task_struct *owner = __ww_mutex_owner(lock);

@@ -331,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
                  * wakeup pending to re-read the wounded state.
                  */
                 if (owner != current)
-                        wake_up_process(owner);
+                        wake_q_add(wake_q, owner);

                 return true;
         }
@@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
  * The current task must not be on the wait list.
  */
 static void
-__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx,
+                         struct wake_q_head *wake_q)
 {
         struct MUTEX_WAITER *cur;

@@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
                 if (!cur->ww_ctx)
                         continue;

-                if (__ww_mutex_die(lock, cur, ww_ctx) ||
-                    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+                if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) ||
+                    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q))
                         break;
         }
 }
@@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
+        DEFINE_WAKE_Q(wake_q);
+
         ww_mutex_lock_acquired(lock, ctx);

         /*
@@ -405,8 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
          * die or wound us.
          */
         lock_wait_lock(&lock->base);
-        __ww_mutex_check_waiters(&lock->base, ctx);
+        __ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
         unlock_wait_lock(&lock->base);
+
+        wake_up_q(&wake_q);
 }

 static __always_inline int
@@ -488,7 +494,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 static inline int
 __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
                       struct MUTEX *lock,
-                      struct ww_acquire_ctx *ww_ctx)
+                      struct ww_acquire_ctx *ww_ctx,
+                      struct wake_q_head *wake_q)
 {
         struct MUTEX_WAITER *cur, *pos = NULL;
         bool is_wait_die;
@@ -532,7 +539,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
                         pos = cur;

                 /* Wait-Die: ensure younger waiters die. */
-                __ww_mutex_die(lock, cur, ww_ctx);
+                __ww_mutex_die(lock, cur, ww_ctx, wake_q);
         }

         __ww_waiter_add(lock, waiter, pos);
@@ -550,7 +557,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
                  * such that either we or the fastpath will wound @ww->ctx.
                  */
                 smp_mb();
-                __ww_mutex_wound(lock, ww_ctx, ww->ctx);
+                __ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q);
         }

         return 0;
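
[Illustration, not part of the series: the discipline this patch
introduces -- collect wakeups on a stack-local queue while wait_lock is
held, issue them only after dropping it -- shown as a self-contained
pthread sketch. Nothing here is kernel code: the array-backed wake_q
and the condition variables are invented scaffolding, and a pthread
mutex stands in for the raw wait_lock.]

/* Userspace sketch of deferred wakeups (not kernel code). */
#include <pthread.h>
#include <stdio.h>

#define MAX_WAKE 8

struct wake_q {
        pthread_cond_t *conds[MAX_WAKE];
        int n;
};

static void wake_q_add(struct wake_q *q, pthread_cond_t *c)
{
        if (q->n < MAX_WAKE)
                q->conds[q->n++] = c;
}

static void wake_up_q(struct wake_q *q)
{
        for (int i = 0; i < q->n; i++)
                pthread_cond_signal(q->conds[i]);
        q->n = 0;
}

static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t waiter_a = PTHREAD_COND_INITIALIZER;
static pthread_cond_t waiter_b = PTHREAD_COND_INITIALIZER;

static void release_path(void)
{
        struct wake_q q = { .n = 0 };

        pthread_mutex_lock(&wait_lock);
        /* Decide *whom* to wake while the waiter state is stable... */
        wake_q_add(&q, &waiter_a);
        wake_q_add(&q, &waiter_b);
        pthread_mutex_unlock(&wait_lock);

        /* ...but only wake them once wait_lock is dropped, so the
         * wakeup path can never recurse into a lock we still hold. */
        wake_up_q(&q);
}

int main(void)
{
        release_path();
        puts("wakeups issued outside the lock");
        return 0;
}

[This is exactly why the wake_q must live on the stack of the task doing
the release: each release builds and drains its own private queue.]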
From patchwork Wed Dec 20 00:18:14 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181351
Date: Tue, 19 Dec 2023 16:18:14 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-4-jstultz@google.com>
Subject: [PATCH v7 03/23] locking/mutex: Make mutex::wait_lock irq safe
From: John Stultz
To: LKML
Cc: Juri Lelli, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
    Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
    Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
    "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
    Thomas Gleixner, kernel-team@android.com, "Connor O'Brien", John Stultz

From: Juri Lelli

mutex::wait_lock might be nested under rq->lock.

Make it irq safe then.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
[rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h]
Signed-off-by: Connor O'Brien
Signed-off-by: John Stultz
---
v3:
* Re-added this patch after it was dropped in v2 which caused
  lockdep warnings to trip.
v7:
* Fix function definition for PREEMPT_RT case, as pointed out
  by Metin Kaya.
* Fix incorrect flags handling in PREEMPT_RT case as found by
  Metin Kaya
---
 kernel/locking/mutex.c    | 18 ++++++++++--------
 kernel/locking/ww_mutex.h | 22 +++++++++++-----------
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 8337ed0dbf81..73d98dd23eec 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -573,6 +573,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
         DEFINE_WAKE_Q(wake_q);
         struct mutex_waiter waiter;
         struct ww_mutex *ww;
+        unsigned long flags;
         int ret;

         if (!use_ww_ctx)
@@ -615,7 +616,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                 return 0;
         }

-        raw_spin_lock(&lock->wait_lock);
+        raw_spin_lock_irqsave(&lock->wait_lock, flags);
         /*
          * After waiting to acquire the wait_lock, try again.
          */
@@ -676,7 +677,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                         goto err;
                 }

-                raw_spin_unlock(&lock->wait_lock);
+                raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
                 /* Make sure we do wakeups before calling schedule */
                 if (!wake_q_empty(&wake_q)) {
                         wake_up_q(&wake_q);
@@ -702,9 +703,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
                         trace_contention_begin(lock, LCB_F_MUTEX);
                 }

-                raw_spin_lock(&lock->wait_lock);
+                raw_spin_lock_irqsave(&lock->wait_lock, flags);
         }
-        raw_spin_lock(&lock->wait_lock);
+        raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
         __set_current_state(TASK_RUNNING);
@@ -730,7 +731,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
         if (ww_ctx)
                 ww_mutex_lock_acquired(ww, ww_ctx);

-        raw_spin_unlock(&lock->wait_lock);
+        raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
         wake_up_q(&wake_q);
         preempt_enable();
         return 0;
@@ -740,7 +741,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
         __mutex_remove_waiter(lock, &waiter);
 err_early_kill:
         trace_contention_end(lock, ret);
-        raw_spin_unlock(&lock->wait_lock);
+        raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
         debug_mutex_free_waiter(&waiter);
         mutex_release(&lock->dep_map, ip);
         wake_up_q(&wake_q);
@@ -911,6 +912,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
         struct task_struct *next = NULL;
         DEFINE_WAKE_Q(wake_q);
         unsigned long owner;
+        unsigned long flags;

         mutex_release(&lock->dep_map, ip);

@@ -938,7 +940,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
         }

         preempt_disable();
-        raw_spin_lock(&lock->wait_lock);
+        raw_spin_lock_irqsave(&lock->wait_lock, flags);
         debug_mutex_unlock(lock);
         if (!list_empty(&lock->wait_list)) {
                 /* get the first entry from the wait-list: */
@@ -955,7 +957,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
         if (owner & MUTEX_FLAG_HANDOFF)
                 __mutex_handoff(lock, next);

-        raw_spin_unlock(&lock->wait_lock);
+        raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
         wake_up_q(&wake_q);
         preempt_enable();
 }
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 7189c6631d90..9facc0ddfdd3 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock)
         return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS;
 }

-static inline void lock_wait_lock(struct mutex *lock)
+static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags)
 {
-        raw_spin_lock(&lock->wait_lock);
+        raw_spin_lock_irqsave(&lock->wait_lock, *flags);
 }

-static inline void unlock_wait_lock(struct mutex *lock)
+static inline void unlock_wait_lock(struct mutex *lock, unsigned long *flags)
 {
-        raw_spin_unlock(&lock->wait_lock);
+        raw_spin_unlock_irqrestore(&lock->wait_lock, *flags);
 }

 static inline void lockdep_assert_wait_lock_held(struct mutex *lock)
@@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock)
         return rt_mutex_has_waiters(&lock->rtmutex);
 }

-static inline void lock_wait_lock(struct rt_mutex *lock)
+static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
 {
-        raw_spin_lock(&lock->rtmutex.wait_lock);
+        raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags);
 }

-static inline void unlock_wait_lock(struct rt_mutex *lock)
+static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
 {
-        raw_spin_unlock(&lock->rtmutex.wait_lock);
+        raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags);
 }

 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock)
@@ -380,6 +380,7 @@ static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
         DEFINE_WAKE_Q(wake_q);
+        unsigned long flags;

         ww_mutex_lock_acquired(lock, ctx);

@@ -408,10 +409,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
          * Uh oh, we raced in fastpath, check if any of the waiters need to
          * die or wound us.
          */
-        lock_wait_lock(&lock->base);
+        lock_wait_lock(&lock->base, &flags);
         __ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
-        unlock_wait_lock(&lock->base);
-
+        unlock_wait_lock(&lock->base, &flags);
         wake_up_q(&wake_q);
 }
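
[Illustration, not part of the series: why the helpers grew an
"unsigned long *flags" parameter. The irqsave variants must write the
previously saved interrupt state back to the caller, so a plain value
parameter would not do. save_irq()/restore_irq() below are invented
stand-ins for the interrupt-state handling that
raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() imply; this is not
kernel code.]

/* Toy model of flags plumbing through lock helpers (not kernel code). */
#include <stdio.h>

static unsigned long save_irq(void)
{
        puts("irqs off (state saved)");
        return 0xABCD; /* pretend: the previous interrupt state */
}

static void restore_irq(unsigned long flags)
{
        printf("irqs restored to %#lx\n", flags);
}

struct mutex { int wait_lock; };

/* The helper must *write* the saved state, hence the pointer --
 * mirroring lock_wait_lock(struct mutex *lock, unsigned long *flags). */
static void lock_wait_lock(struct mutex *lock, unsigned long *flags)
{
        *flags = save_irq();
        lock->wait_lock = 1; /* stand-in for raw_spin_lock() */
}

static void unlock_wait_lock(struct mutex *lock, unsigned long *flags)
{
        lock->wait_lock = 0; /* stand-in for raw_spin_unlock() */
        restore_irq(*flags);
}

int main(void)
{
        struct mutex m = { 0 };
        unsigned long flags;

        lock_wait_lock(&m, &flags);
        /* critical section: safe even if an interrupt-context path
         * could otherwise try to take wait_lock via rq->lock nesting */
        unlock_wait_lock(&m, &flags);
        return 0;
}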
From patchwork Wed Dec 20 00:18:15 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181352
Date: Tue, 19 Dec 2023 16:18:15 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-5-jstultz@google.com>
Subject: [PATCH v7 04/23] locking/mutex: Expose __mutex_owner()
From: John Stultz
To: LKML
Cc: Juri Lelli, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
    Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
    Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
    "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
    Thomas Gleixner, kernel-team@android.com, "Connor O'Brien", John Stultz

From: Juri Lelli

Implementing proxy execution requires that scheduler code be able to
identify the current owner of a mutex. Expose __mutex_owner() for
this purpose (alone!).

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Juri Lelli
[Removed the EXPORT_SYMBOL]
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: Reworked per Peter's suggestions]
Signed-off-by: John Stultz
---
v4:
* Move __mutex_owner() to kernel/locking/mutex.h instead of
  adding a new globally available accessor function to keep
  the exposure of this low, along with keeping it an inline
  function, as suggested by PeterZ
---
 kernel/locking/mutex.c | 25 -------------------------
 kernel/locking/mutex.h | 25 +++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 73d98dd23eec..543774506fdb 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 }
 EXPORT_SYMBOL(__mutex_init);

-/*
- * @owner: contains: 'struct task_struct *' to the current lock owner,
- * NULL means not owned. Since task_struct pointers are aligned at
- * at least L1_CACHE_BYTES, we have low bits to store extra state.
- *
- * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
- * Bit1 indicates unlock needs to hand the lock to the top-waiter
- * Bit2 indicates handoff has been done and we're waiting for pickup.
- */
-#define MUTEX_FLAG_WAITERS      0x01
-#define MUTEX_FLAG_HANDOFF      0x02
-#define MUTEX_FLAG_PICKUP       0x04
-
-#define MUTEX_FLAGS             0x07
-
-/*
- * Internal helper function; C doesn't allow us to hide it :/
- *
- * DO NOT USE (outside of mutex code).
- */
-static inline struct task_struct *__mutex_owner(struct mutex *lock)
-{
-        return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
-}
-
 static inline struct task_struct *__owner_task(unsigned long owner)
 {
         return (struct task_struct *)(owner & ~MUTEX_FLAGS);
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 0b2a79c4013b..1c7d3d32def8 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -20,6 +20,31 @@ struct mutex_waiter {
 #endif
 };

+/*
+ * @owner: contains: 'struct task_struct *' to the current lock owner,
+ * NULL means not owned. Since task_struct pointers are aligned at
+ * at least L1_CACHE_BYTES, we have low bits to store extra state.
+ *
+ * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
+ * Bit1 indicates unlock needs to hand the lock to the top-waiter
+ * Bit2 indicates handoff has been done and we're waiting for pickup.
+ */
+#define MUTEX_FLAG_WAITERS      0x01
+#define MUTEX_FLAG_HANDOFF      0x02
+#define MUTEX_FLAG_PICKUP       0x04
+
+#define MUTEX_FLAGS             0x07
+
+/*
+ * Internal helper function; C doesn't allow us to hide it :/
+ *
+ * DO NOT USE (outside of mutex & scheduler code).
+ */
+static inline struct task_struct *__mutex_owner(struct mutex *lock)
+{
+        return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
+}
+
 #ifdef CONFIG_DEBUG_MUTEXES
 extern void debug_mutex_lock_common(struct mutex *lock,
                                     struct mutex_waiter *waiter);
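
[Illustration, not part of the series: the encoding __mutex_owner()
decodes relies on pointer alignment, which a minimal userspace sketch
can demonstrate. struct task below is an invented stand-in for
task_struct; the flag values and the masking expression mirror the code
being moved.]

/* Packing flags into an aligned pointer's low bits (not kernel code). */
#include <stdio.h>
#include <stdint.h>

#define MUTEX_FLAG_WAITERS 0x01
#define MUTEX_FLAG_HANDOFF 0x02
#define MUTEX_FLAG_PICKUP  0x04
#define MUTEX_FLAGS        0x07

/* 8-byte alignment guarantees the low three bits of any
 * pointer to this struct are zero, so they are free for flags. */
struct task { char name[16]; } __attribute__((aligned(8)));

static struct task *owner_of(uintptr_t owner_word)
{
        /* mask off the flag bits, leaving the aligned pointer */
        return (struct task *)(owner_word & ~(uintptr_t)MUTEX_FLAGS);
}

int main(void)
{
        static struct task t = { "worker" };
        /* pack: owner pointer plus "waiters pending" in bit 0 */
        uintptr_t word = (uintptr_t)&t | MUTEX_FLAG_WAITERS;

        printf("owner=%s waiters=%d\n", owner_of(word)->name,
               !!(word & MUTEX_FLAG_WAITERS));
        return 0;
}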
From patchwork Wed Dec 20 00:18:16 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181353

Date: Tue, 19 Dec 2023 16:18:16 -0800 In-Reply-To: <20231220001856.3710363-1-jstultz@google.com> References: <20231220001856.3710363-1-jstultz@google.com> Message-ID: <20231220001856.3710363-6-jstultz@google.com> Subject: [PATCH v7 05/23] locking/mutex: Rework task_struct::blocked_on From: John Stultz To: LKML Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, "Connor O'Brien", John Stultz

From: Peter Zijlstra

Track the blocked-on relation for mutexes, to allow following this relation at schedule time:

    task
      | blocked-on
      v
    mutex
      | owner
      v
    task

Also adds a blocked_on_state value so we can distinguish when a task is blocked_on a mutex, but is either blocked, waking up, or runnable (such that it can try to acquire the lock it's blocked on). This avoids some of the subtle & racy games where the blocked_on state gets cleared, only to have it re-added by the mutex_lock_slowpath call when it tries to acquire the lock on wakeup.

Also adds blocked_lock to the task_struct so we can safely serialize the blocked-on state. Finally adds wrappers that are useful to provide correctness checks. Folded in from a patch by: Valentin Schneider

This all will be used for tracking blocked-task/mutex chains with the proxy-execution patch in a similar fashion to how priority inheritance is done with rt_mutexes.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) [minor changes while rebasing] Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Connor O'Brien [jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths] Signed-off-by: John Stultz
---
v2: * Fixed blocked_on tracking in error paths that was causing crashes v4: * Ensure we clear blocked_on when waking ww_mutexes to die or wound. This is critical so we don't get circular blocked_on relationships that can't be resolved.
v5: * Fix potential bug where the skip_wait path might clear blocked_on when that path never set it * Slight tweaks to where we set blocked_on to make it consistent, along with extra WARN_ON correctness checking * Minor comment changes v7: * Minor commit message change suggested by Metin Kaya * Fix WARN_ON conditionals in unlock path (as blocked_on might already be cleared), found while looking at issue Metin Kaya raised. * Minor tweaks to be consistent in what we do under the blocked_on lock, also tweaked variable name to avoid confusion with label, and comment typos, as suggested by Metin Kaya * Minor tweak for CONFIG_SCHED_PROXY_EXEC name change * Moved unused block of code to later in the series, as suggested by Metin Kaya * Switch to a tri-state to be able to distinguish between waking and runnable so we can later safely do return migration from ttwu * Folded together with related blocked_on changes
---
include/linux/sched.h | 40 ++++++++++++++++++++++++++++++++---- init/init_task.c | 1 + kernel/fork.c | 4 ++-- kernel/locking/mutex-debug.c | 9 ++++---- kernel/locking/mutex.c | 35 +++++++++++++++++++++++++++---- kernel/locking/ww_mutex.h | 24 ++++++++++++++++++++-- kernel/sched/core.c | 6 ++++++ 7 files changed, 103 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h index 1e80c330f755..bfe8670f99a1 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -743,6 +743,12 @@ struct kmap_ctrl { #endif }; +enum blocked_on_state { + BO_RUNNABLE, + BO_BLOCKED, + BO_WAKING, +}; + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* @@ -1149,10 +1155,9 @@ struct task_struct { struct rt_mutex_waiter *pi_blocked_on; #endif -#ifdef CONFIG_DEBUG_MUTEXES - /* Mutex deadlock detection: */ - struct mutex_waiter *blocked_on; -#endif + enum blocked_on_state blocked_on_state; + struct mutex *blocked_on; /* lock we're blocked on */ + raw_spinlock_t blocked_lock; #ifdef CONFIG_DEBUG_ATOMIC_SLEEP int non_block_count; @@ -2258,6 +2263,33 @@ static inline int rwlock_needbreak(rwlock_t *lock) #endif } +static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m) +{ + lockdep_assert_held(&p->blocked_lock); + + /* + * Check we are clearing values to NULL or setting NULL + * to values to ensure we don't overwrite existing mutex + * values or clear already cleared values + */ + WARN_ON((!m && !p->blocked_on) || (m && p->blocked_on)); + + p->blocked_on = m; + p->blocked_on_state = m ?
BO_BLOCKED : BO_RUNNABLE; +} + +static inline struct mutex *get_task_blocked_on(struct task_struct *p) +{ + lockdep_assert_held(&p->blocked_lock); + + return p->blocked_on; +} + +static inline struct mutex *get_task_blocked_on_once(struct task_struct *p) +{ + return READ_ONCE(p->blocked_on); +} + static __always_inline bool need_resched(void) { return unlikely(tif_need_resched()); diff --git a/init/init_task.c b/init/init_task.c index 5727d42149c3..0c31d7d7c7a9 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -131,6 +131,7 @@ struct task_struct init_task .journal_info = NULL, INIT_CPU_TIMERS(init_task) .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock), + .blocked_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock), .timer_slack_ns = 50000, /* 50 usec default slack */ .thread_pid = &init_struct_pid, .thread_node = LIST_HEAD_INIT(init_signals.thread_head), diff --git a/kernel/fork.c b/kernel/fork.c index 10917c3e1f03..b3ba3d22d8b2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2358,6 +2358,7 @@ __latent_entropy struct task_struct *copy_process( ftrace_graph_init_task(p); rt_mutex_init_task(p); + raw_spin_lock_init(&p->blocked_lock); lockdep_assert_irqs_enabled(); #ifdef CONFIG_PROVE_LOCKING @@ -2456,9 +2457,8 @@ __latent_entropy struct task_struct *copy_process( lockdep_init_task(p); #endif -#ifdef CONFIG_DEBUG_MUTEXES + p->blocked_on_state = BO_RUNNABLE; p->blocked_on = NULL; /* not blocked yet */ -#endif #ifdef CONFIG_BCACHE p->sequential_io = 0; p->sequential_io_avg = 0; diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c index bc8abb8549d2..1eedf7c60c00 100644 --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -52,17 +52,18 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter, { lockdep_assert_held(&lock->wait_lock); - /* Mark the current thread as blocked on the lock: */ - task->blocked_on = waiter; + /* Current thread can't be already blocked (since it's executing!) */ + DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task)); } void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter, struct task_struct *task) { + struct mutex *blocked_on = get_task_blocked_on_once(task); + DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list)); DEBUG_LOCKS_WARN_ON(waiter->task != task); - DEBUG_LOCKS_WARN_ON(task->blocked_on != waiter); - task->blocked_on = NULL; + DEBUG_LOCKS_WARN_ON(blocked_on && blocked_on != lock); INIT_LIST_HEAD(&waiter->list); waiter->task = NULL; diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 543774506fdb..6084470773f6 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -592,6 +592,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas } raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); /* * After waiting to acquire the wait_lock, try again. 
*/ @@ -622,6 +623,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas goto err_early_kill; } + set_task_blocked_on(current, lock); set_current_state(state); trace_contention_begin(lock, LCB_F_MUTEX); for (;;) { @@ -652,6 +654,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas goto err; } + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); /* Make sure we do wakeups before calling schedule */ if (!wake_q_empty(&wake_q)) { @@ -662,6 +665,13 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas first = __mutex_waiter_is_first(lock, &waiter); + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); + + /* + * Re-set blocked_on_state as unlock path set it to WAKING/RUNNABLE + */ + current->blocked_on_state = BO_BLOCKED; set_current_state(state); /* * Here we order against unlock; we must either see it change @@ -672,16 +682,25 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas break; if (first) { + bool opt_acquired; + + /* + * mutex_optimistic_spin() can schedule, so we need to + * release these locks before calling it. + */ + raw_spin_unlock(¤t->blocked_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN); - if (mutex_optimistic_spin(lock, ww_ctx, &waiter)) + opt_acquired = mutex_optimistic_spin(lock, ww_ctx, &waiter); + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); + if (opt_acquired) break; trace_contention_begin(lock, LCB_F_MUTEX); } - - raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: + set_task_blocked_on(current, NULL); __set_current_state(TASK_RUNNING); if (ww_ctx) { @@ -706,16 +725,20 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx); + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); return 0; err: + set_task_blocked_on(current, NULL); __set_current_state(TASK_RUNNING); __mutex_remove_waiter(lock, &waiter); err_early_kill: + WARN_ON(current->blocked_on); trace_contention_end(lock, ret); + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); @@ -925,8 +948,12 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne next = waiter->task; + raw_spin_lock(&next->blocked_lock); debug_mutex_wake_waiter(lock, waiter); + WARN_ON(next->blocked_on != lock); + next->blocked_on_state = BO_WAKING; wake_q_add(&wake_q, next); + raw_spin_unlock(&next->blocked_lock); } if (owner & MUTEX_FLAG_HANDOFF) diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 9facc0ddfdd3..8dd21ff5eee0 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -281,10 +281,21 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter, return false; if (waiter->ww_ctx->acquired > 0 && __ww_ctx_less(waiter->ww_ctx, ww_ctx)) { + /* nested as we should hold current->blocked_lock already */ + raw_spin_lock_nested(&waiter->task->blocked_lock, SINGLE_DEPTH_NESTING); #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif + /* + * When waking up the task to die, be sure to set the + * blocked_on_state to WAKING. 
Otherwise we can see + circular blocked_on relationships that can't + resolve. + */ + WARN_ON(waiter->task->blocked_on != lock); + waiter->task->blocked_on_state = BO_WAKING; wake_q_add(wake_q, waiter->task); + raw_spin_unlock(&waiter->task->blocked_lock); } return true; @@ -331,9 +342,18 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * it's wounded in __ww_mutex_check_kill() or has a * wakeup pending to re-read the wounded state. */ - if (owner != current) + if (owner != current) { + /* nested as we should hold current->blocked_lock already */ + raw_spin_lock_nested(&owner->blocked_lock, SINGLE_DEPTH_NESTING); + /* + * When waking up the task to wound, be sure to set the + * blocked_on_state flag. Otherwise we can see circular + * blocked_on relationships that can't resolve. + */ + owner->blocked_on_state = BO_WAKING; wake_q_add(wake_q, owner); - + raw_spin_unlock(&owner->blocked_lock); + } return true; } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a708d225c28e..4e46189d545d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4195,6 +4195,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) { guard(preempt)(); + unsigned long flags; int cpu, success = 0; if (p == current) { @@ -4341,6 +4342,11 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) ttwu_queue(p, cpu, wake_flags); } + /* XXX can we do something better here for !CONFIG_SCHED_PROXY_EXEC case */ + raw_spin_lock_irqsave(&p->blocked_lock, flags); + if (p->blocked_on_state == BO_WAKING) + p->blocked_on_state = BO_RUNNABLE; + raw_spin_unlock_irqrestore(&p->blocked_lock, flags); out: if (success) ttwu_stat(p, task_cpu(p), wake_flags);
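Note the wake-side pattern that repeats in __mutex_unlock_slowpath() and the ww_mutex die/wound paths above: take the waiter's blocked_lock, move it to BO_WAKING, then queue the wakeup; try_to_wake_up() later completes the BO_WAKING -> BO_RUNNABLE transition. Condensed into one helper for illustration only (wake_next_waiter() is an invented name, not code from the patch):

	static void wake_next_waiter(struct mutex *lock, struct task_struct *next,
				     struct wake_q_head *wake_q)
	{
		raw_spin_lock(&next->blocked_lock);
		WARN_ON(next->blocked_on != lock);	/* must be blocked on us */
		next->blocked_on_state = BO_WAKING;	/* ttwu -> BO_RUNNABLE */
		wake_q_add(wake_q, next);
		raw_spin_unlock(&next->blocked_lock);
	}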
From patchwork Wed Dec 20 00:18:17 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181354

Date: Tue, 19 Dec 2023 16:18:17 -0800 In-Reply-To: <20231220001856.3710363-1-jstultz@google.com> References: <20231220001856.3710363-1-jstultz@google.com> Message-ID: <20231220001856.3710363-7-jstultz@google.com> Subject: [PATCH v7 06/23] sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable From: John Stultz To: LKML Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com

Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument sched_proxy_exec= that can be used to disable the feature at boot time if CONFIG_SCHED_PROXY_EXEC was enabled.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: John Stultz
---
v7: * Switch to CONFIG_SCHED_PROXY_EXEC/sched_proxy_exec= as suggested by Metin Kaya. * Switch boot arg from =disable/enable to use kstrtobool(), which supports =yes|no|1|0|true|false|on|off, as also suggested by Metin Kaya, and print a message when a boot argument is used.
---
.../admin-guide/kernel-parameters.txt | 5 ++++ include/linux/sched.h | 13 +++++++++ init/Kconfig | 7 +++++ kernel/sched/core.c | 29 +++++++++++++++++++ 4 files changed, 54 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 65731b060e3f..cc64393b913f 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5714,6 +5714,11 @@ sa1100ir [NET] See drivers/net/irda/sa1100_ir.c. + sched_proxy_exec= [KNL] + Enables or disables "proxy execution" style + solution to mutex based priority inversion. + Format: + sched_verbose [KNL] Enables verbose scheduler debug messages.
schedstats= [KNL,X86] Enable or disable scheduled statistics. diff --git a/include/linux/sched.h b/include/linux/sched.h index bfe8670f99a1..880af1c3097d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1566,6 +1566,19 @@ struct task_struct { */ }; +#ifdef CONFIG_SCHED_PROXY_EXEC +DECLARE_STATIC_KEY_TRUE(__sched_proxy_exec); +static inline bool sched_proxy_exec(void) +{ + return static_branch_likely(&__sched_proxy_exec); +} +#else +static inline bool sched_proxy_exec(void) +{ + return false; +} +#endif + static inline struct pid *task_pid(struct task_struct *task) { return task->thread_pid; diff --git a/init/Kconfig b/init/Kconfig index 9ffb103fc927..c5a759b6366a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -908,6 +908,13 @@ config NUMA_BALANCING_DEFAULT_ENABLED If set, automatic NUMA balancing will be enabled if running on a NUMA machine. +config SCHED_PROXY_EXEC + bool "Proxy Execution" + default n + help + This option enables proxy execution, a mechanism for mutex owning + tasks to inherit the scheduling context of higher priority waiters. + menuconfig CGROUPS bool "Control Group support" select KERNFS diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4e46189d545d..e06558fb08aa 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -117,6 +117,35 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp); DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); +#ifdef CONFIG_SCHED_PROXY_EXEC +DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec); +static int __init setup_proxy_exec(char *str) +{ + bool proxy_enable; + + if (kstrtobool(str, &proxy_enable)) { + pr_warn("Unable to parse sched_proxy_exec=\n"); + return 0; + } + + if (proxy_enable) { + pr_info("sched_proxy_exec enabled via boot arg\n"); + static_branch_enable(&__sched_proxy_exec); + } else { + pr_info("sched_proxy_exec disabled via boot arg\n"); + static_branch_disable(&__sched_proxy_exec); + } + return 1; +} +#else +static int __init setup_proxy_exec(char *str) +{ + pr_warn("CONFIG_SCHED_PROXY_EXEC=n, so it cannot be enabled or disabled at boot time\n"); + return 0; +} +#endif +__setup("sched_proxy_exec=", setup_proxy_exec); #ifdef CONFIG_SCHED_DEBUG /* * Debugging: various feature bits
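For reference, a sketch of how the new knob is expected to be used; the command lines and call site below are illustrative, not taken from the patch:

	/*
	 * Boot command line: kstrtobool() accepts yes|no|1|0|true|false|on|off,
	 * e.g. to disable a CONFIG_SCHED_PROXY_EXEC=y kernel for one boot:
	 *
	 *     ... sched_proxy_exec=off ...
	 *
	 * Kernel code then branches on the static key, not the config symbol:
	 */
	if (sched_proxy_exec()) {
		/* proxy-execution path: static_branch_likely() is near-free */
	} else {
		/* legacy behavior */
	}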
From patchwork Wed Dec 20 00:18:18 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181356

Date: Tue, 19 Dec 2023 16:18:18 -0800 In-Reply-To: <20231220001856.3710363-1-jstultz@google.com> References: <20231220001856.3710363-1-jstultz@google.com> Message-ID: <20231220001856.3710363-8-jstultz@google.com> Subject: [PATCH v7 07/23] locking/mutex: Switch to mutex handoffs for CONFIG_SCHED_PROXY_EXEC From: John Stultz To: LKML Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Valentin Schneider, "Connor O'Brien", John Stultz

From: Peter Zijlstra

Since with SCHED_PROXY_EXEC, we will want to hand off locks to the tasks we are running on behalf of, switch to using mutex handoffs.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) [rebased, added comments and changelog] Signed-off-by: Juri Lelli [Fixed rebase conflicts] [squashed sched: Ensure blocked_on is always guarded by blocked_lock] Signed-off-by: Valentin Schneider [fix rebase conflicts, various fixes & tweaks commented inline] [squashed sched: Use rq->curr vs rq->proxy checks] Signed-off-by: Connor O'Brien [jstultz: Split out only the very basic initial framework for proxy logic from a larger patch.]
Signed-off-by: John Stultz --- v5: * Split out from core proxy patch v6: * Rework to use sched_proxy_exec() instead of #ifdef CONFIG_PROXY_EXEC v7: * Avoid disabling optimistic spinning at compile time so booting with sched_proxy_exec=off matches prior performance * Add comment in mutex-design.rst as suggested by Metin Kaya --- Documentation/locking/mutex-design.rst | 3 ++ kernel/locking/mutex.c | 42 +++++++++++++++----------- 2 files changed, 28 insertions(+), 17 deletions(-) diff --git a/Documentation/locking/mutex-design.rst b/Documentation/locking/mutex-design.rst index 78540cd7f54b..57a5cb03f409 100644 --- a/Documentation/locking/mutex-design.rst +++ b/Documentation/locking/mutex-design.rst @@ -61,6 +61,9 @@ taken, depending on the state of the lock: waiting to spin on mutex owner, only to go directly to slowpath upon obtaining the MCS lock. + NOTE: Optimistic spinning will be avoided when using proxy execution + (SCHED_PROXY_EXEC) as we want to hand the lock off to the task that was + boosting the current owner. (iii) slowpath: last resort, if the lock is still unable to be acquired, the task is added to the wait-queue and sleeps until woken up by the diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 6084470773f6..11dc5cb7a5a3 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -416,6 +416,9 @@ static __always_inline bool mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx, struct mutex_waiter *waiter) { + if (sched_proxy_exec()) + return false; + if (!waiter) { /* * The purpose of the mutex_can_spin_on_owner() function is @@ -914,26 +917,31 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne mutex_release(&lock->dep_map, ip); - /* - * Release the lock before (potentially) taking the spinlock such that - * other contenders can get on with things ASAP. - * - * Except when HANDOFF, in that case we must not clear the owner field, - * but instead set it to the top waiter. - */ - owner = atomic_long_read(&lock->owner); - for (;;) { - MUTEX_WARN_ON(__owner_task(owner) != current); - MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP); - - if (owner & MUTEX_FLAG_HANDOFF) - break; + if (sched_proxy_exec()) { + /* Always force HANDOFF for Proxy Exec for now. Revisit. */ + owner = MUTEX_FLAG_HANDOFF; + } else { + /* + * Release the lock before (potentially) taking the spinlock + * such that other contenders can get on with things ASAP. + * + * Except when HANDOFF, in that case we must not clear the + * owner field, but instead set it to the top waiter. 
+ */ + owner = atomic_long_read(&lock->owner); + for (;;) { + MUTEX_WARN_ON(__owner_task(owner) != current); + MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP); - if (atomic_long_try_cmpxchg_release(&lock->owner, &owner, __owner_flags(owner))) { - if (owner & MUTEX_FLAG_WAITERS + if (owner & MUTEX_FLAG_HANDOFF) break; - return; + if (atomic_long_try_cmpxchg_release(&lock->owner, &owner, + __owner_flags(owner))) { + if (owner & MUTEX_FLAG_WAITERS) + break; + return; + } } }
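In short, with proxy execution enabled every contended unlock now behaves as if MUTEX_FLAG_HANDOFF were already set, so the lock is passed to the top waiter rather than released for stealing or optimistic spinning (which is why mutex_optimistic_spin() bails out early above). A condensed sketch of the resulting decision, illustrative only and assuming the existing __mutex_handoff() helper in kernel/locking/mutex.c:

	/* Condensed view of the unlock path after this patch (sketch) */
	if (sched_proxy_exec())
		owner = MUTEX_FLAG_HANDOFF;		/* always hand off */
	else
		owner = release_or_handoff(lock);	/* hypothetical name for
							 * the cmpxchg loop above */
	if (owner & MUTEX_FLAG_HANDOFF)
		__mutex_handoff(lock, next);		/* top waiter becomes owner */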
From patchwork Wed Dec 20 00:18:19 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181359

Date: Tue, 19 Dec 2023 16:18:19 -0800 In-Reply-To: <20231220001856.3710363-1-jstultz@google.com> References: <20231220001856.3710363-1-jstultz@google.com> Message-ID: <20231220001856.3710363-9-jstultz@google.com> Subject: [PATCH v7 08/23] sched: Split scheduler and execution contexts From: John Stultz To: LKML Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Xuewen Yan, K Prateek Nayak, Metin Kaya, Thomas Gleixner, kernel-team@android.com, "Connor O'Brien", John Stultz

From: Peter Zijlstra

Let's define the scheduling context as all the scheduler state in task_struct for the task selected to run, and the execution context as all state required to actually run the task. Currently both are intertwined in task_struct. We want to logically split these such that we can use the scheduling context of the task selected to be scheduled, but use the execution context of a different task to actually be run.

To this purpose, introduce the rq_selected() macro, which points to the task_struct selected from the runqueue by the scheduler and is used for scheduler state, and preserve rq->curr to indicate the execution context of the task that will actually be run.

NOTE: Peter previously mentioned he didn't like the name "rq_selected()", but I've not come up with a better alternative. I'm very open to other name proposals.

Question for Peter: Dietmar suggested you'd prefer I drop the conditionalization of the scheduler context pointer on the rq (so rq_selected() would be open coded as rq->curr_selected or whatever we agree on for a name), but I'd think in the !CONFIG_PROXY_EXEC case we'd want to avoid the wasted pointer and its use (since curr_selected would always be == curr)? If I'm wrong I'm fine switching this, but would appreciate clarification.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney"
Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Metin Kaya Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com [add additional comments and update more sched_class code to use rq::proxy] Signed-off-by: Connor O'Brien [jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy, fixing rt scheduling issues] Signed-off-by: John Stultz
---
v2: * Reworked to use accessors * Fixed update_curr_common to use proxy instead of curr v3: * Tweaked wrapper names * Swapped proxy for selected for clarity v4: * Minor variable name tweaks for readability * Use a macro instead of an inline function and drop other helper functions as suggested by Peter. * Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock v7: * Minor spelling fix and unused argument fixes suggested by Metin Kaya * Switch to curr_selected for consistency, and minor rewording of commit message for clarity * Rename variables to selected instead of curr where we're using rq_selected() * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections, as suggested by Metin Kaya
---
kernel/sched/core.c | 46 ++++++++++++++++++++++++++--------------- kernel/sched/deadline.c | 35 ++++++++++++++++--------------- kernel/sched/fair.c | 18 ++++++++-------- kernel/sched/rt.c | 40 +++++++++++++++++------------------ kernel/sched/sched.h | 35 +++++++++++++++++++++++++++++-- 5 files changed, 109 insertions(+), 65 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e06558fb08aa..0ce34f5c0e0c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -822,7 +822,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer) rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq_selected(rq)->sched_class->task_tick(rq, rq_selected(rq), 1); rq_unlock(rq, &rf); return HRTIMER_NORESTART; @@ -2242,16 +2242,18 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p, void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class == rq->curr->sched_class) - rq->curr->sched_class->wakeup_preempt(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *selected = rq_selected(rq); + + if (p->sched_class == selected->sched_class) + selected->sched_class->wakeup_preempt(rq, p, flags); + else if (sched_class_above(p->sched_class, selected->sched_class)) resched_curr(rq); /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. */ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } @@ -2780,7 +2782,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) lockdep_assert_held(&p->pi_lock); queued = task_on_rq_queued(p); - running = task_current(rq, p); + running = task_current_selected(rq, p); if (queued) { /* @@ -5600,7 +5602,7 @@ unsigned long long task_sched_runtime(struct task_struct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime().
*/ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_selected(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5668,7 +5670,8 @@ void scheduler_tick(void) { int cpu = smp_processor_id(); struct rq *rq = cpu_rq(cpu); - struct task_struct *curr = rq->curr; + /* accounting goes to the selected task */ + struct task_struct *selected; struct rq_flags rf; unsigned long thermal_pressure; u64 resched_latency; @@ -5679,16 +5682,17 @@ void scheduler_tick(void) sched_clock_tick(); rq_lock(rq, &rf); + selected = rq_selected(rq); update_rq_clock(rq); thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq)); update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure); - curr->sched_class->task_tick(rq, curr, 0); + selected->sched_class->task_tick(rq, selected, 0); if (sched_feat(LATENCY_WARN)) resched_latency = cpu_resched_latency(rq); calc_global_load_tick(rq); sched_core_tick(rq); - task_tick_mm_cid(rq, curr); + task_tick_mm_cid(rq, selected); rq_unlock(rq, &rf); @@ -5697,8 +5701,8 @@ void scheduler_tick(void) perf_event_task_tick(); - if (curr->flags & PF_WQ_WORKER) - wq_worker_tick(curr); + if (selected->flags & PF_WQ_WORKER) + wq_worker_tick(selected); #ifdef CONFIG_SMP rq->idle_balance = idle_cpu(cpu); @@ -5763,6 +5767,12 @@ static void sched_tick_remote(struct work_struct *work) struct task_struct *curr = rq->curr; if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). + */ + SCHED_WARN_ON(rq->curr != rq_selected(rq)); update_rq_clock(rq); if (!is_idle_task(curr)) { @@ -6685,6 +6695,7 @@ static void __sched notrace __schedule(unsigned int sched_mode) } next = pick_next_task(rq, prev, &rf); + rq_set_selected(rq, next); clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG @@ -7185,7 +7196,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task) prev_class = p->sched_class; queued = task_on_rq_queued(p); - running = task_current(rq, p); + running = task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7275,7 +7286,7 @@ void set_user_nice(struct task_struct *p, long nice) } queued = task_on_rq_queued(p); - running = task_current(rq, p); + running = task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -7868,7 +7879,7 @@ static int __sched_setscheduler(struct task_struct *p, } queued = task_on_rq_queued(p); - running = task_current(rq, p); + running = task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) @@ -9295,6 +9306,7 @@ void __init init_idle(struct task_struct *idle, int cpu) rcu_read_unlock(); rq->idle = idle; + rq_set_selected(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq = TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -9384,7 +9396,7 @@ void sched_setnuma(struct task_struct *p, int nid) rq = task_rq_lock(p, &rf); queued = task_on_rq_queued(p); - running = task_current(rq, p); + running = task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -10489,7 +10501,7 @@ void sched_move_task(struct task_struct *tsk) update_rq_clock(rq); - running = task_current(rq, tsk); + running = task_current_selected(rq, tsk); queued = task_on_rq_queued(tsk); if (queued) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 6140f1f51da1..9cf20f4ac5f9 100644 --- 
a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1150,7 +1150,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer) #endif enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -1273,7 +1273,7 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se) */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr = rq->curr; + struct task_struct *curr = rq_selected(rq); struct sched_dl_entity *dl_se = &curr->dl; s64 delta_exec, scaled_delta_exec; int cpu = cpu_of(rq); @@ -1784,7 +1784,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; bool select_rq; struct rq *rq; @@ -1795,6 +1795,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags) rcu_read_lock(); curr = READ_ONCE(rq->curr); /* unlocked access */ + selected = READ_ONCE(rq_selected(rq)); /* * If we are dealing with a -deadline task, we must @@ -1805,9 +1806,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq = unlikely(dl_task(curr)) && + select_rq = unlikely(dl_task(selected)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &selected->dl)) && p->nr_cpus_allowed > 1; /* @@ -1870,7 +1871,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p) * let's hope p can move out. */ if (rq->curr->nr_cpus_allowed == 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL)) return; /* @@ -1909,7 +1910,7 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf) static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) { resched_curr(rq); return; } @@ -1919,7 +1920,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline == rq->curr->dl.deadline) && + if ((p->dl.deadline == rq_selected(rq)->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -1954,7 +1955,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first) if (hrtick_enabled_dl(rq)) start_hrtick_dl(rq, p); - if (rq->curr->sched_class != &dl_sched_class) + if (rq_selected(rq)->sched_class != &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); deadline_queue_push_tasks(rq); @@ -2268,8 +2269,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. */ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq_selected(rq)) && + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2394,7 +2395,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. 
*/ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + rq_selected(src_rq)->dl.deadline)) goto skip; if (is_migration_disabled(p)) { @@ -2435,9 +2436,9 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq_selected(rq)) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) { push_dl_tasks(rq); } } @@ -2612,12 +2613,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p) return; } - if (rq->curr != p) { + if (rq_selected(rq) != p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -2646,7 +2647,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1251fd01a555..07216ea3ed53 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1157,7 +1157,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr) */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr = rq->curr; + struct task_struct *curr = rq_selected(rq); s64 delta_exec; delta_exec = update_curr_se(rq, &curr->se); @@ -1203,7 +1203,7 @@ static void update_curr(struct cfs_rq *cfs_rq) static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq_selected(rq)->se)); } static inline void @@ -6611,7 +6611,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p) s64 delta = slice - ran; if (delta < 0) { - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); return; } @@ -6626,7 +6626,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr = rq->curr; + struct task_struct *curr = rq_selected(rq); if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class) return; @@ -8235,7 +8235,7 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags) { - struct task_struct *curr = rq->curr; + struct task_struct *curr = rq_selected(rq); struct sched_entity *se = &curr->se, *pse = &p->se; struct cfs_rq *cfs_rq = task_cfs_rq(curr); int next_buddy_marked = 0; @@ -8268,7 +8268,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int * prevents us from potentially nominating it as a false LAST_BUDDY * below. */ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; /* Idle tasks are by definition preempted by non-idle tasks. */ @@ -9252,7 +9252,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done) * update_load_avg() can call cpufreq_update_util(). Make sure that RT, * DL and IRQ signals have been updated before updating CFS. 
*/ - curr_class = rq->curr->sched_class; + curr_class = rq_selected(rq)->sched_class; thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq)); @@ -12640,7 +12640,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -12743,7 +12743,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. */ - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); else wakeup_preempt(rq, p, 0); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 9cdea3ea47da..2682cec45aaa 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -530,7 +530,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags) static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr; + struct task_struct *curr = rq_selected(rq_of_rt_rq(rt_rq)); struct rq *rq = rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; @@ -1000,7 +1000,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr = rq->curr; + struct task_struct *curr = rq_selected(rq); struct sched_rt_entity *rt_se = &curr->rt; s64 delta_exec; @@ -1545,7 +1545,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; struct rq *rq; bool test; @@ -1557,6 +1557,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags) rcu_read_lock(); curr = READ_ONCE(rq->curr); /* unlocked access */ + selected = READ_ONCE(rq_selected(rq)); /* * If the current task on @p's runqueue is an RT task, then @@ -1585,8 +1586,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags) * systems like big.LITTLE. */ test = curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio); + unlikely(rt_task(selected)) && + (curr->nr_cpus_allowed < 2 || selected->prio <= p->prio); if (test || !rt_task_fits_capacity(p, cpu)) { int target = find_lowest_rq(p); @@ -1616,12 +1617,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags) static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. - */ if (rq->curr->nr_cpus_allowed == 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL)) return; /* @@ -1664,7 +1661,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf) */ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags) { - if (p->prio < rq->curr->prio) { + struct task_struct *curr = rq_selected(rq); + + if (p->prio < curr->prio) { resched_curr(rq); return; } @@ -1682,7 +1681,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags) * to move current somewhere else, making room for our non-migratable * task. 
*/ - if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio == curr->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1707,7 +1706,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class != &rt_sched_class) + if (rq_selected(rq)->sched_class != &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); rt_queue_push_tasks(rq); @@ -1988,6 +1987,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq) BUG_ON(rq->cpu != task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_selected(rq, p)); BUG_ON(p->nr_cpus_allowed <= 1); BUG_ON(!task_on_rq_queued(p)); @@ -2020,7 +2020,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq_selected(rq)->prio)) { resched_curr(rq); return 0; } @@ -2375,7 +2375,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < rq_selected(src_rq)->prio) goto skip; if (is_migration_disabled(p)) { @@ -2419,9 +2419,9 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p) bool need_to_push = !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <= p->prio); + rq_selected(rq)->prio <= p->prio); if (need_to_push) push_rt_tasks(rq); @@ -2505,7 +2505,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2520,7 +2520,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio) if (!task_on_rq_queued(p)) return; - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2546,7 +2546,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio) * greater than the current running task * then reschedule. 
*/ - if (p->prio < rq->curr->prio) + if (p->prio < rq_selected(rq)->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 3e0e4fc8734b..6ea1dfbe502a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -994,7 +994,10 @@ struct rq { */ unsigned int nr_uninterruptible; - struct task_struct __rcu *curr; + struct task_struct __rcu *curr; /* Execution context */ +#ifdef CONFIG_SCHED_PROXY_EXEC + struct task_struct __rcu *curr_selected; /* Scheduling context (policy) */ +#endif struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; @@ -1189,6 +1192,20 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) +#ifdef CONFIG_SCHED_PROXY_EXEC +#define rq_selected(rq) ((rq)->curr_selected) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + rcu_assign_pointer(rq->curr_selected, t); +} +#else +#define rq_selected(rq) ((rq)->curr) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} +#endif + struct sched_group; #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); @@ -2112,11 +2129,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr == p; } +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr == rq_selected() == p. + */ +static inline int task_current_selected(struct rq *rq, struct task_struct *p) +{ + return rq_selected(rq) == p; +} + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP @@ -2280,7 +2311,7 @@ struct sched_class { static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr != prev); + WARN_ON_ONCE(rq_selected(rq) != prev); prev->sched_class->put_prev_task(rq, prev); }
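The scheduling/execution context split introduced by the rq_selected()/rq_set_selected() accessors above can be sketched in miniature. The following is a hedged userspace analogy only (toy struct and field names, not the kernel's code): policy code asks rq_selected() which task *should* run, while rq->curr is the task whose code is physically executing; when CONFIG_SCHED_PROXY_EXEC is off the accessor simply aliases rq->curr.

#include <stdio.h>

/* Toy model of the scheduling/execution context split (not kernel code). */
struct task { const char *name; int prio; };

struct runqueue {
	struct task *curr;          /* execution context: the code that runs */
	struct task *curr_selected; /* scheduling context: the policy winner */
};

/* Mirrors the CONFIG_SCHED_PROXY_EXEC accessor: policy decisions read this. */
static struct task *rq_selected(struct runqueue *rq)
{
	return rq->curr_selected;
}

int main(void)
{
	struct task owner  = { "mutex-owner", 120 }; /* low-prio lock holder */
	struct task waiter = { "rt-waiter",    10 }; /* high-prio blocked task */
	struct runqueue rq = { .curr = &owner, .curr_selected = &waiter };

	/* Time accounting runs against rq.curr, while preemption and
	 * priority checks go against rq_selected() -- the same substitution
	 * the patch above makes throughout the scheduler classes. */
	printf("executing: %s, policy sees: %s (prio %d)\n",
	       rq.curr->name, rq_selected(&rq)->name, rq_selected(&rq)->prio);
	return 0;
}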
From patchwork Wed Dec 20 00:18:20 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181357
Date: Tue, 19 Dec 2023 16:18:20 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-10-jstultz@google.com>
Subject: [PATCH v7 09/23] sched: Fix runtime accounting w/ split exec & sched contexts
From: John Stultz
To: LKML

The idea here is that we want to charge the scheduler-context task's vruntime but charge the execution-context task's sum_exec_runtime. This way cputime accounting goes against the task actually running, while vruntime accounting goes against the selected task, so we get proper fairness. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: John Stultz --- kernel/sched/fair.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 07216ea3ed53..085941db5bf1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1129,22 +1129,35 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq) } #endif /* CONFIG_SMP */ -static s64 update_curr_se(struct rq *rq, struct sched_entity *curr) +static s64 update_curr_se(struct rq *rq, struct sched_entity *se) { u64 now = rq_clock_task(rq); s64 delta_exec; - delta_exec = now - curr->exec_start; + /* Calculate the delta from selected se */ + delta_exec = now - se->exec_start; if (unlikely(delta_exec <= 0)) return delta_exec; - curr->exec_start = now; - curr->sum_exec_runtime += delta_exec; + /* Update selected se's exec_start */ + se->exec_start = now; + if (entity_is_task(se)) { + struct task_struct *running = rq->curr; + /* + * If se is a task, we account the time against the running + * task, as w/ proxy-exec they may not be the same. + */ + running->se.exec_start = now; + running->se.sum_exec_runtime += delta_exec; + } else { + /* If not task, account the time against se */ + se->sum_exec_runtime += delta_exec; + } if (schedstat_enabled()) { struct sched_statistics *stats; - stats = __schedstats_from_se(curr); + stats = __schedstats_from_se(se); __schedstat_set(stats->exec_max, max(delta_exec, stats->exec_max)); } From patchwork Wed Dec 20 00:18:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181358 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp2323424dyi; Tue, 19 Dec 2023 16:22:08 -0800 (PST) X-Google-Smtp-Source: AGHT+IHqxt9LRpdZqqhfhYXgY72g+oehKH6e52BB3rkBYXHIwSIEeM6s3pCdLMlEB9bu1V8kuoHO X-Received: by 2002:aa7:8750:0:b0:6d8:44bc:b262 with SMTP id g16-20020aa78750000000b006d844bcb262mr2400390pfo.49.1703031728657; Tue, 19 Dec 2023 16:22:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1703031728; cv=none; d=google.com; s=arc-20160816; b=W2ZnYreS6RcDPITN3bDftXspoQoR4oc6JI5hpOFNmGB0CQPbLmuUjtY/6osK7unREa IJvipmeYbaLi67AvieRj/yY1GpDwptGtvSGg7B3gopf/QcqQwwEk0Cbhwdj4XlKhuMYt hZBfGRHBFxBaGW3RmFqEt4HQ2CSgv28+UImAoLGH/lfp7EzXjMr7+zV+A82usGTywhmd i6hyZEis0Vm7Vw9aJhlQMq8NBtCA4+69+zrxZioK5vkR1x3jlYOdvK/R5DBadJReSDi0 JbSxgNliVrZ4dLElXKWFfObFXyLzMe1juvpZlnGEkRt4+nGYJ00i29kOzB8hWXrmjYnp eQlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:message-id:references:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:in-reply-to:date :dkim-signature; bh=8mv/Ya0qyoH+vxh4Wg8EU7u8l2QsK4QPZg/dyROX2k8=; fh=lq1k4m46TqOToIb7rHC/F2KmHTJTjmaV9sS3Am9W7nA=; b=GueeVQ3LKEQ+lyvvXoYgltJ72TVCIuqpT7txpH6mndLNKhV/iPz3tZzPv+UmFSBFCt urBBv4seNwKAtpL63q8RjJROoJsfl5M3NfuR8HZHfISg7bJMTwXldGXEI90XtNNrm9rT hzEY5Swjyz4t9RLhBaEdEfIqzPK42bQp9/24T9jCc4gHtdcFYXxxY5sorkE1mKU6k/1G PLsFAbih6GWwOti0d2+UAjUQde/qRX6DqrJhQXQs64gBtouOtNvd/xn07mdsGHbbHU0v yfS3U2zsd8XV47ZDSDL7pG8EtqAlWODcV8irI6JQ2SsojF0vbd6WJ3RE5jxIMt5kWNU5 jaXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=uhvuaL8c; spf=pass (google.com: domain of linux-kernel+bounces-6145-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) 
smtp.mailfrom="linux-kernel+bounces-6145-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id b10-20020a056a00114a00b006ce4de73f9bsi20294399pfm.117.2023.12.19.16.22.08 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 16:22:08 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-6145-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=uhvuaL8c; spf=pass (google.com: domain of linux-kernel+bounces-6145-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-6145-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 6CD5E28809A for ; Wed, 20 Dec 2023 00:22:08 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EB669199C6; Wed, 20 Dec 2023 00:19:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="uhvuaL8c" X-Original-To: linux-kernel@vger.kernel.org Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61DBC12B86 for ; Wed, 20 Dec 2023 00:19:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-5e6f42b1482so32899937b3.3 for ; Tue, 19 Dec 2023 16:19:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1703031560; x=1703636360; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=8mv/Ya0qyoH+vxh4Wg8EU7u8l2QsK4QPZg/dyROX2k8=; b=uhvuaL8cu8MJEgAZmbYYWpQgyXuA6Z3sDesm/yUhgHlCP26kruH1RnHgmnbQRJ9MBu i2Ggsfd8dFiQNtD2EJ97lTBrv+/fzBzRQP/zV/RWJRUkRbuqk1Giytc2pdpIXhyRU5hT DmH/4c08EFMz69/GJThJ9zSuK/KbDmspQ/I3CX82ucp5yUqZjzDvqlhrVNiBF5H3BkmP 507j0fxua9GmPEcZNhPk3wbJGPRaEb6ra6a3LLge3uPJNC0/wYccIswabaKk0XBtA3Xi 0sofs+QPDSKnoQb7n8hLb4U0EZ5K+4P5Hg5mXw3YjqjtXRkLfnSn3Oe2WgYMxCpRCsZS piiA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703031560; x=1703636360; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8mv/Ya0qyoH+vxh4Wg8EU7u8l2QsK4QPZg/dyROX2k8=; b=HDG+opzlbiVdQb9mQaonxtpRGfqdmuxi+1LMdn1/LATi7I1ehnGkKUzxtocoMG1t+C m0dMwDu/Qcp1mNNZ9yiVnIdO847i+skT/XbtvHzGSJvgkT0boPfGk7Xrq0iIMbAXyAzh IEE3nTciVHZqL8GHfEHcwlYJjhHCutmNytjaUsfZ5BP+LBpwjMvYf2mPYlbVzSqpknkC NrKBPSQ/XuVlKgeWyJLnzhGAio2MRaQbYt4LbeKr9PGTSJfTrKRTAKPByyNQnsS8C/1D 
aG39ji6hQgV5a+0YoAC5ScTDlvxt+KC7lwj94+jaji8+QWZhoV+raWBoMMrBQalSYrW9 dQVg== X-Gm-Message-State: AOJu0YyiOxFrpgNTVcYK9h2Y5DTx9Ou4jWUVssllYHGSuW+B5EQVJ6Sz a8kWU/w0ZGs+YqRiwfkMXkLkzsJbPShGf0LLYbchB40l2k3AjSlFAU3/Utn+OlBfnBQlKzTD2iS 8omfVgGhWm2Yaqt04GgglqAkUB9sdoxrO1CnOOkBYSWLTZ44I6GWIBbumDPsoHS7CFqul4Lo= X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:690c:c02:b0:5e6:28b2:8bf2 with SMTP id cl2-20020a05690c0c0200b005e628b28bf2mr1829905ywb.0.1703031560363; Tue, 19 Dec 2023 16:19:20 -0800 (PST) Date: Tue, 19 Dec 2023 16:18:21 -0800 In-Reply-To: <20231220001856.3710363-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20231220001856.3710363-1-jstultz@google.com> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog Message-ID: <20231220001856.3710363-11-jstultz@google.com> Subject: [PATCH v7 10/23] sched: Split out __sched() deactivate task logic into a helper From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , kernel-team@android.com X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785758198315049443 X-GMAIL-MSGID: 1785758198315049443 As we're going to re-use the deactivation logic, split it into a helper. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: John Stultz --- v6: * Define function as static to avoid "no previous prototype" warnings as Reported-by: kernel test robot v7: * Rename state task_state to be more clear, as suggested by Metin Kaya --- kernel/sched/core.c | 66 +++++++++++++++++++++++++-------------------- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0ce34f5c0e0c..34acd80b6bd0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6571,6 +6571,42 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) # define SM_MASK_PREEMPT SM_PREEMPT #endif +static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, + unsigned long task_state) +{ + if (signal_pending_state(task_state, p)) { + WRITE_ONCE(p->__state, TASK_RUNNING); + } else { + p->sched_contributes_to_load = + (task_state & TASK_UNINTERRUPTIBLE) && + !(task_state & TASK_NOLOAD) && + !(task_state & TASK_FROZEN); + + if (p->sched_contributes_to_load) + rq->nr_uninterruptible++; + + /* + * __schedule() ttwu() + * prev_state = prev->state; if (p->on_rq && ...) + * if (prev_state) goto out; + * p->on_rq = 0; smp_acquire__after_ctrl_dep(); + * p->state = TASK_WAKING + * + * Where __schedule() and ttwu() have matching control dependencies. + * + * After this, schedule() must not care about p->state any more. 
+ */ deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); + + if (p->in_iowait) { + atomic_inc(&rq->nr_iowait); + delayacct_blkio_start(); + } + return true; + } + return false; +} + /* * __schedule() is the main scheduler function. * @@ -6662,35 +6698,7 @@ static void __sched notrace __schedule(unsigned int sched_mode) */ prev_state = READ_ONCE(prev->__state); if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) { - if (signal_pending_state(prev_state, prev)) { - WRITE_ONCE(prev->__state, TASK_RUNNING); - } else { - prev->sched_contributes_to_load = - (prev_state & TASK_UNINTERRUPTIBLE) && - !(prev_state & TASK_NOLOAD) && - !(prev_state & TASK_FROZEN); - - if (prev->sched_contributes_to_load) - rq->nr_uninterruptible++; - - /* - * __schedule() ttwu() - * prev_state = prev->state; if (p->on_rq && ...) - * if (prev_state) goto out; - * p->on_rq = 0; smp_acquire__after_ctrl_dep(); - * p->state = TASK_WAKING - * - * Where __schedule() and ttwu() have matching control dependencies. - * - * After this, schedule() must not care about p->state any more. - */ - deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); - - if (prev->in_iowait) { - atomic_inc(&rq->nr_iowait); - delayacct_blkio_start(); - } - } + try_to_deactivate_task(rq, prev, prev_state); switch_count = &prev->nvcsw; }
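The helper's contract can be modeled in miniature. The sketch below is a hedged userspace analogy (hypothetical two-state tasks, not kernel code): a pending signal cancels the sleep and leaves the task runnable, returning false, otherwise the task comes off the runqueue, returning true, matching the true/false returns the refactored helper above introduces.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the deactivation helper's contract (not kernel code). */
enum state { RUNNING, SLEEPING };
struct task { enum state st; bool signal_pending; bool on_rq; };

static bool try_to_deactivate(struct task *p)
{
	if (p->signal_pending) {
		p->st = RUNNING; /* like WRITE_ONCE(p->__state, TASK_RUNNING) */
		return false;    /* sleep aborted: task stays runnable */
	}
	p->on_rq = false;        /* like deactivate_task(..., DEQUEUE_SLEEP) */
	return true;             /* task was taken off the runqueue */
}

int main(void)
{
	struct task a = { SLEEPING, true,  true }; /* signal pending */
	struct task b = { SLEEPING, false, true }; /* clean sleep */

	printf("a deactivated: %d, b deactivated: %d\n",
	       try_to_deactivate(&a), try_to_deactivate(&b));
	return 0;
}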
From patchwork Wed Dec 20 00:18:22 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181361
Date: Tue, 19 Dec 2023 16:18:22 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-12-jstultz@google.com>
Subject: [PATCH v7 11/23] sched: Add an initial sketch of the find_proxy_task() function
From: John Stultz
To: LKML

Add a find_proxy_task() function which doesn't do much. When we select a blocked task to run, we will just deactivate it and pick again, the exception being if it has become unblocked after find_proxy_task() was called.

Greatly simplified from patch by: Peter Zijlstra (Intel) Juri Lelli Valentin Schneider Connor O'Brien

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com [jstultz: Split out from larger proxy patch and simplified for review and testing.] Signed-off-by: John Stultz --- v5: * Split out from larger proxy patch v7: * Fixed unused function arguments, spelling nits, and tweaks for clarity, pointed out by Metin Kaya * Moved task_is_blocked() implementation to this patch where it is first used. Also drop unused arguments. Suggested by Metin Kaya. * Tweaks to make things easier to read, as suggested by Metin Kaya.
* Rename proxy() to find_proxy_task() for clarity, and typo fixes suggested by Metin Kaya * Fix build warning Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202311081028.yDLmCWgr-lkp@intel.com/ --- kernel/sched/core.c | 87 ++++++++++++++++++++++++++++++++++++++++++-- kernel/sched/rt.c | 19 +++++++++- kernel/sched/sched.h | 15 ++++++++ 3 files changed, 115 insertions(+), 6 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 34acd80b6bd0..12f5a0618328 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6572,11 +6572,11 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) #endif static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, - unsigned long task_state) + unsigned long task_state, bool deactivate_cond) { if (signal_pending_state(task_state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); - } else { + } else if (deactivate_cond) { p->sched_contributes_to_load = (task_state & TASK_UNINTERRUPTIBLE) && !(task_state & TASK_NOLOAD) && @@ -6607,6 +6607,73 @@ static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, return false; } +#ifdef CONFIG_SCHED_PROXY_EXEC + +static bool proxy_deactivate(struct rq *rq, struct task_struct *next) +{ + unsigned long state = READ_ONCE(next->__state); + + /* Don't deactivate if the state has been changed to TASK_RUNNING */ + if (state == TASK_RUNNING) + return false; + if (!try_to_deactivate_task(rq, next, state, true)) + return false; + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + resched_curr(rq); + return true; +} + +/* + * Initial simple proxy that just returns the task if it's waking + * or deactivates the blocked task so we can pick something that + * isn't blocked. + */ +static struct task_struct * +find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) +{ + struct task_struct *ret = NULL; + struct task_struct *p = next; + struct mutex *mutex; + + mutex = p->blocked_on; + /* Something changed in the chain, so pick again */ + if (!mutex) + return NULL; + /* + * By taking mutex->wait_lock we hold off concurrent mutex_unlock() + * and ensure @owner sticks around. + */ + raw_spin_lock(&mutex->wait_lock); + raw_spin_lock(&p->blocked_lock); + + /* Check again that p is blocked with blocked_lock held */ + if (!task_is_blocked(p) || mutex != p->blocked_on) { + /* + * Something changed in the blocked_on chain and + * we don't know if only at this level. So, let's + * just bail out completely and let __schedule + * figure things out (pick_again loop). + */ + goto out; + } + + if (!proxy_deactivate(rq, next)) + ret = p; +out: + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return ret; +} +#else /* SCHED_PROXY_EXEC */ +static struct task_struct * +find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) +{ + BUG(); // This should never be called in the !PROXY case + return next; +} +#endif /* SCHED_PROXY_EXEC */ + /* * __schedule() is the main scheduler function. 
* @@ -6698,12 +6765,24 @@ static void __sched notrace __schedule(unsigned int sched_mode) */ prev_state = READ_ONCE(prev->__state); if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) { - try_to_deactivate_task(rq, prev, prev_state); + try_to_deactivate_task(rq, prev, prev_state, + !task_is_blocked(prev)); switch_count = &prev->nvcsw; } - next = pick_next_task(rq, prev, &rf); +pick_again: + next = pick_next_task(rq, rq_selected(rq), &rf); rq_set_selected(rq, next); + if (unlikely(task_is_blocked(next))) { + next = find_proxy_task(rq, next, &rf); + if (!next) { + rq_unpin_lock(rq, &rf); + __balance_callbacks(rq); + rq_repin_lock(rq, &rf); + goto pick_again; + } + } + clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 2682cec45aaa..81cd22eaa6dc 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1491,8 +1491,19 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags) enqueue_rt_entity(rt_se, flags); - if (!task_current(rq, p) && p->nr_cpus_allowed > 1) - enqueue_pushable_task(rq, p); + /* + * Current can't be pushed away. Selected is tied to current, + * so don't push it either. + */ + if (task_current(rq, p) || task_current_selected(rq, p)) + return; + /* + * Pinned tasks can't be pushed. + */ + if (p->nr_cpus_allowed == 1) + return; + + enqueue_pushable_task(rq, p); } static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags) @@ -1779,6 +1790,10 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1); + /* Avoid marking selected as pushable */ + if (task_current_selected(rq, p)) + return; + /* * The previous task needs to be made eligible for pushing * if it is still active diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 6ea1dfbe502a..765ba10661de 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2148,6 +2148,21 @@ static inline int task_current_selected(struct rq *rq, struct task_struct *p) return rq_selected(rq) == p; } +#ifdef CONFIG_SCHED_PROXY_EXEC +static inline bool task_is_blocked(struct task_struct *p) +{ + if (!sched_proxy_exec()) + return false; + + return !!p->blocked_on && p->blocked_on_state != BO_RUNNABLE; +} +#else /* !SCHED_PROXY_EXEC */ +static inline bool task_is_blocked(struct task_struct *p) +{ + return false; +} +#endif /* SCHED_PROXY_EXEC */ + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP
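The pick_again flow above can be illustrated with a toy model. This is an illustrative sketch only (a hypothetical task array standing in for the real runqueue structures, with the kernel's lower-value-wins priority convention): the highest-priority pick that turns out to be mutex-blocked is deactivated and the pick is retried, which is all this initial find_proxy_task() does.

#include <stddef.h>
#include <stdio.h>

/* Toy model of the pick_again loop (not kernel code). */
struct task { const char *name; int prio; int blocked; int queued; };

/* Lower prio value wins, kernel-style. */
static struct task *pick_highest(struct task *t, int n)
{
	struct task *best = NULL;
	for (int i = 0; i < n; i++)
		if (t[i].queued && (!best || t[i].prio < best->prio))
			best = &t[i];
	return best;
}

int main(void)
{
	struct task tasks[] = {
		{ "rt-waiter", 10, 1, 1 }, /* highest prio but mutex-blocked */
		{ "worker",    50, 0, 1 },
	};
	struct task *next;

	/* Blocked picks are dropped from the candidate set and we pick
	 * again -- the analogue of proxy_deactivate() + goto pick_again. */
	while ((next = pick_highest(tasks, 2)) && next->blocked) {
		next->queued = 0;
		printf("deactivated %s, picking again\n", next->name);
	}
	printf("running %s\n", next ? next->name : "idle");
	return 0;
}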
From patchwork Wed Dec 20 00:18:23 2023 X-Patchwork-Submitter: John Stultz X-Patchwork-Id: 181360
Date: Tue, 19 Dec 2023 16:18:23 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-13-jstultz@google.com>
Subject: [PATCH v7 12/23] sched: Fix proxy/current (push,pull)ability
From: John Stultz
To: LKML

From: Valentin Schneider

Proxy execution forms atomic pairs of tasks: The selected task (scheduling context) and a proxy (execution context). The selected task, along with the rest of the blocked chain, follows the proxy wrt CPU placement. They can be the same task, in which case push/pull doesn't need any modification. When they are different, however, FIFO1 & FIFO42:

              ,->  RT42
              |     | blocked-on
              |     v
 blocked_donor|   mutex
              |     | owner
              |     v
              `--  RT1

            RT1
            RT42

        CPU0            CPU1
         ^                ^
         |                |
     overloaded    !overloaded
   rq prio = 42    rq prio = 0

RT1 is eligible to be pushed to CPU1, but should that happen it will "carry" RT42 along. Clearly here neither RT1 nor RT42 must be seen as push/pullable. Unfortunately, only the selected task is usually dequeued from the rq, and the proxy'ed execution context (rq->curr) remains on the rq. This can cause RT1 to be selected for migration from logic like the rt pushable_list.
This patch adds a dequeue/enqueue cycle on the proxy task before __schedule returns, which allows the sched class logic to avoid adding the now current task to the pushable_list. Furthermore, tasks becoming blocked on a mutex don't need an explicit dequeue/enqueue cycle to be made (push/pull)able: they have to be running to block on a mutex, thus they will eventually hit put_prev_task(). XXX: pinned tasks becoming unblocked should be removed from the push/pull lists, but those don't get to see __schedule() straight away. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien Signed-off-by: John Stultz --- v3: * Tweaked comments & commit message v5: * Minor simplifications to utilize the fix earlier in the patch series. * Rework the wording of the commit message to match selected/proxy terminology and expand a bit to make it more clear how it works. v6: * Dropped now-unused proxied value, to be re-added later in the series when it is used, as caught by Dietmar v7: * Unused function argument fixup * Commit message nit pointed out by Metin Kaya * Dropped unproven unlikely() and use sched_proxy_exec() in proxy_tag_curr, suggested by Metin Kaya --- kernel/sched/core.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 12f5a0618328..f6bf3b62194c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6674,6 +6674,23 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) } #endif /* SCHED_PROXY_EXEC */ +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *next) +{ + if (sched_proxy_exec()) { + /* + * pick_next_task() calls set_next_task() on the selected task + * at some point, which ensures it is not push/pullable. + * However, the selected task *and* the mutex owner form an + * atomic pair wrt push/pull. + * + * Make sure owner is not pushable. Unfortunately we can only + * deal with that by means of a dequeue/enqueue cycle. :-/ + */ + dequeue_task(rq, next, DEQUEUE_NOCLOCK | DEQUEUE_SAVE); + enqueue_task(rq, next, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE); + } +} + /* * __schedule() is the main scheduler function. * @@ -6796,6 +6813,10 @@ static void __sched notrace __schedule(unsigned int sched_mode) * changes to task_struct made by pick_next_task().
*/ RCU_INIT_POINTER(rq->curr, next); + + if (!task_current_selected(rq, next)) + proxy_tag_curr(rq, next); + /* * The membarrier system call requires each architecture * to have a full memory barrier after updating @@ -6820,6 +6841,10 @@ static void __sched notrace __schedule(unsigned int sched_mode) /* Also unlocks the rq: */ rq = context_switch(rq, prev, next, &rf); } else { + /* In case next was already curr but just got blocked_donor */ + if (!task_current_selected(rq, next)) + proxy_tag_curr(rq, next); + rq_unpin_lock(rq, &rf); __balance_callbacks(rq); raw_spin_rq_unlock_irq(rq); }
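For illustration, the effect of the dequeue/enqueue cycle can be modeled as below. This is a hedged toy only, with a boolean standing in for the rt pushable_list: once the proxy has become rq->curr, re-enqueueing it lets the class enqueue path observe that it is current and skip marking it pushable, which is the whole point of proxy_tag_curr() above.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the (push,pull)ability fix (not kernel code). */
struct task { const char *name; bool is_curr; bool pushable; };

static void enqueue(struct task *p)
{
	/* Mirrors enqueue_task_rt() above: current (and selected) tasks
	 * are never added to the pushable list. */
	p->pushable = !p->is_curr;
}

int main(void)
{
	/* The mutex owner was enqueued as an ordinary task earlier,
	 * so it was marked pushable back then. */
	struct task proxy = { "mutex-owner", false, true };

	proxy.is_curr = true;  /* it just became rq->curr via __schedule() */
	proxy.pushable = false; /* dequeue drops it from the pushable list */
	enqueue(&proxy);        /* enqueue now sees it as current */

	printf("%s pushable after cycle: %d\n", proxy.name, proxy.pushable);
	return 0;
}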
From patchwork Wed Dec 20 00:18:24 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181362
Date: Tue, 19 Dec 2023 16:18:24 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-14-jstultz@google.com>
Subject: [PATCH v7 13/23] sched: Start blocked_on chain processing in find_proxy_task()
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Valentin Schneider, "Connor O'Brien", John Stultz

From: Peter Zijlstra

Start to flesh out the real find_proxy_task() implementation, but avoid the migration cases for now, in those cases just deactivate the selected task and pick again.

To ensure the selected task or other blocked tasks in the chain aren't migrated away while we're running the proxy, this patch also tweaks CFS logic to avoid migrating selected or mutex blocked tasks.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: This change was split out from the larger proxy patch] Signed-off-by: John Stultz --- v5: * Split this out from larger proxy patch v7: * Minor refactoring of core find_proxy_task() function * Minor spelling and corrections suggested by Metin Kaya * Dropped an added BUG_ON that was frequently tripped * Minor commit message tweaks from Metin Kaya --- kernel/sched/core.c | 154 +++++++++++++++++++++++++++++++++++++------- kernel/sched/fair.c | 9 ++- 2 files changed, 137 insertions(+), 26 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f6bf3b62194c..42e25bbdfe6b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -94,6 +94,7 @@ #include "../workqueue_internal.h" #include "../../io_uring/io-wq.h" #include "../smpboot.h" +#include "../locking/mutex.h" EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu); EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask); @@ -6609,6 +6610,15 @@ static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, #ifdef CONFIG_SCHED_PROXY_EXEC +static inline struct task_struct * +proxy_resched_idle(struct rq *rq, struct task_struct *next) +{ + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + set_tsk_need_resched(rq->idle); + return rq->idle; +} + static bool proxy_deactivate(struct rq *rq, struct task_struct *next) { unsigned long state = READ_ONCE(next->__state); @@ -6618,48 +6628,138 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *next) return false; if (!try_to_deactivate_task(rq, next, state, true)) return false; - put_prev_task(rq, next); - rq_set_selected(rq, rq->idle); - resched_curr(rq); + proxy_resched_idle(rq, next); return true; } /* - * Initial simple proxy that just returns the task if it's waking - * or deactivates the blocked task so we can pick something that - * isn't blocked. + * Find who @next (currently blocked on a mutex) can proxy for. + * + * Follow the blocked-on relation: + * task->blocked_on -> mutex->owner -> task... + * + * Lock order: + * + * p->pi_lock + * rq->lock + * mutex->wait_lock + * p->blocked_lock + * + * Returns the task that is going to be used as execution context (the one + * that is actually going to be put to run on cpu_of(rq)). */ static struct task_struct * find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) { + struct task_struct *owner = NULL; struct task_struct *ret = NULL; - struct task_struct *p = next; + struct task_struct *p; struct mutex *mutex; + int this_cpu = cpu_of(rq); - mutex = p->blocked_on; - /* Something changed in the chain, so pick again */ - if (!mutex) - return NULL; /* - * By taking mutex->wait_lock we hold off concurrent mutex_unlock() - * and ensure @owner sticks around. + * Follow blocked_on chain. + * + * TODO: deadlock detection */ - raw_spin_lock(&mutex->wait_lock); - raw_spin_lock(&p->blocked_lock); + for (p = next; task_is_blocked(p); p = owner) { + mutex = p->blocked_on; + /* Something changed in the chain, so pick again */ + if (!mutex) + return NULL; - /* Check again that p is blocked with blocked_lock held */ - if (!task_is_blocked(p) || mutex != p->blocked_on) { /* - * Something changed in the blocked_on chain and - * we don't know if only at this level. So, let's - * just bail out completely and let __schedule - * figure things out (pick_again loop). 
+ * By taking mutex->wait_lock we hold off concurrent mutex_unlock() + * and ensure @owner sticks around. */ - goto out; + raw_spin_lock(&mutex->wait_lock); + raw_spin_lock(&p->blocked_lock); + + /* Check again that p is blocked with blocked_lock held */ + if (mutex != p->blocked_on) { + /* + * Something changed in the blocked_on chain and + * we don't know if only at this level. So, let's + * just bail out completely and let __schedule + * figure things out (pick_again loop). + */ + goto out; + } + + owner = __mutex_owner(mutex); + if (!owner) { + ret = p; + goto out; + } + + if (task_cpu(owner) != this_cpu) { + /* XXX Don't handle migrations yet */ + if (!proxy_deactivate(rq, next)) + ret = next; + goto out; + } + + if (task_on_rq_migrating(owner)) { + /* + * One of the chain of mutex owners is currently migrating to this + * CPU, but has not yet been enqueued because we are holding the + * rq lock. As a simple solution, just schedule rq->idle to give + * the migration a chance to complete. Much like the migrate_task + * case we should end up back in proxy(), this time hopefully with + * all relevant tasks already enqueued. + */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return proxy_resched_idle(rq, next); + } + + if (!owner->on_rq) { + /* XXX Don't handle blocked owners yet */ + if (!proxy_deactivate(rq, next)) + ret = next; + goto out; + } + + if (owner == p) { + /* + * It's possible we interleave with mutex_unlock like: + * + * lock(&rq->lock); + * find_proxy_task() + * mutex_unlock() + * lock(&wait_lock); + * next(owner) = current->blocked_donor; + * unlock(&wait_lock); + * + * wake_up_q(); + * ... + * ttwu_runnable() + * __task_rq_lock() + * lock(&wait_lock); + * owner == p + * + * Which leaves us to finish the ttwu_runnable() and make it go. + * + * So schedule rq->idle so that ttwu_runnable can get the rq lock + * and mark owner as running. + */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return proxy_resched_idle(rq, next); + } + + /* + * OK, now we're absolutely sure @owner is not blocked _and_ + * on this rq, therefore holding @rq->lock is sufficient to + * guarantee its existence, as per ttwu_remote(). 
+ */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); } - if (!proxy_deactivate(rq, next)) - ret = p; + WARN_ON_ONCE(owner && !owner->on_rq); + return owner; + out: raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); @@ -6738,6 +6838,7 @@ static void __sched notrace __schedule(unsigned int sched_mode) struct rq_flags rf; struct rq *rq; int cpu; + bool preserve_need_resched = false; cpu = smp_processor_id(); rq = cpu_rq(cpu); @@ -6798,9 +6899,12 @@ static void __sched notrace __schedule(unsigned int sched_mode) rq_repin_lock(rq, &rf); goto pick_again; } + if (next == rq->idle && prev == rq->idle) + preserve_need_resched = true; } - clear_tsk_need_resched(prev); + if (!preserve_need_resched) + clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG rq->last_seen_need_resched_ns = 0; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 085941db5bf1..954b41e5b7df 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8905,6 +8905,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) if (kthread_is_per_cpu(p)) return 0; + if (task_is_blocked(p)) + return 0; + if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) { int cpu; @@ -8941,7 +8944,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) /* Record that we found at least one task that could run on dst_cpu */ env->flags &= ~LBF_ALL_PINNED; - if (task_on_cpu(env->src_rq, p)) { + if (task_on_cpu(env->src_rq, p) || + task_current_selected(env->src_rq, p)) { schedstat_inc(p->stats.nr_failed_migrations_running); return 0; } @@ -8980,6 +8984,9 @@ static void detach_task(struct task_struct *p, struct lb_env *env) { lockdep_assert_rq_held(env->src_rq); + BUG_ON(task_current(env->src_rq, p)); + BUG_ON(task_current_selected(env->src_rq, p)); + deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK); set_task_cpu(p, env->dst_cpu); }
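Since the loop structure of the chain walk can be hard to see through the locking in the hunks above, a standalone toy model of just the pointer-chasing (not kernel code; toy_task, toy_mutex and toy_find_proxy_task are invented names, and the locking, migration and sleeping-owner cases are all omitted) might look like:

    /* toy_chain.c -- simplified model of the blocked_on chain walk; not kernel code */
    #include <stdio.h>
    #include <stddef.h>

    struct toy_task;

    struct toy_mutex {
            struct toy_task *owner;
    };

    struct toy_task {
            const char *name;
            struct toy_mutex *blocked_on;   /* NULL if runnable */
    };

    /* Follow task->blocked_on->owner until a runnable owner is found. */
    static struct toy_task *toy_find_proxy_task(struct toy_task *next)
    {
            struct toy_task *p = next;

            while (p->blocked_on) {
                    struct toy_task *owner = p->blocked_on->owner;

                    if (!owner)     /* lock released meanwhile: pick again */
                            return NULL;
                    p = owner;
            }
            return p;       /* runnable execution context for @next */
    }

    int main(void)
    {
            struct toy_task c = { "C", NULL };      /* runnable lock holder */
            struct toy_mutex m2 = { &c };
            struct toy_task b = { "B", &m2 };       /* blocked on m2 */
            struct toy_mutex m1 = { &b };
            struct toy_task a = { "A", &m1 };       /* blocked on m1 */

            printf("proxy for A: %s\n", toy_find_proxy_task(&a)->name);
            return 0;
    }

The real implementation has to re-validate p->blocked_on under the locks at every hop and bail out to the pick_again loop on any mismatch, which is what all the goto out paths above are doing.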
From patchwork Wed Dec 20 00:18:25 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181363
Date: Tue, 19 Dec 2023 16:18:25 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-15-jstultz@google.com>
Subject: [PATCH v7 14/23] sched: Handle blocked-waiter migration (and return migration)
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com

Add logic to handle migrating a blocked waiter to a remote cpu where the lock owner is runnable. Additionally, as the blocked task may not be able to run on the remote cpu, add logic to handle return migration once the waiting task is given the mutex.

Because tasks may get migrated to where they cannot run, this patch also modifies the scheduling classes to avoid sched class migrations on mutex blocked tasks, leaving proxy() to do the migrations and return migrations.

This was split out from the larger proxy patch, and significantly reworked.

Credits for the original patch go to:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: John Stultz --- v6: * Integrated sched_proxy_exec() check in proxy_return_migration() * Minor cleanups to diff * Unpin the rq before calling __balance_callbacks() * Tweak proxy migrate to migrate deeper task in chain, to avoid tasks pingponging between rqs v7: * Fixup for unused function arguments * Switch from that_rq -> target_rq, other minor tweaks, and typo fixes suggested by Metin Kaya * Switch back to doing return migration in the ttwu path, which avoids nasty lock juggling and performance issues * Fixes for UP builds --- kernel/sched/core.c | 161 ++++++++++++++++++++++++++++++++++++++-- kernel/sched/deadline.c | 2 +- kernel/sched/fair.c | 4 +- kernel/sched/rt.c | 9 ++- 4 files changed, 164 insertions(+), 12 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 42e25bbdfe6b..55dc2a3b7e46 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2981,8 +2981,15 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag struct set_affinity_pending my_pending = { }, *pending = NULL; bool stop_pending, complete = false; - /* Can the task run on the task's current CPU? If so, we're done */ - if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) { + /* + * Can the task run on the task's current CPU? If so, we're done + * + * We are also done if the task is selected, boosting a lock- + * holding proxy, (and potentially has been migrated outside its + * current or previous affinity mask) + */ + if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask) || + (task_current_selected(rq, p) && !task_current(rq, p))) { struct task_struct *push_task = NULL; if ((flags & SCA_MIGRATE_ENABLE) && @@ -3778,6 +3785,39 @@ static inline void ttwu_do_wakeup(struct task_struct *p) trace_sched_wakeup(p); } +#ifdef CONFIG_SMP +static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) +{ + if (!sched_proxy_exec()) + return false; + + if (task_current(rq, p)) + return false; + + if (p->blocked_on && p->blocked_on_state == BO_WAKING) { + raw_spin_lock(&p->blocked_lock); + if (!is_cpu_allowed(p, cpu_of(rq))) { + if (task_current_selected(rq, p)) { + put_prev_task(rq, p); + rq_set_selected(rq, rq->idle); + } + deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); + resched_curr(rq); + raw_spin_unlock(&p->blocked_lock); + return true; + } + resched_curr(rq); + raw_spin_unlock(&p->blocked_lock); + } + return false; +} +#else /* !CONFIG_SMP */ +static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) +{ + return false; +} +#endif /*CONFIG_SMP */ + static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, struct rq_flags *rf) @@ -3870,9 +3910,12 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) update_rq_clock(rq); wakeup_preempt(rq, p, wake_flags); } + if (proxy_needs_return(rq, p)) + goto out; ttwu_do_wakeup(p); ret = 1; } +out: __task_rq_unlock(rq, &rf); return ret; @@ -4231,6 +4274,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) int cpu, success = 0; if (p == current) { + WARN_ON(task_is_blocked(p)); /* * We're waking current, this means 'p->on_rq' and 'task_cpu(p) * == smp_processor_id()'. Together this means we can special @@ -6632,6 +6676,91 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *next) return true; } +#ifdef CONFIG_SMP +/* + * If the blocked-on relationship crosses CPUs, migrate @p to the + * owner's CPU. 
+ * + * This is because we must respect the CPU affinity of execution + * contexts (owner) but we can ignore affinity for scheduling + * contexts (@p). So we have to move scheduling contexts towards + * potential execution contexts. + * + * Note: The owner can disappear, but simply migrate to @target_cpu + * and leave that CPU to sort things out. + */ +static struct task_struct * +proxy_migrate_task(struct rq *rq, struct rq_flags *rf, + struct task_struct *p, int target_cpu) +{ + struct rq *target_rq; + int wake_cpu; + + lockdep_assert_rq_held(rq); + target_rq = cpu_rq(target_cpu); + + /* + * Since we're going to drop @rq, we have to put(@rq_selected) first, + * otherwise we have a reference that no longer belongs to us. Use + * @rq->idle to fill the void and make the next pick_next_task() + * invocation happy. + * + * CPU0 CPU1 + * + * B mutex_lock(X) + * + * A mutex_lock(X) <- B + * A __schedule() + * A pick->A + * A proxy->B + * A migrate A to CPU1 + * B mutex_unlock(X) -> A + * B __schedule() + * B pick->A + * B switch_to (A) + * A ... does stuff + * A ... is still running here + * + * * BOOM * + */ + put_prev_task(rq, rq_selected(rq)); + rq_set_selected(rq, rq->idle); + set_next_task(rq, rq_selected(rq)); + WARN_ON(p == rq->curr); + + wake_cpu = p->wake_cpu; + deactivate_task(rq, p, 0); + set_task_cpu(p, target_cpu); + /* + * Preserve p->wake_cpu, such that we can tell where it + * used to run later. + */ + p->wake_cpu = wake_cpu; + + rq_unpin_lock(rq, rf); + __balance_callbacks(rq); + + raw_spin_rq_unlock(rq); + raw_spin_rq_lock(target_rq); + + activate_task(target_rq, p, 0); + wakeup_preempt(target_rq, p, 0); + + raw_spin_rq_unlock(target_rq); + raw_spin_rq_lock(rq); + rq_repin_lock(rq, rf); + + return NULL; /* Retry task selection on _this_ CPU. */ +} +#else /* !CONFIG_SMP */ +static struct task_struct * +proxy_migrate_task(struct rq *rq, struct rq_flags *rf, + struct task_struct *p, int target_cpu) +{ + return NULL; +} +#endif /* CONFIG_SMP */ + /* * Find who @next (currently blocked on a mutex) can proxy for. * @@ -6654,8 +6783,11 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) struct task_struct *owner = NULL; struct task_struct *ret = NULL; struct task_struct *p; + int cur_cpu, target_cpu; struct mutex *mutex; - int this_cpu = cpu_of(rq); + bool curr_in_chain = false; + + cur_cpu = cpu_of(rq); /* * Follow blocked_on chain. @@ -6686,17 +6818,27 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) goto out; } + if (task_current(rq, p)) + curr_in_chain = true; + owner = __mutex_owner(mutex); if (!owner) { ret = p; goto out; } - if (task_cpu(owner) != this_cpu) { - /* XXX Don't handle migrations yet */ - if (!proxy_deactivate(rq, next)) - ret = next; - goto out; + if (task_cpu(owner) != cur_cpu) { + target_cpu = task_cpu(owner); + /* + * @owner can disappear, simply migrate to @target_cpu and leave that CPU + * to sort things out. + */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + if (curr_in_chain) + return proxy_resched_idle(rq, next); + + return proxy_migrate_task(rq, rf, p, target_cpu); } if (task_on_rq_migrating(owner)) { @@ -6999,6 +7141,9 @@ static inline void sched_submit_work(struct task_struct *tsk) */ SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT); + if (task_is_blocked(tsk)) + return; + /* * If we are going to sleep and we have plugged IO queued, * make sure to submit it to avoid deadlocks. 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 9cf20f4ac5f9..4f998549ea74 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1705,7 +1705,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags) enqueue_dl_entity(&p->dl, flags); - if (!task_current(rq, p) && p->nr_cpus_allowed > 1) + if (!task_current(rq, p) && p->nr_cpus_allowed > 1 && !task_is_blocked(p)) enqueue_pushable_dl_task(rq, p); } diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 954b41e5b7df..8e3f118f6d6e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8372,7 +8372,9 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf goto idle; #ifdef CONFIG_FAIR_GROUP_SCHED - if (!prev || prev->sched_class != &fair_sched_class) + if (!prev || + prev->sched_class != &fair_sched_class || + rq->curr != rq_selected(rq)) goto simple; /* diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 81cd22eaa6dc..a7b51a021111 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1503,6 +1503,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags) if (p->nr_cpus_allowed == 1) return; + if (task_is_blocked(p)) + return; + enqueue_pushable_task(rq, p); } @@ -1790,10 +1793,12 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1); - /* Avoid marking selected as pushable */ - if (task_current_selected(rq, p)) + /* Avoid marking current or selected as pushable */ + if (task_current(rq, p) || task_current_selected(rq, p)) return; + if (task_is_blocked(p)) + return; /* * The previous task needs to be made eligible for pushing * if it is still active
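Stripped of the lock juggling and balance-callback choreography, the migration that proxy_migrate_task() performs earlier in this patch boils down to a deactivate/set-cpu/activate sequence on the scheduling context. A standalone toy model of just that sequence (not kernel code; every name below is invented for illustration, and wake_cpu preservation, locking and preemption checks are omitted) might read:

    /* toy_migrate.c -- simplified model of blocked-waiter migration; not kernel code */
    #include <stdio.h>

    struct toy_task {
            const char *name;
            int cpu;
            int on_rq;
    };

    static void toy_deactivate(struct toy_task *p) { p->on_rq = 0; }
    static void toy_activate(struct toy_task *p)   { p->on_rq = 1; }

    /* Move scheduling context @p toward its execution context's CPU. */
    static void toy_proxy_migrate(struct toy_task *p, int target_cpu)
    {
            toy_deactivate(p);              /* leave the source runqueue */
            p->cpu = target_cpu;            /* set_task_cpu() equivalent */
            toy_activate(p);                /* join the owner's runqueue */
    }

    int main(void)
    {
            struct toy_task waiter = { "waiter", 0, 1 };

            toy_proxy_migrate(&waiter, 1);  /* owner runs on CPU1 */
            printf("%s now on cpu%d (on_rq=%d)\n",
                   waiter.name, waiter.cpu, waiter.on_rq);
            return 0;
    }

The point of the sched class tweaks above is then just to keep the load balancer's hands off such tasks: a task migrated this way may sit on a runqueue outside its affinity mask, so only the proxy logic (and the return-migration path in ttwu) is allowed to move it.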
From patchwork Wed Dec 20 00:18:26 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181364
Date: Tue, 19 Dec 2023 16:18:26 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-16-jstultz@google.com>
Subject: [PATCH v7 15/23] sched: Add blocked_donor link to task for smarter mutex handoffs
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Valentin Schneider, "Connor O'Brien", John Stultz

From: Peter Zijlstra

Add a link to the task this task is proxying for, and use it so we do an intelligent hand-off of the owned mutex to the task we're running on behalf of.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: This patch was split out from larger proxy patch]
Signed-off-by: John Stultz
---
v5:
* Split out from larger proxy patch
v6:
* Moved proxied value from earlier patch to this one where it is actually used
* Rework logic to check sched_proxy_exec() instead of using ifdefs
* Moved comment change to this patch where it makes sense
v7:
* Use more descriptive term than "us" in comments, as suggested by Metin Kaya.
* Minor typo fixup from Metin Kaya * Reworked proxied variable to prev_not_proxied to simplify usage --- include/linux/sched.h | 1 + kernel/fork.c | 1 + kernel/locking/mutex.c | 35 ++++++++++++++++++++++++++++++++--- kernel/sched/core.c | 19 +++++++++++++++++-- 4 files changed, 51 insertions(+), 5 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 880af1c3097d..8020e224e057 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1157,6 +1157,7 @@ struct task_struct { enum blocked_on_state blocked_on_state; struct mutex *blocked_on; /* lock we're blocked on */ + struct task_struct *blocked_donor; /* task that is boosting this task */ raw_spinlock_t blocked_lock; #ifdef CONFIG_DEBUG_ATOMIC_SLEEP diff --git a/kernel/fork.c b/kernel/fork.c index b3ba3d22d8b2..138fc23cad43 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2459,6 +2459,7 @@ __latent_entropy struct task_struct *copy_process( p->blocked_on_state = BO_RUNNABLE; p->blocked_on = NULL; /* not blocked yet */ + p->blocked_donor = NULL; /* nobody is boosting p yet */ #ifdef CONFIG_BCACHE p->sequential_io = 0; p->sequential_io_avg = 0; diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 11dc5cb7a5a3..2711af8c0052 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -910,7 +910,7 @@ EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible); */ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip) { - struct task_struct *next = NULL; + struct task_struct *donor, *next = NULL; DEFINE_WAKE_Q(wake_q); unsigned long owner; unsigned long flags; @@ -948,7 +948,34 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne preempt_disable(); raw_spin_lock_irqsave(&lock->wait_lock, flags); debug_mutex_unlock(lock); - if (!list_empty(&lock->wait_list)) { + + if (sched_proxy_exec()) { + raw_spin_lock(¤t->blocked_lock); + /* + * If we have a task boosting current, and that task was boosting + * current through this lock, hand the lock to that task, as that + * is the highest waiter, as selected by the scheduling function. + */ + donor = current->blocked_donor; + if (donor) { + struct mutex *next_lock; + + raw_spin_lock_nested(&donor->blocked_lock, SINGLE_DEPTH_NESTING); + next_lock = get_task_blocked_on(donor); + if (next_lock == lock) { + next = donor; + donor->blocked_on_state = BO_WAKING; + wake_q_add(&wake_q, donor); + current->blocked_donor = NULL; + } + raw_spin_unlock(&donor->blocked_lock); + } + } + + /* + * Failing that, pick any on the wait list. 
+ */ + if (!next && !list_empty(&lock->wait_list)) { /* get the first entry from the wait-list: */ struct mutex_waiter *waiter = list_first_entry(&lock->wait_list, @@ -956,7 +983,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne next = waiter->task; - raw_spin_lock(&next->blocked_lock); + raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING); debug_mutex_wake_waiter(lock, waiter); WARN_ON(next->blocked_on != lock); next->blocked_on_state = BO_WAKING; @@ -967,6 +994,8 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne if (owner & MUTEX_FLAG_HANDOFF) __mutex_handoff(lock, next); + if (sched_proxy_exec()) + raw_spin_unlock(&current->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 55dc2a3b7e46..e0afa228bc9d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6765,7 +6765,17 @@ proxy_migrate_task(struct rq *rq, struct rq_flags *rf, * Find who @next (currently blocked on a mutex) can proxy for. * * Follow the blocked-on relation: - * task->blocked_on -> mutex->owner -> task... + * + * ,-> task + * | | blocked-on + * | v + * blocked_donor | mutex + * | | owner + * | v + * `-- task + * + * and set the blocked_donor relation, this latter is used by the mutex + * code to find which (blocked) task to hand-off to. * * Lock order: * @@ -6897,6 +6907,8 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf) */ raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); + + owner->blocked_donor = p; } WARN_ON_ONCE(owner && !owner->on_rq); @@ -6979,6 +6991,7 @@ static void __sched notrace __schedule(unsigned int sched_mode) unsigned long prev_state; struct rq_flags rf; struct rq *rq; + bool prev_not_proxied; int cpu; bool preserve_need_resched = false; @@ -7030,9 +7043,11 @@ static void __sched notrace __schedule(unsigned int sched_mode) switch_count = &prev->nvcsw; } + prev_not_proxied = !prev->blocked_donor; pick_again: next = pick_next_task(rq, rq_selected(rq), &rf); rq_set_selected(rq, next); + next->blocked_donor = NULL; if (unlikely(task_is_blocked(next))) { next = find_proxy_task(rq, next, &rf); if (!next) { @@ -7088,7 +7103,7 @@ static void __sched notrace __schedule(unsigned int sched_mode) rq = context_switch(rq, prev, next, &rf); } else { /* In case next was already curr but just got blocked_donor */ - if (!task_current_selected(rq, next)) + if (prev_not_proxied && next->blocked_donor) proxy_tag_curr(rq, next); rq_unpin_lock(rq, &rf);
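The handoff policy this patch adds is small enough to model in isolation: at unlock, prefer the donor that was boosting the current task through this particular lock, and only fall back to the head of the wait list. A standalone toy sketch of that decision (not kernel code; the toy_* names are invented, and the wake queue, BO_WAKING state transition and all locking are omitted):

    /* toy_handoff.c -- simplified model of the donor-aware unlock; not kernel code */
    #include <stdio.h>
    #include <stddef.h>

    struct toy_mutex;

    struct toy_task {
            const char *name;
            struct toy_mutex *blocked_on;
            struct toy_task *blocked_donor;
    };

    struct toy_mutex {
            struct toy_task *first_waiter;
    };

    static struct toy_task *
    toy_pick_next_owner(struct toy_mutex *lock, struct toy_task *current)
    {
            struct toy_task *donor = current->blocked_donor;

            /* Was somebody boosting us through this lock? Hand it to them. */
            if (donor && donor->blocked_on == lock) {
                    current->blocked_donor = NULL;
                    return donor;
            }
            /* Failing that, pick any task on the wait list. */
            return lock->first_waiter;
    }

    int main(void)
    {
            struct toy_mutex lock;
            struct toy_task waiter = { "waiter", NULL, NULL };
            struct toy_task donor  = { "donor", &lock, NULL };
            struct toy_task owner  = { "owner", NULL, &donor };

            lock.first_waiter = &waiter;
            printf("next owner: %s\n", toy_pick_next_owner(&lock, &owner)->name);
            return 0;
    }

The design intent, as the diagram in the hunk above suggests, is that blocked_donor closes the loop opposite to blocked_on: the scheduler picked the donor as the highest-priority waiter, so handing the mutex straight to it avoids waking a lower-priority waiter only to block it again.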
From patchwork Wed Dec 20 00:18:27 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181365
Date: Tue, 19 Dec 2023 16:18:27 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-17-jstultz@google.com>
Subject: [PATCH v7 16/23] sched: Add deactivated (sleeping) owner handling to find_proxy_task()
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Valentin Schneider, "Connor O'Brien", John Stultz

From: Peter Zijlstra

If the blocked_on chain resolves to a sleeping owner, deactivate the selected task and enqueue it on the sleeping owner task. Then re-activate it later when the owner is woken up.

NOTE: This has been particularly challenging to get working properly, and some of the locking is particularly awkward. I'd very much appreciate review and feedback for ways to simplify this.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: This was broken out from the larger proxy() patch] Signed-off-by: John Stultz --- v5: * Split out from larger proxy patch v6: * Major rework, replacing the single list head per task with per-task list head and nodes, creating a tree structure so we only wake up decendents of the task woken. * Reworked the locking to take the task->pi_lock, so we can avoid mid-chain wakeup races from try_to_wake_up() called by the ww_mutex logic. v7: * Drop ununessary __nested lock annotation, as we already drop the lock prior. * Add comments on #else & #endif lines, and clearer function names, and commit message tweaks as suggested by Metin Kaya * Move activate_blocked_entities() call from ttwu_queue to try_to_wake_up() to simplify locking. Thanks to questions from Metin Kaya * Fix irqsave/irqrestore usage now we call this outside where the pi_lock is held * Fix activate_blocked_entitites not preserving wake_cpu * Fix for UP builds --- include/linux/sched.h | 3 + kernel/fork.c | 4 + kernel/sched/core.c | 214 ++++++++++++++++++++++++++++++++++++++---- 3 files changed, 202 insertions(+), 19 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 8020e224e057..6f982948a105 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1158,6 +1158,9 @@ struct task_struct { enum blocked_on_state blocked_on_state; struct mutex *blocked_on; /* lock we're blocked on */ struct task_struct *blocked_donor; /* task that is boosting this task */ + struct list_head blocked_head; /* tasks blocked on this task */ + struct list_head blocked_node; /* our entry on someone elses blocked_head */ + struct task_struct *sleeping_owner; /* task our blocked_node is enqueued on */ raw_spinlock_t blocked_lock; #ifdef CONFIG_DEBUG_ATOMIC_SLEEP diff --git a/kernel/fork.c b/kernel/fork.c index 138fc23cad43..56f5e19c268e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2460,6 +2460,10 @@ __latent_entropy struct task_struct *copy_process( p->blocked_on_state = BO_RUNNABLE; p->blocked_on = NULL; /* not blocked yet */ p->blocked_donor = NULL; /* nobody is boosting p yet */ + + INIT_LIST_HEAD(&p->blocked_head); + INIT_LIST_HEAD(&p->blocked_node); + p->sleeping_owner = NULL; #ifdef CONFIG_BCACHE p->sequential_io = 0; p->sequential_io_avg = 0; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e0afa228bc9d..0cd63bd0bdcd 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3785,6 +3785,133 @@ static inline void ttwu_do_wakeup(struct task_struct *p) trace_sched_wakeup(p); } +#ifdef CONFIG_SCHED_PROXY_EXEC +static void do_activate_task(struct rq *rq, struct task_struct *p, int en_flags) +{ + lockdep_assert_rq_held(rq); + + if (!sched_proxy_exec()) { + activate_task(rq, p, en_flags); + return; + } + + if (p->sleeping_owner) { + struct task_struct *owner = p->sleeping_owner; + + raw_spin_lock(&owner->blocked_lock); + list_del_init(&p->blocked_node); + p->sleeping_owner = NULL; + raw_spin_unlock(&owner->blocked_lock); + } + + /* + * By calling activate_task with blocked_lock held, we + * order against the find_proxy_task() blocked_task case + * such that no more blocked tasks will be enqueued on p + * once we release p->blocked_lock. 
+ */ + raw_spin_lock(&p->blocked_lock); + WARN_ON(task_cpu(p) != cpu_of(rq)); + activate_task(rq, p, en_flags); + raw_spin_unlock(&p->blocked_lock); +} + +#ifdef CONFIG_SMP +static inline void proxy_set_task_cpu(struct task_struct *p, int cpu) +{ + unsigned int wake_cpu; + + /* Preserve wake_cpu */ + wake_cpu = p->wake_cpu; + __set_task_cpu(p, cpu); + p->wake_cpu = wake_cpu; +} +#else /* !CONFIG_SMP */ +static inline void proxy_set_task_cpu(struct task_struct *p, int cpu) +{ + __set_task_cpu(p, cpu); +} +#endif /* CONFIG_SMP */ + +static void activate_blocked_entities(struct rq *target_rq, + struct task_struct *owner, + int wake_flags) +{ + unsigned long flags; + struct rq_flags rf; + int target_cpu = cpu_of(target_rq); + int en_flags = ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK; + + if (wake_flags & WF_MIGRATED) + en_flags |= ENQUEUE_MIGRATED; + /* + * A whole bunch of 'proxy' tasks back this blocked task, wake + * them all up to give this task its 'fair' share. + */ + raw_spin_lock_irqsave(&owner->blocked_lock, flags); + while (!list_empty(&owner->blocked_head)) { + struct task_struct *pp; + unsigned int state; + + pp = list_first_entry(&owner->blocked_head, + struct task_struct, + blocked_node); + BUG_ON(pp == owner); + list_del_init(&pp->blocked_node); + WARN_ON(!pp->sleeping_owner); + pp->sleeping_owner = NULL; + raw_spin_unlock_irqrestore(&owner->blocked_lock, flags); + + raw_spin_lock_irqsave(&pp->pi_lock, flags); + state = READ_ONCE(pp->__state); + /* Avoid racing with ttwu */ + if (state == TASK_WAKING) { + raw_spin_unlock_irqrestore(&pp->pi_lock, flags); + raw_spin_lock_irqsave(&owner->blocked_lock, flags); + continue; + } + if (READ_ONCE(pp->on_rq)) { + /* + * We raced with a non mutex handoff activation of pp. + * That activation will also take care of activating + * all of the tasks after pp in the blocked_entry list, + * so we're done here. 
+ */ + raw_spin_unlock_irqrestore(&pp->pi_lock, flags); + raw_spin_lock_irqsave(&owner->blocked_lock, flags); + continue; + } + + proxy_set_task_cpu(pp, target_cpu); + + rq_lock_irqsave(target_rq, &rf); + update_rq_clock(target_rq); + do_activate_task(target_rq, pp, en_flags); + resched_curr(target_rq); + rq_unlock_irqrestore(target_rq, &rf); + raw_spin_unlock_irqrestore(&pp->pi_lock, flags); + + /* recurse - XXX This needs to be reworked to avoid recursing */ + activate_blocked_entities(target_rq, pp, wake_flags); + + raw_spin_lock_irqsave(&owner->blocked_lock, flags); + } + raw_spin_unlock_irqrestore(&owner->blocked_lock, flags); +} +#else /* !CONFIG_SCHED_PROXY_EXEC */ +static inline void do_activate_task(struct rq *rq, struct task_struct *p, + int en_flags) +{ + activate_task(rq, p, en_flags); +} + +static inline void activate_blocked_entities(struct rq *target_rq, + struct task_struct *owner, + int wake_flags) +{ +} +#endif /* CONFIG_SCHED_PROXY_EXEC */ + #ifdef CONFIG_SMP static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) { @@ -3839,7 +3966,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, atomic_dec(&task_rq(p)->nr_iowait); } - activate_task(rq, p, en_flags); + do_activate_task(rq, p, en_flags); wakeup_preempt(rq, p, wake_flags); ttwu_do_wakeup(p); @@ -3936,13 +4063,19 @@ void sched_ttwu_pending(void *arg) update_rq_clock(rq); llist_for_each_entry_safe(p, t, llist, wake_entry.llist) { + int wake_flags; if (WARN_ON_ONCE(p->on_cpu)) smp_cond_load_acquire(&p->on_cpu, !VAL); if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq))) set_task_cpu(p, cpu_of(rq)); - ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf); + wake_flags = p->sched_remote_wakeup ? WF_MIGRATED : 0; + ttwu_do_activate(rq, p, wake_flags, &rf); + rq_unlock(rq, &rf); + activate_blocked_entities(rq, p, wake_flags); + rq_lock(rq, &rf); + update_rq_clock(rq); } /* @@ -4423,6 +4556,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) if (p->blocked_on_state == BO_WAKING) p->blocked_on_state = BO_RUNNABLE; raw_spin_unlock_irqrestore(&p->blocked_lock, flags); + activate_blocked_entities(cpu_rq(cpu), p, wake_flags); out: if (success) ttwu_stat(p, task_cpu(p), wake_flags); @@ -6663,19 +6797,6 @@ proxy_resched_idle(struct rq *rq, struct task_struct *next) return rq->idle; } -static bool proxy_deactivate(struct rq *rq, struct task_struct *next) -{ - unsigned long state = READ_ONCE(next->__state); - - /* Don't deactivate if the state has been changed to TASK_RUNNING */ - if (state == TASK_RUNNING) - return false; - if (!try_to_deactivate_task(rq, next, state, true)) - return false; - proxy_resched_idle(rq, next); - return true; -} - #ifdef CONFIG_SMP /* * If the blocked-on relationship crosses CPUs, migrate @p to the @@ -6761,6 +6882,31 @@ proxy_migrate_task(struct rq *rq, struct rq_flags *rf, } #endif /* CONFIG_SMP */ +static void proxy_enqueue_on_owner(struct rq *rq, struct task_struct *owner, + struct task_struct *next) +{ + /* + * ttwu_activate() will pick them up and place them on whatever rq + * @owner will run next. + */ + if (!owner->on_rq) { + BUG_ON(!next->on_rq); + deactivate_task(rq, next, DEQUEUE_SLEEP); + if (task_current_selected(rq, next)) { + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + } + /* + * ttwu_do_activate must not have a chance to activate p + * elsewhere before it's fully extricated from its old rq. 
+		 */
+		WARN_ON(next->sleeping_owner);
+		next->sleeping_owner = owner;
+		smp_mb();
+		list_add(&next->blocked_node, &owner->blocked_head);
+	}
+}
+
 /*
  * Find who @next (currently blocked on a mutex) can proxy for.
  *
@@ -6866,10 +7012,40 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
 	}
 
 	if (!owner->on_rq) {
-		/* XXX Don't handle blocked owners yet */
-		if (!proxy_deactivate(rq, next))
-			ret = next;
-		goto out;
+		/*
+		 * rq->curr must not be added to the blocked_head list or else
+		 * ttwu_do_activate could enqueue it elsewhere before it
+		 * switches out here. The approach to avoid this is the same
+		 * as in the migrate_task case.
+		 */
+		if (curr_in_chain) {
+			raw_spin_unlock(&p->blocked_lock);
+			raw_spin_unlock(&mutex->wait_lock);
+			return proxy_resched_idle(rq, next);
+		}
+
+		/*
+		 * If !@owner->on_rq, holding @rq->lock will not pin the task,
+		 * so we cannot drop @mutex->wait_lock until we're sure it's a
+		 * blocked task on this rq.
+		 *
+		 * We use @owner->blocked_lock to serialize against ttwu_activate().
+		 * Either we see its new owner->on_rq or it will see our list_add().
+		 */
+		if (owner != p) {
+			raw_spin_unlock(&p->blocked_lock);
+			raw_spin_lock(&owner->blocked_lock);
+		}
+
+		proxy_enqueue_on_owner(rq, owner, next);
+
+		if (task_current_selected(rq, next)) {
+			put_prev_task(rq, next);
+			rq_set_selected(rq, rq->idle);
+		}
+		raw_spin_unlock(&owner->blocked_lock);
+		raw_spin_unlock(&mutex->wait_lock);
+		return NULL; /* retry task selection */
 	}
 
 	if (owner == p) {
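The "XXX This needs to be reworked to avoid recursing" note in
activate_blocked_entities() above suggests an iterative traversal. A
minimal userspace sketch of one possible rework, using simplified
stand-ins for task_struct and the kernel list API (the field names and
list layout here are illustrative assumptions, not code from the
series): instead of recursing per waiter, a single pending list is
kept, and each woken task's own waiter list is spliced onto it.

#include <stdio.h>

struct task {
	const char *name;
	struct task *next_sibling;	/* next waiter on the same blocked list */
	struct task *blocked_head;	/* first task blocked behind this one */
};

static void activate(struct task *t)
{
	printf("activating %s\n", t->name);
}

/*
 * Iterative traversal: pop one waiter at a time, activate it, then
 * splice that waiter's own blocked list onto the pending list rather
 * than recursing into it.
 */
static void activate_blocked_entities_iter(struct task *owner)
{
	struct task *pending = owner->blocked_head;

	owner->blocked_head = NULL;
	while (pending) {
		struct task *t = pending;

		pending = t->next_sibling;
		t->next_sibling = NULL;
		activate(t);
		if (t->blocked_head) {
			struct task *tail = t->blocked_head;

			while (tail->next_sibling)
				tail = tail->next_sibling;
			tail->next_sibling = pending;
			pending = t->blocked_head;
			t->blocked_head = NULL;
		}
	}
}

int main(void)
{
	struct task c = { "C", NULL, NULL };
	struct task b = { "B", NULL, &c };	/* C waits behind B */
	struct task a = { "A", NULL, &b };	/* B waits behind A */

	activate_blocked_entities_iter(&a);	/* prints B, then C */
	return 0;
}

The kernel version would still need the per-task blocked_lock/pi_lock
dance; the sketch only shows how the recursion can become a loop.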
From patchwork Wed Dec 20 00:18:28 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181366
Date: Tue, 19 Dec 2023 16:18:28 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-18-jstultz@google.com>
Subject: [PATCH v7 17/23] sched: Initial sched_football test implementation
From: John Stultz
To: LKML
Reimplementation of the sched_football test from LTP:
https://github.com/linux-test-project/ltp/blob/master/testcases/realtime/func/sched_football/sched_football.c

But reworked to run in the kernel and utilize mutexes to
illustrate proper boosting of low priority mutex holders.

TODO:
* Need a rt_mutex version so it can work w/o proxy-execution
* Need a better place to put it
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: John Stultz
---
 kernel/sched/Makefile              |   1 +
 kernel/sched/test_sched_football.c | 242 +++++++++++++++++++++++++++++
 lib/Kconfig.debug                  |  14 ++
 3 files changed, 257 insertions(+)
 create mode 100644 kernel/sched/test_sched_football.c

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 976092b7bd45..2729d565dfd7 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -32,3 +32,4 @@ obj-y += core.o
 obj-y += fair.o
 obj-y += build_policy.o
 obj-y += build_utility.o
+obj-$(CONFIG_SCHED_RT_INVARIANT_TEST) += test_sched_football.o
diff --git a/kernel/sched/test_sched_football.c b/kernel/sched/test_sched_football.c
new file mode 100644
index 000000000000..9742c45c0fe0
--- /dev/null
+++ b/kernel/sched/test_sched_football.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Module-based test case for RT scheduling invariant
+ *
+ * A reimplementation of my old sched_football test
+ * found in LTP:
+ * https://github.com/linux-test-project/ltp/blob/master/testcases/realtime/func/sched_football/sched_football.c
+ *
+ * Similar to that test, this tries to validate the RT
+ * scheduling invariant: that across N available cpus, the
+ * top N priority tasks are always running.
+ *
+ * This is done via having N offensive players of medium
+ * priority, which constantly try to increment the
+ * ball_pos counter.
+ *
+ * Blocking them are N defensive players of higher
+ * priority which just spin on the cpu, preventing the medium
+ * priority tasks from running.
+ *
+ * To complicate this, there are also N defensive low priority
+ * tasks. These start first and each acquire one of N mutexes.
+ * The high priority defense tasks will later try to grab the
+ * mutexes and block, opening a window for the offensive tasks
+ * to run and increment the ball. If priority inheritance or
+ * proxy execution is used, the low priority defense players
+ * should be boosted to the high priority levels, and will
+ * prevent the mid priority offensive tasks from running.
+ *
+ * Copyright © International Business Machines Corp., 2007, 2008
+ * Copyright (C) Google, 2023
+ *
+ * Authors: John Stultz
+ */
+
+#include <linux/atomic.h>
+#include <linux/cpumask.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <uapi/linux/sched/types.h>
+
+atomic_t players_ready;
+atomic_t ball_pos;
+int players_per_team;
+bool game_over;
+
+struct mutex *mutex_low_list;
+struct mutex *mutex_mid_list;
+
+static inline
+struct task_struct *create_fifo_thread(int (*threadfn)(void *data), void *data,
+				       char *name, int prio)
+{
+	struct task_struct *kth;
+	struct sched_attr attr = {
+		.size		= sizeof(struct sched_attr),
+		.sched_policy	= SCHED_FIFO,
+		.sched_nice	= 0,
+		.sched_priority	= prio,
+	};
+	int ret;
+
+	kth = kthread_create(threadfn, data, name);
+	if (IS_ERR(kth)) {
+		pr_warn("%s: err, kthread_create failed\n", __func__);
+		return kth;
+	}
+	ret = sched_setattr_nocheck(kth, &attr);
+	if (ret) {
+		kthread_stop(kth);
+		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		return ERR_PTR(ret);
+	}
+
+	wake_up_process(kth);
+	return kth;
+}
+
+int defense_low_thread(void *arg)
+{
+	long tnum = (long)arg;
+
+	atomic_inc(&players_ready);
+	mutex_lock(&mutex_low_list[tnum]);
+	while (!READ_ONCE(game_over)) {
+		if (kthread_should_stop())
+			break;
+		schedule();
+	}
+	mutex_unlock(&mutex_low_list[tnum]);
+	return 0;
+}
+
+int defense_mid_thread(void *arg)
+{
+	long tnum = (long)arg;
+
+	atomic_inc(&players_ready);
+	mutex_lock(&mutex_mid_list[tnum]);
+	mutex_lock(&mutex_low_list[tnum]);
+	while (!READ_ONCE(game_over)) {
+		if (kthread_should_stop())
+			break;
+		schedule();
+	}
+	mutex_unlock(&mutex_low_list[tnum]);
+	mutex_unlock(&mutex_mid_list[tnum]);
+	return 0;
+}
+
+int offense_thread(void *arg)
+{
+	atomic_inc(&players_ready);
+	while (!READ_ONCE(game_over)) {
+		if (kthread_should_stop())
+			break;
+		schedule();
+		atomic_inc(&ball_pos);
+	}
+	return 0;
+}
+
+int defense_hi_thread(void *arg)
+{
+	long tnum = (long)arg;
+
+	atomic_inc(&players_ready);
+	mutex_lock(&mutex_mid_list[tnum]);
+	while (!READ_ONCE(game_over)) {
+		if (kthread_should_stop())
+			break;
+		schedule();
+	}
+	mutex_unlock(&mutex_mid_list[tnum]);
+	return 0;
+}
+
+int crazy_fan_thread(void *arg)
+{
+	int count = 0;
+
+	atomic_inc(&players_ready);
+	while (!READ_ONCE(game_over)) {
+		if (kthread_should_stop())
+			break;
+		schedule();
+		udelay(1000);
+		msleep(2);
+		count++;
+	}
+	return 0;
+}
+
+int ref_thread(void *arg)
+{
+	struct task_struct *kth;
+	long game_time = (long)arg;
+	unsigned long final_pos;
+	long i;
+
+	pr_info("%s: started ref, game_time: %ld secs!\n", __func__,
+		game_time);
+
+	/* Create low priority defensive team */
+	for (i = 0; i < players_per_team; i++)
+		kth = create_fifo_thread(defense_low_thread, (void *)i,
+					 "defense-low-thread", 2);
+	/* Wait for the defense threads to start */
+	while (atomic_read(&players_ready) < players_per_team)
+		msleep(1);
+
+	/* Create mid priority defensive team */
+	for (i = 0; i < players_per_team; i++)
+		kth = create_fifo_thread(defense_mid_thread,
+					 (void *)(players_per_team - i - 1),
+					 "defense-mid-thread", 3);
+	/* Wait for the defense threads to start */
+	while (atomic_read(&players_ready) < players_per_team * 2)
+		msleep(1);
+
+	/* Create mid priority offensive team */
+	for (i = 0; i < players_per_team; i++)
+		kth = create_fifo_thread(offense_thread, NULL,
+					 "offense-thread", 5);
+	/* Wait for the offense threads to start */
+	while (atomic_read(&players_ready) < players_per_team * 3)
+		msleep(1);
+
+	/* Create high priority defensive team */
+	for (i = 0; i < players_per_team; i++)
+		kth = create_fifo_thread(defense_hi_thread, (void *)i,
+					 "defense-hi-thread", 10);
+	/* Wait for the defense threads to start */
+	while (atomic_read(&players_ready) < players_per_team * 4)
+		msleep(1);
+
+	/* Create the crazy fan threads */
+	for (i = 0; i < players_per_team; i++)
+		kth = create_fifo_thread(crazy_fan_thread, NULL,
+					 "crazy-fan-thread", 15);
+	/* Wait for the crazy fan threads to start */
+	while (atomic_read(&players_ready) < players_per_team * 5)
+		msleep(1);
+
+	pr_info("%s: all players checked in! Starting game.\n", __func__);
+	atomic_set(&ball_pos, 0);
+	msleep(game_time * 1000);
+	final_pos = atomic_read(&ball_pos);
+	pr_info("%s: final ball_pos: %lu\n", __func__, final_pos);
+	WARN_ON(final_pos != 0);
+	game_over = true;
+	return 0;
+}
+
+static int __init test_sched_football_init(void)
+{
+	struct task_struct *kth;
+	int i;
+
+	players_per_team = num_online_cpus();
+
+	mutex_low_list = kmalloc_array(players_per_team, sizeof(struct mutex), GFP_KERNEL);
+	mutex_mid_list = kmalloc_array(players_per_team, sizeof(struct mutex), GFP_KERNEL);
+	if (!mutex_low_list || !mutex_mid_list)
+		return -ENOMEM;
+
+	for (i = 0; i < players_per_team; i++) {
+		mutex_init(&mutex_low_list[i]);
+		mutex_init(&mutex_mid_list[i]);
+	}
+
+	kth = create_fifo_thread(ref_thread, (void *)10, "ref-thread", 20);
+
+	return 0;
+}
+module_init(test_sched_football_init);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4405f81248fb..1d90059d190f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1238,6 +1238,20 @@ config SCHED_DEBUG
 	  that can help debug the scheduler. The runtime overhead of this
 	  option is minimal.
 
+config SCHED_RT_INVARIANT_TEST
+	tristate "RT invariant scheduling tester"
+	depends on DEBUG_KERNEL
+	help
+	  This option provides a kernel module that runs tests to make
+	  sure the RT invariant holds (top N priority tasks run on N
+	  available cpus).
+
+	  Say Y here if you want kernel RT scheduling tests
+	  to be built into the kernel.
+	  Say M if you want this test to build as a module.
+	  Say N if you are unsure.
+
+
 config SCHED_INFO
 	bool
 	default n
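For readers without proxy execution, the boosting behaviour the module
checks for can be observed in userspace with a PTHREAD_PRIO_INHERIT
mutex. A minimal sketch only (assumes a Linux system with RT
scheduling privileges; the priority values are illustrative, and the
program must be linked with -lpthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_mutex_t lock;

static void *low_prio_holder(void *arg)
{
	pthread_mutex_lock(&lock);
	/*
	 * While the high-priority thread blocks on 'lock', the
	 * PRIO_INHERIT protocol boosts this thread so it can reach
	 * the unlock even if mid-priority spinners hog the CPUs.
	 */
	usleep(100 * 1000);
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *high_prio_waiter(void *arg)
{
	pthread_mutex_lock(&lock);
	puts("high-priority waiter got the lock");
	pthread_mutex_unlock(&lock);
	return NULL;
}

static pthread_t spawn_fifo(void *(*fn)(void *), int prio)
{
	struct sched_param sp = { .sched_priority = prio };
	pthread_attr_t attr;
	pthread_t t;

	pthread_attr_init(&attr);
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
	pthread_attr_setschedparam(&attr, &sp);
	if (pthread_create(&t, &attr, fn, NULL)) {
		perror("pthread_create (missing RT privileges?)");
		exit(1);
	}
	return t;
}

int main(void)
{
	pthread_mutexattr_t ma;
	pthread_t lo, hi;

	pthread_mutexattr_init(&ma);
	pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &ma);

	lo = spawn_fifo(low_prio_holder, 2);
	usleep(10 * 1000);	/* let the low-prio thread take the lock */
	hi = spawn_fifo(high_prio_waiter, 10);

	pthread_join(lo, NULL);
	pthread_join(hi, NULL);
	return 0;
}

This mirrors what the in-kernel test exercises: the rt_mutex (or
proxy-execution) path is what keeps the high-priority waiter from
being starved by the holder's low base priority.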
From patchwork Wed Dec 20 00:18:29 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181370
Date: Tue, 19 Dec 2023 16:18:29 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-19-jstultz@google.com>
Subject: [PATCH v7 18/23] sched: Add push_task_chain helper
From: John Stultz
To: LKML
From: Connor O'Brien

Switch logic that deactivates, sets the task cpu, and
reactivates a task on a different rq to use a helper that
will be later extended to push entire blocked task chains.

This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch]
Signed-off-by: John Stultz
---
 kernel/sched/core.c     | 4 +---
 kernel/sched/deadline.c | 8 ++------
 kernel/sched/rt.c       | 8 ++------
 kernel/sched/sched.h    | 9 +++++++++
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0cd63bd0bdcd..0c212dcd4b7a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2721,9 +2721,7 @@ int push_cpu_stop(void *arg)
 
 	// XXX validate p is still the highest prio task
 	if (task_rq(p) == rq) {
-		deactivate_task(rq, p, 0);
-		set_task_cpu(p, lowest_rq->cpu);
-		activate_task(lowest_rq, p, 0);
+		push_task_chain(rq, lowest_rq, p);
 		resched_curr(lowest_rq);
 	}
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 4f998549ea74..def1eb23318b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2313,9 +2313,7 @@ static int push_dl_task(struct rq *rq)
 		goto retry;
 	}
 
-	deactivate_task(rq, next_task, 0);
-	set_task_cpu(next_task, later_rq->cpu);
-	activate_task(later_rq, next_task, 0);
+	push_task_chain(rq, later_rq, next_task);
 	ret = 1;
 
 	resched_curr(later_rq);
@@ -2401,9 +2399,7 @@ static void pull_dl_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
-				deactivate_task(src_rq, p, 0);
-				set_task_cpu(p, this_cpu);
-				activate_task(this_rq, p, 0);
+				push_task_chain(src_rq, this_rq, p);
 				dmin = p->dl.deadline;
 				resched = true;
 			}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a7b51a021111..cf0eb4aac613 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2128,9 +2128,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}
 
-	deactivate_task(rq, next_task, 0);
-	set_task_cpu(next_task, lowest_rq->cpu);
-	activate_task(lowest_rq, next_task, 0);
+	push_task_chain(rq, lowest_rq, next_task);
 	resched_curr(lowest_rq);
 	ret = 1;
 
@@ -2401,9 +2399,7 @@ static void pull_rt_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
-				deactivate_task(src_rq, p, 0);
-				set_task_cpu(p, this_cpu);
-				activate_task(this_rq, p, 0);
+				push_task_chain(src_rq, this_rq, p);
 				resched = true;
 			}
 			/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 765ba10661de..19afe532771f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3546,5 +3546,14 @@ static inline void init_sched_mm_cid(struct task_struct *t) { }
 extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
 extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
+#ifdef CONFIG_SMP
+static inline
+void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+{
+	deactivate_task(rq, task, 0);
+	set_task_cpu(task, dst_rq->cpu);
+	activate_task(dst_rq, task, 0);
+}
+#endif
+
 #endif /* _KERNEL_SCHED_SCHED_H */
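The helper above only moves a single task today; the commit message
says it will later push whole blocked chains. A standalone C model of
that eventual shape, runnable in userspace (the blocked_donor link and
the stub rq/task types are illustrative assumptions, not code from
this patch):

#include <stdio.h>

struct rq { int cpu; };

struct task {
	const char *name;
	struct task *blocked_donor;	/* assumed chain link, illustrative */
	int cpu;
};

static void deactivate_task(struct rq *rq, struct task *t) { (void)rq; (void)t; }
static void activate_task(struct rq *rq, struct task *t) { (void)rq; (void)t; }
static void set_task_cpu(struct task *t, int cpu) { t->cpu = cpu; }

/* Move @task and everything chained behind it from @src to @dst. */
static void push_task_chain(struct rq *src, struct rq *dst, struct task *task)
{
	struct task *p;

	for (p = task; p; p = p->blocked_donor) {
		deactivate_task(src, p);
		set_task_cpu(p, dst->cpu);
		activate_task(dst, p);
		printf("moved %s to cpu%d\n", p->name, p->cpu);
	}
}

int main(void)
{
	struct rq rq0 = { 0 }, rq1 = { 1 };
	struct task owner  = { "owner",  NULL,   0 };
	struct task waiter = { "waiter", &owner, 0 };

	push_task_chain(&rq0, &rq1, &waiter);	/* moves waiter, then owner */
	return 0;
}

The design point is that the single-task version and the chain version
share one entry point, so the rt/dl push/pull paths do not need to
change again when chains arrive.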
From patchwork Wed Dec 20 00:18:30 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181367
Date: Tue, 19 Dec 2023 16:18:30 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-20-jstultz@google.com>
Subject: [PATCH v7 19/23] sched: Consolidate pick_*_task to task_is_pushable helper
From: John Stultz
To: LKML
From: Connor O'Brien

This patch consolidates the rt and deadline pick_*_task
functions into a task_is_pushable() helper.

This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch, renamed helper function]
Signed-off-by: John Stultz
---
v7:
* Split from chain migration patch
* Renamed function
---
 kernel/sched/deadline.c | 10 +---------
 kernel/sched/rt.c       | 11 +----------
 kernel/sched/sched.h    | 10 ++++++++++
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index def1eb23318b..1f3bc50de678 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2049,14 +2049,6 @@ static void task_fork_dl(struct task_struct *p)
 /* Only try algorithms three times */
 #define DL_MAX_TRIES 3
 
-static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
-{
-	if (!task_on_cpu(rq, p) &&
-	    cpumask_test_cpu(cpu, &p->cpus_mask))
-		return 1;
-	return 0;
-}
-
 /*
  * Return the earliest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise:
@@ -2075,7 +2067,7 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu
 	if (next_node) {
 		p = __node_2_pdl(next_node);
 
-		if (pick_dl_task(rq, p, cpu))
+		if (task_is_pushable(rq, p, cpu) == 1)
 			return p;
 
 		next_node = rb_next(next_node);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index cf0eb4aac613..15161de88753 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1812,15 +1812,6 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 /* Only try algorithms three times */
 #define RT_MAX_TRIES 3
 
-static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
-{
-	if (!task_on_cpu(rq, p) &&
-	    cpumask_test_cpu(cpu, &p->cpus_mask))
-		return 1;
-
-	return 0;
-}
-
 /*
  * Return the highest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise
@@ -1834,7 +1825,7 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
 		return NULL;
 
 	plist_for_each_entry(p, head, pushable_tasks) {
-		if (pick_rt_task(rq, p, cpu))
+		if (task_is_pushable(rq, p, cpu) == 1)
 			return p;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 19afe532771f..ef3d327e267c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3554,6 +3554,16 @@ void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
 	set_task_cpu(task, dst_rq->cpu);
 	activate_task(dst_rq, task, 0);
 }
+
+static inline
+int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+{
+	if (!task_on_cpu(rq, p) &&
+	    cpumask_test_cpu(cpu, &p->cpus_mask))
+		return 1;
+
+	return 0;
+}
 #endif
 
 #endif /* _KERNEL_SCHED_SCHED_H */
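A task qualifies for pushing only when it is not currently executing
and the destination CPU is in its affinity mask; the callers' explicit
"== 1" comparisons leave room for other return values in later
patches. A standalone model of the check, using glibc's cpu_set_t as a
stand-in for the kernel's cpumask (the struct task here is a made-up
illustration, not the kernel's task_struct):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

struct task {
	const char *name;
	int on_cpu;		/* 1 if currently executing */
	cpu_set_t cpus_mask;
};

static int task_is_pushable(const struct task *p, int cpu)
{
	if (!p->on_cpu && CPU_ISSET(cpu, &p->cpus_mask))
		return 1;
	return 0;
}

int main(void)
{
	struct task p = { .name = "waiter", .on_cpu = 0 };

	CPU_ZERO(&p.cpus_mask);
	CPU_SET(1, &p.cpus_mask);

	printf("push to cpu0: %d\n", task_is_pushable(&p, 0));	/* 0 */
	printf("push to cpu1: %d\n", task_is_pushable(&p, 1));	/* 1 */
	return 0;
}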
From patchwork Wed Dec 20 00:18:32 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181368
Date: Tue, 19 Dec 2023 16:18:32 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-22-jstultz@google.com>
Subject: [PATCH v7 21/23] sched: Add find_exec_ctx helper
From: John Stultz
To: LKML
From: Connor O'Brien

Add a helper to find the runnable owner down a chain of
blocked waiters.

This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch]
Signed-off-by: John Stultz
---
 kernel/sched/core.c     | 42 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/cpupri.c   | 11 ++++++++---
 kernel/sched/deadline.c | 15 +++++++++++++--
 kernel/sched/rt.c       |  9 ++++++++-
 kernel/sched/sched.h    | 10 ++++++++++
 5 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c212dcd4b7a..77a79d5f829a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3896,6 +3896,48 @@ static void activate_blocked_entities(struct rq *target_rq,
 	}
 	raw_spin_unlock_irqrestore(&owner->blocked_lock, flags);
 }
+
+static inline bool task_queued_on_rq(struct rq *rq, struct task_struct *task)
+{
+	if (!task_on_rq_queued(task))
+		return false;
+	smp_rmb();
+	if (task_rq(task) != rq)
+		return false;
+	smp_rmb();
+	if (!task_on_rq_queued(task))
+		return false;
+	return true;
+}
+
+/*
+ * Returns the unblocked task at the end of the blocked chain starting with p
+ * if that chain is composed entirely of tasks enqueued on rq, or NULL otherwise.
+ */
+struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p)
+{
+	struct task_struct *exec_ctx, *owner;
+	struct mutex *mutex;
+
+	if (!sched_proxy_exec())
+		return p;
+
+	lockdep_assert_rq_held(rq);
+
+	for (exec_ctx = p; task_is_blocked(exec_ctx) && !task_on_cpu(rq, exec_ctx);
+	     exec_ctx = owner) {
+		mutex = exec_ctx->blocked_on;
+		owner = __mutex_owner(mutex);
+		if (owner == exec_ctx)
+			break;
+
+		if (!task_queued_on_rq(rq, owner) || task_current_selected(rq, owner)) {
+			exec_ctx = NULL;
+			break;
+		}
+	}
+	return exec_ctx;
+}
 #else /* !CONFIG_SCHED_PROXY_EXEC */
 static inline void do_activate_task(struct rq *rq, struct task_struct *p,
 				    int en_flags)
diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index 15e947a3ded7..53be78afdd07 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -96,12 +96,17 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 	if (skip)
 		return 0;
 
-	if (cpumask_any_and(&p->cpus_mask, vec->mask) >= nr_cpu_ids)
+	if ((p && cpumask_any_and(&p->cpus_mask, vec->mask) >= nr_cpu_ids) ||
+	    (!p && cpumask_any(vec->mask) >= nr_cpu_ids))
 		return 0;
 
 	if (lowest_mask) {
-		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
-		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
+		if (p) {
+			cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+			cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
+		} else {
+			cpumask_copy(lowest_mask, vec->mask);
+		}
 
 		/*
 		 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 999bd17f11c4..21e56ac58e32 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1866,6 +1866,8 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
 
 static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
 {
+	struct task_struct *exec_ctx;
+
 	/*
 	 * Current can't be migrated, useless to reschedule,
 	 * let's hope p can move out.
@@ -1874,12 +1876,16 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
 	    !cpudl_find(&rq->rd->cpudl, rq_selected(rq), rq->curr, NULL))
 		return;
 
+	exec_ctx = find_exec_ctx(rq, p);
+	if (task_current(rq, exec_ctx))
+		return;
+
 	/*
 	 * p is migratable, so let's not schedule it and
 	 * see if it is pushed or pulled somewhere else.
 	 */
 	if (p->nr_cpus_allowed != 1 &&
-	    cpudl_find(&rq->rd->cpudl, p, p, NULL))
+	    cpudl_find(&rq->rd->cpudl, p, exec_ctx, NULL))
 		return;
 
 	resched_curr(rq);
@@ -2169,12 +2175,17 @@ static int find_later_rq(struct task_struct *sched_ctx, struct task_struct *exec_ctx)
 
 /* Locks the rq it finds */
 static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 {
+	struct task_struct *exec_ctx;
 	struct rq *later_rq = NULL;
 	int tries;
 	int cpu;
 
 	for (tries = 0; tries < DL_MAX_TRIES; tries++) {
-		cpu = find_later_rq(task, task);
+		exec_ctx = find_exec_ctx(rq, task);
+		if (!exec_ctx)
+			break;
+
+		cpu = find_later_rq(task, exec_ctx);
 
 		if ((cpu == -1) || (cpu == rq->cpu))
 			break;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6371b0fca4ad..f8134d062fa3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1640,6 +1640,11 @@ static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
 	    !cpupri_find(&rq->rd->cpupri, rq_selected(rq), rq->curr, NULL))
 		return;
 
+	/* No reason to preempt since rq->curr wouldn't change anyway */
+	exec_ctx = find_exec_ctx(rq, p);
+	if (task_current(rq, exec_ctx))
+		return;
+
 	/*
 	 * p is migratable, so let's not schedule it and
 	 * see if it is pushed or pulled somewhere else.
@@ -1933,12 +1938,14 @@ static int find_lowest_rq(struct task_struct *sched_ctx, struct task_struct *exec_ctx)
 
 /* Will lock the rq it finds */
 static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 {
+	struct task_struct *exec_ctx;
 	struct rq *lowest_rq = NULL;
 	int tries;
 	int cpu;
 
 	for (tries = 0; tries < RT_MAX_TRIES; tries++) {
-		cpu = find_lowest_rq(task, task);
+		exec_ctx = find_exec_ctx(rq, task);
+		cpu = find_lowest_rq(task, exec_ctx);
 
 		if ((cpu == -1) || (cpu == rq->cpu))
 			break;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ef3d327e267c..6cd473224cfe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3564,6 +3564,16 @@ int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
 
 	return 0;
 }
+
+#ifdef CONFIG_SCHED_PROXY_EXEC
+struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p);
+#else /* !CONFIG_SCHED_PROXY_EXEC */
+static inline
+struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p)
+{
+	return p;
+}
+#endif /* CONFIG_SCHED_PROXY_EXEC */
 #endif
 
 #endif /* _KERNEL_SCHED_SCHED_H */
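The chain walk itself is easy to model outside the kernel: follow
blocked_on to each mutex's owner until a task that is not itself
blocked is found. A simplified standalone sketch (the rq-membership
and task_current_selected() checks of the real helper are omitted, and
the types here are illustrative stand-ins):

#include <stdio.h>

struct task;

struct mutex { struct task *owner; };

struct task {
	const char *name;
	struct mutex *blocked_on;	/* NULL if runnable */
};

/* Walk blocked_on -> owner links until a runnable task is found. */
static struct task *find_exec_ctx(struct task *p)
{
	struct task *exec_ctx = p;

	while (exec_ctx->blocked_on) {
		struct task *owner = exec_ctx->blocked_on->owner;

		if (!owner || owner == exec_ctx)
			break;
		exec_ctx = owner;
	}
	return exec_ctx;
}

int main(void)
{
	struct task c = { "C", NULL };		/* runnable owner */
	struct mutex m2 = { &c };
	struct task b = { "B", &m2 };		/* blocked on m2 */
	struct mutex m1 = { &b };
	struct task a = { "A", &m1 };		/* blocked on m1 */

	printf("exec ctx for A: %s\n", find_exec_ctx(&a)->name);	/* C */
	return 0;
}

In the kernel version, returning NULL when an owner is off-rq (or is
the currently selected task) tells the push/pull code that no usable
execution context exists on this rq, so the chain cannot be migrated
as a unit.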
From patchwork Wed Dec 20 00:18:33 2023
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181371
Date: Tue, 19 Dec 2023 16:18:33 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-23-jstultz@google.com>
Subject: [PATCH v7 22/23] sched: Refactor dl/rt find_lowest/latest_rq logic
From: John Stultz
To: LKML
This pulls the re-validation logic done in find_lowest_rq
and find_latest_rq after re-acquiring the rq locks out into
its own function. This allows us to later use a more
complicated validation check for chain-migration when using
proxy-execution.

TODO: It seems likely we could consolidate these two
functions further and leave the task_is_rt()/task_is_dl()
checks external?

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Signed-off-by: John Stultz --- kernel/sched/deadline.c | 31 ++++++++++++++++++++----- kernel/sched/rt.c | 50 ++++++++++++++++++++++++++++------------- 2 files changed, 59 insertions(+), 22 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 21e56ac58e32..8b5701727342 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2172,6 +2172,30 @@ static int find_later_rq(struct task_struct *sched_ctx, struct task_struct *exec return -1; } +static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *rq, + struct rq *later) +{ + if (task_rq(task) != rq) + return false; + + if (!cpumask_test_cpu(later->cpu, &task->cpus_mask)) + return false; + + if (task_on_cpu(rq, task)) + return false; + + if (!dl_task(task)) + return false; + + if (is_migration_disabled(task)) + return false; + + if (!task_on_rq_queued(task)) + return false; + + return true; +} + /* Locks the rq it finds */ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq) { @@ -2204,12 +2228,7 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq) /* Retry if something changed. */ if (double_lock_balance(rq, later_rq)) { - if (unlikely(task_rq(task) != rq || - !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) || - task_on_cpu(rq, task) || - !dl_task(task) || - is_migration_disabled(task) || - !task_on_rq_queued(task))) { + if (unlikely(!dl_revalidate_rq_state(task, rq, later_rq))) { double_unlock_balance(rq, later_rq); later_rq = NULL; break; diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f8134d062fa3..fabb19891e95 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1935,6 +1935,39 @@ static int find_lowest_rq(struct task_struct *sched_ctx, struct task_struct *exe return -1; } +static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *rq, + struct rq *lowest) +{ + /* + * We had to unlock the run queue. In + * the mean time, task could have + * migrated already or had its affinity changed. + * Also make sure that it wasn't scheduled on its rq. + * It is possible the task was scheduled, set + * "migrate_disabled" and then got preempted, so we must + * check the task migration disable flag here too. + */ + if (task_rq(task) != rq) + return false; + + if (!cpumask_test_cpu(lowest->cpu, &task->cpus_mask)) + return false; + + if (task_on_cpu(rq, task)) + return false; + + if (!rt_task(task)) + return false; + + if (is_migration_disabled(task)) + return false; + + if (!task_on_rq_queued(task)) + return false; + + return true; +} + /* Will lock the rq it finds */ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq) { @@ -1964,22 +1997,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq) /* if the prio of this runqueue changed, try again */ if (double_lock_balance(rq, lowest_rq)) { - /* - * We had to unlock the run queue. In - * the mean time, task could have - * migrated already or had its affinity changed. - * Also make sure that it wasn't scheduled on its rq. - * It is possible the task was scheduled, set - * "migrate_disabled" and then got preempted, so we must - * check the task migration disable flag here too. 
From patchwork Wed Dec 20 00:18:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 181369
Date: Tue, 19 Dec 2023 16:18:34 -0800
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com>
Message-ID: <20231220001856.3710363-24-jstultz@google.com>
Subject: [PATCH v7 23/23] sched: Fix rt/dl load balancing via chain level balance
From: John Stultz
To: LKML
Cc: "Connor O'Brien", Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, John Stultz

From: Connor O'Brien

RT/DL balancing is supposed to guarantee that, with N CPUs available and
CPU affinity permitting, the top N RT/DL tasks get spread across the
CPUs and all get to run. Proxy exec greatly complicates this, as blocked
tasks remain on the rq but cannot be usefully migrated away from their
lock-owning tasks. This has two major consequences:
1. In order to get the desired properties we need to migrate a blocked
   task, its would-be proxy, and everything in between, all together -
   i.e., we need to push/pull "blocked chains" rather than individual
   tasks.
2. Tasks that are part of rq->curr's "blocked tree" therefore should
   not be pushed or pulled. Options for enforcing this seem to include:
   a) Create some sort of complex data structure for tracking
      pushability, updating it whenever the blocked tree for rq->curr
      changes (e.g. on mutex handoffs, migrations, etc.) as well as on
      context switches.
   b) Give up on O(1) pushability checks, and search through the
      pushable list on every push/pull until we find a pushable
      "chain".
   c) Extend option "b" with some sort of caching to avoid repeated
      work.

For the sake of simplicity, and to separate the "chain level balancing"
concerns from complicated optimizations, this patch focuses on
implementing option "b" correctly. This can then hopefully provide a
baseline for "correct load balancing behavior" that optimizations can
try to implement more efficiently.

Note: The inability to atomically check "is task enqueued on a specific
rq" creates two possible races when following a blocked chain:
- If we check task_rq() first, on a task that is dequeued from its rq,
  it can be woken and enqueued on another rq before the call to
  task_on_rq_queued().
- If we call task_on_rq_queued() first, on a task that is on another
  rq, it can be dequeued (since we don't hold its rq's lock) and then
  be set to the current rq before we check task_rq().

Maybe there's a more elegant solution that would work, but for now,
just sandwich the task_rq() check between two task_on_rq_queued()
checks, all separated by smp_rmb() calls. Since we hold rq's lock, the
task can't be enqueued on or dequeued from rq, so neither race should
be possible.
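For illustration, that check ordering looks roughly like the sketch below.
The series relies on a task_queued_on_rq() helper for this (it is called by
push_task_chain() in the diff below); the body sketched here is an
approximation reconstructed from the description above, not a quote of the
actual helper, which is introduced earlier in the series:

    static inline bool task_queued_on_rq(struct rq *rq, struct task_struct *task)
    {
            /* First check: catches a dequeued task being woken onto another rq. */
            if (!task_on_rq_queued(task))
                    return false;

            smp_rmb();      /* order the queued check against the task_rq() read */

            if (task_rq(task) != rq)
                    return false;

            smp_rmb();      /* order the task_rq() read against the re-check */

            /* Second check: catches a dequeue that raced with the checks above. */
            return task_on_rq_queued(task);
    }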
Extensive comments on the various pitfalls, races, etc. are included
inline.

This patch was broken out from a larger chain-migration patch
originally by Connor O'Brien.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch, majorly
 refactored for runtime conditionalization]
Signed-off-by: John Stultz
---
v7:
* Split out from larger chain-migration patch in earlier versions
  of this series
* Larger rework to allow proper conditionalization of the logic
  when running with CONFIG_SCHED_PROXY_EXEC
---
 kernel/sched/core.c     |  77 +++++++++++++++++++++++-
 kernel/sched/deadline.c |  98 +++++++++++++++++++++++-------
 kernel/sched/rt.c       | 130 ++++++++++++++++++++++++++++++--------
 kernel/sched/sched.h    |  18 +++++-
 4 files changed, 273 insertions(+), 50 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 77a79d5f829a..30dfb6f14f2b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3923,7 +3923,6 @@ struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p)
 		return p;
 
 	lockdep_assert_rq_held(rq);
-
 	for (exec_ctx = p; task_is_blocked(exec_ctx) && !task_on_cpu(rq, exec_ctx);
 	     exec_ctx = owner) {
 		mutex = exec_ctx->blocked_on;
@@ -3938,6 +3937,82 @@ struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p)
 	}
 	return exec_ctx;
 }
+
+#ifdef CONFIG_SMP
+void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+{
+	struct task_struct *owner;
+
+	if (!sched_proxy_exec()) {
+		__push_task_chain(rq, dst_rq, task);
+		return;
+	}
+
+	lockdep_assert_rq_held(rq);
+	lockdep_assert_rq_held(dst_rq);
+
+	BUG_ON(!task_queued_on_rq(rq, task));
+	BUG_ON(task_current_selected(rq, task));
+
+	while (task) {
+		if (!task_queued_on_rq(rq, task) || task_current_selected(rq, task))
+			break;
+
+		if (task_is_blocked(task))
+			owner = __mutex_owner(task->blocked_on);
+		else
+			owner = NULL;
+		__push_task_chain(rq, dst_rq, task);
+		if (task == owner)
+			break;
+		task = owner;
+	}
+}
+
+/*
+ * Returns:
+ * 1 if chain is pushable and affinity does not prevent pushing to cpu
+ * 0 if chain is unpushable
+ * -1 if chain is pushable but affinity blocks running on cpu.
+ */
+int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+{
+	struct task_struct *exec_ctx;
+
+	if (!sched_proxy_exec())
+		return __task_is_pushable(rq, p, cpu);
+
+	lockdep_assert_rq_held(rq);
+
+	if (task_rq(p) != rq || !task_on_rq_queued(p))
+		return 0;
+
+	exec_ctx = find_exec_ctx(rq, p);
+	/*
+	 * Chain leads off the rq, we're free to push it anywhere.
+	 *
+	 * One wrinkle with relying on find_exec_ctx is that when the chain
+	 * leads to a task currently migrating to rq, we see the chain as
+	 * pushable & push everything prior to the migrating task. Even if
+	 * we checked explicitly for this case, we could still race with a
+	 * migration after the check.
+	 * This shouldn't permanently produce a bad state though, as proxy()
+	 * will send the chain back to rq and by that point the migration
+	 * should be complete & a proper push can occur.
+	 */
+	if (!exec_ctx)
+		return 1;
+
+	if (task_on_cpu(rq, exec_ctx) || exec_ctx->nr_cpus_allowed <= 1)
+		return 0;
+
+	return cpumask_test_cpu(cpu, &exec_ctx->cpus_mask) ? 1 : -1;
+}
+#else /* !CONFIG_SMP */
+void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+{
+}
+#endif /* CONFIG_SMP */
 #else /* !CONFIG_SCHED_PROXY_EXEC */
 static inline void do_activate_task(struct rq *rq, struct task_struct *p,
 				    int en_flags)
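To make the tri-state convention of task_is_pushable() concrete: a caller
that only cares about general pushability can pass any CPU and treat -1 and
1 alike, while a caller validating a specific destination distinguishes
them, roughly as in the following fragment (illustrative only, not code
from this patch):

    switch (task_is_pushable(rq, p, cpu)) {
    case 0:         /* chain is currently unpushable: pick another task */
            return false;
    case -1:        /* chain is pushable, just not to this cpu: repick cpu */
            *retry = true;
            break;
    case 1:         /* chain can be pushed to this cpu */
            break;
    }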
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 8b5701727342..b7be888c1635 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2172,8 +2172,77 @@ static int find_later_rq(struct task_struct *sched_ctx, struct task_struct *exec
 	return -1;
 }
 
+static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
+{
+	struct task_struct *p = NULL;
+	struct rb_node *next_node;
+
+	if (!has_pushable_dl_tasks(rq))
+		return NULL;
+
+	next_node = rb_first_cached(&rq->dl.pushable_dl_tasks_root);
+
+next_node:
+	if (next_node) {
+		p = __node_2_pdl(next_node);
+
+		/*
+		 * cpu argument doesn't matter because we treat a -1 result
+		 * (pushable but can't go to cpu0) the same as a 1 result
+		 * (pushable to cpu0). All we care about here is general
+		 * pushability.
+		 */
+		if (task_is_pushable(rq, p, 0)) {
+			WARN_ON_ONCE(rq->cpu != task_cpu(p));
+			WARN_ON_ONCE(task_current(rq, p));
+			WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+
+			WARN_ON_ONCE(!task_on_rq_queued(p));
+			WARN_ON_ONCE(!dl_task(p));
+
+			return p;
+		}
+
+		next_node = rb_next(next_node);
+		goto next_node;
+	}
+
+	/* No pushable chain found; don't return the last node examined. */
+	return NULL;
+}
+
+#ifdef CONFIG_SCHED_PROXY_EXEC
 static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *rq,
-					  struct rq *later)
+					  struct rq *later, bool *retry)
+{
+	if (!dl_task(task) || is_migration_disabled(task))
+		return false;
+
+	if (rq != this_rq()) {
+		struct task_struct *next_task = pick_next_pushable_dl_task(rq);
+
+		if (next_task == task) {
+			struct task_struct *exec_ctx;
+
+			exec_ctx = find_exec_ctx(rq, next_task);
+			*retry = (exec_ctx && !cpumask_test_cpu(later->cpu,
+								&exec_ctx->cpus_mask));
+		} else {
+			return false;
+		}
+	} else {
+		int pushable = task_is_pushable(rq, task, later->cpu);
+
+		*retry = pushable == -1;
+		if (!pushable)
+			return false;
+	}
+	return true;
+}
+#else
+static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *rq,
+					  struct rq *later, bool *retry)
 {
 	if (task_rq(task) != rq)
 		return false;
@@ -2195,16 +2264,18 @@ static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *r
 
 	return true;
 }
-
+#endif
 /* Locks the rq it finds */
 static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 {
 	struct task_struct *exec_ctx;
 	struct rq *later_rq = NULL;
+	bool retry;
 	int tries;
 	int cpu;
 
 	for (tries = 0; tries < DL_MAX_TRIES; tries++) {
+		retry = false;
 		exec_ctx = find_exec_ctx(rq, task);
 		if (!exec_ctx)
 			break;
@@ -2228,7 +2299,7 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 
 		/* Retry if something changed. */
 		if (double_lock_balance(rq, later_rq)) {
-			if (unlikely(!dl_revalidate_rq_state(task, rq, later_rq))) {
+			if (unlikely(!dl_revalidate_rq_state(task, rq, later_rq, &retry))) {
 				double_unlock_balance(rq, later_rq);
 				later_rq = NULL;
 				break;
@@ -2240,7 +2311,7 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 		 * its earliest one has a later deadline than our
 		 * task, the rq is a good one.
 		 */
-		if (dl_task_is_earliest_deadline(task, later_rq))
+		if (!retry && dl_task_is_earliest_deadline(task, later_rq))
 			break;
 
 		/* Otherwise we try again. */
@@ -2251,25 +2322,6 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 	return later_rq;
 }
 
-static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_dl_tasks(rq))
-		return NULL;
-
-	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
-
-	WARN_ON_ONCE(rq->cpu != task_cpu(p));
-	WARN_ON_ONCE(task_current(rq, p));
-	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
-
-	WARN_ON_ONCE(!task_on_rq_queued(p));
-	WARN_ON_ONCE(!dl_task(p));
-
-	return p;
-}
-
 /*
  * See if the non running -deadline tasks on this rq
  * can be sent to some other CPU where they can preempt
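The retry flag threaded through dl_revalidate_rq_state() above encodes "the
chain is still pushable, but possibly not to the CPU we picked": when set,
find_lock_later_rq() declines the locked rq even if the deadline check
passes, and loops to repick. The rt.c changes below follow the same pattern.
Restated in condensed form (illustration only, simplified from the diff
above):

    /* Inside the retry loop, after double_lock_balance() succeeds: */
    if (unlikely(!dl_revalidate_rq_state(task, rq, later_rq, &retry))) {
            /* State changed underneath us: give up on this task. */
            double_unlock_balance(rq, later_rq);
            later_rq = NULL;
            break;
    }
    if (!retry && dl_task_is_earliest_deadline(task, later_rq))
            break;  /* later_rq is locked and still suitable: push to it */
    /* Otherwise unlock and repick a CPU on the next iteration. */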
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index fabb19891e95..d5ce95dc5c09 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1935,8 +1935,108 @@ static int find_lowest_rq(struct task_struct *sched_ctx, struct task_struct *exe
 	return -1;
 }
 
+static struct task_struct *pick_next_pushable_task(struct rq *rq)
+{
+	struct plist_head *head = &rq->rt.pushable_tasks;
+	struct task_struct *p, *push_task = NULL;
+
+	if (!has_pushable_tasks(rq))
+		return NULL;
+
+	plist_for_each_entry(p, head, pushable_tasks) {
+		if (task_is_pushable(rq, p, 0)) {
+			push_task = p;
+			break;
+		}
+	}
+
+	if (!push_task)
+		return NULL;
+
+	BUG_ON(rq->cpu != task_cpu(push_task));
+	BUG_ON(task_current(rq, push_task) || task_current_selected(rq, push_task));
+	BUG_ON(!task_on_rq_queued(push_task));
+	BUG_ON(!rt_task(push_task));
+
+	return push_task;
+}
+
+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *rq,
+					  struct rq *lowest, bool *retry)
+{
+	/*
+	 * Releasing the rq lock means we need to re-check pushability.
+	 * Some scenarios:
+	 * 1) If a migration from another CPU sent a task/chain to rq
+	 *    that made task newly unpushable by completing a chain
+	 *    from task to rq->curr, then we need to bail out and push something
+	 *    else.
+	 * 2) If our chain led off this CPU or to a dequeued task, the last waiter
+	 *    on this CPU might have acquired the lock and woken (or even migrated
+	 *    & run, handed off the lock it held, etc...). This can invalidate the
+	 *    result of find_lowest_rq() if our chain previously ended in a blocked
+	 *    task whose affinity we could ignore, but now ends in an unblocked
+	 *    task that can't run on lowest_rq.
+	 * 3) Race described at https://lore.kernel.org/all/1523536384-26781-2-git-send-email-huawei.libin@huawei.com/
+	 *
+	 * Notes on these:
+	 * - Scenario #2 is properly handled by rerunning find_lowest_rq
+	 * - Scenario #1 requires that we fail
+	 * - Scenario #3 can AFAICT only occur when rq is not this_rq(). And the
+	 *   suggested fix is not universally correct now that push_cpu_stop() can
+	 *   call this function.
+	 */
+	if (!rt_task(task) || is_migration_disabled(task)) {
+		return false;
+	} else if (rq != this_rq()) {
+		/*
+		 * If we are dealing with a remote rq, then all bets are off
+		 * because task might have run & then been dequeued since we
+		 * released the lock, at which point our normal checks can race
+		 * with migration, as described in
+		 * https://lore.kernel.org/all/1523536384-26781-2-git-send-email-huawei.libin@huawei.com/
+		 * Need to repick to ensure we avoid a race.
+		 * But re-picking would be unnecessary & incorrect in the
+		 * push_cpu_stop() path.
+		 */
+		struct task_struct *next_task = pick_next_pushable_task(rq);
+
+		if (next_task == task) {
+			struct task_struct *exec_ctx;
+
+			exec_ctx = find_exec_ctx(rq, next_task);
+			*retry = (exec_ctx &&
+				  !cpumask_test_cpu(lowest->cpu,
+						    &exec_ctx->cpus_mask));
+		} else {
+			return false;
+		}
+	} else {
+		/*
+		 * Chain level balancing introduces new ways for our choice of
+		 * task & rq to become invalid when we release the rq lock, e.g.:
+		 * 1) Migration to rq from another CPU makes task newly unpushable
+		 *    by completing a "blocked chain" from task to rq->curr.
+		 *    Fail so a different task can be chosen for push.
+		 * 2) In cases where task's blocked chain led to a dequeued task
+		 *    or one on another rq, the last waiter in the chain on this
+		 *    rq might have acquired the lock and woken, meaning we must
+		 *    pick a different rq if its affinity prevents running on
+		 *    lowest rq.
+		 */
+		int pushable = task_is_pushable(rq, task, lowest->cpu);
+
+		*retry = pushable == -1;
+		if (!pushable)
+			return false;
+	}
+
+	return true;
+}
+#else /* !CONFIG_SCHED_PROXY_EXEC */
 static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *rq,
-					  struct rq *lowest)
+					  struct rq *lowest, bool *retry)
 {
 	/*
 	 * We had to unlock the run queue. In
@@ -1967,16 +2067,19 @@ static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *r
 
 	return true;
 }
+#endif
 
 /* Will lock the rq it finds */
 static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 {
 	struct task_struct *exec_ctx;
 	struct rq *lowest_rq = NULL;
+	bool retry;
 	int tries;
 	int cpu;
 
 	for (tries = 0; tries < RT_MAX_TRIES; tries++) {
+		retry = false;
 		exec_ctx = find_exec_ctx(rq, task);
 		cpu = find_lowest_rq(task, exec_ctx);
 
@@ -1997,7 +2100,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 
 		/* if the prio of this runqueue changed, try again */
 		if (double_lock_balance(rq, lowest_rq)) {
-			if (unlikely(!rt_revalidate_rq_state(task, rq, lowest_rq))) {
+			if (unlikely(!rt_revalidate_rq_state(task, rq, lowest_rq, &retry))) {
				double_unlock_balance(rq, lowest_rq);
 				lowest_rq = NULL;
 				break;
@@ -2005,7 +2108,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 		}
 
 		/* If this rq is still suitable use it. */
-		if (lowest_rq->rt.highest_prio.curr > task->prio)
+		if (lowest_rq->rt.highest_prio.curr > task->prio && !retry)
 			break;
 
 		/* try again */
@@ -2016,27 +2119,6 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 	return lowest_rq;
 }
 
-static struct task_struct *pick_next_pushable_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_tasks(rq))
-		return NULL;
-
-	p = plist_first_entry(&rq->rt.pushable_tasks,
-			      struct task_struct, pushable_tasks);
-
-	BUG_ON(rq->cpu != task_cpu(p));
-	BUG_ON(task_current(rq, p));
-	BUG_ON(task_current_selected(rq, p));
-	BUG_ON(p->nr_cpus_allowed <= 1);
-
-	BUG_ON(!task_on_rq_queued(p));
-	BUG_ON(!rt_task(p));
-
-	return p;
-}
-
 /*
  * If the current CPU has more than one RT task, see if the non
  * running task can migrate over to a CPU that is running a task
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6cd473224cfe..4b97b36be691 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3548,7 +3548,7 @@ extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
 extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
 #ifdef CONFIG_SMP
 static inline
-void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+void __push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
 {
 	deactivate_task(rq, task, 0);
 	set_task_cpu(task, dst_rq->cpu);
@@ -3556,7 +3556,7 @@ void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
 }
 
 static inline
-int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+int __task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
 {
 	if (!task_on_cpu(rq, p) &&
 	    cpumask_test_cpu(cpu, &p->cpus_mask))
@@ -3566,8 +3566,22 @@ int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
 }
 
 #ifdef CONFIG_SCHED_PROXY_EXEC
+void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task);
+int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu);
 struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p);
 #else /* !CONFIG_SCHED_PROXY_EXEC */
+static inline
+void push_task_chain(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+{
+	__push_task_chain(rq, dst_rq, task);
+}
+
+static inline
+int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+{
+	return __task_is_pushable(rq, p, cpu);
+}
+
 static inline
 struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p)
 {