Message ID | 20231019233543.1243121-5-frederic@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2010:b0:403:3b70:6f57 with SMTP id fe16csp717000vqb; Thu, 19 Oct 2023 16:36:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEEIJUZXwr3BsvSmYKyaqF3zypzAvivDGseQRU+Q4a6RfnMI5aYUkKYNMIrq0npeLKJFu2m X-Received: by 2002:a05:6a20:7f91:b0:163:a041:336c with SMTP id d17-20020a056a207f9100b00163a041336cmr352202pzj.48.1697758609951; Thu, 19 Oct 2023 16:36:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697758609; cv=none; d=google.com; s=arc-20160816; b=O5LLijoaLWIOnBfduoPOrjfWkLqCN+FIefgTIT+mKndly044VtD2eQEm6Z2SpZPPrW ah5NqSDVUDJw60Y5gkJ+myfOh9ozzJ5zo1U8yxe1mpPEIlzLz2lKPDIhLNsUVL8OAsvH RZIh51Uuee64YBUmbXgaR+/kcIZZ9nMepMDqLfffn7VKjtt/ZJmPR8S0cXx1j3ZzDCpV W8jt73RAht/biZDh1K8gFD+OSYOQ0J0atqdNLBDDuSr02PBgwPwHMzxEr5ratGyRNrXF fplSX/BQr+hrnujS12TtqYsD6HL3ZN9lK/Al2az07P6O7UrPCHUAjFtVcTlgPsi9WCcV 6j9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ttotqUIsFPIvhFt2O/OWr45T6+RSP2w6vkEJLwLvpbo=; fh=Xtw5ULhofgKZ29v5WdXlT+QfFq0h1vKmlRJPSOHLxgo=; b=IsUBJMVm0Wk1+0qx54MkGc58CbsJDINz4yIVDs1K7xgDk0wSTqnHgg8INr309QsC+J T0Bl2zLz0p5j4U1Pv99tMTPokCvVzvXDFOQtrUYXl2lKPJMT905BZTcLCDi3++6+rAt4 jr4IuIYWaaX1aHGBPTpQz5rYD8vTMdJvA9vqktuKVnQ5JXqorVNN7Sx6S4XJCUATbYqR nFaZsStI6YZPDKwsmrTEIQwuAFwhJwCWAKLfPoSGlQvoHdh0idPfueV83cADOtFXhmoK 6M8fIqoQpnK1Q4ErUwcfbxsq73xqW2iHgYLzfUt0WV6hveka4QzIrRKni+PanKN2cezZ xMqg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=thbIHCBu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id ay8-20020a1709028b8800b001c0eefc0dfesi518674plb.130.2023.10.19.16.36.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Oct 2023 16:36:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=thbIHCBu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 108D181BAF3F; Thu, 19 Oct 2023 16:36:23 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346792AbjJSXgT (ORCPT <rfc822;a1648639935@gmail.com> + 26 others); Thu, 19 Oct 2023 19:36:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346755AbjJSXgJ (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 19 Oct 2023 19:36:09 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4017318A; Thu, 19 Oct 2023 16:36:07 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6E1AAC433CB; Thu, 19 Oct 2023 23:36:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697758566; bh=SRJb9h+7LIx0+8pFOh4ACdoNOIWr+SDM51E4mSRaHRo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=thbIHCBuTzwshUBgP4oE3WpBPBX/sHqzNus6LSOjGCnc+iREVAB3PRd5cGTPUtG90 rn40Yzh2FxHCTw8qAPCQ43znSvmGBkAK6Px6ifHhYyVF8YFMMUu8X0BGIajFntpPnh +8sxBf5TGEgifiSUPwyq+Nm2ej6yAh2oXuoNRqID5b+MttWoOAODJTwpQWG+aPCxKL Wa+5mtFGOApxGtFdsYgeh5mQAaaVn3uISiOsF9qZm/65yJFQsY68DAgmC0x7mBcLPR yK8zLr346hQ5QwZlZE8wpCChG4KsvVzOPabJj8uWge87VAWcI5vZXULvfpNpK0N/Ej OC1n6qchNxN/A== From: Frederic Weisbecker <frederic@kernel.org> To: LKML <linux-kernel@vger.kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>, Boqun Feng <boqun.feng@gmail.com>, Joel Fernandes <joel@joelfernandes.org>, Josh Triplett <josh@joshtriplett.org>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Neeraj Upadhyay <neeraj.upadhyay@amd.com>, "Paul E . McKenney" <paulmck@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Uladzislau Rezki <urezki@gmail.com>, rcu <rcu@vger.kernel.org>, Zqiang <qiang.zhang1211@gmail.com>, Lai Jiangshan <jiangshanlai@gmail.com>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Peter Zijlstra <peterz@infradead.org>, Sebastian Siewior <bigeasy@linutronix.de>, Thomas Gleixner <tglx@linutronix.de> Subject: [PATCH 4/4] Revert "kernel/sched: Modify initial boot task idle setup" Date: Fri, 20 Oct 2023 01:35:43 +0200 Message-Id: <20231019233543.1243121-5-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019233543.1243121-1-frederic@kernel.org> References: <20231019233543.1243121-1-frederic@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Thu, 19 Oct 2023 16:36:23 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780228932544502040 X-GMAIL-MSGID: 1780228932544502040 |
Series |
rcu: Fix PF_IDLE related issues, part. 1
|
|
Commit Message
Frederic Weisbecker
Oct. 19, 2023, 11:35 p.m. UTC
Now that rcutiny can deal with early boot PF_IDLE setting, revert
commit cff9b2332ab762b7e0586c793c431a8f2ea4db04.
This fixes several subtle issues introduced on RCU-tasks(-trace):
1) RCU-tasks stalls when:
1.1 Grace period is started before init/0 had a chance to set PF_IDLE,
keeping it stuck in the holdout list until idle ever schedules.
1.2 Grace period is started when some possible CPUs have never been
online, keeping their idle tasks stuck in the holdout list until
the CPU ever boots up.
1.3 Similar to 1.1 but with secondary CPUs: Grace period is started
concurrently with secondary CPU booting, putting its idle task in
the holdout list because PF_IDLE isn't yet observed on it. It
stays then stuck in the holdout list until that CPU ever
schedules. The effect is mitigated here by all the smpboot
kthreads and the hotplug AP thread that must run to bring the
CPU up.
2) Spurious warning on RCU task trace that assumes offline CPU's idle
task is always PF_IDLE.
More issues have been found in RCU-tasks related to PF_IDLE which should
be fixed with later changes as those are not regressions:
3) The RCU-Tasks semantics consider the idle loop as a quiescent state,
however:
3.1 The boot code preceding the idle entry is included in this
quiescent state. Especially after the completion of kthreadd_done
after which init/1 can launch userspace concurrently. The window
is tiny before PF_IDLE is set but it exists.
3.2 Similarly, the boot code preceding the idle entry on secondary
CPUs is wrongly accounted as RCU tasks quiescent state.
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/sched/core.c | 2 +-
kernel/sched/idle.c | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
Comments
On Fri, Oct 20, 2023 at 01:35:43AM +0200, Frederic Weisbecker wrote: > Now that rcutiny can deal with early boot PF_IDLE setting, revert > commit cff9b2332ab762b7e0586c793c431a8f2ea4db04. > > This fixes several subtle issues introduced on RCU-tasks(-trace): > > 1) RCU-tasks stalls when: > > 1.1 Grace period is started before init/0 had a chance to set PF_IDLE, > keeping it stuck in the holdout list until idle ever schedules. > > 1.2 Grace period is started when some possible CPUs have never been > online, keeping their idle tasks stuck in the holdout list until > the CPU ever boots up. > > 1.3 Similar to 1.1 but with secondary CPUs: Grace period is started > concurrently with secondary CPU booting, putting its idle task in > the holdout list because PF_IDLE isn't yet observed on it. It > stays then stuck in the holdout list until that CPU ever > schedules. The effect is mitigated here by all the smpboot > kthreads and the hotplug AP thread that must run to bring the > CPU up. > > 2) Spurious warning on RCU task trace that assumes offline CPU's idle > task is always PF_IDLE. > > More issues have been found in RCU-tasks related to PF_IDLE which should > be fixed with later changes as those are not regressions: > > 3) The RCU-Tasks semantics consider the idle loop as a quiescent state, > however: > > 3.1 The boot code preceding the idle entry is included in this > quiescent state. Especially after the completion of kthreadd_done > after which init/1 can launch userspace concurrently. The window > is tiny before PF_IDLE is set but it exists. > > 3.2 Similarly, the boot code preceding the idle entry on secondary > CPUs is wrongly accounted as RCU tasks quiescent state. > Urgh... so the plan is to fix RCU-tasks for all of the above to not rely on PF_IDLE ? Because I rather like the more strict PF_IDLE and subsequently don't much like this revert.
On Fri, Oct 20, 2023 at 10:25:31AM +0200, Peter Zijlstra wrote: > On Fri, Oct 20, 2023 at 01:35:43AM +0200, Frederic Weisbecker wrote: > > Now that rcutiny can deal with early boot PF_IDLE setting, revert > > commit cff9b2332ab762b7e0586c793c431a8f2ea4db04. > > > > This fixes several subtle issues introduced on RCU-tasks(-trace): > > > > 1) RCU-tasks stalls when: > > > > 1.1 Grace period is started before init/0 had a chance to set PF_IDLE, > > keeping it stuck in the holdout list until idle ever schedules. > > > > 1.2 Grace period is started when some possible CPUs have never been > > online, keeping their idle tasks stuck in the holdout list until > > the CPU ever boots up. > > > > 1.3 Similar to 1.1 but with secondary CPUs: Grace period is started > > concurrently with secondary CPU booting, putting its idle task in > > the holdout list because PF_IDLE isn't yet observed on it. It > > stays then stuck in the holdout list until that CPU ever > > schedules. The effect is mitigated here by all the smpboot > > kthreads and the hotplug AP thread that must run to bring the > > CPU up. > > > > 2) Spurious warning on RCU task trace that assumes offline CPU's idle > > task is always PF_IDLE. > > > > More issues have been found in RCU-tasks related to PF_IDLE which should > > be fixed with later changes as those are not regressions: > > > > 3) The RCU-Tasks semantics consider the idle loop as a quiescent state, > > however: > > > > 3.1 The boot code preceding the idle entry is included in this > > quiescent state. Especially after the completion of kthreadd_done > > after which init/1 can launch userspace concurrently. The window > > is tiny before PF_IDLE is set but it exists. > > > > 3.2 Similarly, the boot code preceding the idle entry on secondary > > CPUs is wrongly accounted as RCU tasks quiescent state. > > > > Urgh... so the plan is to fix RCU-tasks for all of the above to not rely > on PF_IDLE ? Because I rather like the more strict PF_IDLE and > subsequently don't much like this revert. Yeah I can't say I really like the old coverage of PF_IDLE either. The new one (after Liam's patch) is only halfway better defined though: it makes the boot CPU's idle behave quite well: PF_IDLE is set on idle entry. And secondary CPU's idle behave quite well also except when they go offline and then online again. And then the secondary boot code becomes PF_IDLE. We probably need something like this: diff --git a/kernel/cpu.c b/kernel/cpu.c index 3b9d5c7eb4a2..b24d7937b989 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1394,7 +1394,9 @@ void cpuhp_report_idle_dead(void) { struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); + current->flags &= ~PF_IDLE; BUG_ON(st->state != CPUHP_AP_OFFLINE); + rcutree_report_cpu_dead(); st->state = CPUHP_AP_IDLE_DEAD; /* @@ -1642,6 +1644,8 @@ void cpuhp_online_idle(enum cpuhp_state state) { struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); + current->flags |= PF_IDLE; + /* Happens for the boot cpu */ if (state != CPUHP_AP_ONLINE_IDLE) return; diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 5007b25c5bc6..342f58a329f5 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -373,7 +373,6 @@ EXPORT_SYMBOL_GPL(play_idle_precise); void cpu_startup_entry(enum cpuhp_state state) { - current->flags |= PF_IDLE; arch_cpu_idle_prepare(); cpuhp_online_idle(state); while (1) And then RCU-tasks can better rely on it as it really draws correctly the idle coverage. As for working on top of that, we have thought about solutions, lemme try to make them proper. Thanks.
On Fri, Oct 20, 2023 at 02:31:13PM +0200, Frederic Weisbecker wrote: > Yeah I can't say I really like the old coverage of PF_IDLE either. The new one > (after Liam's patch) is only halfway better defined though: it makes the boot > CPU's idle behave quite well: PF_IDLE is set on idle entry. And secondary > CPU's idle behave quite well also except when they go offline and then online > again. And then the secondary boot code becomes PF_IDLE. Bah offline, yeah, we should just not do that :-) > We probably need something like this: > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 3b9d5c7eb4a2..b24d7937b989 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -1394,7 +1394,9 @@ void cpuhp_report_idle_dead(void) > { > struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); > > + current->flags &= ~PF_IDLE; > BUG_ON(st->state != CPUHP_AP_OFFLINE); > + > rcutree_report_cpu_dead(); > st->state = CPUHP_AP_IDLE_DEAD; > /* > @@ -1642,6 +1644,8 @@ void cpuhp_online_idle(enum cpuhp_state state) > { > struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); > > + current->flags |= PF_IDLE; > + > /* Happens for the boot cpu */ > if (state != CPUHP_AP_ONLINE_IDLE) > return; Yeah that works I suppose.
On Fri, Oct 20, 2023 at 03:48:36PM +0200, Peter Zijlstra wrote: > On Fri, Oct 20, 2023 at 02:31:13PM +0200, Frederic Weisbecker wrote: > > > Yeah I can't say I really like the old coverage of PF_IDLE either. The new one > > (after Liam's patch) is only halfway better defined though: it makes the boot > > CPU's idle behave quite well: PF_IDLE is set on idle entry. And secondary > > CPU's idle behave quite well also except when they go offline and then online > > again. And then the secondary boot code becomes PF_IDLE. > > Bah offline, yeah, we should just not do that :-) > > > We probably need something like this: > > > > diff --git a/kernel/cpu.c b/kernel/cpu.c > > index 3b9d5c7eb4a2..b24d7937b989 100644 > > --- a/kernel/cpu.c > > +++ b/kernel/cpu.c > > @@ -1394,7 +1394,9 @@ void cpuhp_report_idle_dead(void) > > { > > struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); > > > > + current->flags &= ~PF_IDLE; > > BUG_ON(st->state != CPUHP_AP_OFFLINE); > > + > > rcutree_report_cpu_dead(); > > st->state = CPUHP_AP_IDLE_DEAD; > > /* > > @@ -1642,6 +1644,8 @@ void cpuhp_online_idle(enum cpuhp_state state) > > { > > struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); > > > > + current->flags |= PF_IDLE; > > + > > /* Happens for the boot cpu */ > > if (state != CPUHP_AP_ONLINE_IDLE) > > return; > > Yeah that works I suppose. Booting up kernels being what it is, there might not be a completely pretty solution. Not that I would say "no" to such a solution should it appear, mind you! ;-) Thanx, Paul
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ad960f97e4e1..b02dcbe98024 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu) * PF_KTHREAD should already be set at this point; regardless, make it * look like a proper per-CPU kthread. */ - idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY; + idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY; kthread_set_per_cpu(idle, cpu); #ifdef CONFIG_SMP diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 5007b25c5bc6..342f58a329f5 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -373,7 +373,6 @@ EXPORT_SYMBOL_GPL(play_idle_precise); void cpu_startup_entry(enum cpuhp_state state) { - current->flags |= PF_IDLE; arch_cpu_idle_prepare(); cpuhp_online_idle(state); while (1)