Message ID | 20231024214625.6483-3-frederic@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce89:0:b0:403:3b70:6f57 with SMTP id p9csp2220014vqx; Tue, 24 Oct 2023 14:46:54 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEv5NUWEbp9Eeios7aBK48R+NEG1rX5uHsX9Zry+kG089T7jNeI5IHKNkIJaiZ1U5Asm8bw X-Received: by 2002:a05:6a20:144c:b0:13d:5b8e:db83 with SMTP id a12-20020a056a20144c00b0013d5b8edb83mr4231976pzi.9.1698184014415; Tue, 24 Oct 2023 14:46:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698184014; cv=none; d=google.com; s=arc-20160816; b=e0G/r5Yj9jaJTYTlI3/m1hW5AUz49gaKuxJwWhhjnVasmWPEQOa+GHqMToc2SO9Rbv 8ScGzuKCuGF0IumV9jKHOfgR8WBwPXLT2iNNcB3FCYiu/mO0i+U32vAknDrJh4oQLcT8 1ErUiA1O2+XqhOV0n46XWNwGJlnV8SB8u2nX9ybHhU4NeZX4hHLbqiXdXXpIKObXJLPo XhkyALS4PyaRwRnKHQyaPBpogfu8QYP/PktJZkBWZb9NICB9jHQgEeLMetb/xliMXd80 ySv2Kbdn0r7mFe9sr/G1HQ+nEArY7h4xR6RBGirZIHtLax54Rk0lLhiclOlsXMQ0IY6A CNCQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=v+7GjAyILigS8hAAFW+aJ3436C0aXKncZVkr0C9NJz4=; fh=vRLsh+mlJ/3erNsj9qJLuZWu4oX6irbICpSWFnY9W1o=; b=0ycJQHfueN3OnHgmy0j4ruzThDLWTQDLDbbmEfzELqRVsevbumKjstIvw+2UDweumn Fh2YCacP7Vvi9Gbnhs3SVEyDa+UJHcPiAl/PE1AM2dZXe8TPzZtpK9tzwZhoKb7wV5yQ PktG9Zp3He8V2FAOGV1+Wk/Mll0QCm0cQTeAIslBbm/BJ9bW2lcveHn/GHno+cWDC7JR zEIKcCFFUIv5weul9qdUXmtDZ2lcr7zL8fPhetYbTmkcpElYNUHk90yGD5NRzNXRuzbs GNHdn8HjfQ6/BKelkjnDeFWGBLqNMgupYcD8blMftYRmsSYYftTittc0VDsxk4gh4qno 3wKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="au3/Ymnr"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id j12-20020a056a00174c00b006be55174f3fsi9394762pfc.28.2023.10.24.14.46.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Oct 2023 14:46:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="au3/Ymnr"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 5F27D802767B; Tue, 24 Oct 2023 14:46:52 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234979AbjJXVqt (ORCPT <rfc822;aposhian.dev@gmail.com> + 27 others); Tue, 24 Oct 2023 17:46:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234973AbjJXVqq (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 24 Oct 2023 17:46:46 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A75410D7; Tue, 24 Oct 2023 14:46:42 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2F750C433C9; Tue, 24 Oct 2023 21:46:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1698184001; bh=VF4rwPu4+YtRb7YG4mHtznxP8H/nXqrrus0i9svAMXk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=au3/YmnrmAJQJG631/cxb4ORAO9pRQTWXnK7H6KnQmVugQQJmTa3DGjZcpQw9ePHJ VdX4Ifg7c8hlUO3kYIPVedfUuNpoouQDRjZIwigFJ5/PmE7rSzXFb9v8iDKOJcliSR HgWUCmh1iEQyjGwkEZxVBM6mSAlAQmfGiMlCD5A/OtOAJPxJyw+h+ApaZRgo0tcomy WmAMclPVf8javsIc5NZGs+kYXXeO8I0+YOzKenD/QkuKkpVtgeK7ciGyjzLbweSBWv 4lZLEwd3THfxpNQ7GMPc5W8yr/npFOeo0iM66/n73ymQi9DO+Zs85gzU/c+TmxleVd 5PiXlKNNqLlqw== From: Frederic Weisbecker <frederic@kernel.org> To: LKML <linux-kernel@vger.kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>, Boqun Feng <boqun.feng@gmail.com>, Joel Fernandes <joel@joelfernandes.org>, Josh Triplett <josh@joshtriplett.org>, Lai Jiangshan <jiangshanlai@gmail.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Neeraj Upadhyay <neeraj.upadhyay@amd.com>, "Paul E . McKenney" <paulmck@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Uladzislau Rezki <urezki@gmail.com>, Zqiang <qiang.zhang1211@gmail.com>, rcu <rcu@vger.kernel.org>, "Liam R . Howlett" <Liam.Howlett@oracle.com>, Peter Zijlstra <peterz@infradead.org> Subject: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics Date: Tue, 24 Oct 2023 23:46:23 +0200 Message-ID: <20231024214625.6483-3-frederic@kernel.org> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231024214625.6483-1-frederic@kernel.org> References: <20231024214625.6483-1-frederic@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 24 Oct 2023 14:46:52 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780675001053828841 X-GMAIL-MSGID: 1780675001053828841 |
Series |
rcu: Fix PF_IDLE related issues v2
|
|
Commit Message
Frederic Weisbecker
Oct. 24, 2023, 9:46 p.m. UTC
The commit:
cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
has changed the semantics of what is to be considered an idle task in
such a way that CPU boot code preceding the actual idle loop is excluded
from it.
This has however introduced new potential RCU-tasks stalls when either:
1) Grace period is started before init/0 had a chance to set PF_IDLE,
keeping it stuck in the holdout list until idle ever schedules.
2) Grace period is started when some possible CPUs have never been
online, keeping their idle tasks stuck in the holdout list until the
CPU ever boots up.
3) Similar to 1) but with secondary CPUs: Grace period is started
concurrently with secondary CPU booting, putting its idle task in
the holdout list because PF_IDLE isn't yet observed on it. It stays
then stuck in the holdout list until that CPU ever schedules. The
effect is mitigated here by the hotplug AP thread that must run to
bring the CPU up.
Fix this with handling the new semantics of PF_IDLE, keeping in mind
that it may or may not be set on an idle task. Take advantage of that to
strengthen the coverage of an RCU-tasks quiescent state within an idle
task, excluding the CPU boot code from it. Only the code running within
the idle loop is now a quiescent state, along with offline CPUs.
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Suggested-by: Paul E . McKenney" <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tasks.h | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
Comments
On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote: > +/* Check for quiescent states since the pregp's synchronize_rcu() */ > +static bool rcu_tasks_is_holdout(struct task_struct *t) > +{ > + int cpu; > + > + /* Has the task been seen voluntarily sleeping? */ > + if (!READ_ONCE(t->on_rq)) > + return false; > + > + cpu = task_cpu(t); > + > + /* > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks > + * quiescent states. But CPU boot code performed by the idle task > + * isn't a quiescent state. > + */ > + if (t == idle_task(cpu)) { > + if (is_idle_task(t)) > + return false; > + > + if (!rcu_cpu_online(cpu)) > + return false; > + } Hmm, why is this guarded by t == idle_task() ? Notably, there is the idle-injection thing that uses FIFO tasks to run 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on tasks that are not idle_task(). > + > + return true; > +} > + > /* Per-task initial processing. */ > static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop) > { > - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) { > + if (t != current && rcu_tasks_is_holdout(t)) {
Le Wed, Oct 25, 2023 at 10:40:08AM +0200, Peter Zijlstra a écrit : > On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote: > > > +/* Check for quiescent states since the pregp's synchronize_rcu() */ > > +static bool rcu_tasks_is_holdout(struct task_struct *t) > > +{ > > + int cpu; > > + > > + /* Has the task been seen voluntarily sleeping? */ > > + if (!READ_ONCE(t->on_rq)) > > + return false; > > + > > + cpu = task_cpu(t); > > + > > + /* > > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks > > + * quiescent states. But CPU boot code performed by the idle task > > + * isn't a quiescent state. > > + */ > > + if (t == idle_task(cpu)) { > > + if (is_idle_task(t)) > > + return false; > > + > > + if (!rcu_cpu_online(cpu)) > > + return false; > > + } > > Hmm, why is this guarded by t == idle_task() ? > > Notably, there is the idle-injection thing that uses FIFO tasks to run > 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on > tasks that are not idle_task(). Ah good point. So indeed the is_idle_task() test doesn't musn't be guarded by t == idle_task(cpu). But rcu_cpu_online() has to, otherwise if it's not an idle task, there is a risk that the task gets migrated out by the time we observe the old CPU offline. Thanks. > > > + > > + return true; > > +} > > + > > /* Per-task initial processing. */ > > static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop) > > { > > - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) { > > + if (t != current && rcu_tasks_is_holdout(t)) { > >
> > Le Wed, Oct 25, 2023 at 10:40:08AM +0200, Peter Zijlstra a écrit : > > On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote: > > > > > +/* Check for quiescent states since the pregp's synchronize_rcu() */ > > > +static bool rcu_tasks_is_holdout(struct task_struct *t) > > > +{ > > > + int cpu; > > > + > > > + /* Has the task been seen voluntarily sleeping? */ > > > + if (!READ_ONCE(t->on_rq)) > > > + return false; > > > + > > > + cpu = task_cpu(t); > > > + > > > + /* > > > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks > > > + * quiescent states. But CPU boot code performed by the idle task > > > + * isn't a quiescent state. > > > + */ > > > + if (t == idle_task(cpu)) { > > > + if (is_idle_task(t)) > > > + return false; > > > + > > > + if (!rcu_cpu_online(cpu)) > > > + return false; > > > + } > > > > Hmm, why is this guarded by t == idle_task() ? > > > > Notably, there is the idle-injection thing that uses FIFO tasks to run > > 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on > > tasks that are not idle_task(). > > Ah good point. So indeed the is_idle_task() test doesn't musn't be > guarded by t == idle_task(cpu). But rcu_cpu_online() has to, otherwise > if it's not an idle task, there is a risk that the task gets migrated out > by the time we observe the old CPU offline. > If a fifo-tasks use play_idle_precise() to run idle and invoke do_idle(), may cause rcu-tasks to falsely report a rcu-tasks QS , when rcu_is_cpu_rrupt_from_idle() return true in rcu_sched_clock_irq(), so should we also add a check for "current == idle_task(task_cpu(current))" in the rcu_is_cpu_rrupt_from_idle() ? Thanks Zqiang > Thanks. > > > > > > + > > > + return true; > > > +} > > > + > > > /* Per-task initial processing. */ > > > static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop) > > > { > > > - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) { > > > + if (t != current && rcu_tasks_is_holdout(t)) { > > > >
On Thu, Oct 26, 2023 at 08:15:33PM +0800, Z qiang wrote: > > > > Le Wed, Oct 25, 2023 at 10:40:08AM +0200, Peter Zijlstra a écrit : > > > On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote: > > > > > > > +/* Check for quiescent states since the pregp's synchronize_rcu() */ > > > > +static bool rcu_tasks_is_holdout(struct task_struct *t) > > > > +{ > > > > + int cpu; > > > > + > > > > + /* Has the task been seen voluntarily sleeping? */ > > > > + if (!READ_ONCE(t->on_rq)) > > > > + return false; > > > > + > > > > + cpu = task_cpu(t); > > > > + > > > > + /* > > > > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks > > > > + * quiescent states. But CPU boot code performed by the idle task > > > > + * isn't a quiescent state. > > > > + */ > > > > + if (t == idle_task(cpu)) { > > > > + if (is_idle_task(t)) > > > > + return false; > > > > + > > > > + if (!rcu_cpu_online(cpu)) > > > > + return false; > > > > + } > > > > > > Hmm, why is this guarded by t == idle_task() ? > > > > > > Notably, there is the idle-injection thing that uses FIFO tasks to run > > > 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on > > > tasks that are not idle_task(). > > > > Ah good point. So indeed the is_idle_task() test doesn't musn't be > > guarded by t == idle_task(cpu). But rcu_cpu_online() has to, otherwise > > if it's not an idle task, there is a risk that the task gets migrated out > > by the time we observe the old CPU offline. > > > > If a fifo-tasks use play_idle_precise() to run idle and invoke > do_idle(), may cause > rcu-tasks to falsely report a rcu-tasks QS Well, there can be a debate here: should we consider an idle injector as a real task that we must wait for a voluntary schedule or should we treat it just like an idle task? Having that whole idle task quiescent state in RCU-tasks is quite a strange semantic anyway. And in the long run, the purpose is to unify RCU-tasks and RCU-tasks-RUDE with relying on ct_dynticks for idle quiescent states. > , when rcu_is_cpu_rrupt_from_idle() > return true in rcu_sched_clock_irq(), so should we also add a check for > "current == idle_task(task_cpu(current))" in the rcu_is_cpu_rrupt_from_idle() > ? That looks fine OTOH. Whether idle injection or real idle, rcu_is_cpu_rrupt_from_idle() is always a quiescent state in real RCU. Because we know we have no RCU reader between ct_idle_enter() and ct_idle_exit(). Thanks.
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index bf5f178fe723..acf81efe5eff 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -895,10 +895,37 @@ static void rcu_tasks_pregp_step(struct list_head *hop) synchronize_rcu(); } +/* Check for quiescent states since the pregp's synchronize_rcu() */ +static bool rcu_tasks_is_holdout(struct task_struct *t) +{ + int cpu; + + /* Has the task been seen voluntarily sleeping? */ + if (!READ_ONCE(t->on_rq)) + return false; + + cpu = task_cpu(t); + + /* + * Idle tasks within the idle loop or offline CPUs are RCU-tasks + * quiescent states. But CPU boot code performed by the idle task + * isn't a quiescent state. + */ + if (t == idle_task(cpu)) { + if (is_idle_task(t)) + return false; + + if (!rcu_cpu_online(cpu)) + return false; + } + + return true; +} + /* Per-task initial processing. */ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop) { - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) { + if (t != current && rcu_tasks_is_holdout(t)) { get_task_struct(t); t->rcu_tasks_nvcsw = READ_ONCE(t->nvcsw); WRITE_ONCE(t->rcu_tasks_holdout, true); @@ -947,7 +974,7 @@ static void check_holdout_task(struct task_struct *t, if (!READ_ONCE(t->rcu_tasks_holdout) || t->rcu_tasks_nvcsw != READ_ONCE(t->nvcsw) || - !READ_ONCE(t->on_rq) || + !rcu_tasks_is_holdout(t) || (IS_ENABLED(CONFIG_NO_HZ_FULL) && !is_idle_task(t) && READ_ONCE(t->rcu_tasks_idle_cpu) >= 0)) { WRITE_ONCE(t->rcu_tasks_holdout, false);