Message ID | 20230204022051.2737724-1-joel@joelfernandes.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp1163036wrn; Fri, 3 Feb 2023 18:22:43 -0800 (PST) X-Google-Smtp-Source: AK7set9QPGWMY8guhWXk7QFfGqxjOS8ARcN4wlpW6nYpp9USJkMkiw3lSgDRzhI/losNTX9CP8GB X-Received: by 2002:a17:906:3690:b0:88c:617f:bcf8 with SMTP id a16-20020a170906369000b0088c617fbcf8mr11599823ejc.61.1675477363147; Fri, 03 Feb 2023 18:22:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675477363; cv=none; d=google.com; s=arc-20160816; b=oFD7VlTcvyigWMw4UggPfw0UtyhKADvqfDT4p9+mgCIKLbOUf7jAJfaNR5V1W3/n8M IQ5qOKkga1AlHdQpCId7IbO6Bdge2U0jMQ1+u+4uAqioMDUjXaaLi14ecf7IcbXf+9Zr 62pUy9yKm1CzjL/cn3i8lyeFnWlkuwhataCqWFOoDJNWE952Obb8NcITImd1WYMYXglk X72+ZPpkit0P8/qGnINstXs86b8cYV5qupcx2VLuxdQj01w/1osg059Me89ySZEDdeh6 BWDfPfSZJ8Mcg4/0fSRufKc9QIyd6iZbA9Ygnx3RZ0D8yhea8JP+vmn05pQj93QG1f3H v55w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=O+Rvn1JkfUTqQUprG3LidZWtpE2Owjne4mF5ZG2WHZQ=; b=IZOY2xGptgEHyD6qM5Xqh8cTf6iR4bmYKhppm+lXkCMwKqz6GkYacyV1WjK0HeP4VG nJhuIiVbntyEWytsl5hBgnTmLHDu72Y2iJmN003Bj3OJaGCdqWbbMCa70fD/aPGYa8yy Z28I2BjenHovqGFaFr1bM9MfzJZC42WOaXK+32LrbfWOI9Sxm+pJpiU/7J822bc24S+c yfd05Jo9BsTsrHtqPOOIhhPmwXJa8M7YeBdblEuAFotAPjtbis0gRxtOqHsbAOtgs+2Q Ykef2INQFe4H6oykh5zFompKZQ4Sj052P4zmnE3U4NOFWaD/GqZt2esWyhNxYXLoIKGz Depw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=HxpEjUaI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id fh6-20020a1709073a8600b0088ce3b14b8dsi4485306ejc.98.2023.02.03.18.22.19; Fri, 03 Feb 2023 18:22:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=HxpEjUaI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232127AbjBDCVM (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others); Fri, 3 Feb 2023 21:21:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229766AbjBDCVK (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 3 Feb 2023 21:21:10 -0500 Received: from mail-qv1-xf2e.google.com (mail-qv1-xf2e.google.com [IPv6:2607:f8b0:4864:20::f2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 741971CAFA for <linux-kernel@vger.kernel.org>; Fri, 3 Feb 2023 18:21:08 -0800 (PST) Received: by mail-qv1-xf2e.google.com with SMTP id ll10so3987425qvb.6 for <linux-kernel@vger.kernel.org>; Fri, 03 Feb 2023 18:21:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=O+Rvn1JkfUTqQUprG3LidZWtpE2Owjne4mF5ZG2WHZQ=; b=HxpEjUaIgXZu4UXq0hMXQMd9KfFVD9LWbiToWocN96/h5IO9U7eTTnXtxRpzpUiggi 0Y8pChI5LH+FwY2ndyfBUlbsctjREJs9G3oOUUxMaENd1bxQoXEABtuMMIw2DpPf5eZY 2/P6jvWZpgUjLx6SJNWfl/FB/6Y6gQM7F6zns= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=O+Rvn1JkfUTqQUprG3LidZWtpE2Owjne4mF5ZG2WHZQ=; b=gGvoVwHPQSm7cIQoP0AzMaROz8Ov7xArco5o74kynM87B1dB4otmIlsLItvTF2sS3K EBQ3LZPWxGhtwatq8sZTC0i0o3cZTvELOkWyppeZ15bAriGCC+b6rdL4euJto4IzT3Zu KminWW09i1nk6IP47sLgq3U3iWbaTIc70EABN3LFnjBj9L1kV74P80UmsXa+/JeK6mAU Tv+c2Xaq+oG+roCELrM8pVgFitY6lPygVurL2/90CqImiw2/zHiFj1r8bCOd8IjgCa08 V3ZTQ0H9WXQbmYCfsHVHwfAibezFQtNVBcLh+LR1+0BYMDW6uHDgQKTSaCYl//bhd/Ai y06A== X-Gm-Message-State: AO0yUKUd7Whe2uPiR3TM8kGyD11gJ+Js565CLb/OSo8NJRIp6tAfxQTA A7t46/1VQg491sIQPDbV98xSPehWBYWsNZat X-Received: by 2002:a05:6214:300c:b0:56a:d22b:5821 with SMTP id ke12-20020a056214300c00b0056ad22b5821mr4739456qvb.22.1675477266824; Fri, 03 Feb 2023 18:21:06 -0800 (PST) Received: from joelboxx.c.googlers.com.com (129.239.188.35.bc.googleusercontent.com. [35.188.239.129]) by smtp.gmail.com with ESMTPSA id n2-20020a05620a294200b006f9ddaaf01esm3056951qkp.102.2023.02.03.18.21.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 Feb 2023 18:21:05 -0800 (PST) From: "Joel Fernandes (Google)" <joel@joelfernandes.org> To: linux-kernel@vger.kernel.org Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Qiang Zhang <Qiang1.zhang@intel.com>, Frederic Weisbecker <frederic@kernel.org>, Josh Triplett <josh@joshtriplett.org>, Lai Jiangshan <jiangshanlai@gmail.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, "Paul E. McKenney" <paulmck@kernel.org>, rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>, Boqun Feng <boqun.feng@gmail.com> Subject: [PATCH] rcu/tree: Improve comments in rcu_report_qs_rdp() Date: Sat, 4 Feb 2023 02:20:50 +0000 Message-Id: <20230204022051.2737724-1-joel@joelfernandes.org> X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,URIBL_BLACK autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756865351506782367?= X-GMAIL-MSGID: =?utf-8?q?1756865351506782367?= |
Series |
rcu/tree: Improve comments in rcu_report_qs_rdp()
|
|
Commit Message
Joel Fernandes
Feb. 4, 2023, 2:20 a.m. UTC
Recent discussion triggered due to a patch linked below, from Qiang,
shed light on the need to accelerate from QS reporting paths.
Update the comments to capture this piece of knowledge.
Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
Cc: Qiang Zhang <Qiang1.zhang@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/rcu/tree.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
Comments
>Recent discussion triggered due to a patch linked below, from Qiang, >shed light on the need to accelerate from QS reporting paths. > >Update the comments to capture this piece of knowledge. > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ >Cc: Qiang Zhang <Qiang1.zhang@intel.com> >Cc: Frederic Weisbecker <frederic@kernel.org> >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >--- > kernel/rcu/tree.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >index 93eb03f8ed99..713eb6ca6902 100644 >--- a/kernel/rcu/tree.c >+++ b/kernel/rcu/tree.c >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > } else { > /* > * This GP can't end until cpu checks in, so all of our >- * callbacks can be processed during the next GP. >+ * callbacks can be processed during the next GP. Do >+ * the acceleration from here otherwise there may be extra >+ * grace period delays, as any accelerations from rcu_core() Does the extra grace period delays means that if not accelerate callback, the grace period will take more time to end ? or refers to a delay in the start time of a new grace period? Thanks Zqiang >+ * or note_gp_changes() may happen only after the GP after the >+ * current one has already started. Further, rcu_core() >+ * only accelerates if RCU is idle (no GP in progress). > * > * NOCB kthreads have their own way to deal with that... > */ >@@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > /* > * ...but NOCB kthreads may miss or delay callbacks acceleration > * if in the middle of a (de-)offloading process. >+ * >+ * Such missed acceleration may cause the callbacks to >+ * be stranded until RCU is fully de-offloaded, as >+ * acceleration from rcu_core() and note_gp_changes() >+ * cannot happen for fully/partially offloaded mode due >+ * to ordering dependency between rnp lock and nocb_lock. > */ > needacc = true; > } >-- >2.39.1.519.gcb327c4b5f-goog >
On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > > >Recent discussion triggered due to a patch linked below, from Qiang, > >shed light on the need to accelerate from QS reporting paths. > > > >Update the comments to capture this piece of knowledge. > > > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > >Cc: Qiang Zhang <Qiang1.zhang@intel.com> > >Cc: Frederic Weisbecker <frederic@kernel.org> > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > >--- > > kernel/rcu/tree.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > >index 93eb03f8ed99..713eb6ca6902 100644 > >--- a/kernel/rcu/tree.c > >+++ b/kernel/rcu/tree.c > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > } else { > > /* > > * This GP can't end until cpu checks in, so all of our > >- * callbacks can be processed during the next GP. > >+ * callbacks can be processed during the next GP. Do > >+ * the acceleration from here otherwise there may be extra > >+ * grace period delays, as any accelerations from rcu_core() > > > Does the extra grace period delays means that if not accelerate callback, > the grace period will take more time to end ? or refers to a delay in the > start time of a new grace period? Yes, so IMO it is like this if we don't accelerate: 1. Start GP 1 2. CPU1 queues callback C1 (not accelerated yet) 3. CPU1 reports QS for GP1 (not accelerating anything). 4. GP1 ends 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB will execute after GP3 (or alternately, rcu_core() on CPU1 does accelerate). 6. GP2 ends. 7. GP3 starts. 8. GP3 ends. 9. CB is invoked Instead, what we will get the following thanks to the acceleration here is: 1. Start GP 1 2. CPU1 queues callback C1 (not accelerated yet) 3. CPU1 reports QS for GP1 and acceleration happens as done by the code this patch adds comments for. 4. GP1 ends 5. CPU1's note_gp_changes() is called 6. GP2 ends. 7. CB is invoked Does that make sense or is there a subtlety I missed? Thanks, - Joel > > Thanks > Zqiang > > >+ * or note_gp_changes() may happen only after the GP after the > >+ * current one has already started. Further, rcu_core() > >+ * only accelerates if RCU is idle (no GP in progress). > > * > > * NOCB kthreads have their own way to deal with that... > > */ > >@@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > /* > > * ...but NOCB kthreads may miss or delay callbacks acceleration > > * if in the middle of a (de-)offloading process. > >+ * > >+ * Such missed acceleration may cause the callbacks to > >+ * be stranded until RCU is fully de-offloaded, as > >+ * acceleration from rcu_core() and note_gp_changes() > >+ * cannot happen for fully/partially offloaded mode due > >+ * to ordering dependency between rnp lock and nocb_lock. > > */ > > needacc = true; > > } > >-- > >2.39.1.519.gcb327c4b5f-goog > >
On Mon, Feb 6, 2023 at 12:19 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > > > > > >Recent discussion triggered due to a patch linked below, from Qiang, > > >shed light on the need to accelerate from QS reporting paths. > > > > > >Update the comments to capture this piece of knowledge. > > > > > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > > >Cc: Qiang Zhang <Qiang1.zhang@intel.com> > > >Cc: Frederic Weisbecker <frederic@kernel.org> > > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > >--- > > > kernel/rcu/tree.c | 13 ++++++++++++- > > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > >index 93eb03f8ed99..713eb6ca6902 100644 > > >--- a/kernel/rcu/tree.c > > >+++ b/kernel/rcu/tree.c > > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > > } else { > > > /* > > > * This GP can't end until cpu checks in, so all of our > > >- * callbacks can be processed during the next GP. > > >+ * callbacks can be processed during the next GP. Do > > >+ * the acceleration from here otherwise there may be extra > > >+ * grace period delays, as any accelerations from rcu_core() > > > > > > Does the extra grace period delays means that if not accelerate callback, > > the grace period will take more time to end ? or refers to a delay in the > > start time of a new grace period? > > Yes, so IMO it is like this if we don't accelerate: > 1. Start GP 1 > 2. CPU1 queues callback C1 (not accelerated yet) > 3. CPU1 reports QS for GP1 (not accelerating anything). > 4. GP1 ends > 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB > will execute after GP3 (or alternately, rcu_core() on CPU1 does > accelerate). > 6. GP2 ends. > 7. GP3 starts. > 8. GP3 ends. > 9. CB is invoked > > Instead, what we will get the following thanks to the acceleration here is: > 1. Start GP 1 > 2. CPU1 queues callback C1 (not accelerated yet) > 3. CPU1 reports QS for GP1 and acceleration happens as done by the > code this patch adds comments for. > 4. GP1 ends > 5. CPU1's note_gp_changes() is called > 6. GP2 ends. > 7. CB is invoked Sorry I missed some steps, here is the update: 1. Start GP 1 2. CPU1 queues callback C1 (not accelerated yet) 3. CPU1 reports QS for GP1 (not accelerating anything). 4. GP1 ends 5. GP2 starts for some other reason from some other CPU. 6. CPU1's note_gp_changes() is called, acceleration happens, now the CB will execute after GP3. 7. GP2 ends. 8. GP3 starts. 9. GP3 ends. 10. CB is invoked Instead, what we will get the following thanks to the acceleration here is: 1. Start GP 1 2. CPU1 queues callback C1 (not accelerated yet) 3. CPU1 reports QS for GP1 and acceleration happens as done by the code this patch adds comments for. 4. GP1 ends 5. GP2 starts 6. GP2 ends. 7. CB is invoked Does that make sense or is there a subtlety I missed? Thanks, - Joel
On Mon, Feb 6, 2023 at 12:19 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > > > > > >Recent discussion triggered due to a patch linked below, from Qiang, > > >shed light on the need to accelerate from QS reporting paths. > > > > > >Update the comments to capture this piece of knowledge. > > > > > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > > >Cc: Qiang Zhang <Qiang1.zhang@intel.com> > > >Cc: Frederic Weisbecker <frederic@kernel.org> > > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > >--- > > > kernel/rcu/tree.c | 13 ++++++++++++- > > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > >index 93eb03f8ed99..713eb6ca6902 100644 > > >--- a/kernel/rcu/tree.c > > >+++ b/kernel/rcu/tree.c > > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > > } else { > > > /* > > > * This GP can't end until cpu checks in, so all of our > > >- * callbacks can be processed during the next GP. > > >+ * callbacks can be processed during the next GP. Do > > >+ * the acceleration from here otherwise there may be extra > > >+ * grace period delays, as any accelerations from rcu_core() > > > > > > Does the extra grace period delays means that if not accelerate callback, > > the grace period will take more time to end ? or refers to a delay in the > > start time of a new grace period? > > Yes, so IMO it is like this if we don't accelerate: > 1. Start GP 1 > 2. CPU1 queues callback C1 (not accelerated yet) > 3. CPU1 reports QS for GP1 (not accelerating anything). > 4. GP1 ends > 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB > will execute after GP3 (or alternately, rcu_core() on CPU1 does > accelerate). > 6. GP2 ends. > 7. GP3 starts. > 8. GP3 ends. > 9. CB is invoked > > Instead, what we will get the following thanks to the acceleration here is: > 1. Start GP 1 > 2. CPU1 queues callback C1 (not accelerated yet) > 3. CPU1 reports QS for GP1 and acceleration happens as done by the > code this patch adds comments for. > 4. GP1 ends > 5. CPU1's note_gp_changes() is called > 6. GP2 ends. > 7. CB is invoked > >Sorry I missed some steps, here is the update: >1. Start GP 1 >2. CPU1 queues callback C1 (not accelerated yet) >3. CPU1 reports QS for GP1 (not accelerating anything). >4. GP1 ends >5. GP2 starts for some other reason from some other CPU. >6. CPU1's note_gp_changes() is called, acceleration happens, now the CB >will execute after GP3. >7. GP2 ends. >8. GP3 starts. >9. GP3 ends. >10. CB is invoked > >Instead, what we will get the following thanks to the acceleration here is: >1. Start GP 1 >2. CPU1 queues callback C1 (not accelerated yet) >3. CPU1 reports QS for GP1 and acceleration happens as done by the >code this patch adds comments for. >4. GP1 ends >5. GP2 starts >6. GP2 ends. >7. CB is invoked > >Does that make sense or is there a subtlety I missed? Thanks for detailed description, that is to say, the grace period delays means that if there is no acceleration, the invocation of callback may be delayed by one or more grace periods. Can you re-describe the meaning of "grace period delays "in the comments? Thanks Zqiang > >Thanks, > > - Joel
On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote: > Recent discussion triggered due to a patch linked below, from Qiang, > shed light on the need to accelerate from QS reporting paths. > > Update the comments to capture this piece of knowledge. > > Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > Cc: Qiang Zhang <Qiang1.zhang@intel.com> > Cc: Frederic Weisbecker <frederic@kernel.org> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > --- > kernel/rcu/tree.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 93eb03f8ed99..713eb6ca6902 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > } else { > /* > * This GP can't end until cpu checks in, so all of our > - * callbacks can be processed during the next GP. > + * callbacks can be processed during the next GP. Do > + * the acceleration from here otherwise there may be extra > + * grace period delays, as any accelerations from rcu_core() > + * or note_gp_changes() may happen only after the GP after the > + * current one has already started. Further, rcu_core() > + * only accelerates if RCU is idle (no GP in progress). Actually note_gp_changes() should take care of that. My gut feeling is that the acceleration in rcu_report_qs_rdp() only stands for: * callbacks that may be enqueued from an IRQ firing during the small window between the RNP unlock in note_gp_changes() and the RNP lock in rcu_report_qs_rdp() * __note_gp_changes() got called even before from the GP kthread, and callbacks got enqueued between that and rcu_core(). Thanks.
On Tue, Feb 7, 2023 at 8:24 AM Frederic Weisbecker <frederic@kernel.org> wrote: > > On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote: > > Recent discussion triggered due to a patch linked below, from Qiang, > > shed light on the need to accelerate from QS reporting paths. > > > > Update the comments to capture this piece of knowledge. > > > > Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > > Cc: Qiang Zhang <Qiang1.zhang@intel.com> > > Cc: Frederic Weisbecker <frederic@kernel.org> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > --- > > kernel/rcu/tree.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 93eb03f8ed99..713eb6ca6902 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > } else { > > /* > > * This GP can't end until cpu checks in, so all of our > > - * callbacks can be processed during the next GP. > > + * callbacks can be processed during the next GP. Do > > + * the acceleration from here otherwise there may be extra > > + * grace period delays, as any accelerations from rcu_core() > > + * or note_gp_changes() may happen only after the GP after the > > + * current one has already started. Further, rcu_core() > > + * only accelerates if RCU is idle (no GP in progress). > > Actually note_gp_changes() should take care of that. You are referring to rcu_core() -> rcu_check_quiescent_state() -> note_gp_changes() doing the acceleration prior to the rcu_core() -> rcu_report_qs_rdp() call, correct? Ah, but note_gp_changes() has an early return which triggers if either: 1. The rnp spinlock trylock failed. 2. The start of a new grace period was already detected before, so rdp->gp_seq == rnp->gp_seq. So I think it is possible that we are in the middle of a GP, and rcu_core() is called because QS reporting is required for the CPU, and say the current GP started we are in the middle off occurs from the same CPU so rdp->gp_seq == rnp->gp_seq. Now, rcu_core()'s call to note_gp_changes() should return early but its later call to report_qs_rdp() will not accelerate the callback without the code we are commenting here. > My gut feeling is that the > acceleration in rcu_report_qs_rdp() only stands for: > > * callbacks that may be enqueued from an IRQ firing during the small window > between the RNP unlock in note_gp_changes() and the RNP lock in > rcu_report_qs_rdp() Sure, this also seems like a valid reason. > * __note_gp_changes() got called even before from the GP kthread, and callbacks > got enqueued between that and rcu_core(). Agreed. In this case we will take the early return in note_gp_changes() when called from the rcu_core(). So yeah, that was kind of my point as well but slightly different reasoning. Let me know if you disagree with anything I mentioned, though. - Joel
On Mon, Feb 6, 2023 at 8:15 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > On Mon, Feb 6, 2023 at 12:19 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > > > > > > > > >Recent discussion triggered due to a patch linked below, from Qiang, > > > >shed light on the need to accelerate from QS reporting paths. > > > > > > > >Update the comments to capture this piece of knowledge. > > > > > > > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > > > >Cc: Qiang Zhang <Qiang1.zhang@intel.com> > > > >Cc: Frederic Weisbecker <frederic@kernel.org> > > > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > >--- > > > > kernel/rcu/tree.c | 13 ++++++++++++- > > > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > > > > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > >index 93eb03f8ed99..713eb6ca6902 100644 > > > >--- a/kernel/rcu/tree.c > > > >+++ b/kernel/rcu/tree.c > > > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > > > } else { > > > > /* > > > > * This GP can't end until cpu checks in, so all of our > > > >- * callbacks can be processed during the next GP. > > > >+ * callbacks can be processed during the next GP. Do > > > >+ * the acceleration from here otherwise there may be extra > > > >+ * grace period delays, as any accelerations from rcu_core() > > > > > > > > > Does the extra grace period delays means that if not accelerate callback, > > > the grace period will take more time to end ? or refers to a delay in the > > > start time of a new grace period? > > > > Yes, so IMO it is like this if we don't accelerate: > > 1. Start GP 1 > > 2. CPU1 queues callback C1 (not accelerated yet) > > 3. CPU1 reports QS for GP1 (not accelerating anything). > > 4. GP1 ends > > 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB > > will execute after GP3 (or alternately, rcu_core() on CPU1 does > > accelerate). > > 6. GP2 ends. > > 7. GP3 starts. > > 8. GP3 ends. > > 9. CB is invoked > > > > Instead, what we will get the following thanks to the acceleration here is: > > 1. Start GP 1 > > 2. CPU1 queues callback C1 (not accelerated yet) > > 3. CPU1 reports QS for GP1 and acceleration happens as done by the > > code this patch adds comments for. > > 4. GP1 ends > > 5. CPU1's note_gp_changes() is called > > 6. GP2 ends. > > 7. CB is invoked > > > >Sorry I missed some steps, here is the update: > >1. Start GP 1 > >2. CPU1 queues callback C1 (not accelerated yet) > >3. CPU1 reports QS for GP1 (not accelerating anything). > >4. GP1 ends > >5. GP2 starts for some other reason from some other CPU. > >6. CPU1's note_gp_changes() is called, acceleration happens, now the CB > >will execute after GP3. > >7. GP2 ends. > >8. GP3 starts. > >9. GP3 ends. > >10. CB is invoked > > > >Instead, what we will get the following thanks to the acceleration here is: > >1. Start GP 1 > >2. CPU1 queues callback C1 (not accelerated yet) > >3. CPU1 reports QS for GP1 and acceleration happens as done by the > >code this patch adds comments for. > >4. GP1 ends > >5. GP2 starts > >6. GP2 ends. > >7. CB is invoked > > > >Does that make sense or is there a subtlety I missed? > > > > Thanks for detailed description, that is to say, the grace period delays means that > if there is no acceleration, the invocation of callback may be delayed by one or > more grace periods. > > Can you re-describe the meaning of "grace period delays "in the comments? Yes, good point. I should change it to "one or more delays". Thank you for the suggestion! Sorry for my late reply as I was OOO this week. - Joel > > Thanks > Zqiang > > > > > > >Thanks, > > > > - Joel
On Fri, Feb 3, 2023 at 9:21 PM Joel Fernandes (Google) <joel@joelfernandes.org> wrote: > > Recent discussion triggered due to a patch linked below, from Qiang, > shed light on the need to accelerate from QS reporting paths. > > Update the comments to capture this piece of knowledge. > > Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > Cc: Qiang Zhang <Qiang1.zhang@intel.com> > Cc: Frederic Weisbecker <frederic@kernel.org> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > --- > kernel/rcu/tree.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 93eb03f8ed99..713eb6ca6902 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > } else { > /* > * This GP can't end until cpu checks in, so all of our > - * callbacks can be processed during the next GP. > + * callbacks can be processed during the next GP. Do > + * the acceleration from here otherwise there may be extra > + * grace period delays, as any accelerations from rcu_core() > + * or note_gp_changes() may happen only after the GP after the > + * current one has already started. Further, rcu_core() > + * only accelerates if RCU is idle (no GP in progress). > * And I couldn't help but pass this through ChatGPT: Before: This GP can't end until cpu checks in, so all of our callbacks can be processed during the next GP. Do the acceleration from here otherwise there may be extra grace period delays, as any accelerations from rcu_core() or note_gp_changes() may happen only after the GP after the current one has already started. Further, rcu_core() only accelerates if RCU is idle (no GP in progress). NOCB kthreads have their own way to deal with that... After: In order to ensure all callbacks are processed efficiently, this grace period (GP) must not end until the CPU checks in. To avoid unnecessary grace period delays, it is important to initiate acceleration from the current location. Any acceleration from the functions "rcu_core()" or "note_gp_changes()" may only occur after the next GP has already started. It is important to note that "rcu_core()" will only initiate acceleration if the RCU system is idle (no GP in progress). NOCB kthreads, on the other hand, have their own unique method for handling this situation. It sounds well written, but maybe a bit factually incorrect.. trying again: To ensure all callbacks are processed in the next grace period, this GP must not end until the CPU has checked in. To avoid any additional grace period delays, it's important to perform the acceleration from here. If acceleration is performed from rcu_core() or note_gp_changes(), it may only occur after the next GP has already started. Additionally, it's important to note that rcu_core() will only accelerate if RCU is idle and no GP is in progress. The NOCB kthreads have a separate mechanism for dealing with this issue. Not bad, but the first sentence is still factually incorrect. Yes, I'm bored..why do you ask? ;-) I think except for the first sentence though, it made for a nice grammatical re-write (give or take Frederic and Qiang's comments ;-)). - Joel > * NOCB kthreads have their own way to deal with that... > */ > @@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > /* > * ...but NOCB kthreads may miss or delay callbacks acceleration > * if in the middle of a (de-)offloading process. > + * > + * Such missed acceleration may cause the callbacks to > + * be stranded until RCU is fully de-offloaded, as > + * acceleration from rcu_core() and note_gp_changes() > + * cannot happen for fully/partially offloaded mode due > + * to ordering dependency between rnp lock and nocb_lock. > */ > needacc = true; > } > -- > 2.39.1.519.gcb327c4b5f-goog >
> > On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote: > > Recent discussion triggered due to a patch linked below, from Qiang, > > shed light on the need to accelerate from QS reporting paths. > > > > Update the comments to capture this piece of knowledge. > > > > Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ > > Cc: Qiang Zhang <Qiang1.zhang@intel.com> > > Cc: Frederic Weisbecker <frederic@kernel.org> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > --- > > kernel/rcu/tree.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 93eb03f8ed99..713eb6ca6902 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > } else { > > /* > > * This GP can't end until cpu checks in, so all of our > > - * callbacks can be processed during the next GP. > > + * callbacks can be processed during the next GP. Do > > + * the acceleration from here otherwise there may be extra > > + * grace period delays, as any accelerations from rcu_core() > > + * or note_gp_changes() may happen only after the GP after the > > + * current one has already started. Further, rcu_core() > > + * only accelerates if RCU is idle (no GP in progress). > > Actually note_gp_changes() should take care of that. > >You are referring to rcu_core() -> rcu_check_quiescent_state() -> >note_gp_changes() doing the acceleration prior to the rcu_core() -> >rcu_report_qs_rdp() call, correct? > >Ah, but note_gp_changes() has an early return which triggers if either: >1. The rnp spinlock trylock failed. >2. The start of a new grace period was already detected before, so >rdp->gp_seq == rnp->gp_seq. > >So I think it is possible that we are in the middle of a GP, and >rcu_core() is called because QS reporting is required for the CPU, and >say the current GP started we are in the middle off occurs from the >same CPU so rdp->gp_seq == rnp->gp_seq. > >Now, rcu_core()'s call to note_gp_changes() should return early but >its later call to report_qs_rdp() will not accelerate the callback >without the code we are commenting here. > > My gut feeling is that the > acceleration in rcu_report_qs_rdp() only stands for: > > * callbacks that may be enqueued from an IRQ firing during the small window > between the RNP unlock in note_gp_changes() and the RNP lock in > rcu_report_qs_rdp() For rdp which is in the middle of a de-offloading process, the bypass list have been flushed, the nocb kthreads may miss callbacks acceleration. invoke call_rcu() will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp() to report qs, even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded() still return true. I think this is also a reason. Thanks Zqiang > >Sure, this also seems like a valid reason. > > * __note_gp_changes() got called even before from the GP kthread, and callbacks > got enqueued between that and rcu_core(). > >Agreed. In this case we will take the early return in >note_gp_changes() when called from the rcu_core(). So yeah, that was >kind of my point as well but slightly different reasoning. > >Let me know if you disagree with anything I mentioned, though. > > - Joel
> On Feb 11, 2023, at 6:18 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > >> >> >>> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote: >>> Recent discussion triggered due to a patch linked below, from Qiang, >>> shed light on the need to accelerate from QS reporting paths. >>> >>> Update the comments to capture this piece of knowledge. >>> >>> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ >>> Cc: Qiang Zhang <Qiang1.zhang@intel.com> >>> Cc: Frederic Weisbecker <frederic@kernel.org> >>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>> >>> --- >>> kernel/rcu/tree.c | 13 ++++++++++++- >>> 1 file changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >>> index 93eb03f8ed99..713eb6ca6902 100644 >>> --- a/kernel/rcu/tree.c >>> +++ b/kernel/rcu/tree.c >>> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) >>> } else { >>> /* >>> * This GP can't end until cpu checks in, so all of our >>> - * callbacks can be processed during the next GP. >>> + * callbacks can be processed during the next GP. Do >>> + * the acceleration from here otherwise there may be extra >>> + * grace period delays, as any accelerations from rcu_core() >>> + * or note_gp_changes() may happen only after the GP after the >>> + * current one has already started. Further, rcu_core() >>> + * only accelerates if RCU is idle (no GP in progress). >> >> Actually note_gp_changes() should take care of that. >> >> You are referring to rcu_core() -> rcu_check_quiescent_state() -> >> note_gp_changes() doing the acceleration prior to the rcu_core() -> >> rcu_report_qs_rdp() call, correct? >> >> Ah, but note_gp_changes() has an early return which triggers if either: >> 1. The rnp spinlock trylock failed. >> 2. The start of a new grace period was already detected before, so >> rdp->gp_seq == rnp->gp_seq. >> >> So I think it is possible that we are in the middle of a GP, and >> rcu_core() is called because QS reporting is required for the CPU, and >> say the current GP started we are in the middle off occurs from the >> same CPU so rdp->gp_seq == rnp->gp_seq. >> >> Now, rcu_core()'s call to note_gp_changes() should return early but >> its later call to report_qs_rdp() will not accelerate the callback >> without the code we are commenting here. >> >> My gut feeling is that the >> acceleration in rcu_report_qs_rdp() only stands for: >> >> * callbacks that may be enqueued from an IRQ firing during the small window >> between the RNP unlock in note_gp_changes() and the RNP lock in >> rcu_report_qs_rdp() > > For rdp which is in the middle of a de-offloading process, the bypass list have been > flushed, the nocb kthreads may miss callbacks acceleration. invoke call_rcu() > will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp() > to report qs, even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback > may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded() > still return true. > > I think this is also a reason. I tend to agree with you. I am wondering the best way to document all these reasons. Perhaps it suffices to mention a few reasons briefly here, without going into too much detail (because details may be subject to change). I will look through this entire thread again and take a call on how to proceed, but do let me know what you and Frederic think about the next steps. The main benefit of commenting is we dont look at this in a few years and run into the same question… Thanks! Joel > > Thanks > Zqiang > >> >> Sure, this also seems like a valid reason. >> >> * __note_gp_changes() got called even before from the GP kthread, and callbacks >> got enqueued between that and rcu_core(). >> >> Agreed. In this case we will take the early return in >> note_gp_changes() when called from the rcu_core(). So yeah, that was >> kind of my point as well but slightly different reasoning. >> >> Let me know if you disagree with anything I mentioned, though. >> >> - Joel
> On Feb 11, 2023, at 6:18 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > >> >> >>> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote: >>> Recent discussion triggered due to a patch linked below, from Qiang, >>> shed light on the need to accelerate from QS reporting paths. >>> >>> Update the comments to capture this piece of knowledge. >>> >>> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/ >>> Cc: Qiang Zhang <Qiang1.zhang@intel.com> >>> Cc: Frederic Weisbecker <frederic@kernel.org> >>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>> >>> --- >>> kernel/rcu/tree.c | 13 ++++++++++++- >>> 1 file changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >>> index 93eb03f8ed99..713eb6ca6902 100644 >>> --- a/kernel/rcu/tree.c >>> +++ b/kernel/rcu/tree.c >>> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) >>> } else { >>> /* >>> * This GP can't end until cpu checks in, so all of our >>> - * callbacks can be processed during the next GP. >>> + * callbacks can be processed during the next GP. Do >>> + * the acceleration from here otherwise there may be extra >>> + * grace period delays, as any accelerations from rcu_core() >>> + * or note_gp_changes() may happen only after the GP after the >>> + * current one has already started. Further, rcu_core() >>> + * only accelerates if RCU is idle (no GP in progress). >> >> Actually note_gp_changes() should take care of that. >> >> You are referring to rcu_core() -> rcu_check_quiescent_state() -> >> note_gp_changes() doing the acceleration prior to the rcu_core() -> >> rcu_report_qs_rdp() call, correct? >> >> Ah, but note_gp_changes() has an early return which triggers if either: >> 1. The rnp spinlock trylock failed. >> 2. The start of a new grace period was already detected before, so >> rdp->gp_seq == rnp->gp_seq. >> >> So I think it is possible that we are in the middle of a GP, and >> rcu_core() is called because QS reporting is required for the CPU, and >> say the current GP started we are in the middle off occurs from the >> same CPU so rdp->gp_seq == rnp->gp_seq. >> >> Now, rcu_core()'s call to note_gp_changes() should return early but >> its later call to report_qs_rdp() will not accelerate the callback >> without the code we are commenting here. >> >> My gut feeling is that the >> acceleration in rcu_report_qs_rdp() only stands for: >> >> * callbacks that may be enqueued from an IRQ firing during the small window >> between the RNP unlock in note_gp_changes() and the RNP lock in >> rcu_report_qs_rdp() > > For rdp which is in the middle of a de-offloading process, the bypass list have been > flushed, the nocb kthreads may miss callbacks acceleration. invoke call_rcu() > will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp() > to report qs, even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback > may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded() > still return true. > > I think this is also a reason. > >I tend to agree with you. I am wondering the best way to document all these reasons. Perhaps it suffices to mention a few reasons briefly here, without going into too much detail (because details may be subject to change). Agree, just mention a few main reasons, and the details may be put in the document. Thanks Zqiang > >I will look through this entire thread again and take a call on how to proceed, but do let me know what you and Frederic think about the next steps. The main benefit of commenting is we dont look at this in a few years and run into the same question… > >Thanks! > >Joel > > Thanks > Zqiang > >> >> Sure, this also seems like a valid reason. >> >> * __note_gp_changes() got called even before from the GP kthread, and callbacks >> got enqueued between that and rcu_core(). >> >> Agreed. In this case we will take the early return in >> note_gp_changes() when called from the rcu_core(). So yeah, that was >> kind of my point as well but slightly different reasoning. >> >> Let me know if you disagree with anything I mentioned, though. >> >> - Joel
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 93eb03f8ed99..713eb6ca6902 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) } else { /* * This GP can't end until cpu checks in, so all of our - * callbacks can be processed during the next GP. + * callbacks can be processed during the next GP. Do + * the acceleration from here otherwise there may be extra + * grace period delays, as any accelerations from rcu_core() + * or note_gp_changes() may happen only after the GP after the + * current one has already started. Further, rcu_core() + * only accelerates if RCU is idle (no GP in progress). * * NOCB kthreads have their own way to deal with that... */ @@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) /* * ...but NOCB kthreads may miss or delay callbacks acceleration * if in the middle of a (de-)offloading process. + * + * Such missed acceleration may cause the callbacks to + * be stranded until RCU is fully de-offloaded, as + * acceleration from rcu_core() and note_gp_changes() + * cannot happen for fully/partially offloaded mode due + * to ordering dependency between rnp lock and nocb_lock. */ needacc = true; }