diff mbox series

rcu/tree: Improve comments in rcu_report_qs_rdp()

Message ID	20230204022051.2737724-1-joel@joelfernandes.org
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: "Joel Fernandes (Google)" <joel@joelfernandes.org> To: linux-kernel@vger.kernel.org Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Qiang Zhang <Qiang1.zhang@intel.com>, Frederic Weisbecker <frederic@kernel.org>, Josh Triplett <josh@joshtriplett.org>, Lai Jiangshan <jiangshanlai@gmail.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, "Paul E. McKenney" <paulmck@kernel.org>, rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>, Boqun Feng <boqun.feng@gmail.com> Subject: [PATCH] rcu/tree: Improve comments in rcu_report_qs_rdp() Date: Sat, 4 Feb 2023 02:20:50 +0000 Message-Id: <20230204022051.2737724-1-joel@joelfernandes.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	rcu/tree: Improve comments in rcu_report_qs_rdp() \| rcu/tree: Improve comments in rcu_report_qs_rdp()

Commit Message

Joel Fernandes Feb. 4, 2023, 2:20 a.m. UTC

  Recent discussion triggered due to a patch linked below, from Qiang,
shed light on the need to accelerate from QS reporting paths.

Update the comments to capture this piece of knowledge.

Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
Cc: Qiang Zhang <Qiang1.zhang@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

---
 kernel/rcu/tree.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Zqiang Feb. 6, 2023, 3:09 a.m. UTC | #1

>Recent discussion triggered due to a patch linked below, from Qiang,
>shed light on the need to accelerate from QS reporting paths.
>
>Update the comments to capture this piece of knowledge.
>
>Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
>Cc: Qiang Zhang <Qiang1.zhang@intel.com>
>Cc: Frederic Weisbecker <frederic@kernel.org>
>Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
>---
> kernel/rcu/tree.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>index 93eb03f8ed99..713eb6ca6902 100644
>--- a/kernel/rcu/tree.c
>+++ b/kernel/rcu/tree.c
>@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> 	} else {
> 		/*
> 		 * This GP can't end until cpu checks in, so all of our
>-		 * callbacks can be processed during the next GP.
>+		 * callbacks can be processed during the next GP. Do
>+		 * the acceleration from here otherwise there may be extra
>+		 * grace period delays, as any accelerations from rcu_core()


Does the extra grace period delays means that if not accelerate callback, 
the grace period will take more time to end ? or refers to a delay in the
start time of a new grace period?

Thanks
Zqiang

>+		 * or note_gp_changes() may happen only after the GP after the
>+		 * current one has already started. Further, rcu_core()
>+		 * only accelerates if RCU is idle (no GP in progress).
> 		 *
> 		 * NOCB kthreads have their own way to deal with that...
> 		 */
>@@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> 			/*
> 			 * ...but NOCB kthreads may miss or delay callbacks acceleration
> 			 * if in the middle of a (de-)offloading process.
>+			 *
>+			 * Such missed acceleration may cause the callbacks to
>+			 * be stranded until RCU is fully de-offloaded, as
>+			 * acceleration from rcu_core() and note_gp_changes()
>+			 * cannot happen for fully/partially offloaded mode due
>+			 * to ordering dependency between rnp lock and nocb_lock.
> 			 */
> 			needacc = true;
> 		}
>-- 
>2.39.1.519.gcb327c4b5f-goog
>

Joel Fernandes Feb. 6, 2023, 5:19 p.m. UTC | #2

On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>
>
> >Recent discussion triggered due to a patch linked below, from Qiang,
> >shed light on the need to accelerate from QS reporting paths.
> >
> >Update the comments to capture this piece of knowledge.
> >
> >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> >Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> >Cc: Frederic Weisbecker <frederic@kernel.org>
> >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> >---
> > kernel/rcu/tree.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >index 93eb03f8ed99..713eb6ca6902 100644
> >--- a/kernel/rcu/tree.c
> >+++ b/kernel/rcu/tree.c
> >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >       } else {
> >               /*
> >                * This GP can't end until cpu checks in, so all of our
> >-               * callbacks can be processed during the next GP.
> >+               * callbacks can be processed during the next GP. Do
> >+               * the acceleration from here otherwise there may be extra
> >+               * grace period delays, as any accelerations from rcu_core()
>
>
> Does the extra grace period delays means that if not accelerate callback,
> the grace period will take more time to end ? or refers to a delay in the
> start time of a new grace period?

Yes, so IMO it is like this if we don't accelerate:
1. Start GP 1
2. CPU1 queues callback C1 (not accelerated yet)
3. CPU1 reports QS for GP1 (not accelerating anything).
4. GP1 ends
5. CPU1's note_gp_changes() is called, accelerate happens, now the CB
will execute after GP3 (or alternately, rcu_core() on CPU1 does
accelerate).
6. GP2 ends.
7. GP3 starts.
8. GP3 ends.
9. CB is invoked

Instead, what we will get the following thanks to the acceleration here is:
1. Start GP 1
2. CPU1 queues callback C1 (not accelerated yet)
3. CPU1 reports QS for GP1 and acceleration happens as done by the
code this patch adds comments for.
4. GP1 ends
5. CPU1's note_gp_changes() is called
6. GP2 ends.
7. CB is invoked

Does that make sense or is there a subtlety I missed?

Thanks,

 - Joel


>
> Thanks
> Zqiang
>
> >+               * or note_gp_changes() may happen only after the GP after the
> >+               * current one has already started. Further, rcu_core()
> >+               * only accelerates if RCU is idle (no GP in progress).
> >                *
> >                * NOCB kthreads have their own way to deal with that...
> >                */
> >@@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >                       /*
> >                        * ...but NOCB kthreads may miss or delay callbacks acceleration
> >                        * if in the middle of a (de-)offloading process.
> >+                       *
> >+                       * Such missed acceleration may cause the callbacks to
> >+                       * be stranded until RCU is fully de-offloaded, as
> >+                       * acceleration from rcu_core() and note_gp_changes()
> >+                       * cannot happen for fully/partially offloaded mode due
> >+                       * to ordering dependency between rnp lock and nocb_lock.
> >                        */
> >                       needacc = true;
> >               }
> >--
> >2.39.1.519.gcb327c4b5f-goog
> >

Joel Fernandes Feb. 6, 2023, 5:24 p.m. UTC | #3

On Mon, Feb 6, 2023 at 12:19 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> >
> >
> > >Recent discussion triggered due to a patch linked below, from Qiang,
> > >shed light on the need to accelerate from QS reporting paths.
> > >
> > >Update the comments to capture this piece of knowledge.
> > >
> > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> > >Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> > >Cc: Frederic Weisbecker <frederic@kernel.org>
> > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > >
> > >---
> > > kernel/rcu/tree.c | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > >index 93eb03f8ed99..713eb6ca6902 100644
> > >--- a/kernel/rcu/tree.c
> > >+++ b/kernel/rcu/tree.c
> > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >       } else {
> > >               /*
> > >                * This GP can't end until cpu checks in, so all of our
> > >-               * callbacks can be processed during the next GP.
> > >+               * callbacks can be processed during the next GP. Do
> > >+               * the acceleration from here otherwise there may be extra
> > >+               * grace period delays, as any accelerations from rcu_core()
> >
> >
> > Does the extra grace period delays means that if not accelerate callback,
> > the grace period will take more time to end ? or refers to a delay in the
> > start time of a new grace period?
>
> Yes, so IMO it is like this if we don't accelerate:
> 1. Start GP 1
> 2. CPU1 queues callback C1 (not accelerated yet)
> 3. CPU1 reports QS for GP1 (not accelerating anything).
> 4. GP1 ends
> 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB
> will execute after GP3 (or alternately, rcu_core() on CPU1 does
> accelerate).
> 6. GP2 ends.
> 7. GP3 starts.
> 8. GP3 ends.
> 9. CB is invoked
>
> Instead, what we will get the following thanks to the acceleration here is:
> 1. Start GP 1
> 2. CPU1 queues callback C1 (not accelerated yet)
> 3. CPU1 reports QS for GP1 and acceleration happens as done by the
> code this patch adds comments for.
> 4. GP1 ends
> 5. CPU1's note_gp_changes() is called
> 6. GP2 ends.
> 7. CB is invoked

Sorry I missed some steps, here is the update:
1. Start GP 1
2. CPU1 queues callback C1 (not accelerated yet)
3. CPU1 reports QS for GP1 (not accelerating anything).
4. GP1 ends
5. GP2 starts for some other reason from some other CPU.
6. CPU1's note_gp_changes() is called, acceleration happens, now the CB
will execute after GP3.
7. GP2 ends.
8. GP3 starts.
9. GP3 ends.
10. CB is invoked

Instead, what we will get the following thanks to the acceleration here is:
1. Start GP 1
2. CPU1 queues callback C1 (not accelerated yet)
3. CPU1 reports QS for GP1 and acceleration happens as done by the
code this patch adds comments for.
4. GP1 ends
5. GP2 starts
6. GP2 ends.
7. CB is invoked

Does that make sense or is there a subtlety I missed?

Thanks,

 - Joel

Zqiang Feb. 7, 2023, 1:14 a.m. UTC | #4

On Mon, Feb 6, 2023 at 12:19 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> >
> >
> > >Recent discussion triggered due to a patch linked below, from Qiang,
> > >shed light on the need to accelerate from QS reporting paths.
> > >
> > >Update the comments to capture this piece of knowledge.
> > >
> > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> > >Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> > >Cc: Frederic Weisbecker <frederic@kernel.org>
> > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > >
> > >---
> > > kernel/rcu/tree.c | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > >index 93eb03f8ed99..713eb6ca6902 100644
> > >--- a/kernel/rcu/tree.c
> > >+++ b/kernel/rcu/tree.c
> > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >       } else {
> > >               /*
> > >                * This GP can't end until cpu checks in, so all of our
> > >-               * callbacks can be processed during the next GP.
> > >+               * callbacks can be processed during the next GP. Do
> > >+               * the acceleration from here otherwise there may be extra
> > >+               * grace period delays, as any accelerations from rcu_core()
> >
> >
> > Does the extra grace period delays means that if not accelerate callback,
> > the grace period will take more time to end ? or refers to a delay in the
> > start time of a new grace period?
>
> Yes, so IMO it is like this if we don't accelerate:
> 1. Start GP 1
> 2. CPU1 queues callback C1 (not accelerated yet)
> 3. CPU1 reports QS for GP1 (not accelerating anything).
> 4. GP1 ends
> 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB
> will execute after GP3 (or alternately, rcu_core() on CPU1 does
> accelerate).
> 6. GP2 ends.
> 7. GP3 starts.
> 8. GP3 ends.
> 9. CB is invoked
>
> Instead, what we will get the following thanks to the acceleration here is:
> 1. Start GP 1
> 2. CPU1 queues callback C1 (not accelerated yet)
> 3. CPU1 reports QS for GP1 and acceleration happens as done by the
> code this patch adds comments for.
> 4. GP1 ends
> 5. CPU1's note_gp_changes() is called
> 6. GP2 ends.
> 7. CB is invoked
>
>Sorry I missed some steps, here is the update:
>1. Start GP 1
>2. CPU1 queues callback C1 (not accelerated yet)
>3. CPU1 reports QS for GP1 (not accelerating anything).
>4. GP1 ends
>5. GP2 starts for some other reason from some other CPU.
>6. CPU1's note_gp_changes() is called, acceleration happens, now the CB
>will execute after GP3.
>7. GP2 ends.
>8. GP3 starts.
>9. GP3 ends.
>10. CB is invoked
>
>Instead, what we will get the following thanks to the acceleration here is:
>1. Start GP 1
>2. CPU1 queues callback C1 (not accelerated yet)
>3. CPU1 reports QS for GP1 and acceleration happens as done by the
>code this patch adds comments for.
>4. GP1 ends
>5. GP2 starts
>6. GP2 ends.
>7. CB is invoked
>
>Does that make sense or is there a subtlety I missed?



Thanks for detailed description, that is to say, the grace period delays means that
if there is no acceleration,  the invocation of callback may be delayed by one or
more grace periods.

Can you re-describe the meaning of  "grace period delays "in the comments?

Thanks
Zqiang



>
>Thanks,
>
> - Joel

Frederic Weisbecker Feb. 7, 2023, 1:24 p.m. UTC | #5

On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote:
> Recent discussion triggered due to a patch linked below, from Qiang,
> shed light on the need to accelerate from QS reporting paths.
> 
> Update the comments to capture this piece of knowledge.
> 
> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> ---
>  kernel/rcu/tree.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 93eb03f8ed99..713eb6ca6902 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	} else {
>  		/*
>  		 * This GP can't end until cpu checks in, so all of our
> -		 * callbacks can be processed during the next GP.
> +		 * callbacks can be processed during the next GP. Do
> +		 * the acceleration from here otherwise there may be extra
> +		 * grace period delays, as any accelerations from rcu_core()
> +		 * or note_gp_changes() may happen only after the GP after the
> +		 * current one has already started. Further, rcu_core()
> +		 * only accelerates if RCU is idle (no GP in progress).

Actually note_gp_changes() should take care of that. My gut feeling is that the
acceleration in rcu_report_qs_rdp() only stands for:

* callbacks that may be enqueued from an IRQ firing during the small window
  between the RNP unlock in note_gp_changes() and the RNP lock in
  rcu_report_qs_rdp()

* __note_gp_changes() got called even before from the GP kthread, and callbacks
  got enqueued between that and rcu_core().
  
Thanks.

Joel Fernandes Feb. 10, 2023, 11:46 p.m. UTC | #6

On Tue, Feb 7, 2023 at 8:24 AM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote:
> > Recent discussion triggered due to a patch linked below, from Qiang,
> > shed light on the need to accelerate from QS reporting paths.
> >
> > Update the comments to capture this piece of knowledge.
> >
> > Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> > Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > ---
> >  kernel/rcu/tree.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 93eb03f8ed99..713eb6ca6902 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >       } else {
> >               /*
> >                * This GP can't end until cpu checks in, so all of our
> > -              * callbacks can be processed during the next GP.
> > +              * callbacks can be processed during the next GP. Do
> > +              * the acceleration from here otherwise there may be extra
> > +              * grace period delays, as any accelerations from rcu_core()
> > +              * or note_gp_changes() may happen only after the GP after the
> > +              * current one has already started. Further, rcu_core()
> > +              * only accelerates if RCU is idle (no GP in progress).
>
> Actually note_gp_changes() should take care of that.

You are referring to  rcu_core() -> rcu_check_quiescent_state() ->
note_gp_changes() doing the acceleration prior to the  rcu_core() ->
rcu_report_qs_rdp() call, correct?

Ah, but note_gp_changes() has an early return which triggers if either:
1. The rnp spinlock trylock failed.
2. The start of a new grace period was already detected before, so
rdp->gp_seq == rnp->gp_seq.

So I think it is possible that we are in the middle of a GP, and
rcu_core() is called because QS reporting is required for the CPU, and
say the current GP started we are in the middle off occurs from the
same CPU so rdp->gp_seq == rnp->gp_seq.

Now, rcu_core()'s call to note_gp_changes() should return early but
its later call to report_qs_rdp() will not accelerate the callback
without the code we are commenting here.

> My gut feeling is that the
> acceleration in rcu_report_qs_rdp() only stands for:
>
> * callbacks that may be enqueued from an IRQ firing during the small window
>   between the RNP unlock in note_gp_changes() and the RNP lock in
>   rcu_report_qs_rdp()

Sure, this also seems like a valid reason.

> * __note_gp_changes() got called even before from the GP kthread, and callbacks
>   got enqueued between that and rcu_core().

Agreed. In this case we will take the early return in
note_gp_changes() when called from the rcu_core(). So yeah, that was
kind of my point as well but slightly different reasoning.

Let me know if you disagree with anything I mentioned, though.

 - Joel

Joel Fernandes Feb. 10, 2023, 11:55 p.m. UTC | #7

On Mon, Feb 6, 2023 at 8:15 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>
> On Mon, Feb 6, 2023 at 12:19 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Sun, Feb 5, 2023 at 10:09 PM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > >
> > >
> > > >Recent discussion triggered due to a patch linked below, from Qiang,
> > > >shed light on the need to accelerate from QS reporting paths.
> > > >
> > > >Update the comments to capture this piece of knowledge.
> > > >
> > > >Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> > > >Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> > > >Cc: Frederic Weisbecker <frederic@kernel.org>
> > > >Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > >
> > > >---
> > > > kernel/rcu/tree.c | 13 ++++++++++++-
> > > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > > >
> > > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > >index 93eb03f8ed99..713eb6ca6902 100644
> > > >--- a/kernel/rcu/tree.c
> > > >+++ b/kernel/rcu/tree.c
> > > >@@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > > >       } else {
> > > >               /*
> > > >                * This GP can't end until cpu checks in, so all of our
> > > >-               * callbacks can be processed during the next GP.
> > > >+               * callbacks can be processed during the next GP. Do
> > > >+               * the acceleration from here otherwise there may be extra
> > > >+               * grace period delays, as any accelerations from rcu_core()
> > >
> > >
> > > Does the extra grace period delays means that if not accelerate callback,
> > > the grace period will take more time to end ? or refers to a delay in the
> > > start time of a new grace period?
> >
> > Yes, so IMO it is like this if we don't accelerate:
> > 1. Start GP 1
> > 2. CPU1 queues callback C1 (not accelerated yet)
> > 3. CPU1 reports QS for GP1 (not accelerating anything).
> > 4. GP1 ends
> > 5. CPU1's note_gp_changes() is called, accelerate happens, now the CB
> > will execute after GP3 (or alternately, rcu_core() on CPU1 does
> > accelerate).
> > 6. GP2 ends.
> > 7. GP3 starts.
> > 8. GP3 ends.
> > 9. CB is invoked
> >
> > Instead, what we will get the following thanks to the acceleration here is:
> > 1. Start GP 1
> > 2. CPU1 queues callback C1 (not accelerated yet)
> > 3. CPU1 reports QS for GP1 and acceleration happens as done by the
> > code this patch adds comments for.
> > 4. GP1 ends
> > 5. CPU1's note_gp_changes() is called
> > 6. GP2 ends.
> > 7. CB is invoked
> >
> >Sorry I missed some steps, here is the update:
> >1. Start GP 1
> >2. CPU1 queues callback C1 (not accelerated yet)
> >3. CPU1 reports QS for GP1 (not accelerating anything).
> >4. GP1 ends
> >5. GP2 starts for some other reason from some other CPU.
> >6. CPU1's note_gp_changes() is called, acceleration happens, now the CB
> >will execute after GP3.
> >7. GP2 ends.
> >8. GP3 starts.
> >9. GP3 ends.
> >10. CB is invoked
> >
> >Instead, what we will get the following thanks to the acceleration here is:
> >1. Start GP 1
> >2. CPU1 queues callback C1 (not accelerated yet)
> >3. CPU1 reports QS for GP1 and acceleration happens as done by the
> >code this patch adds comments for.
> >4. GP1 ends
> >5. GP2 starts
> >6. GP2 ends.
> >7. CB is invoked
> >
> >Does that make sense or is there a subtlety I missed?
>
>
>
> Thanks for detailed description, that is to say, the grace period delays means that
> if there is no acceleration,  the invocation of callback may be delayed by one or
> more grace periods.
>
> Can you re-describe the meaning of  "grace period delays "in the comments?

Yes, good point. I should change it to "one or more delays". Thank you
for the suggestion!

Sorry for my late reply as I was OOO this week.

 - Joel


>
> Thanks
> Zqiang
>
>
>
> >
> >Thanks,
> >
> > - Joel

Joel Fernandes Feb. 11, 2023, 12:32 a.m. UTC | #8

On Fri, Feb 3, 2023 at 9:21 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> Recent discussion triggered due to a patch linked below, from Qiang,
> shed light on the need to accelerate from QS reporting paths.
>
> Update the comments to capture this piece of knowledge.
>
> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> ---
>  kernel/rcu/tree.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 93eb03f8ed99..713eb6ca6902 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>         } else {
>                 /*
>                  * This GP can't end until cpu checks in, so all of our
> -                * callbacks can be processed during the next GP.
> +                * callbacks can be processed during the next GP. Do
> +                * the acceleration from here otherwise there may be extra
> +                * grace period delays, as any accelerations from rcu_core()
> +                * or note_gp_changes() may happen only after the GP after the
> +                * current one has already started. Further, rcu_core()
> +                * only accelerates if RCU is idle (no GP in progress).
>                  *

And I couldn't help but pass this through ChatGPT:

Before:
This GP can't end until cpu checks in, so all of our callbacks can be
processed during the next GP. Do the acceleration from here otherwise
there may be extra grace period delays, as any accelerations from
rcu_core() or note_gp_changes() may happen only after the GP after the
current one has already started. Further, rcu_core() only accelerates
if RCU is idle (no GP in progress). NOCB kthreads have their own way
to deal with that...

After:
In order to ensure all callbacks are processed efficiently, this grace
period (GP) must not end until the CPU checks in. To avoid unnecessary
grace period delays, it is important to initiate acceleration from the
current location. Any acceleration from the functions "rcu_core()" or
"note_gp_changes()" may only occur after the next GP has already
started. It is important to note that "rcu_core()" will only initiate
acceleration if the RCU system is idle (no GP in progress). NOCB
kthreads, on the other hand, have their own unique method for handling
this situation.

It sounds well written, but maybe a bit factually incorrect.. trying again:

To ensure all callbacks are processed in the next grace period, this
GP must not end until the CPU has checked in. To avoid any additional
grace period delays, it's important to perform the acceleration from
here. If acceleration is performed from rcu_core() or
note_gp_changes(), it may only occur after the next GP has already
started. Additionally, it's important to note that rcu_core() will
only accelerate if RCU is idle and no GP is in progress. The NOCB
kthreads have a separate mechanism for dealing with this issue.

Not bad, but the first sentence is still factually incorrect. Yes, I'm
bored..why do you ask? ;-)

I think except for the first sentence though, it made for a nice
grammatical re-write (give or take Frederic and Qiang's comments ;-)).

 - Joel

>                  * NOCB kthreads have their own way to deal with that...
>                  */
> @@ -1993,6 +1998,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>                         /*
>                          * ...but NOCB kthreads may miss or delay callbacks acceleration
>                          * if in the middle of a (de-)offloading process.
> +                        *
> +                        * Such missed acceleration may cause the callbacks to
> +                        * be stranded until RCU is fully de-offloaded, as
> +                        * acceleration from rcu_core() and note_gp_changes()
> +                        * cannot happen for fully/partially offloaded mode due
> +                        * to ordering dependency between rnp lock and nocb_lock.
>                          */
>                         needacc = true;
>                 }
> --
> 2.39.1.519.gcb327c4b5f-goog
>

Zqiang Feb. 11, 2023, 11:18 a.m. UTC | #9

>
> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote:
> > Recent discussion triggered due to a patch linked below, from Qiang,
> > shed light on the need to accelerate from QS reporting paths.
> >
> > Update the comments to capture this piece of knowledge.
> >
> > Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
> > Cc: Qiang Zhang <Qiang1.zhang@intel.com>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > ---
> >  kernel/rcu/tree.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 93eb03f8ed99..713eb6ca6902 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >       } else {
> >               /*
> >                * This GP can't end until cpu checks in, so all of our
> > -              * callbacks can be processed during the next GP.
> > +              * callbacks can be processed during the next GP. Do
> > +              * the acceleration from here otherwise there may be extra
> > +              * grace period delays, as any accelerations from rcu_core()
> > +              * or note_gp_changes() may happen only after the GP after the
> > +              * current one has already started. Further, rcu_core()
> > +              * only accelerates if RCU is idle (no GP in progress).
>
> Actually note_gp_changes() should take care of that.
>
>You are referring to  rcu_core() -> rcu_check_quiescent_state() ->
>note_gp_changes() doing the acceleration prior to the  rcu_core() ->
>rcu_report_qs_rdp() call, correct?
>
>Ah, but note_gp_changes() has an early return which triggers if either:
>1. The rnp spinlock trylock failed.
>2. The start of a new grace period was already detected before, so
>rdp->gp_seq == rnp->gp_seq.
>
>So I think it is possible that we are in the middle of a GP, and
>rcu_core() is called because QS reporting is required for the CPU, and
>say the current GP started we are in the middle off occurs from the
>same CPU so rdp->gp_seq == rnp->gp_seq.
>
>Now, rcu_core()'s call to note_gp_changes() should return early but
>its later call to report_qs_rdp() will not accelerate the callback
>without the code we are commenting here.
>
> My gut feeling is that the
> acceleration in rcu_report_qs_rdp() only stands for:
>
> * callbacks that may be enqueued from an IRQ firing during the small window
>   between the RNP unlock in note_gp_changes() and the RNP lock in
>   rcu_report_qs_rdp()

For rdp which is in the middle of a de-offloading process, the bypass list have been
flushed, the nocb kthreads may miss callbacks acceleration.   invoke call_rcu()
will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp()
to report qs,  even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback
may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded()
still return true.

I think this is also a reason.

Thanks
Zqiang

>
>Sure, this also seems like a valid reason.
>
> * __note_gp_changes() got called even before from the GP kthread, and callbacks
>   got enqueued between that and rcu_core().
>
>Agreed. In this case we will take the early return in
>note_gp_changes() when called from the rcu_core(). So yeah, that was
>kind of my point as well but slightly different reasoning.
>
>Let me know if you disagree with anything I mentioned, though.
>
> - Joel

Joel Fernandes Feb. 13, 2023, 2:43 p.m. UTC | #10

> On Feb 11, 2023, at 6:18 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> 
> 
>> 
>> 
>>> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote:
>>> Recent discussion triggered due to a patch linked below, from Qiang,
>>> shed light on the need to accelerate from QS reporting paths.
>>> 
>>> Update the comments to capture this piece of knowledge.
>>> 
>>> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
>>> Cc: Qiang Zhang <Qiang1.zhang@intel.com>
>>> Cc: Frederic Weisbecker <frederic@kernel.org>
>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>> 
>>> ---
>>> kernel/rcu/tree.c | 13 ++++++++++++-
>>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index 93eb03f8ed99..713eb6ca6902 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>      } else {
>>>              /*
>>>               * This GP can't end until cpu checks in, so all of our
>>> -              * callbacks can be processed during the next GP.
>>> +              * callbacks can be processed during the next GP. Do
>>> +              * the acceleration from here otherwise there may be extra
>>> +              * grace period delays, as any accelerations from rcu_core()
>>> +              * or note_gp_changes() may happen only after the GP after the
>>> +              * current one has already started. Further, rcu_core()
>>> +              * only accelerates if RCU is idle (no GP in progress).
>> 
>> Actually note_gp_changes() should take care of that.
>> 
>> You are referring to  rcu_core() -> rcu_check_quiescent_state() ->
>> note_gp_changes() doing the acceleration prior to the  rcu_core() ->
>> rcu_report_qs_rdp() call, correct?
>> 
>> Ah, but note_gp_changes() has an early return which triggers if either:
>> 1. The rnp spinlock trylock failed.
>> 2. The start of a new grace period was already detected before, so
>> rdp->gp_seq == rnp->gp_seq.
>> 
>> So I think it is possible that we are in the middle of a GP, and
>> rcu_core() is called because QS reporting is required for the CPU, and
>> say the current GP started we are in the middle off occurs from the
>> same CPU so rdp->gp_seq == rnp->gp_seq.
>> 
>> Now, rcu_core()'s call to note_gp_changes() should return early but
>> its later call to report_qs_rdp() will not accelerate the callback
>> without the code we are commenting here.
>> 
>> My gut feeling is that the
>> acceleration in rcu_report_qs_rdp() only stands for:
>> 
>> * callbacks that may be enqueued from an IRQ firing during the small window
>>  between the RNP unlock in note_gp_changes() and the RNP lock in
>>  rcu_report_qs_rdp()
> 
> For rdp which is in the middle of a de-offloading process, the bypass list have been
> flushed, the nocb kthreads may miss callbacks acceleration.   invoke call_rcu()
> will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp()
> to report qs,  even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback
> may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded()
> still return true.
> 
> I think this is also a reason.

I tend to agree with you. I am wondering the best way to document all these reasons. Perhaps it suffices to mention a few reasons briefly here, without going into too much detail (because details may be subject to change).

I will look through this entire thread again and take a call on how to proceed, but do let me know what you and Frederic think about the next steps. The main benefit of commenting is we dont look at this in a few years and run into the same question…

Thanks!

Joel

> 
> Thanks
> Zqiang
> 
>> 
>> Sure, this also seems like a valid reason.
>> 
>> * __note_gp_changes() got called even before from the GP kthread, and callbacks
>>  got enqueued between that and rcu_core().
>> 
>> Agreed. In this case we will take the early return in
>> note_gp_changes() when called from the rcu_core(). So yeah, that was
>> kind of my point as well but slightly different reasoning.
>> 
>> Let me know if you disagree with anything I mentioned, though.
>> 
>> - Joel

Zqiang Feb. 14, 2023, 8:17 a.m. UTC | #11

> On Feb 11, 2023, at 6:18 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> 
> 
>> 
>> 
>>> On Sat, Feb 04, 2023 at 02:20:50AM +0000, Joel Fernandes (Google) wrote:
>>> Recent discussion triggered due to a patch linked below, from Qiang,
>>> shed light on the need to accelerate from QS reporting paths.
>>> 
>>> Update the comments to capture this piece of knowledge.
>>> 
>>> Link: https://lore.kernel.org/all/20230118073014.2020743-1-qiang1.zhang@intel.com/
>>> Cc: Qiang Zhang <Qiang1.zhang@intel.com>
>>> Cc: Frederic Weisbecker <frederic@kernel.org>
>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>> 
>>> ---
>>> kernel/rcu/tree.c | 13 ++++++++++++-
>>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index 93eb03f8ed99..713eb6ca6902 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1983,7 +1983,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>      } else {
>>>              /*
>>>               * This GP can't end until cpu checks in, so all of our
>>> -              * callbacks can be processed during the next GP.
>>> +              * callbacks can be processed during the next GP. Do
>>> +              * the acceleration from here otherwise there may be extra
>>> +              * grace period delays, as any accelerations from rcu_core()
>>> +              * or note_gp_changes() may happen only after the GP after the
>>> +              * current one has already started. Further, rcu_core()
>>> +              * only accelerates if RCU is idle (no GP in progress).
>> 
>> Actually note_gp_changes() should take care of that.
>> 
>> You are referring to  rcu_core() -> rcu_check_quiescent_state() ->
>> note_gp_changes() doing the acceleration prior to the  rcu_core() ->
>> rcu_report_qs_rdp() call, correct?
>> 
>> Ah, but note_gp_changes() has an early return which triggers if either:
>> 1. The rnp spinlock trylock failed.
>> 2. The start of a new grace period was already detected before, so
>> rdp->gp_seq == rnp->gp_seq.
>> 
>> So I think it is possible that we are in the middle of a GP, and
>> rcu_core() is called because QS reporting is required for the CPU, and
>> say the current GP started we are in the middle off occurs from the
>> same CPU so rdp->gp_seq == rnp->gp_seq.
>> 
>> Now, rcu_core()'s call to note_gp_changes() should return early but
>> its later call to report_qs_rdp() will not accelerate the callback
>> without the code we are commenting here.
>> 
>> My gut feeling is that the
>> acceleration in rcu_report_qs_rdp() only stands for:
>> 
>> * callbacks that may be enqueued from an IRQ firing during the small window
>>  between the RNP unlock in note_gp_changes() and the RNP lock in
>>  rcu_report_qs_rdp()
> 
> For rdp which is in the middle of a de-offloading process, the bypass list have been
> flushed, the nocb kthreads may miss callbacks acceleration.   invoke call_rcu()
> will also not use bypass list. if at this time a new gp starts, before call rcu_report_qs_rdp()
> to report qs,  even if rcu_core() invoke note_gp_changes() notice gp start, this rdp's callback
> may still miss acceleration if rdp still in de-offloading process, because invoke rcu_rdp_is_offloaded()
> still return true.
> 
> I think this is also a reason.
>
>I tend to agree with you. I am wondering the best way to document all these reasons. Perhaps it suffices to mention a few reasons briefly here, without going into too much detail (because details may be subject to change).


Agree, just mention a few main reasons, and the details may be put in the document.

Thanks
Zqiang


>
>I will look through this entire thread again and take a call on how to proceed, but do let me know what you and Frederic think about the next steps. The main benefit of commenting is we dont look at this in a few years and run into the same question…
>
>Thanks!
>
>Joel

> 
> Thanks
> Zqiang
> 
>> 
>> Sure, this also seems like a valid reason.
>> 
>> * __note_gp_changes() got called even before from the GP kthread, and callbacks
>>  got enqueued between that and rcu_core().
>> 
>> Agreed. In this case we will take the early return in
>> note_gp_changes() when called from the rcu_core(). So yeah, that was
>> kind of my point as well but slightly different reasoning.
>> 
>> Let me know if you disagree with anything I mentioned, though.
>> 
>> - Joel

diff mbox series

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 93eb03f8ed99..713eb6ca6902 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1983,7 +1983,12 @@  rcu_report_qs_rdp(struct rcu_data *rdp)
 	} else {
 		/*
 		 * This GP can't end until cpu checks in, so all of our
-		 * callbacks can be processed during the next GP.
+		 * callbacks can be processed during the next GP. Do
+		 * the acceleration from here otherwise there may be extra
+		 * grace period delays, as any accelerations from rcu_core()
+		 * or note_gp_changes() may happen only after the GP after the
+		 * current one has already started. Further, rcu_core()
+		 * only accelerates if RCU is idle (no GP in progress).
 		 *
 		 * NOCB kthreads have their own way to deal with that...
 		 */
@@ -1993,6 +1998,12 @@  rcu_report_qs_rdp(struct rcu_data *rdp)
 			/*
 			 * ...but NOCB kthreads may miss or delay callbacks acceleration
 			 * if in the middle of a (de-)offloading process.
+			 *
+			 * Such missed acceleration may cause the callbacks to
+			 * be stranded until RCU is fully de-offloaded, as
+			 * acceleration from rcu_core() and note_gp_changes()
+			 * cannot happen for fully/partially offloaded mode due
+			 * to ordering dependency between rnp lock and nocb_lock.
 			 */
 			needacc = true;
 		}