[v3] rcu: Remove impossible wakeup rcu GP kthread action from rcu_report_qs_rdp()

Message ID 20230118073014.2020743-1-qiang1.zhang@intel.com
State New
Headers
Series [v3] rcu: Remove impossible wakeup rcu GP kthread action from rcu_report_qs_rdp() |

Commit Message

Zqiang Jan. 18, 2023, 7:30 a.m. UTC
  When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
grpmask has not been cleared from the corresponding rcu_node structure's
->qsmask, after that will clear and report quiescent state, but in this
time, this also means that current grace period is not end, the current
grace period is ongoing, because the rcu_gp_in_progress() currently return
true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
to return true.

This commit therefore remove impossible rcu_gp_kthread_wake() calling.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
  

Comments

Frederic Weisbecker Jan. 18, 2023, 10:18 a.m. UTC | #1
On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> grpmask has not been cleared from the corresponding rcu_node structure's
> ->qsmask, after that will clear and report quiescent state, but in this
> time, this also means that current grace period is not end, the current
> grace period is ongoing, because the rcu_gp_in_progress() currently return
> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> to return true.
> 
> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

> ---
>  kernel/rcu/tree.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2c204529478..0962c2202d45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
>  	bool needacc = false;
>  	struct rcu_node *rnp;
>  
> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  		 * NOCB kthreads have their own way to deal with that...
>  		 */
>  		if (!rcu_rdp_is_offloaded(rdp)) {
> -			needwake = rcu_accelerate_cbs(rnp, rdp);
> +			/*
> +			 * Current GP does not end, invoke rcu_gp_in_progress()
> +			 * will return true and so doesn't wake up GP kthread to
> +			 * start a new GP.
> +			 */
> +			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>  		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>  			/*
>  			 * ...but NOCB kthreads may miss or delay callbacks acceleration
> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  		rcu_disable_urgency_upon_qs(rdp);
>  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>  		/* ^^^ Released rnp->lock */
> -		if (needwake)
> -			rcu_gp_kthread_wake();
>  
>  		if (needacc) {
>  			rcu_nocb_lock_irqsave(rdp, flags);
> -- 
> 2.25.1
>
  
Paul E. McKenney Jan. 18, 2023, 6:07 p.m. UTC | #2
On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> grpmask has not been cleared from the corresponding rcu_node structure's
> ->qsmask, after that will clear and report quiescent state, but in this
> time, this also means that current grace period is not end, the current
> grace period is ongoing, because the rcu_gp_in_progress() currently return
> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> to return true.
> 
> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Queued (wordsmithed as shown below, as always, please check) for further
testing and review, thank you both!

							Thanx, Paul

------------------------------------------------------------------------

commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Wed Jan 18 15:30:14 2023 +0800

    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
    
    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
    only if there is a grace period in progress that is still blocked
    by at least one CPU on this rcu_node structure.  This means that
    rcu_accelerate_cbs() should never return the value true, and thus that
    this function should never set the needwake variable and in turn never
    invoke rcu_gp_kthread_wake().
    
    This commit therefore removes the needwake variable and the invocation
    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
    detect situations where the system's opinion differs from ours.
    
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b2c2045294780..7a3085ad0a7df 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
 	bool needacc = false;
 	struct rcu_node *rnp;
 
@@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 		 * NOCB kthreads have their own way to deal with that...
 		 */
 		if (!rcu_rdp_is_offloaded(rdp)) {
-			needwake = rcu_accelerate_cbs(rnp, rdp);
+			/*
+			 * The current GP has not yet ended, so it
+			 * should not be possible for rcu_accelerate_cbs()
+			 * to return true.  So complain, but don't awaken.
+			 */
+			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
 		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 			/*
 			 * ...but NOCB kthreads may miss or delay callbacks acceleration
@@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 		rcu_disable_urgency_upon_qs(rdp);
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		/* ^^^ Released rnp->lock */
-		if (needwake)
-			rcu_gp_kthread_wake();
 
 		if (needacc) {
 			rcu_nocb_lock_irqsave(rdp, flags);
  
Zqiang Jan. 18, 2023, 11:30 p.m. UTC | #3
On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> grpmask has not been cleared from the corresponding rcu_node structure's
> ->qsmask, after that will clear and report quiescent state, but in this
> time, this also means that current grace period is not end, the current
> grace period is ongoing, because the rcu_gp_in_progress() currently return
> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> to return true.
> 
> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
>Queued (wordsmithed as shown below, as always, please check) for further
>testing and review, thank you both!
>
>							Thanx, Paul
>
>------------------------------------------------------------------------
>
>commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
>Author: Zqiang <qiang1.zhang@intel.com>
>Date:   Wed Jan 18 15:30:14 2023 +0800
>
>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>    
>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>    only if there is a grace period in progress that is still blocked
>    by at least one CPU on this rcu_node structure.  This means that
>    rcu_accelerate_cbs() should never return the value true, and thus that
>    this function should never set the needwake variable and in turn never
>    invoke rcu_gp_kthread_wake().
>    
>    This commit therefore removes the needwake variable and the invocation
>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>    detect situations where the system's opinion differs from ours.
>    

Thanks Paul, this is more clear for me😊.

Thanks
Zqiang

>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>index b2c2045294780..7a3085ad0a7df 100644
>--- a/kernel/rcu/tree.c
>+++ b/kernel/rcu/tree.c
>@@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>{
> 	unsigned long flags;
> 	unsigned long mask;
>-	bool needwake = false;
> 	bool needacc = false;
> 	struct rcu_node *rnp;
> 
>@@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> 		 * NOCB kthreads have their own way to deal with that...
> 		 */
> 		if (!rcu_rdp_is_offloaded(rdp)) {
>-			needwake = rcu_accelerate_cbs(rnp, rdp);
>+			/*
>+			 * The current GP has not yet ended, so it
>+			 * should not be possible for rcu_accelerate_cbs()
>+			 * to return true.  So complain, but don't awaken.
>+			 */
>+			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> 		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> 			/*
> 			 * ...but NOCB kthreads may miss or delay callbacks acceleration
>@@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> 		rcu_disable_urgency_upon_qs(rdp);
> 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> 		/* ^^^ Released rnp->lock */
>-		if (needwake)
>-			rcu_gp_kthread_wake();
> 
> 		if (needacc) {
> 			rcu_nocb_lock_irqsave(rdp, flags);
  
Joel Fernandes Jan. 20, 2023, 3:14 a.m. UTC | #4
On Wed, Jan 18, 2023 at 10:07:14AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > grpmask has not been cleared from the corresponding rcu_node structure's
> > ->qsmask, after that will clear and report quiescent state, but in this
> > time, this also means that current grace period is not end, the current
> > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > to return true.
> > 
> > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Queued (wordsmithed as shown below, as always, please check) for further
> testing and review, thank you both!
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> Author: Zqiang <qiang1.zhang@intel.com>
> Date:   Wed Jan 18 15:30:14 2023 +0800
> 
>     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>     
>     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>     only if there is a grace period in progress that is still blocked
>     by at least one CPU on this rcu_node structure.  This means that
>     rcu_accelerate_cbs() should never return the value true, and thus that
>     this function should never set the needwake variable and in turn never
>     invoke rcu_gp_kthread_wake().
>     
>     This commit therefore removes the needwake variable and the invocation
>     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>     detect situations where the system's opinion differs from ours.
>     
>     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2c2045294780..7a3085ad0a7df 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
>  	bool needacc = false;
>  	struct rcu_node *rnp;
>  
> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  		 * NOCB kthreads have their own way to deal with that...
>  		 */
>  		if (!rcu_rdp_is_offloaded(rdp)) {
> -			needwake = rcu_accelerate_cbs(rnp, rdp);
> +			/*
> +			 * The current GP has not yet ended, so it
> +			 * should not be possible for rcu_accelerate_cbs()
> +			 * to return true.  So complain, but don't awaken.
> +			 */
> +			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>  		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>  			/*
>  			 * ...but NOCB kthreads may miss or delay callbacks acceleration
> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  		rcu_disable_urgency_upon_qs(rdp);
>  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>  		/* ^^^ Released rnp->lock */
> -		if (needwake)
> -			rcu_gp_kthread_wake();

AFAICS, there is almost no compiler benefit of doing this, and zero runtime
benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
check of the return value of rcu_accelerate_cbs(), so you still have a
branch. Yes, maybe slightly smaller code without the wake call, but I'm not
sure that is worth it.

And, if the opinion of system differs, its a bug anyway, so more added risk.


>  
>  		if (needacc) {
>  			rcu_nocb_lock_irqsave(rdp, flags);

And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
anyway, so it is consistent with nocb vs !nocb.

So I am not a fan of this change. ;-)

thanks,

 - Joel
  
Joel Fernandes Jan. 20, 2023, 3:17 a.m. UTC | #5
On Fri, Jan 20, 2023 at 3:14 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Wed, Jan 18, 2023 at 10:07:14AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > grpmask has not been cleared from the corresponding rcu_node structure's
> > > ->qsmask, after that will clear and report quiescent state, but in this
> > > time, this also means that current grace period is not end, the current
> > > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > to return true.
> > >
> > > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Queued (wordsmithed as shown below, as always, please check) for further
> > testing and review, thank you both!
> >
> >                                                       Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > Author: Zqiang <qiang1.zhang@intel.com>
> > Date:   Wed Jan 18 15:30:14 2023 +0800
> >
> >     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> >
> >     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> >     only if there is a grace period in progress that is still blocked
> >     by at least one CPU on this rcu_node structure.  This means that
> >     rcu_accelerate_cbs() should never return the value true, and thus that
> >     this function should never set the needwake variable and in turn never
> >     invoke rcu_gp_kthread_wake().
> >
> >     This commit therefore removes the needwake variable and the invocation
> >     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> >     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> >     detect situations where the system's opinion differs from ours.
> >
> >     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b2c2045294780..7a3085ad0a7df 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  {
> >       unsigned long flags;
> >       unsigned long mask;
> > -     bool needwake = false;
> >       bool needacc = false;
> >       struct rcu_node *rnp;
> >
> > @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >                * NOCB kthreads have their own way to deal with that...
> >                */
> >               if (!rcu_rdp_is_offloaded(rdp)) {
> > -                     needwake = rcu_accelerate_cbs(rnp, rdp);
> > +                     /*
> > +                      * The current GP has not yet ended, so it
> > +                      * should not be possible for rcu_accelerate_cbs()
> > +                      * to return true.  So complain, but don't awaken.
> > +                      */
> > +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> >               } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> >                       /*
> >                        * ...but NOCB kthreads may miss or delay callbacks acceleration
> > @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >               rcu_disable_urgency_upon_qs(rdp);
> >               rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >               /* ^^^ Released rnp->lock */
> > -             if (needwake)
> > -                     rcu_gp_kthread_wake();
>
> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> check of the return value of rcu_accelerate_cbs(), so you still have a
> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> sure that is worth it.
>
> And, if the opinion of system differs, its a bug anyway, so more added risk.
>
>
> >
> >               if (needacc) {
> >                       rcu_nocb_lock_irqsave(rdp, flags);
>
> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> anyway, so it is consistent with nocb vs !nocb.

Sorry, I mean "inconsistent".

Thanks,

 - Joel
  
Zqiang Jan. 20, 2023, 4:09 a.m. UTC | #6
> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > grpmask has not been cleared from the corresponding rcu_node structure's
> > ->qsmask, after that will clear and report quiescent state, but in this
> > time, this also means that current grace period is not end, the current
> > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > to return true.
> > 
> > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Queued (wordsmithed as shown below, as always, please check) for further
> testing and review, thank you both!
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> Author: Zqiang <qiang1.zhang@intel.com>
> Date:   Wed Jan 18 15:30:14 2023 +0800
> 
>     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>     
>     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>     only if there is a grace period in progress that is still blocked
>     by at least one CPU on this rcu_node structure.  This means that
>     rcu_accelerate_cbs() should never return the value true, and thus that
>     this function should never set the needwake variable and in turn never
>     invoke rcu_gp_kthread_wake().
>     
>     This commit therefore removes the needwake variable and the invocation
>     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>     detect situations where the system's opinion differs from ours.
>     
>     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2c2045294780..7a3085ad0a7df 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
>  	bool needacc = false;
>  	struct rcu_node *rnp;
>  
> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  		 * NOCB kthreads have their own way to deal with that...
>  		 */
>  		if (!rcu_rdp_is_offloaded(rdp)) {
> -			needwake = rcu_accelerate_cbs(rnp, rdp);
> +			/*
> +			 * The current GP has not yet ended, so it
> +			 * should not be possible for rcu_accelerate_cbs()
> +			 * to return true.  So complain, but don't awaken.
> +			 */
> +			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>  		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>  			/*
>  			 * ...but NOCB kthreads may miss or delay callbacks acceleration
> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  		rcu_disable_urgency_upon_qs(rdp);
>  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>  		/* ^^^ Released rnp->lock */
> -		if (needwake)
> -			rcu_gp_kthread_wake();
>
>AFAICS, there is almost no compiler benefit of doing this, and zero runtime
>benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
>check of the return value of rcu_accelerate_cbs(), so you still have a
>branch. Yes, maybe slightly smaller code without the wake call, but I'm not
>sure that is worth it.
>
>And, if the opinion of system differs, its a bug anyway, so more added risk.
>
>
>  
>  		if (needacc) {
>  			rcu_nocb_lock_irqsave(rdp, flags);
>
>And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
>anyway, so it is consistent with nocb vs !nocb.

For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.

Thanks
Zqiang


>
>So I am not a fan of this change. ;-)
>
>thanks,
>
> - Joel
  
Joel Fernandes Jan. 20, 2023, 4:40 a.m. UTC | #7
On Fri, Jan 20, 2023 at 4:09 AM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>
> > On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > grpmask has not been cleared from the corresponding rcu_node structure's
> > > ->qsmask, after that will clear and report quiescent state, but in this
> > > time, this also means that current grace period is not end, the current
> > > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > to return true.
> > >
> > > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Queued (wordsmithed as shown below, as always, please check) for further
> > testing and review, thank you both!
> >
> >                                                       Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > Author: Zqiang <qiang1.zhang@intel.com>
> > Date:   Wed Jan 18 15:30:14 2023 +0800
> >
> >     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> >
> >     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> >     only if there is a grace period in progress that is still blocked
> >     by at least one CPU on this rcu_node structure.  This means that
> >     rcu_accelerate_cbs() should never return the value true, and thus that
> >     this function should never set the needwake variable and in turn never
> >     invoke rcu_gp_kthread_wake().
> >
> >     This commit therefore removes the needwake variable and the invocation
> >     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> >     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> >     detect situations where the system's opinion differs from ours.
> >
> >     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b2c2045294780..7a3085ad0a7df 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  {
> >       unsigned long flags;
> >       unsigned long mask;
> > -     bool needwake = false;
> >       bool needacc = false;
> >       struct rcu_node *rnp;
> >
> > @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >                * NOCB kthreads have their own way to deal with that...
> >                */
> >               if (!rcu_rdp_is_offloaded(rdp)) {
> > -                     needwake = rcu_accelerate_cbs(rnp, rdp);
> > +                     /*
> > +                      * The current GP has not yet ended, so it
> > +                      * should not be possible for rcu_accelerate_cbs()
> > +                      * to return true.  So complain, but don't awaken.
> > +                      */
> > +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> >               } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> >                       /*
> >                        * ...but NOCB kthreads may miss or delay callbacks acceleration
> > @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >               rcu_disable_urgency_upon_qs(rdp);
> >               rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >               /* ^^^ Released rnp->lock */
> > -             if (needwake)
> > -                     rcu_gp_kthread_wake();
> >
> >AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> >benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> >check of the return value of rcu_accelerate_cbs(), so you still have a
> >branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> >sure that is worth it.
> >
> >And, if the opinion of system differs, its a bug anyway, so more added risk.
> >
> >
> >
> >               if (needacc) {
> >                       rcu_nocb_lock_irqsave(rdp, flags);
> >
> >And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> >anyway, so it is consistent with nocb vs !nocb.
>
> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
>

That is a fair point. But after gp ends,  rcu_check_quiescent_state()
-> note_gp_changes() which will do a accel + GP thread wake up at that
point anyway, once it notices that a GP has come to an end. That
should happen for both the nocb and !nocb cases right?

I am wondering if rcu_report_qs_rdp() needs to be rethought to make
both cases consistent.

Why does the nocb case need an accel + GP thread wakeup in the
rcu_report_qs_rdp() function, but the !nocb case does not?

(I am out of office till Monday but will intermittently (maybe) check
in, RCU is one of those things that daydreaming tends to lend itself
to...)

 - Joel
  
Zqiang Jan. 20, 2023, 8:19 a.m. UTC | #8
>
> > On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > grpmask has not been cleared from the corresponding rcu_node structure's
> > > ->qsmask, after that will clear and report quiescent state, but in this
> > > time, this also means that current grace period is not end, the current
> > > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > to return true.
> > >
> > > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Queued (wordsmithed as shown below, as always, please check) for further
> > testing and review, thank you both!
> >
> >                                                       Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > Author: Zqiang <qiang1.zhang@intel.com>
> > Date:   Wed Jan 18 15:30:14 2023 +0800
> >
> >     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> >
> >     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> >     only if there is a grace period in progress that is still blocked
> >     by at least one CPU on this rcu_node structure.  This means that
> >     rcu_accelerate_cbs() should never return the value true, and thus that
> >     this function should never set the needwake variable and in turn never
> >     invoke rcu_gp_kthread_wake().
> >
> >     This commit therefore removes the needwake variable and the invocation
> >     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> >     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> >     detect situations where the system's opinion differs from ours.
> >
> >     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b2c2045294780..7a3085ad0a7df 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  {
> >       unsigned long flags;
> >       unsigned long mask;
> > -     bool needwake = false;
> >       bool needacc = false;
> >       struct rcu_node *rnp;
> >
> > @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >                * NOCB kthreads have their own way to deal with that...
> >                */
> >               if (!rcu_rdp_is_offloaded(rdp)) {
> > -                     needwake = rcu_accelerate_cbs(rnp, rdp);
> > +                     /*
> > +                      * The current GP has not yet ended, so it
> > +                      * should not be possible for rcu_accelerate_cbs()
> > +                      * to return true.  So complain, but don't awaken.
> > +                      */
> > +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> >               } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> >                       /*
> >                        * ...but NOCB kthreads may miss or delay callbacks acceleration
> > @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >               rcu_disable_urgency_upon_qs(rdp);
> >               rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >               /* ^^^ Released rnp->lock */
> > -             if (needwake)
> > -                     rcu_gp_kthread_wake();
> >
> >AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> >benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> >check of the return value of rcu_accelerate_cbs(), so you still have a
> >branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> >sure that is worth it.
> >
> >And, if the opinion of system differs, its a bug anyway, so more added risk.
> >
> >
> >
> >               if (needacc) {
> >                       rcu_nocb_lock_irqsave(rdp, flags);
> >
> >And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> >anyway, so it is consistent with nocb vs !nocb.
>
> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
>
>
>That is a fair point. But after gp ends,  rcu_check_quiescent_state()
>-> note_gp_changes() which will do a accel + GP thread wake up at that
>point anyway, once it notices that a GP has come to an end. That
>should happen for both the nocb and !nocb cases right?

For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
note_gp_changes().  so also not wakeup gp kthread in note_gp_changes(). 

>
>I am wondering if rcu_report_qs_rdp() needs to be rethought to make
>both cases consistent.
>
>Why does the nocb case need an accel + GP thread wakeup in the
>rcu_report_qs_rdp() function, but the !nocb case does not?

For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
this is an intermediate state.
	
Thanks
Zqiang

>
>(I am out of office till Monday but will intermittently (maybe) check
>in, RCU is one of those things that daydreaming tends to lend itself
>to...)
>
> - Joel
  
Joel Fernandes Jan. 20, 2023, 1:27 p.m. UTC | #9
> On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> 
> 
>> 
>> 
>>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
>>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
>>>>> grpmask has not been cleared from the corresponding rcu_node structure's
>>>>> ->qsmask, after that will clear and report quiescent state, but in this
>>>>> time, this also means that current grace period is not end, the current
>>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return
>>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
>>>>> to return true.
>>>>> 
>>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
>>>>> 
>>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>> 
>>> Queued (wordsmithed as shown below, as always, please check) for further
>>> testing and review, thank you both!
>>> 
>>>                                                      Thanx, Paul
>>> 
>>> ------------------------------------------------------------------------
>>> 
>>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
>>> Author: Zqiang <qiang1.zhang@intel.com>
>>> Date:   Wed Jan 18 15:30:14 2023 +0800
>>> 
>>>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>>> 
>>>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>>>    only if there is a grace period in progress that is still blocked
>>>    by at least one CPU on this rcu_node structure.  This means that
>>>    rcu_accelerate_cbs() should never return the value true, and thus that
>>>    this function should never set the needwake variable and in turn never
>>>    invoke rcu_gp_kthread_wake().
>>> 
>>>    This commit therefore removes the needwake variable and the invocation
>>>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>>>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>>>    detect situations where the system's opinion differs from ours.
>>> 
>>>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> 
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index b2c2045294780..7a3085ad0a7df 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>> {
>>>      unsigned long flags;
>>>      unsigned long mask;
>>> -     bool needwake = false;
>>>      bool needacc = false;
>>>      struct rcu_node *rnp;
>>> 
>>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>               * NOCB kthreads have their own way to deal with that...
>>>               */
>>>              if (!rcu_rdp_is_offloaded(rdp)) {
>>> -                     needwake = rcu_accelerate_cbs(rnp, rdp);
>>> +                     /*
>>> +                      * The current GP has not yet ended, so it
>>> +                      * should not be possible for rcu_accelerate_cbs()
>>> +                      * to return true.  So complain, but don't awaken.
>>> +                      */
>>> +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>>>              } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>>>                      /*
>>>                       * ...but NOCB kthreads may miss or delay callbacks acceleration
>>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>              rcu_disable_urgency_upon_qs(rdp);
>>>              rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>>>              /* ^^^ Released rnp->lock */
>>> -             if (needwake)
>>> -                     rcu_gp_kthread_wake();
>>> 
>>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
>>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
>>> check of the return value of rcu_accelerate_cbs(), so you still have a
>>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
>>> sure that is worth it.
>>> 
>>> And, if the opinion of system differs, its a bug anyway, so more added risk.
>>> 
>>> 
>>> 
>>>              if (needacc) {
>>>                      rcu_nocb_lock_irqsave(rdp, flags);
>>> 
>>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
>>> anyway, so it is consistent with nocb vs !nocb.
>> 
>> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
>> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
>> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
>> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
>> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
>> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
>> 
>> 
>> That is a fair point. But after gp ends,  rcu_check_quiescent_state()
>> -> note_gp_changes() which will do a accel + GP thread wake up at that
>> point anyway, once it notices that a GP has come to an end. That
>> should happen for both the nocb and !nocb cases right?
> 
> For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
> note_gp_changes().  so also not wakeup gp kthread in note_gp_changes(). 

Yes correct, ok but…
> 
>> 
>> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
>> both cases consistent.
>> 
>> Why does the nocb case need an accel + GP thread wakeup in the
>> rcu_report_qs_rdp() function, but the !nocb case does not?
> 
> For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
> this is an intermediate state.

Sure, I know what the code currently does, I am asking why and it feels wrong.

I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P).

If you are in an intermediate state, part way to a !nocb state — you may have missed a nocb-related accel and wake, correct? Why does that matter? Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway as we clearly know from the discussion. So why do we need to do it if we missed it for the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.

Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice.

Thanks!

 - Joel


>    
> Thanks
> Zqiang
> 
>> 
>> (I am out of office till Monday but will intermittently (maybe) check
>> in, RCU is one of those things that daydreaming tends to lend itself
>> to...)
>> 
>> - Joel
  
Paul E. McKenney Jan. 20, 2023, 8:33 p.m. UTC | #10
On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:
> 
> 
> > On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > 
> > 
> >> 
> >> 
> >>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> >>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> >>>>> grpmask has not been cleared from the corresponding rcu_node structure's
> >>>>> ->qsmask, after that will clear and report quiescent state, but in this
> >>>>> time, this also means that current grace period is not end, the current
> >>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return
> >>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> >>>>> to return true.
> >>>>> 
> >>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> >>>>> 
> >>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >>> 
> >>> Queued (wordsmithed as shown below, as always, please check) for further
> >>> testing and review, thank you both!
> >>> 
> >>>                                                      Thanx, Paul
> >>> 
> >>> ------------------------------------------------------------------------
> >>> 
> >>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> >>> Author: Zqiang <qiang1.zhang@intel.com>
> >>> Date:   Wed Jan 18 15:30:14 2023 +0800
> >>> 
> >>>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> >>> 
> >>>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> >>>    only if there is a grace period in progress that is still blocked
> >>>    by at least one CPU on this rcu_node structure.  This means that
> >>>    rcu_accelerate_cbs() should never return the value true, and thus that
> >>>    this function should never set the needwake variable and in turn never
> >>>    invoke rcu_gp_kthread_wake().
> >>> 
> >>>    This commit therefore removes the needwake variable and the invocation
> >>>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> >>>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> >>>    detect situations where the system's opinion differs from ours.
> >>> 
> >>>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >>>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >>>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >>> 
> >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >>> index b2c2045294780..7a3085ad0a7df 100644
> >>> --- a/kernel/rcu/tree.c
> >>> +++ b/kernel/rcu/tree.c
> >>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >>> {
> >>>      unsigned long flags;
> >>>      unsigned long mask;
> >>> -     bool needwake = false;
> >>>      bool needacc = false;
> >>>      struct rcu_node *rnp;
> >>> 
> >>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >>>               * NOCB kthreads have their own way to deal with that...
> >>>               */
> >>>              if (!rcu_rdp_is_offloaded(rdp)) {
> >>> -                     needwake = rcu_accelerate_cbs(rnp, rdp);
> >>> +                     /*
> >>> +                      * The current GP has not yet ended, so it
> >>> +                      * should not be possible for rcu_accelerate_cbs()
> >>> +                      * to return true.  So complain, but don't awaken.
> >>> +                      */
> >>> +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> >>>              } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> >>>                      /*
> >>>                       * ...but NOCB kthreads may miss or delay callbacks acceleration
> >>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >>>              rcu_disable_urgency_upon_qs(rdp);
> >>>              rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >>>              /* ^^^ Released rnp->lock */
> >>> -             if (needwake)
> >>> -                     rcu_gp_kthread_wake();
> >>> 
> >>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> >>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> >>> check of the return value of rcu_accelerate_cbs(), so you still have a
> >>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> >>> sure that is worth it.
> >>> 
> >>> And, if the opinion of system differs, its a bug anyway, so more added risk.
> >>> 
> >>> 
> >>> 
> >>>              if (needacc) {
> >>>                      rcu_nocb_lock_irqsave(rdp, flags);
> >>> 
> >>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> >>> anyway, so it is consistent with nocb vs !nocb.
> >> 
> >> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
> >> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
> >> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
> >> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
> >> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
> >> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
> >> 
> >> 
> >> That is a fair point. But after gp ends,  rcu_check_quiescent_state()
> >> -> note_gp_changes() which will do a accel + GP thread wake up at that
> >> point anyway, once it notices that a GP has come to an end. That
> >> should happen for both the nocb and !nocb cases right?
> > 
> > For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
> > note_gp_changes().  so also not wakeup gp kthread in note_gp_changes(). 
> 
> Yes correct, ok but…
> > 
> >> 
> >> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
> >> both cases consistent.
> >> 
> >> Why does the nocb case need an accel + GP thread wakeup in the
> >> rcu_report_qs_rdp() function, but the !nocb case does not?
> > 
> > For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
> > this is an intermediate state.
> 
> Sure, I know what the code currently does, I am asking why and it feels wrong.
> 
> I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P).
> 
> If you are in an intermediate state, part way to a !nocb state — you may have missed a nocb-related accel and wake, correct? Why does that matter? Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway as we clearly know from the discussion. So why do we need to do it if we missed it for the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.
> 
> Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice.

To add to Joel's point, an extra unnecessary check on a slow path can
be OK, but missing a necessary check is of course very bad.

Just to make sure that I am following along, here are the options I see:

1.	Status quo.

2.	Zqiang's current patch, as in remove the wakeup and
	add the WARN_ON_ONCE().

3.	Status quo, and only add the WARN_ON_ONCE(), but still
	keep the needless check for the wakeup.

Are there other options that I have missed?

							Thanx, Paul
  
Frederic Weisbecker Jan. 20, 2023, 10:35 p.m. UTC | #11
On Fri, Jan 20, 2023 at 12:33:00PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:
> > 
> > 
> > > On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > > 
> > > 
> > >> 
> > >> 
> > >>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > >>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > >>>>> grpmask has not been cleared from the corresponding rcu_node structure's
> > >>>>> ->qsmask, after that will clear and report quiescent state, but in this
> > >>>>> time, this also means that current grace period is not end, the current
> > >>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return
> > >>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > >>>>> to return true.
> > >>>>> 
> > >>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > >>>>> 
> > >>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > >>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > >>> 
> > >>> Queued (wordsmithed as shown below, as always, please check) for further
> > >>> testing and review, thank you both!
> > >>> 
> > >>>                                                      Thanx, Paul
> > >>> 
> > >>> ------------------------------------------------------------------------
> > >>> 
> > >>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > >>> Author: Zqiang <qiang1.zhang@intel.com>
> > >>> Date:   Wed Jan 18 15:30:14 2023 +0800
> > >>> 
> > >>>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> > >>> 
> > >>>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> > >>>    only if there is a grace period in progress that is still blocked
> > >>>    by at least one CPU on this rcu_node structure.  This means that
> > >>>    rcu_accelerate_cbs() should never return the value true, and thus that
> > >>>    this function should never set the needwake variable and in turn never
> > >>>    invoke rcu_gp_kthread_wake().
> > >>> 
> > >>>    This commit therefore removes the needwake variable and the invocation
> > >>>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> > >>>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> > >>>    detect situations where the system's opinion differs from ours.
> > >>> 
> > >>>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > >>>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > >>>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > >>> 
> > >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > >>> index b2c2045294780..7a3085ad0a7df 100644
> > >>> --- a/kernel/rcu/tree.c
> > >>> +++ b/kernel/rcu/tree.c
> > >>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >>> {
> > >>>      unsigned long flags;
> > >>>      unsigned long mask;
> > >>> -     bool needwake = false;
> > >>>      bool needacc = false;
> > >>>      struct rcu_node *rnp;
> > >>> 
> > >>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >>>               * NOCB kthreads have their own way to deal with that...
> > >>>               */
> > >>>              if (!rcu_rdp_is_offloaded(rdp)) {
> > >>> -                     needwake = rcu_accelerate_cbs(rnp, rdp);
> > >>> +                     /*
> > >>> +                      * The current GP has not yet ended, so it
> > >>> +                      * should not be possible for rcu_accelerate_cbs()
> > >>> +                      * to return true.  So complain, but don't awaken.
> > >>> +                      */
> > >>> +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> > >>>              } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> > >>>                      /*
> > >>>                       * ...but NOCB kthreads may miss or delay callbacks acceleration
> > >>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >>>              rcu_disable_urgency_upon_qs(rdp);
> > >>>              rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > >>>              /* ^^^ Released rnp->lock */
> > >>> -             if (needwake)
> > >>> -                     rcu_gp_kthread_wake();
> > >>> 
> > >>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> > >>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> > >>> check of the return value of rcu_accelerate_cbs(), so you still have a
> > >>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> > >>> sure that is worth it.
> > >>> 
> > >>> And, if the opinion of system differs, its a bug anyway, so more added risk.
> > >>> 
> > >>> 
> > >>> 
> > >>>              if (needacc) {
> > >>>                      rcu_nocb_lock_irqsave(rdp, flags);
> > >>> 
> > >>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> > >>> anyway, so it is consistent with nocb vs !nocb.
> > >> 
> > >> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
> > >> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
> > >> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
> > >> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
> > >> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
> > >> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
> > >> 
> > >> 
> > >> That is a fair point. But after gp ends,  rcu_check_quiescent_state()
> > >> -> note_gp_changes() which will do a accel + GP thread wake up at that
> > >> point anyway, once it notices that a GP has come to an end. That
> > >> should happen for both the nocb and !nocb cases right?
> > > 
> > > For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
> > > note_gp_changes().  so also not wakeup gp kthread in note_gp_changes(). 
> > 
> > Yes correct, ok but…
> > > 
> > >> 
> > >> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
> > >> both cases consistent.
> > >> 
> > >> Why does the nocb case need an accel + GP thread wakeup in the
> > >> rcu_report_qs_rdp() function, but the !nocb case does not?
> > > 
> > > For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
> > > this is an intermediate state.
> > 
> > Sure, I know what the code currently does, I am asking why and it feels wrong.
> > 
> > I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P).
> > 
> > If you are in an intermediate state, part way to a !nocb state — you may have missed a nocb-related accel and wake, correct? Why does that matter? Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway as we clearly know from the discussion. So why do we need to do it if we missed it for the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.
> > 
> > Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice.
> 
> To add to Joel's point, an extra unnecessary check on a slow path can
> be OK, but missing a necessary check is of course very bad.
> 
> Just to make sure that I am following along, here are the options I see:
> 
> 1.	Status quo.
> 
> 2.	Zqiang's current patch, as in remove the wakeup and
> 	add the WARN_ON_ONCE().
> 
> 3.	Status quo, and only add the WARN_ON_ONCE(), but still
> 	keep the needless check for the wakeup.
> 
> Are there other options that I have missed?

I'm personally in favour of keeping 2.
Removing an imaginary path and consolidating an expectation from such
a complicated codebase always makes me able to sleep a few more minutes
everyday :)

Thanks.

> 
> 							Thanx, Paul
  
Frederic Weisbecker Jan. 20, 2023, 11:04 p.m. UTC | #12
On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:
> 
> Sure, I know what the code currently does, I am asking why and it feels wrong.
> 
> I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P).
> 
> If you are in an intermediate state, part way to a !nocb state — you may have
> missed a nocb-related accel and wake, correct? Why does that matter? Once we
> transition to a !nocb state, we do not do a post-qs-report accel+wake anyway
> as we clearly know from the discussion.

I'm confused. We are doing that acceleration on qs report for !nocb CPU, right?

> So why do we need to do it if we
> missed it for the intermediate stage? So, I am not fully sure yet what that
> needac is doing and why it is needed.

To summarize:

* If the CPU is NOCB, all the callbacks advance and acceleration is performed
  by the rcuo/rcuog kthreads.

* If the CPU is not NOCB, all the callbacks acceleration is performed by the
  CPU, such as in the case of rcu_report_qs_rdp().

* If the CPU is transitionning from NOCB to !NOCB or from !NOCB to NOCB, the
  kthreads may not be available to do the advance/acceleration, so we must do
  it locally. That's the needacc path.

What am I missing?

Thanks.
  
Paul E. McKenney Jan. 20, 2023, 11:20 p.m. UTC | #13
On Fri, Jan 20, 2023 at 11:35:59PM +0100, Frederic Weisbecker wrote:
> On Fri, Jan 20, 2023 at 12:33:00PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:
> > > 
> > > 
> > > > On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > > > 
> > > > 
> > > >> 
> > > >> 
> > > >>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > >>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > >>>>> grpmask has not been cleared from the corresponding rcu_node structure's
> > > >>>>> ->qsmask, after that will clear and report quiescent state, but in this
> > > >>>>> time, this also means that current grace period is not end, the current
> > > >>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > >>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > >>>>> to return true.
> > > >>>>> 
> > > >>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > > >>>>> 
> > > >>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > >>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > > >>> 
> > > >>> Queued (wordsmithed as shown below, as always, please check) for further
> > > >>> testing and review, thank you both!
> > > >>> 
> > > >>>                                                      Thanx, Paul
> > > >>> 
> > > >>> ------------------------------------------------------------------------
> > > >>> 
> > > >>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > > >>> Author: Zqiang <qiang1.zhang@intel.com>
> > > >>> Date:   Wed Jan 18 15:30:14 2023 +0800
> > > >>> 
> > > >>>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> > > >>> 
> > > >>>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> > > >>>    only if there is a grace period in progress that is still blocked
> > > >>>    by at least one CPU on this rcu_node structure.  This means that
> > > >>>    rcu_accelerate_cbs() should never return the value true, and thus that
> > > >>>    this function should never set the needwake variable and in turn never
> > > >>>    invoke rcu_gp_kthread_wake().
> > > >>> 
> > > >>>    This commit therefore removes the needwake variable and the invocation
> > > >>>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> > > >>>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> > > >>>    detect situations where the system's opinion differs from ours.
> > > >>> 
> > > >>>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > >>>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > > >>>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > >>> 
> > > >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > >>> index b2c2045294780..7a3085ad0a7df 100644
> > > >>> --- a/kernel/rcu/tree.c
> > > >>> +++ b/kernel/rcu/tree.c
> > > >>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > > >>> {
> > > >>>      unsigned long flags;
> > > >>>      unsigned long mask;
> > > >>> -     bool needwake = false;
> > > >>>      bool needacc = false;
> > > >>>      struct rcu_node *rnp;
> > > >>> 
> > > >>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > > >>>               * NOCB kthreads have their own way to deal with that...
> > > >>>               */
> > > >>>              if (!rcu_rdp_is_offloaded(rdp)) {
> > > >>> -                     needwake = rcu_accelerate_cbs(rnp, rdp);
> > > >>> +                     /*
> > > >>> +                      * The current GP has not yet ended, so it
> > > >>> +                      * should not be possible for rcu_accelerate_cbs()
> > > >>> +                      * to return true.  So complain, but don't awaken.
> > > >>> +                      */
> > > >>> +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> > > >>>              } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> > > >>>                      /*
> > > >>>                       * ...but NOCB kthreads may miss or delay callbacks acceleration
> > > >>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > > >>>              rcu_disable_urgency_upon_qs(rdp);
> > > >>>              rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > > >>>              /* ^^^ Released rnp->lock */
> > > >>> -             if (needwake)
> > > >>> -                     rcu_gp_kthread_wake();
> > > >>> 
> > > >>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> > > >>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> > > >>> check of the return value of rcu_accelerate_cbs(), so you still have a
> > > >>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> > > >>> sure that is worth it.
> > > >>> 
> > > >>> And, if the opinion of system differs, its a bug anyway, so more added risk.
> > > >>> 
> > > >>> 
> > > >>> 
> > > >>>              if (needacc) {
> > > >>>                      rcu_nocb_lock_irqsave(rdp, flags);
> > > >>> 
> > > >>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> > > >>> anyway, so it is consistent with nocb vs !nocb.
> > > >> 
> > > >> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
> > > >> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
> > > >> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
> > > >> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
> > > >> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
> > > >> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
> > > >> 
> > > >> 
> > > >> That is a fair point. But after gp ends,  rcu_check_quiescent_state()
> > > >> -> note_gp_changes() which will do a accel + GP thread wake up at that
> > > >> point anyway, once it notices that a GP has come to an end. That
> > > >> should happen for both the nocb and !nocb cases right?
> > > > 
> > > > For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
> > > > note_gp_changes().  so also not wakeup gp kthread in note_gp_changes(). 
> > > 
> > > Yes correct, ok but…
> > > > 
> > > >> 
> > > >> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
> > > >> both cases consistent.
> > > >> 
> > > >> Why does the nocb case need an accel + GP thread wakeup in the
> > > >> rcu_report_qs_rdp() function, but the !nocb case does not?
> > > > 
> > > > For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
> > > > this is an intermediate state.
> > > 
> > > Sure, I know what the code currently does, I am asking why and it feels wrong.
> > > 
> > > I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P).
> > > 
> > > If you are in an intermediate state, part way to a !nocb state — you may have missed a nocb-related accel and wake, correct? Why does that matter? Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway as we clearly know from the discussion. So why do we need to do it if we missed it for the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.
> > > 
> > > Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice.
> > 
> > To add to Joel's point, an extra unnecessary check on a slow path can
> > be OK, but missing a necessary check is of course very bad.
> > 
> > Just to make sure that I am following along, here are the options I see:
> > 
> > 1.	Status quo.
> > 
> > 2.	Zqiang's current patch, as in remove the wakeup and
> > 	add the WARN_ON_ONCE().
> > 
> > 3.	Status quo, and only add the WARN_ON_ONCE(), but still
> > 	keep the needless check for the wakeup.
> > 
> > Are there other options that I have missed?
> 
> I'm personally in favour of keeping 2.
> Removing an imaginary path and consolidating an expectation from such
> a complicated codebase always makes me able to sleep a few more minutes
> everyday :)

Excellent point, thank you!

							Thanx, Paul
  
Zqiang Jan. 21, 2023, 12:38 a.m. UTC | #14
> On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> 
> 
>> 
>> 
>>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
>>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
>>>>> grpmask has not been cleared from the corresponding rcu_node structure's
>>>>> ->qsmask, after that will clear and report quiescent state, but in this
>>>>> time, this also means that current grace period is not end, the current
>>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return
>>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
>>>>> to return true.
>>>>> 
>>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
>>>>> 
>>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>> 
>>> Queued (wordsmithed as shown below, as always, please check) for further
>>> testing and review, thank you both!
>>> 
>>>                                                      Thanx, Paul
>>> 
>>> ------------------------------------------------------------------------
>>> 
>>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
>>> Author: Zqiang <qiang1.zhang@intel.com>
>>> Date:   Wed Jan 18 15:30:14 2023 +0800
>>> 
>>>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>>> 
>>>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>>>    only if there is a grace period in progress that is still blocked
>>>    by at least one CPU on this rcu_node structure.  This means that
>>>    rcu_accelerate_cbs() should never return the value true, and thus that
>>>    this function should never set the needwake variable and in turn never
>>>    invoke rcu_gp_kthread_wake().
>>> 
>>>    This commit therefore removes the needwake variable and the invocation
>>>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>>>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>>>    detect situations where the system's opinion differs from ours.
>>> 
>>>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> 
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index b2c2045294780..7a3085ad0a7df 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>> {
>>>      unsigned long flags;
>>>      unsigned long mask;
>>> -     bool needwake = false;
>>>      bool needacc = false;
>>>      struct rcu_node *rnp;
>>> 
>>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>               * NOCB kthreads have their own way to deal with that...
>>>               */
>>>              if (!rcu_rdp_is_offloaded(rdp)) {
>>> -                     needwake = rcu_accelerate_cbs(rnp, rdp);
>>> +                     /*
>>> +                      * The current GP has not yet ended, so it
>>> +                      * should not be possible for rcu_accelerate_cbs()
>>> +                      * to return true.  So complain, but don't awaken.
>>> +                      */
>>> +                     WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>>>              } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>>>                      /*
>>>                       * ...but NOCB kthreads may miss or delay callbacks acceleration
>>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>              rcu_disable_urgency_upon_qs(rdp);
>>>              rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>>>              /* ^^^ Released rnp->lock */
>>> -             if (needwake)
>>> -                     rcu_gp_kthread_wake();
>>> 
>>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
>>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
>>> check of the return value of rcu_accelerate_cbs(), so you still have a
>>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
>>> sure that is worth it.
>>> 
>>> And, if the opinion of system differs, its a bug anyway, so more added risk.
>>> 
>>> 
>>> 
>>>              if (needacc) {
>>>                      rcu_nocb_lock_irqsave(rdp, flags);
>>> 
>>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
>>> anyway, so it is consistent with nocb vs !nocb.
>> 
>> For !nocb, we invoked rcu_accelerate_cbs() before report qs,  so this GP is impossible to end
>> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
>> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
>> has reported qs,  if all CPU have been reported qs,  we will wakeup gp kthread to end this GP in
>> rcu_report_qs_rnp().   after that, the rcu_accelerate_cbs_unlocked() is  possible to try to wake up
>> gp kthread if this GP has ended at this time.   so nocb vs !nocb is likely to be inconsistent.
>> 
>> 
>> That is a fair point. But after gp ends,  rcu_check_quiescent_state()
>> -> note_gp_changes() which will do a accel + GP thread wake up at that
>> point anyway, once it notices that a GP has come to an end. That
>> should happen for both the nocb and !nocb cases right?
> 
> For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
> note_gp_changes().  so also not wakeup gp kthread in note_gp_changes(). 
>
>Yes correct, ok but…
> 
>> 
>> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
>> both cases consistent.
>> 
>> Why does the nocb case need an accel + GP thread wakeup in the
>> rcu_report_qs_rdp() function, but the !nocb case does not?
> 
> For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
> this is an intermediate state.
>
>Sure, I know what the code currently does, I am asking why and it feels wrong.
>
>I suggest you slightly change your approach to not assuming the code should be bonafide 
>correct and then fixing it (which is ok once in a while), and asking higher level questions
>to why things are the way they are in the first place (that is just my suggestion and I am not in
>a place to provide advice, far from it, but I am just telling you my approach — I care more about
>the code than increasing my patch count :P).
>

Thanks Joel, this is a useful suggestion for me.

>
>If you are in an intermediate state, part way to a !nocb state — 
>you may have missed a nocb-related accel and wake, correct? Why does that matter? 
>Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway

Should it be transition to a !nocb state, we do a post-qs-report accel+wake.

>as we clearly know from the discussion. So why do we need to do it if we missed it for
>the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.

For de-offloading, when in the process of de-offloading(rcu_segcblist_completely_offloaded() return false),
we're not doing bypass even though this rdp is offloaded state(rcu_rdp_is_offloaded(rdp) return true),
at this time, the rcuog kthread probably won't accel+wakeup, so we do accel+wakeup in rcu_report_qs_rdp(),
as you say why does that matter?  for !nocb state,  we've always tried to accel+wakeup as
much as possible(compared to nocb), let rcu callback be executed as soon as possible.

This is just my personal opinion, please correct me if I am wrong.

Thanks
Zqiang


>
>Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice.
>
>Thanks!
>
> - Joel


>    
> Thanks
> Zqiang
> 
>> 
>> (I am out of office till Monday but will intermittently (maybe) check
>> in, RCU is one of those things that daydreaming tends to lend itself
>> to...)
>> 
>> - Joel
  
Joel Fernandes Jan. 23, 2023, 3:22 p.m. UTC | #15
Hi Frederic,

On Fri, Jan 20, 2023 at 6:04 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:
> >
> > Sure, I know what the code currently does, I am asking why and it feels wrong.
> >
> > I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P).
> >
> > If you are in an intermediate state, part way to a !nocb state — you may have
> > missed a nocb-related accel and wake, correct? Why does that matter? Once we
> > transition to a !nocb state, we do not do a post-qs-report accel+wake anyway
> > as we clearly know from the discussion.
>
> I'm confused. We are doing that acceleration on qs report for !nocb CPU, right?
>
> > So why do we need to do it if we
> > missed it for the intermediate stage? So, I am not fully sure yet what that
> > needac is doing and why it is needed.
>
> To summarize:
>
> * If the CPU is NOCB, all the callbacks advance and acceleration is performed
>   by the rcuo/rcuog kthreads.
>
> * If the CPU is not NOCB, all the callbacks acceleration is performed by the
>   CPU, such as in the case of rcu_report_qs_rdp().
>
> * If the CPU is transitionning from NOCB to !NOCB or from !NOCB to NOCB, the
>   kthreads may not be available to do the advance/acceleration, so we must do
>   it locally. That's the needacc path.

Sure, I agree it "must be done locally" for the benefit of the
half-way transition.

> What am I missing?

That the acceleration is also done by __note_gp_changes() once the
grace period ends anyway, so if any acceleration was missed as you
say, it will be done anyway.

Also it is done by scheduler tick raising softirq:

rcu_pending() does this:
        /* Has RCU gone idle with this CPU needing another grace period? */
        if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
            !rcu_rdp_is_offloaded(rdp) &&
            !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
                return 1;

and rcu_core():
        /* No grace period and unregistered callbacks? */
        if (!rcu_gp_in_progress() &&
            rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
                rcu_nocb_lock_irqsave(rdp, flags);
                if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
                        rcu_accelerate_cbs_unlocked(rnp, rdp);
                rcu_nocb_unlock_irqrestore(rdp, flags);
        }

So, I am not sure if you need needacc at all. Those CBs that have not
been assigned grace period numbers will be taken care off :)

Thanks!

  -Joel
  
Frederic Weisbecker Jan. 23, 2023, 4:27 p.m. UTC | #16
On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
> > What am I missing?
> 
> That the acceleration is also done by __note_gp_changes() once the
> grace period ends anyway, so if any acceleration was missed as you
> say, it will be done anyway.
> 
> Also it is done by scheduler tick raising softirq:
> 
> rcu_pending() does this:
>         /* Has RCU gone idle with this CPU needing another grace period? */
>         if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
>             !rcu_rdp_is_offloaded(rdp) &&
>             !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
>                 return 1;
> 
> and rcu_core():
>         /* No grace period and unregistered callbacks? */
>         if (!rcu_gp_in_progress() &&
>             rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
>                 rcu_nocb_lock_irqsave(rdp, flags);
>                 if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
>                         rcu_accelerate_cbs_unlocked(rnp, rdp);
>                 rcu_nocb_unlock_irqrestore(rdp, flags);
>         }
> 
> So, I am not sure if you need needacc at all. Those CBs that have not
> been assigned grace period numbers will be taken care off :)

But that's only when there is no grace period pending, so it can't happen while
we report a QS.

OTOH without the needacc, those callbacks waiting to be accelerated would be
eventually processed but only on the next tick following the end of a grace
period...if none has started since then. So if someone else starts a new GP
before the current CPU, we must wait another GP, etc...

That's potentially dangerous.

And unfortunately we can't do the acceleration from __note_gp_changes() due
to lock ordering restrictions: nocb_lock -> rnp_lock

> 
> Thanks!
> 
>   -Joel
  
Joel Fernandes Jan. 23, 2023, 8:54 p.m. UTC | #17
> On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote:
> 
> On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
>>> What am I missing?
>> 
>> That the acceleration is also done by __note_gp_changes() once the
>> grace period ends anyway, so if any acceleration was missed as you
>> say, it will be done anyway.
>> 
>> Also it is done by scheduler tick raising softirq:
>> 
>> rcu_pending() does this:
>>        /* Has RCU gone idle with this CPU needing another grace period? */
>>        if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
>>            !rcu_rdp_is_offloaded(rdp) &&
>>            !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
>>                return 1;
>> 
>> and rcu_core():
>>        /* No grace period and unregistered callbacks? */
>>        if (!rcu_gp_in_progress() &&
>>            rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
>>                rcu_nocb_lock_irqsave(rdp, flags);
>>                if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
>>                        rcu_accelerate_cbs_unlocked(rnp, rdp);
>>                rcu_nocb_unlock_irqrestore(rdp, flags);
>>        }
>> 
>> So, I am not sure if you need needacc at all. Those CBs that have not
>> been assigned grace period numbers will be taken care off :)
> 
> But that's only when there is no grace period pending, so it can't happen while
> we report a QS.
> 
> OTOH without the needacc, those callbacks waiting to be accelerated would be
> eventually processed but only on the next tick following the end of a grace
> period...if none has started since then. So if someone else starts a new GP
> before the current CPU, we must wait another GP, etc...
> 
> That's potentially dangerous.

Waiting for just one more GP cannot be dangerous IMO. Anyway there is no guarantee that callback will run immediately at end of GP, there may be one or more GPs before callback can run, if I remember correctly. That is by design.. but please correct me if my understanding is different from yours.

> 
> And unfortunately we can't do the acceleration from __note_gp_changes() due
> to lock ordering restrictions: nocb_lock -> rnp_lock
> 

Ah. This part I am not sure. Appreciate if point me to any old archive links or documentation detailing that, if possible… 

Thanks!

- Joel

>> 
>> Thanks!
>> 
>> -Joel
  
Frederic Weisbecker Jan. 23, 2023, 9:11 p.m. UTC | #18
On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote:
> > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote:
> > 
> > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
> >>> What am I missing?
> >> 
> >> That the acceleration is also done by __note_gp_changes() once the
> >> grace period ends anyway, so if any acceleration was missed as you
> >> say, it will be done anyway.
> >> 
> >> Also it is done by scheduler tick raising softirq:
> >> 
> >> rcu_pending() does this:
> >>        /* Has RCU gone idle with this CPU needing another grace period? */
> >>        if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> >>            !rcu_rdp_is_offloaded(rdp) &&
> >>            !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> >>                return 1;
> >> 
> >> and rcu_core():
> >>        /* No grace period and unregistered callbacks? */
> >>        if (!rcu_gp_in_progress() &&
> >>            rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
> >>                rcu_nocb_lock_irqsave(rdp, flags);
> >>                if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> >>                        rcu_accelerate_cbs_unlocked(rnp, rdp);
> >>                rcu_nocb_unlock_irqrestore(rdp, flags);
> >>        }
> >> 
> >> So, I am not sure if you need needacc at all. Those CBs that have not
> >> been assigned grace period numbers will be taken care off :)
> > 
> > But that's only when there is no grace period pending, so it can't happen while
> > we report a QS.
> > 
> > OTOH without the needacc, those callbacks waiting to be accelerated would be
> > eventually processed but only on the next tick following the end of a grace
> > period...if none has started since then. So if someone else starts a new GP
> > before the current CPU, we must wait another GP, etc...
> > 
> > That's potentially dangerous.
> 
> Waiting for just one more GP cannot be dangerous IMO. Anyway there is no
> guarantee that callback will run immediately at end of GP, there may be one or
> more GPs before callback can run, if I remember correctly. That is by
> design.. but please correct me if my understanding is different from yours.

It's not bound to just one GP. If you have N CPUs flooding callbacks for a
long while, your CPU has 1/N chances to be the one starting the next GP on
each turn. Relying on the acceleration to be performed only when no GP is
running may theoretically starve your callbacks forever.

> > And unfortunately we can't do the acceleration from __note_gp_changes() due
> > to lock ordering restrictions: nocb_lock -> rnp_lock
> > 
> 
> Ah. This part I am not sure. Appreciate if point me to any old archive links or documentation detailing that, if possible… 

It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has that
locking order for example.

Excerpt:

	rcu_nocb_lock_irqsave(rdp, flags);
	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
	}

Thanks.
  
Joel Fernandes Jan. 24, 2023, 4:58 p.m. UTC | #19
Hi Frederic,

On Mon, Jan 23, 2023 at 10:11:15PM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote:
> > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote:
> > > 
> > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
> > >>> What am I missing?
> > >> 
> > >> That the acceleration is also done by __note_gp_changes() once the
> > >> grace period ends anyway, so if any acceleration was missed as you
> > >> say, it will be done anyway.
> > >> 
> > >> Also it is done by scheduler tick raising softirq:
> > >> 
> > >> rcu_pending() does this:
> > >>        /* Has RCU gone idle with this CPU needing another grace period? */
> > >>        if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> > >>            !rcu_rdp_is_offloaded(rdp) &&
> > >>            !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >>                return 1;
> > >> 
> > >> and rcu_core():
> > >>        /* No grace period and unregistered callbacks? */
> > >>        if (!rcu_gp_in_progress() &&
> > >>            rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
> > >>                rcu_nocb_lock_irqsave(rdp, flags);
> > >>                if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >>                        rcu_accelerate_cbs_unlocked(rnp, rdp);
> > >>                rcu_nocb_unlock_irqrestore(rdp, flags);
> > >>        }
> > >> 
> > >> So, I am not sure if you need needacc at all. Those CBs that have not
> > >> been assigned grace period numbers will be taken care off :)
> > > 
> > > But that's only when there is no grace period pending, so it can't happen while
> > > we report a QS.
> > > 
> > > OTOH without the needacc, those callbacks waiting to be accelerated would be
> > > eventually processed but only on the next tick following the end of a grace
> > > period...if none has started since then. So if someone else starts a new GP
> > > before the current CPU, we must wait another GP, etc...
> > > 
> > > That's potentially dangerous.
> > 
> > Waiting for just one more GP cannot be dangerous IMO. Anyway there is no
> > guarantee that callback will run immediately at end of GP, there may be one or
> > more GPs before callback can run, if I remember correctly. That is by
> > design.. but please correct me if my understanding is different from yours.
> 
> It's not bound to just one GP. If you have N CPUs flooding callbacks for a
> long while, your CPU has 1/N chances to be the one starting the next GP on
> each turn. Relying on the acceleration to be performed only when no GP is
> running may theoretically starve your callbacks forever.

I tend to agree with you, however I was mostly referring to the "needac". The
theoretical case is even more theoretical because you have to be half way
through the transition _and_ flooding at the same time such that there is not
a chance for RCU to be idle for a long period of time (which we don't really
see in our systems per-se). But I agree, even if theoretical, maybe better to
handle it.

> > > And unfortunately we can't do the acceleration from __note_gp_changes()
> > > due to lock ordering restrictions: nocb_lock -> rnp_lock
> > > 

Yeah I was more going for the fact that if the offload to deoffload
transition completed before note_gp_changes() was called, which obviously we
cannot assume...

But yeah for the 'limbo between offload to not offload state', eventually RCU
goes idle, the scheduling clock interrupt (via rcu_pending()) will raise
softirq or note_gp_changes() will accelerate. Which does not help the
theoretical flooding case above, but still...

> > 
> > Ah. This part I am not sure. Appreciate if point me to any old archive
> > links or documentation detailing that, if possible… 
> 
> It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has
> that locking order for example.
> 
> Excerpt:
> 
> 	rcu_nocb_lock_irqsave(rdp, flags); if (rcu_segcblist_nextgp(cblist,
> 	&cur_gp_seq) && rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> 	raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> 	needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
> 	raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ }
> 
> Thanks.

Ah I know what you mean now, in other words there is a chance of an ABBA
deadlock if we do not acquire the locks in the correct order. Thank you for
clarifying. Indeed rcutree_migrate_callbacks() also confirms it.

I guess to Frederic's point, another case other than the CB flooding, that can
starve CB accelerations is if the system is in the 'limbo between offload and
de-offload' state for a long period of time. In such a situation also, neither
note_gp_changes(), nor the scheduling-clock interrupt via softirq will help
accelerate CBs.

Whether such a limbo state is possible indefinitely, I am not sure...

Thanks a lot for the discussions and it is always a learning experience
talking about these...!

thanks,

 - Joel
  
Joel Fernandes Jan. 24, 2023, 5:10 p.m. UTC | #20
On Sat, Jan 21, 2023 at 12:38:35AM +0000, Zhang, Qiang1 wrote:
[...]
> >If you are in an intermediate state, part way to a !nocb state — 
> >you may have missed a nocb-related accel and wake, correct? Why does that matter? 
> >Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway
> 
> Should it be transition to a !nocb state, we do a post-qs-report accel+wake.
> 
> >as we clearly know from the discussion. So why do we need to do it if we missed it for
> >the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.
> 
> For de-offloading, when in the process of
> de-offloading(rcu_segcblist_completely_offloaded() return false), we're not
> doing bypass even though this rdp is offloaded
> state(rcu_rdp_is_offloaded(rdp) return true), at this time, the rcuog
> kthread probably won't accel+wakeup, so we do accel+wakeup in
> rcu_report_qs_rdp(), as you say why does that matter?  for !nocb state,
> we've always tried to accel+wakeup as much as possible(compared to nocb),
> let rcu callback be executed as soon as possible.
> 
> This is just my personal opinion, please correct me if I am wrong.

I think your opinion is correct. The acceleration after the QS reporting may
be needed for the case where we are part way between the offload and
de-offload state as in this state (which we could call limbo state), there
does not seem to be anywhere that acceleration is performed, and if this state
persists for a long period of time, then no other path can accelerate the CBs
thus likely starving them as Frederic mentioned..

thanks,

 - Joel


> Thanks
> Zqiang
> 
> 
> >
> >Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice.
> >
> >Thanks!
> >
> > - Joel
> 
> 
> >    
> > Thanks
> > Zqiang
> > 
> >> 
> >> (I am out of office till Monday but will intermittently (maybe) check
> >> in, RCU is one of those things that daydreaming tends to lend itself
> >> to...)
> >> 
> >> - Joel
  

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b2c204529478..0962c2202d45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1956,7 +1956,6 @@  rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
 	bool needacc = false;
 	struct rcu_node *rnp;
 
@@ -1988,7 +1987,12 @@  rcu_report_qs_rdp(struct rcu_data *rdp)
 		 * NOCB kthreads have their own way to deal with that...
 		 */
 		if (!rcu_rdp_is_offloaded(rdp)) {
-			needwake = rcu_accelerate_cbs(rnp, rdp);
+			/*
+			 * Current GP does not end, invoke rcu_gp_in_progress()
+			 * will return true and so doesn't wake up GP kthread to
+			 * start a new GP.
+			 */
+			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
 		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 			/*
 			 * ...but NOCB kthreads may miss or delay callbacks acceleration
@@ -2000,8 +2004,6 @@  rcu_report_qs_rdp(struct rcu_data *rdp)
 		rcu_disable_urgency_upon_qs(rdp);
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		/* ^^^ Released rnp->lock */
-		if (needwake)
-			rcu_gp_kthread_wake();
 
 		if (needacc) {
 			rcu_nocb_lock_irqsave(rdp, flags);