srcu: Add detection of boot CPU srcu_data structure's->srcu_cblist

Message ID 20221117112050.3942407-1-qiang1.zhang@intel.com
State New
Series srcu: Add detection of boot CPU srcu_data structure's->srcu_cblist |

Commit Message

Zqiang Nov. 17, 2022, 11:20 a.m. UTC
  Before the SRCU_SIZE_WAIT_CALL srcu_size_state is reached, srcu
callbacks are inserted only into the boot CPU's srcu_data structure's
->srcu_cblist; every other CPU's srcu_data structure's ->srcu_cblist
remains empty. So before SRCU_SIZE_WAIT_CALL is reached,
srcu_might_be_idle() needs to check the boot CPU's pending callbacks
to give a correct answer.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 kernel/rcu/srcutree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
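
For context, here is a rough sketch of the enqueue-side CPU selection
that the commit message relies on (a paraphrase of
srcu_gp_start_if_needed() in kernel/rcu/srcutree.c, not a verbatim
excerpt; "rhp" stands in for the callback's rcu_head pointer):

	/*
	 * Sketch: before ->srcu_size_state reaches SRCU_SIZE_WAIT_CALL,
	 * every srcu callback is queued on the boot CPU's srcu_data
	 * structure, so only that CPU's ->srcu_cblist can be non-empty.
	 */
	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());	/* boot CPU */
	else
		sdp = raw_cpu_ptr(ssp->sda);			/* this CPU */
	spin_lock_irqsave_rcu_node(sdp, flags);
	rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
	spin_unlock_irqrestore_rcu_node(sdp, flags);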
  

Comments

Frederic Weisbecker Nov. 17, 2022, 2:20 p.m. UTC | #1
On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> and other CPU srcu_data structure's->srcu_cblist is always empty. so
> before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> pend cbs in srcu_might_be_idle().
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  kernel/rcu/srcutree.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 6af031200580..6d9af9901765 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
>  	unsigned long tlast;
>  
>  	check_init_srcu_struct(ssp);
> -	/* If the local srcu_data structure has callbacks, not idle.  */
> -	sdp = raw_cpu_ptr(ssp->sda);
> +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> +	else
> +		sdp = raw_cpu_ptr(ssp->sda);

While at it if someone is interested in documenting/commenting on the meaning of
all these SRCU_SIZE_* things, it would be much appreciated!

Thanks.

>  	spin_lock_irqsave_rcu_node(sdp, flags);
>  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
>  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> -- 
> 2.25.1
>
  
Zqiang Nov. 18, 2022, 4:53 a.m. UTC | #2
On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> and other CPU srcu_data structure's->srcu_cblist is always empty. so
> before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> pend cbs in srcu_might_be_idle().
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  kernel/rcu/srcutree.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 6af031200580..6d9af9901765 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
>  	unsigned long tlast;
>  
>  	check_init_srcu_struct(ssp);
> -	/* If the local srcu_data structure has callbacks, not idle.  */
> -	sdp = raw_cpu_ptr(ssp->sda);
> +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> +	else
> +		sdp = raw_cpu_ptr(ssp->sda);
>
>While at it if someone is interested in documenting/commenting on the meaning of
>all these SRCU_SIZE_* things, it would be much appreciated!

In the initial stage (SRCU_SIZE_SMALL) there are no srcu_node structures, only the
initialized per-CPU srcu_data structures, and only the boot CPU's srcu_data
structure's ->srcu_cblist is used to store srcu callbacks.
If contention on the srcu_struct's or srcu_data structure's ->lock becomes heavy,
we transition to SRCU_SIZE_ALLOC, and memory for the srcu_node tree is allocated
at the end of an SRCU grace period.
If the allocation succeeds, we transition to SRCU_SIZE_WAIT_BARRIER. In this state
srcu_barrier() iterates over all possible CPUs, but callbacks are still stored only
on the boot CPU's ->srcu_cblist. A task calling call_srcu() may or may not yet see
the new srcu_node structures, and because call_srcu() is itself protected by SRCU
we cannot transition to SRCU_SIZE_WAIT_CALL right away; we have to wait in this
state for an SRCU grace period to end.
After the transition to SRCU_SIZE_WAIT_CALL, srcu callbacks are enqueued onto each
CPU's own srcu_data structure's ->srcu_cblist.
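
A compact summary of the progression described above (state names as
used in kernel/rcu/srcutree.c; the comments are a paraphrase of this
description, not authoritative documentation):

	/*
	 * SRCU_SIZE_SMALL        - no srcu_node tree; callbacks are queued
	 *                          only on the boot CPU's ->srcu_cblist.
	 * SRCU_SIZE_ALLOC        - lock contention detected; the srcu_node
	 *                          tree is allocated at the end of an SRCU
	 *                          grace period.
	 * SRCU_SIZE_WAIT_BARRIER - srcu_node tree exists; srcu_barrier()
	 *                          iterates over all possible CPUs, but
	 *                          call_srcu() still queues on the boot CPU.
	 *                          Wait one SRCU grace period so in-flight
	 *                          call_srcu() callers settle.
	 * SRCU_SIZE_WAIT_CALL    - from here on, call_srcu() enqueues onto
	 *                          each CPU's own ->srcu_cblist.
	 */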

Does this description help make the commit message more detailed?

Thanks
Zqiang




>
>Thanks.
>
>  	spin_lock_irqsave_rcu_node(sdp, flags);
>  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
>  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> -- 
> 2.25.1
>
  
Zqiang Nov. 23, 2022, 2:01 a.m. UTC | #3
>On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> and other CPU srcu_data structure's->srcu_cblist is always empty. so
> before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> pend cbs in srcu_might_be_idle().
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  kernel/rcu/srcutree.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 6af031200580..6d9af9901765 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
>  	unsigned long tlast;
>  
>  	check_init_srcu_struct(ssp);
> -	/* If the local srcu_data structure has callbacks, not idle.  */
> -	sdp = raw_cpu_ptr(ssp->sda);
> +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> +	else
> +		sdp = raw_cpu_ptr(ssp->sda);
>

Hi Paul,  

With the default convert_to_big configuration (SRCU_SIZING_AUTO), I found that SRCU
stays in the SRCU_SIZE_SMALL state most of the time on the system, so I think this
change is meaningful.

Thoughts?

Thanks
Zqiang


>While at it if someone is interested in documenting/commenting on the meaning of
>all these SRCU_SIZE_* things, it would be much appreciated!
>
>In the initial stage(SRCU_SIZE_SMALL), there is no srcu_node structures, only initialized
>per-CPU srcu_data structures, at this time, we only use boot-cpu srcu_data structures's ->srcu_cblist
>to store srcu callbacks.  
>if the contention of srcu_struct and srcu_data structure's->lock become busy,
>transition to SRCU_SIZE_ALLOC.  allocated memory for srcu_node structure at end of the SRCU
>grace period.   
>if allocated success,  transition to SRCU_SIZE_WAIT_BARRIER,  in this state, invoke
>srcu_barrier() will iterate over all possible CPUs, but we still only use boot-CPU srcu_cblist to store callbacks.
>the task calling call_srcu() may have access to a new srcu_node structure or may not, 
>because call_srcu() is protected by SRCU, we may not transition to SRCU_SIZE_WAIT_CALL quickly,
>need to wait in this state for a SRCU grace period to end.
>After transition to SRCU_SIZE_WAIT_CALL, we enqueue srcu callbacks to own srcu_data structures's ->srcu_cblist.
>
>Does my description make my commit more detailed?
>
>Thanks
>Zqiang
>
>


>
>Thanks.
>
>  	spin_lock_irqsave_rcu_node(sdp, flags);
>  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
>  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> -- 
> 2.25.1
>
  
Paul E. McKenney Nov. 23, 2022, 7:04 p.m. UTC | #4
On Wed, Nov 23, 2022 at 02:01:50AM +0000, Zhang, Qiang1 wrote:
> 
> >On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> > Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> > callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> > and other CPU srcu_data structure's->srcu_cblist is always empty. so
> > before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> > pend cbs in srcu_might_be_idle().
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > ---
> >  kernel/rcu/srcutree.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 6af031200580..6d9af9901765 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> >  	unsigned long tlast;
> >  
> >  	check_init_srcu_struct(ssp);
> > -	/* If the local srcu_data structure has callbacks, not idle.  */
> > -	sdp = raw_cpu_ptr(ssp->sda);
> > +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> > +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> > +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> > +	else
> > +		sdp = raw_cpu_ptr(ssp->sda);
> >
> 
> Hi Paul,  
> 
> For the convert_to_big  default configuration(SRCU_SIZING_AUTO), I found that the srcu is in
> SRCU_SIZE_SMALL state most of the time in the system, I think this change is meaningful.
> 
> Thoughts ?

You are right that this change might make srcu_might_be_idle() return a
more accurate value in the common case.  As things are now, some other
CPU might just now have added a callback, but might not yet have started
that new grace period.  In that case, we might expedite an SRCU grace
period when we would not have otherwise done so.  However, this change
would also increase contention on the get_boot_cpu_id() CPU's srcu_data
structure's ->lock.

So the result of that inaccurate return value is that the first two SRCU
grace periods in a burst might be expedited instead of only the first one,
and even then only if we hit a very narrow race window.

Besides, this same thing can happen if two CPUs do synchronize_srcu()
at the same time after a long-enough pause between grace periods.
Both see no callbacks, so both ask for an expedited grace period.
So again, the first two grace periods are expedited.

As a result, I am having a very hard time justifying the increased
lock contention.

Am I missing something here?

							Thanx, Paul

> Thanks
> Zqiang
> 
> 
> >While at it if someone is interested in documenting/commenting on the meaning of
> >all these SRCU_SIZE_* things, it would be much appreciated!
> >
> >In the initial stage(SRCU_SIZE_SMALL), there is no srcu_node structures, only initialized
> >per-CPU srcu_data structures, at this time, we only use boot-cpu srcu_data structures's ->srcu_cblist
> >to store srcu callbacks.  
> >if the contention of srcu_struct and srcu_data structure's->lock become busy,
> >transition to SRCU_SIZE_ALLOC.  allocated memory for srcu_node structure at end of the SRCU
> >grace period.   
> >if allocated success,  transition to SRCU_SIZE_WAIT_BARRIER,  in this state, invoke
> >srcu_barrier() will iterate over all possible CPUs, but we still only use boot-CPU srcu_cblist to store callbacks.
> >the task calling call_srcu() may have access to a new srcu_node structure or may not, 
> >because call_srcu() is protected by SRCU, we may not transition to SRCU_SIZE_WAIT_CALL quickly,
> >need to wait in this state for a SRCU grace period to end.
> >After transition to SRCU_SIZE_WAIT_CALL, we enqueue srcu callbacks to own srcu_data structures's ->srcu_cblist.
> >
> >Does my description make my commit more detailed?
> >
> >Thanks
> >Zqiang
> >
> >
> 
> 
> >
> >Thanks.
> >
> >  	spin_lock_irqsave_rcu_node(sdp, flags);
> >  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
> >  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> > -- 
> > 2.25.1
> >
  
Zqiang Nov. 24, 2022, 1:43 a.m. UTC | #5
On Wed, Nov 23, 2022 at 02:01:50AM +0000, Zhang, Qiang1 wrote:
> 
> >On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> > Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> > callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> > and other CPU srcu_data structure's->srcu_cblist is always empty. so
> > before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> > pend cbs in srcu_might_be_idle().
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > ---
> >  kernel/rcu/srcutree.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 6af031200580..6d9af9901765 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> >  	unsigned long tlast;
> >  
> >  	check_init_srcu_struct(ssp);
> > -	/* If the local srcu_data structure has callbacks, not idle.  */
> > -	sdp = raw_cpu_ptr(ssp->sda);
> > +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> > +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> > +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> > +	else
> > +		sdp = raw_cpu_ptr(ssp->sda);
> >
> 
> Hi Paul,  
> 
> For the convert_to_big  default configuration(SRCU_SIZING_AUTO), I found that the srcu is in
> SRCU_SIZE_SMALL state most of the time in the system, I think this change is meaningful.
> 
> Thoughts ?

>You are right that this change might make srcu_might_be_idle() return a
>more accurate value in the common case.  As things are now, some other
>CPU might just now have added a callback, but might not yet have started
>that new grace period.  In that case, we might expedite an SRCU grace
>period when we would not have otherwise done so.  However, this change
>would also increase contention on the get_boot_cpu_id() CPU's srcu_data
>structure's ->lock.
>
>So the result of that inaccurate return value is that the first two SRCU
>grace periods in a burst might be expedited instead of only the first one,
>and even then only if we hit a very narrow race window.
>
>Besides, this same thing can happen if two CPUs do synchronize_srcu()
>at the same time after a long-enough pause between grace periods.
>Both see no callbacks, so both ask for an expedited grace period.
>So again, the first two grace periods are expedited.
>
>As a result, I am having a very hard time justifying the increased
>lock contention.

Thanks for the reply. I have another question: is the srcu_data structure's ->lock
really necessary here? rcu_segcblist_pend_cbs() only checks whether
*tails[RCU_DONE_TAIL] is empty. Or could rcu_segcblist_get_seglen() over
RCU_WAIT_TAIL + RCU_NEXT_READY_TAIL + RCU_NEXT_TAIL be used instead of
rcu_segcblist_pend_cbs() to avoid taking the lock? (Although that would also be
inaccurate.)

Thanks
Zqiang

>
>Am I missing something here?
>
>							Thanx, Paul
>
> Thanks
> Zqiang
> 
> 
> >While at it if someone is interested in documenting/commenting on the meaning of
> >all these SRCU_SIZE_* things, it would be much appreciated!
> >
> >In the initial stage(SRCU_SIZE_SMALL), there is no srcu_node structures, only initialized
> >per-CPU srcu_data structures, at this time, we only use boot-cpu srcu_data structures's ->srcu_cblist
> >to store srcu callbacks.  
> >if the contention of srcu_struct and srcu_data structure's->lock become busy,
> >transition to SRCU_SIZE_ALLOC.  allocated memory for srcu_node structure at end of the SRCU
> >grace period.   
> >if allocated success,  transition to SRCU_SIZE_WAIT_BARRIER,  in this state, invoke
> >srcu_barrier() will iterate over all possible CPUs, but we still only use boot-CPU srcu_cblist to store callbacks.
> >the task calling call_srcu() may have access to a new srcu_node structure or may not, 
> >because call_srcu() is protected by SRCU, we may not transition to SRCU_SIZE_WAIT_CALL quickly,
> >need to wait in this state for a SRCU grace period to end.
> >After transition to SRCU_SIZE_WAIT_CALL, we enqueue srcu callbacks to own srcu_data structures's ->srcu_cblist.
> >
> >Does my description make my commit more detailed?
> >
> >Thanks
> >Zqiang
> >
> >
> 
> 
> >
> >Thanks.
> >
> >  	spin_lock_irqsave_rcu_node(sdp, flags);
> >  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
> >  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> > -- 
> > 2.25.1
> >
  
Paul E. McKenney Nov. 24, 2022, 5:25 a.m. UTC | #6
On Thu, Nov 24, 2022 at 01:43:42AM +0000, Zhang, Qiang1 wrote:
> On Wed, Nov 23, 2022 at 02:01:50AM +0000, Zhang, Qiang1 wrote:
> > 
> > >On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> > > Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> > > callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> > > and other CPU srcu_data structure's->srcu_cblist is always empty. so
> > > before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> > > pend cbs in srcu_might_be_idle().
> > > 
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > ---
> > >  kernel/rcu/srcutree.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index 6af031200580..6d9af9901765 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> > >  	unsigned long tlast;
> > >  
> > >  	check_init_srcu_struct(ssp);
> > > -	/* If the local srcu_data structure has callbacks, not idle.  */
> > > -	sdp = raw_cpu_ptr(ssp->sda);
> > > +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> > > +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> > > +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> > > +	else
> > > +		sdp = raw_cpu_ptr(ssp->sda);
> > >
> > 
> > Hi Paul,  
> > 
> > For the convert_to_big  default configuration(SRCU_SIZING_AUTO), I found that the srcu is in
> > SRCU_SIZE_SMALL state most of the time in the system, I think this change is meaningful.
> > 
> > Thoughts ?
> 
> >You are right that this change might make srcu_might_be_idle() return a
> >more accurate value in the common case.  As things are now, some other
> >CPU might just now have added a callback, but might not yet have started
> >that new grace period.  In that case, we might expedite an SRCU grace
> >period when we would not have otherwise done so.  However, this change
> >would also increase contention on the get_boot_cpu_id() CPU's srcu_data
> >structure's ->lock.
> >
> >So the result of that inaccurate return value is that the first two SRCU
> >grace periods in a burst might be expedited instead of only the first one,
> >and even then only if we hit a very narrow race window.
> >
> >Besides, this same thing can happen if two CPUs do synchronize_srcu()
> >at the same time after a long-enough pause between grace periods.
> >Both see no callbacks, so both ask for an expedited grace period.
> >So again, the first two grace periods are expedited.
> >
> >As a result, I am having a very hard time justifying the increased
> >lock contention.
> 
> Thanks reply,  I have another question, Is this srcu_data structure's->lock necessary?
> the rcu_segcblist_pend_cbs() only check *tails[RCU_DONE_TAIL] is empty.
> or can use rcu_segcblist_get_seglen(RCU_WAIT_TAIL + RCU_NEXT_READY_TAIL + RCU_NEXT_TAIL) 
> instead of  rcu_segcblist_pend_cbs() to avoid locking? (although this is also inaccurate)

That extra "*" means that the lock must be acquired.  Otherwise, the
pointed-to callback might even be unmapped between the time we fetched
the pointer and the time we dereferenced it.
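
For reference, the check in question boils down to roughly the
following (a paraphrase of rcu_segcblist_pend_cbs() and
rcu_segcblist_restempty() in kernel/rcu/rcu_segcblist.[ch]; the helper
name here is made up for the sketch):

	static bool segcblist_has_pending(struct rcu_segcblist *rsclp)
	{
		/*
		 * ->tails[RCU_DONE_TAIL] points at the ->next field of the
		 * last done callback (or at &rsclp->head if there is none).
		 * The outer "*" dereferences that location; if a later
		 * callback is linked there, pending callbacks exist.  Without
		 * ->lock held, the callback that owns that ->next field could
		 * be invoked and freed between fetching the pointer and
		 * dereferencing it.
		 */
		return !!READ_ONCE(*READ_ONCE(rsclp->tails[RCU_DONE_TAIL]));
	}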

							Thanx, Paul.

> Thanks
> Zqiang
> 
> >
> >Am I missing something here?
> >
> >							Thanx, Paul
> >
> > Thanks
> > Zqiang
> > 
> > 
> > >While at it if someone is interested in documenting/commenting on the meaning of
> > >all these SRCU_SIZE_* things, it would be much appreciated!
> > >
> > >In the initial stage(SRCU_SIZE_SMALL), there is no srcu_node structures, only initialized
> > >per-CPU srcu_data structures, at this time, we only use boot-cpu srcu_data structures's ->srcu_cblist
> > >to store srcu callbacks.  
> > >if the contention of srcu_struct and srcu_data structure's->lock become busy,
> > >transition to SRCU_SIZE_ALLOC.  allocated memory for srcu_node structure at end of the SRCU
> > >grace period.   
> > >if allocated success,  transition to SRCU_SIZE_WAIT_BARRIER,  in this state, invoke
> > >srcu_barrier() will iterate over all possible CPUs, but we still only use boot-CPU srcu_cblist to store callbacks.
> > >the task calling call_srcu() may have access to a new srcu_node structure or may not, 
> > >because call_srcu() is protected by SRCU, we may not transition to SRCU_SIZE_WAIT_CALL quickly,
> > >need to wait in this state for a SRCU grace period to end.
> > >After transition to SRCU_SIZE_WAIT_CALL, we enqueue srcu callbacks to own srcu_data structures's ->srcu_cblist.
> > >
> > >Does my description make my commit more detailed?
> > >
> > >Thanks
> > >Zqiang
> > >
> > >
> > 
> > 
> > >
> > >Thanks.
> > >
> > >  	spin_lock_irqsave_rcu_node(sdp, flags);
> > >  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
> > >  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> > > -- 
> > > 2.25.1
> > >
  
Zqiang Nov. 24, 2022, 5:59 a.m. UTC | #7
On Thu, Nov 24, 2022 at 01:43:42AM +0000, Zhang, Qiang1 wrote:
> On Wed, Nov 23, 2022 at 02:01:50AM +0000, Zhang, Qiang1 wrote:
> > 
> > >On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> > > Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> > > callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> > > and other CPU srcu_data structure's->srcu_cblist is always empty. so
> > > before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> > > pend cbs in srcu_might_be_idle().
> > > 
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > ---
> > >  kernel/rcu/srcutree.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index 6af031200580..6d9af9901765 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> > >  	unsigned long tlast;
> > >  
> > >  	check_init_srcu_struct(ssp);
> > > -	/* If the local srcu_data structure has callbacks, not idle.  */
> > > -	sdp = raw_cpu_ptr(ssp->sda);
> > > +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> > > +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> > > +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> > > +	else
> > > +		sdp = raw_cpu_ptr(ssp->sda);
> > >
> > 
> > Hi Paul,  
> > 
> > For the convert_to_big  default configuration(SRCU_SIZING_AUTO), I found that the srcu is in
> > SRCU_SIZE_SMALL state most of the time in the system, I think this change is meaningful.
> > 
> > Thoughts ?
> 
> >You are right that this change might make srcu_might_be_idle() return a
> >more accurate value in the common case.  As things are now, some other
> >CPU might just now have added a callback, but might not yet have started
> >that new grace period.  In that case, we might expedite an SRCU grace
> >period when we would not have otherwise done so.  However, this change
> >would also increase contention on the get_boot_cpu_id() CPU's srcu_data
> >structure's ->lock.
> >
> >So the result of that inaccurate return value is that the first two SRCU
> >grace periods in a burst might be expedited instead of only the first one,
> >and even then only if we hit a very narrow race window.
> >
> >Besides, this same thing can happen if two CPUs do synchronize_srcu()
> >at the same time after a long-enough pause between grace periods.
> >Both see no callbacks, so both ask for an expedited grace period.
> >So again, the first two grace periods are expedited.
> >
> >As a result, I am having a very hard time justifying the increased
> >lock contention.
> 
> Thanks reply,  I have another question, Is this srcu_data structure's->lock necessary?
> the rcu_segcblist_pend_cbs() only check *tails[RCU_DONE_TAIL] is empty.
> or can use rcu_segcblist_get_seglen(RCU_WAIT_TAIL + RCU_NEXT_READY_TAIL + RCU_NEXT_TAIL) 
> instead of  rcu_segcblist_pend_cbs() to avoid locking? (although this is also inaccurate)

>That extra "*" means that the lock must be acquired.  Otherwise, the
>pointed-to callback might even be unmapped between the time we fetched
>the pointer and the time we dereferenced it.

Thanks for the detailed explanation, I learned something new 😊.

>
>							Thanx, Paul.

> Thanks
> Zqiang
> 
> >
> >Am I missing something here?
> >
> >							Thanx, Paul
> >
> > Thanks
> > Zqiang
> > 
> > 
> > >While at it if someone is interested in documenting/commenting on the meaning of
> > >all these SRCU_SIZE_* things, it would be much appreciated!
> > >
> > >In the initial stage(SRCU_SIZE_SMALL), there is no srcu_node structures, only initialized
> > >per-CPU srcu_data structures, at this time, we only use boot-cpu srcu_data structures's ->srcu_cblist
> > >to store srcu callbacks.  
> > >if the contention of srcu_struct and srcu_data structure's->lock become busy,
> > >transition to SRCU_SIZE_ALLOC.  allocated memory for srcu_node structure at end of the SRCU
> > >grace period.   
> > >if allocated success,  transition to SRCU_SIZE_WAIT_BARRIER,  in this state, invoke
> > >srcu_barrier() will iterate over all possible CPUs, but we still only use boot-CPU srcu_cblist to store callbacks.
> > >the task calling call_srcu() may have access to a new srcu_node structure or may not, 
> > >because call_srcu() is protected by SRCU, we may not transition to SRCU_SIZE_WAIT_CALL quickly,
> > >need to wait in this state for a SRCU grace period to end.
> > >After transition to SRCU_SIZE_WAIT_CALL, we enqueue srcu callbacks to own srcu_data structures's ->srcu_cblist.
> > >
> > >Does my description make my commit more detailed?
> > >
> > >Thanks
> > >Zqiang
> > >
> > >
> > 
> > 
> > >
> > >Thanks.
> > >
> > >  	spin_lock_irqsave_rcu_node(sdp, flags);
> > >  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
> > >  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> > > -- 
> > > 2.25.1
> > >
  
Pingfan Liu Nov. 28, 2022, 8:32 a.m. UTC | #8
On Thu, Nov 17, 2022 at 03:20:25PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> > Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> > callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> > and other CPU srcu_data structure's->srcu_cblist is always empty. so
> > before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> > pend cbs in srcu_might_be_idle().
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > ---
> >  kernel/rcu/srcutree.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 6af031200580..6d9af9901765 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> >  	unsigned long tlast;
> >  
> >  	check_init_srcu_struct(ssp);
> > -	/* If the local srcu_data structure has callbacks, not idle.  */
> > -	sdp = raw_cpu_ptr(ssp->sda);
> > +	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
> > +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> > +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> > +	else
> > +		sdp = raw_cpu_ptr(ssp->sda);
> 
> While at it if someone is interested in documenting/commenting on the meaning of
> all these SRCU_SIZE_* things, it would be much appreciated!
> 

Due to a conflicting understanding of the code, I hesitated to jump in.
But anyway, I decided to be bold and move ahead.

I have sent a series "[PATCH 0/3] srcu: shrink the num of
srcu_size_state", which opens the discussion. (I have Cced you and Zqiang.)

Please have a look.

Thanks,

	Pingfan

> Thanks.
> 
> >  	spin_lock_irqsave_rcu_node(sdp, flags);
> >  	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
> >  		spin_unlock_irqrestore_rcu_node(sdp, flags);
> > -- 
> > 2.25.1
> >
  

Patch

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 6af031200580..6d9af9901765 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1098,8 +1098,11 @@  static bool srcu_might_be_idle(struct srcu_struct *ssp)
 	unsigned long tlast;
 
 	check_init_srcu_struct(ssp);
-	/* If the local srcu_data structure has callbacks, not idle.  */
-	sdp = raw_cpu_ptr(ssp->sda);
+	/* If the boot CPU or local srcu_data structure has callbacks, not idle.  */
+	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
+		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
+	else
+		sdp = raw_cpu_ptr(ssp->sda);
 	spin_lock_irqsave_rcu_node(sdp, flags);
 	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
 		spin_unlock_irqrestore_rcu_node(sdp, flags);