Message ID | 20221117112050.3942407-1-qiang1.zhang@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: Zqiang <qiang1.zhang@intel.com>
To: paulmck@kernel.org, frederic@kernel.org, joel@joelfernandes.org
Cc: rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] srcu: Add detection of boot CPU srcu_data structure's->srcu_cblist
Date: Thu, 17 Nov 2022 19:20:50 +0800
Message-Id: <20221117112050.3942407-1-qiang1.zhang@intel.com>
X-Mailer: git-send-email 2.25.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
srcu: Add detection of boot CPU srcu_data structure's->srcu_cblist
Commit Message
Zqiang
Nov. 17, 2022, 11:20 a.m. UTC
Before the SRCU_SIZE_WAIT_CALL srcu_size_state is reached, SRCU
callbacks are inserted only into the boot CPU's srcu_data
structure's ->srcu_cblist; every other CPU's srcu_data
structure's ->srcu_cblist is always empty. So before
SRCU_SIZE_WAIT_CALL is reached, srcu_might_be_idle() needs to check
the boot CPU's pending callbacks in order to return a correct answer.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
kernel/rcu/srcutree.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Comments
On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> [...]
> -	/* If the local srcu_data structure has callbacks, not idle. */
> -	sdp = raw_cpu_ptr(ssp->sda);
> +	/* If the boot CPU or local srcu_data structure has callbacks, not idle. */
> +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> +	else
> +		sdp = raw_cpu_ptr(ssp->sda);

While at it, if someone is interested in documenting/commenting on the meaning of
all these SRCU_SIZE_* things, it would be much appreciated!

Thanks.
>On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
>[...]
>
>While at it if someone is interested in documenting/commenting on the meaning of
>all these SRCU_SIZE_* things, it would be much appreciated!

In the initial stage (SRCU_SIZE_SMALL), there are no srcu_node structures, only the
initialized per-CPU srcu_data structures; at this time only the boot CPU's srcu_data
structure's ->srcu_cblist is used to store SRCU callbacks.

If contention on the srcu_struct and srcu_data structures' ->lock becomes heavy,
we transition to SRCU_SIZE_ALLOC and allocate memory for the srcu_node structures
at the end of an SRCU grace period.

If the allocation succeeds, we transition to SRCU_SIZE_WAIT_BARRIER. In this state,
invoking srcu_barrier() will iterate over all possible CPUs, but we still use only
the boot CPU's srcu_cblist to store callbacks.

A task calling call_srcu() may or may not yet see the new srcu_node structures,
because call_srcu() is itself protected by SRCU, so we cannot transition to
SRCU_SIZE_WAIT_CALL immediately; we must wait in this state for an SRCU grace
period to end. After the transition to SRCU_SIZE_WAIT_CALL, each CPU enqueues
SRCU callbacks onto its own srcu_data structure's ->srcu_cblist.

Does my description make my commit more detailed?

Thanks
Zqiang
>On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> [...]
> +	/* If the boot CPU or local srcu_data structure has callbacks, not idle. */
> +	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> +		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> +	else
> +		sdp = raw_cpu_ptr(ssp->sda);

Hi Paul,

With the convert_to_big default configuration (SRCU_SIZING_AUTO), I found that SRCU
is in the SRCU_SIZE_SMALL state most of the time on the system, so I think this
change is meaningful.

Thoughts?

Thanks
Zqiang
On Wed, Nov 23, 2022 at 02:01:50AM +0000, Zhang, Qiang1 wrote:
> [...]
> For the convert_to_big default configuration(SRCU_SIZING_AUTO), I found that the srcu is in
> SRCU_SIZE_SMALL state most of the time in the system, I think this change is meaningful.
>
> Thoughts ?

You are right that this change might make srcu_might_be_idle() return a
more accurate value in the common case. As things are now, some other
CPU might just now have added a callback, but might not yet have started
that new grace period. In that case, we might expedite an SRCU grace
period when we would not have otherwise done so. However, this change
would also increase contention on the get_boot_cpu_id() CPU's srcu_data
structure's ->lock.

So the result of that inaccurate return value is that the first two SRCU
grace periods in a burst might be expedited instead of only the first one,
and even then only if we hit a very narrow race window.

Besides, this same thing can happen if two CPUs do synchronize_srcu()
at the same time after a long-enough pause between grace periods.
Both see no callbacks, so both ask for an expedited grace period.
So again, the first two grace periods are expedited.

As a result, I am having a very hard time justifying the increased
lock contention. Am I missing something here?

							Thanx, Paul
>You are right that this change might make srcu_might_be_idle() return a
>more accurate value in the common case. [...]
>
>As a result, I am having a very hard time justifying the increased
>lock contention.

Thanks for the reply. I have another question: is the srcu_data structure's ->lock
necessary here? rcu_segcblist_pend_cbs() only checks whether *tails[RCU_DONE_TAIL]
is empty. Or could we use rcu_segcblist_get_seglen() over RCU_WAIT_TAIL,
RCU_NEXT_READY_TAIL, and RCU_NEXT_TAIL instead of rcu_segcblist_pend_cbs() to
avoid taking the lock? (Although this is also inaccurate.)

Thanks
Zqiang

>
>Am I missing something here?
>
>							Thanx, Paul
On Thu, Nov 24, 2022 at 01:43:42AM +0000, Zhang, Qiang1 wrote:
> [...]
> Thanks reply, I have another question, Is this srcu_data structure's->lock necessary?
> the rcu_segcblist_pend_cbs() only check *tails[RCU_DONE_TAIL] is empty.
> or can use rcu_segcblist_get_seglen(RCU_WAIT_TAIL + RCU_NEXT_READY_TAIL + RCU_NEXT_TAIL)
> instead of rcu_segcblist_pend_cbs() to avoid locking? (although this is also inaccurate)

That extra "*" means that the lock must be acquired. Otherwise, the
pointed-to callback might even be unmapped between the time we fetched
the pointer and the time we dereferenced it.

							Thanx, Paul
>That extra "*" means that the lock must be acquired. Otherwise, the
>pointed-to callback might even be unmapped between the time we fetched
>the pointer and the time we dereferenced it.

Thanks for the detailed explanation, learned more 😊.

Thanks
Zqiang
On Thu, Nov 17, 2022 at 03:20:25PM +0100, Frederic Weisbecker wrote:
> [...]
> While at it if someone is interested in documenting/commenting on the meaning of
> all these SRCU_SIZE_* things, it would be much appreciated!

Due to a conflicting understanding of the code, I hesitated to jump in.
But anyway, I am being bold and moving ahead: I have sent a series
"[PATCH 0/3] srcu: shrink the num of srcu_size_state", which opens the
discussion. (I have Cc'ed you and Zqiang.) Please have a look.

Thanks,

	Pingfan

> Thanks.
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 6af031200580..6d9af9901765 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
 	unsigned long tlast;
 
 	check_init_srcu_struct(ssp);
-	/* If the local srcu_data structure has callbacks, not idle. */
-	sdp = raw_cpu_ptr(ssp->sda);
+	/* If the boot CPU or local srcu_data structure has callbacks, not idle. */
+	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
+		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
+	else
+		sdp = raw_cpu_ptr(ssp->sda);
 	spin_lock_irqsave_rcu_node(sdp, flags);
 	if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
 		spin_unlock_irqrestore_rcu_node(sdp, flags);