Message ID | 20230118073014.2020743-1-qiang1.zhang@intel.com |
---|---|
State | New |
Series | [v3] rcu: Remove impossible wakeup rcu GP kthread action from rcu_report_qs_rdp() |
Commit Message
Zqiang
Jan. 18, 2023, 7:30 a.m. UTC
When rcu_report_qs_rdp() is invoked and the current CPU's rcu_data structure's
->grpmask has not yet been cleared from the corresponding rcu_node structure's
->qsmask, the function clears the bit and reports a quiescent state. At that
point the current grace period cannot have ended (rcu_gp_in_progress() still
returns true), so for a non-offloaded rdp the call to rcu_accelerate_cbs()
can never return true. This commit therefore removes the impossible
rcu_gp_kthread_wake() call.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
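The shape of the change can be sketched outside the kernel as a small standalone C toy: when an invariant guarantees that a helper can never return true, the dead "wake up the GP kthread" path can be replaced by a once-only warning that still checks the condition. All names below are illustrative stand-ins, not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

static int warn_count; /* stands in for the kernel's warning splat */

/* Like WARN_ON_ONCE(): always evaluates cond, but warns at most once. */
static bool toy_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
	}
	return cond;
}

/* Stand-in for rcu_accelerate_cbs(): a GP-kthread wakeup is needed
 * only if no grace period is currently in progress. */
static bool accelerate_cbs(bool gp_in_progress)
{
	return !gp_in_progress;
}

/* Stand-in for the non-offloaded branch of rcu_report_qs_rdp(): the
 * caller holds the invariant that a grace period is in progress, so
 * the helper "cannot" return true; complain (once) instead of waking
 * anything.  Returns the warning count for inspection. */
int report_qs(bool gp_in_progress)
{
	toy_warn_on_once(accelerate_cbs(gp_in_progress));
	return warn_count;
}
```

Under the invariant (a grace period is in progress), the warning never fires; it fires only if the assumed-impossible case actually happens, which is exactly the diagnostic the patch adds.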
Comments
On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> grpmask has not been cleared from the corresponding rcu_node structure's
> ->qsmask, after that will clear and report quiescent state, but in this
> time, this also means that current grace period is not end, the current
> grace period is ongoing, because the rcu_gp_in_progress() currently return
> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> to return true.
>
> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

> ---
> kernel/rcu/tree.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2c204529478..0962c2202d45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
>  	bool needacc = false;
>  	struct rcu_node *rnp;
>
> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	 * NOCB kthreads have their own way to deal with that...
>  	 */
>  	if (!rcu_rdp_is_offloaded(rdp)) {
> -		needwake = rcu_accelerate_cbs(rnp, rdp);
> +		/*
> +		 * Current GP does not end, invoke rcu_gp_in_progress()
> +		 * will return true and so doesn't wake up GP kthread to
> +		 * start a new GP.
> +		 */
> +		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>  	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>  		/*
>  		 * ...but NOCB kthreads may miss or delay callbacks acceleration
> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	rcu_disable_urgency_upon_qs(rdp);
>  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>  	/* ^^^ Released rnp->lock */
> -	if (needwake)
> -		rcu_gp_kthread_wake();
>
>  	if (needacc) {
>  		rcu_nocb_lock_irqsave(rdp, flags);
> --
> 2.25.1
>
On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> grpmask has not been cleared from the corresponding rcu_node structure's
> ->qsmask, after that will clear and report quiescent state, but in this
> time, this also means that current grace period is not end, the current
> grace period is ongoing, because the rcu_gp_in_progress() currently return
> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> to return true.
>
> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Queued (wordsmithed as shown below, as always, please check) for further
testing and review, thank you both!

							Thanx, Paul

------------------------------------------------------------------------

commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Wed Jan 18 15:30:14 2023 +0800

    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()

    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
    only if there is a grace period in progress that is still blocked
    by at least one CPU on this rcu_node structure.  This means that
    rcu_accelerate_cbs() should never return the value true, and thus that
    this function should never set the needwake variable and in turn never
    invoke rcu_gp_kthread_wake().

    This commit therefore removes the needwake variable and the invocation
    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
    detect situations where the system's opinion differs from ours.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b2c2045294780..7a3085ad0a7df 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
 	bool needacc = false;
 	struct rcu_node *rnp;

@@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	 * NOCB kthreads have their own way to deal with that...
 	 */
 	if (!rcu_rdp_is_offloaded(rdp)) {
-		needwake = rcu_accelerate_cbs(rnp, rdp);
+		/*
+		 * The current GP has not yet ended, so it
+		 * should not be possible for rcu_accelerate_cbs()
+		 * to return true.  So complain, but don't awaken.
+		 */
+		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
 	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 		/*
 		 * ...but NOCB kthreads may miss or delay callbacks acceleration
@@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	rcu_disable_urgency_upon_qs(rdp);
 	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 	/* ^^^ Released rnp->lock */
-	if (needwake)
-		rcu_gp_kthread_wake();

 	if (needacc) {
 		rcu_nocb_lock_irqsave(rdp, flags);
On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> grpmask has not been cleared from the corresponding rcu_node structure's
> ->qsmask, after that will clear and report quiescent state, but in this
> time, this also means that current grace period is not end, the current
> grace period is ongoing, because the rcu_gp_in_progress() currently return
> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> to return true.
>
> This commit therefore remove impossible rcu_gp_kthread_wake() calling.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
>Queued (wordsmithed as shown below, as always, please check) for further
>testing and review, thank you both!
>
>							Thanx, Paul
>
>------------------------------------------------------------------------
>
>commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
>Author: Zqiang <qiang1.zhang@intel.com>
>Date:   Wed Jan 18 15:30:14 2023 +0800
>
>    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>
>    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>    only if there is a grace period in progress that is still blocked
>    by at least one CPU on this rcu_node structure.  This means that
>    rcu_accelerate_cbs() should never return the value true, and thus that
>    this function should never set the needwake variable and in turn never
>    invoke rcu_gp_kthread_wake().
>
>    This commit therefore removes the needwake variable and the invocation
>    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>    detect situations where the system's opinion differs from ours.

Thanks Paul, this is more clear for me😊.

Thanks
Zqiang

>    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>index b2c2045294780..7a3085ad0a7df 100644
>--- a/kernel/rcu/tree.c
>+++ b/kernel/rcu/tree.c
>@@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>{
>	unsigned long flags;
>	unsigned long mask;
>-	bool needwake = false;
>	bool needacc = false;
>	struct rcu_node *rnp;
>
>@@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>	 * NOCB kthreads have their own way to deal with that...
>	 */
>	if (!rcu_rdp_is_offloaded(rdp)) {
>-		needwake = rcu_accelerate_cbs(rnp, rdp);
>+		/*
>+		 * The current GP has not yet ended, so it
>+		 * should not be possible for rcu_accelerate_cbs()
>+		 * to return true.  So complain, but don't awaken.
>+		 */
>+		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>		/*
>		 * ...but NOCB kthreads may miss or delay callbacks acceleration
>@@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>	rcu_disable_urgency_upon_qs(rdp);
>	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>	/* ^^^ Released rnp->lock */
>-	if (needwake)
>-		rcu_gp_kthread_wake();
>
>	if (needacc) {
>		rcu_nocb_lock_irqsave(rdp, flags);
On Wed, Jan 18, 2023 at 10:07:14AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > grpmask has not been cleared from the corresponding rcu_node structure's
> > ->qsmask, after that will clear and report quiescent state, but in this
> > time, this also means that current grace period is not end, the current
> > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > to return true.
> >
> > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> Queued (wordsmithed as shown below, as always, please check) for further
> testing and review, thank you both!
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> Author: Zqiang <qiang1.zhang@intel.com>
> Date:   Wed Jan 18 15:30:14 2023 +0800
>
>     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>
>     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>     only if there is a grace period in progress that is still blocked
>     by at least one CPU on this rcu_node structure.  This means that
>     rcu_accelerate_cbs() should never return the value true, and thus that
>     this function should never set the needwake variable and in turn never
>     invoke rcu_gp_kthread_wake().
>
>     This commit therefore removes the needwake variable and the invocation
>     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>     detect situations where the system's opinion differs from ours.
>
>     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2c2045294780..7a3085ad0a7df 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
>  	bool needacc = false;
>  	struct rcu_node *rnp;
>
> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	 * NOCB kthreads have their own way to deal with that...
>  	 */
>  	if (!rcu_rdp_is_offloaded(rdp)) {
> -		needwake = rcu_accelerate_cbs(rnp, rdp);
> +		/*
> +		 * The current GP has not yet ended, so it
> +		 * should not be possible for rcu_accelerate_cbs()
> +		 * to return true.  So complain, but don't awaken.
> +		 */
> +		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>  	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>  		/*
>  		 * ...but NOCB kthreads may miss or delay callbacks acceleration
> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	rcu_disable_urgency_upon_qs(rdp);
>  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>  	/* ^^^ Released rnp->lock */
> -	if (needwake)
> -		rcu_gp_kthread_wake();

AFAICS, there is almost no compiler benefit of doing this, and zero runtime
benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
check of the return value of rcu_accelerate_cbs(), so you still have a
branch. Yes, maybe slightly smaller code without the wake call, but I'm not
sure that is worth it.

And, if the opinion of system differs, its a bug anyway, so more added risk.

>
>  	if (needacc) {
>  		rcu_nocb_lock_irqsave(rdp, flags);

And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
anyway, so it is consistent with nocb vs !nocb.

So I am not a fan of this change. ;-)

thanks,

 - Joel
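Joel's point about the surviving branch can be checked in a small standalone sketch (illustrative names, not kernel code): a WARN_ON_ONCE()-style check evaluates its condition on every call, and only the reporting is once-only, so the conditional branch remains even after the first warning.

```c
#include <assert.h>
#include <stdbool.h>

static int cond_evals; /* how many times the condition was evaluated */
static int warns;      /* how many times we actually "warned" */

/* Condition with a side effect, so we can count its evaluations. */
static bool never_true(void)
{
	cond_evals++;
	return false;
}

/* WARN_ON_ONCE()-style check: the condition runs on every call, the
 * warning fires at most once. */
static void check_once(void)
{
	static bool warned;

	if (never_true() && !warned) {
		warned = true;
		warns++;
	}
}

void run_checks(int n)
{
	for (int i = 0; i < n; i++)
		check_once();
}
```

Calling run_checks(5) evaluates the condition five times while warning zero times, which is the runtime cost Joel is pointing at: removing the wakeup does not remove the branch.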
On Fri, Jan 20, 2023 at 3:14 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Wed, Jan 18, 2023 at 10:07:14AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > grpmask has not been cleared from the corresponding rcu_node structure's
> > > ->qsmask, after that will clear and report quiescent state, but in this
> > > time, this also means that current grace period is not end, the current
> > > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > to return true.
> > >
> > > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Queued (wordsmithed as shown below, as always, please check) for further
> > testing and review, thank you both!
> >
> > 							Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > Author: Zqiang <qiang1.zhang@intel.com>
> > Date:   Wed Jan 18 15:30:14 2023 +0800
> >
> >     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> >
> >     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> >     only if there is a grace period in progress that is still blocked
> >     by at least one CPU on this rcu_node structure.  This means that
> >     rcu_accelerate_cbs() should never return the value true, and thus that
> >     this function should never set the needwake variable and in turn never
> >     invoke rcu_gp_kthread_wake().
> >
> >     This commit therefore removes the needwake variable and the invocation
> >     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> >     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> >     detect situations where the system's opinion differs from ours.
> >
> >     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b2c2045294780..7a3085ad0a7df 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  {
> >  	unsigned long flags;
> >  	unsigned long mask;
> > -	bool needwake = false;
> >  	bool needacc = false;
> >  	struct rcu_node *rnp;
> >
> > @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  	 * NOCB kthreads have their own way to deal with that...
> >  	 */
> >  	if (!rcu_rdp_is_offloaded(rdp)) {
> > -		needwake = rcu_accelerate_cbs(rnp, rdp);
> > +		/*
> > +		 * The current GP has not yet ended, so it
> > +		 * should not be possible for rcu_accelerate_cbs()
> > +		 * to return true.  So complain, but don't awaken.
> > +		 */
> > +		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> >  	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> >  		/*
> >  		 * ...but NOCB kthreads may miss or delay callbacks acceleration
> > @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  	rcu_disable_urgency_upon_qs(rdp);
> >  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >  	/* ^^^ Released rnp->lock */
> > -	if (needwake)
> > -		rcu_gp_kthread_wake();
>
> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> check of the return value of rcu_accelerate_cbs(), so you still have a
> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> sure that is worth it.
>
> And, if the opinion of system differs, its a bug anyway, so more added risk.
>
> >  	if (needacc) {
> >  		rcu_nocb_lock_irqsave(rdp, flags);
>
> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> anyway, so it is consistent with nocb vs !nocb.

Sorry, I mean "inconsistent".

Thanks,

 - Joel
> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > grpmask has not been cleared from the corresponding rcu_node structure's
> > ->qsmask, after that will clear and report quiescent state, but in this
> > time, this also means that current grace period is not end, the current
> > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > to return true.
> >
> > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> Queued (wordsmithed as shown below, as always, please check) for further
> testing and review, thank you both!
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> Author: Zqiang <qiang1.zhang@intel.com>
> Date:   Wed Jan 18 15:30:14 2023 +0800
>
>     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
>
>     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
>     only if there is a grace period in progress that is still blocked
>     by at least one CPU on this rcu_node structure.  This means that
>     rcu_accelerate_cbs() should never return the value true, and thus that
>     this function should never set the needwake variable and in turn never
>     invoke rcu_gp_kthread_wake().
>
>     This commit therefore removes the needwake variable and the invocation
>     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
>     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
>     detect situations where the system's opinion differs from ours.
>
>     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2c2045294780..7a3085ad0a7df 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
>  	bool needacc = false;
>  	struct rcu_node *rnp;
>
> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	 * NOCB kthreads have their own way to deal with that...
>  	 */
>  	if (!rcu_rdp_is_offloaded(rdp)) {
> -		needwake = rcu_accelerate_cbs(rnp, rdp);
> +		/*
> +		 * The current GP has not yet ended, so it
> +		 * should not be possible for rcu_accelerate_cbs()
> +		 * to return true.  So complain, but don't awaken.
> +		 */
> +		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>  	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>  		/*
>  		 * ...but NOCB kthreads may miss or delay callbacks acceleration
> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  	rcu_disable_urgency_upon_qs(rdp);
>  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>  	/* ^^^ Released rnp->lock */
> -	if (needwake)
> -		rcu_gp_kthread_wake();
>
>AFAICS, there is almost no compiler benefit of doing this, and zero runtime
>benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
>check of the return value of rcu_accelerate_cbs(), so you still have a
>branch. Yes, maybe slightly smaller code without the wake call, but I'm not
>sure that is worth it.
>
>And, if the opinion of system differs, its a bug anyway, so more added risk.
>
>  	if (needacc) {
>  		rcu_nocb_lock_irqsave(rdp, flags);
>
>And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
>anyway, so it is consistent with nocb vs !nocb.

For !nocb, we invoke rcu_accelerate_cbs() before reporting the quiescent
state, so this GP cannot have ended, and rcu_accelerate_cbs() also does not
set RCU_GP_FLAG_INIT to start a new GP. But for nocb, when needacc = true,
we invoke rcu_accelerate_cbs_unlocked() after the current CPU has reported
its quiescent state; if all CPUs have reported, rcu_report_qs_rnp() will
wake the GP kthread to end this GP. After that, rcu_accelerate_cbs_unlocked()
may try to wake the GP kthread if this GP has already ended by then. So nocb
vs !nocb is indeed likely to be inconsistent.

Thanks
Zqiang

>
>So I am not a fan of this change. ;-)
>
>thanks,
>
> - Joel
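The ordering argument in this exchange can be modeled as a minimal timeline sketch (illustrative, not kernel code): the non-offloaded path accelerates callbacks before the quiescent state is reported, while the offloaded path accelerates afterwards, when the report may already have ended the grace period.

```c
#include <assert.h>
#include <stdbool.h>

/* Whether a grace period is currently in progress in this toy model. */
static bool gp_in_progress;

/* "Acceleration" wants a GP-kthread wakeup only when no GP is running
 * (a new GP would be needed for the newly accelerated callbacks). */
static bool accelerate(void)
{
	return !gp_in_progress;
}

/* Reporting the last blocking CPU's quiescent state ends the GP. */
static void report_qs_rnp(bool last_cpu)
{
	if (last_cpu)
		gp_in_progress = false;
}

/* !nocb ordering: accelerate first (GP still running), then report. */
bool non_offloaded_needwake(bool last_cpu)
{
	bool needwake = accelerate();	/* always false while GP runs */

	report_qs_rnp(last_cpu);
	return needwake;
}

/* nocb ordering: report first, accelerate later, when the GP may
 * already have ended and a wakeup may genuinely be needed. */
bool offloaded_needwake(bool last_cpu)
{
	report_qs_rnp(last_cpu);
	return accelerate();
}
```

With a GP in progress, the non-offloaded path can never need a wakeup, while the offloaded path needs one exactly when this CPU's report was the last one and ended the GP, matching the asymmetry described in the thread.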
On Fri, Jan 20, 2023 at 4:09 AM Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>
> > On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > grpmask has not been cleared from the corresponding rcu_node structure's
> > > ->qsmask, after that will clear and report quiescent state, but in this
> > > time, this also means that current grace period is not end, the current
> > > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > to return true.
> > >
> > > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Queued (wordsmithed as shown below, as always, please check) for further
> > testing and review, thank you both!
> >
> > 							Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > Author: Zqiang <qiang1.zhang@intel.com>
> > Date:   Wed Jan 18 15:30:14 2023 +0800
> >
> >     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> >
> >     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> >     only if there is a grace period in progress that is still blocked
> >     by at least one CPU on this rcu_node structure.  This means that
> >     rcu_accelerate_cbs() should never return the value true, and thus that
> >     this function should never set the needwake variable and in turn never
> >     invoke rcu_gp_kthread_wake().
> >
> >     This commit therefore removes the needwake variable and the invocation
> >     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> >     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> >     detect situations where the system's opinion differs from ours.
> >
> >     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b2c2045294780..7a3085ad0a7df 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  {
> >  	unsigned long flags;
> >  	unsigned long mask;
> > -	bool needwake = false;
> >  	bool needacc = false;
> >  	struct rcu_node *rnp;
> >
> > @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  	 * NOCB kthreads have their own way to deal with that...
> >  	 */
> >  	if (!rcu_rdp_is_offloaded(rdp)) {
> > -		needwake = rcu_accelerate_cbs(rnp, rdp);
> > +		/*
> > +		 * The current GP has not yet ended, so it
> > +		 * should not be possible for rcu_accelerate_cbs()
> > +		 * to return true.  So complain, but don't awaken.
> > +		 */
> > +		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> >  	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> >  		/*
> >  		 * ...but NOCB kthreads may miss or delay callbacks acceleration
> > @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> >  	rcu_disable_urgency_upon_qs(rdp);
> >  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >  	/* ^^^ Released rnp->lock */
> > -	if (needwake)
> > -		rcu_gp_kthread_wake();
> >
> >AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> >benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
> >check of the return value of rcu_accelerate_cbs(), so you still have a
> >branch. Yes, maybe slightly smaller code without the wake call, but I'm not
> >sure that is worth it.
> >
> >And, if the opinion of system differs, its a bug anyway, so more added risk.
> >
> >  	if (needacc) {
> >  		rcu_nocb_lock_irqsave(rdp, flags);
> >
> >And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> >anyway, so it is consistent with nocb vs !nocb.
>
> For !nocb, we invoked rcu_accelerate_cbs() before report qs, so this GP is impossible to end
> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs().
> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU
> has reported qs, if all CPU have been reported qs, we will wakeup gp kthread to end this GP in
> rcu_report_qs_rnp(). after that, the rcu_accelerate_cbs_unlocked() is possible to try to wake up
> gp kthread if this GP has ended at this time. so nocb vs !nocb is likely to be inconsistent.
>

That is a fair point. But after gp ends, rcu_check_quiescent_state()
-> note_gp_changes() which will do a accel + GP thread wake up at that
point anyway, once it notices that a GP has come to an end. That
should happen for both the nocb and !nocb cases right?

I am wondering if rcu_report_qs_rdp() needs to be rethought to make
both cases consistent.

Why does the nocb case need an accel + GP thread wakeup in the
rcu_report_qs_rdp() function, but the !nocb case does not?

(I am out of office till Monday but will intermittently (maybe) check
in, RCU is one of those things that daydreaming tends to lend itself
to...)

 - Joel
> > > On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote:
> > > > When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's ->
> > > > grpmask has not been cleared from the corresponding rcu_node structure's
> > > > ->qsmask, after that will clear and report quiescent state, but in this
> > > > time, this also means that current grace period is not end, the current
> > > > grace period is ongoing, because the rcu_gp_in_progress() currently return
> > > > true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible
> > > > to return true.
> > > >
> > > > This commit therefore remove impossible rcu_gp_kthread_wake() calling.
> > > >
> > > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > >
> > > Queued (wordsmithed as shown below, as always, please check) for further
> > > testing and review, thank you both!
> > >
> > > 						Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8
> > > Author: Zqiang <qiang1.zhang@intel.com>
> > > Date:   Wed Jan 18 15:30:14 2023 +0800
> > >
> > >     rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
> > >
> > >     The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
> > >     only if there is a grace period in progress that is still blocked
> > >     by at least one CPU on this rcu_node structure.  This means that
> > >     rcu_accelerate_cbs() should never return the value true, and thus that
> > >     this function should never set the needwake variable and in turn never
> > >     invoke rcu_gp_kthread_wake().
> > >
> > >     This commit therefore removes the needwake variable and the invocation
> > >     of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
> > >     rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
> > >     detect situations where the system's opinion differs from ours.
> > >
> > >     Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > >     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b2c2045294780..7a3085ad0a7df 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >  {
> > >  	unsigned long flags;
> > >  	unsigned long mask;
> > > -	bool needwake = false;
> > >  	bool needacc = false;
> > >  	struct rcu_node *rnp;
> > >
> > > @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >  		 * NOCB kthreads have their own way to deal with that...
> > >  		 */
> > >  		if (!rcu_rdp_is_offloaded(rdp)) {
> > > -			needwake = rcu_accelerate_cbs(rnp, rdp);
> > > +			/*
> > > +			 * The current GP has not yet ended, so it
> > > +			 * should not be possible for rcu_accelerate_cbs()
> > > +			 * to return true.  So complain, but don't awaken.
> > > +			 */
> > > +			WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
> > >  		} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
> > >  			/*
> > >  			 * ...but NOCB kthreads may miss or delay callbacks acceleration
> > > @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
> > >  	rcu_disable_urgency_upon_qs(rdp);
> > >  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > >  	/* ^^^ Released rnp->lock */
> > > -	if (needwake)
> > > -		rcu_gp_kthread_wake();
> >
> > AFAICS, there is almost no compiler benefit of doing this, and zero runtime
> > benefit of doing this.  The WARN_ON_ONCE() also involves a runtime condition
> > check of the return value of rcu_accelerate_cbs(), so you still have a
> > branch.  Yes, maybe slightly smaller code without the wake call, but I'm not
> > sure that is worth it.
> >
> > And, if the opinion of the system differs, it's a bug anyway, so more added risk.
> >
> > >  	if (needacc) {
> > >  		rcu_nocb_lock_irqsave(rdp, flags);
> >
> > And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up
> > anyway, so it is consistent with nocb vs !nocb.

> > For !nocb, we invoked rcu_accelerate_cbs() before reporting the qs, so this GP cannot end,
> > and we also do not set RCU_GP_FLAG_INIT to start a new GP in rcu_accelerate_cbs().
> > But for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after the current
> > CPU has reported its qs. If all CPUs have reported a qs, we will wake up the gp kthread to
> > end this GP in rcu_report_qs_rnp(). After that, rcu_accelerate_cbs_unlocked() may still try
> > to wake up the gp kthread even though this GP has already ended. So nocb vs !nocb is likely
> > to be inconsistent.

> That is a fair point. But after gp ends, rcu_check_quiescent_state()
> -> note_gp_changes() which will do a accel + GP thread wake up at that
> point anyway, once it notices that a GP has come to an end. That
> should happen for both the nocb and !nocb cases right?

For a nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
note_gp_changes(), so we also don't wake up the gp kthread in note_gp_changes().

> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
> both cases consistent.
>
> Why does the nocb case need an accel + GP thread wakeup in the
> rcu_report_qs_rdp() function, but the !nocb case does not?

For nocb, the accel + GP kthread wakeup only happens in the middle of a
(de-)offloading process; this is an intermediate state.

Thanks
Zqiang

> (I am out of office till Monday but will intermittently (maybe) check
> in, RCU is one of those things that daydreaming tends to lend itself
> to...)
>
> - Joel
> On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:

[ . . . ]

>> That is a fair point. But after gp ends, rcu_check_quiescent_state()
>> -> note_gp_changes() which will do a accel + GP thread wake up at that
>> point anyway, once it notices that a GP has come to an end. That
>> should happen for both the nocb and !nocb cases right?
>
> For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in
> note_gp_changes(). so also not wakeup gp kthread in note_gp_changes().

Yes correct, ok but…

>> I am wondering if rcu_report_qs_rdp() needs to be rethought to make
>> both cases consistent.
>>
>> Why does the nocb case need an accel + GP thread wakeup in the
>> rcu_report_qs_rdp() function, but the !nocb case does not?
>
> For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process.
> this is an intermediate state.

Sure, I know what the code currently does; I am asking why, and it feels wrong.

I suggest you slightly change your approach: do not assume the code is bona fide
correct and then fix it (which is ok once in a while), but ask higher-level
questions about why things are the way they are in the first place (that is just
my suggestion and I am not in a place to provide advice, far from it, but I am
just telling you my approach: I care more about the code than increasing my
patch count :P).

If you are in an intermediate state, part way to a !nocb state, you may have
missed a nocb-related accel and wake, correct? Why does that matter? Once we
transition to a !nocb state, we do not do a post-qs-report accel+wake anyway,
as we clearly know from the discussion. So why do we need to do it if we missed
it for the intermediate stage? So, I am not fully sure yet what that needacc is
doing and why it is needed.

Do not get me wrong, stellar work here. But I suggest challenging the
assumptions and the design, not always just the code that was already
written :). Apologies for any misplaced or noisy advice.

Thanks!

- Joel
On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:

[ . . . ]

> Sure, I know what the code currently does; I am asking why, and it feels wrong.
>
> I suggest you slightly change your approach: do not assume the code is bona
> fide correct and then fix it (which is ok once in a while), but ask
> higher-level questions about why things are the way they are in the first
> place.
>
> If you are in an intermediate state, part way to a !nocb state, you may have
> missed a nocb-related accel and wake, correct? Why does that matter? Once we
> transition to a !nocb state, we do not do a post-qs-report accel+wake anyway,
> as we clearly know from the discussion. So why do we need to do it if we
> missed it for the intermediate stage? So, I am not fully sure yet what that
> needacc is doing and why it is needed.
>
> Do not get me wrong, stellar work here. But I suggest challenging the
> assumptions and the design, not always just the code that was already
> written :). Apologies for any misplaced or noisy advice.

To add to Joel's point, an extra unnecessary check on a slow path can
be OK, but missing a necessary check is of course very bad.

Just to make sure that I am following along, here are the options I see:

1.	Status quo.

2.	Zqiang's current patch, as in remove the wakeup and
	add the WARN_ON_ONCE().

3.	Status quo, and only add the WARN_ON_ONCE(), but still
	keep the needless check for the wakeup.

Are there other options that I have missed?

							Thanx, Paul
On Fri, Jan 20, 2023 at 12:33:00PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:

[ . . . ]

> To add to Joel's point, an extra unnecessary check on a slow path can
> be OK, but missing a necessary check is of course very bad.
>
> Just to make sure that I am following along, here are the options I see:
>
> 1.	Status quo.
>
> 2.	Zqiang's current patch, as in remove the wakeup and
> 	add the WARN_ON_ONCE().
>
> 3.	Status quo, and only add the WARN_ON_ONCE(), but still
> 	keep the needless check for the wakeup.
>
> Are there other options that I have missed?

I'm personally in favour of 2. Removing an imaginary path and consolidating
an expectation in such a complicated codebase always lets me sleep a few more
minutes every day :)

Thanks.
On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote:
> Sure, I know what the code currently does; I am asking why, and it feels wrong.
>
> I suggest you slightly change your approach: do not assume the code is bona
> fide correct and then fix it (which is ok once in a while), but ask
> higher-level questions about why things are the way they are in the first
> place (that is just my suggestion and I am not in a place to provide advice,
> far from it, but I am just telling you my approach: I care more about the
> code than increasing my patch count :P).
>
> If you are in an intermediate state, part way to a !nocb state, you may have
> missed a nocb-related accel and wake, correct? Why does that matter? Once we
> transition to a !nocb state, we do not do a post-qs-report accel+wake anyway,
> as we clearly know from the discussion.

I'm confused. We are doing that acceleration on qs report for a !nocb CPU, right?

> So why do we need to do it if we missed it for the intermediate stage? So, I
> am not fully sure yet what that needacc is doing and why it is needed.

To summarize:

* If the CPU is NOCB, all the callback advance and acceleration is performed
  by the rcuo/rcuog kthreads.

* If the CPU is not NOCB, all the callback acceleration is performed by the
  CPU itself, such as in the case of rcu_report_qs_rdp().

* If the CPU is transitioning from NOCB to !NOCB or from !NOCB to NOCB, the
  kthreads may not be available to do the advance/acceleration, so we must do
  it locally. That's the needacc path.

What am I missing?

Thanks.
On Fri, Jan 20, 2023 at 11:35:59PM +0100, Frederic Weisbecker wrote: > On Fri, Jan 20, 2023 at 12:33:00PM -0800, Paul E. McKenney wrote: > > On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote: > > > > > > > > > > On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > > > > > > > >  > > > >> > > > >> > > > >>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote: > > > >>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's -> > > > >>>>> grpmask has not been cleared from the corresponding rcu_node structure's > > > >>>>> ->qsmask, after that will clear and report quiescent state, but in this > > > >>>>> time, this also means that current grace period is not end, the current > > > >>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return > > > >>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible > > > >>>>> to return true. > > > >>>>> > > > >>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling. > > > >>>>> > > > >>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com> > > > >>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> > > > >>> > > > >>> Queued (wordsmithed as shown below, as always, please check) for further > > > >>> testing and review, thank you both! > > > >>> > > > >>> Thanx, Paul > > > >>> > > > >>> ------------------------------------------------------------------------ > > > >>> > > > >>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8 > > > >>> Author: Zqiang <qiang1.zhang@intel.com> > > > >>> Date: Wed Jan 18 15:30:14 2023 +0800 > > > >>> > > > >>> rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() > > > >>> > > > >>> The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp() > > > >>> only if there is a grace period in progress that is still blocked > > > >>> by at least one CPU on this rcu_node structure. 
This means that > > > >>> rcu_accelerate_cbs() should never return the value true, and thus that > > > >>> this function should never set the needwake variable and in turn never > > > >>> invoke rcu_gp_kthread_wake(). > > > >>> > > > >>> This commit therefore removes the needwake variable and the invocation > > > >>> of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to > > > >>> rcu_accelerate_cbs(). The purpose of this new WARN_ON_ONCE() is to > > > >>> detect situations where the system's opinion differs from ours. > > > >>> > > > >>> Signed-off-by: Zqiang <qiang1.zhang@intel.com> > > > >>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> > > > >>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > > >>> > > > >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > >>> index b2c2045294780..7a3085ad0a7df 100644 > > > >>> --- a/kernel/rcu/tree.c > > > >>> +++ b/kernel/rcu/tree.c > > > >>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > > >>> { > > > >>> unsigned long flags; > > > >>> unsigned long mask; > > > >>> - bool needwake = false; > > > >>> bool needacc = false; > > > >>> struct rcu_node *rnp; > > > >>> > > > >>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > > >>> * NOCB kthreads have their own way to deal with that... > > > >>> */ > > > >>> if (!rcu_rdp_is_offloaded(rdp)) { > > > >>> - needwake = rcu_accelerate_cbs(rnp, rdp); > > > >>> + /* > > > >>> + * The current GP has not yet ended, so it > > > >>> + * should not be possible for rcu_accelerate_cbs() > > > >>> + * to return true. So complain, but don't awaken. 
> > > >>> + */ > > > >>> + WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp)); > > > >>> } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) { > > > >>> /* > > > >>> * ...but NOCB kthreads may miss or delay callbacks acceleration > > > >>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > > >>> rcu_disable_urgency_upon_qs(rdp); > > > >>> rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags); > > > >>> /* ^^^ Released rnp->lock */ > > > >>> - if (needwake) > > > >>> - rcu_gp_kthread_wake(); > > > >>> > > > >>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime > > > >>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition > > > >>> check of the return value of rcu_accelerate_cbs(), so you still have a > > > >>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not > > > >>> sure that is worth it. > > > >>> > > > >>> And, if the opinion of system differs, its a bug anyway, so more added risk. > > > >>> > > > >>> > > > >>> > > > >>> if (needacc) { > > > >>> rcu_nocb_lock_irqsave(rdp, flags); > > > >>> > > > >>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up > > > >>> anyway, so it is consistent with nocb vs !nocb. > > > >> > > > >> For !nocb, we invoked rcu_accelerate_cbs() before report qs, so this GP is impossible to end > > > >> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs(). > > > >> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU > > > >> has reported qs, if all CPU have been reported qs, we will wakeup gp kthread to end this GP in > > > >> rcu_report_qs_rnp(). after that, the rcu_accelerate_cbs_unlocked() is possible to try to wake up > > > >> gp kthread if this GP has ended at this time. so nocb vs !nocb is likely to be inconsistent. > > > >> > > > >> > > > >> That is a fair point. 
But after gp ends, rcu_check_quiescent_state() > > > >> -> note_gp_changes() which will do a accel + GP thread wake up at that > > > >> point anyway, once it notices that a GP has come to an end. That > > > >> should happen for both the nocb and !nocb cases right? > > > > > > > > For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in > > > > note_gp_changes(). so also not wakeup gp kthread in note_gp_changes(). > > > > > > Yes correct, ok but… > > > > > > > >> > > > >> I am wondering if rcu_report_qs_rdp() needs to be rethought to make > > > >> both cases consistent. > > > >> > > > >> Why does the nocb case need an accel + GP thread wakeup in the > > > >> rcu_report_qs_rdp() function, but the !nocb case does not? > > > > > > > > For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process. > > > > this is an intermediate state. > > > > > > Sure, I know what the code currently does, I am asking why and it feels wrong. > > > > > > I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P). > > > > > > If you are in an intermediate state, part way to a !nocb state — you may have missed a nocb-related accel and wake, correct? Why does that matter? Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway as we clearly know from the discussion. So why do we need to do it if we missed it for the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed. > > > > > > Do not get me wrong, stellar work here. 
> > > But I suggest challenging the assumptions and the design, not always just
> > > the code that was already written :), apologies for any misplaced or
> > > noisy advice.
> >
> > To add to Joel's point, an extra unnecessary check on a slow path can
> > be OK, but missing a necessary check is of course very bad.
> >
> > Just to make sure that I am following along, here are the options I see:
> >
> > 1. Status quo.
> >
> > 2. Zqiang's current patch, as in remove the wakeup and
> >    add the WARN_ON_ONCE().
> >
> > 3. Status quo, and only add the WARN_ON_ONCE(), but still
> >    keep the needless check for the wakeup.
> >
> > Are there other options that I have missed?
>
> I'm personally in favour of keeping 2.
> Removing an imaginary path and consolidating an expectation from such
> a complicated codebase always makes me able to sleep a few more minutes
> every day :)

Excellent point, thank you!

							Thanx, Paul
> On Jan 20, 2023, at 3:19 AM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote: > >  >> >> >>>> On Wed, Jan 18, 2023 at 03:30:14PM +0800, Zqiang wrote: >>>>> When inovke rcu_report_qs_rdp(), if current CPU's rcu_data structure's -> >>>>> grpmask has not been cleared from the corresponding rcu_node structure's >>>>> ->qsmask, after that will clear and report quiescent state, but in this >>>>> time, this also means that current grace period is not end, the current >>>>> grace period is ongoing, because the rcu_gp_in_progress() currently return >>>>> true, so for non-offloaded rdp, invoke rcu_accelerate_cbs() is impossible >>>>> to return true. >>>>> >>>>> This commit therefore remove impossible rcu_gp_kthread_wake() calling. >>>>> >>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com> >>>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> >>> >>> Queued (wordsmithed as shown below, as always, please check) for further >>> testing and review, thank you both! >>> >>> Thanx, Paul >>> >>> ------------------------------------------------------------------------ >>> >>> commit fbe3e300ec8b3edd2b8f84dab4dc98947cf71eb8 >>> Author: Zqiang <qiang1.zhang@intel.com> >>> Date: Wed Jan 18 15:30:14 2023 +0800 >>> >>> rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() >>> >>> The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp() >>> only if there is a grace period in progress that is still blocked >>> by at least one CPU on this rcu_node structure. This means that >>> rcu_accelerate_cbs() should never return the value true, and thus that >>> this function should never set the needwake variable and in turn never >>> invoke rcu_gp_kthread_wake(). >>> >>> This commit therefore removes the needwake variable and the invocation >>> of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to >>> rcu_accelerate_cbs(). The purpose of this new WARN_ON_ONCE() is to >>> detect situations where the system's opinion differs from ours. 
>>>
>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index b2c2045294780..7a3085ad0a7df 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>  {
>>>  	unsigned long flags;
>>>  	unsigned long mask;
>>> -	bool needwake = false;
>>>  	bool needacc = false;
>>>  	struct rcu_node *rnp;
>>>
>>> @@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>  	 * NOCB kthreads have their own way to deal with that...
>>>  	 */
>>>  	if (!rcu_rdp_is_offloaded(rdp)) {
>>> -		needwake = rcu_accelerate_cbs(rnp, rdp);
>>> +		/*
>>> +		 * The current GP has not yet ended, so it
>>> +		 * should not be possible for rcu_accelerate_cbs()
>>> +		 * to return true.  So complain, but don't awaken.
>>> +		 */
>>> +		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
>>>  	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
>>>  		/*
>>>  		 * ...but NOCB kthreads may miss or delay callbacks acceleration
>>> @@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>>>  	rcu_disable_urgency_upon_qs(rdp);
>>>  	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>>>  	/* ^^^ Released rnp->lock */
>>> -	if (needwake)
>>> -		rcu_gp_kthread_wake();
>>>
>>> AFAICS, there is almost no compiler benefit of doing this, and zero runtime
>>> benefit of doing this. The WARN_ON_ONCE() also involves a runtime condition
>>> check of the return value of rcu_accelerate_cbs(), so you still have a
>>> branch. Yes, maybe slightly smaller code without the wake call, but I'm not
>>> sure that is worth it.
>>>
>>> And, if the opinion of system differs, its a bug anyway, so more added risk.
>>> >>> >>> >>> if (needacc) { >>> rcu_nocb_lock_irqsave(rdp, flags); >>> >>> And when needacc = true, rcu_accelerate_cbs_unlocked() tries to do a wake up >>> anyway, so it is consistent with nocb vs !nocb. >> >> For !nocb, we invoked rcu_accelerate_cbs() before report qs, so this GP is impossible to end >> and we also not set RCU_GP_FLAG_INIT to start new GP in rcu_accelerate_cbs(). >> but for nocb, when needacc = true, we invoke rcu_accelerate_cbs_unlocked() after current CPU >> has reported qs, if all CPU have been reported qs, we will wakeup gp kthread to end this GP in >> rcu_report_qs_rnp(). after that, the rcu_accelerate_cbs_unlocked() is possible to try to wake up >> gp kthread if this GP has ended at this time. so nocb vs !nocb is likely to be inconsistent. >> >> >> That is a fair point. But after gp ends, rcu_check_quiescent_state() >> -> note_gp_changes() which will do a accel + GP thread wake up at that >> point anyway, once it notices that a GP has come to an end. That >> should happen for both the nocb and !nocb cases right? > > For nocb rdp, we won't invoke rcu_accelerate_cbs() and rcu_advance_cbs() in > note_gp_changes(). so also not wakeup gp kthread in note_gp_changes(). > >Yes correct, ok but… > >> >> I am wondering if rcu_report_qs_rdp() needs to be rethought to make >> both cases consistent. >> >> Why does the nocb case need an accel + GP thread wakeup in the >> rcu_report_qs_rdp() function, but the !nocb case does not? > > For nocb accel + GP kthread wakeup only happen in the middle of a (de-)offloading process. > this is an intermediate state. > >Sure, I know what the code currently does, I am asking why and it feels wrong. 
>
>I suggest you slightly change your approach to not assuming the code should be bonafide
>correct and then fixing it (which is ok once in a while), and asking higher level questions
>to why things are the way they are in the first place (that is just my suggestion and I am not in
>a place to provide advice, far from it, but I am just telling you my approach — I care more about
>the code than increasing my patch count :P).
>

Thanks Joel, this is a useful suggestion for me.

>
>If you are in an intermediate state, part way to a !nocb state —
>you may have missed a nocb-related accel and wake, correct? Why does that matter?
>Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway

Should it be: while transitioning to a !nocb state, we do a post-qs-report accel+wake?

>as we clearly know from the discussion. So why do we need to do it if we missed it for
>the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed.

For de-offloading: while we are in the process of de-offloading
(rcu_segcblist_completely_offloaded() returns false), we do not use the bypass
even though this rdp is still in the offloaded state (rcu_rdp_is_offloaded(rdp)
returns true). At this time, the rcuog kthread probably won't do the
accel+wakeup, so we do the accel+wakeup in rcu_report_qs_rdp(). As for why
that matters: for the !nocb state, we have always tried to accel+wakeup as
much as possible (compared to nocb), so that RCU callbacks are executed as
soon as possible.

This is just my personal opinion, please correct me if I am wrong.

Thanks
Zqiang

>
>Do not get me wrong, stellar work here. But I suggest challenge the assumptions
>and the design, not always just the code that was already written :),
>apologies for any misplaced or noisy advice.
>
>Thanks!
>
> - Joel
>
> Thanks
> Zqiang
>
>>
>> (I am out of office till Monday but will intermittently (maybe) check
>> in, RCU is one of those things that daydreaming tends to lend itself
>> to...)
>>
>> - Joel
Hi Frederic, On Fri, Jan 20, 2023 at 6:04 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > On Fri, Jan 20, 2023 at 08:27:03AM -0500, Joel Fernandes wrote: > > > > Sure, I know what the code currently does, I am asking why and it feels wrong. > > > > I suggest you slightly change your approach to not assuming the code should be bonafide correct and then fixing it (which is ok once in a while), and asking higher level questions to why things are the way they are in the first place (that is just my suggestion and I am not in a place to provide advice, far from it, but I am just telling you my approach — I care more about the code than increasing my patch count :P). > > > > If you are in an intermediate state, part way to a !nocb state — you may have > > missed a nocb-related accel and wake, correct? Why does that matter? Once we > > transition to a !nocb state, we do not do a post-qs-report accel+wake anyway > > as we clearly know from the discussion. > > I'm confused. We are doing that acceleration on qs report for !nocb CPU, right? > > > So why do we need to do it if we > > missed it for the intermediate stage? So, I am not fully sure yet what that > > needac is doing and why it is needed. > > To summarize: > > * If the CPU is NOCB, all the callbacks advance and acceleration is performed > by the rcuo/rcuog kthreads. > > * If the CPU is not NOCB, all the callbacks acceleration is performed by the > CPU, such as in the case of rcu_report_qs_rdp(). > > * If the CPU is transitionning from NOCB to !NOCB or from !NOCB to NOCB, the > kthreads may not be available to do the advance/acceleration, so we must do > it locally. That's the needacc path. Sure, I agree it "must be done locally" for the benefit of the half-way transition. > What am I missing? That the acceleration is also done by __note_gp_changes() once the grace period ends anyway, so if any acceleration was missed as you say, it will be done anyway. 
Also it is done by the scheduler tick raising softirq.

rcu_pending() does this:

	/* Has RCU gone idle with this CPU needing another grace period? */
	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
	    !rcu_rdp_is_offloaded(rdp) &&
	    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
		return 1;

and rcu_core():

	/* No grace period and unregistered callbacks? */
	if (!rcu_gp_in_progress() &&
	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
		rcu_nocb_lock_irqsave(rdp, flags);
		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
			rcu_accelerate_cbs_unlocked(rnp, rdp);
		rcu_nocb_unlock_irqrestore(rdp, flags);
	}

So, I am not sure if you need needacc at all. Those CBs that have not
been assigned grace period numbers will be taken care of :)

Thanks!

-Joel
On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote: > > What am I missing? > > That the acceleration is also done by __note_gp_changes() once the > grace period ends anyway, so if any acceleration was missed as you > say, it will be done anyway. > > Also it is done by scheduler tick raising softirq: > > rcu_pending() does this: > /* Has RCU gone idle with this CPU needing another grace period? */ > if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) && > !rcu_rdp_is_offloaded(rdp) && > !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > return 1; > > and rcu_core(): > /* No grace period and unregistered callbacks? */ > if (!rcu_gp_in_progress() && > rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) { > rcu_nocb_lock_irqsave(rdp, flags); > if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > rcu_accelerate_cbs_unlocked(rnp, rdp); > rcu_nocb_unlock_irqrestore(rdp, flags); > } > > So, I am not sure if you need needacc at all. Those CBs that have not > been assigned grace period numbers will be taken care off :) But that's only when there is no grace period pending, so it can't happen while we report a QS. OTOH without the needacc, those callbacks waiting to be accelerated would be eventually processed but only on the next tick following the end of a grace period...if none has started since then. So if someone else starts a new GP before the current CPU, we must wait another GP, etc... That's potentially dangerous. And unfortunately we can't do the acceleration from __note_gp_changes() due to lock ordering restrictions: nocb_lock -> rnp_lock > > Thanks! > > -Joel
> On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote: > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote: >>> What am I missing? >> >> That the acceleration is also done by __note_gp_changes() once the >> grace period ends anyway, so if any acceleration was missed as you >> say, it will be done anyway. >> >> Also it is done by scheduler tick raising softirq: >> >> rcu_pending() does this: >> /* Has RCU gone idle with this CPU needing another grace period? */ >> if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) && >> !rcu_rdp_is_offloaded(rdp) && >> !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) >> return 1; >> >> and rcu_core(): >> /* No grace period and unregistered callbacks? */ >> if (!rcu_gp_in_progress() && >> rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) { >> rcu_nocb_lock_irqsave(rdp, flags); >> if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) >> rcu_accelerate_cbs_unlocked(rnp, rdp); >> rcu_nocb_unlock_irqrestore(rdp, flags); >> } >> >> So, I am not sure if you need needacc at all. Those CBs that have not >> been assigned grace period numbers will be taken care off :) > > But that's only when there is no grace period pending, so it can't happen while > we report a QS. > > OTOH without the needacc, those callbacks waiting to be accelerated would be > eventually processed but only on the next tick following the end of a grace > period...if none has started since then. So if someone else starts a new GP > before the current CPU, we must wait another GP, etc... > > That's potentially dangerous. Waiting for just one more GP cannot be dangerous IMO. Anyway there is no guarantee that callback will run immediately at end of GP, there may be one or more GPs before callback can run, if I remember correctly. That is by design.. but please correct me if my understanding is different from yours. 
>
> And unfortunately we can't do the acceleration from __note_gp_changes() due
> to lock ordering restrictions: nocb_lock -> rnp_lock
>

Ah. This part I am not sure about. I'd appreciate it if you could point me to
any old archive links or documentation detailing that, if possible…

Thanks!

- Joel

>>
>> Thanks!
>>
>> -Joel
On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote: > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote: > > > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote: > >>> What am I missing? > >> > >> That the acceleration is also done by __note_gp_changes() once the > >> grace period ends anyway, so if any acceleration was missed as you > >> say, it will be done anyway. > >> > >> Also it is done by scheduler tick raising softirq: > >> > >> rcu_pending() does this: > >> /* Has RCU gone idle with this CPU needing another grace period? */ > >> if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) && > >> !rcu_rdp_is_offloaded(rdp) && > >> !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > >> return 1; > >> > >> and rcu_core(): > >> /* No grace period and unregistered callbacks? */ > >> if (!rcu_gp_in_progress() && > >> rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) { > >> rcu_nocb_lock_irqsave(rdp, flags); > >> if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > >> rcu_accelerate_cbs_unlocked(rnp, rdp); > >> rcu_nocb_unlock_irqrestore(rdp, flags); > >> } > >> > >> So, I am not sure if you need needacc at all. Those CBs that have not > >> been assigned grace period numbers will be taken care off :) > > > > But that's only when there is no grace period pending, so it can't happen while > > we report a QS. > > > > OTOH without the needacc, those callbacks waiting to be accelerated would be > > eventually processed but only on the next tick following the end of a grace > > period...if none has started since then. So if someone else starts a new GP > > before the current CPU, we must wait another GP, etc... > > > > That's potentially dangerous. > > Waiting for just one more GP cannot be dangerous IMO. Anyway there is no > guarantee that callback will run immediately at end of GP, there may be one or > more GPs before callback can run, if I remember correctly. That is by > design.. 
> but please correct me if my understanding is different from yours.

It's not bound to just one GP. If you have N CPUs flooding callbacks for a
long while, your CPU has 1/N chances to be the one starting the next GP on
each turn. Relying on the acceleration to be performed only when no GP is
running may theoretically starve your callbacks forever.

> > And unfortunately we can't do the acceleration from __note_gp_changes() due
> > to lock ordering restrictions: nocb_lock -> rnp_lock
>
> Ah. This part I am not sure. Appreciate if point me to any old archive links
> or documentation detailing that, if possible…

It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has
that locking order for example.

Excerpt:

	rcu_nocb_lock_irqsave(rdp, flags);
	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
	}

Thanks.
Hi Frederic, On Mon, Jan 23, 2023 at 10:11:15PM +0100, Frederic Weisbecker wrote: > On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote: > > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@kernel.org> wrote: > > > > > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote: > > >>> What am I missing? > > >> > > >> That the acceleration is also done by __note_gp_changes() once the > > >> grace period ends anyway, so if any acceleration was missed as you > > >> say, it will be done anyway. > > >> > > >> Also it is done by scheduler tick raising softirq: > > >> > > >> rcu_pending() does this: > > >> /* Has RCU gone idle with this CPU needing another grace period? */ > > >> if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) && > > >> !rcu_rdp_is_offloaded(rdp) && > > >> !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > > >> return 1; > > >> > > >> and rcu_core(): > > >> /* No grace period and unregistered callbacks? */ > > >> if (!rcu_gp_in_progress() && > > >> rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) { > > >> rcu_nocb_lock_irqsave(rdp, flags); > > >> if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > > >> rcu_accelerate_cbs_unlocked(rnp, rdp); > > >> rcu_nocb_unlock_irqrestore(rdp, flags); > > >> } > > >> > > >> So, I am not sure if you need needacc at all. Those CBs that have not > > >> been assigned grace period numbers will be taken care off :) > > > > > > But that's only when there is no grace period pending, so it can't happen while > > > we report a QS. > > > > > > OTOH without the needacc, those callbacks waiting to be accelerated would be > > > eventually processed but only on the next tick following the end of a grace > > > period...if none has started since then. So if someone else starts a new GP > > > before the current CPU, we must wait another GP, etc... > > > > > > That's potentially dangerous. > > > > Waiting for just one more GP cannot be dangerous IMO. 
> > Anyway there is no
> > guarantee that callback will run immediately at end of GP, there may be one or
> > more GPs before callback can run, if I remember correctly. That is by
> > design.. but please correct me if my understanding is different from yours.
>
> It's not bound to just one GP. If you have N CPUs flooding callbacks for a
> long while, your CPU has 1/N chances to be the one starting the next GP on
> each turn. Relying on the acceleration to be performed only when no GP is
> running may theoretically starve your callbacks forever.

I tend to agree with you, however I was mostly referring to the "needac". The
theoretical case is even more theoretical because you have to be half way
through the transition _and_ flooding at the same time such that there is not
a chance for RCU to be idle for a long period of time (which we don't really
see in our systems per-se). But I agree, even if theoretical, maybe better to
handle it.

> > > And unfortunately we can't do the acceleration from __note_gp_changes()
> > > due to lock ordering restrictions: nocb_lock -> rnp_lock

Yeah I was more going for the fact that if the offload to deoffload transition
completed before note_gp_changes() was called, which obviously we cannot
assume... But yeah for the 'limbo between offload to not offload state',
eventually RCU goes idle, the scheduling clock interrupt (via rcu_pending())
will raise softirq or note_gp_changes() will accelerate. Which does not help
the theoretical flooding case above, but still...

> > Ah. This part I am not sure. Appreciate if point me to any old archive
> > links or documentation detailing that, if possible…
>
> It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has
> that locking order for example.
>
> Excerpt:
>
> 	rcu_nocb_lock_irqsave(rdp, flags);
> 	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
> 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> 		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
> 		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> 	}
>
> Thanks.

Ah I know what you mean now, in other words there is a chance of an ABBA
deadlock if we do not acquire the locks in the correct order. Thank you for
clarifying. Indeed rcutree_migrate_callbacks() also confirms it.

I guess to Frederic's point, another case other than the CB flooding that can
starve CB accelerations is if the system is in the 'limbo between offload and
de-offload' state for a long period of time. In such a situation also, neither
note_gp_changes() nor the scheduling-clock interrupt via softirq will help
accelerate CBs. Whether such a limbo state is possible indefinitely, I am not
sure...

Thanks a lot for the discussions and it is always a learning experience
talking about these...!

thanks,

- Joel
On Sat, Jan 21, 2023 at 12:38:35AM +0000, Zhang, Qiang1 wrote: [...] > >If you are in an intermediate state, part way to a !nocb state — > >you may have missed a nocb-related accel and wake, correct? Why does that matter? > >Once we transition to a !nocb state, we do not do a post-qs-report accel+wake anyway > > Should it be transition to a !nocb state, we do a post-qs-report accel+wake. > > >as we clearly know from the discussion. So why do we need to do it if we missed it for > >the intermediate stage? So, I am not fully sure yet what that needac is doing and why it is needed. > > For de-offloading, when in the process of > de-offloading(rcu_segcblist_completely_offloaded() return false), we're not > doing bypass even though this rdp is offloaded > state(rcu_rdp_is_offloaded(rdp) return true), at this time, the rcuog > kthread probably won't accel+wakeup, so we do accel+wakeup in > rcu_report_qs_rdp(), as you say why does that matter? for !nocb state, > we've always tried to accel+wakeup as much as possible(compared to nocb), > let rcu callback be executed as soon as possible. > > This is just my personal opinion, please correct me if I am wrong. I think your opinion is correct. The acceleration after the QS reporting may be needed for the case where we are part way between the offload and de-offload state as in this state (which we could call limbo state), there does not seem to be anywhere that acceleration is performed, and if this state persists for a long period of time, then no other path can accelerate the CBs thus likely starving them as Frederic mentioned.. thanks, - Joel > Thanks > Zqiang > > > > > >Do not get me wrong, stellar work here. But I suggest challenge the assumptions and the design, not always just the code that was already written :), apologies for any misplaced or noisy advice. > > > >Thanks! 
> > > > - Joel > > > > > > Thanks > > Zqiang > > > >> > >> (I am out of office till Monday but will intermittently (maybe) check > >> in, RCU is one of those things that daydreaming tends to lend itself > >> to...) > >> > >> - Joel
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b2c204529478..0962c2202d45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1956,7 +1956,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
 	bool needacc = false;
 	struct rcu_node *rnp;
 
@@ -1988,7 +1987,12 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	 * NOCB kthreads have their own way to deal with that...
 	 */
 	if (!rcu_rdp_is_offloaded(rdp)) {
-		needwake = rcu_accelerate_cbs(rnp, rdp);
+		/*
+		 * Current GP does not end, invoke rcu_gp_in_progress()
+		 * will return true and so doesn't wake up GP kthread to
+		 * start a new GP.
+		 */
+		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
 	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 		/*
 		 * ...but NOCB kthreads may miss or delay callbacks acceleration
@@ -2000,8 +2004,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 	rcu_disable_urgency_upon_qs(rdp);
 	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 	/* ^^^ Released rnp->lock */
-	if (needwake)
-		rcu_gp_kthread_wake();
 
 	if (needacc) {
 		rcu_nocb_lock_irqsave(rdp, flags);