Message ID | 20240117102616.18302-1-qiang.zhang1211@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-28832-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:42cf:b0:101:a8e8:374 with SMTP id q15csp814536dye; Wed, 17 Jan 2024 02:27:31 -0800 (PST) X-Google-Smtp-Source: AGHT+IFSolF/C7dR1o5DO1Xsb8y8aKnxJn92Nk9cCzwqJMJ+YEa+29TA+PtjOBWgmM0N6mu2CYmK X-Received: by 2002:a17:906:2805:b0:a29:8d04:9517 with SMTP id r5-20020a170906280500b00a298d049517mr3834183ejc.119.1705487251594; Wed, 17 Jan 2024 02:27:31 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705487251; cv=pass; d=google.com; s=arc-20160816; b=oaXEKIk0fOW9WBOKrFTiTbCb8BrelQtPOcTvLKWluvkdGPeTj7CNHN6yayhlCaSn3s NZawlIK4i3vgDCgehdnVoXVyQBD1kYnxG0Aq0v7DR8PGbaJY6vl9yQsxNuReNYv+oUPb aDFw9SpzU9XWcISuaMOmx0LbcCYXoVE7BPKO5gZoX6Y1NJOlGhZTplW9F6ofD9J6+Kcf Pkj92ip/YYnp8sQg2eIJCB81/mHMtTFZR3HiwSXru8uRXsDEjZiZSmXwlmo3BTY7WWgO qb72pzjTAmr4VBeOOLIEdAWuL4JKkGhtJ2jQE6HS8oQYT4XhzDWNC1wC9XLYGknlwMt4 Vl0w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-unsubscribe:list-subscribe:list-id:precedence:message-id:date :subject:cc:to:from:dkim-signature; bh=Cr+Yty3TNQxQc0mfbzoHMY7BuPSC95Nw8wev9SQkgyM=; fh=pMR+EhsGLr1vpwIXYFvZoEhk00CDVX6m4YxTnnaYA34=; b=J0wwmPFXyFJRPdC8zO652vgtSHzQmQdFigKzrLBSi0Ch9ZBc0zyZ4WuH1DGAlZ/bLQ z3vPAMablSU4tot4y/7jKy1yzJlJDAqVc2I7i6ZlPuaNIIOKj2h23sWitBCYxKCG8tGH SqoemEwuKNkf6FNxO80wg3HCcfxV7hnX50MGRVGkIcGOf+7Pv8jychgvPNeB4CtzmBl8 3qkWFRp/hIvj61Bds8iKS8FEpvrT9ufp9tRNFfiI03ONPIboeWnLGRe6MPMT1w6XJakl vwKtoJzWhMIiEabuZ3iec76lH2I1+6LdRHS6z0/w5OCWONw8+qCQnFAcygqhmA6/8eJ+ c6Jg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=W+2uUPPW; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-28832-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-28832-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id u17-20020a170906069100b00a2e9225ec87si1176977ejb.86.2024.01.17.02.27.31 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jan 2024 02:27:31 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-28832-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=W+2uUPPW; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-28832-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-28832-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id D7A791F2322F for <ouuuleilei@gmail.com>; Wed, 17 Jan 2024 10:27:16 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id ACCE11C6B4; Wed, 17 Jan 2024 10:26:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="W+2uUPPW" Received: from mail-yw1-f182.google.com (mail-yw1-f182.google.com [209.85.128.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AD3651BC2E; Wed, 17 Jan 2024 10:26:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705487190; cv=none; b=a8B/WQKz5IOXqUTPSc4ZbKeRgQQmexwDQtWdqbcGhm+PjAELwy7/pG4A+QcGfH7WZI0FUa3ARl/RJyz/tq8HLOgik/zGTT6DlRDQQV3p7QtzJ3kz7RYGjyFJv8UzjRpxB4D6wKxQT+RZttKnBoMeAozyHjJhno/g34IpgSO5NUo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705487190; c=relaxed/simple; bh=nw1ctmmn99ih3vrMxnXPLShNwmlGeic28XUYbf2UxTQ=; h=Received:DKIM-Signature:X-Google-DKIM-Signature: X-Gm-Message-State:X-Google-Smtp-Source:X-Received:Received:From: To:Cc:Subject:Date:Message-Id:X-Mailer; b=QOBpmUa9rXc+M5y6Gac9qcbRxNk7j6EFJ3Y2OH5+FLdmzgH7oetquzscmv9ZZV4YUUv+LYeimOGhUoLLir/3u1/SqHoNfhIHi8mlN/zb4XcMfArwAzXfUOTfgJ0b63Q40yIP/jjdeYKfwy5SQtIlUDmgt6lcwHaCKeRmExIqyIs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=W+2uUPPW; arc=none smtp.client-ip=209.85.128.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-f182.google.com with SMTP id 00721157ae682-5f8cf76ef5bso97801697b3.0; Wed, 17 Jan 2024 02:26:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705487187; x=1706091987; darn=vger.kernel.org; h=message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Cr+Yty3TNQxQc0mfbzoHMY7BuPSC95Nw8wev9SQkgyM=; b=W+2uUPPWHR/jIIWzwTg5Z57VExOAwg6Kd7m1XlJrTweJjsNpwObS0dcjCzfQRtpxi9 zMwonfbsQiGmU97L38AR4R6qoeV6jKdVKlx9Ezszr9+f5ZbIfxMePaSQtjjn3nfQ4mvX FmZRo929peXTPTBGow5Y7kLtY7whZLCW8LKQqiMMBQzkJtX9V2kfl9ABbttME8PpVQvS ibBfFFVZu4zNBG/2E4Pnplf7ImmzS46Pz2T3RlvrNGs4GM4sr7axJs/s4YXSLVcBH+YX 3TEYaKY42ZVdIkTh6Uekt3EttwRzUxiII9jDkPYkcc76nTJOzX+R+GEbYagvgCIwryPW t/Jw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705487187; x=1706091987; h=message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Cr+Yty3TNQxQc0mfbzoHMY7BuPSC95Nw8wev9SQkgyM=; b=FTRGtEl/iu/Yd0tyou2fwBE85UpM8OdJ/UPXV2HbGHu2176ZIyZumY719PSIkRIfeA f+Zt5ByMPkBXr65GPkf/SLaDiJJ5vMgMlaA6OK2dxj0w/61O5ftcL9PVOi/WGVpamcjE JMuOKicA7OYIPLwGh82pm2m9OQeXFLMcPtHnAB9e3KF6Vb2XTr4iC4X3WEf5iaOvpfzU vm6z6hJYdlxpKqswtWGc6JGRIcxJTDDySV+kCm1xhNaLgINq7LY8ZO2D44g+D88Sb8Pn 53one/21xM0LMCPpv0vkNYd5iPg3/oLwxd9uFkCRfwW5e7L6W7IE2R86kORb75JjGGH4 msCA== X-Gm-Message-State: AOJu0YyKXdmalnRNfT6A1uwgQ67eSLxsAginOIIjL3Z9AU7XPnoBCWoY Wque0/6ou8v7ecgGvyah7zA= X-Received: by 2002:a81:b385:0:b0:5e9:5538:d930 with SMTP id r127-20020a81b385000000b005e95538d930mr6529658ywh.47.1705487187588; Wed, 17 Jan 2024 02:26:27 -0800 (PST) Received: from MSCND1355B05.fareast.nevint.com ([117.128.58.94]) by smtp.gmail.com with ESMTPSA id e18-20020a170902cf5200b001d58ed4c58asm10657042plg.63.2024.01.17.02.26.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jan 2024 02:26:27 -0800 (PST) From: Zqiang <qiang.zhang1211@gmail.com> To: paulmck@kernel.org, frederic@kernel.org, quic_neeraju@quicinc.com, joel@joelfernandes.org Cc: qiang.zhang1211@gmail.com, rcu@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] rcu/nocb: Check rdp_gp->nocb_timer in __call_rcu_nocb_wake() Date: Wed, 17 Jan 2024 18:26:16 +0800 Message-Id: <20240117102616.18302-1-qiang.zhang1211@gmail.com> X-Mailer: git-send-email 2.17.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1788333000214100830 X-GMAIL-MSGID: 1788333000214100830 |
Series |
rcu/nocb: Check rdp_gp->nocb_timer in __call_rcu_nocb_wake()
|
|
Commit Message
Z qiang
Jan. 17, 2024, 10:26 a.m. UTC
Currently, only rdp_gp->nocb_timer is used, for nocb_timer of
no-rdp_gp structure, the timer_pending() is always return false,
this commit therefore need to check rdp_gp->nocb_timer in
__call_rcu_nocb_wake().
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
kernel/rcu/tree_nocb.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > no-rdp_gp structure, the timer_pending() is always return false, > this commit therefore need to check rdp_gp->nocb_timer in > __call_rcu_nocb_wake(). > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > --- > kernel/rcu/tree_nocb.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > index 54971afc3a9b..3f85577bddd4 100644 > --- a/kernel/rcu/tree_nocb.h > +++ b/kernel/rcu/tree_nocb.h > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > long lazy_len; > long len; > struct task_struct *t; > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > // If we are being polled or there is no kthread, just leave. > t = READ_ONCE(rdp->nocb_gp_kthread); > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > smp_mb(); /* Enqueue before timer_pending(). */ > if ((rdp->nocb_cb_sleep || > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > - !timer_pending(&rdp->nocb_timer)) { > + !timer_pending(&rdp_gp->nocb_timer)) { Hehe, good eyes ;-) I had that change in mind but while checking that area further I actually wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If we reach that place, it means that the nocb_gp kthread should be awaken already (or the timer pending), so what does a force wake up solve in that case? Paul, any recollection of that? In the meantime: Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > no-rdp_gp structure, the timer_pending() is always return false, > > this commit therefore need to check rdp_gp->nocb_timer in > > __call_rcu_nocb_wake(). > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > --- > > kernel/rcu/tree_nocb.h | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > index 54971afc3a9b..3f85577bddd4 100644 > > --- a/kernel/rcu/tree_nocb.h > > +++ b/kernel/rcu/tree_nocb.h > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > long lazy_len; > > long len; > > struct task_struct *t; > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > // If we are being polled or there is no kthread, just leave. > > t = READ_ONCE(rdp->nocb_gp_kthread); > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > smp_mb(); /* Enqueue before timer_pending(). */ > > if ((rdp->nocb_cb_sleep || > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > - !timer_pending(&rdp->nocb_timer)) { > > + !timer_pending(&rdp_gp->nocb_timer)) { > > Hehe, good eyes ;-) > > I had that change in mind but while checking that area further I actually > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > we reach that place, it means that the nocb_gp kthread should be awaken > already (or the timer pending), so what does a force wake up solve in that > case? > > Paul, any recollection of that? Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed all the code paths correctly. Historically, I have been worried about lost wakeups. Also, there used to be code paths in which a wakeup was not needed, for example, because we knew that the ending of the current grace period would take care of things. Unless there was some huge pile of callbacks, in which case an immediate wakeup could avoid falling behind a callback flood. Given that rcutorture does test callback flooding, we appear to be OK, but maybe it is time to crank up the flooding more. On the other hand, I have started seeing the (very) occasional OOM on TREE03. (In addition to those that show up from time to time on the single-CPU TREE09 scenario.) Thanx, Paul > In the meantime: > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
On Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney wrote: > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > no-rdp_gp structure, the timer_pending() is always return false, > > > this commit therefore need to check rdp_gp->nocb_timer in > > > __call_rcu_nocb_wake(). > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > --- > > > kernel/rcu/tree_nocb.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > index 54971afc3a9b..3f85577bddd4 100644 > > > --- a/kernel/rcu/tree_nocb.h > > > +++ b/kernel/rcu/tree_nocb.h > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > long lazy_len; > > > long len; > > > struct task_struct *t; > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > // If we are being polled or there is no kthread, just leave. > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > if ((rdp->nocb_cb_sleep || > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > - !timer_pending(&rdp->nocb_timer)) { > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > Hehe, good eyes ;-) > > > > I had that change in mind but while checking that area further I actually > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > we reach that place, it means that the nocb_gp kthread should be awaken > > already (or the timer pending), so what does a force wake up solve in that > > case? > > > > Paul, any recollection of that? > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > all the code paths correctly. > > Historically, I have been worried about lost wakeups. Also, there > used to be code paths in which a wakeup was not needed, for example, > because we knew that the ending of the current grace period would take > care of things. Unless there was some huge pile of callbacks, in which > case an immediate wakeup could avoid falling behind a callback flood. > > Given that rcutorture does test callback flooding, we appear to be OK, > but maybe it is time to crank up the flooding more. > > On the other hand, I have started seeing the (very) occasional OOM > on TREE03. (In addition to those that show up from time to time on the > single-CPU TREE09 scenario.) Oh, and queued for further review and testing, thank you both! Thanx, Paul > > In the meantime: > > > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > no-rdp_gp structure, the timer_pending() is always return false, > > > this commit therefore need to check rdp_gp->nocb_timer in > > > __call_rcu_nocb_wake(). > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > --- > > > kernel/rcu/tree_nocb.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > index 54971afc3a9b..3f85577bddd4 100644 > > > --- a/kernel/rcu/tree_nocb.h > > > +++ b/kernel/rcu/tree_nocb.h > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > long lazy_len; > > > long len; > > > struct task_struct *t; > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > // If we are being polled or there is no kthread, just leave. > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > if ((rdp->nocb_cb_sleep || > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > - !timer_pending(&rdp->nocb_timer)) { > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > Hehe, good eyes ;-) > > > > I had that change in mind but while checking that area further I actually > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > we reach that place, it means that the nocb_gp kthread should be awaken > > already (or the timer pending), so what does a force wake up solve in that > > case? > > > > Paul, any recollection of that? > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > all the code paths correctly. > > Historically, I have been worried about lost wakeups. Also, there > used to be code paths in which a wakeup was not needed, for example, > because we knew that the ending of the current grace period would take > care of things. Unless there was some huge pile of callbacks, in which > case an immediate wakeup could avoid falling behind a callback flood. Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in my TODO list...unless Zqiang would like to give it a try? :-) > > Given that rcutorture does test callback flooding, we appear to be OK, > but maybe it is time to crank up the flooding more. > > On the other hand, I have started seeing the (very) occasional OOM > on TREE03. > (In addition to those that show up from time to time on the > single-CPU TREE09 scenario.) Interesting, are those recent? Bisectable? Thanks!
On Fri, Jan 19, 2024 at 10:47:23PM +0100, Frederic Weisbecker wrote: > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > __call_rcu_nocb_wake(). > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > --- > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > --- a/kernel/rcu/tree_nocb.h > > > > +++ b/kernel/rcu/tree_nocb.h > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > long lazy_len; > > > > long len; > > > > struct task_struct *t; > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > if ((rdp->nocb_cb_sleep || > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > Hehe, good eyes ;-) > > > > > > I had that change in mind but while checking that area further I actually > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > already (or the timer pending), so what does a force wake up solve in that > > > case? > > > > > > Paul, any recollection of that? > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > all the code paths correctly. > > > > Historically, I have been worried about lost wakeups. Also, there > > used to be code paths in which a wakeup was not needed, for example, > > because we knew that the ending of the current grace period would take > > care of things. Unless there was some huge pile of callbacks, in which > > case an immediate wakeup could avoid falling behind a callback flood. > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > my TODO list...unless Zqiang would like to give it a try? :-) > > > Given that rcutorture does test callback flooding, we appear to be OK, > > but maybe it is time to crank up the flooding more. > > > > On the other hand, I have started seeing the (very) occasional OOM > > on TREE03. > > (In addition to those that show up from time to time on the > > single-CPU TREE09 scenario.) > > Interesting, are those recent? Bisectable? Bisection in progress, got it down to 10 commits. Yet again about ten hours per step on 20 systems... Though maybe I should have put more time into making it happen faster. Except that I was on travel, so I doubt that I would have made all that much progress. ;-) Thanx, Paul Thanx, Paul
On Fri, Jan 19, 2024 at 04:29:52PM -0800, Paul E. McKenney wrote: > On Fri, Jan 19, 2024 at 10:47:23PM +0100, Frederic Weisbecker wrote: > > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > > __call_rcu_nocb_wake(). > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > --- > > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > > --- a/kernel/rcu/tree_nocb.h > > > > > +++ b/kernel/rcu/tree_nocb.h > > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > long lazy_len; > > > > > long len; > > > > > struct task_struct *t; > > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > > if ((rdp->nocb_cb_sleep || > > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > > > Hehe, good eyes ;-) > > > > > > > > I had that change in mind but while checking that area further I actually > > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > > already (or the timer pending), so what does a force wake up solve in that > > > > case? > > > > > > > > Paul, any recollection of that? > > > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > > all the code paths correctly. > > > > > > Historically, I have been worried about lost wakeups. Also, there > > > used to be code paths in which a wakeup was not needed, for example, > > > because we knew that the ending of the current grace period would take > > > care of things. Unless there was some huge pile of callbacks, in which > > > case an immediate wakeup could avoid falling behind a callback flood. > > > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > > my TODO list...unless Zqiang would like to give it a try? :-) > > > > > Given that rcutorture does test callback flooding, we appear to be OK, > > > but maybe it is time to crank up the flooding more. > > > > > > On the other hand, I have started seeing the (very) occasional OOM > > > on TREE03. > > > (In addition to those that show up from time to time on the > > > single-CPU TREE09 scenario.) > > > > Interesting, are those recent? Bisectable? > > Bisection in progress, got it down to 10 commits. Yet again about > ten hours per step on 20 systems... > > Though maybe I should have put more time into making it happen faster. > Except that I was on travel, so I doubt that I would have made all that > much progress. ;-) And it hit this one, which you encountered earlier: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Which you fixed with this guy: 600310bd7ea8 ("rcu: Defer RCU kthreads wakeup when CPU is dying") Which is not yet in -next. Which means I have spent an embarrassing amount of time bisecting a bug that you already fixed. C'est la vie! Given that v6.8-rc1 is now out, it is time to get a bunch of RCU commits into -next, including that one! ;-) Thanx, Paul
On Mon, Jan 22, 2024 at 04:07:09AM -0800, Paul E. McKenney wrote: > On Fri, Jan 19, 2024 at 04:29:52PM -0800, Paul E. McKenney wrote: > > On Fri, Jan 19, 2024 at 10:47:23PM +0100, Frederic Weisbecker wrote: > > > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > > > __call_rcu_nocb_wake(). > > > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > > --- > > > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > > > --- a/kernel/rcu/tree_nocb.h > > > > > > +++ b/kernel/rcu/tree_nocb.h > > > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > long lazy_len; > > > > > > long len; > > > > > > struct task_struct *t; > > > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > > > if ((rdp->nocb_cb_sleep || > > > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > > > > > Hehe, good eyes ;-) > > > > > > > > > > I had that change in mind but while checking that area further I actually > > > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > > > already (or the timer pending), so what does a force wake up solve in that > > > > > case? > > > > > > > > > > Paul, any recollection of that? > > > > > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > > > all the code paths correctly. > > > > > > > > Historically, I have been worried about lost wakeups. Also, there > > > > used to be code paths in which a wakeup was not needed, for example, > > > > because we knew that the ending of the current grace period would take > > > > care of things. Unless there was some huge pile of callbacks, in which > > > > case an immediate wakeup could avoid falling behind a callback flood. > > > > > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > > > my TODO list...unless Zqiang would like to give it a try? :-) > > > > > > > Given that rcutorture does test callback flooding, we appear to be OK, > > > > but maybe it is time to crank up the flooding more. > > > > > > > > On the other hand, I have started seeing the (very) occasional OOM > > > > on TREE03. > > > > (In addition to those that show up from time to time on the > > > > single-CPU TREE09 scenario.) > > > > > > Interesting, are those recent? Bisectable? > > > > Bisection in progress, got it down to 10 commits. Yet again about > > ten hours per step on 20 systems... > > > > Though maybe I should have put more time into making it happen faster. > > Except that I was on travel, so I doubt that I would have made all that > > much progress. ;-) > > And it hit this one, which you encountered earlier: > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") > > Which you fixed with this guy: > > 600310bd7ea8 ("rcu: Defer RCU kthreads wakeup when CPU is dying") > > Which is not yet in -next. Which means I have spent an embarrassing > amount of time bisecting a bug that you already fixed. C'est la vie! > > Given that v6.8-rc1 is now out, it is time to get a bunch of RCU > commits into -next, including that one! ;-) Now to test your fix on top of the bad commit, and also on top of next-20240110. Just in case... Thanx, Paul
Le Mon, Jan 22, 2024 at 04:11:20AM -0800, Paul E. McKenney a écrit : > On Mon, Jan 22, 2024 at 04:07:09AM -0800, Paul E. McKenney wrote: > > On Fri, Jan 19, 2024 at 04:29:52PM -0800, Paul E. McKenney wrote: > > > On Fri, Jan 19, 2024 at 10:47:23PM +0100, Frederic Weisbecker wrote: > > > > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > > > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > > > > __call_rcu_nocb_wake(). > > > > > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > > > --- > > > > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > > > > --- a/kernel/rcu/tree_nocb.h > > > > > > > +++ b/kernel/rcu/tree_nocb.h > > > > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > > long lazy_len; > > > > > > > long len; > > > > > > > struct task_struct *t; > > > > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > > > > if ((rdp->nocb_cb_sleep || > > > > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > > > > > > > Hehe, good eyes ;-) > > > > > > > > > > > > I had that change in mind but while checking that area further I actually > > > > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > > > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > > > > already (or the timer pending), so what does a force wake up solve in that > > > > > > case? > > > > > > > > > > > > Paul, any recollection of that? > > > > > > > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > > > > all the code paths correctly. > > > > > > > > > > Historically, I have been worried about lost wakeups. Also, there > > > > > used to be code paths in which a wakeup was not needed, for example, > > > > > because we knew that the ending of the current grace period would take > > > > > care of things. Unless there was some huge pile of callbacks, in which > > > > > case an immediate wakeup could avoid falling behind a callback flood. > > > > > > > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > > > > my TODO list...unless Zqiang would like to give it a try? :-) > > > > > > > > > Given that rcutorture does test callback flooding, we appear to be OK, > > > > > but maybe it is time to crank up the flooding more. > > > > > > > > > > On the other hand, I have started seeing the (very) occasional OOM > > > > > on TREE03. > > > > > (In addition to those that show up from time to time on the > > > > > single-CPU TREE09 scenario.) > > > > > > > > Interesting, are those recent? Bisectable? > > > > > > Bisection in progress, got it down to 10 commits. Yet again about > > > ten hours per step on 20 systems... > > > > > > Though maybe I should have put more time into making it happen faster. > > > Except that I was on travel, so I doubt that I would have made all that > > > much progress. ;-) > > > > And it hit this one, which you encountered earlier: > > > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") > > > > Which you fixed with this guy: > > > > 600310bd7ea8 ("rcu: Defer RCU kthreads wakeup when CPU is dying") > > > > Which is not yet in -next. Which means I have spent an embarrassing > > amount of time bisecting a bug that you already fixed. C'est la vie! > > > > Given that v6.8-rc1 is now out, it is time to get a bunch of RCU > > commits into -next, including that one! ;-) > > Now to test your fix on top of the bad commit, and also on top of > next-20240110. Just in case... Thanks! I may send a v2 at some point within next week. No big change, just a comment and also expand the swake_up_one_online() logic to nocb kthreads as well. > > Thanx, Paul
On Mon, Jan 22, 2024 at 10:41:04PM +0100, Frederic Weisbecker wrote: > Le Mon, Jan 22, 2024 at 04:11:20AM -0800, Paul E. McKenney a écrit : > > On Mon, Jan 22, 2024 at 04:07:09AM -0800, Paul E. McKenney wrote: > > > On Fri, Jan 19, 2024 at 04:29:52PM -0800, Paul E. McKenney wrote: > > > > On Fri, Jan 19, 2024 at 10:47:23PM +0100, Frederic Weisbecker wrote: > > > > > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > > > > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > > > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > > > > > __call_rcu_nocb_wake(). > > > > > > > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > > > > --- > > > > > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > > > > > --- a/kernel/rcu/tree_nocb.h > > > > > > > > +++ b/kernel/rcu/tree_nocb.h > > > > > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > > > long lazy_len; > > > > > > > > long len; > > > > > > > > struct task_struct *t; > > > > > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > > > > > if ((rdp->nocb_cb_sleep || > > > > > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > > > > > > > > > Hehe, good eyes ;-) > > > > > > > > > > > > > > I had that change in mind but while checking that area further I actually > > > > > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > > > > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > > > > > already (or the timer pending), so what does a force wake up solve in that > > > > > > > case? > > > > > > > > > > > > > > Paul, any recollection of that? > > > > > > > > > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > > > > > all the code paths correctly. > > > > > > > > > > > > Historically, I have been worried about lost wakeups. Also, there > > > > > > used to be code paths in which a wakeup was not needed, for example, > > > > > > because we knew that the ending of the current grace period would take > > > > > > care of things. Unless there was some huge pile of callbacks, in which > > > > > > case an immediate wakeup could avoid falling behind a callback flood. > > > > > > > > > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > > > > > my TODO list...unless Zqiang would like to give it a try? :-) > > > > > > > > > > > Given that rcutorture does test callback flooding, we appear to be OK, > > > > > > but maybe it is time to crank up the flooding more. > > > > > > > > > > > > On the other hand, I have started seeing the (very) occasional OOM > > > > > > on TREE03. > > > > > > (In addition to those that show up from time to time on the > > > > > > single-CPU TREE09 scenario.) > > > > > > > > > > Interesting, are those recent? Bisectable? > > > > > > > > Bisection in progress, got it down to 10 commits. Yet again about > > > > ten hours per step on 20 systems... > > > > > > > > Though maybe I should have put more time into making it happen faster. > > > > Except that I was on travel, so I doubt that I would have made all that > > > > much progress. ;-) > > > > > > And it hit this one, which you encountered earlier: > > > > > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") > > > > > > Which you fixed with this guy: > > > > > > 600310bd7ea8 ("rcu: Defer RCU kthreads wakeup when CPU is dying") > > > > > > Which is not yet in -next. Which means I have spent an embarrassing > > > amount of time bisecting a bug that you already fixed. C'est la vie! > > > > > > Given that v6.8-rc1 is now out, it is time to get a bunch of RCU > > > commits into -next, including that one! ;-) > > > > Now to test your fix on top of the bad commit, and also on top of > > next-20240110. Just in case... > > Thanks! > > I may send a v2 at some point within next week. No big change, just a comment > and also expand the swake_up_one_online() logic to nocb kthreads as well. Looking forward to it! Thanx, Paul
On Tue, Jan 23, 2024 at 02:48:19AM -0800, Paul E. McKenney wrote: > On Mon, Jan 22, 2024 at 10:41:04PM +0100, Frederic Weisbecker wrote: > > Le Mon, Jan 22, 2024 at 04:11:20AM -0800, Paul E. McKenney a écrit : > > > On Mon, Jan 22, 2024 at 04:07:09AM -0800, Paul E. McKenney wrote: > > > > On Fri, Jan 19, 2024 at 04:29:52PM -0800, Paul E. McKenney wrote: > > > > > On Fri, Jan 19, 2024 at 10:47:23PM +0100, Frederic Weisbecker wrote: > > > > > > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > > > > > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > > > > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > > > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > > > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > > > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > > > > > > __call_rcu_nocb_wake(). > > > > > > > > > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > > > > > --- > > > > > > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > > > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > > > > > > --- a/kernel/rcu/tree_nocb.h > > > > > > > > > +++ b/kernel/rcu/tree_nocb.h > > > > > > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > > > > long lazy_len; > > > > > > > > > long len; > > > > > > > > > struct task_struct *t; > > > > > > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > > > > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > > > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > > > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > > > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > > > > > > if ((rdp->nocb_cb_sleep || > > > > > > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > > > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > > > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > > > > > > > > > > > Hehe, good eyes ;-) > > > > > > > > > > > > > > > > I had that change in mind but while checking that area further I actually > > > > > > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing. If > > > > > > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > > > > > > already (or the timer pending), so what does a force wake up solve in that > > > > > > > > case? > > > > > > > > > > > > > > > > Paul, any recollection of that? > > > > > > > > > > > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > > > > > > all the code paths correctly. > > > > > > > > > > > > > > Historically, I have been worried about lost wakeups. Also, there > > > > > > > used to be code paths in which a wakeup was not needed, for example, > > > > > > > because we knew that the ending of the current grace period would take > > > > > > > care of things. Unless there was some huge pile of callbacks, in which > > > > > > > case an immediate wakeup could avoid falling behind a callback flood. > > > > > > > > > > > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > > > > > > my TODO list...unless Zqiang would like to give it a try? :-) > > > > > > > > > > > > > Given that rcutorture does test callback flooding, we appear to be OK, > > > > > > > but maybe it is time to crank up the flooding more. > > > > > > > > > > > > > > On the other hand, I have started seeing the (very) occasional OOM > > > > > > > on TREE03. > > > > > > > (In addition to those that show up from time to time on the > > > > > > > single-CPU TREE09 scenario.) > > > > > > > > > > > > Interesting, are those recent? Bisectable? > > > > > > > > > > Bisection in progress, got it down to 10 commits. Yet again about > > > > > ten hours per step on 20 systems... > > > > > > > > > > Though maybe I should have put more time into making it happen faster. > > > > > Except that I was on travel, so I doubt that I would have made all that > > > > > much progress. ;-) > > > > > > > > And it hit this one, which you encountered earlier: > > > > > > > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") > > > > > > > > Which you fixed with this guy: > > > > > > > > 600310bd7ea8 ("rcu: Defer RCU kthreads wakeup when CPU is dying") > > > > > > > > Which is not yet in -next. Which means I have spent an embarrassing > > > > amount of time bisecting a bug that you already fixed. C'est la vie! > > > > > > > > Given that v6.8-rc1 is now out, it is time to get a bunch of RCU > > > > commits into -next, including that one! ;-) > > > > > > Now to test your fix on top of the bad commit, and also on top of > > > next-20240110. Just in case... > > > > Thanks! > > > > I may send a v2 at some point within next week. No big change, just a comment > > and also expand the swake_up_one_online() logic to nocb kthreads as well. > > Looking forward to it! And your fix on top of next-20240110 did the trick. Whew!!! ;-) Thanx, Paul
> > Le Thu, Jan 18, 2024 at 06:51:57AM -0800, Paul E. McKenney a écrit : > > On Wed, Jan 17, 2024 at 01:07:25PM +0100, Frederic Weisbecker wrote: > > > Le Wed, Jan 17, 2024 at 06:26:16PM +0800, Zqiang a écrit : > > > > Currently, only rdp_gp->nocb_timer is used, for nocb_timer of > > > > no-rdp_gp structure, the timer_pending() is always return false, > > > > this commit therefore need to check rdp_gp->nocb_timer in > > > > __call_rcu_nocb_wake(). > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > --- > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > > > index 54971afc3a9b..3f85577bddd4 100644 > > > > --- a/kernel/rcu/tree_nocb.h > > > > +++ b/kernel/rcu/tree_nocb.h > > > > @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > long lazy_len; > > > > long len; > > > > struct task_struct *t; > > > > + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; > > > > > > > > // If we are being polled or there is no kthread, just leave. > > > > t = READ_ONCE(rdp->nocb_gp_kthread); > > > > @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, > > > > smp_mb(); /* Enqueue before timer_pending(). */ > > > > if ((rdp->nocb_cb_sleep || > > > > !rcu_segcblist_ready_cbs(&rdp->cblist)) && > > > > - !timer_pending(&rdp->nocb_timer)) { > > > > + !timer_pending(&rdp_gp->nocb_timer)) { > > > > > > Hehe, good eyes ;-) > > > > > > I had that change in mind but while checking that area further I actually > > > wondered what is the actual purpose of this RCU_NOCB_WAKE_FORCE thing If > > > we reach that place, it means that the nocb_gp kthread should be awaken > > > already (or the timer pending), so what does a force wake up solve in that > > > case? > > > > > > Paul, any recollection of that? > > > > Huh. We never actually do RCU_NOCB_WAKE_FORCE in v6.7, if I followed > > all the code paths correctly. > > > > Historically, I have been worried about lost wakeups. Also, there > > used to be code paths in which a wakeup was not needed, for example, > > because we knew that the ending of the current grace period would take > > care of things. Unless there was some huge pile of callbacks, in which > > case an immediate wakeup could avoid falling behind a callback flood. > > Ok then looks like it's time for me to add RCU_NOCB_WAKE_FORCE removal in > my TODO list...unless Zqiang would like to give it a try? :-) Thank you Frederic, apologies for the delayed reply. I'm currently traveling, after my vacation is over, I would like to try it if it hasn't been done yet :-) . Thanks Zqiang > > > > > Given that rcutorture does test callback flooding, we appear to be OK, > > but maybe it is time to crank up the flooding more. > > > > On the other hand, I have started seeing the (very) occasional OOM > > on TREE03. > > (In addition to those that show up from time to time on the > > single-CPU TREE09 scenario.) > > Interesting, are those recent? Bisectable? > > Thanks!
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 54971afc3a9b..3f85577bddd4 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -564,6 +564,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, long lazy_len; long len; struct task_struct *t; + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; // If we are being polled or there is no kthread, just leave. t = READ_ONCE(rdp->nocb_gp_kthread); @@ -608,7 +609,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, smp_mb(); /* Enqueue before timer_pending(). */ if ((rdp->nocb_cb_sleep || !rcu_segcblist_ready_cbs(&rdp->cblist)) && - !timer_pending(&rdp->nocb_timer)) { + !timer_pending(&rdp_gp->nocb_timer)) { rcu_nocb_unlock(rdp); wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE, TPS("WakeOvfIsDeferred"));