Message ID | 20231208220545.7452-1-frederic@kernel.org |
---|---|
Headers |
From: Frederic Weisbecker <frederic@kernel.org> To: LKML <linux-kernel@vger.kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>, Boqun Feng <boqun.feng@gmail.com>, Joel Fernandes <joel@joelfernandes.org>, Neeraj Upadhyay <neeraj.upadhyay@amd.com>, "Paul E . McKenney" <paulmck@kernel.org>, Uladzislau Rezki <urezki@gmail.com>, Zqiang <qiang.zhang1211@gmail.com>, rcu <rcu@vger.kernel.org> Subject: [PATCH 0/8] rcu: Fix expedited GP deadlock (and cleanup some nocb stuff) Date: Fri, 8 Dec 2023 23:05:37 +0100 Message-ID: <20231208220545.7452-1-frederic@kernel.org> X-Mailer: git-send-email 2.42.1 List-ID: <linux-kernel.vger.kernel.org> |
Series |
rcu: Fix expedited GP deadlock (and cleanup some nocb stuff)
|
|
Message
Frederic Weisbecker
Dec. 8, 2023, 10:05 p.m. UTC
TREE04 can trigger a writer stall if run with memory pressure. This
is due to a circular dependency between waiting for an expedited grace
period and polling on an expedited grace period when workqueues go back
to mayday serialization.

Here is a proposed fix.

Frederic Weisbecker (8):
  rcu/nocb: Make IRQs disablement symetric
  rcu/nocb: Re-arrange call_rcu() NOCB specific code
  rcu/exp: Fix RCU expedited parallel grace period kworker allocation
    failure recovery
  rcu/exp: Handle RCU expedited grace period kworker allocation failure
  rcu: s/boost_kthread_mutex/kthread_mutex
  rcu/exp: Make parallel exp gp kworker per rcu node
  rcu/exp: Handle parallel exp gp kworkers affinity
  rcu/exp: Remove rcu_par_gp_wq

 kernel/rcu/rcu.h         |   5 -
 kernel/rcu/tree.c        | 222 +++++++++++++++++++++++++---------------
 kernel/rcu/tree.h        |  12 +--
 kernel/rcu/tree_exp.h    |  81 +++-----------
 kernel/rcu/tree_nocb.h   |  38 ++++---
 kernel/rcu/tree_plugin.h |  52 ++-------
 6 files changed, 191 insertions(+), 219 deletions(-)
Comments
On Fri, Dec 08, 2023 at 11:05:37PM +0100, Frederic Weisbecker wrote:
> TREE04 can trigger a writer stall if run with memory pressure. This
> is due to a circular dependency between waiting for an expedited grace
> period and polling on an expedited grace period when workqueues go back
> to mayday serialization.
>
> Here is a proposed fix.

The torture.sh "acceptance test" with KCSAN and --duration 30 ran
fine except for this in TREE09:

kernel/rcu/tree_nocb.h:1785:13: error: unused function '__call_rcu_nocb_wake' [-Werror,-Wunused-function]

My guess is that the declaration of __call_rcu_nocb_wake() in
kernel/rcu/tree.h needs an "#ifdef CONFIG_SMP", but you might have a
better fix.

							Thanx, Paul

> Frederic Weisbecker (8):
>   rcu/nocb: Make IRQs disablement symetric
>   rcu/nocb: Re-arrange call_rcu() NOCB specific code
>   rcu/exp: Fix RCU expedited parallel grace period kworker allocation
>     failure recovery
>   rcu/exp: Handle RCU expedited grace period kworker allocation failure
>   rcu: s/boost_kthread_mutex/kthread_mutex
>   rcu/exp: Make parallel exp gp kworker per rcu node
>   rcu/exp: Handle parallel exp gp kworkers affinity
>   rcu/exp: Remove rcu_par_gp_wq
>
>  kernel/rcu/rcu.h         |   5 -
>  kernel/rcu/tree.c        | 222 +++++++++++++++++++++++++---------------
>  kernel/rcu/tree.h        |  12 +--
>  kernel/rcu/tree_exp.h    |  81 +++-----------
>  kernel/rcu/tree_nocb.h   |  38 ++++---
>  kernel/rcu/tree_plugin.h |  52 ++-------
>  6 files changed, 191 insertions(+), 219 deletions(-)
>
> --
> 2.42.1
>
On Mon, Dec 11, 2023 at 08:38:59AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 08, 2023 at 11:05:37PM +0100, Frederic Weisbecker wrote:
> > TREE04 can trigger a writer stall if run with memory pressure. This
> > is due to a circular dependency between waiting for an expedited grace
> > period and polling on an expedited grace period when workqueues go back
> > to mayday serialization.
> >
> > Here is a proposed fix.
>
> The torture.sh "acceptance test" with KCSAN and --duration 30 ran
> fine except for this in TREE09:
>
> kernel/rcu/tree_nocb.h:1785:13: error: unused function '__call_rcu_nocb_wake' [-Werror,-Wunused-function]
>
> My guess is that the declaration of __call_rcu_nocb_wake() in
> kernel/rcu/tree.h needs an "#ifdef CONFIG_SMP", but you might have a
> better fix.

That could be because, with CONFIG_RCU_NO_CB_CPU=n, the function is only
called (though as dead code) from rcutree_migrate_callbacks(), which in
turn only exists if CONFIG_HOTPLUG_CPU=y.

Something like this, then:

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 35f7af331e6c..e1ff53d5084c 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -445,6 +445,8 @@ static void rcu_qs(void);
 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
+static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
+                                 unsigned long flags);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 static int rcu_print_task_exp_stall(struct rcu_node *rnp);
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
@@ -466,8 +468,6 @@ static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
                                   unsigned long j, bool lazy);
 static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
                           rcu_callback_t func, unsigned long flags, bool lazy);
-static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
-                                 unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
 static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
On Mon, Dec 11, 2023 at 09:04:04PM +0100, Frederic Weisbecker wrote:
> On Mon, Dec 11, 2023 at 08:38:59AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 08, 2023 at 11:05:37PM +0100, Frederic Weisbecker wrote:
> > > TREE04 can trigger a writer stall if run with memory pressure. This
> > > is due to a circular dependency between waiting for an expedited grace
> > > period and polling on an expedited grace period when workqueues go back
> > > to mayday serialization.
> > >
> > > Here is a proposed fix.
> >
> > The torture.sh "acceptance test" with KCSAN and --duration 30 ran
> > fine except for this in TREE09:
> >
> > kernel/rcu/tree_nocb.h:1785:13: error: unused function '__call_rcu_nocb_wake' [-Werror,-Wunused-function]
> >
> > My guess is that the declaration of __call_rcu_nocb_wake() in
> > kernel/rcu/tree.h needs an "#ifdef CONFIG_SMP", but you might have a
> > better fix.
>
> That could be because, with CONFIG_RCU_NO_CB_CPU=n, the function is only
> called (though as dead code) from rcutree_migrate_callbacks(), which in
> turn only exists if CONFIG_HOTPLUG_CPU=y.
>
> Something like this, then:
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 35f7af331e6c..e1ff53d5084c 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -445,6 +445,8 @@ static void rcu_qs(void);
>  static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
>  #ifdef CONFIG_HOTPLUG_CPU
>  static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
> +static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
> +                                 unsigned long flags);
>  #endif /* #ifdef CONFIG_HOTPLUG_CPU */
>  static int rcu_print_task_exp_stall(struct rcu_node *rnp);
>  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
> @@ -466,8 +468,6 @@ static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>                                    unsigned long j, bool lazy);
>  static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
>                            rcu_callback_t func, unsigned long flags, bool lazy);
> -static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
> -                                 unsigned long flags);
>  static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
>  static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
>  static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);

This one passes TREE01 and TINY01, but TREE09 still gets this:

kernel/rcu/tree_nocb.h:1785:13: error: ‘__call_rcu_nocb_wake’ defined but not used [-Werror=unused-function]

Huh. I suppose that there is always __maybe_unused?

							Thanx, Paul
On Mon, Dec 11, 2023 at 01:39:40PM -0800, Paul E. McKenney wrote:
> This one passes TREE01 and TINY01, but on TREE09 still gets this:
>
> kernel/rcu/tree_nocb.h:1785:13: error: ‘__call_rcu_nocb_wake’ defined but not used [-Werror=unused-function]
>
> Huh. I suppose that there is always __maybe_unused?

Looks like a good fit indeed!

Thanks!

> Thanx, Paul
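
For readers following along, here is a minimal userspace sketch of the __maybe_unused idea discussed above, assuming GCC or Clang. It is not the patch that was applied: demo_nocb_wake(), demo_migrate_callbacks(), demo_entry() and DEMO_HOTPLUG_CPU are hypothetical stand-ins for the kernel function, its only caller, and CONFIG_HOTPLUG_CPU. The macro is defined locally here because the kernel's own definition is not available outside the kernel tree.

```c
/* Userspace stand-in for the kernel's __maybe_unused (assumption: GCC/Clang). */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/*
 * Hypothetical config switch standing in for CONFIG_HOTPLUG_CPU.
 * Leave it undefined to reproduce the "no caller left" situation.
 */
/* #define DEMO_HOTPLUG_CPU 1 */

/*
 * The function is always defined, but its only caller is compiled out
 * when DEMO_HOTPLUG_CPU is undefined. Without __maybe_unused, a build
 * with -Wunused-function -Werror would fail in that configuration.
 */
static int __maybe_unused demo_nocb_wake(int was_empty)
{
	return was_empty ? 1 : 0;
}

#ifdef DEMO_HOTPLUG_CPU
/* Only caller of demo_nocb_wake(); exists in one configuration only. */
static int demo_migrate_callbacks(void)
{
	return demo_nocb_wake(1);
}
#endif

/* Entry point that builds cleanly in both configurations. */
int demo_entry(void)
{
#ifdef DEMO_HOTPLUG_CPU
	return demo_migrate_callbacks();
#else
	return 0;
#endif
}
```

Moving only the forward declaration under an #ifdef does not silence the warning, because the warning is about the definition. The attribute goes on the definition itself, which is presumably why it fits the TREE09 case better than reshuffling declarations.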