From patchwork Thu Oct 19 12:02:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 155469
From: Frederic Weisbecker
To: LKML
Cc: "Joel Fernandes (Google)", Boqun Feng, Josh Triplett,
 Mathieu Desnoyers, Neeraj Upadhyay, "Paul E . McKenney",
 Steven Rostedt, Uladzislau Rezki, rcu, Huacai Chen, Binbin Zhou,
 Sergey Senozhatsky, Thomas Gleixner, stable@vger.kernel.org,
 Frederic Weisbecker
Subject: [PATCH 6/6] rcu/tree: Defer setting of jiffies during stall reset
Date: Thu, 19 Oct 2023 14:02:02 +0200
Message-Id: <20231019120202.1216228-7-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org>
References: <20231019120202.1216228-1-frederic@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Joel Fernandes (Google)"

There are instances where rcu_cpu_stall_reset() is called when jiffies
did not get a chance to update for a long time. Before jiffies is
updated, the CPU stall detector can go off triggering false-positives
where a just-started grace period appears to be ages old.

In the past, we disabled stall detection in rcu_cpu_stall_reset();
however, this was changed in [1]. This results in false positives in
the KGDB use case [2].

Fix this by deferring the read of jiffies (and the resulting update of
jiffies_stall) to the third run of the FQS loop. This is more robust
because, even if rcu_cpu_stall_reset() is called just before jiffies is
read, we would end up pushing out the jiffies read by 3 more FQS loops.
Meanwhile, CPU stall detection is delayed and we will not get any false
positives.

[1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.org/
[2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/

Tested with rcutorture.cpu_stall option as well to verify stall behavior
with/without patch.

Tested-by: Huacai Chen
Reported-by: Binbin Zhou
Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
Suggested-by: Paul McKenney
Cc: Sergey Senozhatsky
Cc: Thomas Gleixner
Cc: stable@vger.kernel.org
Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree.c       | 12 ++++++++++++
 kernel/rcu/tree.h       |  4 ++++
 kernel/rcu/tree_stall.h | 20 ++++++++++++++++++--
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb1caefa8bd0..d85779f67aea 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1556,10 +1556,22 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
  */
 static void rcu_gp_fqs(bool first_time)
 {
+	int nr_fqs = READ_ONCE(rcu_state.nr_fqs_jiffies_stall);
 	struct rcu_node *rnp = rcu_get_root();
 
 	WRITE_ONCE(rcu_state.gp_activity, jiffies);
 	WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1);
+
+	WARN_ON_ONCE(nr_fqs > 3);
+	/* Only countdown nr_fqs for stall purposes if jiffies moves. */
+	if (nr_fqs) {
+		if (nr_fqs == 1) {
+			WRITE_ONCE(rcu_state.jiffies_stall,
+				   jiffies + rcu_jiffies_till_stall_check());
+		}
+		WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs);
+	}
+
 	if (first_time) {
 		/* Collect dyntick-idle snapshots. */
 		force_qs_rnp(dyntick_save_progress_counter);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 192536916f9a..e9821a8422db 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -386,6 +386,10 @@ struct rcu_state {
 						/*  in jiffies. */
 	unsigned long jiffies_stall;		/* Time at which to check */
 						/*  for CPU stalls. */
+	int nr_fqs_jiffies_stall;		/* Number of fqs loops after
+						 * which read jiffies and set
+						 * jiffies_stall. Stall
+						 * warnings disabled if !0. */
 	unsigned long jiffies_resched;		/* Time at which to resched */
 						/*  a reluctant CPU. */
 	unsigned long n_force_qs_gpstart;	/* Snapshot of n_force_qs at */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 49544f932279..ac8e86babe44 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -150,12 +150,17 @@ static void panic_on_rcu_stall(void)
 /**
  * rcu_cpu_stall_reset - restart stall-warning timeout for current grace period
  *
+ * To perform the reset request from the caller, disable stall detection until
+ * 3 fqs loops have passed. This is required to ensure a fresh jiffies is
+ * loaded. It should be safe to do from the fqs loop as enough timer
+ * interrupts and context switches should have passed.
+ *
  * The caller must disable hard irqs.
  */
 void rcu_cpu_stall_reset(void)
 {
-	WRITE_ONCE(rcu_state.jiffies_stall,
-		   jiffies + rcu_jiffies_till_stall_check());
+	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);
+	WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX);
 }
 
 //////////////////////////////////////////////////////////////////////////////
@@ -171,6 +176,7 @@ static void record_gp_stall_check_time(void)
 	WRITE_ONCE(rcu_state.gp_start, j);
 	j1 = rcu_jiffies_till_stall_check();
 	smp_mb(); // ->gp_start before ->jiffies_stall and caller's ->gp_seq.
+	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 0);
 	WRITE_ONCE(rcu_state.jiffies_stall, j + j1);
 	rcu_state.jiffies_resched = j + j1 / 2;
 	rcu_state.n_force_qs_gpstart = READ_ONCE(rcu_state.n_force_qs);
@@ -726,6 +732,16 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    !rcu_gp_in_progress())
 		return;
 	rcu_stall_kick_kthreads();
+
+	/*
+	 * Check if it was requested (via rcu_cpu_stall_reset()) that the FQS
+	 * loop has to set jiffies to ensure a non-stale jiffies value. This
+	 * is required to have good jiffies value after coming out of long
+	 * breaks of jiffies updates. Not doing so can cause false positives.
+	 */
+	if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0)
+		return;
+
 	j = jiffies;
 
 	/*