From patchwork Wed Oct 19 22:58:37 2022
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 5912
From: "Paul E. McKenney"
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org,
 "Paul E. McKenney", Randy Dunlap, Frederic Weisbecker, Thomas Gleixner,
 John Ogness, Petr Mladek
Subject: [PATCH v3 rcu 02/11] srcu: Create an srcu_read_lock_nmisafe() and
 srcu_read_unlock_nmisafe()
Date: Wed, 19 Oct 2022 15:58:37 -0700
Message-Id: <20221019225846.2501109-2-paulmck@kernel.org>
X-Mailer: git-send-email 2.31.1.189.g2e36527f23
In-Reply-To: <20221019225838.GA2500612@paulmck-ThinkPad-P17-Gen-1>
References: <20221019225838.GA2500612@paulmck-ThinkPad-P17-Gen-1>
On strict load-store architectures, the use of this_cpu_inc() by
srcu_read_lock() and srcu_read_unlock() is not NMI-safe in TREE SRCU.
To see this, suppose that an NMI arrives in the middle of
srcu_read_lock(), just after it has read ->srcu_lock_count, but before
it has written the incremented value back to memory.  If that NMI
handler also does srcu_read_lock() and srcu_read_unlock() on that same
srcu_struct structure, then upon return from that NMI handler, the
interrupted srcu_read_lock() will overwrite the NMI handler's update
to ->srcu_lock_count, but leave unchanged the NMI handler's update by
srcu_read_unlock() to ->srcu_unlock_count.

This can result in a too-short SRCU grace period, which can in turn
result in arbitrary memory corruption.

If the NMI handler instead interrupts the srcu_read_unlock(), this can
result in eternal SRCU grace periods, which is not much better.

This commit therefore creates a pair of new srcu_read_lock_nmisafe()
and srcu_read_unlock_nmisafe() functions, which allow SRCU readers in
NMI handlers as well as in process and IRQ context.

It is bad practice to mix the existing and the new _nmisafe()
primitives on the same srcu_struct structure.  Use one set or the
other, not both.

Just to underline that "bad practice" point, using srcu_read_lock()
at process level and srcu_read_lock_nmisafe() in your NMI handler
will not, repeat NOT, work.  If you do not immediately understand why
this is the case, please review the earlier paragraphs in this commit
log.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Apply feedback from Randy Dunlap. ]
[ paulmck: Apply feedback from John Ogness. ]

Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/
Signed-off-by: Paul E. McKenney
Acked-by: Randy Dunlap # build-tested
Reviewed-by: Frederic Weisbecker
Cc: Thomas Gleixner
Cc: John Ogness
Cc: Petr Mladek
---
 arch/Kconfig             |  3 +++
 include/linux/srcu.h     | 39 ++++++++++++++++++++++++++++++++++++
 include/linux/srcutiny.h | 11 ++++++++++
 include/linux/srcutree.h |  3 +++
 kernel/rcu/Kconfig       |  3 +++
 kernel/rcu/rcutorture.c  | 11 ++++++++--
 kernel/rcu/srcutree.c    | 43 ++++++++++++++++++++++++++++++++++++----
 7 files changed, 107 insertions(+), 6 deletions(-)
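[ Editor's note: to make the failure mode described in the commit log
concrete, here is a minimal stand-alone userspace sketch of the
lost-update race.  It is illustrative only, not kernel code: the two
counters and simulated_nmi() are hypothetical stand-ins for
->srcu_lock_count, ->srcu_unlock_count, and an NMI arriving between
the load and the store of a non-atomic increment. ]

/* Illustrative sketch only; names here are made up, not kernel APIs. */
#include <stdio.h>

static long lock_count;     /* stands in for ->srcu_lock_count   */
static long unlock_count;   /* stands in for ->srcu_unlock_count */

/* The NMI handler's reader: lock then unlock, both counters bumped. */
static void simulated_nmi(void)
{
        lock_count++;     /* srcu_read_lock() in the NMI handler   */
        unlock_count++;   /* srcu_read_unlock() in the NMI handler */
}

int main(void)
{
        /* Interrupted srcu_read_lock(): load, then NMI, then store. */
        long tmp = lock_count;          /* reads 0 */
        simulated_nmi();                /* lock_count = 1, unlock_count = 1 */
        lock_count = tmp + 1;           /* store overwrites: back to 1 */

        /* Two readers entered, but the counters record only one lock. */
        printf("locks=%ld unlocks=%ld\n", lock_count, unlock_count);
        /* The counters balance, so the still-running interrupted reader
         * is invisible and the grace period can end too soon. */
        return 0;
}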
diff --git a/arch/Kconfig b/arch/Kconfig
index 8f138e580d1ae..6b95244c3057d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -468,6 +468,9 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
+config ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
+	bool
+
 config HAVE_ALIGNED_STRUCT_PAGE
 	bool
 	help
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 01226e4d960a0..2cc8321c0c86a 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -52,6 +52,8 @@ int init_srcu_struct(struct srcu_struct *ssp);
 #else
 /* Dummy definition for things like notifiers.  Actual use gets link error. */
 struct srcu_struct { };
+int __srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp);
+void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx) __releases(ssp);
 #endif
 
 void call_srcu(struct srcu_struct *ssp, struct rcu_head *head,
@@ -166,6 +168,25 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
 	return retval;
 }
 
+/**
+ * srcu_read_lock_nmisafe - register a new reader for an SRCU-protected structure.
+ * @ssp: srcu_struct in which to register the new reader.
+ *
+ * Enter an SRCU read-side critical section, but in an NMI-safe manner.
+ * See srcu_read_lock() for more information.
+ */
+static inline int srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp)
+{
+	int retval;
+
+	if (IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
+		retval = __srcu_read_lock_nmisafe(ssp);
+	else
+		retval = __srcu_read_lock(ssp);
+	rcu_lock_acquire(&(ssp)->dep_map);
+	return retval;
+}
+
 /* Used by tracing, cannot be traced and cannot invoke lockdep. */
 static inline notrace int
 srcu_read_lock_notrace(struct srcu_struct *ssp) __acquires(ssp)
@@ -191,6 +212,24 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
 	__srcu_read_unlock(ssp, idx);
 }
 
+/**
+ * srcu_read_unlock_nmisafe - unregister an old reader from an SRCU-protected structure.
+ * @ssp: srcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding srcu_read_lock_nmisafe().
+ *
+ * Exit an SRCU read-side critical section, but in an NMI-safe manner.
+ */
+static inline void srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
+	__releases(ssp)
+{
+	WARN_ON_ONCE(idx & ~0x1);
+	rcu_lock_release(&(ssp)->dep_map);
+	if (IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
+		__srcu_read_unlock_nmisafe(ssp, idx);
+	else
+		__srcu_read_unlock(ssp, idx);
+}
+
 /* Used by tracing, cannot be traced and cannot call lockdep. */
 static inline notrace void
 srcu_read_unlock_notrace(struct srcu_struct *ssp, int idx) __releases(ssp)
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5aa5e0faf6a12..278331bd77660 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -90,4 +90,15 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
 		 data_race(READ_ONCE(ssp->srcu_idx_max)));
 }
 
+static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp)
+{
+	BUG();
+	return 0;
+}
+
+static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
+{
+	BUG();
+}
+
 #endif
diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 0c4eca07d78d5..d45dd507f4a56 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -154,4 +154,7 @@ void synchronize_srcu_expedited(struct srcu_struct *ssp);
 void srcu_barrier(struct srcu_struct *ssp);
 void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf);
 
+int __srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp);
+void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx) __releases(ssp);
+
 #endif
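[ Editor's note: before moving on to the Kconfig and torture-test
changes, here is a hypothetical usage sketch of the new API pair
declared above.  It is not part of this patch; my_srcu and the two
functions are made-up names.  It shows the discipline the commit log
insists on: every reader of a given srcu_struct, whether at process
level or in an NMI handler, must use the same _nmisafe() flavor. ]

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(my_srcu);	/* hypothetical srcu_struct */

/* Process-context reader: also uses the _nmisafe() flavor, because
 * mixing flavors on one srcu_struct is forbidden. */
static void process_level_reader(void)
{
	int idx = srcu_read_lock_nmisafe(&my_srcu);

	/* ... dereference my_srcu-protected pointers here ... */
	srcu_read_unlock_nmisafe(&my_srcu, idx);
}

/* NMI-context reader: plain srcu_read_lock() would be unsafe here. */
static void my_nmi_handler(void)
{
	int idx = srcu_read_lock_nmisafe(&my_srcu);

	/* ... read-side critical section ... */
	srcu_read_unlock_nmisafe(&my_srcu, idx);
}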
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index d471d22a5e21b..f53ad63b2bc63 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -72,6 +72,9 @@ config TREE_SRCU
 	help
 	  This option selects the full-fledged version of SRCU.
 
+config NEED_SRCU_NMI_SAFE
+	def_bool HAVE_NMI && !ARCH_HAS_NMI_SAFE_THIS_CPU_OPS && !TINY_SRCU
+
 config TASKS_RCU_GENERIC
 	def_bool TASKS_RCU || TASKS_RUDE_RCU || TASKS_TRACE_RCU
 	select SRCU
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 503c2aa845a4a..b4c74ce102256 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -615,10 +615,14 @@ static struct rcu_torture_ops rcu_busted_ops = {
 DEFINE_STATIC_SRCU(srcu_ctl);
 static struct srcu_struct srcu_ctld;
 static struct srcu_struct *srcu_ctlp = &srcu_ctl;
+static struct rcu_torture_ops srcud_ops;
 
 static int srcu_torture_read_lock(void) __acquires(srcu_ctlp)
 {
-	return srcu_read_lock(srcu_ctlp);
+	if (cur_ops == &srcud_ops)
+		return srcu_read_lock_nmisafe(srcu_ctlp);
+	else
+		return srcu_read_lock(srcu_ctlp);
 }
 
 static void
@@ -642,7 +646,10 @@ srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
 
 static void srcu_torture_read_unlock(int idx) __releases(srcu_ctlp)
 {
-	srcu_read_unlock(srcu_ctlp, idx);
+	if (cur_ops == &srcud_ops)
+		srcu_read_unlock_nmisafe(srcu_ctlp, idx);
+	else
+		srcu_read_unlock(srcu_ctlp, idx);
 }
 
 static int torture_srcu_read_lock_held(void)
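[ Editor's note: the kernel/rcu/srcutree.c changes below replace
this_cpu_inc() with atomic_long_inc() on the per-CPU counters.  As a
rough userspace analogy (C11 atomics standing in for the kernel's
atomic_long_t; this is not the kernel implementation), the difference
is between a three-step load/add/store and a single indivisible
read-modify-write: ]

#include <stdatomic.h>

static long plain_counter;		/* like this_cpu_inc() on a strict
					 * load-store architecture:
					 * load, add, store */
static atomic_long atomic_counter;	/* like atomic_long_inc():
					 * one indivisible RMW */

void unsafe_inc(void)
{
	/* An NMI landing between the load and the store here can have
	 * its own increment silently overwritten. */
	plain_counter = plain_counter + 1;
}

void nmi_safe_inc(void)
{
	/* A single atomic RMW cannot be torn by an interrupt or NMI
	 * on the same CPU. */
	atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
}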
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 25e9458da6a26..32a94b254d29f 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -654,6 +654,41 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
 }
 EXPORT_SYMBOL_GPL(__srcu_read_unlock);
 
+#ifdef CONFIG_NEED_SRCU_NMI_SAFE
+
+/*
+ * Counts the new reader in the appropriate per-CPU element of the
+ * srcu_struct, but in an NMI-safe manner using RMW atomics.
+ * Returns an index that must be passed to the matching
+ * srcu_read_unlock_nmisafe().
+ */
+int __srcu_read_lock_nmisafe(struct srcu_struct *ssp)
+{
+	int idx;
+	struct srcu_data *sdp = raw_cpu_ptr(ssp->sda);
+
+	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
+	atomic_long_inc(&sdp->srcu_lock_count[idx]);
+	smp_mb__after_atomic(); /* B */  /* Avoid leaking the critical section. */
+	return idx;
+}
+EXPORT_SYMBOL_GPL(__srcu_read_lock_nmisafe);
+
+/*
+ * Removes the count for the old reader from the appropriate per-CPU
+ * element of the srcu_struct.  Note that this may well be a different
+ * CPU than the one on which the count was incremented by the
+ * corresponding srcu_read_lock_nmisafe().
+ */
+void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
+{
+	struct srcu_data *sdp = raw_cpu_ptr(ssp->sda);
+
+	smp_mb__before_atomic(); /* C */  /* Avoid leaking the critical section. */
+	atomic_long_inc(&sdp->srcu_unlock_count[idx]);
+}
+EXPORT_SYMBOL_GPL(__srcu_read_unlock_nmisafe);
+
+#endif // CONFIG_NEED_SRCU_NMI_SAFE
+
 /*
  * Start an SRCU grace period.
  */
@@ -1090,7 +1125,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 	int ss_state;
 
 	check_init_srcu_struct(ssp);
-	idx = srcu_read_lock(ssp);
+	idx = __srcu_read_lock_nmisafe(ssp);
 	ss_state = smp_load_acquire(&ssp->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_CALL)
 		sdp = per_cpu_ptr(ssp->sda, 0);
@@ -1123,7 +1158,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 		srcu_funnel_gp_start(ssp, sdp, s, do_norm);
 	else if (needexp)
 		srcu_funnel_exp_start(ssp, sdp_mynode, s);
-	srcu_read_unlock(ssp, idx);
+	__srcu_read_unlock_nmisafe(ssp, idx);
 	return s;
 }
 
@@ -1427,13 +1462,13 @@ void srcu_barrier(struct srcu_struct *ssp)
 	/* Initial count prevents reaching zero until all CBs are posted. */
 	atomic_set(&ssp->srcu_barrier_cpu_cnt, 1);
 
-	idx = srcu_read_lock(ssp);
+	idx = __srcu_read_lock_nmisafe(ssp);
 	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
 		srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, 0));
 	else
 		for_each_possible_cpu(cpu)
 			srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, cpu));
-	srcu_read_unlock(ssp, idx);
+	__srcu_read_unlock_nmisafe(ssp, idx);
 
 	/* Remove the initial count, at which point reaching zero can happen. */
 	if (atomic_dec_and_test(&ssp->srcu_barrier_cpu_cnt))