From patchwork Mon Jul 17 18:36:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 121556
From: "Paul E. McKenney"
To: peterz@infradead.org, jgross@suse.com, vschneid@redhat.com, yury.norov@gmail.com
Cc: linux-kernel@vger.kernel.org, imran.f.khan@oracle.com, kernel-team@meta.com, "Paul E. McKenney"
Subject: [PATCH csd-lock 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination
Date: Mon, 17 Jul 2023 11:36:02 -0700
Message-Id: <20230717183602.1099773-2-paulmck@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <96818440-a922-4b43-8871-50358e18b523@paulmck-laptop>
References: <96818440-a922-4b43-8871-50358e18b523@paulmck-laptop>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Imran Khan

On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang,
then all of these waiting CPUs send an NMI to the destination CPU in
order to dump its backtrace. Given enough NMIs, the destination CPU will
spend much of its time producing backtraces, thus further delaying that
CPU's response to the original CSD IPI.
In the worst case, by the time the destination CPU is done handling all
of these backtrace NMIs, the CSD wait timeout will have elapsed, so the
waiters resend their backtrace NMIs, further delaying forward progress.

Therefore, to avoid these delays, issue the backtrace NMI only from the
first waiter. The destination CPU's other waiters can make use of the
backtrace obtained from the first waiter's NMI.

Signed-off-by: Imran Khan
Cc: Peter Zijlstra
Cc: Juergen Gross
Cc: Valentin Schneider
Cc: Yury Norov
Signed-off-by: Paul E. McKenney
---
 kernel/smp.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 1d41a0cb54f1..8455a53465af 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -46,6 +46,8 @@ static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static DEFINE_PER_CPU(atomic_t, trigger_backtrace) = ATOMIC_INIT(1);
+
 static void __flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
@@ -253,7 +255,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
 	}
 	if (cpu >= 0) {
-		dump_cpu_task(cpu);
+		if (atomic_cmpxchg_acquire(&per_cpu(trigger_backtrace, cpu), 1, 0))
+			dump_cpu_task(cpu);
 		if (!cpu_cur_csd) {
 			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
 			arch_send_call_function_single_ipi(cpu);
@@ -434,9 +437,14 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	struct llist_node *entry, *prev;
 	struct llist_head *head;
 	static bool warned;
+	atomic_t *tbt;
 
 	lockdep_assert_irqs_disabled();
 
+	/* Allow waiters to send backtrace NMI from here onwards */
+	tbt = this_cpu_ptr(&trigger_backtrace);
+	atomic_set_release(tbt, 1);
+
 	head = this_cpu_ptr(&call_single_queue);
 	entry = llist_del_all(head);
 	entry = llist_reverse_order(entry);
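
For readers who want to experiment with the gating idea outside the kernel,
the following is a rough userspace sketch of the pattern this patch relies
on: a per-destination flag that the first waiter flips from 1 to 0 and that
the destination sets back to 1. It is not the kernel code. It uses C11
<stdatomic.h> in place of the kernel's atomic_t API, a plain array in place
of per-CPU variables, and the names NR_CPUS_SKETCH, gates_init(),
waiter_may_dump(), destination_rearm() and count_dumps() are hypothetical,
chosen only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count, for illustration only */

/* One gate per destination CPU; 1 means "a backtrace may be requested". */
static atomic_int trigger_backtrace[NR_CPUS_SKETCH];

/* Arm every gate, mirroring the ATOMIC_INIT(1) initializer in the patch. */
static void gates_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		atomic_init(&trigger_backtrace[cpu], 1);
}

/*
 * Waiter side: returns true only for the caller that flips the gate
 * from 1 to 0. Every later caller sees 0 and skips the backtrace.
 */
static bool waiter_may_dump(int cpu)
{
	int expected = 1;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return atomic_compare_exchange_strong_explicit(&trigger_backtrace[cpu],
						       &expected, 0,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Destination side: re-arm the gate once the CPU is making progress again. */
static void destination_rearm(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	atomic_store_explicit(&trigger_backtrace[cpu], 1, memory_order_release);
}

/* Count how many of nr_waiters waiters would actually request a backtrace. */
static int count_dumps(int cpu, int nr_waiters)
{
	int dumps = 0;

	for (int i = 0; i < nr_waiters; i++)
		if (waiter_may_dump(cpu))
			dumps++;
	return dumps;
}
```

In this sketch, any number of waiters targeting the same destination yields
at most one backtrace request until destination_rearm() runs, which is the
property the patch aims for by re-arming the flag at the top of
__flush_smp_call_function_queue().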