From: Douglas Anderson
To: Andrew Morton
Cc: Petr Mladek, Li Zhe, Pingfan Liu, John Ogness, Lecopzer Chen, Douglas Anderson, linux-kernel@vger.kernel.org
Subject: [PATCH 1/4] watchdog/hardlockup: Adopt softlockup logic avoiding double-dumps
Date: Wed, 20 Dec 2023 13:15:34 -0800
Message-ID: <20231220131534.1.I4f35a69fbb124b5f0c71f75c631e11fabbe188ff@changeid>
In-Reply-To: <20231220211640.2023645-1-dianders@chromium.org>

The hardlockup detector and softlockup detector both have the ability
to dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace`
and `kernel.softlockup_all_cpu_backtrace`). Both detectors also have
some logic that attempts to avoid interleaved printouts if two CPUs
try to dump all CPUs at the same time. However:
- The hardlockup detector's logic still allowed some information to
  interleave. Specifically, another CPU could print modules and dump
  the stack of the locked CPU while we were dumping all CPUs.
- When `kernel.hardlockup_panic` was set in addition to
  `kernel.hardlockup_all_cpu_backtrace` and two CPUs detected
  hardlockups at the same time, the second CPU could call panic()
  while the first was still dumping stacks. This was especially bad if
  the locked-up CPU wasn't responding to the request for a backtrace,
  since the function nmi_trigger_cpumask_backtrace() can wait up to 10
  seconds.

Let's resolve this by adopting the softlockup logic in the hardlockup
handler.
NOTES:
- As part of this, one might think that we should make a helper
  function that both the hard and softlockup detectors call. This
  turns out not to be trivial: it would need to be parameterized quite
  a bit, since each lockup detector is controlled by its own global
  variables and prints log messages that are just different enough to
  make sharing painful. We probably don't want to change the printed
  messages without good reason, to avoid throwing log parsers for a
  loop.
- One might also think that it would be a good idea to have the
  hardlockup and softlockup detectors use the same global variable to
  prevent interleaving. This would ensure that softlockups and
  hardlockups can't interleave each other. That _almost_ works but has
  a dangerous flaw if `kernel.hardlockup_panic` is not the same as
  `kernel.softlockup_panic`: we might skip a call to panic() if one
  type of lockup was detected at the same time as the other.

Signed-off-by: Douglas Anderson
---
 kernel/watchdog.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index bf30a6fac665..b4fd2f12137f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -91,7 +91,7 @@ static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
 static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);
-static unsigned long watchdog_hardlockup_all_cpu_dumped;
+static unsigned long hard_lockup_nmi_warn;
 
 notrace void arch_touch_nmi_watchdog(void)
 {
@@ -156,6 +156,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
 			return;
 
+		/*
+		 * Prevent multiple hard-lockup reports if one cpu is already
+		 * engaged in dumping all cpu back traces.
+		 */
+		if (sysctl_hardlockup_all_cpu_backtrace) {
+			if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
+				return;
+		}
+
 		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
 		print_modules();
 		print_irqtrace_events(current);
@@ -168,13 +177,10 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 			trigger_single_cpu_backtrace(cpu);
 		}
 
-		/*
-		 * Perform multi-CPU dump only once to avoid multiple
-		 * hardlockups generating interleaving traces
-		 */
-		if (sysctl_hardlockup_all_cpu_backtrace &&
-		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
+		if (sysctl_hardlockup_all_cpu_backtrace) {
 			trigger_allbutcpu_cpu_backtrace(cpu);
+			clear_bit_unlock(0, &hard_lockup_nmi_warn);
+		}
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");