From patchwork Wed Dec 20 21:15:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Doug Anderson
X-Patchwork-Id: 181763
From: Douglas Anderson
To: Andrew Morton
Cc: Petr Mladek, Li Zhe, Pingfan Liu, John Ogness, Lecopzer Chen,
 Douglas Anderson, linux-kernel@vger.kernel.org
Subject: [PATCH 3/4] watchdog/hardlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting
Date: Wed, 20 Dec 2023 13:15:36 -0800
Message-ID: <20231220131534.3.I6ff691b3b40f0379bc860f80c6e729a0485b5247@changeid>
X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog
In-Reply-To: <20231220211640.2023645-1-dianders@chromium.org>
References: <20231220211640.2023645-1-dianders@chromium.org>
X-Mailing-List: linux-kernel@vger.kernel.org

If two CPUs end up reporting a hardlockup at the same time then their
logs could get interleaved which is hard to read.

The interleaving problem was especially bad with the "perf" hardlockup
detector where the locked up CPU is always the same as the running CPU
and we end up in show_regs(). show_regs() has no inherent
serialization so we could mix together two crawls if two hardlockups
happened at the same time (and if we didn't have
`sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll
fully serialize hardlockups when using the "perf" hardlockup detector.

The interleaving problem was less bad with the "buddy" hardlockup
detector. With "buddy" we always end up calling
`trigger_single_cpu_backtrace(cpu)` on some CPU other than the running
one. trigger_single_cpu_backtrace() always at least serializes the
individual stack crawls because it eventually uses
printk_cpu_sync_get_irqsave().
Unfortunately the fact that trigger_single_cpu_backtrace() eventually
calls printk_cpu_sync_get_irqsave() (on a different CPU) means that we
have to drop the "lock" before calling it and we can't fully serialize
all printouts associated with a given hardlockup. However, we still do
get the advantage of serializing the output of print_modules() and
print_irqtrace_events().

Aside from serializing hardlockups from each other, this change also
has the advantage of serializing hardlockups and softlockups from each
other if they happen to occur at the same time since they are both
using the same "lock".

Even though nobody is expected to hang while holding the lock
associated with printk_cpu_sync_get_irqsave(), out of an abundance of
caution, we don't call printk_cpu_sync_get_irqsave() until after we
print out about the hardlockup. This makes extra sure that, even if
printk_cpu_sync_get_irqsave() somehow never runs, we at least print
that we saw the hardlockup. This is different than the choice made for
softlockup because hardlockup is really our last resort.

Signed-off-by: Douglas Anderson
Reviewed-by: John Ogness
---
 kernel/watchdog.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 526041a1100a..11f9577accca 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -151,6 +151,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 */
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
+		unsigned long flags;
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
@@ -165,7 +166,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 				return;
 		}
 
+		/*
+		 * NOTE: we call printk_cpu_sync_get_irqsave() after printing
+		 * the lockup message. While it would be nice to serialize
+		 * that printout, we really want to make sure that if some
+		 * other CPU somehow locked up while holding the lock associated
+		 * with printk_cpu_sync_get_irqsave() that we can still at least
+		 * get the message about the lockup out.
+		 */
 		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
+		printk_cpu_sync_get_irqsave(flags);
+
 		print_modules();
 		print_irqtrace_events(current);
 		if (cpu == this_cpu) {
@@ -173,7 +184,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 				show_regs(regs);
 			else
 				dump_stack();
+			printk_cpu_sync_put_irqrestore(flags);
 		} else {
+			printk_cpu_sync_put_irqrestore(flags);
 			trigger_single_cpu_backtrace(cpu);
 		}
 
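
The ordering this patch relies on can be modelled outside the kernel.
What follows is a minimal userspace sketch, not kernel code: the names
cpu_sync_get(), cpu_sync_put(), emit() and report_hardlockup() are
hypothetical stand-ins invented for illustration, and the "lock" is a
plain C11 atomic owner field that only approximates the re-entrant
per-CPU behaviour of printk_cpu_sync_get_irqsave(). It runs
single-threaded and records, for each message, whether the lock was held
when the message was emitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the printk CPU sync "lock": sync_owner is
 * the id of the owning CPU, or -1 when the lock is free.
 */
static atomic_int sync_owner = -1;
static int sync_nested;	/* re-entry depth, only touched by the owner */

static void cpu_sync_get(int cpu)
{
	int expected = -1;

	if (atomic_load(&sync_owner) == cpu) {
		sync_nested++;	/* the owning CPU may re-enter */
		return;
	}
	/* Spin until the current owner (if any) releases the lock. */
	while (!atomic_compare_exchange_weak(&sync_owner, &expected, cpu))
		expected = -1;
}

static void cpu_sync_put(int cpu)
{
	assert(atomic_load(&sync_owner) == cpu);
	if (sync_nested) {
		sync_nested--;
		return;
	}
	atomic_store(&sync_owner, -1);
}

/* Event log: the message and whether the lock was held when emitted. */
struct event {
	const char *msg;
	bool locked;
};
static struct event events[8];
static int nr_events;

static void emit(const char *msg)
{
	bool locked = atomic_load(&sync_owner) != -1;

	assert(nr_events < 8);
	events[nr_events++] = (struct event){ msg, locked };
	printf("[%s] %s\n", locked ? "locked" : "unlocked", msg);
}

/* Mirrors the ordering in the patched watchdog_hardlockup_check(). */
static void report_hardlockup(int cpu, int this_cpu)
{
	emit("hard LOCKUP headline");	/* deliberately before the lock */
	cpu_sync_get(this_cpu);
	emit("modules");
	emit("irqtrace events");
	if (cpu == this_cpu) {
		emit("local regs / stack");
		cpu_sync_put(this_cpu);
	} else {
		/* Drop the lock: the remote CPU takes it for its own crawl. */
		cpu_sync_put(this_cpu);
		emit("remote backtrace request");
	}
}
```

Calling report_hardlockup(1, 0) models the "buddy" case: the headline
and the remote backtrace request are logged with the lock free, while
the modules and irqtrace lines are logged with it held.
report_hardlockup(0, 0) models the "perf" case, where the local register
or stack dump also stays under the lock.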