From patchwork Fri May 19 17:18:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Doug Anderson X-Patchwork-Id: 96541 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1398144vqo; Fri, 19 May 2023 10:23:13 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6G9cZUd4R9AdP7D+q4jH6yyF3n4b6k27cP6c4ft0jsE6V9tW1Wxn8K0wnZnUHWD7PZtg+D X-Received: by 2002:a17:902:d4d0:b0:1ac:83d1:9269 with SMTP id o16-20020a170902d4d000b001ac83d19269mr3993796plg.16.1684516993133; Fri, 19 May 2023 10:23:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684516993; cv=none; d=google.com; s=arc-20160816; b=IJIxfj3L4SYXoL/1UBymIlKXd9MAxk9Zf6tBd1sccFvBExJRIFPX9lQHZ0NojLTnV6 VC5jGraoDVlRqnfSWLFCZXxn4G+gGfUqntugBIaEb9OxKW5lV2jlnAsmbs8smQsuN6q5 dx4bXVS1jqS+wHlCesMcji49bAtmtPBRovzfbOfarNzKgbE4iEcpKOTDZE3oZqKZA4iW AzJsNdSeQxZiRBHZHOpUSdksJXg0Wdhijpf3gkrA10lyAsiDtx/kF/yoySh3c01/O6MD kPsoLbH97Ni+ZDaJg90djgxfouUA9oO1SVwwmxG3qg/b4iAiy+/05OUAGctQuCShX+Mp 1z4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=c2pZ/r5SeHFXOp7Ja/58zWTa7i5IGy6MZWBDd9oaxl8=; b=i/KIAPsTcsWuBpbfKFxZP8XFOYVrzbBwehGUGAxc226vUBDsRSJPfko6/HezwX9D0+ EaZGjDLwBSz+fGh31RrChD5fZ/MklbvyffgzYSQm32j4um0NCdIq6IesDlfDW6lJNZtk BhOK+6UhCHy8lMldG+iuOnvMhw+djjDlxzRfopqB72MooRxLhoYpSdQ0cO3DEs34IkvA 7O2a6N87J9bPgMx787IdDDSPGUsbT/qeWn2gbN2o4uOIu9uG0x7qPovAMzr6xMxtbi9J hTI1NseiN6m5Y+ZNuZQX6vTLRgvHfyJSY6aC7AvcdqlijNdehT2fFdsTUYRGIeAk5Eu6 EZMg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=ilDuCWth; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ha21-20020a17090af3d500b0023def94be5esi2055013pjb.20.2023.05.19.10.22.56; Fri, 19 May 2023 10:23:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=ilDuCWth; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231202AbjESRVd (ORCPT + 99 others); Fri, 19 May 2023 13:21:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230203AbjESRVX (ORCPT ); Fri, 19 May 2023 13:21:23 -0400 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E437E5F for ; Fri, 19 May 2023 10:21:02 -0700 (PDT) Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-253724f6765so1419865a91.3 for ; Fri, 19 May 2023 10:21:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1684516862; x=1687108862; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=c2pZ/r5SeHFXOp7Ja/58zWTa7i5IGy6MZWBDd9oaxl8=; b=ilDuCWthPOZFKtCt3zIf5m+QHYxhL0PTxdcW95Z7gg00USsiUeQGowTgIWPEH/Y33c Ik/fn/jfykHY8iDzsHbZ6NpVZo6Xk3e2y5yTqmdCNxluLu9AGXjZkhh5f44uskA/GzHn u1LDd0tSIk5PEeZ5ZBUo0udHktvUN+nNujoAY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684516862; x=1687108862; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=c2pZ/r5SeHFXOp7Ja/58zWTa7i5IGy6MZWBDd9oaxl8=; b=lFWFHL6z1k4WBfMK3mDJKPXGA4BEghhg/wX90GLK92sBEqmfT+R1QhqKIZ51itT6db m/CpgDF4GnnVJxs4AXU9DoFSTVNxpUNXch3FyK5aKSbKziE5hawjGPaZnA5/Hu066tVO d8hy41ty5P0VJzaDVw2nEpDG9+C8Jahi2755qjj2d3EgdDP7OXqm6JCLT/h9/e+DiHgG 9Pt3AdMtLCS3voWXyGTuRnF8YL3jzC9wciXoxee3GJAGAf1k9/7FZWdxwnqKyqGDP96K eFAXZLuFepAmrIy3tTqn2jtFk346jkAWnwhbsLVAtbmnQfZPw4YrK4QePzXhw6ekYlMG QQ0w== X-Gm-Message-State: AC+VfDy/ScyTf4nk2MSTiaO2JeTfHV9c/PVj/9C/tUTZky9Z9es7m/aY 6KsEH6tP0r+F95ZbKaibd7Z6Rw== X-Received: by 2002:a17:90a:898f:b0:247:eae:1787 with SMTP id v15-20020a17090a898f00b002470eae1787mr2835153pjn.36.1684516861870; Fri, 19 May 2023 10:21:01 -0700 (PDT) Received: from tictac2.mtv.corp.google.com ([2620:15c:9d:2:9b89:2dd0:d160:429d]) by smtp.gmail.com with ESMTPSA id gj19-20020a17090b109300b0024e4f169931sm1763835pjb.2.2023.05.19.10.20.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 May 2023 10:21:01 -0700 (PDT) From: Douglas Anderson To: Petr Mladek , Andrew Morton Cc: Matthias Kaehlcke , kgdb-bugreport@lists.sourceforge.net, Stephane Eranian , mpe@ellerman.id.au, Tzung-Bi Shih , Daniel Thompson , Mark Rutland , linuxppc-dev@lists.ozlabs.org, Sumit Garg , npiggin@gmail.com, davem@davemloft.net, Marc Zyngier , Stephen Boyd , sparclinux@vger.kernel.org, christophe.leroy@csgroup.eu, Catalin Marinas , ravi.v.shankar@intel.com, Randy Dunlap , Pingfan Liu , Guenter Roeck , Lecopzer Chen , Ian Rogers , ito-yuichi@fujitsu.com, ricardo.neri@intel.com, linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, Will Deacon , Chen-Yu Tsai , linux-kernel@vger.kernel.org, Masayoshi Mizuma , Andi Kleen , Douglas Anderson Subject: [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c Date: Fri, 19 May 2023 10:18:32 -0700 Message-ID: <20230519101840.v5.8.Id4133d3183e798122dc3b6205e7852601f289071@changeid> X-Mailer: git-send-email 2.40.1.698.g37aff9b760-goog In-Reply-To: <20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid> References: <20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766344090621682117?= X-GMAIL-MSGID: =?utf-8?q?1766344090621682117?= The perf hardlockup detector works by looking at interrupt counts and seeing if they change from run to run. The interrupt counts are managed by the common watchdog code via its watchdog_timer_fn(). Currently the API between the perf detector and the common code is a function: is_hardlockup(). When the hard lockup detector sees that function return true then it handles printing out debug info and inducing a panic if necessary. Let's change the API a little bit in preparation for the buddy hardlockup detector. The buddy hardlockup detector wants to print nearly the same debug info and have nearly the same panic behavior. That means we want to move all that code to the common file. For now, the code in the common file will only be there if the perf hardlockup detector is enabled, but eventually it will be selected by a common config. Right now, this _just_ moves the code from the perf detector file to the common file and changes the names. It doesn't make the changes that the buddy hardlockup detector will need and doesn't do any style cleanups. A future patch will do cleanup to make it more obvious what changed. With the above, we no longer have any callers of is_hardlockup() outside of the "watchdog.c" file, so we can remove it from the header, make it static, and move it to the same "#ifdef" block as our new watchdog_hardlockup_check(). While doing this, it can be noted that even if no hardlockup detectors were configured the existing code used to still have the code for counting/checking "hrtimer_interrupts" even if the perf hardlockup detector wasn't configured. We didn't need to do that, so move all the "hrtimer_interrupts" counting to only be there if the perf hardlockup detector is configured as well. This change is expected to be a no-op. Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek --- Changes in v5: - watchdog_hardlockup_interrupt_count() => watchdog_hardlockup_kick() - watchdog_hardlockup_is_lockedup() => is_hardlockup() Changes in v4: - ("Move perf hardlockup checking/panic ...") new for v4. include/linux/nmi.h | 5 ++- kernel/watchdog.c | 93 +++++++++++++++++++++++++++++++++--------- kernel/watchdog_perf.c | 42 +------------------ 3 files changed, 78 insertions(+), 62 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index fafab128f37e..0c62c1bf0a71 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -15,7 +15,6 @@ void lockup_detector_init(void); void lockup_detector_soft_poweroff(void); void lockup_detector_cleanup(void); -bool is_hardlockup(void); extern int watchdog_user_enabled; extern int nmi_watchdog_user_enabled; @@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic; static inline void hardlockup_detector_disable(void) {} #endif +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +void watchdog_hardlockup_check(struct pt_regs *regs); +#endif + #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) # define NMI_WATCHDOG_SYSCTL_PERM 0644 #else diff --git a/kernel/watchdog.c b/kernel/watchdog.c index c705a18b26bf..12ce37d76e7d 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); #endif /* CONFIG_HARDLOCKUP_DETECTOR */ +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) + +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); +static DEFINE_PER_CPU(bool, hard_watchdog_warn); +static unsigned long hardlockup_allcpu_dumped; + +static bool is_hardlockup(void) +{ + unsigned long hrint = __this_cpu_read(hrtimer_interrupts); + + if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) + return true; + + __this_cpu_write(hrtimer_interrupts_saved, hrint); + return false; +} + +static void watchdog_hardlockup_kick(void) +{ + __this_cpu_inc(hrtimer_interrupts); +} + +void watchdog_hardlockup_check(struct pt_regs *regs) +{ + /* check for a hardlockup + * This is done by making sure our timer interrupt + * is incrementing. The timer interrupt should have + * fired multiple times before we overflow'd. If it hasn't + * then this is a good indication the cpu is stuck + */ + if (is_hardlockup()) { + int this_cpu = smp_processor_id(); + + /* only print hardlockups once */ + if (__this_cpu_read(hard_watchdog_warn) == true) + return; + + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", + this_cpu); + print_modules(); + print_irqtrace_events(current); + if (regs) + show_regs(regs); + else + dump_stack(); + + /* + * Perform all-CPU dump only once to avoid multiple hardlockups + * generating interleaving traces + */ + if (sysctl_hardlockup_all_cpu_backtrace && + !test_and_set_bit(0, &hardlockup_allcpu_dumped)) + trigger_allbutself_cpu_backtrace(); + + if (hardlockup_panic) + nmi_panic(regs, "Hard LOCKUP"); + + __this_cpu_write(hard_watchdog_warn, true); + return; + } + + __this_cpu_write(hard_watchdog_warn, false); + return; +} + +#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ + +static inline void watchdog_hardlockup_kick(void) { } + +#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */ + /* * These functions can be overridden if an architecture implements its * own hardlockup detector. @@ -176,8 +248,6 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); static DEFINE_PER_CPU(unsigned long, watchdog_report_ts); static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); static DEFINE_PER_CPU(bool, softlockup_touch_sync); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); static unsigned long soft_lockup_nmi_warn; static int __init nowatchdog_setup(char *str) @@ -312,22 +382,6 @@ static int is_softlockup(unsigned long touch_ts, } /* watchdog detector functions */ -bool is_hardlockup(void) -{ - unsigned long hrint = __this_cpu_read(hrtimer_interrupts); - - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) - return true; - - __this_cpu_write(hrtimer_interrupts_saved, hrint); - return false; -} - -static void watchdog_interrupt_count(void) -{ - __this_cpu_inc(hrtimer_interrupts); -} - static DEFINE_PER_CPU(struct completion, softlockup_completion); static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work); @@ -358,8 +412,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) if (!watchdog_enabled) return HRTIMER_NORESTART; - /* kick the hardlockup detector */ - watchdog_interrupt_count(); + watchdog_hardlockup_kick(); /* kick the softlockup detector */ if (completion_done(this_cpu_ptr(&softlockup_completion))) { diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 8b8015758ea5..04415812d079 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -20,13 +20,11 @@ #include #include -static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); static DEFINE_PER_CPU(struct perf_event *, dead_event); static struct cpumask dead_events_mask; -static unsigned long hardlockup_allcpu_dumped; static atomic_t watchdog_cpus = ATOMIC_INIT(0); notrace void arch_touch_nmi_watchdog(void) @@ -122,45 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event, return; } - /* check for a hardlockup - * This is done by making sure our timer interrupt - * is incrementing. The timer interrupt should have - * fired multiple times before we overflow'd. If it hasn't - * then this is a good indication the cpu is stuck - */ - if (is_hardlockup()) { - int this_cpu = smp_processor_id(); - - /* only print hardlockups once */ - if (__this_cpu_read(hard_watchdog_warn) == true) - return; - - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", - this_cpu); - print_modules(); - print_irqtrace_events(current); - if (regs) - show_regs(regs); - else - dump_stack(); - - /* - * Perform all-CPU dump only once to avoid multiple hardlockups - * generating interleaving traces - */ - if (sysctl_hardlockup_all_cpu_backtrace && - !test_and_set_bit(0, &hardlockup_allcpu_dumped)) - trigger_allbutself_cpu_backtrace(); - - if (hardlockup_panic) - nmi_panic(regs, "Hard LOCKUP"); - - __this_cpu_write(hard_watchdog_warn, true); - return; - } - - __this_cpu_write(hard_watchdog_warn, false); - return; + watchdog_hardlockup_check(regs); } static int hardlockup_detector_event_create(void)