Message ID | 20230518002555.1114189-1-song@kernel.org |
---|---|
State | New |
Headers |
From: Song Liu <song@kernel.org>
To: <linux-kernel@vger.kernel.org>
CC: <kernel-team@meta.com>, Song Liu <song@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>
Subject: [PATCH v4] watchdog: Allow nmi watchdog to use "ref-cycles" event
Date: Wed, 17 May 2023 17:25:55 -0700
Message-ID: <20230518002555.1114189-1-song@kernel.org>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Content-Type: text/plain
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[v4] watchdog: Allow nmi watchdog to use "ref-cycles" event
|
|
Commit Message
Song Liu
May 18, 2023, 12:25 a.m. UTC
The NMI watchdog permanently consumes one hardware counter per CPU on the
system. On systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.

OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
watchdog to use the "ref-cycles" event instead of "cycles".
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Song Liu <song@kernel.org>
---
Changes in v4:
Fix compile error for !CONFIG_HARDLOCKUP_DETECTOR_PERF. (kernel test bot)
Changes in v3:
Pivot the design to use kernel arg nmi_watchdog=ref-cycles (Peter)
---
Documentation/admin-guide/kernel-parameters.txt | 5 +++--
include/linux/nmi.h | 2 ++
kernel/watchdog.c | 2 ++
kernel/watchdog_hld.c | 9 +++++++++
4 files changed, 16 insertions(+), 2 deletions(-)
Comments
On 5/17/23 5:25 PM, Song Liu wrote:
> The NMI watchdog permanently consumes one hardware counter per CPU on the
> system. On systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
>
> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
> used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
> watchdog to use the "ref-cycles" event instead of "cycles".

Maybe list an example of how this new option will be used?

[...]

> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8e61f21e7e33..fed4f0be8e1a 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -81,6 +81,8 @@ static int __init hardlockup_panic_setup(char *str)
>  		nmi_watchdog_user_enabled = 0;
>  	else if (!strncmp(str, "1", 1))
>  		nmi_watchdog_user_enabled = 1;
> +	else if (!strncmp(str, "ref-cycles", 10))

str vs. 'ref-cycles' is tested here.

> +		hardlockup_config_perf_event(str);
>  	return 1;
>  }
>  __setup("nmi_watchdog=", hardlockup_panic_setup);
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..4deca58ba6ed 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -294,3 +294,12 @@ int __init hardlockup_detector_perf_init(void)
>  	}
>  	return ret;
>  }
> +
> +/**
> + * hardlockup_config_perf_event - Overwrite config of wd_hw_attr
> + */
> +void __init hardlockup_config_perf_event(const char *str)
> +{
> +	if (!strncmp(str, "ref-cycles", 10))

It is unnecessarily tested again here.

> +		wd_hw_attr.config = PERF_COUNT_HW_REF_CPU_CYCLES;
> +}
> On May 17, 2023, at 11:44 PM, Yonghong Song <yhs@meta.com> wrote:
>
> On 5/17/23 5:25 PM, Song Liu wrote:
>> The NMI watchdog permanently consumes one hardware counter per CPU on the
>> system. On systems that use many hardware counters, this causes more
>> aggressive time multiplexing of perf events.
>> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
>> used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
>> watchdog to use the "ref-cycles" event instead of "cycles".
>
> Maybe list an example of how this new option will be used?

In most cases, all we need is to add "nmi_watchdog=ref-cycles" to the
kernel args.

[...]

>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -81,6 +81,8 @@ static int __init hardlockup_panic_setup(char *str)
>>  		nmi_watchdog_user_enabled = 0;
>>  	else if (!strncmp(str, "1", 1))
>>  		nmi_watchdog_user_enabled = 1;
>> +	else if (!strncmp(str, "ref-cycles", 10))
>
> str vs. 'ref-cycles' is tested here.
>
>> +		hardlockup_config_perf_event(str);
>>  	return 1;
>>  }

[...]

>> +void __init hardlockup_config_perf_event(const char *str)
>> +{
>> +	if (!strncmp(str, "ref-cycles", 10))
>
> It is unnecessarily tested again here.

In that case, we can probably rename the function to
hardlockup_use_ref_cycles(). I am fine either way.

Thanks,
Song
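A minimal user-space sketch of the rename discussed above, assuming the string check stays only in the option parser. The enum values and wd_hw_attr are stand-ins redefined here so the sketch compiles outside the kernel; hardlockup_use_ref_cycles() is the name proposed in the reply, not code from the posted patch, and the parser below is a reduced mock rather than the real __setup handler.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the perf UAPI constants, redefined so this builds in user space. */
enum {
	PERF_COUNT_HW_CPU_CYCLES = 0,
	PERF_COUNT_HW_REF_CPU_CYCLES = 9,
};

/* Stand-in for the watchdog's perf_event_attr; only .config matters here. */
struct wd_attr {
	unsigned long long config;
};

static struct wd_attr wd_hw_attr = { .config = PERF_COUNT_HW_CPU_CYCLES };
static int nmi_watchdog_user_enabled = 1;

/* Renamed helper: takes no string, so there is no second comparison. */
static void hardlockup_use_ref_cycles(void)
{
	wd_hw_attr.config = PERF_COUNT_HW_REF_CPU_CYCLES;
}

/* Reduced mock of the option parser; the only place the string is tested. */
static int hardlockup_panic_setup(const char *str)
{
	assert(str != NULL);

	if (!strncmp(str, "0", 1))
		nmi_watchdog_user_enabled = 0;
	else if (!strncmp(str, "1", 1))
		nmi_watchdog_user_enabled = 1;
	else if (!strncmp(str, "ref-cycles", 10))
		hardlockup_use_ref_cycles();
	return 1;
}
```

Only the hyphenated spelling matches in this sketch; a string such as "refcycles" falls through every branch and leaves the default cycles event in place.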
Hi Andrew and Peter,

Does this version look good to you?

Thanks,
Song

> On May 17, 2023, at 5:25 PM, Song Liu <song@kernel.org> wrote:
>
> The NMI watchdog permanently consumes one hardware counter per CPU on the
> system. On systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
>
> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
> used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
> watchdog to use the "ref-cycles" event instead of "cycles".

[...]
Hi Andrew and Peter,

> On May 19, 2023, at 9:59 AM, Song Liu <songliubraving@meta.com> wrote:
>
> Hi Andrew and Peter,
>
> Does this version look good to you?
>
> Thanks,
> Song
>
>> On May 17, 2023, at 5:25 PM, Song Liu <song@kernel.org> wrote:
>>
>> The NMI watchdog permanently consumes one hardware counter per CPU on the
>> system. On systems that use many hardware counters, this causes more
>> aggressive time multiplexing of perf events.
>>
>> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
>> used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
>> watchdog to use the "ref-cycles" event instead of "cycles".
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Song Liu <song@kernel.org>

Could you please share your comments on this patch?

Thanks,
Song

[...]
Hi Andrew and Peter,

Friendly ping... Any comments on this one?

Thanks,
Song

> On May 25, 2023, at 3:20 PM, Song Liu <songliubraving@meta.com> wrote:
>
> Hi Andrew and Peter,

[...]

> Could you please share your comments on this patch?
>
> Thanks,
> Song
On Wed, May 17, 2023 at 05:25:55PM -0700, Song Liu wrote:
> The NMI watchdog permanently consumes one hardware counter per CPU on the
> system. On systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
>
> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
> used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
> watchdog to use the "ref-cycles" event instead of "cycles".

[...]

> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3593,10 +3593,12 @@
>  			Format: [state][,regs][,debounce][,die]
>
>  	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
> -			Format: [panic,][nopanic,][num]
> +			Format: [panic,][nopanic,][ref-cycles][num]
>  			Valid num: 0 or 1
>  			0 - turn hardlockup detector in nmi_watchdog off
>  			1 - turn hardlockup detector in nmi_watchdog on
> +			ref-cycles - configure the watchdog with perf event
> +			             "ref-cycles" instead of "cycles"
>  			When panic is specified, panic when an NMI watchdog
>  			timeout occurs (or 'nopanic' to not panic on an NMI
>  			watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)

I still hate the whole ref-cycles thing; at the very least powerpc also
has HAVE_HARDLOCKUP_DETECTOR_PERF and they don't have ref-cycles, but
perhaps they want to use a different event when the moon is just so...

What again was wrong with the option of specifying a raw event value and
falling back to cpu-cycles if that fails?
> On Jun 2, 2023, at 3:47 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, May 17, 2023 at 05:25:55PM -0700, Song Liu wrote:

[...]

>>  	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
>> -			Format: [panic,][nopanic,][num]
>> +			Format: [panic,][nopanic,][ref-cycles][num]
>>  			Valid num: 0 or 1
>>  			0 - turn hardlockup detector in nmi_watchdog off
>>  			1 - turn hardlockup detector in nmi_watchdog on
>> +			ref-cycles - configure the watchdog with perf event
>> +			             "ref-cycles" instead of "cycles"
>
> I still hate the whole ref-cycles thing; at the very least powerpc also
> has HAVE_HARDLOCKUP_DETECTOR_PERF and they don't have ref-cycles, but
> perhaps they want to use a different event when the moon is just so...
>
> What again was wrong with the option of specifying a raw event value and
> falling back to cpu-cycles if that fails?

The same raw event number may mean different events on different hardware,
so it is more likely to lead to configuration mistakes. For example, r300
means ref-cycles on Intel CPUs, but it means something else on AMD
CPUs. I need to be very careful about which hosts run with
nmi_watchdog=r300, as it may cause surprises. OTOH, nmi_watchdog=ref-cycles
won't have this issue. Of course, this won't work for powerpc.

Does this make sense?

Thanks,
Song
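The raw-event alternative raised above can be sketched in user space as follows. This is only an illustration under stated assumptions: wd_event, wd_parse_raw_event, wd_pick_event and demo_try_create are hypothetical names, the type values mirror PERF_TYPE_HARDWARE and PERF_TYPE_RAW from the perf UAPI, and the "PMU" is a toy callback, not the kernel's event-creation path.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mirrors PERF_TYPE_HARDWARE and PERF_TYPE_RAW; redefined for user space. */
enum wd_event_type { WD_TYPE_HARDWARE = 0, WD_TYPE_RAW = 4 };

struct wd_event {
	enum wd_event_type type;
	unsigned long long config;
};

/* Parse "r<hex>" (for example "r300"); returns false on malformed input. */
static bool wd_parse_raw_event(const char *str, struct wd_event *ev)
{
	char *end;
	unsigned long long val;

	assert(str != NULL && ev != NULL);

	/* Require the 'r' prefix followed by at least one hex digit. */
	if (str[0] != 'r' || !isxdigit((unsigned char)str[1]))
		return false;

	errno = 0;
	val = strtoull(str + 1, &end, 16);
	if (errno || *end != '\0')
		return false;

	ev->type = WD_TYPE_RAW;
	ev->config = val;
	return true;
}

/* Hypothetical hook: returns 0 when the hardware accepts the event. */
typedef int (*wd_try_create_fn)(const struct wd_event *ev);

/* Use the raw event if it parses and can be created, else cpu-cycles. */
static struct wd_event wd_pick_event(const char *arg, wd_try_create_fn try_create)
{
	struct wd_event fallback = { WD_TYPE_HARDWARE, 0 /* cpu-cycles */ };
	struct wd_event ev;

	if (arg && wd_parse_raw_event(arg, &ev) && try_create(&ev) == 0)
		return ev;
	return fallback;
}

/* Toy PMU for demonstration: accepts only raw config 0x300. */
static int demo_try_create(const struct wd_event *ev)
{
	return (ev->type == WD_TYPE_RAW && ev->config == 0x300) ? 0 : -1;
}
```

The fallback only triggers when event creation fails. It does not address the concern in the reply above: a raw number that is valid on another vendor's CPU but selects a different event would be accepted silently.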
Hi Peter,

> On Jun 2, 2023, at 4:15 PM, Song Liu <songliubraving@meta.com> wrote:

[...]

>>>  	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
>>> -			Format: [panic,][nopanic,][num]
>>> +			Format: [panic,][nopanic,][ref-cycles][num]
>>>  			Valid num: 0 or 1
>>>  			0 - turn hardlockup detector in nmi_watchdog off
>>>  			1 - turn hardlockup detector in nmi_watchdog on
>>> +			ref-cycles - configure the watchdog with perf event
>>> +			             "ref-cycles" instead of "cycles"
>>>  			When panic is specified, panic when an NMI watchdog
>>>  			timeout occurs (or 'nopanic' to not panic on an NMI
>>>  			watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
>>
>> I still hate the whole ref-cycles thing; at the very least powerpc also
>> has HAVE_HARDLOCKUP_DETECTOR_PERF and they don't have ref-cycles, but
>> perhaps they want to use a different event when the moon is just so...
>>
>> What again was wrong with the option of specifying a raw event value and
>> falling back to cpu-cycles if that fails?
>
> The same raw event number may mean different events on different hardware,
> so it is more likely to lead to configuration mistakes. For example, r300
> means ref-cycles on Intel CPUs, but it means something else on AMD
> CPUs. I need to be very careful about which hosts run with
> nmi_watchdog=r300, as it may cause surprises. OTOH, nmi_watchdog=ref-cycles
> won't have this issue. Of course, this won't work for powerpc.
>
> Does this make sense?

Do we have other ideas to configure this?

Thanks,
Song
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..d378e23dad7c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3593,10 +3593,12 @@
 			Format: [state][,regs][,debounce][,die]
 
 	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
-			Format: [panic,][nopanic,][num]
+			Format: [panic,][nopanic,][ref-cycles][num]
 			Valid num: 0 or 1
 			0 - turn hardlockup detector in nmi_watchdog off
 			1 - turn hardlockup detector in nmi_watchdog on
+			ref-cycles - configure the watchdog with perf event
+			             "ref-cycles" instead of "cycles"
 			When panic is specified, panic when an NMI watchdog
 			timeout occurs (or 'nopanic' to not panic on an NMI
 			watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
@@ -7097,4 +7099,3 @@
 				memory, and other data can't be written using
 				xmon commands.
 			off	xmon is disabled.
-
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 048c0b9aa623..edfd1bcce0f6 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -102,12 +102,14 @@ extern void hardlockup_detector_perf_disable(void);
 extern void hardlockup_detector_perf_enable(void);
 extern void hardlockup_detector_perf_cleanup(void);
 extern int hardlockup_detector_perf_init(void);
+extern void hardlockup_config_perf_event(const char *str);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
 static inline void hardlockup_detector_perf_disable(void) { }
 static inline void hardlockup_detector_perf_enable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
+static inline void hardlockup_config_perf_event(const char *str) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8e61f21e7e33..fed4f0be8e1a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -81,6 +81,8 @@ static int __init hardlockup_panic_setup(char *str)
 		nmi_watchdog_user_enabled = 0;
 	else if (!strncmp(str, "1", 1))
 		nmi_watchdog_user_enabled = 1;
+	else if (!strncmp(str, "ref-cycles", 10))
+		hardlockup_config_perf_event(str);
 	return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..4deca58ba6ed 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -294,3 +294,12 @@ int __init hardlockup_detector_perf_init(void)
 	}
 	return ret;
 }
+
+/**
+ * hardlockup_config_perf_event - Overwrite config of wd_hw_attr
+ */
+void __init hardlockup_config_perf_event(const char *str)
+{
+	if (!strncmp(str, "ref-cycles", 10))
+		wd_hw_attr.config = PERF_COUNT_HW_REF_CPU_CYCLES;
+}
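The v4 changelog entry (compile fix for !CONFIG_HARDLOCKUP_DETECTOR_PERF) appears to correspond to the empty static inline stub added in include/linux/nmi.h. A minimal user-space sketch of that pattern follows; CONFIG_DEMO_PERF, demo_config_perf_event, demo_event_configured and demo_setup are hypothetical names used only for illustration, not kernel symbols.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag so a caller can observe whether the real helper ran. */
static int demo_event_configured;

#ifdef CONFIG_DEMO_PERF
/* Feature enabled: the real helper inspects the option string. */
static void demo_config_perf_event(const char *str)
{
	assert(str != NULL);
	if (!strncmp(str, "ref-cycles", 10))
		demo_event_configured = 1;
}
#else
/* Feature disabled: an empty stub keeps the caller compiling and linking. */
static inline void demo_config_perf_event(const char *str)
{
	(void)str;
}
#endif

/* Caller that is built regardless of the config option. */
static int demo_setup(const char *str)
{
	demo_config_perf_event(str);
	return 1;
}
```

Built without -DCONFIG_DEMO_PERF, the call in demo_setup() resolves to the stub and has no effect, which is the behavior the #else branch in the header provides for configurations without the perf-based detector.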