[v2] watchdog: Prefer use "ref-cycles" for NMI watchdog

Message ID 20230516235817.2323062-1-song@kernel.org
State New

Commit Message

Song Liu May 16, 2023, 11:58 p.m. UTC
The NMI watchdog permanently consumes one hardware counter per CPU on the
system. For systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.

OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is
rarely used. Try to use "ref-cycles" for the watchdog, so that one more
hardware counter is available to the user. If the CPU doesn't support
"ref-cycles", fall back to "cycles".

The downside of this change is that users of "ref-cycles" need to disable
nmi_watchdog.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Song Liu <song@kernel.org>

---

Changes in v2:
1. Do not print a warning when creating the ref-cycles event fails.
---
 kernel/watchdog_hld.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
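
The "try ref-cycles, fall back to cycles" order described in the commit
message can be sketched in userspace C. This is an illustration under
stated assumptions, not the patch itself: pick_watchdog_config(),
event_create_fn, probe_with_perf_event_open() and the stub_* helpers are
hypothetical names that do not appear in the patch, and the probe goes
through the perf_event_open(2) syscall, whereas the kernel code calls
perf_event_create_kernel_counter().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical callback: returns 0 if an event with this config can be created. */
typedef int (*event_create_fn)(uint64_t config);

/*
 * Prefer "ref-cycles"; fall back to "cycles" if it cannot be created.
 * Returns the chosen PERF_COUNT_HW_* value, or -1 if neither event works.
 */
int64_t pick_watchdog_config(event_create_fn create)
{
	assert(create != NULL);

	if (create(PERF_COUNT_HW_REF_CPU_CYCLES) == 0)
		return PERF_COUNT_HW_REF_CPU_CYCLES;
	if (create(PERF_COUNT_HW_CPU_CYCLES) == 0)
		return PERF_COUNT_HW_CPU_CYCLES;
	return -1;
}

/* Userspace probe (assumption: stands in for the in-kernel counter creation). */
int probe_with_perf_event_open(uint64_t config)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = config;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid = 0 (calling thread), cpu = -1 (any CPU), no group, no flags */
	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return -1;
	close((int)fd);
	return 0;
}

/* Stubs that model different CPUs without touching the hardware. */
int stub_all_supported(uint64_t config)
{
	(void)config;
	return 0;
}

int stub_no_ref_cycles(uint64_t config)
{
	return config == PERF_COUNT_HW_REF_CPU_CYCLES ? -1 : 0;
}

int stub_nothing(uint64_t config)
{
	(void)config;
	return -1;
}
```

Passing probe_with_perf_event_open as the callback checks the running
machine; the result depends on the CPU and on the
kernel.perf_event_paranoid setting. The stubs only exercise the ordering.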
  

Comments

Li, Xin3 May 17, 2023, 1:23 a.m. UTC | #1
> The NMI watchdog permanently consumes one hardware counter per CPU on the
> system. For systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
> 
> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is
> rarely used. Try to use "ref-cycles" for the watchdog, so that one more
> hardware counter is available to the user. If the CPU doesn't support
> "ref-cycles", fall back to "cycles".
> 
> The downside of this change is that users of "ref-cycles" need to disable
> nmi_watchdog.

From the discussion in v1, the users don't have to disable the NMI watchdog
*permanently*, right?

> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Song Liu <song@kernel.org>
> 
> ---
> 
> Changes in v2:
> 1. Do not print a warning when creating the ref-cycles event fails.
> ---
>  kernel/watchdog_hld.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..a1d2a43ea31f 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -100,7 +100,7 @@ static inline bool watchdog_check_timestamp(void)
> 
>  static struct perf_event_attr wd_hw_attr = {
>  	.type		= PERF_TYPE_HARDWARE,
> -	.config		= PERF_COUNT_HW_CPU_CYCLES,
> +	.config		= PERF_COUNT_HW_REF_CPU_CYCLES,
>  	.size		= sizeof(struct perf_event_attr),
>  	.pinned		= 1,
>  	.disabled	= 1,
> @@ -163,7 +163,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  	return;
>  }
> 
> -static int hardlockup_detector_event_create(void)
> +static int hardlockup_detector_event_create(bool send_warning)
>  {
>  	unsigned int cpu = smp_processor_id();
>  	struct perf_event_attr *wd_attr;
> @@ -176,8 +176,10 @@ static int hardlockup_detector_event_create(void)
>  	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
>  					       watchdog_overflow_callback, NULL);
>  	if (IS_ERR(evt)) {
> -		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
> -			 PTR_ERR(evt));
> +		if (send_warning) {
> +			pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
> +				 PTR_ERR(evt));
> +		}
>  		return PTR_ERR(evt);
>  	}
>  	this_cpu_write(watchdog_ev, evt);
> @@ -189,7 +191,7 @@ static int hardlockup_detector_event_create(void)
>   */
>  void hardlockup_detector_perf_enable(void)
>  {
> -	if (hardlockup_detector_event_create())
> +	if (hardlockup_detector_event_create(true))
>  		return;
> 
>  	/* use original value for check */
> @@ -284,7 +286,13 @@ void __init hardlockup_detector_perf_restart(void)
>   */
>  int __init hardlockup_detector_perf_init(void)
>  {
> -	int ret = hardlockup_detector_event_create();
> +	int ret = hardlockup_detector_event_create(false);
> +
> +	if (ret) {
> +		/* Failed to create "ref-cycles", try "cycles" instead */
> +		wd_hw_attr.config = PERF_COUNT_HW_CPU_CYCLES;
> +		ret = hardlockup_detector_event_create(true);
> +	}
> 
>  	if (ret) {
>  		pr_info("Perf NMI watchdog permanently disabled\n");
> --
> 2.34.1
  
Song Liu May 17, 2023, 4:38 a.m. UTC | #2
> On May 16, 2023, at 6:23 PM, Li, Xin3 <xin3.li@intel.com> wrote:
> 
>> The NMI watchdog permanently consumes one hardware counter per CPU on the
>> system. For systems that use many hardware counters, this causes more
>> aggressive time multiplexing of perf events.
>> 
>> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is
>> rarely used. Try to use "ref-cycles" for the watchdog, so that one more
>> hardware counter is available to the user. If the CPU doesn't support
>> "ref-cycles", fall back to "cycles".
>> 
>> The downside of this change is that users of "ref-cycles" need to disable
>> nmi_watchdog.
> 
> From the discussion in v1, the users don't have to disable the NMI watchdog
> *permanently*, right?

Users need to disable the NMI watchdog only while they are using
ref-cycles. For example:

    # disable nmi_watchdog
    sysctl kernel.nmi_watchdog=0

    # use ref-cycles
    perf stat/record -e ref-cycles ...

    # reenable nmi_watchdog
    sysctl kernel.nmi_watchdog=1

Thanks,
Song
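
A tool that wants to count ref-cycles could check the watchdog state
before measuring. A minimal sketch, assuming the state is readable as an
integer from /proc/sys/kernel/nmi_watchdog (the procfs view of the
kernel.nmi_watchdog sysctl used above); nmi_watchdog_state() and
nmi_watchdog_enabled() are hypothetical helper names.

```c
#include <stdio.h>

/*
 * Parse the watchdog state from an already opened stream.
 * Returns 1 if enabled, 0 if disabled, -1 if the value cannot be read.
 */
int nmi_watchdog_state(FILE *f)
{
	int value;

	if (f == NULL || fscanf(f, "%d", &value) != 1)
		return -1;
	return value != 0;
}

/* Read the current state from procfs (path is an assumption, see above). */
int nmi_watchdog_enabled(void)
{
	FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "r");
	int state = nmi_watchdog_state(f);

	if (f != NULL)
		fclose(f);
	return state;
}
```

If nmi_watchdog_enabled() returns 1, the caller can ask the user to run
the sysctl sequence above before starting a ref-cycles measurement.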
  
Peter Zijlstra May 17, 2023, 7:31 a.m. UTC | #3
On Tue, May 16, 2023 at 04:58:17PM -0700, Song Liu wrote:
> The NMI watchdog permanently consumes one hardware counter per CPU on the
> system. For systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
> 
> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is
> rarely used. Try to use "ref-cycles" for the watchdog, so that one more
> hardware counter is available to the user. If the CPU doesn't support
> "ref-cycles", fall back to "cycles".
> 
> The downside of this change is that users of "ref-cycles" need to disable
> nmi_watchdog.

I still utterly hate how you hardcode ref-cycles
  
Song Liu May 17, 2023, 5:51 p.m. UTC | #4
> On May 17, 2023, at 12:31 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, May 16, 2023 at 04:58:17PM -0700, Song Liu wrote:
>> The NMI watchdog permanently consumes one hardware counter per CPU on the
>> system. For systems that use many hardware counters, this causes more
>> aggressive time multiplexing of perf events.
>> 
>> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is
>> rarely used. Try to use "ref-cycles" for the watchdog, so that one more
>> hardware counter is available to the user. If the CPU doesn't support
>> "ref-cycles", fall back to "cycles".
>> 
>> The downside of this change is that users of "ref-cycles" need to disable
>> nmi_watchdog.
> 
> I still utterly hate how you hardcode ref-cycles

OK, let me try kernel command-line arguments instead. Sending v3.

Thanks,
Song
  

Patch

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..a1d2a43ea31f 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -100,7 +100,7 @@  static inline bool watchdog_check_timestamp(void)
 
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
-	.config		= PERF_COUNT_HW_CPU_CYCLES,
+	.config		= PERF_COUNT_HW_REF_CPU_CYCLES,
 	.size		= sizeof(struct perf_event_attr),
 	.pinned		= 1,
 	.disabled	= 1,
@@ -163,7 +163,7 @@  static void watchdog_overflow_callback(struct perf_event *event,
 	return;
 }
 
-static int hardlockup_detector_event_create(void)
+static int hardlockup_detector_event_create(bool send_warning)
 {
 	unsigned int cpu = smp_processor_id();
 	struct perf_event_attr *wd_attr;
@@ -176,8 +176,10 @@  static int hardlockup_detector_event_create(void)
 	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
 					       watchdog_overflow_callback, NULL);
 	if (IS_ERR(evt)) {
-		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
-			 PTR_ERR(evt));
+		if (send_warning) {
+			pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+				 PTR_ERR(evt));
+		}
 		return PTR_ERR(evt);
 	}
 	this_cpu_write(watchdog_ev, evt);
@@ -189,7 +191,7 @@  static int hardlockup_detector_event_create(void)
  */
 void hardlockup_detector_perf_enable(void)
 {
-	if (hardlockup_detector_event_create())
+	if (hardlockup_detector_event_create(true))
 		return;
 
 	/* use original value for check */
@@ -284,7 +286,13 @@  void __init hardlockup_detector_perf_restart(void)
  */
 int __init hardlockup_detector_perf_init(void)
 {
-	int ret = hardlockup_detector_event_create();
+	int ret = hardlockup_detector_event_create(false);
+
+	if (ret) {
+		/* Failed to create "ref-cycles", try "cycles" instead */
+		wd_hw_attr.config = PERF_COUNT_HW_CPU_CYCLES;
+		ret = hardlockup_detector_event_create(true);
+	}
 
 	if (ret) {
 		pr_info("Perf NMI watchdog permanently disabled\n");