hardlockup: detect hard lockups using secondary (buddy) cpus

Message ID 20230421155255.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
State New

Commit Message

Doug Anderson April 21, 2023, 10:53 p.m. UTC
  From: Colin Cross <ccross@android.com>

Implement a hardlockup detector that can be enabled on SMP systems
that don't have an arch provided one or one implemented atop perf by
using interrupts on other cpus. Each cpu will use its softlockup
hrtimer to check that the next cpu is processing hrtimer interrupts by
verifying that a counter is increasing.
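
As a rough illustration of the check described above, here is a minimal userspace sketch. It is not the kernel code: the `sim_*` names, the fixed four-CPU ring, and the round-based "timer" are assumptions made for illustration, and the touch/hotplug handling from the patch is left out. Each simulated CPU bumps its own counter on every tick and, on every third tick, compares the next CPU's counter against the value saved at the previous check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIM_NR_CPUS	4	/* hypothetical ring size */
#define SIM_CHECK_EVERY	3	/* check the buddy on every third tick */

struct sim_state {
	unsigned long interrupts[SIM_NR_CPUS];	/* per-CPU tick counter */
	unsigned long saved[SIM_NR_CPUS];	/* counter value seen at the last check */
	bool stuck[SIM_NR_CPUS];		/* true once a CPU stops taking ticks */
};

/* Next CPU in the ring, wrapping back to CPU 0. */
static unsigned int sim_next_cpu(unsigned int cpu)
{
	return (cpu + 1) % SIM_NR_CPUS;
}

/* True if the buddy's counter has not moved since the previous check. */
static bool sim_buddy_locked_up(struct sim_state *s, unsigned int buddy)
{
	if (s->saved[buddy] == s->interrupts[buddy])
		return true;

	s->saved[buddy] = s->interrupts[buddy];
	return false;
}

/* One simulated timer tick on 'cpu'. Returns the flagged buddy, or -1. */
static int sim_timer_tick(struct sim_state *s, unsigned int cpu)
{
	unsigned int buddy;

	assert(cpu < SIM_NR_CPUS);

	/* A stuck CPU takes no ticks, so it neither counts nor checks. */
	if (s->stuck[cpu])
		return -1;

	s->interrupts[cpu]++;
	if (s->interrupts[cpu] % SIM_CHECK_EVERY != 0)
		return -1;

	buddy = sim_next_cpu(cpu);
	return sim_buddy_locked_up(s, buddy) ? (int)buddy : -1;
}

/*
 * Tick every CPU once per round. 'stuck_cpu' stops ticking after
 * 'healthy_rounds' rounds. Returns the first CPU flagged, or -1 if no
 * lockup is flagged within 'total_rounds'.
 */
static int sim_run(unsigned int stuck_cpu, int healthy_rounds, int total_rounds)
{
	struct sim_state s;
	unsigned int cpu;
	int round, flagged;

	assert(stuck_cpu < SIM_NR_CPUS);
	memset(&s, 0, sizeof(s));

	for (round = 1; round <= total_rounds; round++) {
		if (round > healthy_rounds)
			s.stuck[stuck_cpu] = true;

		for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
			flagged = sim_timer_tick(&s, cpu);
			if (flagged >= 0)
				return flagged;
		}
	}
	return -1;
}
```

In this model `sim_run(2, 3, 12)` returns 2: CPU 1 is the one that notices, and only at its second check after CPU 2 stops, because the first check still sees a counter value newer than the saved one. A run in which no CPU ever stops, such as `sim_run(2, 12, 12)`, returns -1.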

NOTE: unlike the other hard lockup detectors, the buddy one can't
easily provide a backtrace on the CPU that locked up. It relies on
some other mechanism in the system to get information about the locked
up CPUs. This could be support for NMI backtraces like [1], it could
be a mechanism for printing the PC of locked CPUs like [2], or it
could be something else.

This style of hardlockup detector originated in some downstream
Android trees and has been rebased on / carried in ChromeOS trees for
quite a long time for use on arm and arm64 boards. Historically on
these boards we've leveraged mechanism [2] to get information about
hung CPUs, but we could move to [1].

NOTE: the buddy system is not really useful to enable on any
architectures that have a better mechanism. On arm64 folks have been
trying to get a better mechanism for years and there have even been
recent posts of patches adding support [3]. However, nothing about the
buddy system is tied to arm64 and several archs (even arm32, where it
was originally developed) could find it useful.

[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/

Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
This patch has been rebased in ChromeOS kernel trees many times, and
each time someone had to do work on it they added their
Signed-off-by. I've included those here. I've also left the author as
Colin Cross since the core code is still his.

I'll also note that the CC list is pretty giant, but that's what
get_maintainer.pl came up with (plus a few other folks I thought would
be interested). As far as I can tell, there's no true MAINTAINER
listed for the existing watchdog code. Assuming people don't hate
this, maybe it would go through Andrew Morton's tree?

 include/linux/nmi.h         |  18 ++++-
 kernel/Makefile             |   1 +
 kernel/watchdog.c           |  24 ++++--
 kernel/watchdog_buddy_cpu.c | 141 ++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug           |  19 ++++-
 5 files changed, 192 insertions(+), 11 deletions(-)
 create mode 100644 kernel/watchdog_buddy_cpu.c
  

Comments

Randy Dunlap April 21, 2023, 11:59 p.m. UTC | #1
Hi--

On 4/21/23 15:53, Douglas Anderson wrote:
> From: Colin Cross <ccross@android.com>
> 
> Implement a hardlockup detector that can be enabled on SMP systems
> that don't have an arch provided one or one implemented atop perf by

Is that                            one or more
?

> using interrupts on other cpus. Each cpu will use its softlockup
> hrtimer to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
> 
> NOTE: unlike the other hard lockup detectors, the buddy one can't
> easily provide a backtrace on the CPU that locked up. It relies on
> some other mechanism in the system to get information about the locked
> up CPUs. This could be support for NMI backtraces like [1], it could
> be a mechanism for printing the PC of locked CPUs like [2], or it
> could be something else.
> 
> This style of hardlockup detector originated in some downstream
> Android trees and has been rebased on / carried in ChromeOS trees for
> quite a long time for use on arm and arm64 boards. Historically on
> these boards we've leveraged mechanism [2] to get information about
> hung CPUs, but we could move to [1].
> 
> NOTE: the buddy system is not really useful to enable on any
> architectures that have a better mechanism. On arm64 folks have been
> trying to get a better mechanism for years and there has even been
> recent posts of patches adding support [3]. However, nothing about the
> buddy system is tied to arm64 and several archs (even arm32, where it
> was originally developed) could find it useful.
> 
> [1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
> [2] https://issuetracker.google.com/172213129
> [3] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
> 
> Signed-off-by: Colin Cross <ccross@android.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> This patch has been rebased in ChromeOS kernel trees many times, and
> each time someone had to do work on it they added their
> Signed-off-by. I've included those here. I've also left the author as
> Colin Cross since the core code is still his.
> 
> I'll also note that the CC list is pretty giant, but that's what
> get_maintainers came up with (plus a few other folks I thought would
> be interested). As far as I can tell, there's no true MAINTAINER
> listed for the existing watchdog code. Assuming people don't hate
> this, maybe it would go through Andrew Morton's tree?
> 
>  include/linux/nmi.h         |  18 ++++-
>  kernel/Makefile             |   1 +
>  kernel/watchdog.c           |  24 ++++--
>  kernel/watchdog_buddy_cpu.c | 141 ++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug           |  19 ++++-
>  5 files changed, 192 insertions(+), 11 deletions(-)
>  create mode 100644 kernel/watchdog_buddy_cpu.c
> 

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 39d1d93164bd..9eb86bc9f5ee 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1036,6 +1036,9 @@ config HARDLOCKUP_DETECTOR_PERF
>  config HARDLOCKUP_CHECK_TIMESTAMP
>  	bool
>  
> +config HARDLOCKUP_DETECTOR_CORE
> +	bool
> +
>  #
>  # arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
>  # lockup detector rather than the perf based detector.
> @@ -1045,6 +1048,7 @@ config HARDLOCKUP_DETECTOR
>  	depends on DEBUG_KERNEL && !S390
>  	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
>  	select LOCKUP_DETECTOR
> +	select HARDLOCKUP_DETECTOR_CORE
>  	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
>  	help
>  	  Say Y here to enable the kernel to act as a watchdog to detect
> @@ -1055,9 +1059,22 @@ config HARDLOCKUP_DETECTOR
>  	  chance to run.  The current stack trace is displayed upon detection
>  	  and the system will stay locked up.
>  
> +config HARDLOCKUP_DETECTOR_BUDDY_CPU
> +	bool "Buddy CPU hardlockup detector"
> +	depends on DEBUG_KERNEL && SMP
> +	depends on !HARDLOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> +	depends on !S390
> +	select HARDLOCKUP_DETECTOR_CORE
> +	select SOFTLOCKUP_DETECTOR
> +	help
> +	  Say Y here to enable a hardlockup detector where CPUs check
> +	  each other for lockup. Each cpu uses its softlockup hrtimer

Preferably                            CPU

> +	  to check that the next cpu is processing hrtimer interrupts by

and                              CPU

> +	  verifying that a counter is increasing.
> +
>  config BOOTPARAM_HARDLOCKUP_PANIC
>  	bool "Panic (Reboot) On Hard Lockups"
> -	depends on HARDLOCKUP_DETECTOR
> +	depends on HARDLOCKUP_DETECTOR_CORE
>  	help
>  	  Say Y here to enable the kernel to panic on "hard lockups",
>  	  which are bugs that cause the kernel to loop in kernel
  
Ian Rogers April 22, 2023, 1:19 a.m. UTC | #2
On Fri, Apr 21, 2023 at 3:54 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> From: Colin Cross <ccross@android.com>
>
> Implement a hardlockup detector that can be enabled on SMP systems
> that don't have an arch provided one or one implemented atop perf by
> using interrupts on other cpus. Each cpu will use its softlockup
> hrtimer to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
>
> NOTE: unlike the other hard lockup detectors, the buddy one can't
> easily provide a backtrace on the CPU that locked up. It relies on
> some other mechanism in the system to get information about the locked
> up CPUs. This could be support for NMI backtraces like [1], it could
> be a mechanism for printing the PC of locked CPUs like [2], or it
> could be something else.
>
> This style of hardlockup detector originated in some downstream
> Android trees and has been rebased on / carried in ChromeOS trees for
> quite a long time for use on arm and arm64 boards. Historically on
> these boards we've leveraged mechanism [2] to get information about
> hung CPUs, but we could move to [1].
>
> NOTE: the buddy system is not really useful to enable on any
> architectures that have a better mechanism. On arm64 folks have been
> trying to get a better mechanism for years and there has even been
> recent posts of patches adding support [3]. However, nothing about the
> buddy system is tied to arm64 and several archs (even arm32, where it
> was originally developed) could find it useful.
>
> [1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
> [2] https://issuetracker.google.com/172213129
> [3] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/

There is another proposal to use timers for lockup detection but not
the buddy system:
https://lore.kernel.org/lkml/20230413035844.GA31620@ranerica-svr.sc.intel.com/
It'd be very good to free up the counter used by the current NMI watchdog.

Thanks,
Ian

> Signed-off-by: Colin Cross <ccross@android.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> This patch has been rebased in ChromeOS kernel trees many times, and
> each time someone had to do work on it they added their
> Signed-off-by. I've included those here. I've also left the author as
> Colin Cross since the core code is still his.
>
> I'll also note that the CC list is pretty giant, but that's what
> get_maintainers came up with (plus a few other folks I thought would
> be interested). As far as I can tell, there's no true MAINTAINER
> listed for the existing watchdog code. Assuming people don't hate
> this, maybe it would go through Andrew Morton's tree?
>
>  include/linux/nmi.h         |  18 ++++-
>  kernel/Makefile             |   1 +
>  kernel/watchdog.c           |  24 ++++--
>  kernel/watchdog_buddy_cpu.c | 141 ++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug           |  19 ++++-
>  5 files changed, 192 insertions(+), 11 deletions(-)
>  create mode 100644 kernel/watchdog_buddy_cpu.c
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 048c0b9aa623..35f6c5c2378b 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -45,6 +45,8 @@ extern void touch_softlockup_watchdog(void);
>  extern void touch_softlockup_watchdog_sync(void);
>  extern void touch_all_softlockup_watchdogs(void);
>  extern unsigned int  softlockup_panic;
> +DECLARE_PER_CPU(unsigned long, hrtimer_interrupts);
> +DECLARE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
>
>  extern int lockup_detector_online_cpu(unsigned int cpu);
>  extern int lockup_detector_offline_cpu(unsigned int cpu);
> @@ -81,14 +83,14 @@ static inline void reset_hung_task_detector(void) { }
>  #define NMI_WATCHDOG_ENABLED      (1 << NMI_WATCHDOG_ENABLED_BIT)
>  #define SOFT_WATCHDOG_ENABLED     (1 << SOFT_WATCHDOG_ENABLED_BIT)
>
> -#if defined(CONFIG_HARDLOCKUP_DETECTOR)
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
>  extern void hardlockup_detector_disable(void);
>  extern unsigned int hardlockup_panic;
>  #else
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
>
> -#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
> +#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
>  # define NMI_WATCHDOG_SYSCTL_PERM      0644
>  #else
>  # define NMI_WATCHDOG_SYSCTL_PERM      0444
> @@ -124,6 +126,14 @@ void watchdog_nmi_disable(unsigned int cpu);
>
>  void lockup_detector_reconfigure(void);
>
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY_CPU
> +void buddy_cpu_touch_watchdog(void);
> +void watchdog_check_hardlockup(void);
> +#else
> +static inline void buddy_cpu_touch_watchdog(void) {}
> +static inline void watchdog_check_hardlockup(void) {}
> +#endif
> +
>  /**
>   * touch_nmi_watchdog - restart NMI watchdog timeout.
>   *
> @@ -134,6 +144,7 @@ void lockup_detector_reconfigure(void);
>  static inline void touch_nmi_watchdog(void)
>  {
>         arch_touch_nmi_watchdog();
> +       buddy_cpu_touch_watchdog();
>         touch_softlockup_watchdog();
>  }
>
> @@ -196,8 +207,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
>  #endif
>
> -#if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
> -    defined(CONFIG_HARDLOCKUP_DETECTOR)
> +#if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  void watchdog_update_hrtimer_threshold(u64 period);
>  #else
>  static inline void watchdog_update_hrtimer_threshold(u64 period) { }
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 10ef068f598d..a2054f16f9f4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -91,6 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
>  obj-$(CONFIG_KGDB) += debug/
>  obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>  obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY_CPU) += watchdog_buddy_cpu.o
>  obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
>  obj-$(CONFIG_SECCOMP) += seccomp.o
>  obj-$(CONFIG_RELAY) += relay.o
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8e61f21e7e33..1199043689ae 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -29,7 +29,7 @@
>
>  static DEFINE_MUTEX(watchdog_mutex);
>
> -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE) || defined(CONFIG_HAVE_NMI_WATCHDOG)
>  # define WATCHDOG_DEFAULT      (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
>  # define NMI_WATCHDOG_DEFAULT  1
>  #else
> @@ -47,7 +47,7 @@ static int __read_mostly nmi_watchdog_available;
>  struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_CORE
>
>  # ifdef CONFIG_SMP
>  int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
> @@ -85,7 +85,9 @@ static int __init hardlockup_panic_setup(char *str)
>  }
>  __setup("nmi_watchdog=", hardlockup_panic_setup);
>
> -#endif /* CONFIG_HARDLOCKUP_DETECTOR */
> +#endif /* CONFIG_HARDLOCKUP_DETECTOR_CORE */
> +
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>
>  /*
>   * These functions can be overridden if an architecture implements its
> @@ -106,6 +108,13 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
>         hardlockup_detector_perf_disable();
>  }
>
> +#else
> +
> +int __weak watchdog_nmi_enable(unsigned int cpu) { return 0; }
> +void __weak watchdog_nmi_disable(unsigned int cpu) { return; }
> +
> +#endif /* CONFIG_HARDLOCKUP_DETECTOR */
> +
>  /* Return 0, if a NMI watchdog is available. Error code otherwise */
>  int __weak __init watchdog_nmi_probe(void)
>  {
> @@ -179,8 +188,8 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
>  static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
>  static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
>  static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> +DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
>  static unsigned long soft_lockup_nmi_warn;
>
>  static int __init nowatchdog_setup(char *str)
> @@ -364,6 +373,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>         /* kick the hardlockup detector */
>         watchdog_interrupt_count();
>
> +       /* test for hardlockups */
> +       watchdog_check_hardlockup();
> +
>         /* kick the softlockup detector */
>         if (completion_done(this_cpu_ptr(&softlockup_completion))) {
>                 reinit_completion(this_cpu_ptr(&softlockup_completion));
> @@ -820,7 +832,7 @@ static struct ctl_table watchdog_sysctls[] = {
>         },
>  #endif /* CONFIG_SMP */
>  #endif
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_CORE
>         {
>                 .procname       = "hardlockup_panic",
>                 .data           = &hardlockup_panic,
> diff --git a/kernel/watchdog_buddy_cpu.c b/kernel/watchdog_buddy_cpu.c
> new file mode 100644
> index 000000000000..db813b00e6ef
> --- /dev/null
> +++ b/kernel/watchdog_buddy_cpu.c
> @@ -0,0 +1,141 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/kernel.h>
> +#include <linux/nmi.h>
> +#include <linux/percpu-defs.h>
> +
> +static DEFINE_PER_CPU(bool, watchdog_touch);
> +static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> +static cpumask_t __read_mostly watchdog_cpus;
> +
> +static unsigned long hardlockup_allcpu_dumped;
> +
> +int __init watchdog_nmi_probe(void)
> +{
> +       return 0;
> +}
> +
> +notrace void buddy_cpu_touch_watchdog(void)
> +{
> +       /*
> +        * Using __raw here because some code paths have
> +        * preemption enabled.  If preemption is enabled
> +        * then interrupts should be enabled too, in which
> +        * case we shouldn't have to worry about the watchdog
> +        * going off.
> +        */
> +       raw_cpu_write(watchdog_touch, true);
> +}
> +EXPORT_SYMBOL_GPL(buddy_cpu_touch_watchdog);
> +
> +static unsigned int watchdog_next_cpu(unsigned int cpu)
> +{
> +       cpumask_t cpus = watchdog_cpus;
> +       unsigned int next_cpu;
> +
> +       next_cpu = cpumask_next(cpu, &cpus);
> +       if (next_cpu >= nr_cpu_ids)
> +               next_cpu = cpumask_first(&cpus);
> +
> +       if (next_cpu == cpu)
> +               return nr_cpu_ids;
> +
> +       return next_cpu;
> +}
> +
> +int watchdog_nmi_enable(unsigned int cpu)
> +{
> +       /*
> +        * The new cpu will be marked online before the first hrtimer interrupt
> +        * runs on it.  If another cpu tests for a hardlockup on the new cpu
> +        * before it has run its first hrtimer, it will get a false positive.
> +        * Touch the watchdog on the new cpu to delay the first check for at
> +        * least 3 sampling periods to guarantee one hrtimer has run on the new
> +        * cpu.
> +        */
> +       per_cpu(watchdog_touch, cpu) = true;
> +       /* Match with smp_rmb() in watchdog_check_hardlockup() */
> +       smp_wmb();
> +       cpumask_set_cpu(cpu, &watchdog_cpus);
> +       return 0;
> +}
> +
> +void watchdog_nmi_disable(unsigned int cpu)
> +{
> +       unsigned int next_cpu = watchdog_next_cpu(cpu);
> +
> +       /*
> +        * Offlining this cpu will cause the cpu before this one to start
> +        * checking the one after this one.  If this cpu just finished checking
> +        * the next cpu and updating hrtimer_interrupts_saved, and then the
> +        * previous cpu checks it within one sample period, it will trigger a
> +        * false positive.  Touch the watchdog on the next cpu to prevent it.
> +        */
> +       if (next_cpu < nr_cpu_ids)
> +               per_cpu(watchdog_touch, next_cpu) = true;
> +       /* Match with smp_rmb() in watchdog_check_hardlockup() */
> +       smp_wmb();
> +       cpumask_clear_cpu(cpu, &watchdog_cpus);
> +}
> +
> +static int is_hardlockup_buddy_cpu(unsigned int cpu)
> +{
> +       unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
> +
> +       if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> +               return 1;
> +
> +       per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> +       return 0;
> +}
> +
> +void watchdog_check_hardlockup(void)
> +{
> +       unsigned int next_cpu;
> +
> +       /*
> +        * Test for hardlockups every 3 samples.  The sample period is
> +        *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> +        *  watchdog_thresh (over by 20%).
> +        */
> +       if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> +               return;
> +
> +       /* check for a hardlockup on the next cpu */
> +       next_cpu = watchdog_next_cpu(smp_processor_id());
> +       if (next_cpu >= nr_cpu_ids)
> +               return;
> +
> +       /* Match with smp_wmb() in watchdog_nmi_enable() / watchdog_nmi_disable() */
> +       smp_rmb();
> +
> +       if (per_cpu(watchdog_touch, next_cpu) == true) {
> +               per_cpu(watchdog_touch, next_cpu) = false;
> +               return;
> +       }
> +
> +       if (is_hardlockup_buddy_cpu(next_cpu)) {
> +               /* only warn once */
> +               if (per_cpu(hard_watchdog_warn, next_cpu) == true)
> +                       return;
> +
> +               /*
> +                * Perform all-CPU dump only once to avoid multiple hardlockups
> +                * generating interleaving traces
> +                */
> +               if (sysctl_hardlockup_all_cpu_backtrace &&
> +                               !test_and_set_bit(0, &hardlockup_allcpu_dumped))
> +                       trigger_allbutself_cpu_backtrace();
> +
> +               if (hardlockup_panic)
> +                       panic("Watchdog detected hard LOCKUP on cpu %u", next_cpu);
> +               else
> +                       WARN(1, "Watchdog detected hard LOCKUP on cpu %u", next_cpu);
> +
> +               per_cpu(hard_watchdog_warn, next_cpu) = true;
> +       } else {
> +               per_cpu(hard_watchdog_warn, next_cpu) = false;
> +       }
> +}
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 39d1d93164bd..9eb86bc9f5ee 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1036,6 +1036,9 @@ config HARDLOCKUP_DETECTOR_PERF
>  config HARDLOCKUP_CHECK_TIMESTAMP
>         bool
>
> +config HARDLOCKUP_DETECTOR_CORE
> +       bool
> +
>  #
>  # arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
>  # lockup detector rather than the perf based detector.
> @@ -1045,6 +1048,7 @@ config HARDLOCKUP_DETECTOR
>         depends on DEBUG_KERNEL && !S390
>         depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
>         select LOCKUP_DETECTOR
> +       select HARDLOCKUP_DETECTOR_CORE
>         select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
>         help
>           Say Y here to enable the kernel to act as a watchdog to detect
> @@ -1055,9 +1059,22 @@ config HARDLOCKUP_DETECTOR
>           chance to run.  The current stack trace is displayed upon detection
>           and the system will stay locked up.
>
> +config HARDLOCKUP_DETECTOR_BUDDY_CPU
> +       bool "Buddy CPU hardlockup detector"
> +       depends on DEBUG_KERNEL && SMP
> +       depends on !HARDLOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> +       depends on !S390
> +       select HARDLOCKUP_DETECTOR_CORE
> +       select SOFTLOCKUP_DETECTOR
> +       help
> +         Say Y here to enable a hardlockup detector where CPUs check
> +         each other for lockup. Each cpu uses its softlockup hrtimer
> +         to check that the next cpu is processing hrtimer interrupts by
> +         verifying that a counter is increasing.
> +
>  config BOOTPARAM_HARDLOCKUP_PANIC
>         bool "Panic (Reboot) On Hard Lockups"
> -       depends on HARDLOCKUP_DETECTOR
> +       depends on HARDLOCKUP_DETECTOR_CORE
>         help
>           Say Y here to enable the kernel to panic on "hard lockups",
>           which are bugs that cause the kernel to loop in kernel
> --
> 2.40.0.634.g4ca3ef3211-goog
>
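
The timing comment in watchdog_check_hardlockup() in the patch above can be worked through numerically. The sketch below is plain userspace arithmetic, not kernel code; the helper names are hypothetical, and the value 10 used in the note that follows is assumed to be the usual default for watchdog_thresh.

```c
#include <assert.h>

#define SAMPLES_PER_CHECK	3	/* the buddy is checked every third sample */

/* Sample period in milliseconds: watchdog_thresh * 2 / 5 seconds. */
static unsigned long sample_period_ms(unsigned long watchdog_thresh_s)
{
	return watchdog_thresh_s * 2 * 1000 / 5;
}

/* Time between two consecutive buddy checks, in milliseconds. */
static unsigned long check_interval_ms(unsigned long watchdog_thresh_s)
{
	return SAMPLES_PER_CHECK * sample_period_ms(watchdog_thresh_s);
}

/* How far the check interval exceeds watchdog_thresh, in percent. */
static unsigned long overshoot_percent(unsigned long watchdog_thresh_s)
{
	unsigned long thresh_ms = watchdog_thresh_s * 1000;

	assert(thresh_ms > 0);
	return (check_interval_ms(watchdog_thresh_s) - thresh_ms) * 100 / thresh_ms;
}
```

With a threshold of 10 seconds this gives a 4000 ms sample period, a 12000 ms check interval, and a 20% overshoot, which matches the "over by 20%" figure in the comment.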
  
Daniel Thompson April 24, 2023, 12:53 p.m. UTC | #3
On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> From: Colin Cross <ccross@android.com>
>
> Implement a hardlockup detector that can be enabled on SMP systems
> that don't have an arch provided one or one implemented atop perf by
> using interrupts on other cpus. Each cpu will use its softlockup
> hrtimer to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
>
> NOTE: unlike the other hard lockup detectors, the buddy one can't
> easily provide a backtrace on the CPU that locked up. It relies on
> some other mechanism in the system to get information about the locked
> up CPUs. This could be support for NMI backtraces like [1], it could
> be a mechanism for printing the PC of locked CPUs like [2], or it
> could be something else.
>
> This style of hardlockup detector originated in some downstream
> Android trees and has been rebased on / carried in ChromeOS trees for
> quite a long time for use on arm and arm64 boards. Historically on
> these boards we've leveraged mechanism [2] to get information about
> hung CPUs, but we could move to [1].

On Arm platforms, is this code able to leverage the existing
infrastructure to extract status from stuck CPUs?
https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html


Daniel.
  
Doug Anderson April 24, 2023, 3:23 p.m. UTC | #4
Hi,

On Fri, Apr 21, 2023 at 6:20 PM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Apr 21, 2023 at 3:54 PM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > From: Colin Cross <ccross@android.com>
> >
> > Implement a hardlockup detector that can be enabled on SMP systems
> > that don't have an arch provided one or one implemented atop perf by
> > using interrupts on other cpus. Each cpu will use its softlockup
> > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> >
> > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > easily provide a backtrace on the CPU that locked up. It relies on
> > some other mechanism in the system to get information about the locked
> > up CPUs. This could be support for NMI backtraces like [1], it could
> > be a mechanism for printing the PC of locked CPUs like [2], or it
> > could be something else.
> >
> > This style of hardlockup detector originated in some downstream
> > Android trees and has been rebased on / carried in ChromeOS trees for
> > quite a long time for use on arm and arm64 boards. Historically on
> > these boards we've leveraged mechanism [2] to get information about
> > hung CPUs, but we could move to [1].
> >
> > NOTE: the buddy system is not really useful to enable on any
> > architectures that have a better mechanism. On arm64 folks have been
> > trying to get a better mechanism for years and there has even been
> > recent posts of patches adding support [3]. However, nothing about the
> > buddy system is tied to arm64 and several archs (even arm32, where it
> > was originally developed) could find it useful.
> >
> > [1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
> > [2] https://issuetracker.google.com/172213129
> > [3] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
>
> There is another proposal to use timers for lockup detection but not
> the buddy system:
> https://lore.kernel.org/lkml/20230413035844.GA31620@ranerica-svr.sc.intel.com/
> It'd be very good to free up the counter used by the current NMI watchdog.

Thanks for the link!

Looks like that series is x86 only, so I think that ${SUBJECT} patch
should still move forward since it provides a solution that is generic
across any platform. I guess the question is: if the buddy system gets
landed then is the HPET series still worthwhile? I guess the answer to
that would depend on whether the HPET-based watchdog has any
advantages over the buddy system.

I'd imagine that there could be some cases where the HPET system could
detect lockups that the buddy system can't. If _all_ CPUs in the
system have interrupts disabled then the buddy system won't be able to
run, but the HPET system could run. That's a win for the HPET system.
That being said, I guess I could imagine that there could be lockups
that the buddy system could detect that the HPET system couldn't. The
HPET system seems to have a single CPU in charge of processing the
main NMI and then that single CPU is in charge of checking all the
others. If that single CPU goes out to lunch then the system couldn't
detect hard lockups.
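
The coverage argument above can be written down as a toy model. This is only a sketch of the reasoning as stated in this mail, not a model of either patch series. It assumes two kinds of failure: a CPU with interrupts off that can still take an NMI, and a CPU that is completely unresponsive ("out to lunch"). The function names, the four-CPU ring, and the choice of CPU 0 as the single monitor are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS		4	/* hypothetical system size */
#define MODEL_MONITOR_CPU	0	/* hypothetical single monitoring CPU */
#define MODEL_ALL_CPUS		((1u << MODEL_NR_CPUS) - 1)

/* A CPU is healthy if it is neither running with IRQs off nor dead. */
static bool model_cpu_ok(unsigned int cpu, unsigned int irqs_off_mask,
			 unsigned int dead_mask)
{
	assert(cpu < MODEL_NR_CPUS);
	return !(((irqs_off_mask | dead_mask) >> cpu) & 1u);
}

/*
 * Buddy scheme: a locked-up CPU is noticed only if the CPU before it in
 * the ring is healthy, since the check runs from a normal timer interrupt.
 */
static bool model_buddy_detects(unsigned int irqs_off_mask,
				unsigned int dead_mask)
{
	unsigned int cpu, checker;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		checker = (cpu + MODEL_NR_CPUS - 1) % MODEL_NR_CPUS;
		if (!model_cpu_ok(cpu, irqs_off_mask, dead_mask) &&
		    model_cpu_ok(checker, irqs_off_mask, dead_mask))
			return true;
	}
	return false;
}

/*
 * Single-monitor scheme: one CPU takes an NMI and checks everyone, so it
 * still works with IRQs off but not if the monitor itself is dead.
 */
static bool model_single_monitor_detects(unsigned int irqs_off_mask,
					 unsigned int dead_mask)
{
	unsigned int locked = (irqs_off_mask | dead_mask) & MODEL_ALL_CPUS;
	bool monitor_alive = !((dead_mask >> MODEL_MONITOR_CPU) & 1u);

	return monitor_alive && locked != 0;
}
```

With every CPU running with interrupts off (irqs_off_mask = 0xf), only the single-monitor model reports a lockup. With only the monitor CPU dead (dead_mask = 0x1), only the buddy model does.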

In any case, I'm happy to let others debate about the HPET system. For
now, I'll take my action items to be:

1. Modify the patch description and Kconfig to include some of the
same advantages that the HPET patch series talks about (freeing up
resources).

2. Increase my CC list for the next version even more to include the
people you added to this thread who have been working on the HPET
patch series.

-Doug
  
Doug Anderson April 24, 2023, 3:41 p.m. UTC | #5
Hi,

On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > From: Colin Cross <ccross@android.com>
> >
> > Implement a hardlockup detector that can be enabled on SMP systems
> > that don't have an arch provided one or one implemented atop perf by
> > using interrupts on other cpus. Each cpu will use its softlockup
> > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> >
> > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > easily provide a backtrace on the CPU that locked up. It relies on
> > some other mechanism in the system to get information about the locked
> > up CPUs. This could be support for NMI backtraces like [1], it could
> > be a mechanism for printing the PC of locked CPUs like [2], or it
> > could be something else.
> >
> > This style of hardlockup detector originated in some downstream
> > Android trees and has been rebased on / carried in ChromeOS trees for
> > quite a long time for use on arm and arm64 boards. Historically on
> > these boards we've leveraged mechanism [2] to get information about
> > hung CPUs, but we could move to [1].
>
> On the Arm platforms is this code able to leverage the existing
> infrastructure to extract status from stuck CPUs:
> https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html

Yup! I wasn't explicit about this, but that's where you end up if you
follow the whole bug tracker item that was linked as [2].
Specifically, we used to have downstream patches in ChromeOS that
just reached into the coresight range from a SoC-specific driver and
printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
Chromebooks he found that the equivalent functionality had made it
upstream in a generic way through the coresight framework. Brian
confirmed it was working on rk3399 and made all of the device tree
changes needed to get it all hooked up, so it should work, at least
on that SoC.

[2] https://issuetracker.google.com/172213129
  
Chen-Yu Tsai April 25, 2023, 4:58 a.m. UTC | #6
On Mon, Apr 24, 2023 at 11:42 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
> >
> > On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > > From: Colin Cross <ccross@android.com>
> > >
> > > Implement a hardlockup detector that can be enabled on SMP systems
> > > that don't have an arch provided one or one implemented atop perf by
> > > using interrupts on other cpus. Each cpu will use its softlockup
> > > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > > verifying that a counter is increasing.
> > >
> > > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > > easily provide a backtrace on the CPU that locked up. It relies on
> > > some other mechanism in the system to get information about the locked
> > > up CPUs. This could be support for NMI backtraces like [1], it could
> > > be a mechanism for printing the PC of locked CPUs like [2], or it
> > > could be something else.
> > >
> > > This style of hardlockup detector originated in some downstream
> > > Android trees and has been rebased on / carried in ChromeOS trees for
> > > quite a long time for use on arm and arm64 boards. Historically on
> > > these boards we've leveraged mechanism [2] to get information about
> > > hung CPUs, but we could move to [1].
> >
> > On the Arm platforms is this code able to leverage the existing
> > infrastructure to extract status from stuck CPUs:
> > https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
>
> Yup! I wasn't explicit about this, but that's where you end up if you
> follow the whole bug tracker item that was linked as [2].
> Specifically, we used to have downstream patches in the ChromeOS that
> just reached into the coresight range from a SoC specific driver and
> printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
> Chromebooks he found that the equivalent functionality had made it
> upstream in a generic way through the coresight framework. Brian
> confirmed it was working on rk3399 and made all of the device tree
> changes needed to get it all hooked up, so (at least for that SoC) it
> should work on that SoC.
>
> [2] https://issuetracker.google.com/172213129

IIRC with the coresight CPU debug driver enabled and the proper DT nodes
added, the panic handler does dump out information from the hardware.
I don't think it's wired up for hung tasks though.

ChenYu
  
Doug Anderson April 25, 2023, 3:26 p.m. UTC | #7
Hi,

On Mon, Apr 24, 2023 at 9:58 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> On Mon, Apr 24, 2023 at 11:42 PM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson
> > <daniel.thompson@linaro.org> wrote:
> > >
> > > On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > > > From: Colin Cross <ccross@android.com>
> > > >
> > > > Implement a hardlockup detector that can be enabled on SMP systems
> > > > that don't have an arch provided one or one implemented atop perf by
> > > > using interrupts on other cpus. Each cpu will use its softlockup
> > > > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > > > verifying that a counter is increasing.
> > > >
> > > > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > > > easily provide a backtrace on the CPU that locked up. It relies on
> > > > some other mechanism in the system to get information about the locked
> > > > up CPUs. This could be support for NMI backtraces like [1], it could
> > > > be a mechanism for printing the PC of locked CPUs like [2], or it
> > > > could be something else.
> > > >
> > > > This style of hardlockup detector originated in some downstream
> > > > Android trees and has been rebased on / carried in ChromeOS trees for
> > > > quite a long time for use on arm and arm64 boards. Historically on
> > > > these boards we've leveraged mechanism [2] to get information about
> > > > hung CPUs, but we could move to [1].
> > >
> > > On the Arm platforms is this code able to leverage the existing
> > > infrastructure to extract status from stuck CPUs:
> > > https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
> >
> > Yup! I wasn't explicit about this, but that's where you end up if you
> > follow the whole bug tracker item that was linked as [2].
> > Specifically, we used to have downstream patches in the ChromeOS that
> > just reached into the coresight range from a SoC specific driver and
> > printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
> > Chromebooks he found that the equivalent functionality had made it
> > upstream in a generic way through the coresight framework. Brian
> > confirmed it was working on rk3399 and made all of the device tree
> > changes needed to get it all hooked up, so (at least for that SoC) it
> > should work on that SoC.
> >
> > [2] https://issuetracker.google.com/172213129
>
> IIRC with the coresight CPU debug driver enabled and the proper DT nodes
> added, the panic handler does dump out information from the hardware.
> I don't think it's wired up for hung tasks though.

Yes, that's correct. The coresight CPU debug driver doesn't work for
hung tasks because it can't get a real stack crawl. All it can get is
the PC of the last branch that the CPU took. This is why combining
${SUBJECT} patch with the ability to get stack traces via pseudo-NMI
is superior. That being said, even with just the coresight CPU debug
driver, ${SUBJECT} patch is still helpful because (assuming
"hardlockup_panic" is set) we'll do a panic which will then trigger
the coresight CPU debug driver. :-)

-Doug
  
Andi Kleen May 7, 2023, 5:12 p.m. UTC | #8
On Mon, Apr 24, 2023 at 08:23:59AM -0700, Doug Anderson wrote:
> HPET system seems to have a single CPU in charge of processing the
> main NMI and then that single CPU is in charge of checking all the
> others. If that single CPU goes out to lunch then the system couldn't
> detect hard lockups.
> 
> In any case, I'm happy to let others debate about the HPET system. For
> now, I'll take my action items to be:

We don't really seem to be making any progress on the HPET series, so
even if it is better in some way, a series that is never merged is
always worse than one that is.

My experience is that cases where everything locks up are very rare.
I suspect that as long as we cover the garden variety single CPU lockup
case well, handling more complex cases likely has very diminishing
returns. So whatever gets the job done is fine.

Yes, freeing the Perfmon resources is a big advantage of either.

-Andi
  
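For readers following the thread, the buddy check described in the commit
message can be sketched as a small userspace simulation. This is only an
illustration under stated assumptions, not the patch itself: NR_CPUS,
cpu_online, next_cpu(), buddy_is_stuck(), timer_tick() and simulate() are
hypothetical names, each "round" stands in for one hrtimer sample period on
every CPU, and the watchdog_touch handling, memory barriers and CPU hotplug
from the real code are left out.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4	/* hypothetical CPU count for the simulation */

/* Per-CPU counters, named after the per-CPU variables in the patch. */
static unsigned long hrtimer_interrupts[NR_CPUS];
static unsigned long hrtimer_interrupts_saved[NR_CPUS];
static bool cpu_online[NR_CPUS] = { true, true, true, true };

/* Next online CPU after `cpu`, wrapping around; NR_CPUS if there is none. */
static unsigned int next_cpu(unsigned int cpu)
{
	for (unsigned int i = 1; i < NR_CPUS; i++) {
		unsigned int c = (cpu + i) % NR_CPUS;

		if (cpu_online[c])
			return c;
	}
	return NR_CPUS;
}

/* True if the counter of `cpu` has not moved since the previous check. */
static bool buddy_is_stuck(unsigned int cpu)
{
	unsigned long hrint = hrtimer_interrupts[cpu];

	if (hrtimer_interrupts_saved[cpu] == hrint)
		return true;

	hrtimer_interrupts_saved[cpu] = hrint;
	return false;
}

/* One simulated hrtimer tick on `cpu`; returns the stuck buddy or -1. */
static int timer_tick(unsigned int cpu)
{
	unsigned int buddy;

	hrtimer_interrupts[cpu]++;

	/* Only check the buddy on every third sample, as in the patch. */
	if (hrtimer_interrupts[cpu] % 3 != 0)
		return -1;

	buddy = next_cpu(cpu);
	if (buddy >= NR_CPUS)
		return -1;

	return buddy_is_stuck(buddy) ? (int)buddy : -1;
}

/*
 * Run `rounds` sample periods. CPU `locked` stops taking timer ticks from
 * round `lock_at` onward (pass -1 for no lockup). Returns the first CPU
 * reported as locked up, or -1 if none was reported.
 */
static int simulate(int locked, int lock_at, int rounds)
{
	memset(hrtimer_interrupts, 0, sizeof(hrtimer_interrupts));
	memset(hrtimer_interrupts_saved, 0, sizeof(hrtimer_interrupts_saved));

	for (int t = 0; t < rounds; t++) {
		for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
			int stuck;

			if ((int)cpu == locked && t >= lock_at)
				continue;	/* locked CPU gets no timer tick */

			stuck = timer_tick(cpu);
			if (stuck >= 0)
				return stuck;
		}
	}
	return -1;
}
```

In this sketch, simulate(2, 4, 20) reports CPU 2 (noticed by CPU 1),
simulate(0, 4, 20) reports CPU 0 (noticed by CPU 3 via the wrap-around), and
simulate(-1, 0, 30) reports nothing. Note that a stalled counter is only
flagged on the second check after it stops moving, since the first check
still sees a change relative to the saved value.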

Patch

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 048c0b9aa623..35f6c5c2378b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -45,6 +45,8 @@  extern void touch_softlockup_watchdog(void);
 extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
+DECLARE_PER_CPU(unsigned long, hrtimer_interrupts);
+DECLARE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 
 extern int lockup_detector_online_cpu(unsigned int cpu);
 extern int lockup_detector_offline_cpu(unsigned int cpu);
@@ -81,14 +83,14 @@  static inline void reset_hung_task_detector(void) { }
 #define NMI_WATCHDOG_ENABLED      (1 << NMI_WATCHDOG_ENABLED_BIT)
 #define SOFT_WATCHDOG_ENABLED     (1 << SOFT_WATCHDOG_ENABLED_BIT)
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
 extern void hardlockup_detector_disable(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
 # define NMI_WATCHDOG_SYSCTL_PERM	0644
 #else
 # define NMI_WATCHDOG_SYSCTL_PERM	0444
@@ -124,6 +126,14 @@  void watchdog_nmi_disable(unsigned int cpu);
 
 void lockup_detector_reconfigure(void);
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY_CPU
+void buddy_cpu_touch_watchdog(void);
+void watchdog_check_hardlockup(void);
+#else
+static inline void buddy_cpu_touch_watchdog(void) {}
+static inline void watchdog_check_hardlockup(void) {}
+#endif
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
@@ -134,6 +144,7 @@  void lockup_detector_reconfigure(void);
 static inline void touch_nmi_watchdog(void)
 {
 	arch_touch_nmi_watchdog();
+	buddy_cpu_touch_watchdog();
 	touch_softlockup_watchdog();
 }
 
@@ -196,8 +207,7 @@  static inline bool trigger_single_cpu_backtrace(int cpu)
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
-    defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 void watchdog_update_hrtimer_threshold(u64 period);
 #else
 static inline void watchdog_update_hrtimer_threshold(u64 period) { }
diff --git a/kernel/Makefile b/kernel/Makefile
index 10ef068f598d..a2054f16f9f4 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -91,6 +91,7 @@  obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY_CPU) += watchdog_buddy_cpu.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8e61f21e7e33..1199043689ae 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -29,7 +29,7 @@ 
 
 static DEFINE_MUTEX(watchdog_mutex);
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE) || defined(CONFIG_HAVE_NMI_WATCHDOG)
 # define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	1
 #else
@@ -47,7 +47,7 @@  static int __read_mostly nmi_watchdog_available;
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_CORE
 
 # ifdef CONFIG_SMP
 int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
@@ -85,7 +85,9 @@  static int __init hardlockup_panic_setup(char *str)
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
-#endif /* CONFIG_HARDLOCKUP_DETECTOR */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_CORE */
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
 
 /*
  * These functions can be overridden if an architecture implements its
@@ -106,6 +108,13 @@  void __weak watchdog_nmi_disable(unsigned int cpu)
 	hardlockup_detector_perf_disable();
 }
 
+#else
+
+int __weak watchdog_nmi_enable(unsigned int cpu) { return 0; }
+void __weak watchdog_nmi_disable(unsigned int cpu) { return; }
+
+#endif /* CONFIG_HARDLOCKUP_DETECTOR */
+
 /* Return 0, if a NMI watchdog is available. Error code otherwise */
 int __weak __init watchdog_nmi_probe(void)
 {
@@ -179,8 +188,8 @@  static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
 static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
 static DEFINE_PER_CPU(bool, softlockup_touch_sync);
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
+DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static unsigned long soft_lockup_nmi_warn;
 
 static int __init nowatchdog_setup(char *str)
@@ -364,6 +373,9 @@  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();
 
+	/* test for hardlockups */
+	watchdog_check_hardlockup();
+
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
 		reinit_completion(this_cpu_ptr(&softlockup_completion));
@@ -820,7 +832,7 @@  static struct ctl_table watchdog_sysctls[] = {
 	},
 #endif /* CONFIG_SMP */
 #endif
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_CORE
 	{
 		.procname	= "hardlockup_panic",
 		.data		= &hardlockup_panic,
diff --git a/kernel/watchdog_buddy_cpu.c b/kernel/watchdog_buddy_cpu.c
new file mode 100644
index 000000000000..db813b00e6ef
--- /dev/null
+++ b/kernel/watchdog_buddy_cpu.c
@@ -0,0 +1,141 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/kernel.h>
+#include <linux/nmi.h>
+#include <linux/percpu-defs.h>
+
+static DEFINE_PER_CPU(bool, watchdog_touch);
+static DEFINE_PER_CPU(bool, hard_watchdog_warn);
+static cpumask_t __read_mostly watchdog_cpus;
+
+static unsigned long hardlockup_allcpu_dumped;
+
+int __init watchdog_nmi_probe(void)
+{
+	return 0;
+}
+
+notrace void buddy_cpu_touch_watchdog(void)
+{
+	/*
+	 * Using raw_cpu_write() here because some code paths have
+	 * preemption enabled.  If preemption is enabled
+	 * then interrupts should be enabled too, in which
+	 * case we shouldn't have to worry about the watchdog
+	 * going off.
+	 */
+	raw_cpu_write(watchdog_touch, true);
+}
+EXPORT_SYMBOL_GPL(buddy_cpu_touch_watchdog);
+
+static unsigned int watchdog_next_cpu(unsigned int cpu)
+{
+	cpumask_t cpus = watchdog_cpus;
+	unsigned int next_cpu;
+
+	next_cpu = cpumask_next(cpu, &cpus);
+	if (next_cpu >= nr_cpu_ids)
+		next_cpu = cpumask_first(&cpus);
+
+	if (next_cpu == cpu)
+		return nr_cpu_ids;
+
+	return next_cpu;
+}
+
+int watchdog_nmi_enable(unsigned int cpu)
+{
+	/*
+	 * The new cpu will be marked online before the first hrtimer interrupt
+	 * runs on it.  If another cpu tests for a hardlockup on the new cpu
+	 * before it has run its first hrtimer, it will get a false positive.
+	 * Touch the watchdog on the new cpu to delay the first check for at
+	 * least 3 sampling periods to guarantee one hrtimer has run on the new
+	 * cpu.
+	 */
+	per_cpu(watchdog_touch, cpu) = true;
+	/* Match with smp_rmb() in watchdog_check_hardlockup() */
+	smp_wmb();
+	cpumask_set_cpu(cpu, &watchdog_cpus);
+	return 0;
+}
+
+void watchdog_nmi_disable(unsigned int cpu)
+{
+	unsigned int next_cpu = watchdog_next_cpu(cpu);
+
+	/*
+	 * Offlining this cpu will cause the cpu before this one to start
+	 * checking the one after this one.  If this cpu just finished checking
+	 * the next cpu and updating hrtimer_interrupts_saved, and then the
+	 * previous cpu checks it within one sample period, it will trigger a
+	 * false positive.  Touch the watchdog on the next cpu to prevent it.
+	 */
+	if (next_cpu < nr_cpu_ids)
+		per_cpu(watchdog_touch, next_cpu) = true;
+	/* Match with smp_rmb() in watchdog_check_hardlockup() */
+	smp_wmb();
+	cpumask_clear_cpu(cpu, &watchdog_cpus);
+}
+
+static int is_hardlockup_buddy_cpu(unsigned int cpu)
+{
+	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
+		return 1;
+
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+	return 0;
+}
+
+void watchdog_check_hardlockup(void)
+{
+	unsigned int next_cpu;
+
+	/*
+	 * Test for hardlockups every 3 samples.  The sample period is
+	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+	 *  watchdog_thresh (over by 20%).
+	 */
+	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
+		return;
+
+	/* check for a hardlockup on the next cpu */
+	next_cpu = watchdog_next_cpu(smp_processor_id());
+	if (next_cpu >= nr_cpu_ids)
+		return;
+
+	/* Match with smp_wmb() in watchdog_nmi_enable() / watchdog_nmi_disable() */
+	smp_rmb();
+
+	if (per_cpu(watchdog_touch, next_cpu) == true) {
+		per_cpu(watchdog_touch, next_cpu) = false;
+		return;
+	}
+
+	if (is_hardlockup_buddy_cpu(next_cpu)) {
+		/* only warn once */
+		if (per_cpu(hard_watchdog_warn, next_cpu) == true)
+			return;
+
+		/*
+		 * Perform all-CPU dump only once to avoid multiple hardlockups
+		 * generating interleaving traces
+		 */
+		if (sysctl_hardlockup_all_cpu_backtrace &&
+				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
+			trigger_allbutself_cpu_backtrace();
+
+		if (hardlockup_panic)
+			panic("Watchdog detected hard LOCKUP on cpu %u", next_cpu);
+		else
+			WARN(1, "Watchdog detected hard LOCKUP on cpu %u", next_cpu);
+
+		per_cpu(hard_watchdog_warn, next_cpu) = true;
+	} else {
+		per_cpu(hard_watchdog_warn, next_cpu) = false;
+	}
+}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 39d1d93164bd..9eb86bc9f5ee 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1036,6 +1036,9 @@  config HARDLOCKUP_DETECTOR_PERF
 config HARDLOCKUP_CHECK_TIMESTAMP
 	bool
 
+config HARDLOCKUP_DETECTOR_CORE
+	bool
+
 #
 # arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
 # lockup detector rather than the perf based detector.
@@ -1045,6 +1048,7 @@  config HARDLOCKUP_DETECTOR
 	depends on DEBUG_KERNEL && !S390
 	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select LOCKUP_DETECTOR
+	select HARDLOCKUP_DETECTOR_CORE
 	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
 	help
 	  Say Y here to enable the kernel to act as a watchdog to detect
@@ -1055,9 +1059,22 @@  config HARDLOCKUP_DETECTOR
 	  chance to run.  The current stack trace is displayed upon detection
 	  and the system will stay locked up.
 
+config HARDLOCKUP_DETECTOR_BUDDY_CPU
+	bool "Buddy CPU hardlockup detector"
+	depends on DEBUG_KERNEL && SMP
+	depends on !HARDLOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
+	depends on !S390
+	select HARDLOCKUP_DETECTOR_CORE
+	select SOFTLOCKUP_DETECTOR
+	help
+	  Say Y here to enable a hardlockup detector where CPUs check
+	  each other for lockup. Each cpu uses its softlockup hrtimer
+	  to check that the next cpu is processing hrtimer interrupts by
+	  verifying that a counter is increasing.
+
 config BOOTPARAM_HARDLOCKUP_PANIC
 	bool "Panic (Reboot) On Hard Lockups"
-	depends on HARDLOCKUP_DETECTOR
+	depends on HARDLOCKUP_DETECTOR_CORE
 	help
 	  Say Y here to enable the kernel to panic on "hard lockups",
 	  which are bugs that cause the kernel to loop in kernel