[v8,09/10] arm64: kgdb: Roundup cpus using IPI as NMI

Message ID 20230419155341.v8.9.I2ef26d1b3bfbed2d10a281942b0da7d9854de05e@changeid
State New
Headers
Series arm64: Add framework to turn an IPI as NMI |

Commit Message

Doug Anderson April 19, 2023, 10:56 p.m. UTC
  From: Sumit Garg <sumit.garg@linaro.org>

arm64 platforms with GICv3 or later supports pseudo NMIs which can be
leveraged to roundup CPUs which are stuck in hard lockup state with
interrupts disabled that wouldn't be possible with a normal IPI.

So instead switch to roundup CPUs using IPI turned as NMI. And in
case a particular arm64 platform doesn't supports pseudo NMIs,
it will switch back to default kgdb CPUs roundup mechanism.

Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

(no changes since v1)

 arch/arm64/kernel/ipi_nmi.c |  5 +++++
 arch/arm64/kernel/kgdb.c    | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)
  

Comments

Daniel Thompson May 12, 2023, 2 p.m. UTC | #1
On Wed, Apr 19, 2023 at 03:56:03PM -0700, Douglas Anderson wrote:
> From: Sumit Garg <sumit.garg@linaro.org>
>
> arm64 platforms with GICv3 or later supports pseudo NMIs which can be
> leveraged to roundup CPUs which are stuck in hard lockup state with
> interrupts disabled that wouldn't be possible with a normal IPI.
>
> So instead switch to roundup CPUs using IPI turned as NMI. And in
> case a particular arm64 platform doesn't supports pseudo NMIs,
> it will switch back to default kgdb CPUs roundup mechanism.
>
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> Tested-by: Chen-Yu Tsai <wens@csie.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> (no changes since v1)
>
>  arch/arm64/kernel/ipi_nmi.c |  5 +++++
>  arch/arm64/kernel/kgdb.c    | 18 ++++++++++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
> index c592e92b8cbf..2adaaf1519e5 100644
> --- a/arch/arm64/kernel/ipi_nmi.c
> +++ b/arch/arm64/kernel/ipi_nmi.c
> @@ -8,6 +8,7 @@
>
>  #include <linux/interrupt.h>
>  #include <linux/irq.h>
> +#include <linux/kgdb.h>
>  #include <linux/nmi.h>
>  #include <linux/smp.h>
>
> @@ -45,10 +46,14 @@ bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
>  static irqreturn_t ipi_nmi_handler(int irq, void *data)
>  {
>  	irqreturn_t ret = IRQ_NONE;
> +	unsigned int cpu = smp_processor_id();

Does this play nice with CONFIG_DEBUG_PREEMPT? I may have missed
something about the NMI entry but a quick scan of the arm64
arch_irq_disabled() suggests that debug_smp_processor_id() will issue
warnings at this point...

Other than I didn't see anything I don't like here.


Daniel.
  
Doug Anderson May 15, 2023, 11:11 p.m. UTC | #2
Hi,

On Fri, May 12, 2023 at 7:00 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Wed, Apr 19, 2023 at 03:56:03PM -0700, Douglas Anderson wrote:
> > From: Sumit Garg <sumit.garg@linaro.org>
> >
> > arm64 platforms with GICv3 or later supports pseudo NMIs which can be
> > leveraged to roundup CPUs which are stuck in hard lockup state with
> > interrupts disabled that wouldn't be possible with a normal IPI.
> >
> > So instead switch to roundup CPUs using IPI turned as NMI. And in
> > case a particular arm64 platform doesn't supports pseudo NMIs,
> > it will switch back to default kgdb CPUs roundup mechanism.
> >
> > Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> > Tested-by: Chen-Yu Tsai <wens@csie.org>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> > (no changes since v1)
> >
> >  arch/arm64/kernel/ipi_nmi.c |  5 +++++
> >  arch/arm64/kernel/kgdb.c    | 18 ++++++++++++++++++
> >  2 files changed, 23 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
> > index c592e92b8cbf..2adaaf1519e5 100644
> > --- a/arch/arm64/kernel/ipi_nmi.c
> > +++ b/arch/arm64/kernel/ipi_nmi.c
> > @@ -8,6 +8,7 @@
> >
> >  #include <linux/interrupt.h>
> >  #include <linux/irq.h>
> > +#include <linux/kgdb.h>
> >  #include <linux/nmi.h>
> >  #include <linux/smp.h>
> >
> > @@ -45,10 +46,14 @@ bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
> >  static irqreturn_t ipi_nmi_handler(int irq, void *data)
> >  {
> >       irqreturn_t ret = IRQ_NONE;
> > +     unsigned int cpu = smp_processor_id();
>
> Does this play nice with CONFIG_DEBUG_PREEMPT? I may have missed
> something about the NMI entry but a quick scan of the arm64
> arch_irq_disabled() suggests that debug_smp_processor_id() will issue
> warnings at this point...
>
> Other than I didn't see anything I don't like here.

Good question. It seems to, at least on the sc7180-trogdor-based
system I tested.

Just to confirm, I printed out the values of some of the stuff that's
checked. When this function was called, I saw:

irqs_disabled() => true
raw_local_save_flags() => 0x000000f0
__irqflags_uses_pmr() => 1

The "__irqflags_uses_pmr" is the thing that gets set when we try to
enable pseudo-NMIs and they're actually there. That causes us to call
__pmr_irqs_disabled_flags() which will compare the flags (0xf0) to
GIC_PRIO_IRQON (0xe0). If flags is not the same as GIC_PRIO_IRQON then
interrupts are disabled.

...so, assuming I understood everything, I think we're OK.

I also tried to see what happened with the whole "fallback to use
regular IPIs if we don't have pseudo-NMIs enabled" (AKA patch #10 in
this series). In that case, I had:

irqs_disabled() => true
raw_local_save_flags() => 0x000000c0
__irqflags_uses_pmr() => 0

...in this case we end up in __daif_irqs_disabled_flags(). That checks
to see if the flags (0xc0) has the "I bit" (0x80) set. It is set, so
interrupts are disabled.

-Doug
  

Patch

diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
index c592e92b8cbf..2adaaf1519e5 100644
--- a/arch/arm64/kernel/ipi_nmi.c
+++ b/arch/arm64/kernel/ipi_nmi.c
@@ -8,6 +8,7 @@ 
 
 #include <linux/interrupt.h>
 #include <linux/irq.h>
+#include <linux/kgdb.h>
 #include <linux/nmi.h>
 #include <linux/smp.h>
 
@@ -45,10 +46,14 @@  bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 static irqreturn_t ipi_nmi_handler(int irq, void *data)
 {
 	irqreturn_t ret = IRQ_NONE;
+	unsigned int cpu = smp_processor_id();
 
 	if (nmi_cpu_backtrace(get_irq_regs()))
 		ret = IRQ_HANDLED;
 
+	if (!kgdb_nmicallback(cpu, get_irq_regs()))
+		ret = IRQ_HANDLED;
+
 	return ret;
 }
 
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index cda9c1e9864f..2c85bc1df013 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -17,6 +17,7 @@ 
 
 #include <asm/debug-monitors.h>
 #include <asm/insn.h>
+#include <asm/nmi.h>
 #include <asm/patching.h>
 #include <asm/traps.h>
 
@@ -354,3 +355,20 @@  int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
 	return aarch64_insn_write((void *)bpt->bpt_addr,
 			*(u32 *)bpt->saved_instr);
 }
+
+void kgdb_roundup_cpus(void)
+{
+	struct cpumask mask;
+
+	if (!arm64_supports_nmi()) {
+		kgdb_smp_call_nmi_hook();
+		return;
+	}
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(raw_smp_processor_id(), &mask);
+	if (cpumask_empty(&mask))
+		return;
+
+	arm64_send_nmi(&mask);
+}