irqchip/sifive-plic: enable interrupt if needed before EOI

Message ID 20240126223836.202321-1-namcao@linutronix.de
State New
Headers
Series irqchip/sifive-plic: enable interrupt if needed before EOI |

Commit Message

Nam Cao Jan. 26, 2024, 10:38 p.m. UTC
  RISC-V PLIC cannot EOI disabled interrupts, as explained in the
description of Interrupt Completion in the PLIC spec:

"The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored."

Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
ensured that by enabling the interrupt if needed before EOI.

Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
operations") removed the interrupt enabling code from the previous
commit, because it assumes that interrupt should be enabled at the point
of EOI. However, this is incorrect: there is a small window after a hart
claiming an interrupt and before irq_desc->lock getting acquired,
interrupt can be disabled during this window. Thus, EOI can be invoked
while the interrupt is disabled, effectively nullify this EOI.

Make sure that interrupt is really enabled before EOI.

Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
Cc: <stable@vger.kernel.org>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
  

Comments

Samuel Holland Jan. 27, 2024, 12:31 a.m. UTC | #1
Hi,

On 2024-01-26 4:38 PM, Nam Cao wrote:
> RISC-V PLIC cannot EOI disabled interrupts, as explained in the
> description of Interrupt Completion in the PLIC spec:
> 
> "The PLIC signals it has completed executing an interrupt handler by
> writing the interrupt ID it received from the claim to the claim/complete
> register. The PLIC does not check whether the completion ID is the same
> as the last claim ID for that target. If the completion ID does not match
> an interrupt source that *is currently enabled* for the target, the
> completion is silently ignored."
> 
> Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
> ensured that by enabling the interrupt if needed before EOI.
> 
> Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
> operations") removed the interrupt enabling code from the previous
> commit, because it assumes that interrupt should be enabled at the point
> of EOI. However, this is incorrect: there is a small window after a hart
> claiming an interrupt and before irq_desc->lock getting acquired,
> interrupt can be disabled during this window. Thus, EOI can be invoked
> while the interrupt is disabled, effectively nullify this EOI.
> 
> Make sure that interrupt is really enabled before EOI.

Could you please try the patch I previously sent for this issue[1]? It
should fix the bug without complicating the IRQ hot path.

Thanks,
Samuel

[1]: https://lore.kernel.org/lkml/20230717185841.1294425-1-samuel.holland@sifive.com/

> Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
>  drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index 5b7bc4fd9517..0857a516c35b 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
>  {
>  	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
>  
> -	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> +	if (irqd_irq_disabled(d)) {
> +		plic_toggle(handler, d->hwirq, 1);
> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> +		plic_toggle(handler, d->hwirq, 0);
> +	} else {
> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> +	}
>  }
>  
>  #ifdef CONFIG_SMP
  
Nam Cao Jan. 27, 2024, 8:31 a.m. UTC | #2
On Fri, 26 Jan 2024 18:31:19 -0600 Samuel Holland <samuel.holland@sifive.com> wrote:
> On 2024-01-26 4:38 PM, Nam Cao wrote:
> > RISC-V PLIC cannot EOI disabled interrupts, as explained in the
> > description of Interrupt Completion in the PLIC spec:
> > 
> > "The PLIC signals it has completed executing an interrupt handler by
> > writing the interrupt ID it received from the claim to the claim/complete
> > register. The PLIC does not check whether the completion ID is the same
> > as the last claim ID for that target. If the completion ID does not match
> > an interrupt source that *is currently enabled* for the target, the
> > completion is silently ignored."
> > 
> > Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
> > ensured that by enabling the interrupt if needed before EOI.
> > 
> > Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
> > operations") removed the interrupt enabling code from the previous
> > commit, because it assumes that interrupt should be enabled at the point
> > of EOI. However, this is incorrect: there is a small window after a hart
> > claiming an interrupt and before irq_desc->lock getting acquired,
> > interrupt can be disabled during this window. Thus, EOI can be invoked
> > while the interrupt is disabled, effectively nullify this EOI.
> > 
> > Make sure that interrupt is really enabled before EOI.  
> 
> Could you please try the patch I previously sent for this issue[1]?

Unfortunately my system still gets frozen with the patch applied :(

I think because the patch doesn't prevent plic_irq_shutdown() from getting
called after the hart claiming the interrupt and before irq_desc is locked
up.

> It should fix the bug without complicating the IRQ hot path.

I can add an unlikely() to help that a bit, because from my experience, it
is quite rare that EOI happens with interrupt disabled.

Best regards,
Nam
  

Patch

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 5b7bc4fd9517..0857a516c35b 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -148,7 +148,13 @@  static void plic_irq_eoi(struct irq_data *d)
 {
 	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
 
-	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	if (irqd_irq_disabled(d)) {
+		plic_toggle(handler, d->hwirq, 1);
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+		plic_toggle(handler, d->hwirq, 0);
+	} else {
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	}
 }
 
 #ifdef CONFIG_SMP