iommu/mediatek: Fix crash on isr after kexec()

Message ID 20221125-mtk-iommu-v1-0-bb5ecac97a28@chromium.org
State New
Series iommu/mediatek: Fix crash on isr after kexec()

Commit Message

Ricardo Ribalda Nov. 25, 2022, 4:28 p.m. UTC
If the system is rebooted via kexec(), the IRQ handler might be triggered
before the domain is initialized, resulting in an invalid memory access
error.

Fix:
[    0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
[    0.501166] Call trace:
[    0.501174]  report_iommu_fault+0x28/0xfc
[    0.501180]  mtk_iommu_isr+0x10c/0x1c0
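
A note on the small fault address in the trace: it is consistent with a NULL domain pointer plus member offsets, since the handler passes &dom->domain on and the report path then reads a field of that structure. The following user-space sketch only illustrates that arithmetic. The struct layouts and every model_* name are hypothetical stand-ins, not the kernel's real definitions, so the resulting number will not match 0x70.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified layouts for illustration only. The real
 * kernel structures have different members and different offsets.
 */
struct model_iommu_domain {
	unsigned int type;
	const void *ops;
	void *handler;        /* field read by the fault-report path */
	void *handler_token;
};

struct model_mtk_domain {
	char driver_private[64];            /* stand-in for driver fields placed first */
	struct model_iommu_domain domain;   /* embedded generic domain */
};

/*
 * Address that would be read when the outer pointer is NULL:
 * offset of the embedded domain plus offset of the field inside it.
 */
static uintptr_t model_null_deref_address(void)
{
	return offsetof(struct model_mtk_domain, domain) +
	       offsetof(struct model_iommu_domain, handler);
}
```

The point of the sketch is that the result is a small non-zero address well inside the first page, which is the shape of address reported in the log above.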

Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
To: Yong Wu <yong.wu@mediatek.com>
To: Joerg Roedel <joro@8bytes.org>
To: Will Deacon <will@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
To: Matthias Brugger <matthias.bgg@gmail.com>
Cc: iommu@lists.linux.dev
Cc: linux-mediatek@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221125-mtk-iommu-13023f971298

Best regards,
  

Comments

Robin Murphy Nov. 25, 2022, 5:02 p.m. UTC | #1
On 2022-11-25 16:28, Ricardo Ribalda wrote:
> If the system is rebooted via kexec(), the IRQ handler might be triggered
> before the domain is initialized, resulting in an invalid memory access
> error.
> 
> Fix:
> [    0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> [    0.501166] Call trace:
> [    0.501174]  report_iommu_fault+0x28/0xfc
> [    0.501180]  mtk_iommu_isr+0x10c/0x1c0

Hmm, shouldn't we clear any pending faults at probe in 
mtk_iommu_hw_init(), before the IRQ is requested? mtk_iommu_isr() might 
still want to be robust against a spurious interrupt, but then it can 
simply return without doing anything at all if the domain is NULL, since 
we'll know that's the case.

Thanks,
Robin.

(It might be nice if request_irq() had a flag to say "if this IRQ looks 
pending already just clear it" for drivers that know it could only be 
spurious at that point; kexec seems to lead to this problem quite a lot...)

> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> ---
> To: Yong Wu <yong.wu@mediatek.com>
> To: Joerg Roedel <joro@8bytes.org>
> To: Will Deacon <will@kernel.org>
> To: Robin Murphy <robin.murphy@arm.com>
> To: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: iommu@lists.linux.dev
> Cc: linux-mediatek@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>   drivers/iommu/mtk_iommu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2ab2ecfe01f8..17f6be5a5097 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
>   		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
>   	}
>   
> -	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> +	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
>   			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
>   		dev_err_ratelimited(
>   			bank->parent_dev,
> 
> ---
> base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> change-id: 20221125-mtk-iommu-13023f971298
> 
> Best regards,
  
Ricardo Ribalda Nov. 25, 2022, 5:15 p.m. UTC | #2
Hi Robin


Thanks for your review!

On Fri, 25 Nov 2022 at 18:02, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2022-11-25 16:28, Ricardo Ribalda wrote:
> > If the system is rebooted via kexec(), the IRQ handler might be triggered
> > before the domain is initialized, resulting in an invalid memory access
> > error.
> >
> > Fix:
> > [    0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> > [    0.501166] Call trace:
> > [    0.501174]  report_iommu_fault+0x28/0xfc
> > [    0.501180]  mtk_iommu_isr+0x10c/0x1c0
>
> Hmm, shouldn't we clear any pending faults at probe in
> mtk_iommu_hw_init(), before the IRQ is requested? mtk_iommu_isr() might
> still want to be robust against a spurious interrupt, but then it can
> simply return without doing anything at all if the domain is NULL, since
> we'll know that's the case.
>
> Thanks,
> Robin.
>
> (It might be nice if request_irq() had a flag to say "if this IRQ looks
> pending already just clear it" for drivers that know it could only be
> spurious at that point; kexec seems to lead to this problem quite a lot...)

It is not only about the "last" IRQ before kexec. The peripherals
behind the IOMMU might still be active and producing faults, and
therefore IRQs.

I tried this:

@@ -886,6 +886,11 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int ban
                         upper_32_bits(data->protect_base);
        writel_relaxed(regval, bankx->base + REG_MMU_IVRP_PADDR);

+       /* Clear previous IRQs */
+       regval = readl_relaxed(bankx->base + REG_MMU_INT_CONTROL0);
+       regval |= F_INT_CLR_BIT;
+       writel_relaxed(regval, bankx->base + REG_MMU_INT_CONTROL0);
+
        if (devm_request_irq(bankx->pdev, bankx->irq, mtk_iommu_isr, 0,
                             dev_name(bankx->pdev), (void *)bankx)) {
                writel_relaxed(0, bankx->base + REG_MMU_PT_BASE_ADDR);

And I still get the same crash
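
One possible reading of this result, in line with the remark above about peripherals that are still active: a single clear before the IRQ is requested only removes what was latched up to that moment. The toy model below is a sketch under that assumption. The model_bank structure and all model_* functions are hypothetical and do not correspond to real registers or driver code.

```c
#include <stdbool.h>

/* Hypothetical model of one IOMMU bank; fields are illustrative only. */
struct model_bank {
	bool irq_pending;    /* latched fault interrupt */
	bool master_active;  /* a device behind the IOMMU is still issuing DMA */
	bool domain_ready;   /* false until a domain has been attached */
};

/* Models the one-shot "clear previous IRQs" write done at init time. */
static void model_clear_pending(struct model_bank *b)
{
	b->irq_pending = false;
}

/* One time step: an active master without a valid mapping faults again. */
static void model_tick(struct model_bank *b)
{
	if (b->master_active)
		b->irq_pending = true;
}

/* True when the handler would run before any domain exists. */
static bool model_isr_sees_null_domain(const struct model_bank *b)
{
	return b->irq_pending && !b->domain_ready;
}
```

In this model the clear helps only when the master has already stopped. With master_active still set, the next tick latches a new fault and the handler again runs without a domain.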


>
> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > ---
> > To: Yong Wu <yong.wu@mediatek.com>
> > To: Joerg Roedel <joro@8bytes.org>
> > To: Will Deacon <will@kernel.org>
> > To: Robin Murphy <robin.murphy@arm.com>
> > To: Matthias Brugger <matthias.bgg@gmail.com>
> > Cc: iommu@lists.linux.dev
> > Cc: linux-mediatek@lists.infradead.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >   drivers/iommu/mtk_iommu.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2ab2ecfe01f8..17f6be5a5097 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> >               fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> >       }
> >
> > -     if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> > +     if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> >                              write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> >               dev_err_ratelimited(
> >                       bank->parent_dev,
> >
> > ---
> > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > change-id: 20221125-mtk-iommu-13023f971298
> >
> > Best regards,
  
Yong Wu Nov. 28, 2022, 6:44 a.m. UTC | #3
On Fri, 2022-11-25 at 17:28 +0100, Ricardo Ribalda wrote:
> If the system is rebooted via kexec(), the IRQ handler might be
> triggered before the domain is initialized, resulting in an invalid
> memory access error.
> 
> Fix:
> [    0.500930] Unable to handle kernel read from unreadable memory at
> virtual address 0000000000000070
> [    0.501166] Call trace:
> [    0.501174]  report_iommu_fault+0x28/0xfc
> [    0.501180]  mtk_iommu_isr+0x10c/0x1c0
> 
> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> ---
> To: Yong Wu <yong.wu@mediatek.com>
> To: Joerg Roedel <joro@8bytes.org>
> To: Will Deacon <will@kernel.org>
> To: Robin Murphy <robin.murphy@arm.com>
> To: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: iommu@lists.linux.dev
> Cc: linux-mediatek@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2ab2ecfe01f8..17f6be5a5097 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void
> *dev_id)
>  		fault_larb = data->plat_data-
> >larbid_remap[fault_larb][sub_comm];
>  	}
>  
> -	if (report_iommu_fault(&dom->domain, bank->parent_dev,
> fault_iova,
> +	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev,
> fault_iova,


On which SoC does this issue happen? Does it happen with the upstream
kernel or with a downstream kernel?

Normally each port enables the IOMMU by default. Let's print the error
log even when "dom" is NULL, so that we can check which port fails here
and then analyse that port's behavior.

if (!dom || report_iommu_fault(xx))
     dev_err_ratelimited(xx)
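
The difference between the two conditions comes down to short-circuit evaluation, and can be checked with a small user-space model. This is only a sketch: model_domain, model_report_fault and the should_log_* helpers are hypothetical names, and the stub merely mimics the convention that a non-zero return means the fault was not handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the domain type; not the kernel definition. */
struct model_domain {
	int (*handler)(unsigned long iova);
};

static int report_calls; /* counts how often the report path actually ran */

/* Stub: dereferences dom, returns non-zero when nothing handled the fault. */
static int model_report_fault(struct model_domain *dom, unsigned long iova)
{
	report_calls++;
	if (dom->handler)
		return dom->handler(iova);
	return -1;
}

/* Condition from the posted patch: a NULL domain skips report and log. */
static bool should_log_v1(struct model_domain *dom, unsigned long iova)
{
	return dom && model_report_fault(dom, iova);
}

/* Condition suggested above: a NULL domain skips report but still logs. */
static bool should_log_v2(struct model_domain *dom, unsigned long iova)
{
	return !dom || model_report_fault(dom, iova);
}
```

Both forms avoid the NULL dereference. Only the second one keeps the ratelimited error message for the NULL case, which is what makes the faulting port visible in the log.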

>  			       write ? IOMMU_FAULT_WRITE :
> IOMMU_FAULT_READ)) {
>  		dev_err_ratelimited(
>  			bank->parent_dev,
> 
> ---
> base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> change-id: 20221125-mtk-iommu-13023f971298
> 
> Best regards,
  
Ricardo Ribalda Nov. 28, 2022, 10:14 p.m. UTC | #4
Hi Yong


On Mon, 28 Nov 2022 at 07:44, Yong Wu (吴勇) <Yong.Wu@mediatek.com> wrote:
>
> On Fri, 2022-11-25 at 17:28 +0100, Ricardo Ribalda wrote:
> > If the system is rebooted via kexec(), the IRQ handler might be
> > triggered before the domain is initialized, resulting in an invalid
> > memory access error.
> >
> > Fix:
> > [    0.500930] Unable to handle kernel read from unreadable memory at
> > virtual address 0000000000000070
> > [    0.501166] Call trace:
> > [    0.501174]  report_iommu_fault+0x28/0xfc
> > [    0.501180]  mtk_iommu_isr+0x10c/0x1c0
> >
> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > ---
> > To: Yong Wu <yong.wu@mediatek.com>
> > To: Joerg Roedel <joro@8bytes.org>
> > To: Will Deacon <will@kernel.org>
> > To: Robin Murphy <robin.murphy@arm.com>
> > To: Matthias Brugger <matthias.bgg@gmail.com>
> > Cc: iommu@lists.linux.dev
> > Cc: linux-mediatek@lists.infradead.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  drivers/iommu/mtk_iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2ab2ecfe01f8..17f6be5a5097 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void
> > *dev_id)
> >               fault_larb = data->plat_data-
> > >larbid_remap[fault_larb][sub_comm];
> >       }
> >
> > -     if (report_iommu_fault(&dom->domain, bank->parent_dev,
> > fault_iova,
> > +     if (dom && report_iommu_fault(&dom->domain, bank->parent_dev,
> > fault_iova,
>
>
> On which SoC does this issue happen? Does it happen with the upstream
> kernel or with a downstream kernel?

I am using chromeos-5.10 and chromeos-5.15 (which are pretty much upstream).

I have seen this issue at least with MT8195 and MT8183.


>
> Normally each port enables the IOMMU by default. Let's print the error
> log even when "dom" is NULL, so that we can check which port fails here
> and then analyse that port's behavior.
>
> if (!dom || report_iommu_fault(xx))
>      dev_err_ratelimited(xx)

Sending a v2 with the change.

Thanks!


>
> >                              write ? IOMMU_FAULT_WRITE :
> > IOMMU_FAULT_READ)) {
> >               dev_err_ratelimited(
> >                       bank->parent_dev,
> >
> > ---
> > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > change-id: 20221125-mtk-iommu-13023f971298
> >
> > Best regards,
  

Patch

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2ab2ecfe01f8..17f6be5a5097 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -454,7 +454,7 @@  static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
 	}
 
-	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
+	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
 			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
 		dev_err_ratelimited(
 			bank->parent_dev,