PCI: imx6: install the fault handler only if we are really running on a compatible device

Message ID aa1c18c70bea1d6f99c88027dc72c700e8c309a2.1677573834.git.hns@goldelico.com
State New
Headers
Series PCI: imx6: install the fault handler only if we are really running on a compatible device |

Commit Message

H. Nikolaus Schaller Feb. 28, 2023, 8:43 a.m. UTC
  commit bb38919ec56e ("PCI: imx6: Add support for i.MX6 PCIe controller")
added a fault hook to this driver in the probe function. So it was only
installed if needed.

commit bde4a5a00e76 ("PCI: imx6: Allow probe deferral by reset GPIO")
moved it from probe to driver init which installs the hook unconditionally
as soon as the driver is compiled into a kernel.

When this driver is compiled as a module, the hook is not registered
until after the driver has been matched with a .compatible and
loaded.

commit 415b6185c541 ("PCI: imx6: Fix config read timeout handling")
extended the fault handling code.

commit 2d8ed461dbc9 ("PCI: imx6: Add support for i.MX8MQ")
added some protection for non-ARM architectures, but this does not
protect non-i.MX ARM architectures.

Since fault handlers can be triggered on any architecture for different
reasons, there is no guarantee that they will be triggered only for the
assumed situation, leading to improper error handling (i.MX6-specific
imx6q_pcie_abort_handler) on foreign systems.

I had seen strange L3 imprecise external abort messages several times on
OMAP4 and OMAP5 devices and couldn't make sense of them until I realized
they were related to this unused imx6q driver because I had
CONFIG_PCI_IMX6=y.

Note that CONFIG_PCI_IMX6=y is useful for kernel binaries that are designed
to run on different ARM SoC and be differentiated only by device tree
binaries. So turning off CONFIG_PCI_IMX6 is not a solution.

Therefore we check the compatible in the init function before registering
the fault handler.

Fixes: bde4a5a00e76 ("PCI: imx6: Allow probe deferral by reset GPIO")
Fixes: 415b6185c541 ("PCI: imx6: Fix config read timeout handling")
Fixes: 2d8ed461dbc9 ("PCI: imx6: Add support for i.MX8MQ")

Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
---
 drivers/pci/controller/dwc/pci-imx6.c | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Bjorn Helgaas March 8, 2023, 6:49 p.m. UTC | #1
On Tue, Feb 28, 2023 at 09:43:54AM +0100, H. Nikolaus Schaller wrote:
> commit bb38919ec56e ("PCI: imx6: Add support for i.MX6 PCIe controller")
> added a fault hook to this driver in the probe function. So it was only
> installed if needed.
> 
> commit bde4a5a00e76 ("PCI: imx6: Allow probe deferral by reset GPIO")
> moved it from probe to driver init which installs the hook unconditionally
> as soon as the driver is compiled into a kernel.
> 
> When this driver is compiled as a module, the hook is not registered
> until after the driver has been matched with a .compatible and
> loaded.
> 
> commit 415b6185c541 ("PCI: imx6: Fix config read timeout handling")
> extended the fault handling code.
> 
> commit 2d8ed461dbc9 ("PCI: imx6: Add support for i.MX8MQ")
> added some protection for non-ARM architectures, but this does not
> protect non-i.MX ARM architectures.

Are *all* these commits relevant?  Question also applies to Fixes:
below.

> Since fault handlers can be triggered on any architecture for different
> reasons, there is no guarantee that they will be triggered only for the
> assumed situation, leading to improper error handling (i.MX6-specific
> imx6q_pcie_abort_handler) on foreign systems.
> 
> I had seen strange L3 imprecise external abort messages several times on
> OMAP4 and OMAP5 devices and couldn't make sense of them until I realized
> they were related to this unused imx6q driver because I had
> CONFIG_PCI_IMX6=y.

Apparently imx6q_pcie_abort_handler() assumes it is always called
because of a PCI abort?  If so, that sounds problematic.

If non-PCI imprecise aborts happen on OMAP4 and OMAP5 where imx6q is
unused and imx6q_pcie_abort_handler() is not appropriate, I assume
similar non-PCI aborts can also happen on systems where imx6q *is*
used.

So imx6q_pcie_abort_handler() may be trying to fixup non-PCI aborts
when it shouldn't?

> Note that CONFIG_PCI_IMX6=y is useful for kernel binaries that are designed
> to run on different ARM SoC and be differentiated only by device tree
> binaries. So turning off CONFIG_PCI_IMX6 is not a solution.
> 
> Therefore we check the compatible in the init function before registering
> the fault handler.
> 
> Fixes: bde4a5a00e76 ("PCI: imx6: Allow probe deferral by reset GPIO")
> Fixes: 415b6185c541 ("PCI: imx6: Fix config read timeout handling")
> Fixes: 2d8ed461dbc9 ("PCI: imx6: Add support for i.MX8MQ")
> 
> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
> ---
>  drivers/pci/controller/dwc/pci-imx6.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
> index 1dde5c579edc8..89774aa187ae8 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -1402,6 +1402,15 @@ DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_SYNOPSYS, 0xabcd,
>  static int __init imx6_pcie_init(void)
>  {
>  #ifdef CONFIG_ARM
> +	const struct of_device_id *reboot_id;
> +	struct device_node *np;
> +
> +	np = of_find_matching_node_and_match(NULL, imx6_pcie_of_match,
> +					     &reboot_id);

Since you don't need reboot_id, I think you should use
of_find_matching_node() instead.

> +	if (!np)
> +		return -ENODEV;
> +	of_node_put(np);
> +
>  	/*
>  	 * Since probe() can be deferred we need to make sure that
>  	 * hook_fault_code is not called after __init memory is freed
> -- 
> 2.38.1
>
  
H. Nikolaus Schaller March 8, 2023, 8:40 p.m. UTC | #2
Hi Bjorn,

> Am 08.03.2023 um 19:49 schrieb Bjorn Helgaas <helgaas@kernel.org>:
> 
> On Tue, Feb 28, 2023 at 09:43:54AM +0100, H. Nikolaus Schaller wrote:
>> commit bb38919ec56e ("PCI: imx6: Add support for i.MX6 PCIe controller")
>> added a fault hook to this driver in the probe function. So it was only
>> installed if needed.
>> 
>> commit bde4a5a00e76 ("PCI: imx6: Allow probe deferral by reset GPIO")
>> moved it from probe to driver init which installs the hook unconditionally
>> as soon as the driver is compiled into a kernel.
>> 
>> When this driver is compiled as a module, the hook is not registered
>> until after the driver has been matched with a .compatible and
>> loaded.
>> 
>> commit 415b6185c541 ("PCI: imx6: Fix config read timeout handling")
>> extended the fault handling code.
>> 
>> commit 2d8ed461dbc9 ("PCI: imx6: Add support for i.MX8MQ")
>> added some protection for non-ARM architectures, but this does not
>> protect non-i.MX ARM architectures.
> 
> Are *all* these commits relevant?

Yes, it was correct when introduced by commit bb38919ec56e for a goo reason.
And it was broken by bde4a5a00e76 an all attempts later made it worse.

>  Question also applies to Fixes:
> below.

It fixes all between bde4a5a00e76 and HEAD. Well, one can argue that
commit bde4a5a00e76 could be sufficient for Fixes:

I don't know if it is a problem because I have no overview over side-effects.

> 
>> Since fault handlers can be triggered on any architecture for different
>> reasons, there is no guarantee that they will be triggered only for the
>> assumed situation, leading to improper error handling (i.MX6-specific
>> imx6q_pcie_abort_handler) on foreign systems.
>> 
>> I had seen strange L3 imprecise external abort messages several times on
>> OMAP4 and OMAP5 devices and couldn't make sense of them until I realized
>> they were related to this unused imx6q driver because I had
>> CONFIG_PCI_IMX6=y.
> 
> Apparently imx6q_pcie_abort_handler() assumes it is always called
> because of a PCI abort?  If so, that sounds problematic.

> 
> If non-PCI imprecise aborts happen on OMAP4 and OMAP5 where imx6q is
> unused and imx6q_pcie_abort_handler() is not appropriate, I assume
> similar non-PCI aborts can also happen on systems where imx6q *is*
> used.

As far as I know the reasons why imprecise aborts occur may be SoC specific.

So I have no experience with i.MX6 to judge this. My goal is to shield other
architectures from this fault handler may it be correct or wrong.

> So imx6q_pcie_abort_handler() may be trying to fixup non-PCI aborts
> when it shouldn't?

Yes, at least if it is triggered on OMAP4/OMAP5 by accessing non-existing
registers in some subsystems (e.g. through devmem2).

> 
>> Note that CONFIG_PCI_IMX6=y is useful for kernel binaries that are designed
>> to run on different ARM SoC and be differentiated only by device tree
>> binaries. So turning off CONFIG_PCI_IMX6 is not a solution.
>> 
>> Therefore we check the compatible in the init function before registering
>> the fault handler.
>> 
>> Fixes: bde4a5a00e76 ("PCI: imx6: Allow probe deferral by reset GPIO")
>> Fixes: 415b6185c541 ("PCI: imx6: Fix config read timeout handling")
>> Fixes: 2d8ed461dbc9 ("PCI: imx6: Add support for i.MX8MQ")
>> 
>> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
>> ---
>> drivers/pci/controller/dwc/pci-imx6.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>> 
>> diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
>> index 1dde5c579edc8..89774aa187ae8 100644
>> --- a/drivers/pci/controller/dwc/pci-imx6.c
>> +++ b/drivers/pci/controller/dwc/pci-imx6.c
>> @@ -1402,6 +1402,15 @@ DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_SYNOPSYS, 0xabcd,
>> static int __init imx6_pcie_init(void)
>> {
>> #ifdef CONFIG_ARM
>> +	const struct of_device_id *reboot_id;
>> +	struct device_node *np;
>> +
>> +	np = of_find_matching_node_and_match(NULL, imx6_pcie_of_match,
>> +					     &reboot_id);
> 
> Since you don't need reboot_id, I think you should use
> of_find_matching_node() instead.

Well, I used it for debugging, but for production code it has indeed no benefit.

of_find_matching_node it is just a static inline wrapper for
of_find_matching_node_and_match with NULL parameter, but we can save one stack position.

I'll send a v2 soon.

> 
>> +	if (!np)
>> +		return -ENODEV;
>> +	of_node_put(np);
>> +
>> 	/*
>> 	 * Since probe() can be deferred we need to make sure that
>> 	 * hook_fault_code is not called after __init memory is freed
>> -- 
>> 2.38.1
>> 

BR and thanks,
Nikolaus
  

Patch

diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 1dde5c579edc8..89774aa187ae8 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1402,6 +1402,15 @@  DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_SYNOPSYS, 0xabcd,
 static int __init imx6_pcie_init(void)
 {
 #ifdef CONFIG_ARM
+	const struct of_device_id *reboot_id;
+	struct device_node *np;
+
+	np = of_find_matching_node_and_match(NULL, imx6_pcie_of_match,
+					     &reboot_id);
+	if (!np)
+		return -ENODEV;
+	of_node_put(np);
+
 	/*
 	 * Since probe() can be deferred we need to make sure that
 	 * hook_fault_code is not called after __init memory is freed