net: phy: adin1100: Fix nullptr exception for phy interrupts

Message ID 20240118104341.10832-1-andre.werner@systec-electronic.com
State New
Headers
Series net: phy: adin1100: Fix nullptr exception for phy interrupts |

Commit Message

Andre Werner Jan. 18, 2024, 10:43 a.m. UTC
  If using ADIN1100 as an external phy, e.g. in combination with
"smsc95xx", we ran into nullptr exception by creating a link.

In our case the "smsc95xx" does not check for an available interrupt handler
on external phy driver to use poll instead of interrupts if no handler is
available. So we decide to implement a small handler in the phy driver
to support other MACs as well.

I update the driver to add an interrupt handler because libphy
does not check if their is a interrupt handler available either.

There are several interrupts maskable at the phy, but only link change interrupts
are handled by the driver yet.

We tested the combination "smsc95xx" and "adin1100" with Linux Kernel 6.69
and Linux Kernel 6.1.0, respectively.

Signed-off-by: Andre Werner <andre.werner@systec-electronic.com>
---
 drivers/net/phy/adin1100.c | 58 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
  

Comments

Andrew Lunn Jan. 18, 2024, 4:53 p.m. UTC | #1
On Thu, Jan 18, 2024 at 11:43:41AM +0100, Andre Werner wrote:
> If using ADIN1100 as an external phy, e.g. in combination with
> "smsc95xx", we ran into nullptr exception by creating a link.
> 
> In our case the "smsc95xx" does not check for an available interrupt handler
> on external phy driver to use poll instead of interrupts if no handler is
> available. So we decide to implement a small handler in the phy driver
> to support other MACs as well.
> 
> I update the driver to add an interrupt handler because libphy
> does not check if their is a interrupt handler available either.
> 
> There are several interrupts maskable at the phy, but only link change interrupts
> are handled by the driver yet.
> 
> We tested the combination "smsc95xx" and "adin1100" with Linux Kernel 6.6.9
> and Linux Kernel 6.1.0, respectively.

Hi Andre

A few different things....

Please could you give more details of the null pointer
exception. phylib should test if the needed methods have been
implemented in the PHY driver, and not tried to use interrupts when
they are missing. It should of polled the PHY. So i would like to
understand what went wrong. Maybe we have a phylib core bug we should
be fixing. Or a bug in the smsc95xx driver.

Please take a read of
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

Patches like this should be against net-next, not 6.6.9 etc. Also,
net-next is currently closed due to the merge window being open. Its
fine to post patches, but please mark them RFC until the merge window
is over.

The patch itself looks O.K, but i would make the commit message more
formal. You can add additional comments under the --- which will not
become part of the git history.

       Andrew
  
Andrew Lunn Jan. 18, 2024, 8:09 p.m. UTC | #2
On Thu, Jan 18, 2024 at 06:36:16PM +0100, Heiner Kallweit wrote:
> On 18.01.2024 17:53, Andrew Lunn wrote:
> > On Thu, Jan 18, 2024 at 11:43:41AM +0100, Andre Werner wrote:
> >> If using ADIN1100 as an external phy, e.g. in combination with
> >> "smsc95xx", we ran into nullptr exception by creating a link.
> >>
> >> In our case the "smsc95xx" does not check for an available interrupt handler
> >> on external phy driver to use poll instead of interrupts if no handler is
> >> available. So we decide to implement a small handler in the phy driver
> >> to support other MACs as well.
> >>
> >> I update the driver to add an interrupt handler because libphy
> >> does not check if their is a interrupt handler available either.
> >>
> >> There are several interrupts maskable at the phy, but only link change interrupts
> >> are handled by the driver yet.
> >>
> >> We tested the combination "smsc95xx" and "adin1100" with Linux Kernel 6.6.9
> >> and Linux Kernel 6.1.0, respectively.
> > 
> > Hi Andre
> > 
> > A few different things....
> > 
> > Please could you give more details of the null pointer
> > exception. phylib should test if the needed methods have been
> > implemented in the PHY driver, and not tried to use interrupts when
> > they are missing. It should of polled the PHY. So i would like to
> > understand what went wrong. Maybe we have a phylib core bug we should
> > be fixing. Or a bug in the smsc95xx driver.
> > 
> Seems to be a bug in smsc95xx. The following is the relevant code piece.
> 
> ret = mdiobus_register(pdata->mdiobus);
> 	if (ret) {
> 		netdev_err(dev->net, "Could not register MDIO bus\n");
> 		goto free_mdio;
> 	}
> 
> 	pdata->phydev = phy_find_first(pdata->mdiobus);
> 	if (!pdata->phydev) {
> 		netdev_err(dev->net, "no PHY found\n");
> 		ret = -ENODEV;
> 		goto unregister_mdio;
> 	}
> 
> 	pdata->phydev->irq = phy_irq;
> 
> The interrupt is set too late, after phy_probe(), where we have this:
> 
> if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
> 		phydev->irq = PHY_POLL;
> 
> So we would have two steps:
> 1. Fix the smsc95xx bug (as the same issue could occur with another PHY type)

Its not so nice to fix.

Normally you would do something like:

        pdata->mdiobus->priv = dev;
        pdata->mdiobus->read = smsc95xx_mdiobus_read;
        pdata->mdiobus->write = smsc95xx_mdiobus_write;
        pdata->mdiobus->reset = smsc95xx_mdiobus_reset;
        pdata->mdiobus->name = "smsc95xx-mdiobus";
        pdata->mdiobus->parent = &dev->udev->dev;

        snprintf(pdata->mdiobus->id, ARRAY_SIZE(pdata->mdiobus->id),
                 "usb-%03d:%03d", dev->udev->bus->busnum, dev->udev->devnum);

	pdata->mdiobus->irq[X] = phy_irq;

	ret = mdiobus_register(pdata->mdiobus);

By setting pdata->mdiobus->irq[X] before registering the PHY, the irq
number gets passed to the phydev->irq very early on. If everything is
O.K, interrupts are then used.

However, because of the use of phy_find_first(), we have no idea what
address on the bus the PHY is using. So we don't know which member of
irq[] to set. Its not ideal, but one solution is to set them all.

However, a better solution is to perform the validation again in
phy_attach_direct(). Add a second:

	if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
        	phydev->irq = PHY_POLL;

which will force phydev->irq back to polling.

      Andrew
  
Heiner Kallweit Jan. 18, 2024, 9:24 p.m. UTC | #3
On 18.01.2024 21:09, Andrew Lunn wrote:
> On Thu, Jan 18, 2024 at 06:36:16PM +0100, Heiner Kallweit wrote:
>> On 18.01.2024 17:53, Andrew Lunn wrote:
>>> On Thu, Jan 18, 2024 at 11:43:41AM +0100, Andre Werner wrote:
>>>> If using ADIN1100 as an external phy, e.g. in combination with
>>>> "smsc95xx", we ran into nullptr exception by creating a link.
>>>>
>>>> In our case the "smsc95xx" does not check for an available interrupt handler
>>>> on external phy driver to use poll instead of interrupts if no handler is
>>>> available. So we decide to implement a small handler in the phy driver
>>>> to support other MACs as well.
>>>>
>>>> I update the driver to add an interrupt handler because libphy
>>>> does not check if their is a interrupt handler available either.
>>>>
>>>> There are several interrupts maskable at the phy, but only link change interrupts
>>>> are handled by the driver yet.
>>>>
>>>> We tested the combination "smsc95xx" and "adin1100" with Linux Kernel 6.6.9
>>>> and Linux Kernel 6.1.0, respectively.
>>>
>>> Hi Andre
>>>
>>> A few different things....
>>>
>>> Please could you give more details of the null pointer
>>> exception. phylib should test if the needed methods have been
>>> implemented in the PHY driver, and not tried to use interrupts when
>>> they are missing. It should of polled the PHY. So i would like to
>>> understand what went wrong. Maybe we have a phylib core bug we should
>>> be fixing. Or a bug in the smsc95xx driver.
>>>
>> Seems to be a bug in smsc95xx. The following is the relevant code piece.
>>
>> ret = mdiobus_register(pdata->mdiobus);
>> 	if (ret) {
>> 		netdev_err(dev->net, "Could not register MDIO bus\n");
>> 		goto free_mdio;
>> 	}
>>
>> 	pdata->phydev = phy_find_first(pdata->mdiobus);
>> 	if (!pdata->phydev) {
>> 		netdev_err(dev->net, "no PHY found\n");
>> 		ret = -ENODEV;
>> 		goto unregister_mdio;
>> 	}
>>
>> 	pdata->phydev->irq = phy_irq;
>>
>> The interrupt is set too late, after phy_probe(), where we have this:
>>
>> if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
>> 		phydev->irq = PHY_POLL;
>>
>> So we would have two steps:
>> 1. Fix the smsc95xx bug (as the same issue could occur with another PHY type)
> 
> Its not so nice to fix.
> 
> Normally you would do something like:
> 
>         pdata->mdiobus->priv = dev;
>         pdata->mdiobus->read = smsc95xx_mdiobus_read;
>         pdata->mdiobus->write = smsc95xx_mdiobus_write;
>         pdata->mdiobus->reset = smsc95xx_mdiobus_reset;
>         pdata->mdiobus->name = "smsc95xx-mdiobus";
>         pdata->mdiobus->parent = &dev->udev->dev;
> 
>         snprintf(pdata->mdiobus->id, ARRAY_SIZE(pdata->mdiobus->id),
>                  "usb-%03d:%03d", dev->udev->bus->busnum, dev->udev->devnum);
> 
> 	pdata->mdiobus->irq[X] = phy_irq;
> 
> 	ret = mdiobus_register(pdata->mdiobus);
> 
> By setting pdata->mdiobus->irq[X] before registering the PHY, the irq
> number gets passed to the phydev->irq very early on. If everything is
> O.K, interrupts are then used.
> 
> However, because of the use of phy_find_first(), we have no idea what
> address on the bus the PHY is using. So we don't know which member of
> irq[] to set. Its not ideal, but one solution is to set them all.
> 
> However, a better solution is to perform the validation again in
> phy_attach_direct(). Add a second:
> 
> 	if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
>         	phydev->irq = PHY_POLL;
> 
This would save us here, but can't prevent that phydev->irq may be set
even later. I think, ideally nobody should ever access phydev->irq directly.
There should be a setter which performs the needed checks.
But it may be a longer journey to make parts of struct phy_device private
to phylib.

> which will force phydev->irq back to polling.
> 
>       Andrew

Heiner
  
Andre Werner Jan. 19, 2024, 3:05 a.m. UTC | #4
On Thu, 18 Jan 2024, Russell King (Oracle) wrote:

> In addition to Andrew's comments:
>
> On Thu, Jan 18, 2024 at 11:43:41AM +0100, Andre Werner wrote:
>> +static int adin_config_intr(struct phy_device *phydev)
>> +{
>> +	int ret, regval;
>> +
>> +	ret = adin_phy_ack_intr(phydev);
>> +
>> +	if (ret)
>> +		return ret;
>> +
>> +	regval = phy_read_mmd(phydev, MDIO_MMD_VEND2, ADIN_PHY_SUBSYS_IRQ_MASK);
>> +	if (regval < 0)
>> +		return regval;
>> +
>> +	if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
>> +		regval |= ADIN_LINK_STAT_CHNG_IRQ_EN;
>> +	else
>> +		regval &= ~ADIN_LINK_STAT_CHNG_IRQ_EN;
>> +
>> +	ret = phy_write_mmd(phydev, MDIO_MMD_VEND2,
>> +			    ADIN_PHY_SUBSYS_IRQ_MASK,
>> +			    regval);
>> +	return ret;
>
> 	u16 irq_mask;
>
> 	if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
> 		irq_mask = ADIN_LINK_STAT_CHNG_IRQ_EN;
> 	else
> 		irq_mask = 0;

Unfortunately, I can not do this, because that phy ask for read-modify-write access to
registers and some bits in this register are already set after reset and
should not being modified.

>
> 	return phy_modify_mmd(phydev, MDIO_MMD_VEND2,
> 			      ADIN_PHY_SUBSYS_IRQ_MASK,
> 			      ADIN_LINK_STAT_CHNG_IRQ_EN, irq_mask);
>
>> +}
>> +
>> +static irqreturn_t adin_phy_handle_interrupt(struct phy_device *phydev)
>> +{
>> +	int irq_status;
>> +
>> +	irq_status = phy_read_mmd(phydev, MDIO_MMD_VEND2, ADIN_PHY_SUBSYS_IRQ_STATUS);
>
> Probably want to wrap this - if you're going to bother wrapping your
> phy_write_mmd() above because it overflows 80 columns, then please do
> so consistently.

Done.

>
> Thanks.
>
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>
  
Andre Werner Jan. 19, 2024, 6:30 a.m. UTC | #5
On Thu, 18 Jan 2024, Heiner Kallweit wrote:

> On 18.01.2024 21:09, Andrew Lunn wrote:
>> On Thu, Jan 18, 2024 at 06:36:16PM +0100, Heiner Kallweit wrote:
>>> On 18.01.2024 17:53, Andrew Lunn wrote:
>>>> On Thu, Jan 18, 2024 at 11:43:41AM +0100, Andre Werner wrote:
>>>>> If using ADIN1100 as an external phy, e.g. in combination with
>>>>> "smsc95xx", we ran into nullptr exception by creating a link.
>>>>>
>>>>> In our case the "smsc95xx" does not check for an available interrupt handler
>>>>> on external phy driver to use poll instead of interrupts if no handler is
>>>>> available. So we decide to implement a small handler in the phy driver
>>>>> to support other MACs as well.
>>>>>
>>>>> I update the driver to add an interrupt handler because libphy
>>>>> does not check if their is a interrupt handler available either.
>>>>>
>>>>> There are several interrupts maskable at the phy, but only link change interrupts
>>>>> are handled by the driver yet.
>>>>>
>>>>> We tested the combination "smsc95xx" and "adin1100" with Linux Kernel 6.6.9
>>>>> and Linux Kernel 6.1.0, respectively.
>>>>
>>>> Hi Andre
>>>>
>>>> A few different things....
>>>>
>>>> Please could you give more details of the null pointer
>>>> exception. phylib should test if the needed methods have been
>>>> implemented in the PHY driver, and not tried to use interrupts when
>>>> they are missing. It should of polled the PHY. So i would like to
>>>> understand what went wrong. Maybe we have a phylib core bug we should
>>>> be fixing. Or a bug in the smsc95xx driver.
>>>>
>>> Seems to be a bug in smsc95xx. The following is the relevant code piece.
>>>
>>> ret = mdiobus_register(pdata->mdiobus);
>>> 	if (ret) {
>>> 		netdev_err(dev->net, "Could not register MDIO bus\n");
>>> 		goto free_mdio;
>>> 	}
>>>
>>> 	pdata->phydev = phy_find_first(pdata->mdiobus);
>>> 	if (!pdata->phydev) {
>>> 		netdev_err(dev->net, "no PHY found\n");
>>> 		ret = -ENODEV;
>>> 		goto unregister_mdio;
>>> 	}
>>>
>>> 	pdata->phydev->irq = phy_irq;
>>>
>>> The interrupt is set too late, after phy_probe(), where we have this:
>>>
>>> if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
>>> 		phydev->irq = PHY_POLL;
>>>
>>> So we would have two steps:
>>> 1. Fix the smsc95xx bug (as the same issue could occur with another PHY type)
>>
>> Its not so nice to fix.
>>
>> Normally you would do something like:
>>
>>         pdata->mdiobus->priv = dev;
>>         pdata->mdiobus->read = smsc95xx_mdiobus_read;
>>         pdata->mdiobus->write = smsc95xx_mdiobus_write;
>>         pdata->mdiobus->reset = smsc95xx_mdiobus_reset;
>>         pdata->mdiobus->name = "smsc95xx-mdiobus";
>>         pdata->mdiobus->parent = &dev->udev->dev;
>>
>>         snprintf(pdata->mdiobus->id, ARRAY_SIZE(pdata->mdiobus->id),
>>                  "usb-%03d:%03d", dev->udev->bus->busnum, dev->udev->devnum);
>>
>> 	pdata->mdiobus->irq[X] = phy_irq;
>>
>> 	ret = mdiobus_register(pdata->mdiobus);
>>
>> By setting pdata->mdiobus->irq[X] before registering the PHY, the irq
>> number gets passed to the phydev->irq very early on. If everything is
>> O.K, interrupts are then used.
>>
>> However, because of the use of phy_find_first(), we have no idea what
>> address on the bus the PHY is using. So we don't know which member of
>> irq[] to set. Its not ideal, but one solution is to set them all.
>>
>> However, a better solution is to perform the validation again in
>> phy_attach_direct(). Add a second:
>>
>> 	if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
>>         	phydev->irq = PHY_POLL;

I add this check into phy_device.c-> phy_attach_direct as a work around
for now. I will send a new patch set to net-next as suggested.

>>
> This would save us here, but can't prevent that phydev->irq may be set
> even later. I think, ideally nobody should ever access phydev->irq directly.
> There should be a setter which performs the needed checks.
> But it may be a longer journey to make parts of struct phy_device private
> to phylib.
>
>> which will force phydev->irq back to polling.
>>
>>       Andrew
>
> Heiner
>

Regards,

Andre
  

Patch

diff --git a/drivers/net/phy/adin1100.c b/drivers/net/phy/adin1100.c
index 7619d6185801..ed8a7e6250cf 100644
--- a/drivers/net/phy/adin1100.c
+++ b/drivers/net/phy/adin1100.c
@@ -18,6 +18,12 @@ 
 #define PHY_ID_ADIN1110				0x0283bc91
 #define PHY_ID_ADIN2111				0x0283bca1
 
+#define ADIN_PHY_SUBSYS_IRQ_MASK		0x0021
+#define   ADIN_LINK_STAT_CHNG_IRQ_EN		BIT(1)
+
+#define ADIN_PHY_SUBSYS_IRQ_STATUS		0x0011
+#define   ADIN_LINK_STAT_CHNG			BIT(1)
+
 #define ADIN_FORCED_MODE			0x8000
 #define   ADIN_FORCED_MODE_EN			BIT(0)
 
@@ -136,6 +142,56 @@  static int adin_config_aneg(struct phy_device *phydev)
 	return genphy_c45_config_aneg(phydev);
 }
 
+static int adin_phy_ack_intr(struct phy_device *phydev)
+{
+	/* Clear pending interrupts */
+	int rc = phy_read_mmd(phydev, MDIO_MMD_VEND2, ADIN_PHY_SUBSYS_IRQ_STATUS);
+
+	return rc < 0 ? rc : 0;
+}
+
+static int adin_config_intr(struct phy_device *phydev)
+{
+	int ret, regval;
+
+	ret = adin_phy_ack_intr(phydev);
+
+	if (ret)
+		return ret;
+
+	regval = phy_read_mmd(phydev, MDIO_MMD_VEND2, ADIN_PHY_SUBSYS_IRQ_MASK);
+	if (regval < 0)
+		return regval;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+		regval |= ADIN_LINK_STAT_CHNG_IRQ_EN;
+	else
+		regval &= ~ADIN_LINK_STAT_CHNG_IRQ_EN;
+
+	ret = phy_write_mmd(phydev, MDIO_MMD_VEND2,
+			    ADIN_PHY_SUBSYS_IRQ_MASK,
+			    regval);
+	return ret;
+}
+
+static irqreturn_t adin_phy_handle_interrupt(struct phy_device *phydev)
+{
+	int irq_status;
+
+	irq_status = phy_read_mmd(phydev, MDIO_MMD_VEND2, ADIN_PHY_SUBSYS_IRQ_STATUS);
+	if (irq_status < 0) {
+		phy_error(phydev);
+		return IRQ_NONE;
+	}
+
+	if (!(irq_status & ADIN_LINK_STAT_CHNG))
+		return IRQ_NONE;
+
+	phy_trigger_machine(phydev);
+
+	return IRQ_HANDLED;
+}
+
 static int adin_set_powerdown_mode(struct phy_device *phydev, bool en)
 {
 	int ret;
@@ -275,6 +331,8 @@  static struct phy_driver adin_driver[] = {
 		.probe			= adin_probe,
 		.config_aneg		= adin_config_aneg,
 		.read_status		= adin_read_status,
+		.config_intr		= adin_config_intr,
+		.handle_interrupt	= adin_phy_handle_interrupt,
 		.set_loopback		= adin_set_loopback,
 		.suspend		= adin_suspend,
 		.resume			= adin_resume,