[RFC,net-next,v2,1/2] net: phy: phy_device Prevent nullptr exceptions on ISR

Message ID 20240119065542.30279-1-andre.werner@systec-electronic.com
State New
Headers
Series [RFC,net-next,v2,1/2] net: phy: phy_device Prevent nullptr exceptions on ISR |

Commit Message

Andre Werner Jan. 19, 2024, 6:51 a.m. UTC
  If phydev->irq is set unconditionally by MAC drivers, check
for valid interrupt handler or fall back to polling mode to prevent
nullptr exceptions.

Signed-off-by: Andre Werner <andre.werner@systec-electronic.com>
---

It was exact the way described by Heiner Kallweit. The Kernel Oops
looked as follows:

[  +0.004455] smsc95xx v2.0.0
[  +0.578343] ADIN1100 usb-001:007:00: attached PHY driver (mii_bus:phy_addr=usb-001:007:00, irq=149)
[  +0.000526] smsc95xx 1-4:1.0 eth0: register 'smsc95xx' at usb-0000:00:14.0-4, smsc95xx USB 2.0 Ethernet, 44:d5:f2:a0:4d:bc
[  +0.027107] smsc95xx 1-4:1.0 enp0s20f0u4: renamed from eth0
[  +0.073992] smsc95xx 1-4:1.0 enp0s20f0u4: Link is Down
[  +9.559260] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  +0.000020] #PF: supervisor instruction fetch in kernel mode
[  +0.000008] #PF: error_code(0x0010) - not-present page
[  +0.000009] PGD 0 P4D 0
[  +0.000011] Oops: 0010 [#1] PREEMPT SMP NOPTI
[  +0.000013] CPU: 4 PID: 2073 Comm: irq/149-usb-001 Tainted: G           OE      6.5.5-arch1-1 #1 d82a0f532dd8cfe67d5795c1738d9c01059a0c62
[  +0.000014] Hardware name: LENOVO 20RD001CGE/20RD001CGE, BIOS R16ET29W (1.15 ) 12/03/2020
[  +0.000006] RIP: 0010:0x0
[  +0.000100] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  +0.000007] RSP: 0018:ffffb3b8c35b3e58 EFLAGS: 00010246
[  +0.000011] RAX: 0000000000000000 RBX: ffff8e5a84d10800 RCX: 0000000000000002
[  +0.000007] RDX: ffff8e5aeba52100 RSI: ffff8e5a84d10800 RDI: ffff8e5a84d10800
[  +0.000007] RBP: ffff8e5a84d10d48 R08: 0000000000000000 R09: 0000000000000000
[  +0.000007] R10: 0000000000000001 R11: 0000000000000100 R12: ffffffffc22720a0
[  +0.000007] R13: ffff8e5a8a0a0c80 R14: ffffffff983505e0 R15: ffff8e5aed177940
[  +0.000006] FS:  0000000000000000(0000) GS:ffff8e5dd1500000(0000) knlGS:0000000000000000
[  +0.000009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000007] CR2: ffffffffffffffd6 CR3: 000000025b020004 CR4: 00000000003706e0
[  +0.000007] Call Trace:
[  +0.000006]  <TASK>
[  +0.000008]  ? __die+0x23/0x70
[  +0.000018]  ? page_fault_oops+0x171/0x4e0
[  +0.000017]  ? sched_clock+0x10/0x30
[  +0.000010]  ? sched_clock_cpu+0xf/0x190
[  +0.000015]  ? exc_page_fault+0x7f/0x180
[  +0.000018]  ? asm_exc_page_fault+0x26/0x30
[  +0.000014]  ? __pfx_irq_thread_fn+0x10/0x10
[  +0.000031]  phy_interrupt+0xac/0xf0 [libphy 6d44c9178d1bba7d6a381c1347a9242562ba5983]
[  +0.000097]  irq_thread_fn+0x20/0x60
[  +0.000014]  irq_thread+0xfb/0x1c0
[  +0.000012]  ? __pfx_irq_thread_dtor+0x10/0x10
[  +0.000013]  ? __pfx_irq_thread+0x10/0x10
[  +0.000011]  kthread+0xe5/0x120
[  +0.000013]  ? __pfx_kthread+0x10/0x10
[  +0.000012]  ret_from_fork+0x31/0x50
[  +0.000010]  ? __pfx_kthread+0x10/0x10
[  +0.000011]  ret_from_fork_asm+0x1b/0x30
[  +0.000022]  </TASK>
[  +0.000005] Modules linked in: adin1100 smsc95xx selftests usbnet [..]
[  +0.000082] CR2: 0000000000000000
[  +0.000007] ---[ end trace 0000000000000000 ]---

v2:
- Added the workaround to phy_attach_direct as discussed in first patch
  submission.
  TODO: Make phy_device->irq private and add access function that proves
  availability of interrupt handler in phy driver.

---
 drivers/net/phy/phy_device.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
  

Patch

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 3611ea64875e..3986e103d25e 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1413,6 +1413,11 @@  int phy_sfp_probe(struct phy_device *phydev,
 }
 EXPORT_SYMBOL(phy_sfp_probe);
 
+static bool phy_drv_supports_irq(struct phy_driver *phydrv)
+{
+	return phydrv->config_intr && phydrv->handle_interrupt;
+}
+
 /**
  * phy_attach_direct - attach a network device to a given PHY device pointer
  * @dev: network device to attach
@@ -1527,6 +1532,18 @@  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	if (phydev->dev_flags & PHY_F_NO_IRQ)
 		phydev->irq = PHY_POLL;
 
+	/*
+	 * Some drivers may add IRQ numbers unconditionally to a phy device that does
+	 * not implement an interrupt handler after phy_probe is already done.
+	 * Reset to PHY_POLL to prevent nullptr exceptions in that case.
+	 */
+	if (!phy_drv_supports_irq(phydev->drv) && phy_interrupt_is_valid(phydev)) {
+		phydev_warn(phydev,
+			    "No handler for IRQ=%d available. Falling back to polling mode\n",
+			    phydev->irq);
+		phydev->irq = PHY_POLL;
+	}
+
 	/* Port is set to PORT_TP by default and the actual PHY driver will set
 	 * it to different value depending on the PHY configuration. If we have
 	 * the generic PHY driver we can't figure it out, thus set the old
@@ -2992,11 +3009,6 @@  s32 phy_get_internal_delay(struct phy_device *phydev, struct device *dev,
 }
 EXPORT_SYMBOL(phy_get_internal_delay);
 
-static bool phy_drv_supports_irq(struct phy_driver *phydrv)
-{
-	return phydrv->config_intr && phydrv->handle_interrupt;
-}
-
 static int phy_led_set_brightness(struct led_classdev *led_cdev,
 				  enum led_brightness value)
 {