[net] lan966x: Fix unloading/loading of the driver

Message ID 20230522120038.3749026-1-horatiu.vultur@microchip.com
State New
Headers
Series [net] lan966x: Fix unloading/loading of the driver |

Commit Message

Horatiu Vultur May 22, 2023, noon UTC
  It was noticing that after a while when unloading/loading the driver and
sending traffic through the switch, it would stop working. It would stop
forwarding any traffic and the only way to get out of this was to do a
power cycle of the board. The root cause seems to be that the switch
core is initialized twice. Apparently initializing twice the switch core
disturbs the pointers in the queue systems in the HW, so after a while
it would stop sending the traffic.
Unfortunetly, it is not possible to use a reset of the switch here,
because the reset line is connected to multiple devices like MDIO,
SGPIO, FAN, etc. So then all the devices will get reseted when the
network driver will be loaded.
So the fix is to check if the core is initialized already and if that is
the case don't initialize it again.

Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

Simon Horman May 22, 2023, 1:46 p.m. UTC | #1
On Mon, May 22, 2023 at 02:00:38PM +0200, Horatiu Vultur wrote:
> It was noticing that after a while when unloading/loading the driver and
> sending traffic through the switch, it would stop working. It would stop
> forwarding any traffic and the only way to get out of this was to do a
> power cycle of the board. The root cause seems to be that the switch
> core is initialized twice. Apparently initializing twice the switch core
> disturbs the pointers in the queue systems in the HW, so after a while
> it would stop sending the traffic.

Ouch.

> Unfortunetly, it is not possible to use a reset of the switch here,

nit: s/Unfortunetly/Unfortunately/

> because the reset line is connected to multiple devices like MDIO,
> SGPIO, FAN, etc. So then all the devices will get reseted when the

nit: s/reseted/reset/

> network driver will be loaded.
> So the fix is to check if the core is initialized already and if that is
> the case don't initialize it again.
> 
> Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>

...
  
patchwork-bot+netdevbpf@kernel.org May 23, 2023, 1:40 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 22 May 2023 14:00:38 +0200 you wrote:
> It was noticing that after a while when unloading/loading the driver and
> sending traffic through the switch, it would stop working. It would stop
> forwarding any traffic and the only way to get out of this was to do a
> power cycle of the board. The root cause seems to be that the switch
> core is initialized twice. Apparently initializing twice the switch core
> disturbs the pointers in the queue systems in the HW, so after a while
> it would stop sending the traffic.
> Unfortunetly, it is not possible to use a reset of the switch here,
> because the reset line is connected to multiple devices like MDIO,
> SGPIO, FAN, etc. So then all the devices will get reseted when the
> network driver will be loaded.
> So the fix is to check if the core is initialized already and if that is
> the case don't initialize it again.
> 
> [...]

Here is the summary with links:
  - [net] lan966x: Fix unloading/loading of the driver
    https://git.kernel.org/netdev/net/c/600761245952

You are awesome, thank you!
  

Patch

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 5f01b21acdd1b..f6931dfb3e68e 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1039,6 +1039,16 @@  static int lan966x_reset_switch(struct lan966x *lan966x)
 
 	reset_control_reset(switch_reset);
 
+	/* Don't reinitialize the switch core, if it is already initialized. In
+	 * case it is initialized twice, some pointers inside the queue system
+	 * in HW will get corrupted and then after a while the queue system gets
+	 * full and no traffic is passing through the switch. The issue is seen
+	 * when loading and unloading the driver and sending traffic through the
+	 * switch.
+	 */
+	if (lan_rd(lan966x, SYS_RESET_CFG) & SYS_RESET_CFG_CORE_ENA)
+		return 0;
+
 	lan_wr(SYS_RESET_CFG_CORE_ENA_SET(0), lan966x, SYS_RESET_CFG);
 	lan_wr(SYS_RAM_INIT_RAM_INIT_SET(1), lan966x, SYS_RAM_INIT);
 	ret = readx_poll_timeout(lan966x_ram_init, lan966x,