[net,1/1] net: stmmac: fix MAC and phylink mismatch issue after resume with STMMAC_FLAG_USE_PHY_WOL enabled

Message ID 20231109050027.2545000-1-yi.fang.gan@intel.com
State New
Headers
Series [net,1/1] net: stmmac: fix MAC and phylink mismatch issue after resume with STMMAC_FLAG_USE_PHY_WOL enabled |

Commit Message

Gan, Yi Fang Nov. 9, 2023, 5 a.m. UTC
  From: "Gan, Yi Fang" <yi.fang.gan@intel.com>

The issue happened when flag STMMAC_FLAG_USE_PHY_WOL is enabled.
It can be reproduced with steps below:
1. Advertise only one speed on the host
2. Enable the WoL on the host
3. Suspend the host
4. Wake up the host

When the WoL is disabled, both the PHY and MAC will suspend and wake up
with everything configured well. When WoL is enabled, the PHY needs to be
stay awake to receive the signal from remote client but MAC will enter
suspend mode.

When the MAC resumes from suspend, phylink_resume() will call
phylink_start() to start the phylink instance which will trigger the
phylink machine to invoke the mac_link_up callback function. The
stmmac_mac_link_up() will configure the MAC_CTRL_REG based on the current
link state. Then the stmmac_hw_setup() will be called to configure the MAC.

This sequence might cause mismatch of the link state between MAC and
phylink. This patch moves the phylink_resume() after stmamc_hw_setup() to
ensure the MAC is initialized before phylink is being configured.

As phylink_resume() is called all the time, refactor the code and
remove the redundant check.

Fixes: 90702dcd19c0 ("net: stmmac: fix MAC not working when system resume back with WoL active")
Cc: <stable@vger.kernel.org> # 5.15+
Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
  

Comments

Russell King (Oracle) Nov. 9, 2023, 9:15 a.m. UTC | #1
On Thu, Nov 09, 2023 at 01:00:27PM +0800, Gan Yi Fang wrote:
> From: "Gan, Yi Fang" <yi.fang.gan@intel.com>
> 
> The issue happened when flag STMMAC_FLAG_USE_PHY_WOL is enabled.
> It can be reproduced with steps below:
> 1. Advertise only one speed on the host
> 2. Enable the WoL on the host
> 3. Suspend the host
> 4. Wake up the host
> 
> When the WoL is disabled, both the PHY and MAC will suspend and wake up
> with everything configured well. When WoL is enabled, the PHY needs to be
> stay awake to receive the signal from remote client but MAC will enter
> suspend mode.
> 
> When the MAC resumes from suspend, phylink_resume() will call
> phylink_start() to start the phylink instance which will trigger the
> phylink machine to invoke the mac_link_up callback function. The
> stmmac_mac_link_up() will configure the MAC_CTRL_REG based on the current
> link state. Then the stmmac_hw_setup() will be called to configure the MAC.
> 
> This sequence might cause mismatch of the link state between MAC and
> phylink. This patch moves the phylink_resume() after stmamc_hw_setup() to
> ensure the MAC is initialized before phylink is being configured.

Isn't this going to cause problems?

stmamc_hw_setup() calls stmmac_init_dma_engine(), which then calls
stmmac_reset() - and stmmac_reset() can fail if the PHY clock isn't
running, which is why phylink_resume() gets called before this.
  
Russell King (Oracle) Nov. 9, 2023, 9:46 a.m. UTC | #2
On Thu, Nov 09, 2023 at 09:15:36AM +0000, Russell King (Oracle) wrote:
> On Thu, Nov 09, 2023 at 01:00:27PM +0800, Gan Yi Fang wrote:
> > From: "Gan, Yi Fang" <yi.fang.gan@intel.com>
> > 
> > The issue happened when flag STMMAC_FLAG_USE_PHY_WOL is enabled.
> > It can be reproduced with steps below:
> > 1. Advertise only one speed on the host
> > 2. Enable the WoL on the host
> > 3. Suspend the host
> > 4. Wake up the host
> > 
> > When the WoL is disabled, both the PHY and MAC will suspend and wake up
> > with everything configured well. When WoL is enabled, the PHY needs to be
> > stay awake to receive the signal from remote client but MAC will enter
> > suspend mode.
> > 
> > When the MAC resumes from suspend, phylink_resume() will call
> > phylink_start() to start the phylink instance which will trigger the
> > phylink machine to invoke the mac_link_up callback function. The
> > stmmac_mac_link_up() will configure the MAC_CTRL_REG based on the current
> > link state. Then the stmmac_hw_setup() will be called to configure the MAC.
> > 
> > This sequence might cause mismatch of the link state between MAC and
> > phylink. This patch moves the phylink_resume() after stmamc_hw_setup() to
> > ensure the MAC is initialized before phylink is being configured.
> 
> Isn't this going to cause problems?
> 
> stmamc_hw_setup() calls stmmac_init_dma_engine(), which then calls
> stmmac_reset() - and stmmac_reset() can fail if the PHY clock isn't
> running, which is why phylink_resume() gets called before this.

I think these two commits should be reviewed to understand why the code
is the way it is, and why changing it may cause regressions:

90702dcd19c0 ("net: stmmac: fix MAC not working when system resume back
with WoL active")

36d18b5664ef ("net: stmmac: start phylink instance before
stmmac_hw_setup()")

As part of my work on stmmac that got junked, I was looking at a
solution to the "we need the PHY clock to be running for the MAC to
work for things like reset" problem - but those patches got thrown
away when stmmac folk were very nitpicky over %u vs %d in format
strings to print what was a _signed_ value that stmmac code stupidly
converts to an unsigned integer... it's still a signed integer no
matter if code decides to use "unsigned int". I suspect all those
patches (and there was a considerable number of them) have now been
expired from git, so are now totally lost, and honestly I have no
desire to put further work into stmmac stuff.
  
Paolo Abeni Nov. 9, 2023, 12:13 p.m. UTC | #3
On Thu, 2023-11-09 at 09:15 +0000, Russell King (Oracle) wrote:
> On Thu, Nov 09, 2023 at 01:00:27PM +0800, Gan Yi Fang wrote:
> > From: "Gan, Yi Fang" <yi.fang.gan@intel.com>
> > 
> > The issue happened when flag STMMAC_FLAG_USE_PHY_WOL is enabled.
> > It can be reproduced with steps below:
> > 1. Advertise only one speed on the host
> > 2. Enable the WoL on the host
> > 3. Suspend the host
> > 4. Wake up the host
> > 
> > When the WoL is disabled, both the PHY and MAC will suspend and wake up
> > with everything configured well. When WoL is enabled, the PHY needs to be
> > stay awake to receive the signal from remote client but MAC will enter
> > suspend mode.
> > 
> > When the MAC resumes from suspend, phylink_resume() will call
> > phylink_start() to start the phylink instance which will trigger the
> > phylink machine to invoke the mac_link_up callback function. The
> > stmmac_mac_link_up() will configure the MAC_CTRL_REG based on the current
> > link state. Then the stmmac_hw_setup() will be called to configure the MAC.
> > 
> > This sequence might cause mismatch of the link state between MAC and
> > phylink. This patch moves the phylink_resume() after stmamc_hw_setup() to
> > ensure the MAC is initialized before phylink is being configured.
> 
> Isn't this going to cause problems?
> 
> stmamc_hw_setup() calls stmmac_init_dma_engine(), which then calls
> stmmac_reset() - and stmmac_reset() can fail if the PHY clock isn't
> running, which is why phylink_resume() gets called before this.

@Gan Yi Fang: at very least we need a solid explanation in the commit
message why this change don't cause the above problems.

Thanks,

Paolo
  
Gan, Yi Fang Nov. 16, 2023, 7:39 a.m. UTC | #4
Hi Paolo and Russell King,

After study the information provided, it seems better to find another way to
resolve the issue. Appreciate for the details given. Will try to figure out another
solution.


Best Regards,
Gan Yi Fang

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, November 9, 2023 8:14 PM
> To: Russell King (Oracle) <linux@armlinux.org.uk>; Gan, Yi Fang
> <yi.fang.gan@intel.com>
> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>; Jose Abreu
> <joabreu@synopsys.com>; David S . Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>;
> Maxime Coquelin <mcoquelin.stm32@gmail.com>; Joakim Zhang
> <qiangqing.zhang@nxp.com>; netdev@vger.kernel.org; linux-stm32@st-md-
> mailman.stormreply.com; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; Looi, Hong Aun <hong.aun.looi@intel.com>; Voon,
> Weifeng <weifeng.voon@intel.com>; Song, Yoong Siang
> <yoong.siang.song@intel.com>
> Subject: Re: [PATCH net 1/1] net: stmmac: fix MAC and phylink mismatch issue
> after resume with STMMAC_FLAG_USE_PHY_WOL enabled
> 
> On Thu, 2023-11-09 at 09:15 +0000, Russell King (Oracle) wrote:
> > On Thu, Nov 09, 2023 at 01:00:27PM +0800, Gan Yi Fang wrote:
> > > From: "Gan, Yi Fang" <yi.fang.gan@intel.com>
> > >
> > > The issue happened when flag STMMAC_FLAG_USE_PHY_WOL is enabled.
> > > It can be reproduced with steps below:
> > > 1. Advertise only one speed on the host 2. Enable the WoL on the
> > > host 3. Suspend the host 4. Wake up the host
> > >
> > > When the WoL is disabled, both the PHY and MAC will suspend and wake
> > > up with everything configured well. When WoL is enabled, the PHY
> > > needs to be stay awake to receive the signal from remote client but
> > > MAC will enter suspend mode.
> > >
> > > When the MAC resumes from suspend, phylink_resume() will call
> > > phylink_start() to start the phylink instance which will trigger the
> > > phylink machine to invoke the mac_link_up callback function. The
> > > stmmac_mac_link_up() will configure the MAC_CTRL_REG based on the
> > > current link state. Then the stmmac_hw_setup() will be called to configure
> the MAC.
> > >
> > > This sequence might cause mismatch of the link state between MAC and
> > > phylink. This patch moves the phylink_resume() after
> > > stmamc_hw_setup() to ensure the MAC is initialized before phylink is being
> configured.
> >
> > Isn't this going to cause problems?
> >
> > stmamc_hw_setup() calls stmmac_init_dma_engine(), which then calls
> > stmmac_reset() - and stmmac_reset() can fail if the PHY clock isn't
> > running, which is why phylink_resume() gets called before this.
> 
> @Gan Yi Fang: at very least we need a solid explanation in the commit message
> why this change don't cause the above problems.
> 
> Thanks,
> 
> Paolo
>
  

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e50fd53a617..9b009fa5478f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7844,16 +7844,6 @@  int stmmac_resume(struct device *dev)
 			return ret;
 	}
 
-	rtnl_lock();
-	if (device_may_wakeup(priv->device) && priv->plat->pmt) {
-		phylink_resume(priv->phylink);
-	} else {
-		phylink_resume(priv->phylink);
-		if (device_may_wakeup(priv->device))
-			phylink_speed_up(priv->phylink);
-	}
-	rtnl_unlock();
-
 	rtnl_lock();
 	mutex_lock(&priv->lock);
 
@@ -7868,6 +7858,11 @@  int stmmac_resume(struct device *dev)
 
 	stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);
 
+	phylink_resume(priv->phylink);
+
+	if (device_may_wakeup(priv->device) && !(priv->plat->pmt))
+		phylink_speed_up(priv->phylink);
+
 	stmmac_enable_all_queues(priv);
 	stmmac_enable_all_dma_irq(priv);