[V3,1/2] net: phylink: add a function to resume phy alone to fix resume issue with WoL enabled

Message ID 20230202081559.3553637-1-xiaoning.wang@nxp.com
State New
Headers
Series [V3,1/2] net: phylink: add a function to resume phy alone to fix resume issue with WoL enabled |

Commit Message

Clark Wang Feb. 2, 2023, 8:15 a.m. UTC
  Issue we met:
On some platforms, mac cannot work after resumed from the suspend with WoL
enabled.

The cause of the issue:
1. phylink_resolve() is in a workqueue which will not be executed immediately.
   This is the call sequence:
       phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
   For stmmac driver, mac_link_up() will set the correct speed/duplex...
   values which are from link_state.
2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
   phylink_resume(), because mac need phy rx_clk to do the reset.
   stmmac_core_init() is called in function stmmac_hw_setup(), which will
   reset the mac and set the speed/duplex... to default value.
Conclusion: Because phylink_resolve() cannot determine when it is called, it
            cannot be guaranteed to be called after stmmac_core_init().
	    Once stmmac_core_init() is called after phylink_resolve(),
	    the mac will be misconfigured and cannot be used.

In order to avoid this problem, add a function called phylink_phy_resume()
to resume phy separately. This eliminates the need to call phylink_resume()
before stmmac_hw_setup().

Add another judgement before called phy_start() in phylink_start(). This way
phy_start() will not be called multiple times when resumes. At the same time,
it may not affect other drivers that do not use phylink_phy_resume().

Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
---
V2 change:
 - add mac_resume_phy_separately flag to struct phylink to mark if the mac
   driver uses the phylink_phy_resume() first.
V3 change:
 - add brace to avoid ambiguous 'else'
   Reported-by: kernel test robot <lkp@intel.com>
---
 drivers/net/phy/phylink.c | 32 ++++++++++++++++++++++++++++++--
 include/linux/phylink.h   |  1 +
 2 files changed, 31 insertions(+), 2 deletions(-)
  

Comments

Jakub Kicinski Feb. 3, 2023, 5:24 a.m. UTC | #1
On Thu,  2 Feb 2023 16:15:59 +0800 Clark Wang wrote:
> Issue we met:
> On some platforms, mac cannot work after resumed from the suspend with WoL
> enabled.
> 
> The cause of the issue:
> 1. phylink_resolve() is in a workqueue which will not be executed immediately.
>    This is the call sequence:
>        phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
>    For stmmac driver, mac_link_up() will set the correct speed/duplex...
>    values which are from link_state.
> 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
>    phylink_resume(), because mac need phy rx_clk to do the reset.
>    stmmac_core_init() is called in function stmmac_hw_setup(), which will
>    reset the mac and set the speed/duplex... to default value.
> Conclusion: Because phylink_resolve() cannot determine when it is called, it
>             cannot be guaranteed to be called after stmmac_core_init().
> 	    Once stmmac_core_init() is called after phylink_resolve(),
> 	    the mac will be misconfigured and cannot be used.
> 
> In order to avoid this problem, add a function called phylink_phy_resume()
> to resume phy separately. This eliminates the need to call phylink_resume()
> before stmmac_hw_setup().
> 
> Add another judgement before called phy_start() in phylink_start(). This way
> phy_start() will not be called multiple times when resumes. At the same time,
> it may not affect other drivers that do not use phylink_phy_resume().
> 
> Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>

Patch 2/2 never made it to the list. You'll need to repost.
While I have you - some minor nit picks:

> +/**
> + * phylink_phy_resume() - resume phy alone
> + * @pl: a pointer to a &struct phylink returned from phylink_create()
> + *
> + * In the MAC driver using phylink, if the MAC needs the clock of the phy

You use MAC in capital letters buy phy in lower case, be consistent.

> + * when it resumes, can call this function to resume the phy separately.

missing "it" ? Otherwise the sentence is missing a subject.

> + * Then proceed to MAC resume operations.
> + */
> +void phylink_phy_resume(struct phylink *pl)
> +{
> +	ASSERT_RTNL();
> +
> +	if (!test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)
> +	    && pl->phydev) {

&& goes at the end of the line, not start

> +		phy_start(pl->phydev);
> +		pl->mac_resume_phy_separately = true;
> +	}
> +
> +}
> +EXPORT_SYMBOL_GPL(phylink_phy_resume);
  
Russell King (Oracle) Feb. 9, 2023, 4:57 p.m. UTC | #2
Hi Clark,

Please address Jakub's points and resend with patch 2, thanks. Also,
the subject line should indicate which tree this patch series is
targetting - e.g.

[PATCH net-next v4 1/2] net: phylink: ...

> +/**
> + * phylink_phy_resume() - resume phy alone
> + * @pl: a pointer to a &struct phylink returned from phylink_create()
> + *
> + * In the MAC driver using phylink, if the MAC needs the clock of the phy
> + * when it resumes, can call this function to resume the phy separately.
> + * Then proceed to MAC resume operations.

Please also state that it must only be called prior to calling
phylink_start() in the driver's resume function.

Thanks.
  
Paolo Abeni Feb. 23, 2023, 10:09 a.m. UTC | #3
On Thu, 2023-02-02 at 16:15 +0800, Clark Wang wrote:
> Issue we met:
> On some platforms, mac cannot work after resumed from the suspend with WoL
> enabled.
> 
> The cause of the issue:
> 1. phylink_resolve() is in a workqueue which will not be executed immediately.
>    This is the call sequence:
>        phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
>    For stmmac driver, mac_link_up() will set the correct speed/duplex...
>    values which are from link_state.
> 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
>    phylink_resume(), because mac need phy rx_clk to do the reset.
>    stmmac_core_init() is called in function stmmac_hw_setup(), which will
>    reset the mac and set the speed/duplex... to default value.
> Conclusion: Because phylink_resolve() cannot determine when it is called, it
>             cannot be guaranteed to be called after stmmac_core_init().
> 	    Once stmmac_core_init() is called after phylink_resolve(),
> 	    the mac will be misconfigured and cannot be used.
> 
> In order to avoid this problem, add a function called phylink_phy_resume()
> to resume phy separately. This eliminates the need to call phylink_resume()
> before stmmac_hw_setup().
> 
> Add another judgement before called phy_start() in phylink_start(). This way
> phy_start() will not be called multiple times when resumes. At the same time,
> it may not affect other drivers that do not use phylink_phy_resume().
> 
> Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> ---
> V2 change:
>  - add mac_resume_phy_separately flag to struct phylink to mark if the mac
>    driver uses the phylink_phy_resume() first.
> V3 change:
>  - add brace to avoid ambiguous 'else'
>    Reported-by: kernel test robot <lkp@intel.com>
> ---
>  drivers/net/phy/phylink.c | 32 ++++++++++++++++++++++++++++++--
>  include/linux/phylink.h   |  1 +
>  2 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 319790221d7f..c2fe66f0b78f 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -80,6 +80,8 @@ struct phylink {
>  	DECLARE_PHY_INTERFACE_MASK(sfp_interfaces);
>  	__ETHTOOL_DECLARE_LINK_MODE_MASK(sfp_support);
>  	u8 sfp_port;
> +
> +	bool mac_resume_phy_separately;
>  };
>  
>  #define phylink_printk(level, pl, fmt, ...) \
> @@ -1509,6 +1511,7 @@ struct phylink *phylink_create(struct phylink_config *config,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> +	pl->mac_resume_phy_separately = false;
>  	pl->using_mac_select_pcs = using_mac_select_pcs;
>  	pl->phy_state.interface = iface;
>  	pl->link_interface = iface;
> @@ -1943,8 +1946,12 @@ void phylink_start(struct phylink *pl)
>  	}
>  	if (poll)
>  		mod_timer(&pl->link_poll, jiffies + HZ);
> -	if (pl->phydev)
> -		phy_start(pl->phydev);
> +	if (pl->phydev) {
> +		if (!pl->mac_resume_phy_separately)
> +			phy_start(pl->phydev);
> +		else
> +			pl->mac_resume_phy_separately = false;
> +	}
>  	if (pl->sfp_bus)
>  		sfp_upstream_start(pl->sfp_bus);
>  }
> @@ -2024,6 +2031,27 @@ void phylink_suspend(struct phylink *pl, bool mac_wol)
>  }
>  EXPORT_SYMBOL_GPL(phylink_suspend);
>  
> +/**
> + * phylink_phy_resume() - resume phy alone
> + * @pl: a pointer to a &struct phylink returned from phylink_create()
> + *
> + * In the MAC driver using phylink, if the MAC needs the clock of the phy
> + * when it resumes, can call this function to resume the phy separately.
> + * Then proceed to MAC resume operations.
> + */
> +void phylink_phy_resume(struct phylink *pl)
> +{
> +	ASSERT_RTNL();
> +
> +	if (!test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)
> +	    && pl->phydev) {
> +		phy_start(pl->phydev);
> +		pl->mac_resume_phy_separately = true;
> +	}
> +

Minor nit: the empty line here is not needed.

Cheers,

Paolo
  
Russell King (Oracle) Feb. 23, 2023, 10:21 a.m. UTC | #4
On Thu, Feb 23, 2023 at 11:09:04AM +0100, Paolo Abeni wrote:
> On Thu, 2023-02-02 at 16:15 +0800, Clark Wang wrote:
> > Issue we met:
> > On some platforms, mac cannot work after resumed from the suspend with WoL
> > enabled.
> > 
> > The cause of the issue:
> > 1. phylink_resolve() is in a workqueue which will not be executed immediately.
> >    This is the call sequence:
> >        phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> >    For stmmac driver, mac_link_up() will set the correct speed/duplex...
> >    values which are from link_state.
> > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> >    phylink_resume(), because mac need phy rx_clk to do the reset.
> >    stmmac_core_init() is called in function stmmac_hw_setup(), which will
> >    reset the mac and set the speed/duplex... to default value.
> > Conclusion: Because phylink_resolve() cannot determine when it is called, it
> >             cannot be guaranteed to be called after stmmac_core_init().
> > 	    Once stmmac_core_init() is called after phylink_resolve(),
> > 	    the mac will be misconfigured and cannot be used.
> > 
> > In order to avoid this problem, add a function called phylink_phy_resume()
> > to resume phy separately. This eliminates the need to call phylink_resume()
> > before stmmac_hw_setup().
> > 
> > Add another judgement before called phy_start() in phylink_start(). This way
> > phy_start() will not be called multiple times when resumes. At the same time,
> > it may not affect other drivers that do not use phylink_phy_resume().
> > 
> > Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> > ---
> > V2 change:
> >  - add mac_resume_phy_separately flag to struct phylink to mark if the mac
> >    driver uses the phylink_phy_resume() first.
> > V3 change:
> >  - add brace to avoid ambiguous 'else'
> >    Reported-by: kernel test robot <lkp@intel.com>
> > ---
> >  drivers/net/phy/phylink.c | 32 ++++++++++++++++++++++++++++++--
> >  include/linux/phylink.h   |  1 +
> >  2 files changed, 31 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index 319790221d7f..c2fe66f0b78f 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -80,6 +80,8 @@ struct phylink {
> >  	DECLARE_PHY_INTERFACE_MASK(sfp_interfaces);
> >  	__ETHTOOL_DECLARE_LINK_MODE_MASK(sfp_support);
> >  	u8 sfp_port;
> > +
> > +	bool mac_resume_phy_separately;
> >  };
> >  
> >  #define phylink_printk(level, pl, fmt, ...) \
> > @@ -1509,6 +1511,7 @@ struct phylink *phylink_create(struct phylink_config *config,
> >  		return ERR_PTR(-EINVAL);
> >  	}
> >  
> > +	pl->mac_resume_phy_separately = false;
> >  	pl->using_mac_select_pcs = using_mac_select_pcs;
> >  	pl->phy_state.interface = iface;
> >  	pl->link_interface = iface;
> > @@ -1943,8 +1946,12 @@ void phylink_start(struct phylink *pl)
> >  	}
> >  	if (poll)
> >  		mod_timer(&pl->link_poll, jiffies + HZ);
> > -	if (pl->phydev)
> > -		phy_start(pl->phydev);
> > +	if (pl->phydev) {
> > +		if (!pl->mac_resume_phy_separately)
> > +			phy_start(pl->phydev);
> > +		else
> > +			pl->mac_resume_phy_separately = false;
> > +	}
> >  	if (pl->sfp_bus)
> >  		sfp_upstream_start(pl->sfp_bus);
> >  }
> > @@ -2024,6 +2031,27 @@ void phylink_suspend(struct phylink *pl, bool mac_wol)
> >  }
> >  EXPORT_SYMBOL_GPL(phylink_suspend);
> >  
> > +/**
> > + * phylink_phy_resume() - resume phy alone
> > + * @pl: a pointer to a &struct phylink returned from phylink_create()
> > + *
> > + * In the MAC driver using phylink, if the MAC needs the clock of the phy
> > + * when it resumes, can call this function to resume the phy separately.
> > + * Then proceed to MAC resume operations.
> > + */
> > +void phylink_phy_resume(struct phylink *pl)
> > +{
> > +	ASSERT_RTNL();
> > +
> > +	if (!test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)
> > +	    && pl->phydev) {
> > +		phy_start(pl->phydev);
> > +		pl->mac_resume_phy_separately = true;
> > +	}
> > +
> 
> Minor nit: the empty line here is not needed.

The author appears to have become non-responsive after sending half of
the two patch series, and hasn't addressed previous feedback.

In any case, someone else was also having similar issues with stmmac,
and proposing different patches, so I've requested that they work
together to solve what looks like one common problem, instead of us
ending up with two patch series potentially addressing that same
issue.
  
Clark Wang Feb. 23, 2023, 10:27 a.m. UTC | #5
> -----Original Message-----
> From: Russell King <linux@armlinux.org.uk>
> Sent: 2023年2月23日 18:22
> To: Paolo Abeni <pabeni@redhat.com>
> Cc: Clark Wang <xiaoning.wang@nxp.com>; peppe.cavallaro@st.com;
> alexandre.torgue@foss.st.com; joabreu@synopsys.com;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> mcoquelin.stm32@gmail.com; andrew@lunn.ch; hkallweit1@gmail.com;
> netdev@vger.kernel.org; linux-stm32@st-md-mailman.stormreply.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> dl-linux-imx <linux-imx@nxp.com>
> Subject: Re: [PATCH V3 1/2] net: phylink: add a function to resume phy alone
> to fix resume issue with WoL enabled
> 
> On Thu, Feb 23, 2023 at 11:09:04AM +0100, Paolo Abeni wrote:
> > On Thu, 2023-02-02 at 16:15 +0800, Clark Wang wrote:
> > > Issue we met:
> > > On some platforms, mac cannot work after resumed from the suspend
> with WoL
> > > enabled.
> > >
> > > The cause of the issue:
> > > 1. phylink_resolve() is in a workqueue which will not be executed
> immediately.
> > >    This is the call sequence:
> > >
> phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > >    For stmmac driver, mac_link_up() will set the correct speed/duplex...
> > >    values which are from link_state.
> > > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > >    phylink_resume(), because mac need phy rx_clk to do the reset.
> > >    stmmac_core_init() is called in function stmmac_hw_setup(), which
> will
> > >    reset the mac and set the speed/duplex... to default value.
> > > Conclusion: Because phylink_resolve() cannot determine when it is called,
> it
> > >             cannot be guaranteed to be called after
> stmmac_core_init().
> > > 	    Once stmmac_core_init() is called after phylink_resolve(),
> > > 	    the mac will be misconfigured and cannot be used.
> > >
> > > In order to avoid this problem, add a function called
> phylink_phy_resume()
> > > to resume phy separately. This eliminates the need to call
> phylink_resume()
> > > before stmmac_hw_setup().
> > >
> > > Add another judgement before called phy_start() in phylink_start(). This
> way
> > > phy_start() will not be called multiple times when resumes. At the same
> time,
> > > it may not affect other drivers that do not use phylink_phy_resume().
> > >
> > > Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> > > ---
> > > V2 change:
> > >  - add mac_resume_phy_separately flag to struct phylink to mark if the
> mac
> > >    driver uses the phylink_phy_resume() first.
> > > V3 change:
> > >  - add brace to avoid ambiguous 'else'
> > >    Reported-by: kernel test robot <lkp@intel.com>
> > > ---
> > >  drivers/net/phy/phylink.c | 32 ++++++++++++++++++++++++++++++--
> > >  include/linux/phylink.h   |  1 +
> > >  2 files changed, 31 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > > index 319790221d7f..c2fe66f0b78f 100644
> > > --- a/drivers/net/phy/phylink.c
> > > +++ b/drivers/net/phy/phylink.c
> > > @@ -80,6 +80,8 @@ struct phylink {
> > >  	DECLARE_PHY_INTERFACE_MASK(sfp_interfaces);
> > >  	__ETHTOOL_DECLARE_LINK_MODE_MASK(sfp_support);
> > >  	u8 sfp_port;
> > > +
> > > +	bool mac_resume_phy_separately;
> > >  };
> > >
> > >  #define phylink_printk(level, pl, fmt, ...) \
> > > @@ -1509,6 +1511,7 @@ struct phylink *phylink_create(struct
> phylink_config *config,
> > >  		return ERR_PTR(-EINVAL);
> > >  	}
> > >
> > > +	pl->mac_resume_phy_separately = false;
> > >  	pl->using_mac_select_pcs = using_mac_select_pcs;
> > >  	pl->phy_state.interface = iface;
> > >  	pl->link_interface = iface;
> > > @@ -1943,8 +1946,12 @@ void phylink_start(struct phylink *pl)
> > >  	}
> > >  	if (poll)
> > >  		mod_timer(&pl->link_poll, jiffies + HZ);
> > > -	if (pl->phydev)
> > > -		phy_start(pl->phydev);
> > > +	if (pl->phydev) {
> > > +		if (!pl->mac_resume_phy_separately)
> > > +			phy_start(pl->phydev);
> > > +		else
> > > +			pl->mac_resume_phy_separately = false;
> > > +	}
> > >  	if (pl->sfp_bus)
> > >  		sfp_upstream_start(pl->sfp_bus);
> > >  }
> > > @@ -2024,6 +2031,27 @@ void phylink_suspend(struct phylink *pl, bool
> mac_wol)
> > >  }
> > >  EXPORT_SYMBOL_GPL(phylink_suspend);
> > >
> > > +/**
> > > + * phylink_phy_resume() - resume phy alone
> > > + * @pl: a pointer to a &struct phylink returned from phylink_create()
> > > + *
> > > + * In the MAC driver using phylink, if the MAC needs the clock of the phy
> > > + * when it resumes, can call this function to resume the phy separately.
> > > + * Then proceed to MAC resume operations.
> > > + */
> > > +void phylink_phy_resume(struct phylink *pl)
> > > +{
> > > +	ASSERT_RTNL();
> > > +
> > > +	if (!test_bit(PHYLINK_DISABLE_MAC_WOL,
> &pl->phylink_disable_state)
> > > +	    && pl->phydev) {
> > > +		phy_start(pl->phydev);
> > > +		pl->mac_resume_phy_separately = true;
> > > +	}
> > > +
> >
> > Minor nit: the empty line here is not needed.
> 
> The author appears to have become non-responsive after sending half of
> the two patch series, and hasn't addressed previous feedback.
> 
> In any case, someone else was also having similar issues with stmmac,
> and proposing different patches, so I've requested that they work
> together to solve what looks like one common problem, instead of us
> ending up with two patch series potentially addressing that same
> issue.


Hi Russel,

I have sent the V4 patch set yesterday.
You can check it from: https://lore.kernel.org/linux-arm-kernel/20230222092636.1984847-2-xiaoning.wang@nxp.com/T/


Thanks.

Best Regards,
Clark Wang
> 
> --
> RMK's Patch system:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .armlinux.org.uk%2Fdeveloper%2Fpatches%2F&data=05%7C01%7Cxiaoning.
> wang%40nxp.com%7C675fb3cfa68f4424734008db1587c30d%7C686ea1d3b
> c2b4c6fa92cd99c5c301635%7C0%7C0%7C638127445057658328%7CUnkno
> wn%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1
> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=k3ZozZTq%2BRIik67
> %2FQFCAEQ7IDb%2FRAPJBXbigLgtbsSU%3D&reserved=0
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
  
Russell King (Oracle) Feb. 23, 2023, 11:06 a.m. UTC | #6
On Thu, Feb 23, 2023 at 10:27:06AM +0000, Clark Wang wrote:
> Hi Russel,
> 
> I have sent the V4 patch set yesterday.
> You can check it from: https://lore.kernel.org/linux-arm-kernel/20230222092636.1984847-2-xiaoning.wang@nxp.com/T/
> 

Ah yes, sent while net-next is closed.

Have you had any contact with Clément Léger ? If not, please can you
reach out to Clément, because he has virtually the same problem. I
don't want to end up with a load of different fixes in the mainline
kernel for the same "we need the PHY clock enabled on stmmac" problem
from different people.

Please try to come up with one patch set between you both to fix this.

(effectively, that's a temporary NAK on your series.)
  
Alexis Lothoré Aug. 10, 2023, 4:10 p.m. UTC | #7
Hello Clark, Russell,

On 2/23/23 12:06, Russell King (Oracle) wrote:
> On Thu, Feb 23, 2023 at 10:27:06AM +0000, Clark Wang wrote:
>> Hi Russel,
>>
>> I have sent the V4 patch set yesterday.
>> You can check it from: https://lore.kernel.org/linux-arm-kernel/20230222092636.1984847-2-xiaoning.wang@nxp.com/T/
>>
> 
> Ah yes, sent while net-next is closed.
> 
> Have you had any contact with Clément Léger ? If not, please can you
> reach out to Clément, because he has virtually the same problem. I
> don't want to end up with a load of different fixes in the mainline
> kernel for the same "we need the PHY clock enabled on stmmac" problem
> from different people.

I am resuming Clement's initial efforts on RZN1 GMAC interface, which indeed is
in need of an early PCS initialization mechanism too ([1]).

> Please try to come up with one patch set between you both to fix this.
> 
> (effectively, that's a temporary NAK on your series.)>

I would like to know if this series is still ongoing/alive ? I have checked for
follow-ups after V4 sent by Clark ([2]), but did not find anything. Clement
handed me over the topic right when Russell suggested to discuss this shared
need, so I am not sure if any mutualization discussion has happened yet ?

If not, what would be the next steps ? Based on my understanding and comments on
the [2] v3, I feel that Clark's series would be a good starting point. In order
to be able to use it in both series, we could possibly make it less specific to
the "resume" mechanism (basically, phylink_phy_resume() =>
phylink_phy_early_start() ) ? It would then prevent [1] from moving the whole
phylink_start() in stmmac_main too early (see issue raised by Russell) and allow
to just call phylink_phy_early_start() early enough, while still being usable in
the resume scenario raised by Clark. Or am I missing bigger issues with current
series ?

Regards,
Alexis

[1]
https://lore.kernel.org/linux-arm-kernel/20230116103926.276869-1-clement.leger@bootlin.com/
[2] https://lore.kernel.org/all/20230222092636.1984847-1-xiaoning.wang@nxp.com/
  
Russell King (Oracle) Aug. 10, 2023, 4:28 p.m. UTC | #8
On Thu, Aug 10, 2023 at 06:10:04PM +0200, Alexis Lothoré wrote:
> Hello Clark, Russell,
> 
> On 2/23/23 12:06, Russell King (Oracle) wrote:
> > On Thu, Feb 23, 2023 at 10:27:06AM +0000, Clark Wang wrote:
> >> Hi Russel,
> >>
> >> I have sent the V4 patch set yesterday.
> >> You can check it from: https://lore.kernel.org/linux-arm-kernel/20230222092636.1984847-2-xiaoning.wang@nxp.com/T/
> >>
> > 
> > Ah yes, sent while net-next is closed.
> > 
> > Have you had any contact with Clément Léger ? If not, please can you
> > reach out to Clément, because he has virtually the same problem. I
> > don't want to end up with a load of different fixes in the mainline
> > kernel for the same "we need the PHY clock enabled on stmmac" problem
> > from different people.
> 
> I am resuming Clement's initial efforts on RZN1 GMAC interface, which indeed is
> in need of an early PCS initialization mechanism too ([1]).
> 
> > Please try to come up with one patch set between you both to fix this.
> > 
> > (effectively, that's a temporary NAK on your series.)>
> 
> I would like to know if this series is still ongoing/alive ? I have checked for
> follow-ups after V4 sent by Clark ([2]), but did not find anything. Clement
> handed me over the topic right when Russell suggested to discuss this shared
> need, so I am not sure if any mutualization discussion has happened yet ?
> 
> If not, what would be the next steps ? Based on my understanding and comments on
> the [2] v3, I feel that Clark's series would be a good starting point. In order
> to be able to use it in both series, we could possibly make it less specific to
> the "resume" mechanism (basically, phylink_phy_resume() =>
> phylink_phy_early_start() ) ? It would then prevent [1] from moving the whole
> phylink_start() in stmmac_main too early (see issue raised by Russell) and allow
> to just call phylink_phy_early_start() early enough, while still being usable in
> the resume scenario raised by Clark. Or am I missing bigger issues with current
> series ?

The whole thing died a death as soon as I suggested that the two parties
work together, so currently as far as I'm concerned, the issue is dead
and no patches have been merged to fix it.

As I stated, I don't want to merge one solution, and then have the other
solution then come along later... the simple answer would have been for
party A to test party B's changes to see whether they solved the
problem, but clearly that never happened.

If there's an unwillingness to work together to solve a common problem,
then the problem will remain unsolved.

Note that we also have an ongoing discussion because of the AR803x PHYs
and their default-enabled hibernation mode, for which I've proposed
this patch. I haven't considered whether it should impact the resume
problem - it probably _should_ and it should probably cause the PHY to
resume outputting its clock when it resumes (which should have already
happened by the time stmmac begins resuming.)

However, as no one seems prepared to constructively comment on either
my proposal nor (so far) the patch, there's no guarantee that we'll
merge the change below.

So, right now I've no idea what's going to become of stmmac and its
requirement to have RXC always present. It seems there's multiple
issues that that requirement causes.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index fcab363d8dfa..a954f1d61709 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1254,6 +1254,11 @@ static int stmmac_phy_setup(struct stmmac_priv *priv)
 			~(MAC_10HD | MAC_100HD | MAC_1000HD);
 	priv->phylink_config.mac_managed_pm = true;
 
+	/* stmmac always requires a receive clock in order for things like
+	 * hardware reset to work.
+	 */
+	priv->phylink_config.mac_requires_rxc = true;
+
 	phylink = phylink_create(&priv->phylink_config, fwnode,
 				 mode, &stmmac_phylink_mac_ops);
 	if (IS_ERR(phylink))
diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 13c4121fa309..619a63a0d14f 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -990,7 +990,8 @@ static int at803x_hibernation_mode_config(struct phy_device *phydev)
 	/* The default after hardware reset is hibernation mode enabled. After
 	 * software reset, the value is retained.
 	 */
-	if (!(priv->flags & AT803X_DISABLE_HIBERNATION_MODE))
+	if (!(priv->flags & AT803X_DISABLE_HIBERNATION_MODE) &&
+	    !(phydev->dev_flags & PHY_F_RXC_ALWAYS_ON))
 		return 0;
 
 	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_HIB_CTRL,
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4f1c8bb199e9..6568a2759101 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1830,6 +1830,8 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
 static int phylink_attach_phy(struct phylink *pl, struct phy_device *phy,
 			      phy_interface_t interface)
 {
+	u32 flags = 0;
+
 	if (WARN_ON(pl->cfg_link_an_mode == MLO_AN_FIXED ||
 		    (pl->cfg_link_an_mode == MLO_AN_INBAND &&
 		     phy_interface_mode_is_8023z(interface) && !pl->sfp_bus)))
@@ -1838,7 +1840,10 @@ static int phylink_attach_phy(struct phylink *pl, struct phy_device *phy,
 	if (pl->phydev)
 		return -EBUSY;
 
-	return phy_attach_direct(pl->netdev, phy, 0, interface);
+	if (pl->config.mac_requires_rxc)
+		flags |= PHY_F_RXC_ALWAYS_ON;
+
+	return phy_attach_direct(pl->netdev, phy, flags, interface);
 }
 
 /**
@@ -1941,6 +1946,9 @@ int phylink_fwnode_phy_connect(struct phylink *pl,
 		pl->link_config.interface = pl->link_interface;
 	}
 
+	if (pl->config.mac_requires_rxc)
+		flags |= PHY_F_RXC_ALWAYS_ON;
+
 	ret = phy_attach_direct(pl->netdev, phy_dev, flags,
 				pl->link_interface);
 	phy_device_free(phy_dev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index ba08b0e60279..79df5e01707d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -761,6 +761,7 @@ struct phy_device {
 
 /* Generic phy_device::dev_flags */
 #define PHY_F_NO_IRQ		0x80000000
+#define PHY_F_RXC_ALWAYS_ON	BIT(30)
 
 static inline struct phy_device *to_phy_device(const struct device *dev)
 {
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 789c516c6b4a..a83c1a77338f 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -204,6 +204,7 @@ enum phylink_op_type {
  * @poll_fixed_state: if true, starts link_poll,
  *		      if MAC link is at %MLO_AN_FIXED mode.
  * @mac_managed_pm: if true, indicate the MAC driver is responsible for PHY PM.
+ * @mac_requires_rxc: if true, the MAC always requires a receive clock from PHY.
  * @ovr_an_inband: if true, override PCS to MLO_AN_INBAND
  * @get_fixed_state: callback to execute to determine the fixed link state,
  *		     if MAC link is at %MLO_AN_FIXED mode.
@@ -216,6 +217,7 @@ struct phylink_config {
 	enum phylink_op_type type;
 	bool poll_fixed_state;
 	bool mac_managed_pm;
+	bool mac_requires_rxc;
 	bool ovr_an_inband;
 	void (*get_fixed_state)(struct phylink_config *config,
 				struct phylink_link_state *state);
  
Clark Wang Aug. 11, 2023, 3:39 a.m. UTC | #9
Hi Alexis,

I am so sorry. Clement happened to be busy at the end of March, and I was busy with other projects later, so this issue was shelved.
If you have now taken over the job. I can work with you to solve this problem.

The information he gave me at that time was saying "My need is for the PCS to be configured before the stmmac_hw_setup() is phylink_major_config().". So the reason why his patch put phylink_start() before stmmac_hw_setup() is to ensure to execute the phylink_mac_initial_config() function in phylink_start() before MAC reset, right?

But his patch only changed stmmac_open(), that is, it only affected the sequence after the first startup.
Have you tested the system suspend/resume?
In the resume process, phylink_major_config() is called in phylink_resolve(). It is a work. In some cases on some of our platforms, this work may not be executed in time, which will lead to the lack of RXC, which will cause the MAC reset to fail.

Best Regards,
Clark Wang

> -----Original Message-----
> From: Russell King <linux@armlinux.org.uk>
> Sent: 2023年8月11日 0:29
> To: Alexis Lothoré <alexis.lothore@bootlin.com>
> Cc: Clark Wang <xiaoning.wang@nxp.com>; Paolo Abeni
> <pabeni@redhat.com>; peppe.cavallaro@st.com;
> alexandre.torgue@foss.st.com; joabreu@synopsys.com;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> mcoquelin.stm32@gmail.com; andrew@lunn.ch; hkallweit1@gmail.com;
> netdev@vger.kernel.org; linux-stm32@st-md-mailman.stormreply.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> dl-linux-imx <linux-imx@nxp.com>; Thomas Petazzoni
> <thomas.petazzoni@bootlin.com>
> Subject: Re: [PATCH V3 1/2] net: phylink: add a function to resume phy alone to
> fix resume issue with WoL enabled
>
> On Thu, Aug 10, 2023 at 06:10:04PM +0200, Alexis Lothoré wrote:
> > Hello Clark, Russell,
> >
> > On 2/23/23 12:06, Russell King (Oracle) wrote:
> > > On Thu, Feb 23, 2023 at 10:27:06AM +0000, Clark Wang wrote:
> > >> Hi Russel,
> > >>
> > >> I have sent the V4 patch set yesterday.
> > >> You can check it from:
> https://lore.ker/
> nel.org%2Flinux-arm-kernel%2F20230222092636.1984847-2-xiaoning.wang%4
> 0nxp.com%2FT%2F&data=05%7C01%7Cxiaoning.wang%40nxp.com%7Ccc3388
> b42e244474ceb608db99bedff0%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0
> %7C0%7C638272817295048537%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4
> wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7
> C%7C%7C&sdata=k3kDU5bfPF6eVBjhEAKbUmbO%2FU%2BwwBCMRVW0%2B
> w5a0D0%3D&reserved=0
> > >>
> > >
> > > Ah yes, sent while net-next is closed.
> > >
> > > Have you had any contact with Clément Léger ? If not, please can you
> > > reach out to Clément, because he has virtually the same problem. I
> > > don't want to end up with a load of different fixes in the mainline
> > > kernel for the same "we need the PHY clock enabled on stmmac" problem
> > > from different people.
> >
> > I am resuming Clement's initial efforts on RZN1 GMAC interface, which
> indeed is
> > in need of an early PCS initialization mechanism too ([1]).
> >
> > > Please try to come up with one patch set between you both to fix this.
> > >
> > > (effectively, that's a temporary NAK on your series.)>
> >
> > I would like to know if this series is still ongoing/alive ? I have checked for
> > follow-ups after V4 sent by Clark ([2]), but did not find anything. Clement
> > handed me over the topic right when Russell suggested to discuss this shared
> > need, so I am not sure if any mutualization discussion has happened yet ?
> >
> > If not, what would be the next steps ? Based on my understanding and
> comments on
> > the [2] v3, I feel that Clark's series would be a good starting point. In order
> > to be able to use it in both series, we could possibly make it less specific to
> > the "resume" mechanism (basically, phylink_phy_resume() =>
> > phylink_phy_early_start() ) ? It would then prevent [1] from moving the whole
> > phylink_start() in stmmac_main too early (see issue raised by Russell) and
> allow
> > to just call phylink_phy_early_start() early enough, while still being usable in
> > the resume scenario raised by Clark. Or am I missing bigger issues with
> current
> > series ?
>
> The whole thing died a death as soon as I suggested that the two parties
> work together, so currently as far as I'm concerned, the issue is dead
> and no patches have been merged to fix it.
>
> As I stated, I don't want to merge one solution, and then have the other
> solution then come along later... the simple answer would have been for
> party A to test party B's changes to see whether they solved the
> problem, but clearly that never happened.
>
> If there's an unwillingness to work together to solve a common problem,
> then the problem will remain unsolved.
>
> Note that we also have an ongoing discussion because of the AR803x PHYs
> and their default-enabled hibernation mode, for which I've proposed
> this patch. I haven't considered whether it should impact the resume
> problem - it probably _should_ and it should probably cause the PHY to
> resume outputting its clock when it resumes (which should have already
> happened by the time stmmac begins resuming.)
>
> However, as no one seems prepared to constructively comment on either
> my proposal nor (so far) the patch, there's no guarantee that we'll
> merge the change below.
>
> So, right now I've no idea what's going to become of stmmac and its
> requirement to have RXC always present. It seems there's multiple
> issues that that requirement causes.
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index fcab363d8dfa..a954f1d61709 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1254,6 +1254,11 @@ static int stmmac_phy_setup(struct stmmac_priv
> *priv)
>                       ~(MAC_10HD | MAC_100HD | MAC_1000HD);
>       priv->phylink_config.mac_managed_pm = true;
>
> +     /* stmmac always requires a receive clock in order for things like
> +      * hardware reset to work.
> +      */
> +     priv->phylink_config.mac_requires_rxc = true;
> +
>       phylink = phylink_create(&priv->phylink_config, fwnode,
>                                mode, &stmmac_phylink_mac_ops);
>       if (IS_ERR(phylink))
> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
> index 13c4121fa309..619a63a0d14f 100644
> --- a/drivers/net/phy/at803x.c
> +++ b/drivers/net/phy/at803x.c
> @@ -990,7 +990,8 @@ static int at803x_hibernation_mode_config(struct
> phy_device *phydev)
>       /* The default after hardware reset is hibernation mode enabled. After
>        * software reset, the value is retained.
>        */
> -     if (!(priv->flags & AT803X_DISABLE_HIBERNATION_MODE))
> +     if (!(priv->flags & AT803X_DISABLE_HIBERNATION_MODE) &&
> +         !(phydev->dev_flags & PHY_F_RXC_ALWAYS_ON))
>               return 0;
>
>       return at803x_debug_reg_mask(phydev,
> AT803X_DEBUG_REG_HIB_CTRL,
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 4f1c8bb199e9..6568a2759101 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1830,6 +1830,8 @@ static int phylink_bringup_phy(struct phylink *pl,
> struct phy_device *phy,
>  static int phylink_attach_phy(struct phylink *pl, struct phy_device *phy,
>                             phy_interface_t interface)
>  {
> +     u32 flags = 0;
> +
>       if (WARN_ON(pl->cfg_link_an_mode == MLO_AN_FIXED ||
>                   (pl->cfg_link_an_mode == MLO_AN_INBAND &&
>                    phy_interface_mode_is_8023z(interface) && !pl->sfp_bus)))
> @@ -1838,7 +1840,10 @@ static int phylink_attach_phy(struct phylink *pl,
> struct phy_device *phy,
>       if (pl->phydev)
>               return -EBUSY;
>
> -     return phy_attach_direct(pl->netdev, phy, 0, interface);
> +     if (pl->config.mac_requires_rxc)
> +             flags |= PHY_F_RXC_ALWAYS_ON;
> +
> +     return phy_attach_direct(pl->netdev, phy, flags, interface);
>  }
>
>  /**
> @@ -1941,6 +1946,9 @@ int phylink_fwnode_phy_connect(struct phylink *pl,
>               pl->link_config.interface = pl->link_interface;
>       }
>
> +     if (pl->config.mac_requires_rxc)
> +             flags |= PHY_F_RXC_ALWAYS_ON;
> +
>       ret = phy_attach_direct(pl->netdev, phy_dev, flags,
>                               pl->link_interface);
>       phy_device_free(phy_dev);
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index ba08b0e60279..79df5e01707d 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -761,6 +761,7 @@ struct phy_device {
>
>  /* Generic phy_device::dev_flags */
>  #define PHY_F_NO_IRQ         0x80000000
> +#define PHY_F_RXC_ALWAYS_ON  BIT(30)
>
>  static inline struct phy_device *to_phy_device(const struct device *dev)
>  {
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index 789c516c6b4a..a83c1a77338f 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -204,6 +204,7 @@ enum phylink_op_type {
>   * @poll_fixed_state: if true, starts link_poll,
>   *                 if MAC link is at %MLO_AN_FIXED mode.
>   * @mac_managed_pm: if true, indicate the MAC driver is responsible for
> PHY PM.
> + * @mac_requires_rxc: if true, the MAC always requires a receive clock from
> PHY.
>   * @ovr_an_inband: if true, override PCS to MLO_AN_INBAND
>   * @get_fixed_state: callback to execute to determine the fixed link state,
>   *                if MAC link is at %MLO_AN_FIXED mode.
> @@ -216,6 +217,7 @@ struct phylink_config {
>       enum phylink_op_type type;
>       bool poll_fixed_state;
>       bool mac_managed_pm;
> +     bool mac_requires_rxc;
>       bool ovr_an_inband;
>       void (*get_fixed_state)(struct phylink_config *config,
>                               struct phylink_link_state *state);
>
> --
> RMK's Patch system:
> https://www.ar/
> mlinux.org.uk%2Fdeveloper%2Fpatches%2F&data=05%7C01%7Cxiaoning.wan
> g%40nxp.com%7Ccc3388b42e244474ceb608db99bedff0%7C686ea1d3bc2b4c6
> fa92cd99c5c301635%7C0%7C0%7C638272817295048537%7CUnknown%7CT
> WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJ
> XVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2FLZFtiTfr1ZM%2B7OZ%2FM7I
> JkZH42DVgcNMZx8VohBSv38%3D&reserved=0
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
  

Patch

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 319790221d7f..c2fe66f0b78f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -80,6 +80,8 @@  struct phylink {
 	DECLARE_PHY_INTERFACE_MASK(sfp_interfaces);
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(sfp_support);
 	u8 sfp_port;
+
+	bool mac_resume_phy_separately;
 };
 
 #define phylink_printk(level, pl, fmt, ...) \
@@ -1509,6 +1511,7 @@  struct phylink *phylink_create(struct phylink_config *config,
 		return ERR_PTR(-EINVAL);
 	}
 
+	pl->mac_resume_phy_separately = false;
 	pl->using_mac_select_pcs = using_mac_select_pcs;
 	pl->phy_state.interface = iface;
 	pl->link_interface = iface;
@@ -1943,8 +1946,12 @@  void phylink_start(struct phylink *pl)
 	}
 	if (poll)
 		mod_timer(&pl->link_poll, jiffies + HZ);
-	if (pl->phydev)
-		phy_start(pl->phydev);
+	if (pl->phydev) {
+		if (!pl->mac_resume_phy_separately)
+			phy_start(pl->phydev);
+		else
+			pl->mac_resume_phy_separately = false;
+	}
 	if (pl->sfp_bus)
 		sfp_upstream_start(pl->sfp_bus);
 }
@@ -2024,6 +2031,27 @@  void phylink_suspend(struct phylink *pl, bool mac_wol)
 }
 EXPORT_SYMBOL_GPL(phylink_suspend);
 
+/**
+ * phylink_phy_resume() - resume phy alone
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * In the MAC driver using phylink, if the MAC needs the clock of the phy
+ * when it resumes, can call this function to resume the phy separately.
+ * Then proceed to MAC resume operations.
+ */
+void phylink_phy_resume(struct phylink *pl)
+{
+	ASSERT_RTNL();
+
+	if (!test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)
+	    && pl->phydev) {
+		phy_start(pl->phydev);
+		pl->mac_resume_phy_separately = true;
+	}
+
+}
+EXPORT_SYMBOL_GPL(phylink_phy_resume);
+
 /**
  * phylink_resume() - handle a network device resume event
  * @pl: a pointer to a &struct phylink returned from phylink_create()
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index c492c26202b5..6edfab5f754c 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -589,6 +589,7 @@  void phylink_stop(struct phylink *);
 
 void phylink_suspend(struct phylink *pl, bool mac_wol);
 void phylink_resume(struct phylink *pl);
+void phylink_phy_resume(struct phylink *pl);
 
 void phylink_ethtool_get_wol(struct phylink *, struct ethtool_wolinfo *);
 int phylink_ethtool_set_wol(struct phylink *, struct ethtool_wolinfo *);