[net-next,0/8] Improvements for RealTek 2.5G Ethernet PHYs

Message ID cover.1683756691.git.daniel@makrotopia.org
Headers
Series Improvements for RealTek 2.5G Ethernet PHYs |

Message

Daniel Golle May 10, 2023, 10:53 p.m. UTC
  Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
The PHYs can operate with Clause-22 and Clause-45 MDIO.

When using Clause-45 it is desireable to avoid rate-adapter mode and
rather have the MAC interface mode follow the PHY speed. The PHYs
support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.

Also prepare support for proprietary RealTek HiSGMII mode which will
be needed for situations when used with RealTek switch or router SoCs
such as RTL839x or RTL93xx.

Add support for Link Down Power Saving Mode (ALDPS) which is already
supported for older RTL821x series 1GbE PHYs.

Make sure that link-partner advertised modes are only used if the
advertisement can be considered valid. Otherwise we are seeing
false-positives warning about downscaling eventhough higher speeds
are not actually advertised by the link partner.

While at it, use helper function for paged operation and make sure
to use use locking for that as well.

Changes since RFC:
 * Turns out paged read used to identify the PHY needs to be hardcoded
   for the simple reason that the function pointers for paged operations
   have not yet been populated at this point. Hence keep open-coding it,
   but use helper function and make sure it happening while the MDIO bus
   mutex is locked.

Alexander Couzens (1):
  net: phy: realtek: rtl8221: allow to configure SERDES mode

Chukun Pan (1):
  net: phy: realtek: switch interface mode for RTL822x series

Daniel Golle (6):
  net: phy: realtek: use genphy_soft_reset for 2.5G PHYs
  net: phy: realtek: disable SGMII in-band AN for 2.5G PHYs
  net: phy: realtek: make sure paged read is protected by mutex
  net: phy: realtek: use inline functions for 10GbE advertisement
  net: phy: realtek: check validity of 10GbE link-partner advertisement
  net: phy: realtek: setup ALDPS on RTL8221B

 drivers/net/phy/realtek.c | 161 ++++++++++++++++++++++++++++++++------
 1 file changed, 138 insertions(+), 23 deletions(-)
  

Comments

Andrew Lunn May 11, 2023, 12:28 a.m. UTC | #1
On Thu, May 11, 2023 at 12:53:22AM +0200, Daniel Golle wrote:
> Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> The PHYs can operate with Clause-22 and Clause-45 MDIO.
> 
> When using Clause-45 it is desireable to avoid rate-adapter mode and
> rather have the MAC interface mode follow the PHY speed. The PHYs
> support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.

I don't see what clause-45 has to do with this. The driver knows that
both C22 and C45 addresses spaces exists in the hardware. It can do
reads/writes on both. If the bus master does not support C45, C45 over
C22 will be performed by the core.

    Andrew
  
Heiner Kallweit May 11, 2023, 5:29 a.m. UTC | #2
On 11.05.2023 00:53, Daniel Golle wrote:
> Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> The PHYs can operate with Clause-22 and Clause-45 MDIO.
> 
> When using Clause-45 it is desireable to avoid rate-adapter mode and
> rather have the MAC interface mode follow the PHY speed. The PHYs
> support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.
> 
> Also prepare support for proprietary RealTek HiSGMII mode which will
> be needed for situations when used with RealTek switch or router SoCs
> such as RTL839x or RTL93xx.
> 
> Add support for Link Down Power Saving Mode (ALDPS) which is already
> supported for older RTL821x series 1GbE PHYs.
> 
> Make sure that link-partner advertised modes are only used if the
> advertisement can be considered valid. Otherwise we are seeing
> false-positives warning about downscaling eventhough higher speeds
> are not actually advertised by the link partner.
> 
> While at it, use helper function for paged operation and make sure
> to use use locking for that as well.
> 
> Changes since RFC:
>  * Turns out paged read used to identify the PHY needs to be hardcoded
>    for the simple reason that the function pointers for paged operations
>    have not yet been populated at this point. Hence keep open-coding it,
>    but use helper function and make sure it happening while the MDIO bus
>    mutex is locked.
> 
> Alexander Couzens (1):
>   net: phy: realtek: rtl8221: allow to configure SERDES mode
> 
> Chukun Pan (1):
>   net: phy: realtek: switch interface mode for RTL822x series
> 
> Daniel Golle (6):
>   net: phy: realtek: use genphy_soft_reset for 2.5G PHYs
>   net: phy: realtek: disable SGMII in-band AN for 2.5G PHYs
>   net: phy: realtek: make sure paged read is protected by mutex
>   net: phy: realtek: use inline functions for 10GbE advertisement
>   net: phy: realtek: check validity of 10GbE link-partner advertisement
>   net: phy: realtek: setup ALDPS on RTL8221B
> 
>  drivers/net/phy/realtek.c | 161 ++++++++++++++++++++++++++++++++------
>  1 file changed, 138 insertions(+), 23 deletions(-)
> 

Has this series been tested with RTL8125A/B to ensure that the internal
PHY use case still works?
  
Daniel Golle May 11, 2023, 11:44 a.m. UTC | #3
On Thu, May 11, 2023 at 07:29:21AM +0200, Heiner Kallweit wrote:
> On 11.05.2023 00:53, Daniel Golle wrote:
> > Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> > The PHYs can operate with Clause-22 and Clause-45 MDIO.
> > [...]
> 
> Has this series been tested with RTL8125A/B to ensure that the internal
> PHY use case still works?

The series has been present in OpenWrt for a while now and initially
contained a bug which broke the RTL8221 PCIe RealTek NICs. It has since been
resolved and re-tested, and it seems all fine:

https://github.com/openwrt/openwrt/commit/998b9731577dedc7747dcfa412e4543dabaaa131#r110201620

I assume that quite some OpenWrt users may use RTL8125B PCIe NICs, but I
have asked in the OpenWrt forum for testing results including this series:

https://forum.openwrt.org/t/nanopi-r6s-kernel-6-1-intergration/154677/3?u=daniel

As the r8169 driver is not using phylink and uses C22 to connect to the
PHY the main difference which will affect these devices is that
genphy_soft_reset will be called as a result of
r8169_hw_phy_config->phy_init_hw->(phydrv).soft_reset

Also note the r8169 driver always sets the interface mode to either
PHY_INTERFACE_MODE_GMII or PHY_INTERFACE_MODE_MII in
r8169_phy_connect() before calling phy_connect_direct(). While this is
certainly not technically correct for the 2.5G NICs in the strict sense,
it does have the desired effect that the newly introduced function
rtl8221b_config_init() just returns without making any changes.
  
Daniel Golle May 11, 2023, 5:14 p.m. UTC | #4
On Thu, May 11, 2023 at 02:28:15AM +0200, Andrew Lunn wrote:
> On Thu, May 11, 2023 at 12:53:22AM +0200, Daniel Golle wrote:
> > Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> > The PHYs can operate with Clause-22 and Clause-45 MDIO.
> > 
> > When using Clause-45 it is desireable to avoid rate-adapter mode and
> > rather have the MAC interface mode follow the PHY speed. The PHYs
> > support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.
> 
> I don't see what clause-45 has to do with this. The driver knows that
> both C22 and C45 addresses spaces exists in the hardware. It can do
> reads/writes on both. If the bus master does not support C45, C45 over
> C22 will be performed by the core.

My understanding is/was that switching the SerDes interface mode is only
intended with Clause-45 PHYs, derived from this comment and code:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c#n1661


Hence I concluded that for Clause-22 PHYs we expect the interface mode
to always remain the same, while many Clause-45 PHYs require switching
the SerDes interface mode depending on the speed of the external link.
Trying to use interface mode switching (in the .read_status function)
with is_c45 == false also just didn't work well:

https://github.com/openwrt/openwrt/pull/11990#issuecomment-1503160296


Up to 1000M this has no really been a problem, as the Cisco SGMII SerDes
supports 10M, 100M and 1000M speeds. Starting with 2500M things have
became more complicated, and we usually have the choice of either have
the MAC<->PHY link operate at a contant mode and speed (e.g. 2500Base-X)
or having to switch the MAC<->PHY interface mode (e.g. between
2500Base-X and SGMII) depending on whether the link speed on the
external interface is 2500M or not.

Looking at PHYs which support speeds beyond one gigabit/sec due to the
higher complexity and need for a larger register space most of them are
managed using Clause-45 MDIO. 2500Base-T PHYs are kind of the exception
because some of them (esp. RealTek) are still mostly being managed using
Clause-22 MDIO using proprietary paging mechanisms to enlarge the
register space.

I also don't like overloading the meaning of is_c45 to decide whether
rate-adapter mode should be used or not, neither do I like the idea to
tie the use of phylink to using SGMII in-band-status or not -- but at
this point both do correlate and there aren't any other feature flags or
validation methods to do it in a better way. In the end this can also
just be solved by documenation, ie. makeing sure that those facts are
well understood: interface mode switching only being supported when
using Clause-45 MDIO and also the fact that phylink expects operating
Cisco SGMII without in-band-status when connecting to a managed PHY.
  
Russell King (Oracle) May 11, 2023, 5:30 p.m. UTC | #5
On Thu, May 11, 2023 at 07:14:48PM +0200, Daniel Golle wrote:
> On Thu, May 11, 2023 at 02:28:15AM +0200, Andrew Lunn wrote:
> > On Thu, May 11, 2023 at 12:53:22AM +0200, Daniel Golle wrote:
> > > Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> > > The PHYs can operate with Clause-22 and Clause-45 MDIO.
> > > 
> > > When using Clause-45 it is desireable to avoid rate-adapter mode and
> > > rather have the MAC interface mode follow the PHY speed. The PHYs
> > > support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.
> > 
> > I don't see what clause-45 has to do with this. The driver knows that
> > both C22 and C45 addresses spaces exists in the hardware. It can do
> > reads/writes on both. If the bus master does not support C45, C45 over
> > C22 will be performed by the core.
> 
> My understanding is/was that switching the SerDes interface mode is only
> intended with Clause-45 PHYs, derived from this comment and code:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c#n1661

It's only because:

1) Clause 22 PHYs haven't done this.
2) There is currently no way to know what set of interfaces a PHY would
   make use of - and that affects what ethtool linkmodes are possible.

What you point to is nothing more than a hack to make Clause 45 PHYs
work with the code that we currently have.

To sort this properly, we need PHY drivers to tell phylink what
interfaces they are going to switch between once they have been
attached to the network interface. This is what these patches in my
net-queue branch are doing:

net: phy: add possible interfaces
net: phy: marvell10g: fill in possible_interfaces
net: phy: bcm84881: fill in possible_interfaces
net: phylink: split out PHY validation from phylink_bringup_phy()
net: phylink: validate only used interfaces for c45 PHYs

Why only C45 PHYs again? Because the two PHY drivers that I've added
support for "possible_interfaces" to are both C45. There's no reason
we can't make that work for C22 PHYs as well.

We could probably make it work for C22 PHYs out of the box by setting
the appropriate bit for the supplied interface in "possible_interfaces"
inside phy_attach_direct() after the call to phy_init_hw() if
"possible_interfaces" is still empty, which means that if a PHY driver
isn't updated to setup "possible_interfaces" then we get basically
whatever interface mode we're attaching with there.

There may be a problem if phy_attach_direct() gets called with
PHY_INTERFACE_MODE_NA (which I believe is possible with DSA.)
  
Russell King (Oracle) May 11, 2023, 6:09 p.m. UTC | #6
On Thu, May 11, 2023 at 06:30:57PM +0100, Russell King (Oracle) wrote:
> On Thu, May 11, 2023 at 07:14:48PM +0200, Daniel Golle wrote:
> > On Thu, May 11, 2023 at 02:28:15AM +0200, Andrew Lunn wrote:
> > > On Thu, May 11, 2023 at 12:53:22AM +0200, Daniel Golle wrote:
> > > > Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> > > > The PHYs can operate with Clause-22 and Clause-45 MDIO.
> > > > 
> > > > When using Clause-45 it is desireable to avoid rate-adapter mode and
> > > > rather have the MAC interface mode follow the PHY speed. The PHYs
> > > > support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.
> > > 
> > > I don't see what clause-45 has to do with this. The driver knows that
> > > both C22 and C45 addresses spaces exists in the hardware. It can do
> > > reads/writes on both. If the bus master does not support C45, C45 over
> > > C22 will be performed by the core.
> > 
> > My understanding is/was that switching the SerDes interface mode is only
> > intended with Clause-45 PHYs, derived from this comment and code:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c#n1661
> 
> It's only because:
> 
> 1) Clause 22 PHYs haven't done this.
> 2) There is currently no way to know what set of interfaces a PHY would
>    make use of - and that affects what ethtool linkmodes are possible.
> 
> What you point to is nothing more than a hack to make Clause 45 PHYs
> work with the code that we currently have.
> 
> To sort this properly, we need PHY drivers to tell phylink what
> interfaces they are going to switch between once they have been
> attached to the network interface. This is what these patches in my
> net-queue branch are doing:
> 
> net: phy: add possible interfaces
> net: phy: marvell10g: fill in possible_interfaces
> net: phy: bcm84881: fill in possible_interfaces
> net: phylink: split out PHY validation from phylink_bringup_phy()
> net: phylink: validate only used interfaces for c45 PHYs
> 
> Why only C45 PHYs again? Because the two PHY drivers that I've added
> support for "possible_interfaces" to are both C45. There's no reason
> we can't make that work for C22 PHYs as well.
> 
> We could probably make it work for C22 PHYs out of the box by setting
> the appropriate bit for the supplied interface in "possible_interfaces"
> inside phy_attach_direct() after the call to phy_init_hw() if
> "possible_interfaces" is still empty, which means that if a PHY driver
> isn't updated to setup "possible_interfaces" then we get basically
> whatever interface mode we're attaching with there.
> 
> There may be a problem if phy_attach_direct() gets called with
> PHY_INTERFACE_MODE_NA (which I believe is possible with DSA.)

Maybe something like the below on top of those patches I've pointed
to above? Note that this requires all MAC users of phylink to fill
in the supported_interfaces bitmap. One of the other patches in my
net-queue is:

net: phylink: require supported_interfaces to be filled

which comes before the above patches. I think that's a reasonable
expectation today but needs testing and review of all users (esp.
the DSA drivers.)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index af070be717ec..1cfa101960b9 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1787,8 +1787,26 @@ static int phylink_validate_phy(struct phylink *pl, struct phy_device *phy,
 	 */
 	state->rate_matching = phy_get_rate_matching(phy, state->interface);
 
-	/* If this is a clause 22 PHY or is using rate matching, it only
-	 * operates in a single mode.
+	/* If the PHY provides a bitmap of the interfaces it will be using,
+	 * use this to validate the PHY. This can be used for both clause 22
+	 * and clause 45 PHYs.
+	 */
+	if (!phy_interface_empty(phy->possible_interfaces)) {
+		/* Calculate the union of the interfaces the PHY supports in
+		 * its configured state, and the host's supported interfaces.
+		 * We never want an interface that isn't supported by the host.
+		 */
+		phy_interface_and(interfaces, phy->possible_interfaces,
+				  pl->config->supported_interfaces);
+
+		return phylink_validate_mask(pl, mode, supported, state,
+					     interfaces);
+	}
+
+	/* If the PHY doesn't provide it a bitmap of the interfaces it will
+	 * be using, or is a traditional clause 22 PHY driver that doesn't
+	 * set ->possible_interfaces, or if we're using rate matching, then
+	 * we're operating in a single mode.
 	 */
 	if (!phy->is_c45 || state->rate_matching != RATE_MATCH_NONE)
 		return phylink_validate(pl, mode, supported, state);
@@ -1797,28 +1815,18 @@ static int phylink_validate_phy(struct phylink *pl, struct phy_device *phy,
 	 * modes according to the negotiated media speed. For example, the
 	 * interface may switch between 10GBASE-R, 5GBASE-R, 2500BASE-X and
 	 * SGMII.
+	 *
+	 * If we're operating in such a mode, but haven't been provided a
+	 * possible_interfaces bitmap, then we need to validate all possible
+	 * interfaces.
 	 */
-
-	/* Backwards compatibility for those MAC drivers that don't set
-	 * their supported_interfaces, or PHY drivers that don't set
-	 * their possible_interfaces.
-	 */
-	if (phy_interface_empty(phy->possible_interfaces) &&
+	if (phy->is_c45 &&
 	    state->interface != PHY_INTERFACE_MODE_RXAUI &&
 	    state->interface != PHY_INTERFACE_MODE_XAUI &&
-	    state->interface != PHY_INTERFACE_MODE_USXGMII) {
+	    state->interface != PHY_INTERFACE_MODE_USXGMII)
 		state->interface = PHY_INTERFACE_MODE_NA;
-		return phylink_validate(pl, mode, supported, state);
-	}
-
-	/* Calculate the union of the interfaces the PHY supports in
-	 * its configured state, and the host's supported interfaces.
-	 * We never want an interface that isn't supported by the host.
-	 */
-	phy_interface_and(interfaces, phy->possible_interfaces,
-			  pl->config->supported_interfaces);
 
-	return phylink_validate_mask(pl, mode, supported, state, interfaces);
+	return phylink_validate(pl, mode, supported, state);
 }
 
 static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 4e0db4a14f30..b54aa9e8c122 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -178,6 +178,11 @@ static inline bool phy_interface_empty(const unsigned long *intf)
 	return bitmap_empty(intf, PHY_INTERFACE_MODE_MAX);
 }
 
+static inline unsigned int phy_interface_weight(const unsigned long *intf)
+{
+	return bitmap_weight(intf, PHY_INTERFACE_MODE_MAX);
+}
+
 static inline void phy_interface_and(unsigned long *dst, const unsigned long *a,
 				     const unsigned long *b)
 {
  
Daniel Golle May 13, 2023, 5:52 p.m. UTC | #7
Hi Russell,

thank you for valuable your input and suggestions.

On Thu, May 11, 2023 at 07:09:32PM +0100, Russell King (Oracle) wrote:
> On Thu, May 11, 2023 at 06:30:57PM +0100, Russell King (Oracle) wrote:
> > On Thu, May 11, 2023 at 07:14:48PM +0200, Daniel Golle wrote:
> > > On Thu, May 11, 2023 at 02:28:15AM +0200, Andrew Lunn wrote:
> > > > On Thu, May 11, 2023 at 12:53:22AM +0200, Daniel Golle wrote:
> > > > > Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> > > > > The PHYs can operate with Clause-22 and Clause-45 MDIO.
> > > > > 
> > > > > When using Clause-45 it is desireable to avoid rate-adapter mode and
> > > > > rather have the MAC interface mode follow the PHY speed. The PHYs
> > > > > support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.
> > > > 
> > > > I don't see what clause-45 has to do with this. The driver knows that
> > > > both C22 and C45 addresses spaces exists in the hardware. It can do
> > > > reads/writes on both. If the bus master does not support C45, C45 over
> > > > C22 will be performed by the core.
> > > 
> > > My understanding is/was that switching the SerDes interface mode is only
> > > intended with Clause-45 PHYs, derived from this comment and code:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c#n1661
> > 
> > It's only because:
> > 
> > 1) Clause 22 PHYs haven't done this.
> > 2) There is currently no way to know what set of interfaces a PHY would
> >    make use of - and that affects what ethtool linkmodes are possible.
> > 
> > What you point to is nothing more than a hack to make Clause 45 PHYs
> > work with the code that we currently have.

As this status-quo has been unchanged for several years now, we could
as well consider it having evolved into a convention...?

> > 
> > To sort this properly, we need PHY drivers to tell phylink what
> > interfaces they are going to switch between once they have been
> > attached to the network interface. This is what these patches in my
> > net-queue branch are doing:
> > 
> > net: phy: add possible interfaces
> > net: phy: marvell10g: fill in possible_interfaces
> > net: phy: bcm84881: fill in possible_interfaces
> > net: phylink: split out PHY validation from phylink_bringup_phy()
> > net: phylink: validate only used interfaces for c45 PHYs
> > 
> > Why only C45 PHYs again? Because the two PHY drivers that I've added
> > support for "possible_interfaces" to are both C45. There's no reason
> > we can't make that work for C22 PHYs as well.

Are you planning to re-submit or merge those changes any time in the
near future?

> > 
> > We could probably make it work for C22 PHYs out of the box by setting
> > the appropriate bit for the supplied interface in "possible_interfaces"
> > inside phy_attach_direct() after the call to phy_init_hw() if
> > "possible_interfaces" is still empty, which means that if a PHY driver
> > isn't updated to setup "possible_interfaces" then we get basically
> > whatever interface mode we're attaching with there.
> > 
> > There may be a problem if phy_attach_direct() gets called with
> > PHY_INTERFACE_MODE_NA (which I believe is possible with DSA.)
> 
> Maybe something like the below on top of those patches I've pointed
> to above? Note that this requires all MAC users of phylink to fill
> in the supported_interfaces bitmap. One of the other patches in my
> net-queue is:
> 
> net: phylink: require supported_interfaces to be filled

I've picked up the patches above and also that one from your tree
git://git.armlinux.org.uk/~rmk/linux-arm net-queue. Together with
your suggestion below this will solve part of the problem in a much
more clean way because we explicitely state the supported interface
modes instead of implictely assuming that switching to Cisco SGMII for
10M/100M/1000M is supported by the MAC in case 2500Base-X is used
for 2500M.

Regarding .get_rate_matching I understand that Linux currently just
reads from the PHY whether rate matching is going to be performed.
I assume the PHY should enable rate matching in case the MAC doesn't
support lower-speed interface modes?
In the marvell10g and aquantia PHY drivers I see that the bootloader (?)
is probably supposed to have already done that, as there is no code to
enable or disable rate adapter mode depending on the MACs
capabilities. So this problem (having to decides whether or not it is
feasable to use rate-adapter mode of the PHY; I've 'abused' is_c45 to
decide that...) is not being adressed by your patchset either, or did I
miss something?

Anyway. In case you are submitting or merging that set of changes I can
re-submit my series on top of it.

> 
> which comes before the above patches. I think that's a reasonable
> expectation today but needs testing and review of all users (esp.
> the DSA drivers.)

I can see that it should work fine with mt7530 which is the only DSA
driver I have been dealing with and have hardware to test.

> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index af070be717ec..1cfa101960b9 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1787,8 +1787,26 @@ static int phylink_validate_phy(struct phylink *pl, struct phy_device *phy,
>  	 */
>  	state->rate_matching = phy_get_rate_matching(phy, state->interface);
>  
> -	/* If this is a clause 22 PHY or is using rate matching, it only
> -	 * operates in a single mode.
> +	/* If the PHY provides a bitmap of the interfaces it will be using,
> +	 * use this to validate the PHY. This can be used for both clause 22
> +	 * and clause 45 PHYs.
> +	 */
> +	if (!phy_interface_empty(phy->possible_interfaces)) {
> +		/* Calculate the union of the interfaces the PHY supports in
> +		 * its configured state, and the host's supported interfaces.
> +		 * We never want an interface that isn't supported by the host.
> +		 */
> +		phy_interface_and(interfaces, phy->possible_interfaces,
> +				  pl->config->supported_interfaces);
> +
> +		return phylink_validate_mask(pl, mode, supported, state,
> +					     interfaces);
> +	}
> +
> +	/* If the PHY doesn't provide it a bitmap of the interfaces it will
> +	 * be using, or is a traditional clause 22 PHY driver that doesn't
> +	 * set ->possible_interfaces, or if we're using rate matching, then
> +	 * we're operating in a single mode.
>  	 */
>  	if (!phy->is_c45 || state->rate_matching != RATE_MATCH_NONE)
>  		return phylink_validate(pl, mode, supported, state);
> @@ -1797,28 +1815,18 @@ static int phylink_validate_phy(struct phylink *pl, struct phy_device *phy,
>  	 * modes according to the negotiated media speed. For example, the
>  	 * interface may switch between 10GBASE-R, 5GBASE-R, 2500BASE-X and
>  	 * SGMII.
> +	 *
> +	 * If we're operating in such a mode, but haven't been provided a
> +	 * possible_interfaces bitmap, then we need to validate all possible
> +	 * interfaces.
>  	 */
> -
> -	/* Backwards compatibility for those MAC drivers that don't set
> -	 * their supported_interfaces, or PHY drivers that don't set
> -	 * their possible_interfaces.
> -	 */
> -	if (phy_interface_empty(phy->possible_interfaces) &&
> +	if (phy->is_c45 &&
>  	    state->interface != PHY_INTERFACE_MODE_RXAUI &&
>  	    state->interface != PHY_INTERFACE_MODE_XAUI &&
> -	    state->interface != PHY_INTERFACE_MODE_USXGMII) {
> +	    state->interface != PHY_INTERFACE_MODE_USXGMII)
>  		state->interface = PHY_INTERFACE_MODE_NA;
> -		return phylink_validate(pl, mode, supported, state);
> -	}
> -
> -	/* Calculate the union of the interfaces the PHY supports in
> -	 * its configured state, and the host's supported interfaces.
> -	 * We never want an interface that isn't supported by the host.
> -	 */
> -	phy_interface_and(interfaces, phy->possible_interfaces,
> -			  pl->config->supported_interfaces);
>  
> -	return phylink_validate_mask(pl, mode, supported, state, interfaces);
> +	return phylink_validate(pl, mode, supported, state);
>  }
>  
>  static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 4e0db4a14f30..b54aa9e8c122 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -178,6 +178,11 @@ static inline bool phy_interface_empty(const unsigned long *intf)
>  	return bitmap_empty(intf, PHY_INTERFACE_MODE_MAX);
>  }
>  
> +static inline unsigned int phy_interface_weight(const unsigned long *intf)
> +{
> +	return bitmap_weight(intf, PHY_INTERFACE_MODE_MAX);
> +}
> +
>  static inline void phy_interface_and(unsigned long *dst, const unsigned long *a,
>  				     const unsigned long *b)
>  {
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>
  
Russell King (Oracle) May 13, 2023, 7:24 p.m. UTC | #8
On Sat, May 13, 2023 at 07:52:42PM +0200, Daniel Golle wrote:
> Hi Russell,
> 
> thank you for valuable your input and suggestions.
> 
> On Thu, May 11, 2023 at 07:09:32PM +0100, Russell King (Oracle) wrote:
> > On Thu, May 11, 2023 at 06:30:57PM +0100, Russell King (Oracle) wrote:
> > > On Thu, May 11, 2023 at 07:14:48PM +0200, Daniel Golle wrote:
> > > > On Thu, May 11, 2023 at 02:28:15AM +0200, Andrew Lunn wrote:
> > > > > On Thu, May 11, 2023 at 12:53:22AM +0200, Daniel Golle wrote:
> > > > > > Improve support for RealTek 2.5G Ethernet PHYs (RTL822x series).
> > > > > > The PHYs can operate with Clause-22 and Clause-45 MDIO.
> > > > > > 
> > > > > > When using Clause-45 it is desireable to avoid rate-adapter mode and
> > > > > > rather have the MAC interface mode follow the PHY speed. The PHYs
> > > > > > support 2500Base-X for 2500M, and Cisco SGMII for 1000M/100M/10M.
> > > > > 
> > > > > I don't see what clause-45 has to do with this. The driver knows that
> > > > > both C22 and C45 addresses spaces exists in the hardware. It can do
> > > > > reads/writes on both. If the bus master does not support C45, C45 over
> > > > > C22 will be performed by the core.
> > > > 
> > > > My understanding is/was that switching the SerDes interface mode is only
> > > > intended with Clause-45 PHYs, derived from this comment and code:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c#n1661
> > > 
> > > It's only because:
> > > 
> > > 1) Clause 22 PHYs haven't done this.
> > > 2) There is currently no way to know what set of interfaces a PHY would
> > >    make use of - and that affects what ethtool linkmodes are possible.
> > > 
> > > What you point to is nothing more than a hack to make Clause 45 PHYs
> > > work with the code that we currently have.
> 
> As this status-quo has been unchanged for several years now, we could
> as well consider it having evolved into a convention...?

No. It's documented as a hack in the source tree, and if hacks become
conventions, then that leads us into ever more complex code. No thanks.

> > > To sort this properly, we need PHY drivers to tell phylink what
> > > interfaces they are going to switch between once they have been
> > > attached to the network interface. This is what these patches in my
> > > net-queue branch are doing:
> > > 
> > > net: phy: add possible interfaces
> > > net: phy: marvell10g: fill in possible_interfaces
> > > net: phy: bcm84881: fill in possible_interfaces
> > > net: phylink: split out PHY validation from phylink_bringup_phy()
> > > net: phylink: validate only used interfaces for c45 PHYs
> > > 
> > > Why only C45 PHYs again? Because the two PHY drivers that I've added
> > > support for "possible_interfaces" to are both C45. There's no reason
> > > we can't make that work for C22 PHYs as well.
> 
> Are you planning to re-submit or merge those changes any time in the
> near future?

It depends what you mean by "near future". In any case, as a result of
this new use case for them, I did put in a bit more effort with these,
and they've now changed in my tree so that the "possible_interfaces"
thing can be used for _any_ PHY what so ever - no longer limited to
just clause 45 PHYs. That actually makes the patch changing phylink
quite a bit simpler.

> Regarding .get_rate_matching I understand that Linux currently just
> reads from the PHY whether rate matching is going to be performed.
> I assume the PHY should enable rate matching in case the MAC doesn't
> support lower-speed interface modes?

The whole rate-matching thing is a first generation attempt (not by
me!) to add generic support. It's not something that historically
I've cared about, because I haven't had the hardware that supports
it. So, when someone proposed something that looks reasonable, that's
what got merged.

However, now knowing what the Aquantia driver is doing - where its
entirely possible to have rate matching with some negotiation results
but not others, I would say that the current implementation is barely
up to the job - because it assumes that rate matching will be used
for all speeds or no speeds and not a mixture.

> In the marvell10g and aquantia PHY drivers I see that the bootloader (?)
> is probably supposed to have already done that, as there is no code to
> enable or disable rate adapter mode depending on the MACs
> capabilities.

Firstly, you have to understand the historical situation:

  phylib only operated with a single interface type, and mainly clause
  22 PHYs which had none of this interface switching or rate matching
  malarkey to contend with. So there has been no negotiation between
  phylib drivers and MACs. We have relied upon firmware descriptions
  (such as device tree) to tell us what interface mode should be used.

  Some PHY drivers configure the hardware to operate in the requested
  mode, others don't. Some configure only a subset of modes and just
  ignore other modes.

That is how most PHY drivers today operate.

Now that we have a growing number of Clause 45 drivers, the first fully
fleged one was 88x3310 on the Macchiatobin platform. This platform uses
interface-mode switching depending on the speed negotiated on the media
side. The MAC could be switched, so after a bit of discussion, phylib
permitted PHY drivers to switch phydev->interface when reading the PHY
status. There was no need for any negotiation - the hardware was already
setup by pinstrapping, and no need to consider the rate matching modes
on that PHY for two reasons:

1. the PHY is not MACSEC enabled, which means it is incapable of sending
   pause frames to the host. It is stated in the data sheet that in this
   case, the host MAC _must_ increase the IPG. We have no support in the
   MAC driver (mvpp2) to do this.

2. The mvpp2 hardware also does not support flow control, so even if we
   did have a PHY that produced pause frames.

So, overall, with this hardware combination, rate matching was utterly
pointless to consider.

In order to allow the PHY to advertise anything except the highest
speed, it needed a workaround/hack in phylink so that the validation
function allowed the slower speeds. This is how the Clause 45 hack
that's now present came about. If we had this "possible_interfaces"
thing, then that wouldn't have been needed.

Then, the Methode DM7052 SFP module came along, which did exactly the
same kind of host-interface switching that 88x3310 does, but without
_any_ inband words, not even in SGMII. So phylink ended up with a
workaround for the lack of inband (by being currently the only SFP
that phylink switches from inband to PHY mode.)

Then came Marek's 88x3310 on a SFP+ module, which did need some form
of negotiation of the interface modes between the MAC and PHY, so
Marek picked up some patches from my tree, worked on them and they're
now submitted. I wasn't entirely happy with them but had nothing
better, so I didn't object. I'm still not entirely happy with them.
This is where the marvell10g driver gained support for trying to work
out whether the PHY should be reconfigured to use rate-matching or not.

The reason I don't like this is for a few reasons:

1) filling in phydev->host_interfaces makes the PHY driver configure
   the MAC type overriding whatever device-tree said to use as the
   primary interface. Consequently, we can end up with the PHY and
   host MAC using different modes (e.g. a MAC supports USXGMII and
   10GBASE-R. DT says to use 10GBASE-R. MAC configures for 10GBASE-R.
   PHY configures itself for USXGMII.) Provided the MAC pays attention
   to phydev->interface, it shouldn't be a problem, but many MACs do
   not.

2) we have a chicken-and-egg problem in phylink. We don't know
   whether the PHY driver will behave in this way, so we need to
   compute what interface mode should be used before calling
   phy_attach_direct(). If a PHY driver does support host_interfaces,
   then it'll end up choosing some other interface mode, meanwhile
   we've set ourselves up to configure for the mode we've chosen -
   but we've already restricted the linkmodes that the PHY can
   support to the original interface mode we decided upon.

3) they can end up enabling rate-matching mode when in fact we don't
   want rate-matching (e.g. the MAC doesn't support it) but also
   doesn't support e.g. switching to SGMII from 10GBASE-R.

As such, this just doesn't feel like the correct solution to me, and
I think it ends up creating more problems for the future. I had
actually dropped the patches from my tree before they were submitted
because I'd decided they weren't a reasonable way forward.

Then the rate-matching stuff was bolted in, and we get to where we are
today.

As of yesterday, I have now received some SFP+ modules that contain a
*Marvell* AQrate AQR113C PHY - which is driven by the aquantia PHY
driver. These modules do rate-matching with pause frames, and its setup
at power-up so that everything goes to 10GBASE-R on the host side. That
means it'll only work in a SFP cage which is 10G capable.

What I have noticed is that our rate-matching negotiation in the kernel
is broken: as I say above, mvpp2 doesn't support pause-mode rate
matching, yet we end up with this being forced on the mvpp2 driver (and
for some reason, we get the "rx pause" in the link-up kernel
notification message, which I thought I said should *not* be the case
as that message is supposed to report what happened with the *media*
part of the link.)

The point that I'm making is that this code has evolved according to
people's needs, and where we are today with it, it's far from being
fully satisfactory. While it satisfies our currentl use cases, the
way we configure the MAC <-> PHY interface is creaking at the seams
and needs a total redesign, and while it's easy to say that, it's
much harder to find something that does work, and that doesn't
require massive changes in the kernel.

This is exactly what I had been doing with the patches in my tree,
first with the "host_interfaces" thing, then deciding that wasn't
such a good idea, then deciding it would be better (at least for
phylink) if we knew what interfaces the PHY was actually going to
switch between (aka the "possible_interfaces" thing.) Even that
isn't quite sufficient.

Here's what I think we might need to solve this properly.

- We need firmware to describe the capabilities of the board wiring
  between the MAC and the PHY - how many lanes, maximum supported
  speed (maybe minimum as well?) Firmware can also specify what the
  preferred operating mode is.

  (We've had discussion about this in the past... it didn't come to
  a resolution.)

- We need MAC driver to publish based on the above what interface
  modes its prepared to support, and for each interface mode, whether
  rate matching can be supported, and what *sort* of rate matching.
  There's pause-based, extended-IPG-based, and half-duplex back-
  pressure based.)

- We need to know what the PHY supports in exactly the same way.

- We then need a way to resolve those two which allows us to select
  either a single interface mode with or without rate matching, or
  a group of interface modes that the PHY would be prepared to switch
  between based on the media side resolution.

However, to get there, you're probably looking at years of work,
sending patches, getting lots of review comments, and probably getting
push-back against the idea from firmware people.

... and at that point one gets back to what is the simplest thing we
can do to patch the existing code to make our use cases work, rather
than having something designed that caters nicely for the problems
we have today, but also can be sensibly extended as this area is
clearly gaining more "features".

> So this problem (having to decides whether or not it is
> feasable to use rate-adapter mode of the PHY; I've 'abused' is_c45 to
> decide that...) is not being adressed by your patchset either, or did I
> miss something?

See the above!

> Anyway. In case you are submitting or merging that set of changes I can
> re-submit my series on top of it.

Well:

> > which comes before the above patches. I think that's a reasonable
> > expectation today but needs testing and review of all users (esp.
> > the DSA drivers.)
> 
> I can see that it should work fine with mt7530 which is the only DSA
> driver I have been dealing with and have hardware to test.

Yes, and that's one that I've updated myself to populate phylink's
supported_interfaces, which is a pre-requisit for this. Any phylink-
using driver that has not yet populated supported_interfaces will
break with the proposal I've made... because:

> > +	/* If the PHY provides a bitmap of the interfaces it will be using,
> > +	 * use this to validate the PHY. This can be used for both clause 22
> > +	 * and clause 45 PHYs.
> > +	 */
> > +	if (!phy_interface_empty(phy->possible_interfaces)) {
> > +		/* Calculate the union of the interfaces the PHY supports in
> > +		 * its configured state, and the host's supported interfaces.
> > +		 * We never want an interface that isn't supported by the host.
> > +		 */
> > +		phy_interface_and(interfaces, phy->possible_interfaces,
> > +				  pl->config->supported_interfaces);

As soon as we come across a PHY that populates possible_interfaces
without a MAC driver that populates supported_interfaces, this
will result in "interfaces" being empty and...

> > +
> > +		return phylink_validate_mask(pl, mode, supported, state,
> > +					     interfaces);

this will fail, and use the driver to fail.

The problem here is that each time we change stuff in phylink, we have
to maintain compatibility with all those drivers that already use
phylink - which means keeping the legacy methods. These legacy methods
are just building up and up over time, and unless a stop is put to
this, the code is going to become unmaintainable (not only because
of its complexity, but also because it becomes impossible to properly
test. I already don't have any way to test the legacy code paths
that exist today.)

So, I'm against making any further radical changes that involve
obsoleting anything further until we remove some of the legacy code
that has already accumulated.

One of those involves converting the mv88e6xxx DSA driver to use the
phylink_pcs stuff - that's blocked on DT changes that Andrew submitted
back in the first week of April which have only _just_ been picked up
by the Freescale iMX maintainer in the last couple of days (so we're
going to have to wait another kernel cycle before this can progress.)

Another is phylink requiring supported_interfaces to be filled in by
the MAC - I think that's already the case, but having phylink confirm
that would be really useful. I know it's already the case in the
kernels I run on my hardware, because I've had a patch in my tree
for almost the last eleven months checking for this and causing
phylink_create() to fail if it's not filled in. I tend to now code
my phylink changes on the assumption that everyone is filling this
in... which isn't particularly good (as can be seen in the example
for this possible_interfaces thing.)

Anyway, this is probably not the email you were hoping for. :D Sorry.