[RFC,v11,08/12] net: ethernet: mtk_eth_soc: fix RX data corruption issue

Message ID 9a788bb6984c836e63a7ecbdadff11a723769c37.1677699407.git.daniel@makrotopia.org
State New
Headers
Series net: ethernet: mtk_eth_soc: various enhancements |

Commit Message

Daniel Golle March 1, 2023, 7:55 p.m. UTC
  Also set bit 12 which disabled the RX FIDO clear function when setting up
MAC MCR, as MediaTek SDK did the same change stating:
"If without this patch, kernel might receive invalid packets that are
corrupted by GMAC."[1]
This fixes issues with <= 1G speed where we could previously observe
about 30% packet loss while the bad packet counter was increasing.

[1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 ++-
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)
  

Comments

Vladimir Oltean March 1, 2023, 11:31 p.m. UTC | #1
On Wed, Mar 01, 2023 at 07:55:05PM +0000, Daniel Golle wrote:
> Also set bit 12 which disabled the RX FIDO clear function when setting up
> MAC MCR, as MediaTek SDK did the same change stating:
> "If without this patch, kernel might receive invalid packets that are
> corrupted by GMAC."[1]
> This fixes issues with <= 1G speed where we could previously observe
> about 30% packet loss while the bad packet counter was increasing.
> 
> [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
> Tested-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> ---

Should this patch be submitted separately from the series, to the
net.git tree, to be backported to stable kernels?
  
Daniel Golle March 2, 2023, 12:03 a.m. UTC | #2
On Thu, Mar 02, 2023 at 01:31:21AM +0200, Vladimir Oltean wrote:
> On Wed, Mar 01, 2023 at 07:55:05PM +0000, Daniel Golle wrote:
> > Also set bit 12 which disabled the RX FIDO clear function when setting up
> > MAC MCR, as MediaTek SDK did the same change stating:
> > "If without this patch, kernel might receive invalid packets that are
> > corrupted by GMAC."[1]
> > This fixes issues with <= 1G speed where we could previously observe
> > about 30% packet loss while the bad packet counter was increasing.
> > 
> > [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
> > Tested-by: Bjørn Mork <bjorn@mork.no>
> > Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> > ---
> 
> Should this patch be submitted separately from the series, to the
> net.git tree, to be backported to stable kernels?

Maybe yes, as this issue may affect e.g. the BPi-R3 board when used
with 1G SFP modules. Previously this has just never been a problem as
all practically all boards with MediaTek SoCs using SGMII also use the
MediaTek MT7531 switch connecting in 2500Base-X mode.

Should the Fixes:-tag hence reference the commit adding support for the
BPi-R3?
  
Vladimir Oltean March 2, 2023, 10 a.m. UTC | #3
On Thu, Mar 02, 2023 at 12:03:45AM +0000, Daniel Golle wrote:
> On Thu, Mar 02, 2023 at 01:31:21AM +0200, Vladimir Oltean wrote:
> > On Wed, Mar 01, 2023 at 07:55:05PM +0000, Daniel Golle wrote:
> > > Also set bit 12 which disabled the RX FIDO clear function when setting up
> > > MAC MCR, as MediaTek SDK did the same change stating:
> > > "If without this patch, kernel might receive invalid packets that are
> > > corrupted by GMAC."[1]
> > > This fixes issues with <= 1G speed where we could previously observe
> > > about 30% packet loss while the bad packet counter was increasing.
> > > 
> > > [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
> > > Tested-by: Bjørn Mork <bjorn@mork.no>
> > > Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> > > ---
> > 
> > Should this patch be submitted separately from the series, to the
> > net.git tree, to be backported to stable kernels?
> 
> Maybe yes, as this issue may affect e.g. the BPi-R3 board when used
> with 1G SFP modules. Previously this has just never been a problem as
> all practically all boards with MediaTek SoCs using SGMII also use the
> MediaTek MT7531 switch connecting in 2500Base-X mode.
> 
> Should the Fixes:-tag hence reference the commit adding support for the
> BPi-R3?

If it's not an issue that affects existing setups, there is no need to
backport the patch. But it needs to be clearly described as such in the
commit message.

You mention <= 1G speeds, but then only talk about 1G SFP modules.
I see that the mtk_eth_soc driver also sets "gmii" and "rgmii" in
phylink's supported_interfaces. Those are also <= 1G speeds. There could
also be SGMII on-board PHYs. Does the RX FIFO clearing issue not affect
those?
  
Daniel Golle March 2, 2023, 11:56 a.m. UTC | #4
On Thu, Mar 02, 2023 at 12:00:22PM +0200, Vladimir Oltean wrote:
> On Thu, Mar 02, 2023 at 12:03:45AM +0000, Daniel Golle wrote:
> > On Thu, Mar 02, 2023 at 01:31:21AM +0200, Vladimir Oltean wrote:
> > > On Wed, Mar 01, 2023 at 07:55:05PM +0000, Daniel Golle wrote:
> > > > Also set bit 12 which disabled the RX FIDO clear function when setting up
> > > > MAC MCR, as MediaTek SDK did the same change stating:
> > > > "If without this patch, kernel might receive invalid packets that are
> > > > corrupted by GMAC."[1]
> > > > This fixes issues with <= 1G speed where we could previously observe
> > > > about 30% packet loss while the bad packet counter was increasing.
> > > > 
> > > > [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
> > > > Tested-by: Bjørn Mork <bjorn@mork.no>
> > > > Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> > > > ---
> > > 
> > > Should this patch be submitted separately from the series, to the
> > > net.git tree, to be backported to stable kernels?
> > 
> > Maybe yes, as this issue may affect e.g. the BPi-R3 board when used
> > with 1G SFP modules. Previously this has just never been a problem as
> > all practically all boards with MediaTek SoCs using SGMII also use the
> > MediaTek MT7531 switch connecting in 2500Base-X mode.
> > 
> > Should the Fixes:-tag hence reference the commit adding support for the
> > BPi-R3?
> 
> If it's not an issue that affects existing setups, there is no need to
> backport the patch. But it needs to be clearly described as such in the
> commit message.
> 
> You mention <= 1G speeds, but then only talk about 1G SFP modules.
> I see that the mtk_eth_soc driver also sets "gmii" and "rgmii" in
> phylink's supported_interfaces. Those are also <= 1G speeds. There could
> also be SGMII on-board PHYs. Does the RX FIFO clearing issue not affect
> those?

The issues affects PHYs (and potentially switch PHY ICs) connected via
SGMII operating at 1.25Mbaud.

The only officially supported board affected by this is the BPi-R3 where
it affects the SFP cages -- the on-board MT7531 switch which is also
used on all other boards using these SoCs is connected with 2500Base-X.

The issue does **not** affect RGMII or GMII on the MT7623 SoC, but I
don't have any way to try RGMII or GMII on more recent SoCs as I lack
hardware making use of that to connect a PHY.
  
Vladimir Oltean March 2, 2023, 12:35 p.m. UTC | #5
On Thu, Mar 02, 2023 at 11:56:56AM +0000, Daniel Golle wrote:
> The issues affects PHYs (and potentially switch PHY ICs) connected via
> SGMII operating at 1.25Mbaud.
> 
> The only officially supported board affected by this is the BPi-R3 where
> it affects the SFP cages -- the on-board MT7531 switch which is also
> used on all other boards using these SoCs is connected with 2500Base-X.
> 
> The issue does **not** affect RGMII or GMII on the MT7623 SoC, but I
> don't have any way to try RGMII or GMII on more recent SoCs as I lack
> hardware making use of that to connect a PHY.

I don't believe that board vendors need to have their device tree
"officially supported" in Linux in order to claim that a bug affecting
them should be fixed on stable kernels. As long as the PHY interface
type is generally supported by the driver and is expected to work,
it should work with any board (the exception being if there are other
configuration steps specific to that board required).

In any case, a good commit message for a bug fix explains what is the
user impact of the bug being fixed, the configurations which are affected,
how it was noticed, an adequate Fixes: tag, and if necessary, why it is
being fixed the way it is. In other words, it must be able to respond to
normal questions that a reader might have when he/she stumbles upon it,
for various reasons (it introduces a regression, they are debugging an
issue and want to assess whether backporting this patch would help them,
etc). The reader might be in the not so close future, when you might not
be able to provide clarifications personally, so the commit message
should contain all that you know which is relevant to the topic.

Most of these clarifications were provided by you as replies to the
patch, but they should be present in the commit message instead.
  

Patch

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 030d87c42bd42..ed32a511adc30 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -616,7 +616,8 @@  static int mtk_mac_finish(struct phylink_config *config, unsigned int mode,
 	mcr_cur = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
 	mcr_new = mcr_cur;
 	mcr_new |= MAC_MCR_IPG_CFG | MAC_MCR_FORCE_MODE |
-		   MAC_MCR_BACKOFF_EN | MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_LINK;
+		   MAC_MCR_BACKOFF_EN | MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_LINK |
+		   MAC_MCR_RX_FIFO_CLR_DIS;
 
 	/* Only update control register when needed! */
 	if (mcr_new != mcr_cur)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 142def8629c82..529c95c481b73 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -404,6 +404,7 @@ 
 #define MAC_MCR_FORCE_MODE	BIT(15)
 #define MAC_MCR_TX_EN		BIT(14)
 #define MAC_MCR_RX_EN		BIT(13)
+#define MAC_MCR_RX_FIFO_CLR_DIS	BIT(12)
 #define MAC_MCR_BACKOFF_EN	BIT(9)
 #define MAC_MCR_BACKPR_EN	BIT(8)
 #define MAC_MCR_FORCE_RX_FC	BIT(5)