[net] net: stmmac: xgmac: fix handling of DPP safety error for DMA channels

Message ID 20240123095006.979437-1-0x1207@gmail.com
State New
Headers
Series [net] net: stmmac: xgmac: fix handling of DPP safety error for DMA channels |

Commit Message

Furong Xu Jan. 23, 2024, 9:50 a.m. UTC
  Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
XGMAC core") checks and reports safety errors, but leaves the
Data Path Parity Errors for each channel in DMA unhandled at all, lead to
a storm of interrupt.
Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.

Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core")
Signed-off-by: Furong Xu <0x1207@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      | 1 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 6 ++++++
 2 files changed, 7 insertions(+)
  

Comments

Serge Semin Jan. 24, 2024, 2:36 p.m. UTC | #1
On Tue, Jan 23, 2024 at 05:50:06PM +0800, Furong Xu wrote:
> Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
> XGMAC core") checks and reports safety errors, but leaves the
> Data Path Parity Errors for each channel in DMA unhandled at all, lead to
> a storm of interrupt.
> Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
> 
> Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core")
> Signed-off-by: Furong Xu <0x1207@gmail.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      | 1 +
>  drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 6 ++++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> index 207ff1799f2c..188e11683136 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> @@ -385,6 +385,7 @@
>  #define XGMAC_DCEIE			BIT(1)
>  #define XGMAC_TCEIE			BIT(0)
>  #define XGMAC_DMA_ECC_INT_STATUS	0x0000306c
> +#define XGMAC_DMA_DPP_INT_STATUS	0x00003074
>  #define XGMAC_DMA_CH_CONTROL(x)		(0x00003100 + (0x80 * (x)))
>  #define XGMAC_SPH			BIT(24)
>  #define XGMAC_PBLx8			BIT(16)
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> index eb48211d9b0e..874e85b499e2 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> @@ -745,6 +745,12 @@ static void dwxgmac3_handle_mac_err(struct net_device *ndev,
>  
>  	dwxgmac3_log_error(ndev, value, correctable, "MAC",
>  			   dwxgmac3_mac_errors, STAT_OFF(mac_errors), stats);
> +
> +	value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> +	writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> +
> +	if (value)
> +		netdev_err(ndev, "Found DMA_DPP error, status: 0x%x\n", value);

1. Why not to implement this in the same way as the rest of the safety
errors handle code? (with the flags described by the
dwxgmac3_error_desc-based table and the respective counters being
incremented should the errors were detected)

2. I don't see this IRQ being enabled in the dwxgmac3_safety_feat_config()
method. How come the respective event has turned to be triggered
anyway?

-Serge(y) 

>  }
>  
>  static const struct dwxgmac3_error_desc dwxgmac3_mtl_errors[32]= {
> -- 
> 2.34.1
> 
>
  
Furong Xu Jan. 25, 2024, 2:56 a.m. UTC | #2
On Wed, 24 Jan 2024 17:36:10 +0300
Serge Semin <fancer.lancer@gmail.com> wrote:

> On Tue, Jan 23, 2024 at 05:50:06PM +0800, Furong Xu wrote:
> > Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
> > XGMAC core") checks and reports safety errors, but leaves the
> > Data Path Parity Errors for each channel in DMA unhandled at all, lead to
> > a storm of interrupt.
> > Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
> > 
> > Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core")
> > Signed-off-by: Furong Xu <0x1207@gmail.com>
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      | 1 +
> >  drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 6 ++++++
> >  2 files changed, 7 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > index 207ff1799f2c..188e11683136 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > @@ -385,6 +385,7 @@
> >  #define XGMAC_DCEIE			BIT(1)
> >  #define XGMAC_TCEIE			BIT(0)
> >  #define XGMAC_DMA_ECC_INT_STATUS	0x0000306c
> > +#define XGMAC_DMA_DPP_INT_STATUS	0x00003074
> >  #define XGMAC_DMA_CH_CONTROL(x)		(0x00003100 + (0x80 * (x)))
> >  #define XGMAC_SPH			BIT(24)
> >  #define XGMAC_PBLx8			BIT(16)
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > index eb48211d9b0e..874e85b499e2 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > @@ -745,6 +745,12 @@ static void dwxgmac3_handle_mac_err(struct net_device *ndev,
> >  
> >  	dwxgmac3_log_error(ndev, value, correctable, "MAC",
> >  			   dwxgmac3_mac_errors, STAT_OFF(mac_errors), stats);
> > +
> > +	value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> > +	writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> > +
> > +	if (value)
> > +		netdev_err(ndev, "Found DMA_DPP error, status: 0x%x\n", value);  
> 
> 1. Why not to implement this in the same way as the rest of the safety
> errors handle code? (with the flags described by the
> dwxgmac3_error_desc-based table and the respective counters being
> incremented should the errors were detected)
> 
XGMAC_DMA_DPP_INT_STATUS is just a bitmap of DMA RX and TX channels,
bottom 16 bits for 16 DMA TX channels and top 16 bits for 16 DMA RX channels.
No other descriptions.

And the counters should be updated, I will send a new patch.

> 2. I don't see this IRQ being enabled in the dwxgmac3_safety_feat_config()
> method. How come the respective event has turned to be triggered
> anyway?
This error report is enabled by default, and cannot be disabled or marked(as Synopsys Databook says).
What we can do is clearing it when it asserts.
> 
> -Serge(y) 
> 
> >  }
> >  
> >  static const struct dwxgmac3_error_desc dwxgmac3_mtl_errors[32]= {
> > -- 
> > 2.34.1
> > 
> >
  
Serge Semin Jan. 25, 2024, 7:07 p.m. UTC | #3
On Thu, Jan 25, 2024 at 10:56:20AM +0800, Furong Xu wrote:
> On Wed, 24 Jan 2024 17:36:10 +0300
> Serge Semin <fancer.lancer@gmail.com> wrote:
> 
> > On Tue, Jan 23, 2024 at 05:50:06PM +0800, Furong Xu wrote:
> > > Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
> > > XGMAC core") checks and reports safety errors, but leaves the
> > > Data Path Parity Errors for each channel in DMA unhandled at all, lead to
> > > a storm of interrupt.
> > > Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
> > > 
> > > Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core")
> > > Signed-off-by: Furong Xu <0x1207@gmail.com>
> > > ---
> > >  drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      | 1 +
> > >  drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 6 ++++++
> > >  2 files changed, 7 insertions(+)
> > > 
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > > index 207ff1799f2c..188e11683136 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > > @@ -385,6 +385,7 @@
> > >  #define XGMAC_DCEIE			BIT(1)
> > >  #define XGMAC_TCEIE			BIT(0)
> > >  #define XGMAC_DMA_ECC_INT_STATUS	0x0000306c
> > > +#define XGMAC_DMA_DPP_INT_STATUS	0x00003074
> > >  #define XGMAC_DMA_CH_CONTROL(x)		(0x00003100 + (0x80 * (x)))
> > >  #define XGMAC_SPH			BIT(24)
> > >  #define XGMAC_PBLx8			BIT(16)
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > index eb48211d9b0e..874e85b499e2 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > @@ -745,6 +745,12 @@ static void dwxgmac3_handle_mac_err(struct net_device *ndev,
> > >  
> > >  	dwxgmac3_log_error(ndev, value, correctable, "MAC",
> > >  			   dwxgmac3_mac_errors, STAT_OFF(mac_errors), stats);
> > > +
> > > +	value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> > > +	writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> > > +
> > > +	if (value)
> > > +		netdev_err(ndev, "Found DMA_DPP error, status: 0x%x\n", value);  
> > 
> > 1. Why not to implement this in the same way as the rest of the safety
> > errors handle code? (with the flags described by the
> > dwxgmac3_error_desc-based table and the respective counters being
> > incremented should the errors were detected)
> > 

> XGMAC_DMA_DPP_INT_STATUS is just a bitmap of DMA RX and TX channels,
> bottom 16 bits for 16 DMA TX channels and top 16 bits for 16 DMA RX channels.
> No other descriptions.
> 
> And the counters should be updated, I will send a new patch.

Ok. I'll wait for this patch v2 then with the counters fixed. Please
also note that you are adding the _DMA_ DPP events handling support.
Thus the more suitable place for this change would be
dwmac5_handle_dma_err().

> 
> > 2. I don't see this IRQ being enabled in the dwxgmac3_safety_feat_config()
> > method. How come the respective event has turned to be triggered
> > anyway?
> This error report is enabled by default, and cannot be disabled or marked(as Synopsys Databook says).
> What we can do is clearing it when it asserts.

This sounds so strange that I can barely believe in it. The DW QoS Eth
MTL DPP feature can be enabled/disabled, but the DW XGMAC DMA DPP
can't? This doesn't look logical. What's the point in having a never
maskable IRQ for not that much crucial but optional feature? Moreover
DPP adds some data flow overhead. If we are sure that no problem with
the device data paths, then it seems redundant to have it always
enabled. So I guess it must be switchable. Are you completely sure it
isn't?

-Serge(y)

> > 
> > -Serge(y) 
> > 
> > >  }
> > >  
> > >  static const struct dwxgmac3_error_desc dwxgmac3_mtl_errors[32]= {
> > > -- 
> > > 2.34.1
> > > 
> > >   
>
  
Furong Xu Jan. 26, 2024, 3:31 a.m. UTC | #4
On Thu, 25 Jan 2024 22:07:15 +0300
Serge Semin <fancer.lancer@gmail.com> wrote:

> On Thu, Jan 25, 2024 at 10:56:20AM +0800, Furong Xu wrote:
> > On Wed, 24 Jan 2024 17:36:10 +0300
> > Serge Semin <fancer.lancer@gmail.com> wrote:
> >   
> > > On Tue, Jan 23, 2024 at 05:50:06PM +0800, Furong Xu wrote:  
> > > > Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
> > > > XGMAC core") checks and reports safety errors, but leaves the
> > > > Data Path Parity Errors for each channel in DMA unhandled at all, lead to
> > > > a storm of interrupt.
> > > > Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
> > > > 
> > > > Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core")
> > > > Signed-off-by: Furong Xu <0x1207@gmail.com>
> > > > ---
> > > >  drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      | 1 +
> > > >  drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 6 ++++++
> > > >  2 files changed, 7 insertions(+)
> > > > 
> > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > > > index 207ff1799f2c..188e11683136 100644
> > > > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
> > > > @@ -385,6 +385,7 @@
> > > >  #define XGMAC_DCEIE			BIT(1)
> > > >  #define XGMAC_TCEIE			BIT(0)
> > > >  #define XGMAC_DMA_ECC_INT_STATUS	0x0000306c
> > > > +#define XGMAC_DMA_DPP_INT_STATUS	0x00003074
> > > >  #define XGMAC_DMA_CH_CONTROL(x)		(0x00003100 + (0x80 * (x)))
> > > >  #define XGMAC_SPH			BIT(24)
> > > >  #define XGMAC_PBLx8			BIT(16)
> > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > > index eb48211d9b0e..874e85b499e2 100644
> > > > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > > @@ -745,6 +745,12 @@ static void dwxgmac3_handle_mac_err(struct net_device *ndev,
> > > >  
> > > >  	dwxgmac3_log_error(ndev, value, correctable, "MAC",
> > > >  			   dwxgmac3_mac_errors, STAT_OFF(mac_errors), stats);
> > > > +
> > > > +	value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> > > > +	writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
> > > > +
> > > > +	if (value)
> > > > +		netdev_err(ndev, "Found DMA_DPP error, status: 0x%x\n", value);    
> > > 
> > > 1. Why not to implement this in the same way as the rest of the safety
> > > errors handle code? (with the flags described by the
> > > dwxgmac3_error_desc-based table and the respective counters being
> > > incremented should the errors were detected)
> > >   
> 
> > XGMAC_DMA_DPP_INT_STATUS is just a bitmap of DMA RX and TX channels,
> > bottom 16 bits for 16 DMA TX channels and top 16 bits for 16 DMA RX channels.
> > No other descriptions.
> > 
> > And the counters should be updated, I will send a new patch.  
> 
> Ok. I'll wait for this patch v2 then with the counters fixed. Please
> also note that you are adding the _DMA_ DPP events handling support.
> Thus the more suitable place for this change would be
> dwmac5_handle_dma_err().
> 
> >   
> > > 2. I don't see this IRQ being enabled in the dwxgmac3_safety_feat_config()
> > > method. How come the respective event has turned to be triggered
> > > anyway?  
> > This error report is enabled by default, and cannot be disabled or marked(as Synopsys Databook says).
> > What we can do is clearing it when it asserts.  
> 
> This sounds so strange that I can barely believe in it. The DW QoS Eth
> MTL DPP feature can be enabled/disabled, but the DW XGMAC DMA DPP
> can't? This doesn't look logical. What's the point in having a never
> maskable IRQ for not that much crucial but optional feature? Moreover
> DPP adds some data flow overhead. If we are sure that no problem with
> the device data paths, then it seems redundant to have it always
> enabled. So I guess it must be switchable. Are you completely sure it
> isn't?

Sorry for my bad explanation.

Double checked DMA_DPP error report path on my device.

XGMAC DMA_DPP is enable by DDPP bit of MTL_DPP_Control.
DDPP bit is default to 0(Data path Parity Protection is enabled).
When DDPP bit is set to 1(Data path Parity Protection is disabled), no DMA_DPP interrupt is reported.

Once DMA_DPP interrupt is reported, there is no control bit to disable it or mask it.
DMA_DPP error is unrecoverable type, and unrecoverable error interrupt cannot be disabled or masked,
this is a design(as Synopsys Databook says).

A explicit ops on MTL_DPP_Control to clear DDPP bit can add to dwxgmac3_safety_feat_config
to make code looks better.

> 
> -Serge(y)
> 
> > > 
> > > -Serge(y) 
> > >   
> > > >  }
> > > >  
> > > >  static const struct dwxgmac3_error_desc dwxgmac3_mtl_errors[32]= {
> > > > -- 
> > > > 2.34.1
> > > > 
> > > >     
> >
  

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 207ff1799f2c..188e11683136 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -385,6 +385,7 @@ 
 #define XGMAC_DCEIE			BIT(1)
 #define XGMAC_TCEIE			BIT(0)
 #define XGMAC_DMA_ECC_INT_STATUS	0x0000306c
+#define XGMAC_DMA_DPP_INT_STATUS	0x00003074
 #define XGMAC_DMA_CH_CONTROL(x)		(0x00003100 + (0x80 * (x)))
 #define XGMAC_SPH			BIT(24)
 #define XGMAC_PBLx8			BIT(16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index eb48211d9b0e..874e85b499e2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -745,6 +745,12 @@  static void dwxgmac3_handle_mac_err(struct net_device *ndev,
 
 	dwxgmac3_log_error(ndev, value, correctable, "MAC",
 			   dwxgmac3_mac_errors, STAT_OFF(mac_errors), stats);
+
+	value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
+	writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
+
+	if (value)
+		netdev_err(ndev, "Found DMA_DPP error, status: 0x%x\n", value);
 }
 
 static const struct dwxgmac3_error_desc dwxgmac3_mtl_errors[32]= {