[net] veth: fix ethtool statistical errors

Message ID 20231116114150.48639-1-huangjie.albert@bytedance.com
State New
Series [net] veth: fix ethtool statistical errors

Commit Message

黄杰 Nov. 16, 2023, 11:41 a.m. UTC
If peer->real_num_rx_queues > 1, the ethtool -S command for a veth
network device displays incorrect statistics.
The value of tx_idx is reset with each iteration, so even if
peer->real_num_rx_queues is greater than 1, the value of tx_idx
will remain constant. This results in incorrect statistical values.
To fix this issue, initialize tx_idx from pp_idx.

Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
---
 drivers/net/veth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Lorenzo Bianconi Nov. 17, 2023, 9:25 a.m. UTC | #1
> if peer->real_num_rx_queues > 1, the ethtool -s command for
> veth network device will display some error statistical values.
> The value of tx_idx is reset with each iteration, so even if
> peer->real_num_rx_queues is greater than 1, the value of tx_idx
> will remain constant. This results in incorrect statistical values.
> To fix this issue, assign the value of pp_idx to tx_idx.
> 
> Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> ---
>  drivers/net/veth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 0deefd1573cf..3a8e3fc5eeb5 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
>  	for (i = 0; i < peer->real_num_rx_queues; i++) {
>  		const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
>  		const void *base = (void *)&rq_stats->vs;
> -		unsigned int start, tx_idx = idx;
> +		unsigned int start, tx_idx = pp_idx;
>  		size_t offset;
>  
>  		tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> -- 
> 2.20.1
> 

Hi Albert,

Can you please provide more details about the issue you are facing?
In particular, what is the number of configured tx and rx queues for both
peers?
tx_idx is the index of the current (local) tx queue and it must restart from
idx in each iteration otherwise we will have an issue when
peer->real_num_rx_queues is greater than dev->real_num_tx_queues.

Regards,
Lorenzo
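
The indexing described in the reply above can be sketched in plain C. This is a minimal illustration, not the driver code: TQ_STATS_LEN stands in for VETH_TQ_STATS_LEN with an assumed value of 2, the helper names are hypothetical, and pp_idx is assumed to start at idx and to be advanced at the end of each loop pass.

```c
#define TQ_STATS_LEN 2 /* stand-in for VETH_TQ_STATS_LEN; the value 2 is assumed */

/* Slot used for peer rx queue i when tx_idx restarts from idx on each pass. */
static unsigned int tx_slot_restart(unsigned int idx, unsigned int i,
				    unsigned int num_tx)
{
	return idx + (i % num_tx) * TQ_STATS_LEN;
}

/*
 * Last slot start reached if tx_idx instead began at the running pp_idx,
 * as in the proposed one-line change. pp_idx is assumed to start at idx
 * and to be set to tx_idx + TQ_STATS_LEN at the end of each pass.
 */
static unsigned int last_slot_cumulative(unsigned int idx, unsigned int peer_rx,
					 unsigned int num_tx)
{
	unsigned int pp_idx = idx, tx_idx = idx;

	for (unsigned int i = 0; i < peer_rx; i++) {
		tx_idx = pp_idx + (i % num_tx) * TQ_STATS_LEN;
		pp_idx = tx_idx + TQ_STATS_LEN;
	}
	return tx_idx;
}
```

With a hypothetical idx of 10, two local tx queues and four peer rx queues, the tx block covers slots 10 to 13. Restarting from idx keeps every peer rx queue inside that block (slots 10 and 12), while the cumulative variant ends at slot 20, well past it.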
  
黄杰 Nov. 20, 2023, 9:45 a.m. UTC | #2
Lorenzo Bianconi <lorenzo@kernel.org> wrote on Fri, Nov 17, 2023 at 17:26:
>
> > if peer->real_num_rx_queues > 1, the ethtool -s command for
> > veth network device will display some error statistical values.
> > The value of tx_idx is reset with each iteration, so even if
> > peer->real_num_rx_queues is greater than 1, the value of tx_idx
> > will remain constant. This results in incorrect statistical values.
> > To fix this issue, assign the value of pp_idx to tx_idx.
> >
> > Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> > ---
> >  drivers/net/veth.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 0deefd1573cf..3a8e3fc5eeb5 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
> >       for (i = 0; i < peer->real_num_rx_queues; i++) {
> >               const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
> >               const void *base = (void *)&rq_stats->vs;
> > -             unsigned int start, tx_idx = idx;
> > +             unsigned int start, tx_idx = pp_idx;
> >               size_t offset;
> >
> >               tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> > --
> > 2.20.1
> >
>
> Hi Albert,
>
> Can you please provide more details about the issue you are facing?
> In particular, what is the number of configured tx and rx queues for both
> peers?

Hi, Lorenzo
I found this because I wanted to add more output to ethtool for veth,
but I found that the existing information was incorrect. That's why I
looked into this.

> tx_idx is the index of the current (local) tx queue and it must restart from
> idx in each iteration otherwise we will have an issue when
> peer->real_num_rx_queues is greater than dev->real_num_tx_queues.
>
OK. I don't know if this is a known issue.

BR
Albert


> Regards,
> Lorenzo
  
Lorenzo Bianconi Nov. 20, 2023, 9:51 a.m. UTC | #3
> Lorenzo Bianconi <lorenzo@kernel.org> wrote on Fri, Nov 17, 2023 at 17:26:
> >
> > > if peer->real_num_rx_queues > 1, the ethtool -s command for
> > > veth network device will display some error statistical values.
> > > The value of tx_idx is reset with each iteration, so even if
> > > peer->real_num_rx_queues is greater than 1, the value of tx_idx
> > > will remain constant. This results in incorrect statistical values.
> > > To fix this issue, assign the value of pp_idx to tx_idx.
> > >
> > > Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> > > ---
> > >  drivers/net/veth.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > > index 0deefd1573cf..3a8e3fc5eeb5 100644
> > > --- a/drivers/net/veth.c
> > > +++ b/drivers/net/veth.c
> > > @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
> > >       for (i = 0; i < peer->real_num_rx_queues; i++) {
> > >               const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
> > >               const void *base = (void *)&rq_stats->vs;
> > > -             unsigned int start, tx_idx = idx;
> > > +             unsigned int start, tx_idx = pp_idx;
> > >               size_t offset;
> > >
> > >               tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> > > --
> > > 2.20.1
> > >
> >
> > Hi Albert,
> >
> > Can you please provide more details about the issue you are facing?
> > In particular, what is the number of configured tx and rx queues for both
> > peers?
> 
> Hi, Lorenzo
> I found this because I wanted to add more output to ethtool for veth,
> but I found that the existing information was incorrect. That's why I
> looked into this.

ack. Could you please share the veth pair tx/rx queue configuration?

Regards,
Lorenzo

> 
> > tx_idx is the index of the current (local) tx queue and it must restart from
> > idx in each iteration otherwise we will have an issue when
> > peer->real_num_rx_queues is greater than dev->real_num_tx_queues.
> >
> OK. I don't know if this is a known issue.
> 
> BR
> Albert
> 
> 
> > Regards,
> > Lorenzo
  
黄杰 Nov. 20, 2023, 10:02 a.m. UTC | #4
Lorenzo Bianconi <lorenzo@kernel.org> wrote on Mon, Nov 20, 2023 at 17:52:
>
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote on Fri, Nov 17, 2023 at 17:26:
> > >
> > > > if peer->real_num_rx_queues > 1, the ethtool -s command for
> > > > veth network device will display some error statistical values.
> > > > The value of tx_idx is reset with each iteration, so even if
> > > > peer->real_num_rx_queues is greater than 1, the value of tx_idx
> > > > will remain constant. This results in incorrect statistical values.
> > > > To fix this issue, assign the value of pp_idx to tx_idx.
> > > >
> > > > Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> > > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> > > > ---
> > > >  drivers/net/veth.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > > > index 0deefd1573cf..3a8e3fc5eeb5 100644
> > > > --- a/drivers/net/veth.c
> > > > +++ b/drivers/net/veth.c
> > > > @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
> > > >       for (i = 0; i < peer->real_num_rx_queues; i++) {
> > > >               const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
> > > >               const void *base = (void *)&rq_stats->vs;
> > > > -             unsigned int start, tx_idx = idx;
> > > > +             unsigned int start, tx_idx = pp_idx;
> > > >               size_t offset;
> > > >
> > > >               tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> > > > --
> > > > 2.20.1
> > > >
> > >
> > > Hi Albert,
> > >
> > > Can you please provide more details about the issue you are facing?
> > > In particular, what is the number of configured tx and rx queues for both
> > > peers?
> >
> > Hi, Lorenzo
> > I found this because I wanted to add more output to ethtool for veth,
> > but I found that the existing information was incorrect. That's why I
> > looked into this.
>
> ack. Could you please share the veth pair tx/rx queue configuration?
>

dev:  tx = 4, rx = 4
peer: tx = 1, rx = 1

Could the following code still be problematic? pp_idx does not seem to be
updated correctly.
page_pool_stats:
veth_get_page_pool_stats(dev, &data[pp_idx]);

BR
Albert

> Regards,
> Lorenzo
>
> >
> > > tx_idx is the index of the current (local) tx queue and it must restart from
> > > idx in each iteration otherwise we will have an issue when
> > > peer->real_num_rx_queues is greater than dev->real_num_tx_queues.
> > >
> > OK. I don't know if this is a known issue.
> >
> > BR
> > Albert
> >
> >
> > > Regards,
> > > Lorenzo
  
Lorenzo Bianconi Nov. 20, 2023, 10:55 a.m. UTC | #5
> Lorenzo Bianconi <lorenzo@kernel.org> wrote on Mon, Nov 20, 2023 at 17:52:
> >
> > > Lorenzo Bianconi <lorenzo@kernel.org> wrote on Fri, Nov 17, 2023 at 17:26:
> > > >
> > > > > if peer->real_num_rx_queues > 1, the ethtool -s command for
> > > > > veth network device will display some error statistical values.
> > > > > The value of tx_idx is reset with each iteration, so even if
> > > > > peer->real_num_rx_queues is greater than 1, the value of tx_idx
> > > > > will remain constant. This results in incorrect statistical values.
> > > > > To fix this issue, assign the value of pp_idx to tx_idx.
> > > > >
> > > > > Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> > > > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> > > > > ---
> > > > >  drivers/net/veth.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > > > > index 0deefd1573cf..3a8e3fc5eeb5 100644
> > > > > --- a/drivers/net/veth.c
> > > > > +++ b/drivers/net/veth.c
> > > > > @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
> > > > >       for (i = 0; i < peer->real_num_rx_queues; i++) {
> > > > >               const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
> > > > >               const void *base = (void *)&rq_stats->vs;
> > > > > -             unsigned int start, tx_idx = idx;
> > > > > +             unsigned int start, tx_idx = pp_idx;
> > > > >               size_t offset;
> > > > >
> > > > >               tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> > > > > --
> > > > > 2.20.1
> > > > >
> > > >
> > > > Hi Albert,
> > > >
> > > > Can you please provide more details about the issue you are facing?
> > > > In particular, what is the number of configured tx and rx queues for both
> > > > peers?
> > >
> > > Hi, Lorenzo
> > > I found this because I wanted to add more output to ethtool for veth,
> > > but I found that the existing information was incorrect. That's why I
> > > looked into this.
> >
> > ack. Could you please share the veth pair tx/rx queue configuration?
> >
> 
> dev: tx --->4.  rx--->4
> peer: tx--->1 rx---->1
> 
> Could the following code still be problematic? pp_idx not updated correctly.
> page_pool_stats:
> veth_get_page_pool_stats(dev, &data[pp_idx]);

Thanks for pointing this out. This part is a bit tricky, but I think I can see
the issue now. Since we have just one peer rx queue, when we run ndo_xdp_xmit
on dev, we squash all dev xmit queues onto the single peer rx queue
(where we do the accounting) [0].
The issue is that ethtool displays all dev xmit queues, so we need to set pp_idx
properly in veth_get_ethtool_stats().
Can you please take a look at the patch below?

Regards,
Lorenzo

[0] https://github.com/LorenzoBianconi/net-next/blob/master/drivers/net/veth.c#L417

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9980517ed8b0..8607eb8cf458 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -236,8 +236,8 @@ static void veth_get_ethtool_stats(struct net_device *dev,
 				data[tx_idx + j] += *(u64 *)(base + offset);
 			}
 		} while (u64_stats_fetch_retry(&rq_stats->syncp, start));
-		pp_idx = tx_idx + VETH_TQ_STATS_LEN;
 	}
+	pp_idx = idx + dev->real_num_tx_queues * VETH_TQ_STATS_LEN;
 
 page_pool_stats:
 	veth_get_page_pool_stats(dev, &data[pp_idx]);
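
The effect of moving the pp_idx assignment out of the loop can be sketched as follows. This is an illustrative model only, assuming pp_idx starts at idx before the loop; TQ_STATS_LEN stands in for VETH_TQ_STATS_LEN with an assumed value of 2, and both helper names are hypothetical.

```c
#define TQ_STATS_LEN 2 /* stand-in for VETH_TQ_STATS_LEN; the value 2 is assumed */

/* Before the change: pp_idx follows the last tx slot the loop touched. */
static unsigned int pp_idx_in_loop(unsigned int idx, unsigned int peer_rx,
				   unsigned int dev_tx)
{
	unsigned int pp_idx = idx;

	for (unsigned int i = 0; i < peer_rx; i++) {
		unsigned int tx_idx = idx + (i % dev_tx) * TQ_STATS_LEN;

		pp_idx = tx_idx + TQ_STATS_LEN;
	}
	return pp_idx;
}

/* After the change: pp_idx always skips the whole local tx block. */
static unsigned int pp_idx_after_loop(unsigned int idx, unsigned int dev_tx)
{
	return idx + dev_tx * TQ_STATS_LEN;
}
```

For the reported configuration (dev with 4 tx queues, peer with 1 rx queue) and a hypothetical idx of 10, the in-loop version yields 12, so the page pool statistics would land on the slots belonging to tx queue 1. The after-loop version yields 18, the first slot past all four tx queues. When the peer has as many rx queues as dev has tx queues, both versions agree.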

> 
> BR
> Albert
> 
> > Regards,
> > Lorenzo
> >
> > >
> > > > tx_idx is the index of the current (local) tx queue and it must restart from
> > > > idx in each iteration otherwise we will have an issue when
> > > > peer->real_num_rx_queues is greater than dev->real_num_tx_queues.
> > > >
> > > OK. I don't know if this is a known issue.
> > >
> > > BR
> > > Albert
> > >
> > >
> > > > Regards,
> > > > Lorenzo
  
黄杰 Nov. 20, 2023, 11:01 a.m. UTC | #6
黄杰 <huangjie.albert@bytedance.com> wrote on Mon, Nov 20, 2023 at 18:02:
>
> Lorenzo Bianconi <lorenzo@kernel.org> wrote on Mon, Nov 20, 2023 at 17:52:
> >
> > > Lorenzo Bianconi <lorenzo@kernel.org> wrote on Fri, Nov 17, 2023 at 17:26:
> > > >
> > > > > if peer->real_num_rx_queues > 1, the ethtool -s command for
> > > > > veth network device will display some error statistical values.
> > > > > The value of tx_idx is reset with each iteration, so even if
> > > > > peer->real_num_rx_queues is greater than 1, the value of tx_idx
> > > > > will remain constant. This results in incorrect statistical values.
> > > > > To fix this issue, assign the value of pp_idx to tx_idx.
> > > > >
> > > > > Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> > > > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> > > > > ---
> > > > >  drivers/net/veth.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > > > > index 0deefd1573cf..3a8e3fc5eeb5 100644
> > > > > --- a/drivers/net/veth.c
> > > > > +++ b/drivers/net/veth.c
> > > > > @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
> > > > >       for (i = 0; i < peer->real_num_rx_queues; i++) {
> > > > >               const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
> > > > >               const void *base = (void *)&rq_stats->vs;
> > > > > -             unsigned int start, tx_idx = idx;
> > > > > +             unsigned int start, tx_idx = pp_idx;
> > > > >               size_t offset;
> > > > >
> > > > >               tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> > > > > --
> > > > > 2.20.1
> > > > >
> > > >
> > > > Hi Albert,
> > > >
> > > > Can you please provide more details about the issue you are facing?
> > > > In particular, what is the number of configured tx and rx queues for both
> > > > peers?
> > >
> > > Hi, Lorenzo
> > > I found this because I wanted to add more output to ethtool for veth,
> > > but I found that the existing information was incorrect. That's why I
> > > looked into this.
> >
> > ack. Could you please share the veth pair tx/rx queue configuration?
> >
>
> dev: tx --->4.  rx--->4
> peer: tx--->1 rx---->1
>
> Could the following code still be problematic? pp_idx not updated correctly.
> page_pool_stats:
> veth_get_page_pool_stats(dev, &data[pp_idx]);

I tested this locally and there is no problem here. I didn't fully
understand this piece of code before.
Thanks.
BR
Albert.

>
> BR
> Albert
>
> > Regards,
> > Lorenzo
> >
> > >
> > > > tx_idx is the index of the current (local) tx queue and it must restart from
> > > > idx in each iteration otherwise we will have an issue when
> > > > peer->real_num_rx_queues is greater than dev->real_num_tx_queues.
> > > >
> > > OK. I don't know if this is a known issue.
> > >
> > > BR
> > > Albert
> > >
> > >
> > > > Regards,
> > > > Lorenzo
  

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0deefd1573cf..3a8e3fc5eeb5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -225,7 +225,7 @@  static void veth_get_ethtool_stats(struct net_device *dev,
 	for (i = 0; i < peer->real_num_rx_queues; i++) {
 		const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
 		const void *base = (void *)&rq_stats->vs;
-		unsigned int start, tx_idx = idx;
+		unsigned int start, tx_idx = pp_idx;
 		size_t offset;
 
 		tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;