[net-next] net: introduce budget_squeeze to help us tune rx behavior

Message ID 20230311163614.92296-1-kerneljasonxing@gmail.com
State New
Headers
Series [net-next] net: introduce budget_squeeze to help us tune rx behavior |

Commit Message

Jason Xing March 11, 2023, 4:36 p.m. UTC
  From: Jason Xing <kernelxing@tencent.com>

When we encounter some performance issue and then get lost on how
to tune the budget limit and time limit in net_rx_action() function,
we can separately counting both of them to avoid the confusion.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
note: this commit is based on the link as below:
https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 12 ++++++++----
 net/core/net-procfs.c     |  9 ++++++---
 3 files changed, 15 insertions(+), 7 deletions(-)
  

Comments

Stephen Hemminger March 11, 2023, 5:14 p.m. UTC | #1
On Sun, 12 Mar 2023 00:36:14 +0800
Jason Xing <kerneljasonxing@gmail.com> wrote:

> -	for (;;) {
> +	for (; is_continue;) {


Easier to read this as a
 	while (is_continue) {

but what is wrong with using break; instead?
  
Jason Xing March 12, 2023, 12:04 a.m. UTC | #2
On Sun, Mar 12, 2023 at 1:14 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Sun, 12 Mar 2023 00:36:14 +0800
> Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> > -     for (;;) {
> > +     for (; is_continue;) {
>
>
> Easier to read this as a
>         while (is_continue) {
>
> but what is wrong with using break; instead?

If we hit the budget limit and 'break;' immediately, we may miss the
collection when we also hit the time limit. That's why I would like to
know if we hit both of them.

Thank,
Jason
  
Jason Xing March 13, 2023, 2:05 a.m. UTC | #3
On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> When we encounter some performance issue and then get lost on how
> to tune the budget limit and time limit in net_rx_action() function,
> we can separately counting both of them to avoid the confusion.
>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> note: this commit is based on the link as below:
> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
> ---
>  include/linux/netdevice.h |  1 +
>  net/core/dev.c            | 12 ++++++++----
>  net/core/net-procfs.c     |  9 ++++++---
>  3 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6a14b7b11766..5736311a2133 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3157,6 +3157,7 @@ struct softnet_data {
>         /* stats */
>         unsigned int            processed;
>         unsigned int            time_squeeze;
> +       unsigned int            budget_squeeze;
>  #ifdef CONFIG_RPS
>         struct softnet_data     *rps_ipi_list;
>  #endif
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 253584777101..bed7a68fdb5d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>         unsigned long time_limit = jiffies +
>                 usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
>         int budget = READ_ONCE(netdev_budget);
> +       bool is_continue = true;

I kept thinking during these days, I think it looks not that concise
and elegant and also the name is not that good though the function can
work.

In the next submission, I'm going to choose to use 'while()' instead
of 'for()' suggested by Stephen.

Does anyone else have some advice about this?

Thanks,
Jason

>         LIST_HEAD(list);
>         LIST_HEAD(repoll);
>
> @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>         list_splice_init(&sd->poll_list, &list);
>         local_irq_enable();
>
> -       for (;;) {
> +       for (; is_continue;) {
>                 struct napi_struct *n;
>
>                 skb_defer_free_flush(sd);
> @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>                  * Allow this to run for 2 jiffies since which will allow
>                  * an average latency of 1.5/HZ.
>                  */
> -               if (unlikely(budget <= 0 ||
> -                            time_after_eq(jiffies, time_limit))) {
> +               if (unlikely(budget <= 0)) {
> +                       sd->budget_squeeze++;
> +                       is_continue = false;
> +               }
> +               if (unlikely(time_after_eq(jiffies, time_limit))) {
>                         sd->time_squeeze++;
> -                       break;
> +                       is_continue = false;
>                 }
>         }
>
> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> index 97a304e1957a..4d1a499d7c43 100644
> --- a/net/core/net-procfs.c
> +++ b/net/core/net-procfs.c
> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>          */
>         seq_printf(seq,
>                    "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> -                  "%08x %08x\n",
> -                  sd->processed, sd->dropped, sd->time_squeeze, 0,
> +                  "%08x %08x %08x %08x\n",
> +                  sd->processed, sd->dropped,
> +                  0, /* was old way to count time squeeze */
> +                  0,
>                    0, 0, 0, 0, /* was fastroute */
>                    0,   /* was cpu_collision */
>                    sd->received_rps, flow_limit_count,
>                    0,   /* was len of two backlog queues */
>                    (int)seq->index,
> -                  softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> +                  softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> +                  sd->time_squeeze, sd->budget_squeeze);
>         return 0;
>  }
>
> --
> 2.37.3
>
  
Simon Horman March 13, 2023, 8:07 p.m. UTC | #4
On Mon, Mar 13, 2023 at 10:05:18AM +0800, Jason Xing wrote:
> On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > When we encounter some performance issue and then get lost on how
> > to tune the budget limit and time limit in net_rx_action() function,
> > we can separately counting both of them to avoid the confusion.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > note: this commit is based on the link as below:
> > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
> > ---
> >  include/linux/netdevice.h |  1 +
> >  net/core/dev.c            | 12 ++++++++----
> >  net/core/net-procfs.c     |  9 ++++++---
> >  3 files changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 6a14b7b11766..5736311a2133 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3157,6 +3157,7 @@ struct softnet_data {
> >         /* stats */
> >         unsigned int            processed;
> >         unsigned int            time_squeeze;
> > +       unsigned int            budget_squeeze;
> >  #ifdef CONFIG_RPS
> >         struct softnet_data     *rps_ipi_list;
> >  #endif
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 253584777101..bed7a68fdb5d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> >         unsigned long time_limit = jiffies +
> >                 usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> >         int budget = READ_ONCE(netdev_budget);
> > +       bool is_continue = true;
> 
> I kept thinking during these days, I think it looks not that concise
> and elegant and also the name is not that good though the function can
> work.
> 
> In the next submission, I'm going to choose to use 'while()' instead
> of 'for()' suggested by Stephen.
> 
> Does anyone else have some advice about this?

What about:

	int done = false

	while (!done) {
		...
	}

Or:

	for (;;) {
		int done = false;

		...
		if (done)
			break;
	}

> 
> Thanks,
> Jason
> 
> >         LIST_HEAD(list);
> >         LIST_HEAD(repoll);
> >
> > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> >         list_splice_init(&sd->poll_list, &list);
> >         local_irq_enable();
> >
> > -       for (;;) {
> > +       for (; is_continue;) {
> >                 struct napi_struct *n;
> >
> >                 skb_defer_free_flush(sd);
> > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> >                  * Allow this to run for 2 jiffies since which will allow
> >                  * an average latency of 1.5/HZ.
> >                  */
> > -               if (unlikely(budget <= 0 ||
> > -                            time_after_eq(jiffies, time_limit))) {
> > +               if (unlikely(budget <= 0)) {
> > +                       sd->budget_squeeze++;
> > +                       is_continue = false;
> > +               }
> > +               if (unlikely(time_after_eq(jiffies, time_limit))) {
> >                         sd->time_squeeze++;
> > -                       break;
> > +                       is_continue = false;
> >                 }
> >         }
> >
> > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > index 97a304e1957a..4d1a499d7c43 100644
> > --- a/net/core/net-procfs.c
> > +++ b/net/core/net-procfs.c
> > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >          */
> >         seq_printf(seq,
> >                    "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > -                  "%08x %08x\n",
> > -                  sd->processed, sd->dropped, sd->time_squeeze, 0,
> > +                  "%08x %08x %08x %08x\n",
> > +                  sd->processed, sd->dropped,
> > +                  0, /* was old way to count time squeeze */
> > +                  0,
> >                    0, 0, 0, 0, /* was fastroute */
> >                    0,   /* was cpu_collision */
> >                    sd->received_rps, flow_limit_count,
> >                    0,   /* was len of two backlog queues */
> >                    (int)seq->index,
> > -                  softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > +                  softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> > +                  sd->time_squeeze, sd->budget_squeeze);
> >         return 0;
> >  }
> >
> > --
> > 2.37.3
> >
>
  
Kui-Feng Lee March 13, 2023, 9:58 p.m. UTC | #5
On 3/11/23 08:36, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> When we encounter some performance issue and then get lost on how
> to tune the budget limit and time limit in net_rx_action() function,
> we can separately counting both of them to avoid the confusion.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> note: this commit is based on the link as below:
> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
> ---
>   include/linux/netdevice.h |  1 +
>   net/core/dev.c            | 12 ++++++++----
>   net/core/net-procfs.c     |  9 ++++++---
>   3 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6a14b7b11766..5736311a2133 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3157,6 +3157,7 @@ struct softnet_data {
>   	/* stats */
>   	unsigned int		processed;
>   	unsigned int		time_squeeze;
> +	unsigned int		budget_squeeze;
>   #ifdef CONFIG_RPS
>   	struct softnet_data	*rps_ipi_list;
>   #endif
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 253584777101..bed7a68fdb5d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>   	unsigned long time_limit = jiffies +
>   		usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
>   	int budget = READ_ONCE(netdev_budget);
> +	bool is_continue = true;
>   	LIST_HEAD(list);
>   	LIST_HEAD(repoll);
>   
> @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>   	list_splice_init(&sd->poll_list, &list);
>   	local_irq_enable();
>   
> -	for (;;) {
> +	for (; is_continue;) {
>   		struct napi_struct *n;
>   
>   		skb_defer_free_flush(sd);
> @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>   		 * Allow this to run for 2 jiffies since which will allow
>   		 * an average latency of 1.5/HZ.
>   		 */
> -		if (unlikely(budget <= 0 ||
> -			     time_after_eq(jiffies, time_limit))) {
> +		if (unlikely(budget <= 0)) {
> +			sd->budget_squeeze++;
> +			is_continue = false;
> +		}
> +		if (unlikely(time_after_eq(jiffies, time_limit))) {
>   			sd->time_squeeze++;
> -			break;
> +			is_continue = false;
>   		}
>   	}
>   
> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> index 97a304e1957a..4d1a499d7c43 100644
> --- a/net/core/net-procfs.c
> +++ b/net/core/net-procfs.c
> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>   	 */
>   	seq_printf(seq,
>   		   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> -		   "%08x %08x\n",
> -		   sd->processed, sd->dropped, sd->time_squeeze, 0,
> +		   "%08x %08x %08x %08x\n",
> +		   sd->processed, sd->dropped,
> +		   0, /* was old way to count time squeeze */

Should we show a proximate number?  For example,
sd->time_squeeze + sd->bud_squeeze.


> +		   0,
>   		   0, 0, 0, 0, /* was fastroute */
>   		   0,	/* was cpu_collision */
>   		   sd->received_rps, flow_limit_count,
>   		   0,	/* was len of two backlog queues */
>   		   (int)seq->index,
> -		   softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> +		   softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> +		   sd->time_squeeze, sd->budget_squeeze);
>   	return 0;
>   }
>
  
Jason Xing March 14, 2023, 1:56 a.m. UTC | #6
On Tue, Mar 14, 2023 at 4:07 AM Simon Horman <simon.horman@corigine.com> wrote:
>
> On Mon, Mar 13, 2023 at 10:05:18AM +0800, Jason Xing wrote:
> > On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > When we encounter some performance issue and then get lost on how
> > > to tune the budget limit and time limit in net_rx_action() function,
> > > we can separately counting both of them to avoid the confusion.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > note: this commit is based on the link as below:
> > > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
> > > ---
> > >  include/linux/netdevice.h |  1 +
> > >  net/core/dev.c            | 12 ++++++++----
> > >  net/core/net-procfs.c     |  9 ++++++---
> > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 6a14b7b11766..5736311a2133 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3157,6 +3157,7 @@ struct softnet_data {
> > >         /* stats */
> > >         unsigned int            processed;
> > >         unsigned int            time_squeeze;
> > > +       unsigned int            budget_squeeze;
> > >  #ifdef CONFIG_RPS
> > >         struct softnet_data     *rps_ipi_list;
> > >  #endif
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 253584777101..bed7a68fdb5d 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > >         unsigned long time_limit = jiffies +
> > >                 usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> > >         int budget = READ_ONCE(netdev_budget);
> > > +       bool is_continue = true;
> >
> > I kept thinking during these days, I think it looks not that concise
> > and elegant and also the name is not that good though the function can
> > work.
> >
> > In the next submission, I'm going to choose to use 'while()' instead
> > of 'for()' suggested by Stephen.
> >
> > Does anyone else have some advice about this?
>
> What about:
>
>         int done = false
>
>         while (!done) {
>                 ...
>         }
>
> Or:
>
>         for (;;) {
>                 int done = false;
>
>                 ...
>                 if (done)
>                         break;
>         }
>

Great, that looks much better:)

Thanks,
Jason

> >
> > Thanks,
> > Jason
> >
> > >         LIST_HEAD(list);
> > >         LIST_HEAD(repoll);
> > >
> > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > >         list_splice_init(&sd->poll_list, &list);
> > >         local_irq_enable();
> > >
> > > -       for (;;) {
> > > +       for (; is_continue;) {
> > >                 struct napi_struct *n;
> > >
> > >                 skb_defer_free_flush(sd);
> > > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > >                  * Allow this to run for 2 jiffies since which will allow
> > >                  * an average latency of 1.5/HZ.
> > >                  */
> > > -               if (unlikely(budget <= 0 ||
> > > -                            time_after_eq(jiffies, time_limit))) {
> > > +               if (unlikely(budget <= 0)) {
> > > +                       sd->budget_squeeze++;
> > > +                       is_continue = false;
> > > +               }
> > > +               if (unlikely(time_after_eq(jiffies, time_limit))) {
> > >                         sd->time_squeeze++;
> > > -                       break;
> > > +                       is_continue = false;
> > >                 }
> > >         }
> > >
> > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > > index 97a304e1957a..4d1a499d7c43 100644
> > > --- a/net/core/net-procfs.c
> > > +++ b/net/core/net-procfs.c
> > > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> > >          */
> > >         seq_printf(seq,
> > >                    "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > > -                  "%08x %08x\n",
> > > -                  sd->processed, sd->dropped, sd->time_squeeze, 0,
> > > +                  "%08x %08x %08x %08x\n",
> > > +                  sd->processed, sd->dropped,
> > > +                  0, /* was old way to count time squeeze */
> > > +                  0,
> > >                    0, 0, 0, 0, /* was fastroute */
> > >                    0,   /* was cpu_collision */
> > >                    sd->received_rps, flow_limit_count,
> > >                    0,   /* was len of two backlog queues */
> > >                    (int)seq->index,
> > > -                  softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > > +                  softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> > > +                  sd->time_squeeze, sd->budget_squeeze);
> > >         return 0;
> > >  }
> > >
> > > --
> > > 2.37.3
> > >
> >
  
Jason Xing March 14, 2023, 1:57 a.m. UTC | #7
On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 3/11/23 08:36, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > When we encounter some performance issue and then get lost on how
> > to tune the budget limit and time limit in net_rx_action() function,
> > we can separately counting both of them to avoid the confusion.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > note: this commit is based on the link as below:
> > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
> > ---
> >   include/linux/netdevice.h |  1 +
> >   net/core/dev.c            | 12 ++++++++----
> >   net/core/net-procfs.c     |  9 ++++++---
> >   3 files changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 6a14b7b11766..5736311a2133 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3157,6 +3157,7 @@ struct softnet_data {
> >       /* stats */
> >       unsigned int            processed;
> >       unsigned int            time_squeeze;
> > +     unsigned int            budget_squeeze;
> >   #ifdef CONFIG_RPS
> >       struct softnet_data     *rps_ipi_list;
> >   #endif
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 253584777101..bed7a68fdb5d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> >       unsigned long time_limit = jiffies +
> >               usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> >       int budget = READ_ONCE(netdev_budget);
> > +     bool is_continue = true;
> >       LIST_HEAD(list);
> >       LIST_HEAD(repoll);
> >
> > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> >       list_splice_init(&sd->poll_list, &list);
> >       local_irq_enable();
> >
> > -     for (;;) {
> > +     for (; is_continue;) {
> >               struct napi_struct *n;
> >
> >               skb_defer_free_flush(sd);
> > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> >                * Allow this to run for 2 jiffies since which will allow
> >                * an average latency of 1.5/HZ.
> >                */
> > -             if (unlikely(budget <= 0 ||
> > -                          time_after_eq(jiffies, time_limit))) {
> > +             if (unlikely(budget <= 0)) {
> > +                     sd->budget_squeeze++;
> > +                     is_continue = false;
> > +             }
> > +             if (unlikely(time_after_eq(jiffies, time_limit))) {
> >                       sd->time_squeeze++;
> > -                     break;
> > +                     is_continue = false;
> >               }
> >       }
> >
> > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > index 97a304e1957a..4d1a499d7c43 100644
> > --- a/net/core/net-procfs.c
> > +++ b/net/core/net-procfs.c
> > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >        */
> >       seq_printf(seq,
> >                  "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > -                "%08x %08x\n",
> > -                sd->processed, sd->dropped, sd->time_squeeze, 0,
> > +                "%08x %08x %08x %08x\n",
> > +                sd->processed, sd->dropped,
> > +                0, /* was old way to count time squeeze */
>
> Should we show a proximate number?  For example,
> sd->time_squeeze + sd->bud_squeeze.

Yeah, It does make sense. Let the old way to display untouched.

>
>
> > +                0,
> >                  0, 0, 0, 0, /* was fastroute */
> >                  0,   /* was cpu_collision */
> >                  sd->received_rps, flow_limit_count,
> >                  0,   /* was len of two backlog queues */
> >                  (int)seq->index,
> > -                softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > +                softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> > +                sd->time_squeeze, sd->budget_squeeze);
> >       return 0;
> >   }
> >
  
Jesper Dangaard Brouer March 14, 2023, 8:39 a.m. UTC | #8
On 14/03/2023 02.57, Jason Xing wrote:
> On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>> On 3/11/23 08:36, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> When we encounter some performance issue and then get lost on how
>>> to tune the budget limit and time limit in net_rx_action() function,
>>> we can separately counting both of them to avoid the confusion.
>>>
>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>> ---
>>> note: this commit is based on the link as below:
>>> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
>>> ---
[...]
>>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
>>> index 97a304e1957a..4d1a499d7c43 100644
>>> --- a/net/core/net-procfs.c
>>> +++ b/net/core/net-procfs.c
>>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>>         */
>>>        seq_printf(seq,
>>>                   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
>>> -                "%08x %08x\n",
>>> -                sd->processed, sd->dropped, sd->time_squeeze, 0,
>>> +                "%08x %08x %08x %08x\n",
>>> +                sd->processed, sd->dropped,
>>> +                0, /* was old way to count time squeeze */
>>
>> Should we show a proximate number?  For example,
>> sd->time_squeeze + sd->bud_squeeze.
> 
> Yeah, It does make sense. Let the old way to display untouched.
>

Yes, I don't think we can/should remove this squeeze stat because
several tools e.g. my own[1] captures these stats (and I know Willem
also have his own tool).
I like the sd->time_squeeze + sd->budget_squeeze suggestion.

  [1] 
https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl


>>
>>
>>> +                0,
>>>                   0, 0, 0, 0, /* was fastroute */
>>>                   0,   /* was cpu_collision */
>>>                   sd->received_rps, flow_limit_count,
>>>                   0,   /* was len of two backlog queues */
>>>                   (int)seq->index,
>>> -                softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
>>> +                softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
>>> +                sd->time_squeeze, sd->budget_squeeze);
>>>        return 0;
>>>    }
>>>

We recently had a very long troubleshooting session around a latency
issue (Cc Simon) where we used the tool[1].  The issue was NIC hardware
RX queue was backlogged, but we didn't see any squeeze events, which
confused us. (This happens because budget was 300 and two NICs using 64
budget each doesn't exceed 300).

We were/are missing another counter to tell us net_rx_action() "repoll"
is happening (as code !list_empty(&repoll)).  That were the case and it
would have "told" us that hardware RX ring was full (larger than 64).

We worked around this limitation by using the tracepoint for napi_poll,
and manually deduced that 64 bulking must mean that "repoll" were happening.

Oneliner bpftrace script:

  bpftrace -e 'tracepoint:napi:napi_poll { 
@napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }'

We used this script (that also measures softirq latency):

 
https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt 


I do wonder is it would be valuable to *also* add a tracepoint to
net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and
repoll-not-empty.

--Jesper
  
Jason Xing March 14, 2023, 9:21 a.m. UTC | #9
On Tue, Mar 14, 2023 at 4:41 PM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 14/03/2023 02.57, Jason Xing wrote:
> > On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >>
> >> On 3/11/23 08:36, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@tencent.com>
> >>>
> >>> When we encounter some performance issue and then get lost on how
> >>> to tune the budget limit and time limit in net_rx_action() function,
> >>> we can separately counting both of them to avoid the confusion.
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>> ---
> >>> note: this commit is based on the link as below:
> >>> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/
> >>> ---
> [...]
> >>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> >>> index 97a304e1957a..4d1a499d7c43 100644
> >>> --- a/net/core/net-procfs.c
> >>> +++ b/net/core/net-procfs.c
> >>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >>>         */
> >>>        seq_printf(seq,
> >>>                   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> >>> -                "%08x %08x\n",
> >>> -                sd->processed, sd->dropped, sd->time_squeeze, 0,
> >>> +                "%08x %08x %08x %08x\n",
> >>> +                sd->processed, sd->dropped,
> >>> +                0, /* was old way to count time squeeze */
> >>
> >> Should we show a proximate number?  For example,
> >> sd->time_squeeze + sd->bud_squeeze.
> >
> > Yeah, It does make sense. Let the old way to display untouched.
> >
>
[...]
> Yes, I don't think we can/should remove this squeeze stat because
> several tools e.g. my own[1] captures these stats (and I know Willem
> also have his own tool).
> I like the sd->time_squeeze + sd->budget_squeeze suggestion.

So do I. Therefore I followed this suggestion in the next submission.

[1]
https://lore.kernel.org/lkml/20230314030532.9238-3-kerneljasonxing@gmail.com/

>
>   [1]
> https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl
>
>
> >>
> >>
> >>> +                0,
> >>>                   0, 0, 0, 0, /* was fastroute */
> >>>                   0,   /* was cpu_collision */
> >>>                   sd->received_rps, flow_limit_count,
> >>>                   0,   /* was len of two backlog queues */
> >>>                   (int)seq->index,
> >>> -                softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> >>> +                softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> >>> +                sd->time_squeeze, sd->budget_squeeze);
> >>>        return 0;
> >>>    }
> >>>
>
[...]
> We recently had a very long troubleshooting session around a latency
> issue (Cc Simon) where we used the tool[1].  The issue was NIC hardware
> RX queue was backlogged, but we didn't see any squeeze events, which
> confused us. (This happens because budget was 300 and two NICs using 64
> budget each doesn't exceed 300).

I recently found some users running on our production environment hit
the time_squeeze very often which aroused my interests.
Env:
1) budget is 300;
2) eth0 is virtio_net which only registers 32 input interrupts (32
queue pairs) with a larger number of cpus online.

>
> We were/are missing another counter to tell us net_rx_action() "repoll"
> is happening (as code !list_empty(&repoll)).  That were the case and it
> would have "told" us that hardware RX ring was full (larger than 64).
>
> We worked around this limitation by using the tracepoint for napi_poll,
> and manually deduced that 64 bulking must mean that "repoll" were happening.
>
> Oneliner bpftrace script:
>
>   bpftrace -e 'tracepoint:napi:napi_poll {
> @napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }'
>
> We used this script (that also measures softirq latency):
>
>
> https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt
>
>
[...]
> I do wonder is it would be valuable to *also* add a tracepoint to
> net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and
> repoll-not-empty.

I believe it's useful that we can show more details in softnet_data,
but I'm confused about how to display them.
This morning I submitted one patch[1] and chose to do such things when
reading the softnet_stat file.

Could we add more data in the softnet_stat file while also tracing
those three important points? I'm not sure.

Thanks,
Jason

>
> --Jesper
>
  

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6a14b7b11766..5736311a2133 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3157,6 +3157,7 @@  struct softnet_data {
 	/* stats */
 	unsigned int		processed;
 	unsigned int		time_squeeze;
+	unsigned int		budget_squeeze;
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 253584777101..bed7a68fdb5d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6637,6 +6637,7 @@  static __latent_entropy void net_rx_action(struct softirq_action *h)
 	unsigned long time_limit = jiffies +
 		usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
 	int budget = READ_ONCE(netdev_budget);
+	bool is_continue = true;
 	LIST_HEAD(list);
 	LIST_HEAD(repoll);
 
@@ -6644,7 +6645,7 @@  static __latent_entropy void net_rx_action(struct softirq_action *h)
 	list_splice_init(&sd->poll_list, &list);
 	local_irq_enable();
 
-	for (;;) {
+	for (; is_continue;) {
 		struct napi_struct *n;
 
 		skb_defer_free_flush(sd);
@@ -6662,10 +6663,13 @@  static __latent_entropy void net_rx_action(struct softirq_action *h)
 		 * Allow this to run for 2 jiffies since which will allow
 		 * an average latency of 1.5/HZ.
 		 */
-		if (unlikely(budget <= 0 ||
-			     time_after_eq(jiffies, time_limit))) {
+		if (unlikely(budget <= 0)) {
+			sd->budget_squeeze++;
+			is_continue = false;
+		}
+		if (unlikely(time_after_eq(jiffies, time_limit))) {
 			sd->time_squeeze++;
-			break;
+			is_continue = false;
 		}
 	}
 
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 97a304e1957a..4d1a499d7c43 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -174,14 +174,17 @@  static int softnet_seq_show(struct seq_file *seq, void *v)
 	 */
 	seq_printf(seq,
 		   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
-		   "%08x %08x\n",
-		   sd->processed, sd->dropped, sd->time_squeeze, 0,
+		   "%08x %08x %08x %08x\n",
+		   sd->processed, sd->dropped,
+		   0, /* was old way to count time squeeze */
+		   0,
 		   0, 0, 0, 0, /* was fastroute */
 		   0,	/* was cpu_collision */
 		   sd->received_rps, flow_limit_count,
 		   0,	/* was len of two backlog queues */
 		   (int)seq->index,
-		   softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
+		   softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
+		   sd->time_squeeze, sd->budget_squeeze);
 	return 0;
 }