Message ID | 20230311163614.92296-1-kerneljasonxing@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp372800wrd; Sat, 11 Mar 2023 09:17:38 -0800 (PST) X-Google-Smtp-Source: AK7set8OmleSxTrMZlV9hLQAOnj9GC09ziQqpmUtoX2Y6VxsAnIj8ZhM2nUMHhFkxH2UK6ox/z40 X-Received: by 2002:a17:90a:7:b0:23a:6be8:9446 with SMTP id 7-20020a17090a000700b0023a6be89446mr30661003pja.48.1678555058532; Sat, 11 Mar 2023 09:17:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678555058; cv=none; d=google.com; s=arc-20160816; b=ozOzKDPiScr6yQoLSxb0AbwMzc+eQ0JqbJItoacqO+j0zEupd4JalitwXKx2zh7cdE nf1Wv138OLyOc15WOp9oGesWI/RvyXOnOrf51liV4PEiVl/FXiMYJQEJBLl/JX8Zr920 2LPTEy2QPurSju9toanvNZGgeeQuPaucw7IDZlp2+ij89Bt2Pah8TmvgjS5Yv6bcuSN6 Rh+2auh5E3nDFvmrE5akMtmkS7xnxUxhZ2sYYgjp+xvHXwtQJJE7IDdBchysU7hHOwY7 LPhI1S1AU+WsDRyougaFkZfOgiN6vk+VFb6eYDXBlH8jAuHKyLGW+pOS7bK9MQ6f2JHh 9x0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=R27q+ruFizVfsZnI4Ph2nW9rPNDoG0WYlTi4uPHonSg=; b=iNC/QMuJ0jREnHj8EtMxQax/Vh8STYOy85R9ojqQqUgL1uuoRnBIJfR49Q8Dvwmci2 ChCiU2hjLrShiPKHrFf+G/86G4SKGmgDwaP1rZF+rmX5C0tLAWTtzkIa8xWXMoTmirXy HKYlz+EZkpPL00vwwlBowB6cpZXblO1EjjHkFy9qs7wHsPYZLGEtXgaBsAjEG1sdvKD2 D/HdOu6VHeZ2QFzkNTSGkYMV4dWZ/E+1uhv4zcTjCzRJD8UGah4KfY+jHWMlzWK01THD 5baBWVXmeODwYI8bk9+F0zO2ShMEwGYafulZpJt/ZmZwEkVoXlJ/KtMTm2+AqFP+mq2w 3Bvw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=KULkSXc2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b2-20020a17090a100200b0023b4f1df5e3si264869pja.190.2023.03.11.09.17.23; Sat, 11 Mar 2023 09:17:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=KULkSXc2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229817AbjCKQg1 (ORCPT <rfc822;toshivichauhan@gmail.com> + 99 others); Sat, 11 Mar 2023 11:36:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229874AbjCKQgZ (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 11 Mar 2023 11:36:25 -0500 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E128F5CEC9; Sat, 11 Mar 2023 08:36:22 -0800 (PST) Received: by mail-pf1-x431.google.com with SMTP id y10so5409376pfi.8; Sat, 11 Mar 2023 08:36:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1678552582; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=R27q+ruFizVfsZnI4Ph2nW9rPNDoG0WYlTi4uPHonSg=; b=KULkSXc2OamSY41P4speudAtRYuqSs1EfdQ1mcnmMMAVtPTtjJnUGfsPFOdwnYKaU0 4YPTr9wbLFgO8N4nAkhWn0OoB/s8o6ZSt/86vp+yPgefnYxXtNW+fYHl7LGtS/fNfikl yY06bXV3uSwwrzTRd2KfECm0hSY8gtvVbCcbjunDWGl6kXR/kKHBR4A5gOn2d2ALQ7tU pSgpdW7Ss1f4Sa76Ftb0YgBuYM1+Ygkw+dqcXvMcMQOJJM+APufjIJYXn9P7FdHnbzcp IIk5NyryV12bqfRh2/y6wMH7QLLqUlvS8bX6xN5FaxS2ODSuTLGFSZ+N4lkP4nw4MA4y jLUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678552582; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=R27q+ruFizVfsZnI4Ph2nW9rPNDoG0WYlTi4uPHonSg=; b=elAU/9Tf5XWqutregwaey5Hb9kEPDLMehthz+7/sWahcfUQHvYs9obv5VlHsPtCiWR 1k//7yAUYCzimDC936UkCbr3k0Nab93TTkJFv950kmy8+RFd+hyYKTgXsIFVn1K3Ncdx 9ID/kNlJ5F2B/0jFjMKP9bB8XmEhidpgDq15U81BxyRq27naMSrxyowUUM5G762kpD01 YCAh/HRaVKt+opmh+mg0JYj01dINcWv1ZS0Z6V5ujOOlMVeg+Ct4IrDpZmlHbgc0t/Wd POXu1RR4fnFD+MNqi1L5hl/Yv2pazalbaesNW4hAtGzMg88EGQo0S+LiAGTmKkG7GYQz zsyA== X-Gm-Message-State: AO0yUKWp4VpjOW1lyQ2VssIuYkDpaamKGVP1dsWVkNcc/nyzpDhSalPK zQIy4n+o0B+4h3Cl57OSubU= X-Received: by 2002:aa7:96e2:0:b0:5a9:cbc3:ca70 with SMTP id i2-20020aa796e2000000b005a9cbc3ca70mr27132003pfq.24.1678552582273; Sat, 11 Mar 2023 08:36:22 -0800 (PST) Received: from KERNELXING-MB0.tencent.com ([114.253.32.213]) by smtp.gmail.com with ESMTPSA id g21-20020a62e315000000b0058e08796e98sm1648917pfh.196.2023.03.11.08.36.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 11 Mar 2023 08:36:21 -0800 (PST) From: Jason Xing <kerneljasonxing@gmail.com> To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com Cc: kuniyu@amazon.com, liuhangbin@gmail.com, xiangxia.m.yue@gmail.com, jiri@nvidia.com, andy.ren@getcruise.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kerneljasonxing@gmail.com, Jason Xing <kernelxing@tencent.com> Subject: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior Date: Sun, 12 Mar 2023 00:36:14 +0800 Message-Id: <20230311163614.92296-1-kerneljasonxing@gmail.com> X-Mailer: git-send-email 2.33.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760092549156628050?= X-GMAIL-MSGID: =?utf-8?q?1760092549156628050?= |
Series |
[net-next] net: introduce budget_squeeze to help us tune rx behavior
|
|
Commit Message
Jason Xing
March 11, 2023, 4:36 p.m. UTC
From: Jason Xing <kernelxing@tencent.com> When we encounter some performance issue and then get lost on how to tune the budget limit and time limit in net_rx_action() function, we can separately counting both of them to avoid the confusion. Signed-off-by: Jason Xing <kernelxing@tencent.com> --- note: this commit is based on the link as below: https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ --- include/linux/netdevice.h | 1 + net/core/dev.c | 12 ++++++++---- net/core/net-procfs.c | 9 ++++++--- 3 files changed, 15 insertions(+), 7 deletions(-)
Comments
On Sun, 12 Mar 2023 00:36:14 +0800 Jason Xing <kerneljasonxing@gmail.com> wrote: > - for (;;) { > + for (; is_continue;) { Easier to read this as a while (is_continue) { but what is wrong with using break; instead?
On Sun, Mar 12, 2023 at 1:14 AM Stephen Hemminger <stephen@networkplumber.org> wrote: > > On Sun, 12 Mar 2023 00:36:14 +0800 > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > - for (;;) { > > + for (; is_continue;) { > > > Easier to read this as a > while (is_continue) { > > but what is wrong with using break; instead? If we hit the budget limit and 'break;' immediately, we may miss the collection when we also hit the time limit. That's why I would like to know if we hit both of them. Thank, Jason
On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > When we encounter some performance issue and then get lost on how > to tune the budget limit and time limit in net_rx_action() function, > we can separately counting both of them to avoid the confusion. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > note: this commit is based on the link as below: > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ > --- > include/linux/netdevice.h | 1 + > net/core/dev.c | 12 ++++++++---- > net/core/net-procfs.c | 9 ++++++--- > 3 files changed, 15 insertions(+), 7 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 6a14b7b11766..5736311a2133 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -3157,6 +3157,7 @@ struct softnet_data { > /* stats */ > unsigned int processed; > unsigned int time_squeeze; > + unsigned int budget_squeeze; > #ifdef CONFIG_RPS > struct softnet_data *rps_ipi_list; > #endif > diff --git a/net/core/dev.c b/net/core/dev.c > index 253584777101..bed7a68fdb5d 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > unsigned long time_limit = jiffies + > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); > int budget = READ_ONCE(netdev_budget); > + bool is_continue = true; I kept thinking during these days, I think it looks not that concise and elegant and also the name is not that good though the function can work. In the next submission, I'm going to choose to use 'while()' instead of 'for()' suggested by Stephen. Does anyone else have some advice about this? Thanks, Jason > LIST_HEAD(list); > LIST_HEAD(repoll); > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > list_splice_init(&sd->poll_list, &list); > local_irq_enable(); > > - for (;;) { > + for (; is_continue;) { > struct napi_struct *n; > > skb_defer_free_flush(sd); > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > * Allow this to run for 2 jiffies since which will allow > * an average latency of 1.5/HZ. > */ > - if (unlikely(budget <= 0 || > - time_after_eq(jiffies, time_limit))) { > + if (unlikely(budget <= 0)) { > + sd->budget_squeeze++; > + is_continue = false; > + } > + if (unlikely(time_after_eq(jiffies, time_limit))) { > sd->time_squeeze++; > - break; > + is_continue = false; > } > } > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c > index 97a304e1957a..4d1a499d7c43 100644 > --- a/net/core/net-procfs.c > +++ b/net/core/net-procfs.c > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) > */ > seq_printf(seq, > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " > - "%08x %08x\n", > - sd->processed, sd->dropped, sd->time_squeeze, 0, > + "%08x %08x %08x %08x\n", > + sd->processed, sd->dropped, > + 0, /* was old way to count time squeeze */ > + 0, > 0, 0, 0, 0, /* was fastroute */ > 0, /* was cpu_collision */ > sd->received_rps, flow_limit_count, > 0, /* was len of two backlog queues */ > (int)seq->index, > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), > + sd->time_squeeze, sd->budget_squeeze); > return 0; > } > > -- > 2.37.3 >
On Mon, Mar 13, 2023 at 10:05:18AM +0800, Jason Xing wrote: > On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > When we encounter some performance issue and then get lost on how > > to tune the budget limit and time limit in net_rx_action() function, > > we can separately counting both of them to avoid the confusion. > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > note: this commit is based on the link as below: > > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ > > --- > > include/linux/netdevice.h | 1 + > > net/core/dev.c | 12 ++++++++---- > > net/core/net-procfs.c | 9 ++++++--- > > 3 files changed, 15 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 6a14b7b11766..5736311a2133 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -3157,6 +3157,7 @@ struct softnet_data { > > /* stats */ > > unsigned int processed; > > unsigned int time_squeeze; > > + unsigned int budget_squeeze; > > #ifdef CONFIG_RPS > > struct softnet_data *rps_ipi_list; > > #endif > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 253584777101..bed7a68fdb5d 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > unsigned long time_limit = jiffies + > > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); > > int budget = READ_ONCE(netdev_budget); > > + bool is_continue = true; > > I kept thinking during these days, I think it looks not that concise > and elegant and also the name is not that good though the function can > work. > > In the next submission, I'm going to choose to use 'while()' instead > of 'for()' suggested by Stephen. > > Does anyone else have some advice about this? What about: int done = false while (!done) { ... } Or: for (;;) { int done = false; ... if (done) break; } > > Thanks, > Jason > > > LIST_HEAD(list); > > LIST_HEAD(repoll); > > > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > list_splice_init(&sd->poll_list, &list); > > local_irq_enable(); > > > > - for (;;) { > > + for (; is_continue;) { > > struct napi_struct *n; > > > > skb_defer_free_flush(sd); > > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > * Allow this to run for 2 jiffies since which will allow > > * an average latency of 1.5/HZ. > > */ > > - if (unlikely(budget <= 0 || > > - time_after_eq(jiffies, time_limit))) { > > + if (unlikely(budget <= 0)) { > > + sd->budget_squeeze++; > > + is_continue = false; > > + } > > + if (unlikely(time_after_eq(jiffies, time_limit))) { > > sd->time_squeeze++; > > - break; > > + is_continue = false; > > } > > } > > > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c > > index 97a304e1957a..4d1a499d7c43 100644 > > --- a/net/core/net-procfs.c > > +++ b/net/core/net-procfs.c > > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) > > */ > > seq_printf(seq, > > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " > > - "%08x %08x\n", > > - sd->processed, sd->dropped, sd->time_squeeze, 0, > > + "%08x %08x %08x %08x\n", > > + sd->processed, sd->dropped, > > + 0, /* was old way to count time squeeze */ > > + 0, > > 0, 0, 0, 0, /* was fastroute */ > > 0, /* was cpu_collision */ > > sd->received_rps, flow_limit_count, > > 0, /* was len of two backlog queues */ > > (int)seq->index, > > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); > > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), > > + sd->time_squeeze, sd->budget_squeeze); > > return 0; > > } > > > > -- > > 2.37.3 > > >
On 3/11/23 08:36, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > When we encounter some performance issue and then get lost on how > to tune the budget limit and time limit in net_rx_action() function, > we can separately counting both of them to avoid the confusion. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > note: this commit is based on the link as below: > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ > --- > include/linux/netdevice.h | 1 + > net/core/dev.c | 12 ++++++++---- > net/core/net-procfs.c | 9 ++++++--- > 3 files changed, 15 insertions(+), 7 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 6a14b7b11766..5736311a2133 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -3157,6 +3157,7 @@ struct softnet_data { > /* stats */ > unsigned int processed; > unsigned int time_squeeze; > + unsigned int budget_squeeze; > #ifdef CONFIG_RPS > struct softnet_data *rps_ipi_list; > #endif > diff --git a/net/core/dev.c b/net/core/dev.c > index 253584777101..bed7a68fdb5d 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > unsigned long time_limit = jiffies + > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); > int budget = READ_ONCE(netdev_budget); > + bool is_continue = true; > LIST_HEAD(list); > LIST_HEAD(repoll); > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > list_splice_init(&sd->poll_list, &list); > local_irq_enable(); > > - for (;;) { > + for (; is_continue;) { > struct napi_struct *n; > > skb_defer_free_flush(sd); > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > * Allow this to run for 2 jiffies since which will allow > * an average latency of 1.5/HZ. > */ > - if (unlikely(budget <= 0 || > - time_after_eq(jiffies, time_limit))) { > + if (unlikely(budget <= 0)) { > + sd->budget_squeeze++; > + is_continue = false; > + } > + if (unlikely(time_after_eq(jiffies, time_limit))) { > sd->time_squeeze++; > - break; > + is_continue = false; > } > } > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c > index 97a304e1957a..4d1a499d7c43 100644 > --- a/net/core/net-procfs.c > +++ b/net/core/net-procfs.c > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) > */ > seq_printf(seq, > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " > - "%08x %08x\n", > - sd->processed, sd->dropped, sd->time_squeeze, 0, > + "%08x %08x %08x %08x\n", > + sd->processed, sd->dropped, > + 0, /* was old way to count time squeeze */ Should we show a proximate number? For example, sd->time_squeeze + sd->bud_squeeze. > + 0, > 0, 0, 0, 0, /* was fastroute */ > 0, /* was cpu_collision */ > sd->received_rps, flow_limit_count, > 0, /* was len of two backlog queues */ > (int)seq->index, > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), > + sd->time_squeeze, sd->budget_squeeze); > return 0; > } >
On Tue, Mar 14, 2023 at 4:07 AM Simon Horman <simon.horman@corigine.com> wrote: > > On Mon, Mar 13, 2023 at 10:05:18AM +0800, Jason Xing wrote: > > On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > When we encounter some performance issue and then get lost on how > > > to tune the budget limit and time limit in net_rx_action() function, > > > we can separately counting both of them to avoid the confusion. > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > note: this commit is based on the link as below: > > > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ > > > --- > > > include/linux/netdevice.h | 1 + > > > net/core/dev.c | 12 ++++++++---- > > > net/core/net-procfs.c | 9 ++++++--- > > > 3 files changed, 15 insertions(+), 7 deletions(-) > > > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > > index 6a14b7b11766..5736311a2133 100644 > > > --- a/include/linux/netdevice.h > > > +++ b/include/linux/netdevice.h > > > @@ -3157,6 +3157,7 @@ struct softnet_data { > > > /* stats */ > > > unsigned int processed; > > > unsigned int time_squeeze; > > > + unsigned int budget_squeeze; > > > #ifdef CONFIG_RPS > > > struct softnet_data *rps_ipi_list; > > > #endif > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 253584777101..bed7a68fdb5d 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > > unsigned long time_limit = jiffies + > > > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); > > > int budget = READ_ONCE(netdev_budget); > > > + bool is_continue = true; > > > > I kept thinking during these days, I think it looks not that concise > > and elegant and also the name is not that good though the function can > > work. > > > > In the next submission, I'm going to choose to use 'while()' instead > > of 'for()' suggested by Stephen. > > > > Does anyone else have some advice about this? > > What about: > > int done = false > > while (!done) { > ... > } > > Or: > > for (;;) { > int done = false; > > ... > if (done) > break; > } > Great, that looks much better:) Thanks, Jason > > > > Thanks, > > Jason > > > > > LIST_HEAD(list); > > > LIST_HEAD(repoll); > > > > > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > > list_splice_init(&sd->poll_list, &list); > > > local_irq_enable(); > > > > > > - for (;;) { > > > + for (; is_continue;) { > > > struct napi_struct *n; > > > > > > skb_defer_free_flush(sd); > > > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > > * Allow this to run for 2 jiffies since which will allow > > > * an average latency of 1.5/HZ. > > > */ > > > - if (unlikely(budget <= 0 || > > > - time_after_eq(jiffies, time_limit))) { > > > + if (unlikely(budget <= 0)) { > > > + sd->budget_squeeze++; > > > + is_continue = false; > > > + } > > > + if (unlikely(time_after_eq(jiffies, time_limit))) { > > > sd->time_squeeze++; > > > - break; > > > + is_continue = false; > > > } > > > } > > > > > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c > > > index 97a304e1957a..4d1a499d7c43 100644 > > > --- a/net/core/net-procfs.c > > > +++ b/net/core/net-procfs.c > > > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) > > > */ > > > seq_printf(seq, > > > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " > > > - "%08x %08x\n", > > > - sd->processed, sd->dropped, sd->time_squeeze, 0, > > > + "%08x %08x %08x %08x\n", > > > + sd->processed, sd->dropped, > > > + 0, /* was old way to count time squeeze */ > > > + 0, > > > 0, 0, 0, 0, /* was fastroute */ > > > 0, /* was cpu_collision */ > > > sd->received_rps, flow_limit_count, > > > 0, /* was len of two backlog queues */ > > > (int)seq->index, > > > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); > > > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), > > > + sd->time_squeeze, sd->budget_squeeze); > > > return 0; > > > } > > > > > > -- > > > 2.37.3 > > > > >
On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@gmail.com> wrote: > > > > On 3/11/23 08:36, Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > When we encounter some performance issue and then get lost on how > > to tune the budget limit and time limit in net_rx_action() function, > > we can separately counting both of them to avoid the confusion. > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > note: this commit is based on the link as below: > > https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ > > --- > > include/linux/netdevice.h | 1 + > > net/core/dev.c | 12 ++++++++---- > > net/core/net-procfs.c | 9 ++++++--- > > 3 files changed, 15 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 6a14b7b11766..5736311a2133 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -3157,6 +3157,7 @@ struct softnet_data { > > /* stats */ > > unsigned int processed; > > unsigned int time_squeeze; > > + unsigned int budget_squeeze; > > #ifdef CONFIG_RPS > > struct softnet_data *rps_ipi_list; > > #endif > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 253584777101..bed7a68fdb5d 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > unsigned long time_limit = jiffies + > > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); > > int budget = READ_ONCE(netdev_budget); > > + bool is_continue = true; > > LIST_HEAD(list); > > LIST_HEAD(repoll); > > > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > list_splice_init(&sd->poll_list, &list); > > local_irq_enable(); > > > > - for (;;) { > > + for (; is_continue;) { > > struct napi_struct *n; > > > > skb_defer_free_flush(sd); > > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) > > * Allow this to run for 2 jiffies since which will allow > > * an average latency of 1.5/HZ. > > */ > > - if (unlikely(budget <= 0 || > > - time_after_eq(jiffies, time_limit))) { > > + if (unlikely(budget <= 0)) { > > + sd->budget_squeeze++; > > + is_continue = false; > > + } > > + if (unlikely(time_after_eq(jiffies, time_limit))) { > > sd->time_squeeze++; > > - break; > > + is_continue = false; > > } > > } > > > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c > > index 97a304e1957a..4d1a499d7c43 100644 > > --- a/net/core/net-procfs.c > > +++ b/net/core/net-procfs.c > > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) > > */ > > seq_printf(seq, > > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " > > - "%08x %08x\n", > > - sd->processed, sd->dropped, sd->time_squeeze, 0, > > + "%08x %08x %08x %08x\n", > > + sd->processed, sd->dropped, > > + 0, /* was old way to count time squeeze */ > > Should we show a proximate number? For example, > sd->time_squeeze + sd->bud_squeeze. Yeah, It does make sense. Let the old way to display untouched. > > > > + 0, > > 0, 0, 0, 0, /* was fastroute */ > > 0, /* was cpu_collision */ > > sd->received_rps, flow_limit_count, > > 0, /* was len of two backlog queues */ > > (int)seq->index, > > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); > > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), > > + sd->time_squeeze, sd->budget_squeeze); > > return 0; > > } > >
On 14/03/2023 02.57, Jason Xing wrote: > On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@gmail.com> wrote: >> >> On 3/11/23 08:36, Jason Xing wrote: >>> From: Jason Xing <kernelxing@tencent.com> >>> >>> When we encounter some performance issue and then get lost on how >>> to tune the budget limit and time limit in net_rx_action() function, >>> we can separately counting both of them to avoid the confusion. >>> >>> Signed-off-by: Jason Xing <kernelxing@tencent.com> >>> --- >>> note: this commit is based on the link as below: >>> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ >>> --- [...] >>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c >>> index 97a304e1957a..4d1a499d7c43 100644 >>> --- a/net/core/net-procfs.c >>> +++ b/net/core/net-procfs.c >>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) >>> */ >>> seq_printf(seq, >>> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " >>> - "%08x %08x\n", >>> - sd->processed, sd->dropped, sd->time_squeeze, 0, >>> + "%08x %08x %08x %08x\n", >>> + sd->processed, sd->dropped, >>> + 0, /* was old way to count time squeeze */ >> >> Should we show a proximate number? For example, >> sd->time_squeeze + sd->bud_squeeze. > > Yeah, It does make sense. Let the old way to display untouched. > Yes, I don't think we can/should remove this squeeze stat because several tools e.g. my own[1] captures these stats (and I know Willem also have his own tool). I like the sd->time_squeeze + sd->budget_squeeze suggestion. [1] https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl >> >> >>> + 0, >>> 0, 0, 0, 0, /* was fastroute */ >>> 0, /* was cpu_collision */ >>> sd->received_rps, flow_limit_count, >>> 0, /* was len of two backlog queues */ >>> (int)seq->index, >>> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); >>> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), >>> + sd->time_squeeze, sd->budget_squeeze); >>> return 0; >>> } >>> We recently had a very long troubleshooting session around a latency issue (Cc Simon) where we used the tool[1]. The issue was NIC hardware RX queue was backlogged, but we didn't see any squeeze events, which confused us. (This happens because budget was 300 and two NICs using 64 budget each doesn't exceed 300). We were/are missing another counter to tell us net_rx_action() "repoll" is happening (as code !list_empty(&repoll)). That were the case and it would have "told" us that hardware RX ring was full (larger than 64). We worked around this limitation by using the tracepoint for napi_poll, and manually deduced that 64 bulking must mean that "repoll" were happening. Oneliner bpftrace script: bpftrace -e 'tracepoint:napi:napi_poll { @napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }' We used this script (that also measures softirq latency): https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt I do wonder is it would be valuable to *also* add a tracepoint to net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and repoll-not-empty. --Jesper
On Tue, Mar 14, 2023 at 4:41 PM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > On 14/03/2023 02.57, Jason Xing wrote: > > On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@gmail.com> wrote: > >> > >> On 3/11/23 08:36, Jason Xing wrote: > >>> From: Jason Xing <kernelxing@tencent.com> > >>> > >>> When we encounter some performance issue and then get lost on how > >>> to tune the budget limit and time limit in net_rx_action() function, > >>> we can separately counting both of them to avoid the confusion. > >>> > >>> Signed-off-by: Jason Xing <kernelxing@tencent.com> > >>> --- > >>> note: this commit is based on the link as below: > >>> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@gmail.com/ > >>> --- > [...] > >>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c > >>> index 97a304e1957a..4d1a499d7c43 100644 > >>> --- a/net/core/net-procfs.c > >>> +++ b/net/core/net-procfs.c > >>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) > >>> */ > >>> seq_printf(seq, > >>> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " > >>> - "%08x %08x\n", > >>> - sd->processed, sd->dropped, sd->time_squeeze, 0, > >>> + "%08x %08x %08x %08x\n", > >>> + sd->processed, sd->dropped, > >>> + 0, /* was old way to count time squeeze */ > >> > >> Should we show a proximate number? For example, > >> sd->time_squeeze + sd->bud_squeeze. > > > > Yeah, It does make sense. Let the old way to display untouched. > > > [...] > Yes, I don't think we can/should remove this squeeze stat because > several tools e.g. my own[1] captures these stats (and I know Willem > also have his own tool). > I like the sd->time_squeeze + sd->budget_squeeze suggestion. So do I. Therefore I followed this suggestion in the next submission. [1] https://lore.kernel.org/lkml/20230314030532.9238-3-kerneljasonxing@gmail.com/ > > [1] > https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl > > > >> > >> > >>> + 0, > >>> 0, 0, 0, 0, /* was fastroute */ > >>> 0, /* was cpu_collision */ > >>> sd->received_rps, flow_limit_count, > >>> 0, /* was len of two backlog queues */ > >>> (int)seq->index, > >>> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); > >>> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), > >>> + sd->time_squeeze, sd->budget_squeeze); > >>> return 0; > >>> } > >>> > [...] > We recently had a very long troubleshooting session around a latency > issue (Cc Simon) where we used the tool[1]. The issue was NIC hardware > RX queue was backlogged, but we didn't see any squeeze events, which > confused us. (This happens because budget was 300 and two NICs using 64 > budget each doesn't exceed 300). I recently found some users running on our production environment hit the time_squeeze very often which aroused my interests. Env: 1) budget is 300; 2) eth0 is virtio_net which only registers 32 input interrupts (32 queue pairs) with a larger number of cpus online. > > We were/are missing another counter to tell us net_rx_action() "repoll" > is happening (as code !list_empty(&repoll)). That were the case and it > would have "told" us that hardware RX ring was full (larger than 64). > > We worked around this limitation by using the tracepoint for napi_poll, > and manually deduced that 64 bulking must mean that "repoll" were happening. > > Oneliner bpftrace script: > > bpftrace -e 'tracepoint:napi:napi_poll { > @napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }' > > We used this script (that also measures softirq latency): > > > https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt > > [...] > I do wonder is it would be valuable to *also* add a tracepoint to > net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and > repoll-not-empty. I believe it's useful that we can show more details in softnet_data, but I'm confused about how to display them. This morning I submitted one patch[1] and chose to do such things when reading the softnet_stat file. Could we add more data in the softnet_stat file while also tracing those three important points? I'm not sure. Thanks, Jason > > --Jesper >
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 6a14b7b11766..5736311a2133 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3157,6 +3157,7 @@ struct softnet_data { /* stats */ unsigned int processed; unsigned int time_squeeze; + unsigned int budget_squeeze; #ifdef CONFIG_RPS struct softnet_data *rps_ipi_list; #endif diff --git a/net/core/dev.c b/net/core/dev.c index 253584777101..bed7a68fdb5d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) unsigned long time_limit = jiffies + usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); int budget = READ_ONCE(netdev_budget); + bool is_continue = true; LIST_HEAD(list); LIST_HEAD(repoll); @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) list_splice_init(&sd->poll_list, &list); local_irq_enable(); - for (;;) { + for (; is_continue;) { struct napi_struct *n; skb_defer_free_flush(sd); @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) * Allow this to run for 2 jiffies since which will allow * an average latency of 1.5/HZ. */ - if (unlikely(budget <= 0 || - time_after_eq(jiffies, time_limit))) { + if (unlikely(budget <= 0)) { + sd->budget_squeeze++; + is_continue = false; + } + if (unlikely(time_after_eq(jiffies, time_limit))) { sd->time_squeeze++; - break; + is_continue = false; } } diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c index 97a304e1957a..4d1a499d7c43 100644 --- a/net/core/net-procfs.c +++ b/net/core/net-procfs.c @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v) */ seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x " - "%08x %08x\n", - sd->processed, sd->dropped, sd->time_squeeze, 0, + "%08x %08x %08x %08x\n", + sd->processed, sd->dropped, + 0, /* was old way to count time squeeze */ + 0, 0, 0, 0, 0, /* was fastroute */ 0, /* was cpu_collision */ sd->received_rps, flow_limit_count, 0, /* was len of two backlog queues */ (int)seq->index, - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd)); + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd), + sd->time_squeeze, sd->budget_squeeze); return 0; }