Message ID | 202212031612057505056@zte.com.cn |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1268676wrr; Sat, 3 Dec 2022 00:31:54 -0800 (PST) X-Google-Smtp-Source: AA0mqf7XXjzKmCJCj3hATuFOWZ7Lc7gBTOFZ/GbSoOX+14W5lNQSWLLmtABhA/W4vsrNEYw3m++K X-Received: by 2002:a63:1055:0:b0:46e:f011:9548 with SMTP id 21-20020a631055000000b0046ef0119548mr47895793pgq.553.1670056314323; Sat, 03 Dec 2022 00:31:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670056314; cv=none; d=google.com; s=arc-20160816; b=s01XTPZ4ZI0buXgAmGoeYcezyB0T7lFqtRykRxsv9K1SKOIv8sHTAmuatskYh3S7Cz VOYmY1nCASkxcsar3HU9VUORdHmVWJBqwx7rcSZ2zRHRd8qr/HjsY1GLZ6J9mXHekRdy vUWBuReOCQIQwTSZ6hqHK3BisyP9M8ITog0rRghEw9jeItaxSxPEhz3DmWKzTbTqaCz4 TPSvffv3lxmQcYoyCMUmhJde0r7csSsnvWjWmvvkIQXzcrGiPO/MCnl5ar1S2qxJ7JH8 EtETf3OPzZt1eHVVVrrsdVRzcQ0wgK8yjvVHksgsYH0MAT78jhqJ1bW9i5oBUEblGUfa Lpjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:subject:cc:to:from:mime-version:message-id:date; bh=xsuNkw6fCaR7AzjTSr4RpJ63Za3mHj+jI4B9YhqDd0Q=; b=eNj+JR8k511BPLUeldIa4VT1apnCasQFxVc+LLvItAbPaKdhJXK3i1foZPPtaxGbmD nL8bopyMzaDYNBGp0b/hZ2zqd59YZ7S2+uJTyylqs6IfoHvWJdvxlD2bkepkRyKAzFsK I1gljvKoOHbkKXDkwJ4S4LKsncQ3OfN0eFfYVo3UKhiwhV61QZDN3mGd9TeDm2+b5zsf PwXrSIWnZwsGcF98rrOufazwtG78QU95ACbYbWFBW03dYTY+S7a5MgAdXPgAfGlQ4nCg PJfTCYvEqWOHH3PKIOUYjhVif+7YuPGetvqjsJlB5KLviQT2x650XWPwSGi2KL7HmaWE 4rYA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=zte.com.cn Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id a13-20020a170902eccd00b0018018272902si10192844plh.554.2022.12.03.00.31.41; Sat, 03 Dec 2022 00:31:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=zte.com.cn Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231397AbiLCIMR (ORCPT <rfc822;lhua1029@gmail.com> + 99 others); Sat, 3 Dec 2022 03:12:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229781AbiLCIMP (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 3 Dec 2022 03:12:15 -0500 Received: from mxct.zte.com.cn (mxct.zte.com.cn [183.62.165.209]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E153115730; Sat, 3 Dec 2022 00:12:12 -0800 (PST) Received: from mse-fl1.zte.com.cn (unknown [10.5.228.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mxct.zte.com.cn (FangMail) with ESMTPS id 4NPMwp4Gt0z4y0v8; Sat, 3 Dec 2022 16:12:10 +0800 (CST) Received: from szxlzmapp01.zte.com.cn ([10.5.231.85]) by mse-fl1.zte.com.cn with SMTP id 2B38C3OV026108; Sat, 3 Dec 2022 16:12:03 +0800 (+08) (envelope-from yang.yang29@zte.com.cn) Received: from mapi (szxlzmapp02[null]) by mapi (Zmail) with MAPI id mid14; Sat, 3 Dec 2022 16:12:05 +0800 (CST) Date: Sat, 3 Dec 2022 16:12:05 +0800 (CST) X-Zmail-TransId: 2b04638b04d5ffffffffba98b1a0 X-Mailer: Zmail v1.0 Message-ID: <202212031612057505056@zte.com.cn> Mime-Version: 1.0 From: <yang.yang29@zte.com.cn> To: <davem@davemloft.net>, <edumazet@google.com>, 
<kuba@kernel.org> Cc: <pabeni@redhat.com>, <bigeasy@linutronix.de>, <imagedong@tencent.com>, <kuniyu@amazon.com>, <petrm@nvidia.com>, <liu3101@purdue.edu>, <wujianguo@chinatelecom.cn>, <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org> Subject: =?utf-8?q?=5BPATCH_linux-next=5D_net=3A_record_times_of_netdev=5Fbu?= =?utf-8?q?dget_exhausted?= Content-Type: text/plain; charset="UTF-8" X-MAIL: mse-fl1.zte.com.cn 2B38C3OV026108 X-Fangmail-Gw-Spam-Type: 0 X-FangMail-Miltered: at cgslv5.04-192.168.251.13.novalocal with ID 638B04DA.000 by FangMail milter! X-FangMail-Envelope: 1670055130/4NPMwp4Gt0z4y0v8/638B04DA.000/10.5.228.132/[10.5.228.132]/mse-fl1.zte.com.cn/<yang.yang29@zte.com.cn> X-Fangmail-Anti-Spam-Filtered: true X-Fangmail-MID-QID: 638B04DA.000/4NPMwp4Gt0z4y0v8 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751180969754796915?= X-GMAIL-MSGID: =?utf-8?q?1751180969754796915?= |
Series |
[linux-next] net: record times of netdev_budget exhausted
|
|
Commit Message
Yang Yang
Dec. 3, 2022, 8:12 a.m. UTC
From: Yang Yang <yang.yang29@zte.com>

A long time ago time_squeeze was used to only record netdev_budget
exhausted[1]. Then we added netdev_budget_usecs to enable softirq
tuning[2]. And when polling elapsed netdev_budget_usecs, it's also
recorded by time_squeeze.
For tuning netdev_budget and netdev_budget_usecs respectively, we'd
better distinguish netdev_budget exhausted from netdev_budget_usecs
elapsed, so add a new counter to record netdev_budget exhausted.

[1] commit 1da177e4c3f4("Linux-2.6.12-rc2")
[2] commit 7acf8a1e8a28("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")

Signed-off-by: Yang Yang <yang.yang29@zte.com>
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 11 +++++++----
 net/core/net-procfs.c     |  5 +++--
 3 files changed, 11 insertions(+), 6 deletions(-)
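The accounting this commit message proposes can be sketched in userspace as follows. This is only an illustrative model, not kernel code: `struct softnet_counters` and `account_loop_exit()` are hypothetical names, and only the two counter names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the two counters discussed in the patch. */
struct softnet_counters {
	unsigned int time_squeeze;   /* time window (netdev_budget_usecs) elapsed */
	unsigned int budget_exhaust; /* packet budget (netdev_budget) used up */
};

/*
 * Decide whether the polling loop has to stop and bump the matching counter.
 * Returns true when the caller should break out of the loop.
 */
static bool account_loop_exit(struct softnet_counters *sd, int budget,
			      unsigned long now, unsigned long time_limit)
{
	assert(sd != NULL);

	if (budget <= 0) {
		sd->budget_exhaust++;
		return true;
	}
	/* Same idea as the kernel's time_after_eq(): a signed difference
	 * keeps the comparison meaningful when the tick counter wraps.
	 */
	if ((long)(now - time_limit) >= 0) {
		sd->time_squeeze++;
		return true;
	}
	return false;
}
```

With this ordering, an exit where both limits are hit at once is counted only as a budget exhaustion, which matches the order of the two checks in the patch.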
Comments
Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20221202]

url: https://github.com/intel-lab-lkp/linux/commits/yang-yang29-zte-com-cn/net-record-times-of-netdev_budget-exhausted/20221203-161326
patch link: https://lore.kernel.org/r/202212031612057505056%40zte.com.cn
patch subject: [PATCH linux-next] net: record times of netdev_budget exhausted
config: m68k-allyesconfig
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/94f9d928cabc8715256c20e909b9e730620f4ed0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yang-yang29-zte-com-cn/net-record-times-of-netdev_budget-exhausted/20221203-161326
        git checkout 94f9d928cabc8715256c20e909b9e730620f4ed0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   net/core/net-procfs.c: In function 'softnet_seq_show':
   net/core/net-procfs.c:177:60: error: expected ')' before 'sd'
     177 |                    softnet_backlog_len(sd), (int)seq->index
         |                                                            ^
         |                                                            )
     178 |                    sd->budget_exhaust);
         |                    ~~
   net/core/net-procfs.c:171:19: note: to match this '('
     171 |         seq_printf(seq,
         |                   ^
>> net/core/net-procfs.c:172:89: warning: format '%x' expects a matching 'unsigned int' argument [-Wformat=]
     172 |                    "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
         |                                                                                      ~~~^
         |                                                                                         |
         |                                                                                         unsigned int

vim +172 net/core/net-procfs.c

   166
   167          /* the index is the CPU id owing this sd. Since offline CPUs are not
   168           * displayed, it would be othrwise not trivial for the user-space
   169           * mapping the data a specific CPU
   170           */
   171          seq_printf(seq,
 > 172                     "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
   173                     sd->processed, sd->dropped, sd->time_squeeze, 0,
   174                     0, 0, 0, 0, /* was fastroute */
   175                     0,      /* was cpu_collision */
   176                     sd->received_rps, flow_limit_count,
   177                     softnet_backlog_len(sd), (int)seq->index
   178                     sd->budget_exhaust);
   179          return 0;
   180  }
   181
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20221202]

url: https://github.com/intel-lab-lkp/linux/commits/yang-yang29-zte-com-cn/net-record-times-of-netdev_budget-exhausted/20221203-161326
patch link: https://lore.kernel.org/r/202212031612057505056%40zte.com.cn
patch subject: [PATCH linux-next] net: record times of netdev_budget exhausted
config: x86_64-randconfig-a012
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/94f9d928cabc8715256c20e909b9e730620f4ed0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yang-yang29-zte-com-cn/net-record-times-of-netdev_budget-exhausted/20221203-161326
        git checkout 94f9d928cabc8715256c20e909b9e730620f4ed0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> net/core/net-procfs.c:178:6: error: expected ')'
                      sd->budget_exhaust);
                      ^
   net/core/net-procfs.c:171:12: note: to match this '('
           seq_printf(seq,
                     ^
   1 error generated.

vim +178 net/core/net-procfs.c

   166
   167          /* the index is the CPU id owing this sd. Since offline CPUs are not
   168           * displayed, it would be othrwise not trivial for the user-space
   169           * mapping the data a specific CPU
   170           */
   171          seq_printf(seq,
   172                     "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
   173                     sd->processed, sd->dropped, sd->time_squeeze, 0,
   174                     0, 0, 0, 0, /* was fastroute */
   175                     0,      /* was cpu_collision */
   176                     sd->received_rps, flow_limit_count,
   177                     softnet_backlog_len(sd), (int)seq->index
 > 178                     sd->budget_exhaust);
   179          return 0;
   180  }
   181
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20221202]

url: https://github.com/intel-lab-lkp/linux/commits/yang-yang29-zte-com-cn/net-record-times-of-netdev_budget-exhausted/20221203-161326
patch link: https://lore.kernel.org/r/202212031612057505056%40zte.com.cn
patch subject: [PATCH linux-next] net: record times of netdev_budget exhausted
config: x86_64-randconfig-a013
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/94f9d928cabc8715256c20e909b9e730620f4ed0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yang-yang29-zte-com-cn/net-record-times-of-netdev_budget-exhausted/20221203-161326
        git checkout 94f9d928cabc8715256c20e909b9e730620f4ed0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/core/net-procfs.c: In function 'softnet_seq_show':
>> net/core/net-procfs.c:177:60: error: expected ')' before 'sd'
     177 |                    softnet_backlog_len(sd), (int)seq->index
         |                                                            ^
         |                                                            )
     178 |                    sd->budget_exhaust);
         |                    ~~
   net/core/net-procfs.c:171:19: note: to match this '('
     171 |         seq_printf(seq,
         |                   ^
   net/core/net-procfs.c:172:89: warning: format '%x' expects a matching 'unsigned int' argument [-Wformat=]
     172 |                    "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
         |                                                                                      ~~~^
         |                                                                                         |
         |                                                                                         unsigned int

vim +177 net/core/net-procfs.c

   166
   167          /* the index is the CPU id owing this sd. Since offline CPUs are not
   168           * displayed, it would be othrwise not trivial for the user-space
   169           * mapping the data a specific CPU
   170           */
   171          seq_printf(seq,
   172                     "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
   173                     sd->processed, sd->dropped, sd->time_squeeze, 0,
   174                     0, 0, 0, 0, /* was fastroute */
   175                     0,      /* was cpu_collision */
   176                     sd->received_rps, flow_limit_count,
 > 177                     softnet_backlog_len(sd), (int)seq->index
   178                     sd->budget_exhaust);
   179          return 0;
   180  }
   181
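All three reports point at the same defect: the argument list is missing a comma between `(int)seq->index` and `sd->budget_exhaust`, so the 14th `%08x` has no matching argument. A minimal userspace sketch of a well-formed call is shown below. It assumes a hypothetical `struct softnet_row` and uses `snprintf()` as a stand-in for the kernel's `seq_printf()`; it is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the values the patched function prints. */
struct softnet_row {
	unsigned int processed, dropped, time_squeeze, received_rps;
	unsigned int flow_limit_count, backlog_len, cpu_index, budget_exhaust;
};

/* Format one row: 14 conversions, 14 arguments. Returns snprintf()'s result. */
static int format_softnet_row(char *buf, size_t len, const struct softnet_row *r)
{
	assert(buf != NULL && r != NULL);

	return snprintf(buf, len,
			"%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
			r->processed, r->dropped, r->time_squeeze, 0u,
			0u, 0u, 0u, 0u, /* was fastroute */
			0u,             /* was cpu_collision */
			r->received_rps, r->flow_limit_count,
			r->backlog_len, r->cpu_index, /* the comma missing in the patch */
			r->budget_exhaust);
}
```

Building with format checking enabled (for example `-Wformat` in GCC or Clang) reports a mismatch between conversions and arguments, which is how the robot's W=1 builds flagged it.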
On Sat, 3 Dec 2022 16:12:05 +0800 (CST) yang.yang29@zte.com.cn wrote:
> A long time ago time_squeeze was used to only record netdev_budget
> exhausted[1]. Then we added netdev_budget_usecs to enable softirq
> tuning[2]. And when polling elapsed netdev_budget_usecs, it's also
> recorded by time_squeeze.
> For tuning netdev_budget and netdev_budget_usecs respectively, we'd
> better distinguish netdev_budget exhausted from netdev_budget_usecs
> elapsed, so add a new counter to record netdev_budget exhausted.

You're tuning netdev_budget and netdev_budget_usecs?
You need to say more because I haven't seen anyone do that before.

time_squeeze is extremely noisy and annoyingly useless,
we need to understand exactly what you're doing before
we accept any changes to this core piece of code.
On Tue, 6 Dec 2022 09:53:05 +0800 (CST) kuba@kernel.org wrote:
> time_squeeze is extremely noisy and annoyingly useless,
> we need to understand exactly what you're doing before
> we accept any changes to this core piece of code.

The author of "Replace 2 jiffies with sysctl netdev_budget_usecs
to enable softirq tuning" is Matthew Whitehead. He said this in the
git log:
Constants used for tuning are generally a bad idea, especially
as hardware changes over time...For example, a very fast machine
might tune this to 1000 microseconds, while my regression testing
486DX-25 needs it to be 4000 microseconds on a nearly idle network
to prevent time_squeeze from being incremented.

And on my systems there are huge packets on the intranet, and we
came across lots of time_squeeze. The idea is that netdev_budget*
are a trade-off between throughput and real-time. If we care about
throughput and not so much about real-time, we may want bigger
netdev_budget*.

In this scenario, we want to tune netdev_budget* and see their effect
separately.

By the way, if netdev_budget* are useless, should they be deleted?

Thanks.
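For readers who want to see where the time_squeeze numbers come from: the counter is the third hexadecimal column of each per-CPU line in /proc/net/softnet_stat, as the `softnet_seq_show()` excerpt in this thread shows. The sketch below parses one such line. The struct and function names are hypothetical, and the sample line in the usage note is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* First three columns written by softnet_seq_show(). */
struct softnet_sample {
	unsigned int processed;
	unsigned int dropped;
	unsigned int time_squeeze;
};

/*
 * Parse the leading hex fields of one /proc/net/softnet_stat line.
 * Returns 0 on success, -1 if the line does not start with three hex fields.
 */
static int parse_softnet_line(const char *line, struct softnet_sample *out)
{
	assert(line != NULL && out != NULL);

	if (sscanf(line, "%x %x %x",
		   &out->processed, &out->dropped, &out->time_squeeze) != 3)
		return -1;
	return 0;
}
```

A caller would read the file line by line (one line per online CPU) and pass each line to `parse_softnet_line()`, for example a line beginning `0000272d 00000000 0000001f`. Sampling twice and subtracting gives the rate at which the counter grows.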
On Tue, 6 Dec 2022 10:35:07 +0800 (CST) yang.yang29@zte.com.cn wrote:
> The author of "Replace 2 jiffies with sysctl netdev_budget_usecs
> to enable softirq tuning" is Matthew Whitehead. He said this in the
> git log:
> Constants used for tuning are generally a bad idea, especially
> as hardware changes over time...For example, a very fast machine
> might tune this to 1000 microseconds, while my regression testing
> 486DX-25 needs it to be 4000 microseconds on a nearly idle network
> to prevent time_squeeze from being incremented.

Let's just ignore that on the basis that it mentions prehistoric HW ;)

> And on my systems there are huge packets on the intranet, and we
> came across lots of time_squeeze. The idea is that netdev_budget*
> are a trade-off between throughput and real-time. If we care about
> throughput and not so much about real-time, we may want bigger
> netdev_budget*.

But are you seeing actual performance wins in terms of throughput
or latency?

As I said time_squeeze is very noisy. In my experience it's very
sensitive to issues with jiffies, like someone masking interrupts
on the timekeeper CPU for a long time (which if you use cgroups
happens _a lot_ :/).

Have you tried threaded NAPI? (find files called 'threaded' in sysfs)
It will let you do any such tuning much more flexibly.

> In this scenario, we want to tune netdev_budget* and see their effect
> separately.
>
> By the way, if netdev_budget* are useless, should they be deleted?

Well, we can't be sure if there's really nobody that uses them :(
It's very risky to remove stuff that's exposed to user space.
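The 'threaded' files mentioned above are per-device sysfs attributes, found at /sys/class/net/<iface>/threaded on kernels that support threaded NAPI. A small sketch for checking the current setting follows. The helper names are hypothetical, and the interpretation of the value (0 meaning disabled, non-zero meaning enabled) is an assumption to verify against the documentation of the running kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build the attribute path. Returns 0 on success, -1 if buf is too small. */
static int threaded_attr_path(char *buf, size_t len, const char *ifname)
{
	assert(buf != NULL && ifname != NULL);

	int n = snprintf(buf, len, "/sys/class/net/%s/threaded", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the attribute for one interface.
 * Returns the integer stored in the file, or -1 if it cannot be read
 * (no such interface, or a kernel without the attribute).
 */
static int read_threaded(const char *ifname)
{
	char path[128];
	int value = -1;
	FILE *f;

	if (threaded_attr_path(path, sizeof(path), ifname) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling `read_threaded("eth0")` on a machine with such an interface would report the current mode. Writing to the same file to change it needs root privileges and is left out here.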
On Tue, 6 Dec 2022 10:47:07 +0800 (CST) kuba@kernel.org wrote:
> But are you seeing actual performance wins in terms of throughput
> or latency?

I ran a test and saw a 7~8% performance difference between small and big
netdev_budget. Details:
1. machine
In qemu. CPU is QEMU TCG CPU version 2.5+.
2. kernel
Linux (none) 5.14.0-rc6+ #91 SMP Tue Dec 6 19:55:14 CST 2022 x86_64 GNU/Linux
3. test condition
Run 5 rt tasks to simulate a workload; each task is test.sh:
---
#!/bin/bash

while [ 1 ]
do
        ls > /dev/null
done
---
4. test method
Use ping -f to flood.
# ping -f 192.168.1.201 -w 1800

With netdev_budget at 500 and netdev_budget_usecs at 2000:
497913 packets transmitted, 497779 received, 0% packet loss, time 1799992ms
rtt min/avg/max/mdev = 0.181/114.417/1915.343/246.098 ms, pipe 144, ipg/ewma 3.615/0.273 ms

With netdev_budget at 1 and netdev_budget_usecs at 2000:
457530 packets transmitted, 457528 received, 0% packet loss, time 1799997ms
rtt min/avg/max/mdev = 0.180/123.287/1914.186/253.883 ms, pipe 147, ipg/ewma 3.934/0.301 ms

With the small netdev_budget, avg latency increases 7% and packets
transmitted decrease 8%.

> Have you tried threaded NAPI? (find files called 'threaded' in sysfs)

Thanks, we have looked into threaded NAPI, and we applaud it!
But we think some people may not use it, for several reasons.
One is that threaded NAPI is good for control, but may not be good for
throughput, especially for those who do not care much about real-time.
Another is that a distribution kernel may be too old to support
threaded NAPI.

> Well, we can't be sure if there's really nobody that uses them :(

As we still retain netdev_budget*, and there may be some people using
them, should they be improved? netdev_budget* are sysctls for
administrators; when administrators adjust them, they may want to see
the effect in a direct or easy way. That is this patch's purpose.
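The percentages quoted above can be reproduced from the two ping summaries. The helper below is a hypothetical convenience function, not something from the thread; it computes the relative change from the netdev_budget=500 run to the netdev_budget=1 run.

```c
#include <assert.h>

/* Relative change from baseline to value, in percent (positive = increase). */
static double pct_change(double baseline, double value)
{
	assert(baseline != 0.0);

	return (value - baseline) / baseline * 100.0;
}
```

Applied to the figures in the message, `pct_change(114.417, 123.287)` gives roughly +7.8% for average RTT, and `pct_change(497913, 457530)` gives roughly -8.1% for packets transmitted, consistent with the 7~8% stated.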
On Wed, Dec 7, 2022 at 8:28 AM <yang.yang29@zte.com.cn> wrote:
>
> On Tue, 6 Dec 2022 10:47:07 +0800 (CST) kuba@kernel.org wrote:
> > But are you seeing actual performance wins in terms of throughput
> > or latency?
>
> I ran a test and saw a 7~8% performance difference between small and big
> netdev_budget. Details:
> 1. machine
> In qemu. CPU is QEMU TCG CPU version 2.5+.
> 2. kernel
> Linux (none) 5.14.0-rc6+ #91 SMP Tue Dec 6 19:55:14 CST 2022 x86_64 GNU/Linux
> 3. test condition
> Run 5 rt tasks to simulate a workload; each task is test.sh:
> ---
> #!/bin/bash
>
> while [ 1 ]
> do
>         ls > /dev/null
> done
> ---
> 4. test method
> Use ping -f to flood.
> # ping -f 192.168.1.201 -w 1800
>
> With netdev_budget at 500 and netdev_budget_usecs at 2000:
> 497913 packets transmitted, 497779 received, 0% packet loss, time 1799992ms
> rtt min/avg/max/mdev = 0.181/114.417/1915.343/246.098 ms, pipe 144, ipg/ewma 3.615/0.273 ms
>
> With netdev_budget at 1 and netdev_budget_usecs at 2000:
> 457530 packets transmitted, 457528 received, 0% packet loss, time 1799997ms
> rtt min/avg/max/mdev = 0.180/123.287/1914.186/253.883 ms, pipe 147, ipg/ewma 3.934/0.301 ms
>

Sure, but netdev_budget set to 1 is extreme, don't you think ???

Has anyone used such a setting?

> With the small netdev_budget, avg latency increases 7% and packets
> transmitted decrease 8%.
>
> > Have you tried threaded NAPI? (find files called 'threaded' in sysfs)
>
> Thanks, we have looked into threaded NAPI, and we applaud it!
> But we think some people may not use it, for several reasons.
> One is that threaded NAPI is good for control, but may not be good for
> throughput, especially for those who do not care much about real-time.
> Another is that a distribution kernel may be too old to support
> threaded NAPI.
>
> > Well, we can't be sure if there's really nobody that uses them :(
>
> As we still retain netdev_budget*, and there may be some people using
> them, should they be improved? netdev_budget* are sysctls for
> administrators; when administrators adjust them, they may want to see
> the effect in a direct or easy way. That is this patch's purpose.

We prefer not changing /proc file format as much as we can, they are
deprecated/legacy.

Presumably, modern tracing techniques can let you do what you want
without adding new counters.

I think that a per-cpu counter is old-fashioned, and incurs a cost for
the vast majority of users who will never look at the counters.
> Sure, but netdev_budget set to 1 is extreme, don't you think ???

Yes of course, that is just a test to show the difference.

> We prefer not changing /proc file format as much as we can, they are
> deprecated/legacy.

Should we add some explanation of the deprecation in the code or docs?
As it's deprecated, I think it's a NAK for this patch.

> Presumably, modern tracing techniques can let you do what you want
> without adding new counters.

Totally agree.
> Presumably, modern tracing techniques can let you do what you want
> without adding new counters.

By the way, should we add a tracepoint like trace_napi_poll() to make
it easier? Something like:
		if (unlikely(budget <= 0 ||
			     time_after_eq(jiffies, time_limit))) {
			sd->time_squeeze++;
+			trace_napi_poll(budget, jiffies, time_limit);
			break;
		}
On Wed, 7 Dec 2022 16:17:32 +0800 (CST) yang.yang29@zte.com.cn wrote:
> > We prefer not changing /proc file format as much as we can, they are
> > deprecated/legacy.
>
> Should we add some explanation of the deprecation in the code or docs?
> As it's deprecated, I think it's a NAK for this patch.

Correct, it is a NAK.
On Wed, 7 Dec 2022 20:30:08 +0800 (CST) yang.yang29@zte.com.cn wrote:
> > Presumably, modern tracing techniques can let you do what you want
> > without adding new counters.
>
> By the way, should we add a tracepoint like trace_napi_poll() to make
> it easier? Something like:
>		if (unlikely(budget <= 0 ||
>			     time_after_eq(jiffies, time_limit))) {
>			sd->time_squeeze++;
> +			trace_napi_poll(budget, jiffies, time_limit);
>			break;
>		}

In my experience - no this is not useful.

Sorry if this is too direct, but it seems to me like you're trying hard
to find something useful to do in this area without a clear use case.

We have coding tasks which would definitely be useful and which nobody
has time to accomplish. Please ask if you're trying to find something
to do.
> In my experience - no this is not useful.

Received, thanks!

> Sorry if this is too direct, but it seems to me like you're trying hard
> to find something useful to do in this area without a clear use case.

I see, maybe this is too special a scenario, and not suitable. The
motivation is that we see lots of time_squeeze on our working machines
and want to tune it, but our kernels are not ready to use threaded NAPI.
And we did see a performance difference with different netdev_budget*
values in preliminary tests.

> We have coding tasks which would definitely be useful and which nobody
> has time to accomplish. Please ask if you're trying to find something
> to do.

We focus on 5G telecom machines, which have huge TIPC packets on the
intranet. If there are related tasks, we would be glad to take them on
and would much appreciate your guidance!

Thanks.
On Thu, 8 Dec 2022 09:12:06 +0800 (CST) yang.yang29@zte.com.cn wrote:
> > Sorry if this is too direct, but it seems to me like you're trying hard
> > to find something useful to do in this area without a clear use case.
>
> I see, maybe this is too special a scenario, and not suitable. The
> motivation is that we see lots of time_squeeze on our working machines
> and want to tune it, but our kernels are not ready to use threaded NAPI.

Ah, in that case I indeed misjudged, sorry.

> And we did see a performance difference with different netdev_budget*
> values in preliminary tests.

Right, the budget values < 100 are quite impractical. Also as I said
time_squeeze is a terrible metric, if you can find a direct metric
in terms of application latency or max PPS, that's much more valuable.

> > We have coding tasks which would definitely be useful and which nobody
> > has time to accomplish. Please ask if you're trying to find something
> > to do.
>
> We focus on 5G telecom machines, which have huge TIPC packets on the
> intranet. If there are related tasks, we would be glad to take them on
> and would much appreciate your guidance!

Oh, unfortunately most of the tasks we have are around driver
infrastructure.
> time_squeeze is a terrible metric, if you can find a direct metric
> in terms of application latency or max PPS, that's much more valuable.

Actually we are working on measuring the latency of packets between
enqueue and dequeue using eBPF, and we are adding a new tracepoint to
help with that. We are also considering using PSI to measure it. We will
submit patches when they are ready, thanks!
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5aa35c58c342..a77719b956a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3135,6 +3135,7 @@ struct softnet_data {
 	/* stats */
 	unsigned int		processed;
 	unsigned int		time_squeeze;
+	unsigned int		budget_exhaust;
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 7627c475d991..42ae2dc62661 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6663,11 +6663,14 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		budget -= napi_poll(n, &repoll);
 
 		/* If softirq window is exhausted then punt.
-		 * Allow this to run for 2 jiffies since which will allow
-		 * an average latency of 1.5/HZ.
+		 * The window is controlled by time and packet budget.
+		 * See Documentation/admin-guide/sysctl/net.rst for details.
 		 */
-		if (unlikely(budget <= 0 ||
-			     time_after_eq(jiffies, time_limit))) {
+		if (unlikely(budget <= 0)) {
+			sd->budget_exhaust++;
+			break;
+		}
+		if (unlikely(time_after_eq(jiffies, time_limit))) {
 			sd->time_squeeze++;
 			break;
 		}
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 1ec23bf8b05c..e09e245125f0 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -169,12 +169,13 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	 * mapping the data a specific CPU
 	 */
 	seq_printf(seq,
-		   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
+		   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
 		   sd->processed, sd->dropped, sd->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   0,	   /* was cpu_collision */
 		   sd->received_rps, flow_limit_count,
-		   softnet_backlog_len(sd), (int)seq->index);
+		   softnet_backlog_len(sd), (int)seq->index
+		   sd->budget_exhaust);
 	return 0;
 }