Message ID: 202303212012296834902@zte.com.cn
State: New
Headers:
Date: Tue, 21 Mar 2023 20:12:29 +0800 (CST)
From: yang.yang29@zte.com.cn
To: edumazet@google.com
Cc: davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xu.xin16@zte.com.cn, jiang.xuexin@zte.com.cn, zhang.yunkai@zte.com.cn
Subject: [PATCH] rps: process the skb directly if rps cpu not changed
Series: rps: process the skb directly if rps cpu not changed
Commit Message
Yang Yang
March 21, 2023, 12:12 p.m. UTC
From: xu xin <xu.xin16@zte.com.cn>

In the RPS procedure of NAPI receiving, regardless of whether the
rps-calculated CPU of the skb equals the currently processing CPU, RPS
will always use enqueue_to_backlog to enqueue the skb to the per-CPU
backlog, which triggers a new NET_RX softirq.

Actually, it is not necessary to enqueue the skb to the backlog when the
rps-calculated CPU id equals the current processing CPU; we can call
__netif_receive_skb or __netif_receive_skb_list to process the skb
directly. The benefit is a reduced number of NET_RX softirqs and a lower
processing delay for the skb.

The measured result shows the patch brings a 50% reduction in NET_RX
softirqs. The test was done in a QEMU environment with a two-core CPU,
using iperf3:

    taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
    taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;

Previous RPS:
            CPU0    CPU1
    NET_RX:   45       0   (before iperf3 testing)
    NET_RX: 1095     241   (after iperf3 testing)

Patched RPS:
            CPU0    CPU1
    NET_RX:   28       4   (before iperf3 testing)
    NET_RX:  573      32   (after iperf3 testing)

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
---
 net/core/dev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
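[Editorial note: the commit message does not say exactly how the counts
were collected; a natural source for per-CPU NET_RX counters is the
NET_RX row of /proc/softirqs. As a hedged sketch, the arithmetic behind
the "50% reduction" claim can be redone from the quoted numbers, taking
them as cumulative per-CPU softirq counts:]

```shell
# Recompute the per-run NET_RX softirq deltas from the counts quoted in
# the commit message, and the resulting overall reduction.
awk 'BEGIN {
    # unpatched: before -> after, summed over CPU0 and CPU1
    prev = (1095 - 45) + (241 - 0);
    # patched: before -> after, summed over CPU0 and CPU1
    curr = (573 - 28) + (32 - 4);
    printf "unpatched NET_RX softirqs during test: %d\n", prev;
    printf "patched   NET_RX softirqs during test: %d\n", curr;
    printf "reduction: %.0f%%\n", 100 * (1 - curr / prev);
}'
```

On these numbers the deltas are 1291 vs. 573 softirqs, i.e. roughly the
50% reduction stated (about 56%).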
Comments
On 2023/3/21 20:12, yang.yang29@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> In the RPS procedure of NAPI receiving, regardless of whether the
> rps-calculated CPU of the skb equals to the currently processing CPU, RPS
> will always use enqueue_to_backlog to enqueue the skb to per-cpu backlog,
> which will trigger a new NET_RX softirq.

Does bypassing the backlog cause an out-of-order problem for packet
handling? It seems the current RPS/RFS code ensures ordered delivery,
see for example:
https://elixir.bootlin.com/linux/v6.3-rc3/source/net/core/dev.c#L4485

Also, as this is an optimization it should target the net-next branch:
[PATCH net-next] rps: process the skb directly if rps cpu not changed

> Actually, it's not necessary to enqueue it to backlog when rps-calculated
> CPU id equals to the current processing CPU, and we can call
> __netif_receive_skb or __netif_receive_skb_list to process the skb directly.
> The benefit is that it can reduce the number of softirqs of NET_RX and reduce
> the processing delay of skb.
>
> The measured result shows the patch brings 50% reduction of NET_RX softirqs.
> The test was done on the QEMU environment with two-core CPU by iperf3.
> taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
> taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;
>
> Previous RPS:
>         CPU0     CPU1
> NET_RX:   45        0   (before iperf3 testing)
> NET_RX: 1095      241   (after iperf3 testing)
>
> Patched RPS:
>         CPU0     CPU1
> NET_RX:   28        4   (before iperf3 testing)
> NET_RX:  573       32   (after iperf3 testing)
>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
> ---
>  net/core/dev.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c7853192563d..c33ddac3c012 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5666,8 +5666,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
>  	if (static_branch_unlikely(&rps_needed)) {
>  		struct rps_dev_flow voidflow, *rflow = &voidflow;
>  		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
> +		int current_cpu = smp_processor_id();
>
> -		if (cpu >= 0) {
> +		if (cpu >= 0 && cpu != current_cpu) {
>  			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>  			rcu_read_unlock();
>  			return ret;
> @@ -5699,8 +5700,9 @@ void netif_receive_skb_list_internal(struct list_head *head)
>  	list_for_each_entry_safe(skb, next, head, list) {
>  		struct rps_dev_flow voidflow, *rflow = &voidflow;
>  		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
> +		int current_cpu = smp_processor_id();
>
> -		if (cpu >= 0) {
> +		if (cpu >= 0 && cpu != current_cpu) {
>  			/* Will be handled, remove from list */
>  			skb_list_del_init(skb);
>  			enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
On Tue, 21 Mar 2023 20:12:29 +0800 (CST) yang.yang29@zte.com.cn wrote:
> The measured result shows the patch brings 50% reduction of NET_RX softirqs.
> The test was done on the QEMU environment with two-core CPU by iperf3.
> taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
> taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;
>
> Previous RPS:
>          CPU0       CPU1

this header looks misaligned

> NET_RX: 45 0 (before iperf3 testing)
> NET_RX: 1095 241 (after iperf3 testing)
>
> Patched RPS:
>          CPU0       CPU1
> NET_RX: 28 4 (before iperf3 testing)
> NET_RX: 573 32 (after iperf3 testing)

This table is really confusing. What's the unit, how is it measured,
and why are you showing before/after rather than the delta?

> diff --git a/net/core/dev.c b/net/core/dev.c
> index c7853192563d..c33ddac3c012 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5666,8 +5666,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
>  	if (static_branch_unlikely(&rps_needed)) {
>  		struct rps_dev_flow voidflow, *rflow = &voidflow;
>  		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
> +		int current_cpu = smp_processor_id();
>
> -		if (cpu >= 0) {
> +		if (cpu >= 0 && cpu != current_cpu) {
>  			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>  			rcu_read_unlock();
>  			return ret;
> @@ -5699,8 +5700,9 @@ void netif_receive_skb_list_internal(struct list_head *head)
>  	list_for_each_entry_safe(skb, next, head, list) {
>  		struct rps_dev_flow voidflow, *rflow = &voidflow;
>  		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
> +		int current_cpu = smp_processor_id();

This does not have to be in the loop.

>
> -		if (cpu >= 0) {
> +		if (cpu >= 0 && cpu != current_cpu) {

Please answer Yunsheng's question as well..
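[Editorial note: one hedged way to produce the per-CPU delta asked for
above — the thread does not say exactly how the counts were collected;
this sketch assumes they are NET_RX rows in the /proc/softirqs format,
snapshotted before and after the workload:]

```shell
# Diff two NET_RX rows (before/after a test run) into per-CPU deltas.
# On a live system each row could come from: grep NET_RX /proc/softirqs
delta() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF; i++) before[i] = $i }
        NR == 2 { for (i = 2; i <= NF; i++)
                      printf "CPU%d NET_RX delta: %d\n", i - 2, $i - before[i] }'
}

# Example with the unpatched counts quoted in the thread:
delta "NET_RX: 45 0" "NET_RX: 1095 241"
# prints:
#   CPU0 NET_RX delta: 1050
#   CPU1 NET_RX delta: 241
```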
On 2023/3/21 20:12, yang.yang29@zte.com.cn wrote:
>> From: xu xin <xu.xin16@zte.com.cn>
>>
>> In the RPS procedure of NAPI receiving, regardless of whether the
>> rps-calculated CPU of the skb equals to the currently processing CPU, RPS
>> will always use enqueue_to_backlog to enqueue the skb to per-cpu backlog,
>> which will trigger a new NET_RX softirq.
>
> Does bypassing the backlog cause out of order problem for packet handling?
> It seems currently the RPS/RFS will ensure order delivery, such as:
> https://elixir.bootlin.com/linux/v6.3-rc3/source/net/core/dev.c#L4485
>
> Also, this is an optimization, it should target the net-next branch:
> [PATCH net-next] rps: process the skb directly if rps cpu not changed

Well, I thought the patch wouldn't break the effort RFS makes to avoid
"out of order" packets. But thanks for your reminder; rethinking it,
bypassing the backlog from netif_receive_skb_list will mislead RFS's
judgment of whether all previous packets for the flow have been dequeued:
RFS would think all packets had been dealt with, while actually they are
still on the skb list.

Fortunately, bypassing the backlog from netif_receive_skb for a single
skb is okay and won't cause out-of-order packets, because every skb is
processed serially by RPS and sent to the protocol stack as soon as
possible. If I'm correct, the code as follows can fix this:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5666,8 +5666,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 	if (static_branch_unlikely(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
+		int current_cpu = smp_processor_id();

-		if (cpu >= 0) {
+		if (cpu >= 0 && cpu != current_cpu) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
 			return ret;
@@ -5699,11 +5700,15 @@ void netif_receive_skb_list_internal(struct list_head *head)
 	list_for_each_entry_safe(skb, next, head, list) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
+		int current_cpu = smp_processor_id();

 		if (cpu >= 0) {
 			/* Will be handled, remove from list */
 			skb_list_del_init(skb);
-			enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+			if (cpu != current_cpu)
+				enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+			else
+				__netif_receive_skb(skb);
 		}
 	}

Thanks.

Sincerely,
Xu Xin
diff --git a/net/core/dev.c b/net/core/dev.c
index c7853192563d..c33ddac3c012 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5666,8 +5666,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 	if (static_branch_unlikely(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
+		int current_cpu = smp_processor_id();

-		if (cpu >= 0) {
+		if (cpu >= 0 && cpu != current_cpu) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
 			return ret;
@@ -5699,8 +5700,9 @@ void netif_receive_skb_list_internal(struct list_head *head)
 	list_for_each_entry_safe(skb, next, head, list) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
+		int current_cpu = smp_processor_id();

-		if (cpu >= 0) {
+		if (cpu >= 0 && cpu != current_cpu) {
 			/* Will be handled, remove from list */
 			skb_list_del_init(skb);
 			enqueue_to_backlog(skb, cpu, &rflow->last_qtail);