Message ID | 20230130130752.GA8015@debian |
---|---|
State | New |
Headers |
From: Richard Gobert <richardbgobert@gmail.com>
Date: Mon, 30 Jan 2023 14:07:55 +0100
Subject: [PATCH 2/2] gro: optimise redundant parsing of packets
Message-ID: <20230130130752.GA8015@debian>
In-Reply-To: <20230130130047.GA7913@debian>
List-ID: <linux-kernel.vger.kernel.org> |
Series | gro: optimise redundant parsing of packets |
Commit Message
Richard Gobert
Jan. 30, 2023, 1:07 p.m. UTC
Currently, the IPv6 extension headers are parsed twice: first in
ipv6_gro_receive, and then again in ipv6_gro_complete.
The field NAPI_GRO_CB(skb)->proto is used by GRO to hold the layer 4
protocol type that comes after the IPv6 layer. I noticed that it is set
in ipv6_gro_receive, but isn't used anywhere. By using this field, and
also storing the size of the network header, we can avoid parsing
extension headers a second time in ipv6_gro_complete.
The implementation had to handle both inner and outer layers in case of
encapsulation (as they can't use the same field).
I've applied this optimisation to all base protocols (IPv6, IPv4,
Ethernet). Then, I benchmarked this patch on my machine, using ftrace to
measure ipv6_gro_complete's performance, and there was an improvement.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
include/net/gro.h | 8 ++++++--
net/ethernet/eth.c | 11 +++++++++--
net/ipv4/af_inet.c | 8 +++++++-
net/ipv6/ip6_offload.c | 15 ++++++++++++---
4 files changed, 34 insertions(+), 8 deletions(-)
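The caching scheme described in the commit message can be illustrated with a small userspace sketch (hypothetical, simplified model for illustration only; `struct gro_cb`, `struct pkt` and the helpers below are stand-ins, not the kernel's real `struct napi_gro_cb` or GRO code):

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for NAPI_GRO_CB(skb) state. */
struct gro_cb {
	uint8_t  transport_proto; /* L4 proto cached by the receive pass */
	uint16_t network_len;     /* network header length cached likewise */
	int      encap_mark;      /* set once an encapsulating layer is seen */
};

struct pkt {
	struct gro_cb cb;
	int      encapsulation;
	uint8_t  l4_proto;        /* what header parsing would discover */
	uint16_t net_hdr_len;     /* IPv6 header + extension headers */
};

/* Receive pass: parse once, cache the results for the outer layer only. */
static void gro_receive(struct pkt *p)
{
	if (!p->cb.encap_mark) {
		p->cb.transport_proto = p->l4_proto;
		p->cb.network_len = p->net_hdr_len;
	}
}

/* Complete pass: for non-encapsulated packets, reuse the cached length
 * instead of walking the extension headers a second time. */
static uint16_t gro_complete(const struct pkt *p, int *reparsed)
{
	if (!p->encapsulation) {
		*reparsed = 0;
		return p->cb.network_len;
	}
	*reparsed = 1;	/* the real code would call ipv6_exthdrs_len() here */
	return p->net_hdr_len;
}
```

Inner and outer layers cannot share the one cached slot, which is why the cache is written only when `encap_mark` is unset and consumed only on the `!encapsulation` path.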
Comments
From: Richard Gobert <richardbgobert@gmail.com>
Date: Mon, 30 Jan 2023 14:07:55 +0100

> Currently, the IPv6 extension headers are parsed twice: first in
> ipv6_gro_receive, and then again in ipv6_gro_complete.
>
> The field NAPI_GRO_CB(skb)->proto is used by GRO to hold the layer 4
> protocol type that comes after the IPv6 layer. I noticed that it is set
> in ipv6_gro_receive, but isn't used anywhere. By using this field, and
> also storing the size of the network header, we can avoid parsing
> extension headers a second time in ipv6_gro_complete.
>
> The implementation had to handle both inner and outer layers in case of
> encapsulation (as they can't use the same field).
>
> I've applied this optimisation to all base protocols (IPv6, IPv4,
> Ethernet). Then, I benchmarked this patch on my machine, using ftrace to
> measure ipv6_gro_complete's performance, and there was an improvement.

Would be nice to see some perf numbers. "there was an improvement"
doesn't say a lot TBH...

> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> ---
>  include/net/gro.h      |  8 ++++++--
>  net/ethernet/eth.c     | 11 +++++++++--
>  net/ipv4/af_inet.c     |  8 +++++++-
>  net/ipv6/ip6_offload.c | 15 ++++++++++++---
>  4 files changed, 34 insertions(+), 8 deletions(-)

[...]

> @@ -456,12 +459,16 @@ EXPORT_SYMBOL(eth_gro_receive);
>  int eth_gro_complete(struct sk_buff *skb, int nhoff)
>  {
>  	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
> -	__be16 type = eh->h_proto;
> +	__be16 type;

Please don't break RCT style when shortening/expanding variable
declaration lines.

[...]

> @@ -358,7 +361,13 @@ INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  		iph->payload_len = htons(payload_len);
>  	}
>
> -	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
> +	if (!skb->encapsulation) {
> +		ops = rcu_dereference(inet6_offloads[NAPI_GRO_CB(skb)->transport_proto]);
> +		nhoff += NAPI_GRO_CB(skb)->network_len;

Why not use the same skb_network_header_len() here? Both
skb->network_header and skb->transport_header must be set and correct at
this point (if not, you can always fix that).

[...]

Thanks,
Olek
On Mon, Jan 30, 2023 at 2:08 PM Richard Gobert <richardbgobert@gmail.com> wrote:
>
> Currently, the IPv6 extension headers are parsed twice: first in
> ipv6_gro_receive, and then again in ipv6_gro_complete.
[...]
> I've applied this optimisation to all base protocols (IPv6, IPv4,
> Ethernet). Then, I benchmarked this patch on my machine, using ftrace to
> measure ipv6_gro_complete's performance, and there was an improvement.

It seems your patch adds a lot of conditional checks, which will
alternate true/false for encapsulated protocols.

So please give us raw numbers, ftrace is too heavy weight for such claims.

[...]

> @@ -456,12 +459,16 @@ EXPORT_SYMBOL(eth_gro_receive);
>  int eth_gro_complete(struct sk_buff *skb, int nhoff)
>  {
>  	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);

Why initializing @eh here is needed ?
Presumably, for !skb->encapsulation, @eh would not be used.

[...]

>  	if (skb->encapsulation) {
>  		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IP));
>  		skb_set_inner_network_header(skb, nhoff);
> +		proto = iph->protocol;
> +	} else {
> +		proto = NAPI_GRO_CB(skb)->transport_proto;

I really doubt this change is needed.
We need to access iph->fields in the following lines.
Adding an else {} branch is adding extra code, and makes your patch
longer to review.

[...]

> -	NAPI_GRO_CB(skb)->proto = proto;

I guess you missed BIG TCP ipv4 changes under review... ->proto is now used.

[...]

> -	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
> +	if (!skb->encapsulation) {
> +		ops = rcu_dereference(inet6_offloads[NAPI_GRO_CB(skb)->transport_proto]);
> +		nhoff += NAPI_GRO_CB(skb)->network_len;
> +	} else {
> +		nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);

IMO ipv6_exthdrs_len() is quite fast for the typical case where we
have no extension headers.

This new conditional check seems expensive to me.

[...]
> > Currently, the IPv6 extension headers are parsed twice: first in
> > ipv6_gro_receive, and then again in ipv6_gro_complete.
[...]
> It seems your patch adds a lot of conditional checks, which will
> alternate true/false for encapsulated protocols.
>
> So please give us raw numbers, ftrace is too heavy weight for such claims.

For the benchmarks, I used a 100Gbit NIC (mlx5), single-core (power
management off), turboboost off.

Typical IPv6 traffic (zero extension headers):
for i in {1..5}; do netperf -t TCP_STREAM -H 2001:db8:2:2::2 -l 90 | tail -1; done

# before
131072  16384  16384    90.00    16391.20
131072  16384  16384    90.00    16403.50
131072  16384  16384    90.00    16403.30
131072  16384  16384    90.00    16397.84
131072  16384  16384    90.00    16398.00

# after
131072  16384  16384    90.00    16399.85
131072  16384  16384    90.00    16392.37
131072  16384  16384    90.00    16403.06
131072  16384  16384    90.00    16406.97
131072  16384  16384    90.00    16406.09

IPv6 over IPv6 traffic:
for i in {1..5}; do netperf -t TCP_STREAM -H 4001:db8:2:2::2 -l 90 | tail -1; done

# before
131072  16384  16384    90.00    14791.61
131072  16384  16384    90.00    14791.66
131072  16384  16384    90.00    14783.47
131072  16384  16384    90.00    14810.17
131072  16384  16384    90.00    14806.15

# after
131072  16384  16384    90.00    14793.49
131072  16384  16384    90.00    14816.10
131072  16384  16384    90.00    14818.41
131072  16384  16384    90.00    14780.35
131072  16384  16384    90.00    14800.48

IPv6 traffic with varying extension headers:
for i in {1..5}; do netperf -t TCP_STREAM -H 2001:db8:2:2::2 -l 90 | tail -1; done

# before
131072  16384  16384    90.00    14812.37
131072  16384  16384    90.00    14813.04
131072  16384  16384    90.00    14802.54
131072  16384  16384    90.00    14804.06
131072  16384  16384    90.00    14819.08

# after
131072  16384  16384    90.00    14927.11
131072  16384  16384    90.00    14910.45
131072  16384  16384    90.00    14917.36
131072  16384  16384    90.00    14916.53
131072  16384  16384    90.00    14928.88

[...]

> > @@ -456,12 +459,16 @@ EXPORT_SYMBOL(eth_gro_receive);
> >  int eth_gro_complete(struct sk_buff *skb, int nhoff)
> >  {
> >  	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
>
> Why initializing @eh here is needed ?
> Presumably, for !skb->encapsulation, @eh would not be used.

Fixed in v2, thanks.

[...]

> I really doubt this change is needed.
> We need to access iph->fields in the following lines.
> Adding an else {} branch is adding extra code, and makes your patch
> longer to review.

Good point, removed in v2.

[...]

> > -	NAPI_GRO_CB(skb)->proto = proto;
>
> I guess you missed BIG TCP ipv4 changes under review... ->proto is now used.

I rebased the patch now that ipv4 BIG TCP is merged, and made proto and
transport_proto be separate variables.

[...]

> IMO ipv6_exthdrs_len() is quite fast for the typical case where we
> have no extension headers.
>
> This new conditional check seems expensive to me.

In v2 I moved the encapsulation branch at the beginning of the function to
this spot, merging the conditions. So for the typical case, instead of
another redundant extension header length calculation (which requires at
least one branch and one dereference of iph data), it's a simple
dereference to the CB. Thus, performance for the typical case is unharmed,
and possibly even slightly improved.

For cases with a varying amount of extension headers in IPv6, there's a
performance upgrade (multiple memory dereferences to iph data and
conditional checks are saved).

Also, after further inspection, I noticed a potential problem in
ipv6_gro_complete(), where inner_network_header is initialized to the
wrong value. The initialization of the inner_network_header field should
be performed after the BIG TCP if block. I fixed this in v2 by combining
this initialization with the new conditional check after the BIG TCP
block, so the patch does not add a conditional check anymore.
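As a quick sanity check on the "varying extension headers" runs quoted above (last netperf column, throughput in 10^6 bits/s), the before/after means work out to roughly 14810 vs 14920, i.e. about a 0.7% gain. A minimal sketch of that arithmetic:

```c
/* Throughputs (10^6 bits/s) from the "varying extension headers"
 * netperf runs quoted in the thread above. */
static const double before[5] = {
	14812.37, 14813.04, 14802.54, 14804.06, 14819.08
};
static const double after[5] = {
	14927.11, 14910.45, 14917.36, 14916.53, 14928.88
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, int n)
{
	double s = 0.0;
	for (int i = 0; i < n; i++)
		s += v[i];
	return s / n;
}
```

With these samples, `mean(before, 5)` is about 14810.22 and `mean(after, 5)` about 14920.07, a relative improvement of roughly 0.74% for the extension-header workload (the zero-extension-header runs are within noise of each other).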
> > Currently, the IPv6 extension headers are parsed twice: first in
> > ipv6_gro_receive, and then again in ipv6_gro_complete.
[...]
> > measure ipv6_gro_complete's performance, and there was an improvement.
>
> Would be nice to see some perf numbers. "there was an improvement"
> doesn't say a lot TBH...

I just posted raw performance numbers as a reply to Eric's message. Take a
look there.

> > @@ -456,12 +459,16 @@ EXPORT_SYMBOL(eth_gro_receive);
> >  int eth_gro_complete(struct sk_buff *skb, int nhoff)
> >  {
> >  	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
> > -	__be16 type = eh->h_proto;
> > +	__be16 type;
>
> Please don't break RCT style when shortening/expanding variable
> declaration lines.

Will be fixed in v2.

> > @@ -358,7 +361,13 @@ INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
> >  		iph->payload_len = htons(payload_len);
> >  	}
> >
> > -	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
> > +	if (!skb->encapsulation) {
> > +		ops = rcu_dereference(inet6_offloads[NAPI_GRO_CB(skb)->transport_proto]);
> > +		nhoff += NAPI_GRO_CB(skb)->network_len;
>
> Why not use the same skb_network_header_len() here? Both
> skb->network_header and skb->transport_header must be set and correct at
> this point (if not, you can always fix that).

When processing packets with encapsulation, the network_header field is
overwritten when processing the inner IP header, so skb_network_header_len
won't return the correct value.
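The point about the overwritten network_header can be shown with a tiny model (hypothetical, simplified offsets; `struct skb_model` and `network_header_len()` are stand-ins for the skb fields and `skb_network_header_len()`, chosen here as an outer IPv6 header of 48 bytes and an inner one of 40):

```c
#include <stdint.h>

/* Hypothetical model of skb header offsets: with encapsulation the
 * inner IP header overwrites network_header, so the outer header
 * length must be cached at receive time (as network_len in the CB). */
struct skb_model {
	uint16_t network_header;     /* offset of most recently parsed net hdr */
	uint16_t transport_header;
	uint16_t cached_network_len; /* stand-in for NAPI_GRO_CB network_len */
};

/* Stand-in for skb_network_header_len(): always reflects the *latest*
 * (i.e. inner, once encapsulation is parsed) network header. */
static uint16_t network_header_len(const struct skb_model *s)
{
	return s->transport_header - s->network_header;
}
```

After the inner headers are parsed, `network_header_len()` returns the inner length while the cached value still holds the outer one, which is what the outer `gro_complete` pass needs.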
diff --git a/include/net/gro.h b/include/net/gro.h
index 7b47dd6ce94f..d364616cb930 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -41,8 +41,8 @@ struct napi_gro_cb {
 	/* Number of segments aggregated. */
 	u16	count;
 
-	/* Used in ipv6_gro_receive() and foo-over-udp */
-	u16	proto;
+	/* Used in eth_gro_receive() */
+	__be16	network_proto;
 
 	/* Used in napi_gro_cb::free */
 #define NAPI_GRO_FREE	1
@@ -86,6 +86,10 @@ struct napi_gro_cb {
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
+
+	/* Used in inet and ipv6 _gro_receive() */
+	u16	network_len;
+	u8	transport_proto;
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2edc8b796a4e..d68ad90f0a9e 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -439,6 +439,9 @@ struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
 		goto out;
 	}
 
+	if (!NAPI_GRO_CB(skb)->encap_mark)
+		NAPI_GRO_CB(skb)->network_proto = type;
+
 	skb_gro_pull(skb, sizeof(*eh));
 	skb_gro_postpull_rcsum(skb, eh, sizeof(*eh));
 
@@ -456,12 +459,16 @@ EXPORT_SYMBOL(eth_gro_receive);
 int eth_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
-	__be16 type = eh->h_proto;
+	__be16 type;
 	struct packet_offload *ptype;
 	int err = -ENOSYS;
 
-	if (skb->encapsulation)
+	if (skb->encapsulation) {
 		skb_set_inner_mac_header(skb, nhoff);
+		type = eh->h_proto;
+	} else {
+		type = NAPI_GRO_CB(skb)->network_proto;
+	}
 
 	ptype = gro_find_complete_by_type(type);
 	if (ptype != NULL)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6c0ec2789943..4401af7b3a15 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1551,6 +1551,9 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	 * immediately following this IP hdr.
 	 */
 
+	if (!NAPI_GRO_CB(skb)->encap_mark)
+		NAPI_GRO_CB(skb)->transport_proto = proto;
+
 	/* Note : No need to call skb_gro_postpull_rcsum() here,
 	 * as we already checked checksum over ipv4 header was 0
 	 */
@@ -1621,12 +1624,15 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff)
 	__be16 newlen = htons(skb->len - nhoff);
 	struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
 	const struct net_offload *ops;
-	int proto = iph->protocol;
+	int proto;
 	int err = -ENOSYS;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IP));
 		skb_set_inner_network_header(skb, nhoff);
+		proto = iph->protocol;
+	} else {
+		proto = NAPI_GRO_CB(skb)->transport_proto;
 	}
 
 	csum_replace2(&iph->check, iph->tot_len, newlen);
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 00dc2e3b0184..79ba5882f576 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -227,11 +227,14 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 		iph = ipv6_hdr(skb);
 	}
 
-	NAPI_GRO_CB(skb)->proto = proto;
-
 	flush--;
 	nlen = skb_network_header_len(skb);
 
+	if (!NAPI_GRO_CB(skb)->encap_mark) {
+		NAPI_GRO_CB(skb)->transport_proto = proto;
+		NAPI_GRO_CB(skb)->network_len = nlen;
+	}
+
 	list_for_each_entry(p, head, list) {
 		const struct ipv6hdr *iph2;
 		__be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */
@@ -358,7 +361,13 @@ INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 		iph->payload_len = htons(payload_len);
 	}
 
-	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
+	if (!skb->encapsulation) {
+		ops = rcu_dereference(inet6_offloads[NAPI_GRO_CB(skb)->transport_proto]);
+		nhoff += NAPI_GRO_CB(skb)->network_len;
+	} else {
+		nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
+	}
+
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
 		goto out;