[RFC,net-next,v3,2/2] bnxt: Use generic HBH removal helper in tx path

Message ID 20221129200653.962019-2-lixiaoyan@google.com
State New
Series [RFC,net-next,v3,1/2] IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver

Commit Message

Coco Li Nov. 29, 2022, 8:06 p.m. UTC
Eric Dumazet implemented Big TCP, which allows bigger TSO/GRO packet sizes
for IPv6 traffic. See patch series:
'commit 89527be8d8d6 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")'

This reduces the number of packets traversing the networking stack and
should usually improve performance. However, it also inserts a
temporary Hop-by-hop IPv6 extension header.

Using the HBH header removal method from the previous patch, the extra header
can be removed in the bnxt driver to allow it to send big TCP packets (bigger
TSO packets) as well.
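
As a rough illustration of what removing the temporary header involves,
here is a user-space sketch that works on a raw byte buffer. It is not the
helper from patch 1/2: the function name and constants are made up for this
example, and it assumes the 8-byte HBH header carries only the jumbo payload
option (RFC 2675) directly in front of TCP.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV6_HDR_LEN    40u   /* fixed IPv6 header size */
#define HBH_JUMBO_LEN    8u   /* HBH header holding one jumbo option */
#define NEXTHDR_OFF      6u   /* offset of Next Header in the IPv6 header */
#define PROTO_HOPOPTS    0u   /* Next Header value for hop-by-hop options */
#define PROTO_TCP        6u
#define TLV_JUMBO     0xC2u   /* Jumbo Payload option type (RFC 2675) */

/*
 * Hypothetical model, not the kernel helper. If buf starts with an IPv6
 * header followed by a jumbo-only HBH header in front of TCP, drop the HBH
 * header by sliding the IPv6 header 8 bytes towards the payload.
 * Returns the offset at which the packet now starts (0 if left untouched).
 */
static size_t strip_jumbo_hbh(uint8_t *buf, size_t len)
{
	const uint8_t *hbh;

	if (len < IPV6_HDR_LEN + HBH_JUMBO_LEN)
		return 0;
	if (buf[NEXTHDR_OFF] != PROTO_HOPOPTS)
		return 0;

	/* hbh[0]: next header, hbh[1]: ext header length, hbh[2]: option type */
	hbh = buf + IPV6_HDR_LEN;
	if (hbh[0] != PROTO_TCP || hbh[1] != 0 || hbh[2] != TLV_JUMBO)
		return 0;

	/* Overlapping copy: the IPv6 header overwrites the HBH header. */
	memmove(buf + HBH_JUMBO_LEN, buf, IPV6_HDR_LEN);
	buf[HBH_JUMBO_LEN + NEXTHDR_OFF] = PROTO_TCP;
	return HBH_JUMBO_LEN;
}
```

A caller would then treat buf + returned offset as the start of the packet.
In a driver the same move also has to cover the link-layer header that
precedes the IPv6 header, which this sketch leaves out.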

Tested:
Compiled locally

To further test functional correctness, update the GSO/GRO limit on the
physical NIC:

ip link set eth0 gso_max_size 181000
ip link set eth0 gro_max_size 181000

Note that if there are bonding or ipvlan devices on top of the physical
NIC, their GSO sizes need to be updated as well.

Then, IPv6/TCP packets with sizes larger than 64k can be observed.

Big TCP functionality has been tested by Michael; the feature checks have
not been tested yet.

Tested by Michael:
I've confirmed with our hardware team that this is supported by our
chips, and I've tested it up to gso_max_size of 524280.  Thanks.

Tested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
  

Comments

Michael Chan Nov. 29, 2022, 8:41 p.m. UTC | #1
On Tue, Nov 29, 2022 at 12:07 PM Coco Li <lixiaoyan@google.com> wrote:
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 0fe164b42c5d..f144a5ef2e04 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>                         return NETDEV_TX_BUSY;
>         }
>
> +       if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> +               goto tx_free;
> +
>         length = skb->len;
>         len = skb_headlen(skb);
>         last_frag = skb_shinfo(skb)->nr_frags;
> @@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
>
>                 if (hdrlen > 64)
>                         return false;
> +
> +               /* The ext header may be a hop-by-hop header inserted for
> +                * big TCP purposes. This will be removed before sending
> +                * from NIC, so do not count it.
> +                */
> +               if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))

To be more efficient, why not just check the header's tlv_type here
instead of calling ipv6_has_hopopt_jumbo()?

> +                       hdr_count++;
>                 nexthdr = &hp->nexthdr;
>                 start += hdrlen;
> -               hdr_count++;
>         }
>         if (nextp) {
>                 /* Caller will check inner protocol */
> @@ -13657,6 +13666,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>                 dev->features &= ~NETIF_F_LRO;
>         dev->priv_flags |= IFF_UNICAST_FLT;
>
> +       netif_set_tso_max_size(dev, GSO_MAX_SIZE);
> +
>  #ifdef CONFIG_BNXT_SRIOV
>         init_waitqueue_head(&bp->sriov_cfg_wait);
>  #endif
> --
> 2.38.1.584.g0f3c55d4c2-goog
>
  
Coco Li Dec. 2, 2022, 2:03 a.m. UTC | #2
On Tue, Nov 29, 2022 at 12:42 PM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Tue, Nov 29, 2022 at 12:07 PM Coco Li <lixiaoyan@google.com> wrote:
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 0fe164b42c5d..f144a5ef2e04 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >                         return NETDEV_TX_BUSY;
> >         }
> >
> > +       if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> > +               goto tx_free;
> > +
> >         length = skb->len;
> >         len = skb_headlen(skb);
> >         last_frag = skb_shinfo(skb)->nr_frags;
> > @@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
> >
> >                 if (hdrlen > 64)
> >                         return false;
> > +
> > +               /* The ext header may be a hop-by-hop header inserted for
> > +                * big TCP purposes. This will be removed before sending
> > +                * from NIC, so do not count it.
> > +                */
> > +               if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
>
> To be more efficient, why not just check the header's tlv_type here
> instead of calling ipv6_has_hopopt_jumbo()?
>

It is possible that the next header is hop-by-hop but the packet is not
TCP, in which case the header would not be removed and we would still
want to count it towards the limit.
ipv6_has_hopopt_jumbo() checks specifically for the big TCP case (GSO,
skb len reaches a certain size).

> > +                       hdr_count++;
> >                 nexthdr = &hp->nexthdr;
> >                 start += hdrlen;
> > -               hdr_count++;
> >         }
> >         if (nextp) {
> >                 /* Caller will check inner protocol */
> > @@ -13657,6 +13666,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> >                 dev->features &= ~NETIF_F_LRO;
> >         dev->priv_flags |= IFF_UNICAST_FLT;
> >
> > +       netif_set_tso_max_size(dev, GSO_MAX_SIZE);
> > +
> >  #ifdef CONFIG_BNXT_SRIOV
> >         init_waitqueue_head(&bp->sriov_cfg_wait);
> >  #endif
> > --
> > 2.38.1.584.g0f3c55d4c2-goog
> >
  
Michael Chan Dec. 2, 2022, 5:56 a.m. UTC | #3
On Thu, Dec 1, 2022 at 6:03 PM Coco Li <lixiaoyan@google.com> wrote:
>
> On Tue, Nov 29, 2022 at 12:42 PM Michael Chan <michael.chan@broadcom.com> wrote:
> >
> > On Tue, Nov 29, 2022 at 12:07 PM Coco Li <lixiaoyan@google.com> wrote:
> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > index 0fe164b42c5d..f144a5ef2e04 100644
> > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >                         return NETDEV_TX_BUSY;
> > >         }
> > >
> > > +       if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> > > +               goto tx_free;
> > > +
> > >         length = skb->len;
> > >         len = skb_headlen(skb);
> > >         last_frag = skb_shinfo(skb)->nr_frags;
> > > @@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
> > >
> > >                 if (hdrlen > 64)
> > >                         return false;
> > > +
> > > +               /* The ext header may be a hop-by-hop header inserted for
> > > +                * big TCP purposes. This will be removed before sending
> > > +                * from NIC, so do not count it.
> > > +                */
> > > +               if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
> >
> > To be more efficient, why not just check the header's tlv_type here
> > instead of calling ipv6_has_hopopt_jumbo()?
> >
>
> It may be possible that the next header is Hop_by_hop but the packet
> is not tcp, meaning that it would not be removed and we'd still want
> to count this header towards the limit.
> ipv6_has_hopopt_jumbo checks for the big tcp case (gso, skb len
> reaches a certain size) particularly.
>

We can add all the additional checks here and it will still be more
efficient because we already know this is IPv6 and we are looking at
the extension header.  This is the fast path, so I think we want to be
as efficient as possible.
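
For illustration only, an inline check along these lines might look like
the user-space sketch below. The function name, parameters and constants
are hypothetical: is_gso, pkt_len and legacy_max stand in for the "GSO, skb
len reaches a certain size" conditions mentioned above, and hdr is the
extension header the loop is already looking at.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_HOPOPTS   0u    /* Next Header value for hop-by-hop options */
#define PROTO_TCP       6u
#define TLV_JUMBO    0xC2u    /* Jumbo Payload option type (RFC 2675) */
#define HBH_JUMBO_LEN   8u    /* HBH header carrying only the jumbo option */

/*
 * Hypothetical inline test. cur_nexthdr is the Next Header value that led
 * to hdr; avail is the number of readable bytes at hdr. Returns true only
 * for the temporary big TCP header, so any other hop-by-hop header is still
 * counted by the caller.
 */
static bool is_temp_jumbo_hbh(uint8_t cur_nexthdr, const uint8_t *hdr,
			      size_t avail, bool is_gso, size_t pkt_len,
			      size_t legacy_max)
{
	if (cur_nexthdr != PROTO_HOPOPTS)
		return false;
	if (!is_gso || pkt_len <= legacy_max)
		return false;
	if (avail < HBH_JUMBO_LEN)
		return false;

	/* hdr[0]: next header, hdr[1]: ext header length, hdr[2]: tlv type */
	return hdr[0] == PROTO_TCP && hdr[1] == 0 && hdr[2] == TLV_JUMBO;
}
```

The counting loop would then increment its header count only when this
returns false, which keeps the non-TCP hop-by-hop case counted towards the
limit.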
  

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0fe164b42c5d..f144a5ef2e04 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -389,6 +389,9 @@  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			return NETDEV_TX_BUSY;
 	}
 
+	if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
+		goto tx_free;
+
 	length = skb->len;
 	len = skb_headlen(skb);
 	last_frag = skb_shinfo(skb)->nr_frags;
@@ -11342,9 +11345,15 @@  static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
 
 		if (hdrlen > 64)
 			return false;
+
+		/* The ext header may be a hop-by-hop header inserted for
+		 * big TCP purposes. This will be removed before sending
+		 * from NIC, so do not count it.
+		 */
+		if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
+			hdr_count++;
 		nexthdr = &hp->nexthdr;
 		start += hdrlen;
-		hdr_count++;
 	}
 	if (nextp) {
 		/* Caller will check inner protocol */
@@ -13657,6 +13666,8 @@  static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->features &= ~NETIF_F_LRO;
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
+
 #ifdef CONFIG_BNXT_SRIOV
 	init_waitqueue_head(&bp->sriov_cfg_wait);
 #endif