[net-next] gro: avoid checking for a failed search

Message ID 20221024051744.GA48642@debian
State New
Series [net-next] gro: avoid checking for a failed search

Commit Message

Richard Gobert Oct. 24, 2022, 5:17 a.m. UTC
  After searching for a protocol handler in dev_gro_receive, checking for
failure is redundant. Skip the failure code after finding the 
corresponding handler.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 net/core/gro.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
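
For context: the redundant check in question is the post-loop test &ptype->list == head in dev_gro_receive(), which only signals a failed search but is still evaluated on the common, successful path. A minimal userspace sketch of the before/after shape (hypothetical handler table and names, not the actual net/core/gro.c code):

#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the packet-type handler list in gro.c. */
struct handler {
	unsigned short type;
	void (*gro_receive)(void);
};

static void ip_rcv(void) { puts("handler found"); }

static struct handler handlers[] = {
	{ 0x86dd, NULL },	/* registered type, but no gro_receive callback */
	{ 0x0800, ip_rcv },
};

#define N_HANDLERS (sizeof(handlers) / sizeof(handlers[0]))

/* Before: break out on a match, then re-test whether the search failed. */
static int lookup_then_check(unsigned short type)
{
	size_t i;

	for (i = 0; i < N_HANDLERS; i++) {
		if (handlers[i].type != type || !handlers[i].gro_receive)
			continue;
		handlers[i].gro_receive();
		break;
	}
	if (i == N_HANDLERS)	/* evaluated even on the common, found path */
		return -1;
	return 0;
}

/* After: jump straight past the failure handling once a handler is found. */
static int lookup_with_goto(unsigned short type)
{
	size_t i;

	for (i = 0; i < N_HANDLERS; i++) {
		if (handlers[i].type == type && handlers[i].gro_receive)
			goto found;
	}
	return -1;		/* only a failed search reaches this */
found:
	handlers[i].gro_receive();
	return 0;
}

int main(void)
{
	return lookup_then_check(0x0800) | lookup_with_goto(0x0800);
}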
  

Comments

Jakub Kicinski Nov. 1, 2022, 3:51 p.m. UTC | #1
On Mon, 24 Oct 2022 07:17:49 +0200 Richard Gobert wrote:
> After searching for a protocol handler in dev_gro_receive, checking for
> failure is redundant. Skip the failure code after finding the 
> corresponding handler.

Why does it matter? You see a measurable perf win?

Please fix the date on your system when you repost.
  
Richard Gobert Nov. 2, 2022, 4:45 p.m. UTC | #2
> Why does it matter? You see a measurable perf win?

In the common case, we will exit the loop with a break,
so this patch eliminates an unnecessary check.

On some architectures this optimization might be done
automatically by the compiler, but I think it will be better
to make it explicit here. Although on x86 this optimization
happens automatically, I noticed that on my build target
(ARM/GCC) this does change the binary.
  
Eric Dumazet Nov. 2, 2022, 6:20 p.m. UTC | #3
On Wed, Nov 2, 2022 at 9:46 AM Richard Gobert <richardbgobert@gmail.com> wrote:
>
> > Why does it matter? You see a measurable perf win?
>
> In the common case, we will exit the loop with a break,
> so this patch eliminates an unnecessary check.
>
> On some architectures this optimization might be done
> automatically by the compiler, but I think it will be better
> to make it explicit here. Although on x86 this optimization
> happens automatically, I noticed that on my build target
> (ARM/GCC) this does change the binary.

What about taking this as an opportunity to reduce the indentation
level by one tab?

Untested patch:

diff --git a/net/core/gro.c b/net/core/gro.c
index bc9451743307bc380cca96ae6995aa0a3b83d185..ddfe92c9a5e869d241931b72d6b3426a0e858468 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -491,43 +491,44 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
        list_for_each_entry_rcu(ptype, head, list) {
                if (ptype->type != type || !ptype->callbacks.gro_receive)
                        continue;
+               goto found_ptype;
+       }
+       rcu_read_unlock();
+       goto normal;

-               skb_set_network_header(skb, skb_gro_offset(skb));
-               skb_reset_mac_len(skb);
-               BUILD_BUG_ON(sizeof_field(struct napi_gro_cb, zeroed) != sizeof(u32));
-               BUILD_BUG_ON(!IS_ALIGNED(offsetof(struct napi_gro_cb, zeroed),
-                                        sizeof(u32))); /* Avoid slow unaligned acc */
-               *(u32 *)&NAPI_GRO_CB(skb)->zeroed = 0;
-               NAPI_GRO_CB(skb)->flush = skb_has_frag_list(skb);
-               NAPI_GRO_CB(skb)->is_atomic = 1;
-               NAPI_GRO_CB(skb)->count = 1;
-               if (unlikely(skb_is_gso(skb))) {
-                       NAPI_GRO_CB(skb)->count = skb_shinfo(skb)->gso_segs;
-                       /* Only support TCP at the moment. */
-                       if (!skb_is_gso_tcp(skb))
-                               NAPI_GRO_CB(skb)->flush = 1;
-               }
-
-               /* Setup for GRO checksum validation */
-               switch (skb->ip_summed) {
-               case CHECKSUM_COMPLETE:
-                       NAPI_GRO_CB(skb)->csum = skb->csum;
-                       NAPI_GRO_CB(skb)->csum_valid = 1;
-                       break;
-               case CHECKSUM_UNNECESSARY:
-                       NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
-                       break;
-               }
+found_ptype:
+       skb_set_network_header(skb, skb_gro_offset(skb));
+       skb_reset_mac_len(skb);
+       BUILD_BUG_ON(sizeof_field(struct napi_gro_cb, zeroed) != sizeof(u32));
+       BUILD_BUG_ON(!IS_ALIGNED(offsetof(struct napi_gro_cb, zeroed),
+                                sizeof(u32))); /* Avoid slow unaligned acc */
+       *(u32 *)&NAPI_GRO_CB(skb)->zeroed = 0;
+       NAPI_GRO_CB(skb)->flush = skb_has_frag_list(skb);
+       NAPI_GRO_CB(skb)->is_atomic = 1;
+       NAPI_GRO_CB(skb)->count = 1;
+       if (unlikely(skb_is_gso(skb))) {
+               NAPI_GRO_CB(skb)->count = skb_shinfo(skb)->gso_segs;
+               /* Only support TCP at the moment. */
+               if (!skb_is_gso_tcp(skb))
+                       NAPI_GRO_CB(skb)->flush = 1;
+       }

-               pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
-                                       ipv6_gro_receive, inet_gro_receive,
-                                       &gro_list->list, skb);
+       /* Setup for GRO checksum validation */
+       switch (skb->ip_summed) {
+       case CHECKSUM_COMPLETE:
+               NAPI_GRO_CB(skb)->csum = skb->csum;
+               NAPI_GRO_CB(skb)->csum_valid = 1;
+               break;
+       case CHECKSUM_UNNECESSARY:
+               NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
                break;
        }
-       rcu_read_unlock();

-       if (&ptype->list == head)
-               goto normal;
+       pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
+                               ipv6_gro_receive, inet_gro_receive,
+                               &gro_list->list, skb);
+
+       rcu_read_unlock();

        if (PTR_ERR(pp) == -EINPROGRESS) {
                ret = GRO_CONSUMED;
  
Eric Dumazet Nov. 2, 2022, 6:25 p.m. UTC | #4
On Wed, Nov 2, 2022 at 11:20 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Nov 2, 2022 at 9:46 AM Richard Gobert <richardbgobert@gmail.com> wrote:
> >
> > > Why does it matter? You see a measurable perf win?
> >
> > In the common case, we will exit the loop with a break,
> > so this patch eliminates an unnecessary check.
> >
> > On some architectures this optimization might be done
> > automatically by the compiler, but I think it will be better
> > to make it explicit here. Although on x86 this optimization
> > happens automatically, I noticed that on my build target
> > (ARM/GCC) this does change the binary.
>
> What about taking this as an opportunity to reduce the indentation
> level by one tab?
>
> Untested patch:
>
> diff --git a/net/core/gro.c b/net/core/gro.c
> index bc9451743307bc380cca96ae6995aa0a3b83d185..ddfe92c9a5e869d241931b72d6b3426a0e858468 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -491,43 +491,44 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
>         list_for_each_entry_rcu(ptype, head, list) {
>                 if (ptype->type != type || !ptype->callbacks.gro_receive)
>                         continue;
> +               goto found_ptype;
> +       }
> +       rcu_read_unlock();
> +       goto normal;


Or even better:

	list_for_each_entry_rcu(ptype, head, list) {
		if (ptype->type == type && ptype->callbacks.gro_receive)
			goto found_ptype;
	}
	rcu_read_unlock();
	goto normal;
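
With the earlier restructuring, a successful match then falls through to the found_ptype label, where the receive path continues one indentation level shallower (sketch based on the untested diff above, not a tested patch):

found_ptype:
	skb_set_network_header(skb, skb_gro_offset(skb));
	skb_reset_mac_len(skb);
	/* ... NAPI_GRO_CB() and checksum setup as in the diff above ... */
	pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
				ipv6_gro_receive, inet_gro_receive,
				&gro_list->list, skb);
	rcu_read_unlock();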
  

Patch

diff --git a/net/core/gro.c b/net/core/gro.c
index bc9451743307..a4e1b5a5be64 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -522,13 +522,13 @@  static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
 					ipv6_gro_receive, inet_gro_receive,
 					&gro_list->list, skb);
-		break;
+		rcu_read_unlock();
+		goto found;
 	}
 	rcu_read_unlock();
+	goto normal;
 
-	if (&ptype->list == head)
-		goto normal;
-
+found:
 	if (PTR_ERR(pp) == -EINPROGRESS) {
 		ret = GRO_CONSUMED;
 		goto ok;
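
For reference, with this patch applied the handler search in dev_gro_receive() ends up reading roughly as below (abridged sketch; the NAPI_GRO_CB() and checksum setup inside the loop is elided):

	list_for_each_entry_rcu(ptype, head, list) {
		if (ptype->type != type || !ptype->callbacks.gro_receive)
			continue;
		/* ... per-skb GRO setup elided ... */
		pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
					ipv6_gro_receive, inet_gro_receive,
					&gro_list->list, skb);
		rcu_read_unlock();
		goto found;
	}
	rcu_read_unlock();
	goto normal;

found:
	if (PTR_ERR(pp) == -EINPROGRESS) {
		ret = GRO_CONSUMED;
		goto ok;
	}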