[net] skb: Do not mix page pool and page referenced frags in GRO

Message ID 167475990764.1934330.11960904198087757911.stgit@localhost.localdomain
State New
Series [net] skb: Do not mix page pool and page referenced frags in GRO

Commit Message

Alexander Duyck Jan. 26, 2023, 7:06 p.m. UTC
  From: Alexander Duyck <alexanderduyck@fb.com>

GSO should not merge page pool recycled frames with standard reference
counted frames. Traditionally this didn't occur, at least not often.
However as we start looking at adding support for wireless adapters there
becomes the potential to mix the two due to A-MSDU repartitioning frames in
the receive path. There are possibly other places where this may have
occurred however I suspect they must be few and far between as we have not
seen this issue until now.

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Reported-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 net/core/gro.c |    9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Toke Høiland-Jørgensen Jan. 26, 2023, 7:14 p.m. UTC | #1
Alexander Duyck <alexander.duyck@gmail.com> writes:

> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
>
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

I know I'm pattern matching a bit crudely here, but we recently had
another report where doing a get_page() on skb->head didn't seem to be
enough; any chance they might be related?

See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

-Toke
  
Alexander Duyck Jan. 26, 2023, 7:48 p.m. UTC | #2
On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexander Duyck <alexander.duyck@gmail.com> writes:
>
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. Traditionally this didn't occur, at least not often.
> > However as we start looking at adding support for wireless adapters there
> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
> > the receive path. There are possibly other places where this may have
> > occurred however I suspect they must be few and far between as we have not
> > seen this issue until now.
> >
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Reported-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>
> I know I'm pattern matching a bit crudely here, but we recently had
> another report where doing a get_page() on skb->head didn't seem to be
> enough; any chance they might be related?
>
> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

Looking at it I wouldn't think so. Doing get_page() on these frames is
fine. In the case you reference it looks like get_page() is being
called on a slab allocated skb head. So somehow a slab allocated head
is leaking through.

What is causing the issue here is that after get_page() is called and
the fragments are moved into a non-pp_recycle skb, they are then picked
out and merged back into a pp_recycle skb. As a result we are seeing a
reference count leak from pp_frag_count into refcount.

This is the quick-n-dirty fix. I am debating if we want to expand this
so that we could support the case where the donor frame is pp_recycle
but the recipient is a standard reference counted frame. Fixing that
would essentially consist of having to add logic to take the reference
on all donor frags, making certain that nr_frags on the donor skb
isn't altered, and then lastly making sure that all cases use the
NAPI_GRO_FREE path to drop the page pool counts.
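
A rough sketch of the donor-side handling described above might look like the
following. This is purely illustrative and not part of the applied fix; the
helper name, its call site, and the NAPI_GRO_FREE handling are all hypothetical:

#include <linux/skbuff.h>

/* Hypothetical sketch only: the "take a reference on all donor frags"
 * step described above, for a page pool donor skb whose frags are about
 * to be shared into a non-page-pool recipient skb.
 */
static void gro_get_donor_frag_refs(struct sk_buff *donor)
{
	struct skb_shared_info *pinfo = skb_shinfo(donor);
	int i;

	/* Elevate the page refcount for each frag so the recipient's
	 * eventual put_page() does not disturb the page pool's
	 * pp_frag_count accounting.
	 */
	for (i = 0; i < pinfo->nr_frags; i++)
		get_page(skb_frag_page(&pinfo->frags[i]));

	/* The donor skb must keep nr_frags unchanged and be freed via
	 * the NAPI_GRO_FREE path so its page pool frag counts are
	 * dropped there.
	 */
}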
  
Toke Høiland-Jørgensen Jan. 26, 2023, 9:35 p.m. UTC | #3
Alexander Duyck <alexander.duyck@gmail.com> writes:

> On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexander Duyck <alexander.duyck@gmail.com> writes:
>>
>> > From: Alexander Duyck <alexanderduyck@fb.com>
>> >
>> > GSO should not merge page pool recycled frames with standard reference
>> > counted frames. Traditionally this didn't occur, at least not often.
>> > However as we start looking at adding support for wireless adapters there
>> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
>> > the receive path. There are possibly other places where this may have
>> > occurred however I suspect they must be few and far between as we have not
>> > seen this issue until now.
>> >
>> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>> > Reported-by: Felix Fietkau <nbd@nbd.name>
>> > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>>
>> I know I'm pattern matching a bit crudely here, but we recently had
>> another report where doing a get_page() on skb->head didn't seem to be
>> enough; any chance they might be related?
>>
>> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3
>
> Looking at it I wouldn't think so. Doing get_page() on these frames is
> fine. In the case you reference it looks like get_page() is being
> called on a slab allocated skb head. So somehow a slab allocated head
> is leaking through.

Alright, thanks for taking a look! :)

-Toke
  
Jakub Kicinski Jan. 26, 2023, 11:13 p.m. UTC | #4
On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
> 
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

Exciting investigation!
Felix, out of curiosity - the impact of losing GRO on performance is
not significant enough to care?  We could possibly try to switch to
using the frag list if we can't merge into frags safely.

> diff --git a/net/core/gro.c b/net/core/gro.c
> index 506f83d715f8..4bac7ea6e025 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>  	struct sk_buff *lp;
>  	int segs;
>  
> +	/* Do not splice page pool based packets w/ non-page pool
> +	 * packets. This can result in reference count issues as page
> +	 * pool pages will not decrement the reference count and will
> +	 * instead be immediately returned to the pool or have frag
> +	 * count decremented.
> +	 */
> +	if (p->pp_recycle != skb->pp_recycle)
> +		return -ETOOMANYREFS;
> 
>  	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
>  	gro_max_size = READ_ONCE(p->dev->gro_max_size);
>  
> 
>
  
Ilias Apalodimas Jan. 27, 2023, 7:15 a.m. UTC | #5
Thanks Alexander!

On Fri, 27 Jan 2023 at 01:13, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. Traditionally this didn't occur, at least not often.
> > However as we start looking at adding support for wireless adapters there
> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
> > the receive path. There are possibly other places where this may have
> > occurred however I suspect they must be few and far between as we have not
> > seen this issue until now.
> >
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Reported-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care?  We could possibly try to switch to
> using the frag list if we can't merge into frags safely.
>
> > diff --git a/net/core/gro.c b/net/core/gro.c
> > index 506f83d715f8..4bac7ea6e025 100644
> > --- a/net/core/gro.c
> > +++ b/net/core/gro.c
> > @@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> >       struct sk_buff *lp;
> >       int segs;
> >
> > +     /* Do not splice page pool based packets w/ non-page pool
> > +      * packets. This can result in reference count issues as page
> > +      * pool pages will not decrement the reference count and will
> > +      * instead be immediately returned to the pool or have frag
> > +      * count decremented.
> > +      */
> > +     if (p->pp_recycle != skb->pp_recycle)
> > +             return -ETOOMANYREFS;
> >
> >       /* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
> >       gro_max_size = READ_ONCE(p->dev->gro_max_size);
> >
> >
> >
>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
  
Felix Fietkau Jan. 27, 2023, 7:21 a.m. UTC | #6
On 27.01.23 00:13, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
>> From: Alexander Duyck <alexanderduyck@fb.com>
>> 
>> GSO should not merge page pool recycled frames with standard reference
>> counted frames. Traditionally this didn't occur, at least not often.
>> However as we start looking at adding support for wireless adapters there
>> becomes the potential to mix the two due to A-MSDU repartitioning frames in
>> the receive path. There are possibly other places where this may have
>> occurred however I suspect they must be few and far between as we have not
>> seen this issue until now.
>> 
>> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>> Reported-by: Felix Fietkau <nbd@nbd.name>
>> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
> 
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care?  We could possibly try to switch to
> using the frag list if we can't merge into frags safely.
Since this only affects combining page_pool and non-page_pool packets, 
the performance loss should be negligible.

- Felix
  
Jakub Kicinski Jan. 28, 2023, 5:26 a.m. UTC | #7
On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> to 1 in gro_list_prepare() seems to be making more sense so that the above
> case has the same handling as skb_has_frag_list() handling?
> https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> 
> As it seems to avoid some unnecessary operation according to comment
> in tcp4_gro_receive():
> https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322

The frag_list case can be determined with just the input skb.
For pp_recycle we need to compare input skb's pp_recycle with
the pp_recycle of the skb already held by GRO.

I'll hold off with applying a bit longer tho, in case Eric
wants to chime in with an ack or opinion.
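
For illustration, the alternative Yunsheng describes would move the comparison
into gro_list_prepare(), roughly as in the sketch below. This is not the change
that was applied, and per the point above it still has to walk the held skbs to
compare pp_recycle bits; the function name is hypothetical:

#include <linux/list.h>
#include <net/gro.h>

/* Sketch only: flush any held skb whose pp_recycle bit disagrees with
 * the incoming skb, mirroring how frag_list skbs are kept out of the
 * merge path.
 */
static void gro_list_flush_pp_mismatch(const struct list_head *head,
				       const struct sk_buff *skb)
{
	struct sk_buff *p;

	list_for_each_entry(p, head, list) {
		if (p->pp_recycle != skb->pp_recycle) {
			/* Drop this candidate before any header
			 * comparison work is spent on it.
			 */
			NAPI_GRO_CB(p)->flush = 1;
			NAPI_GRO_CB(p)->same_flow = 0;
		}
	}
}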
  
Eric Dumazet Jan. 28, 2023, 7:08 a.m. UTC | #8
On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > case has the same handling as skb_has_frag_list() handling?
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> >
> > As it seems to avoid some unnecessary operation according to comment
> > in tcp4_gro_receive():
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
>
> The frag_list case can be determined with just the input skb.
> For pp_recycle we need to compare input skb's pp_recycle with
> the pp_recycle of the skb already held by GRO.
>
> I'll hold off with applying a bit longer tho, in case Eric
> wants to chime in with an ack or opinion.

We can say that we are adding in the fast path an expensive check
about an unlikely condition.

GRO is by far the most expensive component in our stack.

I would at least make the extra checks conditional to CONFIG_PAGE_POOL=y
Ideally all accesses to skb->pp_recycle should be done via a helper [1]

I hope that we will switch later to a per-page marker, instead of a per-skb one.

( a la https://www.spinics.net/lists/netdev/msg874099.html )

[1] Problem is that CONFIG_PAGE_POOL=y is now forced because of
net/bpf/test_run.c

So... testing skb->pp_recycle seems needed for the time being

Reviewed-by: Eric Dumazet <edumazet@google.com>

My tentative patch was something like:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4c8492401a101f1d6d43079fc70962210389763c..a53b176738b10f3b69b38c487e0c280f44990b6f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -918,8 +918,10 @@ struct sk_buff {
                                fclone:2,
                                peeked:1,
                                head_frag:1,
-                               pfmemalloc:1,
-                               pp_recycle:1; /* page_pool recycle indicator */
+#ifdef CONFIG_PAGE_POOL
+                               pp_recycle:1, /* page_pool recycle indicator */
+#endif
+                               pfmemalloc:1;
 #ifdef CONFIG_SKB_EXTENSIONS
        __u8                    active_extensions;
 #endif
@@ -3388,6 +3390,15 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
        put_page(page);
 }

+static inline bool skb_pp_recycle(const struct sk_buff *skb)
+{
+#ifdef CONFIG_PAGE_POOL
+       return skb->pp_recycle;
+#else
+       return false;
+#endif
+}
+
 /**
  * skb_frag_unref - release a reference on a paged fragment of an skb.
  * @skb: the buffer
@@ -3400,7 +3411,7 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
        struct skb_shared_info *shinfo = skb_shinfo(skb);

        if (!skb_zcopy_managed(skb))
-               __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
+               __skb_frag_unref(&shinfo->frags[f], skb_pp_recycle(skb));
 }

 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 180df58e85c72eaa16f5cb56b56d181a379b8921..7a2783a2c9608eec728a0adacea4619ab1c62791 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -801,19 +801,13 @@ static void skb_clone_fraglist(struct sk_buff *skb)
                skb_get(list);
 }

-static bool skb_pp_recycle(struct sk_buff *skb, void *data)
-{
-       if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
-               return false;
-       return page_pool_return_skb_page(virt_to_page(data));
-}
-
 static void skb_free_head(struct sk_buff *skb)
 {
        unsigned char *head = skb->head;

        if (skb->head_frag) {
-               if (skb_pp_recycle(skb, head))
+               if (skb_pp_recycle(skb) &&
+                   page_pool_return_skb_page(virt_to_page(head)))
                        return;
                skb_free_frag(head);
        } else {
@@ -840,7 +834,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
        }

        for (i = 0; i < shinfo->nr_frags; i++)
-               __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+               __skb_frag_unref(&shinfo->frags[i], skb_pp_recycle(skb));

 free_head:
        if (shinfo->frag_list)
@@ -857,7 +851,10 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
         * Eventually the last SKB will have the recycling bit set and it's
         * dataref set to 0, which will trigger the recycling
         */
+#ifdef CONFIG_PAGE_POOL
        skb->pp_recycle = 0;
+#endif
+       return;
 }

 /*
@@ -1292,7 +1289,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
        n->nohdr = 0;
        n->peeked = 0;
        C(pfmemalloc);
+#ifdef CONFIG_PAGE_POOL
        C(pp_recycle);
+#endif
        n->destructor = NULL;
        C(tail);
        C(end);
@@ -3859,7 +3858,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
                fragto = &skb_shinfo(tgt)->frags[merge];

                skb_frag_size_add(fragto, skb_frag_size(fragfrom));
-               __skb_frag_unref(fragfrom, skb->pp_recycle);
+               __skb_frag_unref(fragfrom, skb_pp_recycle(skb));
        }

        /* Reposition in the original skb */
@@ -5529,7 +5528,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
         * references for cloned SKBs at the moment that would result in
         * inconsistent reference counts.
         */
-       if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
+       if (skb_pp_recycle(to) != (skb_pp_recycle(from) && !skb_cloned(from)))
                return false;

        if (len <= skb_tailroom(to)) {
  
Eric Dumazet Jan. 28, 2023, 7:15 a.m. UTC | #9
On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > case has the same handling as skb_has_frag_list() handling?
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> >
> > As it seems to avoid some unnecessary operation according to comment
> > in tcp4_gro_receive():
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
>
> The frag_list case can be determined with just the input skb.
> For pp_recycle we need to compare input skb's pp_recycle with
> the pp_recycle of the skb already held by GRO.
>
> I'll hold off with applying a bit longer tho, in case Eric
> wants to chime in with an ack or opinion.

Doing the test only in the final step (once all headers have been
verified) seems less costly, since for the vast majority of cases the
driver cooks skbs with a consistent pp_recycle bit?

So Alex's patch seems less expensive to me than adding the check very early.
  
patchwork-bot+netdevbpf@kernel.org Jan. 28, 2023, 7:50 a.m. UTC | #10
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 26 Jan 2023 11:06:59 -0800 you wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
> 
> [...]

Here is the summary with links:
  - [net] skb: Do not mix page pool and page referenced frags in GRO
    https://git.kernel.org/netdev/net/c/7d2c89b32587

You are awesome, thank you!
  
Alexander Duyck Jan. 28, 2023, 5:08 p.m. UTC | #11
On Fri, Jan 27, 2023 at 11:16 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > > case has the same handling as skb_has_frag_list() handling?
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> > >
> > > As it seems to avoid some unnecessary operation according to comment
> > > in tcp4_gro_receive():
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
> >
> > The frag_list case can be determined with just the input skb.
> > For pp_recycle we need to compare input skb's pp_recycle with
> > the pp_recycle of the skb already held by GRO.
> >
> > I'll hold off with applying a bit longer tho, in case Eric
> > wants to chime in with an ack or opinion.
>
> Doing the test only if the final step (once all headers have been
> verified) seems less costly
> for the vast majority of the cases the driver cooks skbs with a
> consistent pp_recycle bit ?
>
> So Alex patch seems less expensive to me than adding the check very early.

That was the general idea. Basically there is no need to look into
this until we are looking at merging the skb and it is very unlikely
that we will see a mix of page pool and non-page pool skbs. I
considered this check to be something equivalent to discovering there
is no space in the skb to store the frags so that is one of the
reasons why I had picked the spot that I did.
  
Paolo Abeni Jan. 30, 2023, 8:50 a.m. UTC | #12
On Sat, 2023-01-28 at 08:08 +0100, Eric Dumazet wrote:
> On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > 
> > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > > case has the same handling as skb_has_frag_list() handling?
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> > > 
> > > As it seems to avoid some unnecessary operation according to comment
> > > in tcp4_gro_receive():
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
> > 
> > The frag_list case can be determined with just the input skb.
> > For pp_recycle we need to compare input skb's pp_recycle with
> > the pp_recycle of the skb already held by GRO.
> > 
> > I'll hold off with applying a bit longer tho, in case Eric
> > wants to chime in with an ack or opinion.
> 
> We can say that we are adding in the fast path an expensive check
> about an unlikely condition.
> 
> GRO is by far the most expensive component in our stack.

Slightly related to the above: currently the GRO engine performs the
skb metadata check for every packet. My understanding is that even with
XDP enabled and ebpf running on the given packet, the skb should 
usually have meta_len == 0. 

What about setting 'skb->slow_gro' together with meta_len and moving
the skb_metadata_differs() check under the slow_gro guard?

Cheers,

Paolo
  
Alexander Duyck Jan. 30, 2023, 4:17 p.m. UTC | #13
On Mon, Jan 30, 2023 at 12:50 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sat, 2023-01-28 at 08:08 +0100, Eric Dumazet wrote:
> > On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > > > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > > > case has the same handling as skb_has_frag_list() handling?
> > > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> > > >
> > > > As it seems to avoid some unnecessary operation according to comment
> > > > in tcp4_gro_receive():
> > > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
> > >
> > > The frag_list case can be determined with just the input skb.
> > > For pp_recycle we need to compare input skb's pp_recycle with
> > > the pp_recycle of the skb already held by GRO.
> > >
> > > I'll hold off with applying a bit longer tho, in case Eric
> > > wants to chime in with an ack or opinion.
> >
> > We can say that we are adding in the fast path an expensive check
> > about an unlikely condition.
> >
> > GRO is by far the most expensive component in our stack.
>
> Slightly related to the above: currently the GRO engine performs the
> skb metadata check for every packet. My understanding is that even with
> XDP enabled and ebpf running on the given packet, the skb should
> usually have meta_len == 0.
>
> What about setting 'skb->slow_gro' together with meta_len and moving
> the skb_metadata_differs() check under the slow_gro guard?

Makes sense to me, especially since we have to do a pointer chase to
get the metadata length out of the shared info.

Looking at the code, one thing I was wondering about is if we should be
flagging frames where one is slow_gro and one is not as having a diff
and just skipping the checks, since we know the slow_gro checks are
expensive and, if they differ based on that flag, odds are one will have
a field present that the other doesn't.
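
Taken together, an illustrative fragment of what that comparison in
gro_list_prepare() could look like, assuming (as suggested above) that
skb->slow_gro is set whenever meta_len is nonzero; the helper is hypothetical
and only shows the metadata part of the existing slow_gro guard:

#include <linux/skbuff.h>

/* Sketch only: a slow_gro mismatch counts as a difference by itself,
 * and the skb_metadata_differs() pointer chase into the shared info is
 * only paid when at least one skb is already on the slow path.
 */
static unsigned long gro_slow_path_diffs(const struct sk_buff *p,
					 const struct sk_buff *skb)
{
	unsigned long diffs = p->slow_gro ^ skb->slow_gro;

	if (!diffs && unlikely(p->slow_gro | skb->slow_gro)) {
		/* The sk/dst/nfct/tc-ext checks already under the
		 * slow_gro guard would remain here as well.
		 */
		diffs |= skb_metadata_differs(p, skb);
	}

	return diffs;
}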
  
Jesper Dangaard Brouer Jan. 30, 2023, 4:49 p.m. UTC | #14
On 27/01/2023 00.13, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
>> From: Alexander Duyck<alexanderduyck@fb.com>
>>
>> GSO should not merge page pool recycled frames with standard reference
>> counted frames. Traditionally this didn't occur, at least not often.
>> However as we start looking at adding support for wireless adapters there
>> becomes the potential to mix the two due to A-MSDU repartitioning frames in
>> the receive path. There are possibly other places where this may have
>> occurred however I suspect they must be few and far between as we have not
>> seen this issue until now.
>>
>> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>> Reported-by: Felix Fietkau<nbd@nbd.name>
>> Signed-off-by: Alexander Duyck<alexanderduyck@fb.com>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care?  We could possibly try to switch to
> using the frag list if we can't merge into frags safely.

Using the frag list sounds scary, because we recently learned that
kfree_skb_list requires all SKBs on the list to have the same refcnt (else
the walking of the list can lead to other bugs).

--Jesper
  

Patch

diff --git a/net/core/gro.c b/net/core/gro.c
index 506f83d715f8..4bac7ea6e025 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -162,6 +162,15 @@  int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	struct sk_buff *lp;
 	int segs;
 
+	/* Do not splice page pool based packets w/ non-page pool
+	 * packets. This can result in reference count issues as page
+	 * pool pages will not decrement the reference count and will
+	 * instead be immediately returned to the pool or have frag
+	 * count decremented.
+	 */
+	if (p->pp_recycle != skb->pp_recycle)
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);