Message ID: <167475990764.1934330.11960904198087757911.stgit@localhost.localdomain>
State: New
Headers:
Subject: [net PATCH] skb: Do mix page pool and page referenced frags in GRO
From: Alexander Duyck <alexander.duyck@gmail.com>
To: nbd@nbd.name
Cc: alexander.duyck@gmail.com, davem@davemloft.net, edumazet@google.com, hawk@kernel.org, ilias.apalodimas@linaro.org, kuba@kernel.org, linux-kernel@vger.kernel.org, linyunsheng@huawei.com, lorenzo@kernel.org, netdev@vger.kernel.org, pabeni@redhat.com
Date: Thu, 26 Jan 2023 11:06:59 -0800
Message-ID: <167475990764.1934330.11960904198087757911.stgit@localhost.localdomain>
In-Reply-To: <04e27096-9ace-07eb-aa51-1663714a586d@nbd.name>
References: <04e27096-9ace-07eb-aa51-1663714a586d@nbd.name>
User-Agent: StGit/1.5
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
List-ID: <linux-kernel.vger.kernel.org>
Series: [net] skb: Do mix page pool and page referenced frags in GRO
Commit Message
Alexander Duyck
Jan. 26, 2023, 7:06 p.m. UTC
From: Alexander Duyck <alexanderduyck@fb.com>

GSO should not merge page pool recycled frames with standard reference
counted frames. Traditionally this didn't occur, at least not often.
However as we start looking at adding support for wireless adapters there
becomes the potential to mix the two due to A-MSDU repartitioning frames in
the receive path. There are possibly other places where this may have
occurred however I suspect they must be few and far between as we have not
seen this issue until now.

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Reported-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 net/core/gro.c | 9 +++++++++
 1 file changed, 9 insertions(+)
Comments
Alexander Duyck <alexander.duyck@gmail.com> writes:

> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
>
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

I know I'm pattern matching a bit crudely here, but we recently had
another report where doing a get_page() on skb->head didn't seem to be
enough; any chance they might be related?

See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

-Toke
On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexander Duyck <alexander.duyck@gmail.com> writes:
>
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. Traditionally this didn't occur, at least not often.
> > However as we start looking at adding support for wireless adapters there
> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
> > the receive path. There are possibly other places where this may have
> > occurred however I suspect they must be few and far between as we have not
> > seen this issue until now.
> >
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Reported-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>
> I know I'm pattern matching a bit crudely here, but we recently had
> another report where doing a get_page() on skb->head didn't seem to be
> enough; any chance they might be related?
>
> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

Looking at it I wouldn't think so. Doing get_page() on these frames is
fine. In the case you reference it looks like get_page() is being
called on a slab allocated skb head. So somehow a slab allocated head
is leaking through.

What is causing the issue here is that after get_page() is being called
and the fragments are moved into a non-pp_recycle skb they are then
picked out and merged back into a pp_recycle skb. As a result what is
happening is that we are seeing a reference count leak from
pp_frag_count and into refcount.

This is the quick-n-dirty fix. I am debating if we want to expand this
so that we could support the case where the donor frame is pp_recycle
but the recipient is a standard reference counted frame. Fixing that
would essentially consist of having to add logic to take the reference
on all donor frags, making certain that nr_frags on the donor skb isn't
altered, and then lastly making sure that all cases use the
NAPI_GRO_FREE path to drop the page pool counts.
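The leak Alexander describes can be modeled in a few lines of plain C. This is a toy model under stated assumptions, not kernel code: `toy_page`, `release_frag`, and `leaked_refcount` are illustrative names, and the two release paths stand in for page pool recycling versus ordinary `put_page()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a page owned by a page pool: the pool path releases it
 * by decrementing pp_frag_count; every other path uses the refcount. */
struct toy_page {
	bool from_pool;     /* allocated from a page pool? */
	int refcount;       /* generic get_page()/put_page() count */
	int pp_frag_count;  /* page pool fragment count */
	int pool_returns;   /* times the page went back to the pool */
};

/* A pp_recycle skb frees a frag through the pool path; any other skb
 * frees it through the ordinary refcount path. */
static void release_frag(struct toy_page *pg, bool skb_pp_recycle)
{
	if (skb_pp_recycle && pg->from_pool) {
		if (--pg->pp_frag_count == 0)
			pg->pool_returns++;  /* recycled to the pool */
	} else {
		--pg->refcount;              /* put_page() */
	}
}

/* Replay the bug: a pool frag is copied into a non-pp_recycle skb via
 * get_page(), then spliced back into a pp_recycle skb and freed via
 * the pool path.  Returns how many references were leaked. */
static int leaked_refcount(void)
{
	struct toy_page pg = { .from_pool = true, .refcount = 1,
			       .pp_frag_count = 1, .pool_returns = 0 };

	pg.refcount++;           /* get_page() into a non-pp_recycle skb */
	release_frag(&pg, true); /* merged back and freed via pool path */

	/* The page went back to the pool, but the get_page() reference
	 * was never dropped: the count "leaked" into refcount. */
	return pg.refcount - 1;
}
```

With balanced accounting `leaked_refcount()` would return 0; here it returns 1, mirroring the imbalance between `pp_frag_count` and `refcount` described above.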
Alexander Duyck <alexander.duyck@gmail.com> writes:

> On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexander Duyck <alexander.duyck@gmail.com> writes:
>>
>> > From: Alexander Duyck <alexanderduyck@fb.com>
>> >
>> > GSO should not merge page pool recycled frames with standard reference
>> > counted frames. Traditionally this didn't occur, at least not often.
>> > However as we start looking at adding support for wireless adapters there
>> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
>> > the receive path. There are possibly other places where this may have
>> > occurred however I suspect they must be few and far between as we have not
>> > seen this issue until now.
>> >
>> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>> > Reported-by: Felix Fietkau <nbd@nbd.name>
>> > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>>
>> I know I'm pattern matching a bit crudely here, but we recently had
>> another report where doing a get_page() on skb->head didn't seem to be
>> enough; any chance they might be related?
>>
>> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3
>
> Looking at it I wouldn't think so. Doing get_page() on these frames is
> fine. In the case you reference it looks like get_page() is being
> called on a slab allocated skb head. So somehow a slab allocated head
> is leaking through.

Alright, thanks for taking a look! :)

-Toke
On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
>
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

Exciting investigation!

Felix, out of curiosity - the impact of losing GRO on performance is
not significant enough to care? We could possibly try to switch to
using the frag list if we can't merge into frags safely.

> diff --git a/net/core/gro.c b/net/core/gro.c
> index 506f83d715f8..4bac7ea6e025 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>  	struct sk_buff *lp;
>  	int segs;
>
> +	/* Do not splice page pool based packets w/ non-page pool
> +	 * packets. This can result in reference count issues as page
> +	 * pool pages will not decrement the reference count and will
> +	 * instead be immediately returned to the pool or have frag
> +	 * count decremented.
> +	 */
> +	if (p->pp_recycle != skb->pp_recycle)
> +		return -ETOOMANYREFS;
>
>  	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
>  	gro_max_size = READ_ONCE(p->dev->gro_max_size);
Thanks Alexander!

On Fri, 27 Jan 2023 at 01:13, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. Traditionally this didn't occur, at least not often.
> > However as we start looking at adding support for wireless adapters there
> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
> > the receive path. There are possibly other places where this may have
> > occurred however I suspect they must be few and far between as we have not
> > seen this issue until now.
> >
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Reported-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care? We could possibly try to switch to
> using the frag list if we can't merge into frags safely.
>
> > diff --git a/net/core/gro.c b/net/core/gro.c
> > index 506f83d715f8..4bac7ea6e025 100644
> > --- a/net/core/gro.c
> > +++ b/net/core/gro.c
> > @@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> >  	struct sk_buff *lp;
> >  	int segs;
> >
> > +	/* Do not splice page pool based packets w/ non-page pool
> > +	 * packets. This can result in reference count issues as page
> > +	 * pool pages will not decrement the reference count and will
> > +	 * instead be immediately returned to the pool or have frag
> > +	 * count decremented.
> > +	 */
> > +	if (p->pp_recycle != skb->pp_recycle)
> > +		return -ETOOMANYREFS;
> >
> >  	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
> >  	gro_max_size = READ_ONCE(p->dev->gro_max_size);

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
On 27.01.23 00:13, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
>> From: Alexander Duyck <alexanderduyck@fb.com>
>>
>> GSO should not merge page pool recycled frames with standard reference
>> counted frames. Traditionally this didn't occur, at least not often.
>> However as we start looking at adding support for wireless adapters there
>> becomes the potential to mix the two due to A-MSDU repartitioning frames in
>> the receive path. There are possibly other places where this may have
>> occurred however I suspect they must be few and far between as we have not
>> seen this issue until now.
>>
>> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>> Reported-by: Felix Fietkau <nbd@nbd.name>
>> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care? We could possibly try to switch to
> using the frag list if we can't merge into frags safely.

Since this only affects combining page_pool and non-page_pool packets,
the performance loss should be negligible.

- Felix
On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> to 1 in gro_list_prepare() seems to be making more sense so that the above
> case has the same handling as skb_has_frag_list() handling?
> https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
>
> As it seems to avoid some unnecessary operation according to comment
> in tcp4_gro_receive():
> https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322

The frag_list case can be determined with just the input skb.
For pp_recycle we need to compare input skb's pp_recycle with
the pp_recycle of the skb already held by GRO.

I'll hold off with applying a bit longer tho, in case Eric
wants to chime in with an ack or opinion.
On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > case has the same handling as skb_has_frag_list() handling?
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> >
> > As it seems to avoid some unnecessary operation according to comment
> > in tcp4_gro_receive():
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
>
> The frag_list case can be determined with just the input skb.
> For pp_recycle we need to compare input skb's pp_recycle with
> the pp_recycle of the skb already held by GRO.
>
> I'll hold off with applying a bit longer tho, in case Eric
> wants to chime in with an ack or opinion.

We can say that we are adding in the fast path an expensive check
about an unlikely condition.

GRO is by far the most expensive component in our stack.

I would at least make the extra checks conditional to CONFIG_PAGE_POOL=y

Ideally all accesses to skb->pp_recycle should be done via a helper [1]

I hope that we will switch later to a per-page marker, instead of a
per-skb one. ( a la https://www.spinics.net/lists/netdev/msg874099.html )

[1] Problem is that CONFIG_PAGE_POOL=y is now forced because of
net/bpf/test_run.c

So... testing skb->pp_recycle seems needed for the time being

Reviewed-by: Eric Dumazet <edumazet@google.com>

My tentative patch was something like:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4c8492401a101f1d6d43079fc70962210389763c..a53b176738b10f3b69b38c487e0c280f44990b6f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -918,8 +918,10 @@ struct sk_buff {
 				fclone:2,
 				peeked:1,
 				head_frag:1,
-				pfmemalloc:1,
-				pp_recycle:1; /* page_pool recycle indicator */
+#ifdef CONFIG_PAGE_POOL
+				pp_recycle:1, /* page_pool recycle indicator */
+#endif
+				pfmemalloc:1;
 #ifdef CONFIG_SKB_EXTENSIONS
 	__u8			active_extensions;
 #endif
@@ -3388,6 +3390,15 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
 		put_page(page);
 }
 
+static inline bool skb_pp_recycle(const struct sk_buff *skb)
+{
+#ifdef CONFIG_PAGE_POOL
+	return skb->pp_recycle;
+#else
+	return false;
+#endif
+}
+
 /**
  * skb_frag_unref - release a reference on a paged fragment of an skb.
  * @skb: the buffer
@@ -3400,7 +3411,7 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 
 	if (!skb_zcopy_managed(skb))
-		__skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
+		__skb_frag_unref(&shinfo->frags[f], skb_pp_recycle(skb));
 }
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 180df58e85c72eaa16f5cb56b56d181a379b8921..7a2783a2c9608eec728a0adacea4619ab1c62791 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -801,19 +801,13 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
-static bool skb_pp_recycle(struct sk_buff *skb, void *data)
-{
-	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
-		return false;
-	return page_pool_return_skb_page(virt_to_page(data));
-}
-
 static void skb_free_head(struct sk_buff *skb)
 {
 	unsigned char *head = skb->head;
 
 	if (skb->head_frag) {
-		if (skb_pp_recycle(skb, head))
+		if (skb_pp_recycle(skb) &&
+		    page_pool_return_skb_page(virt_to_page(head)))
 			return;
 		skb_free_frag(head);
 	} else {
@@ -840,7 +834,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
 	}
 
 	for (i = 0; i < shinfo->nr_frags; i++)
-		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+		__skb_frag_unref(&shinfo->frags[i], skb_pp_recycle(skb));
 
 free_head:
 	if (shinfo->frag_list)
@@ -857,7 +851,10 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
 	 * Eventually the last SKB will have the recycling bit set and it's
 	 * dataref set to 0, which will trigger the recycling
 	 */
+#ifdef CONFIG_PAGE_POOL
 	skb->pp_recycle = 0;
+#endif
+	return;
 }
 
 /*
@@ -1292,7 +1289,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	n->nohdr = 0;
 	n->peeked = 0;
 	C(pfmemalloc);
+#ifdef CONFIG_PAGE_POOL
 	C(pp_recycle);
+#endif
 	n->destructor = NULL;
 	C(tail);
 	C(end);
@@ -3859,7 +3858,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 		fragto = &skb_shinfo(tgt)->frags[merge];
 
 		skb_frag_size_add(fragto, skb_frag_size(fragfrom));
-		__skb_frag_unref(fragfrom, skb->pp_recycle);
+		__skb_frag_unref(fragfrom, skb_pp_recycle(skb));
 	}
 
 	/* Reposition in the original skb */
@@ -5529,7 +5528,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	 * references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
 	 */
-	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
+	if (skb_pp_recycle(to) != (skb_pp_recycle(from) && !skb_cloned(from)))
 		return false;
 
 	if (len <= skb_tailroom(to)) {
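The core of Eric's tentative patch is an accessor that collapses to a compile-time constant when the feature is configured out, so the hot-path comparison costs nothing in that build. The pattern can be sketched outside the kernel as follows; `CONFIG_TOY_POOL`, `toy_skb`, and the helper names are illustrative stand-ins, not the kernel's types.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the CONFIG-guarded accessor pattern: when CONFIG_TOY_POOL
 * (standing in for CONFIG_PAGE_POOL) is not defined, the bit does not
 * exist, the helper is a constant 'false', and any branch comparing
 * two of these values folds away at compile time. */
struct toy_skb {
#ifdef CONFIG_TOY_POOL
	unsigned char pp_recycle : 1; /* field exists only when enabled */
#endif
	unsigned char head_frag : 1;
};

static inline bool toy_pp_recycle(const struct toy_skb *skb)
{
#ifdef CONFIG_TOY_POOL
	return skb->pp_recycle;
#else
	(void)skb;
	return false; /* compile-time constant in this configuration */
#endif
}

/* GRO-style gate built on the helper: with the option compiled out,
 * both sides are 'false' and this is always true. */
static bool toy_can_merge(const struct toy_skb *p, const struct toy_skb *skb)
{
	return toy_pp_recycle(p) == toy_pp_recycle(skb);
}
```

Built without `-DCONFIG_TOY_POOL`, `toy_can_merge()` is trivially true for any pair of skbs, which is the zero-cost property Eric is after.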
On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > case has the same handling as skb_has_frag_list() handling?
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> >
> > As it seems to avoid some unnecessary operation according to comment
> > in tcp4_gro_receive():
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
>
> The frag_list case can be determined with just the input skb.
> For pp_recycle we need to compare input skb's pp_recycle with
> the pp_recycle of the skb already held by GRO.
>
> I'll hold off with applying a bit longer tho, in case Eric
> wants to chime in with an ack or opinion.

Doing the test only in the final step (once all headers have been
verified) seems less costly for the vast majority of the cases, since
the driver cooks skbs with a consistent pp_recycle bit.

So Alex's patch seems less expensive to me than adding the check very early.
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 26 Jan 2023 11:06:59 -0800 you wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
>
> [...]

Here is the summary with links:
  - [net] skb: Do mix page pool and page referenced frags in GRO
    https://git.kernel.org/netdev/net/c/7d2c89b32587

You are awesome, thank you!
On Fri, Jan 27, 2023 at 11:16 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > > case has the same handling as skb_has_frag_list() handling?
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> > >
> > > As it seems to avoid some unnecessary operation according to comment
> > > in tcp4_gro_receive():
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
> >
> > The frag_list case can be determined with just the input skb.
> > For pp_recycle we need to compare input skb's pp_recycle with
> > the pp_recycle of the skb already held by GRO.
> >
> > I'll hold off with applying a bit longer tho, in case Eric
> > wants to chime in with an ack or opinion.
>
> Doing the test only in the final step (once all headers have been
> verified) seems less costly for the vast majority of the cases, since
> the driver cooks skbs with a consistent pp_recycle bit.
>
> So Alex's patch seems less expensive to me than adding the check very early.

That was the general idea. Basically there is no need to look into
this until we are looking at merging the skb and it is very unlikely
that we will see a mix of page pool and non-page pool skbs.

I considered this check to be something equivalent to discovering
there is no space in the skb to store the frags, so that is one of the
reasons why I had picked the spot that I did.
On Sat, 2023-01-28 at 08:08 +0100, Eric Dumazet wrote:
> On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > > case has the same handling as skb_has_frag_list() handling?
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> > >
> > > As it seems to avoid some unnecessary operation according to comment
> > > in tcp4_gro_receive():
> > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
> >
> > The frag_list case can be determined with just the input skb.
> > For pp_recycle we need to compare input skb's pp_recycle with
> > the pp_recycle of the skb already held by GRO.
> >
> > I'll hold off with applying a bit longer tho, in case Eric
> > wants to chime in with an ack or opinion.
>
> We can say that we are adding in the fast path an expensive check
> about an unlikely condition.
>
> GRO is by far the most expensive component in our stack.

Slightly related to the above: currently the GRO engine performs the
skb metadata check for every packet. My understanding is that even with
XDP enabled and ebpf running on the given packet, the skb should
usually have meta_len == 0.

What about setting 'skb->slow_gro' together with meta_len and moving
the skb_metadata_differs() check under the slow_gro guard?

Cheers,

Paolo
On Mon, Jan 30, 2023 at 12:50 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sat, 2023-01-28 at 08:08 +0100, Eric Dumazet wrote:
> > On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > > > to 1 in gro_list_prepare() seems to be making more sense so that the above
> > > > case has the same handling as skb_has_frag_list() handling?
> > > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
> > > >
> > > > As it seems to avoid some unnecessary operation according to comment
> > > > in tcp4_gro_receive():
> > > > https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322
> > >
> > > The frag_list case can be determined with just the input skb.
> > > For pp_recycle we need to compare input skb's pp_recycle with
> > > the pp_recycle of the skb already held by GRO.
> > >
> > > I'll hold off with applying a bit longer tho, in case Eric
> > > wants to chime in with an ack or opinion.
> >
> > We can say that we are adding in the fast path an expensive check
> > about an unlikely condition.
> >
> > GRO is by far the most expensive component in our stack.
>
> Slightly related to the above: currently the GRO engine performs the
> skb metadata check for every packet. My understanding is that even with
> XDP enabled and ebpf running on the given packet, the skb should
> usually have meta_len == 0.
>
> What about setting 'skb->slow_gro' together with meta_len and moving
> the skb_metadata_differs() check under the slow_gro guard?

Makes sense to me, especially since we have to do a pointer chase to
get the metadata length out of the shared info.

Looking at the code, one thing I was wondering about is if we should be
flagging frames where one is slow_gro and one is not as having a diff
and just skipping the checks, since we know the slow_gro checks are
expensive and if they differ based on that flag odds are one will have
a field present that the other doesn't.
On 27/01/2023 00.13, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
>> From: Alexander Duyck <alexanderduyck@fb.com>
>>
>> GSO should not merge page pool recycled frames with standard reference
>> counted frames. Traditionally this didn't occur, at least not often.
>> However as we start looking at adding support for wireless adapters there
>> becomes the potential to mix the two due to A-MSDU repartitioning frames in
>> the receive path. There are possibly other places where this may have
>> occurred however I suspect they must be few and far between as we have not
>> seen this issue until now.
>>
>> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>> Reported-by: Felix Fietkau <nbd@nbd.name>
>> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care? We could possibly try to switch to
> using the frag list if we can't merge into frags safely.

Using the frag list sounds scary, because we recently learned that
kfree_skb_list requires all SKBs on the list to have the same refcnt
(else walking the list can lead to other bugs).

--Jesper
diff --git a/net/core/gro.c b/net/core/gro.c
index 506f83d715f8..4bac7ea6e025 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	struct sk_buff *lp;
 	int segs;
 
+	/* Do not splice page pool based packets w/ non-page pool
+	 * packets. This can result in reference count issues as page
+	 * pool pages will not decrement the reference count and will
+	 * instead be immediately returned to the pool or have frag
+	 * count decremented.
+	 */
+	if (p->pp_recycle != skb->pp_recycle)
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
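The control flow of the applied fix can be restated as a standalone sketch: refuse the splice before any frag bookkeeping happens when the two skbs use different reclamation models. This is a toy restatement, not the kernel function; `toy_skb` and `toy_gro_receive` are illustrative names, and only the `pp_recycle` gate from the patch is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct sk_buff carrying only the recycle marker. */
struct toy_skb { bool pp_recycle; };

/* Mirrors the shape of the fix: skb_gro_receive() bails out with
 * -ETOOMANYREFS before touching any frags when the held skb 'p' and
 * the incoming 'skb' use different memory reclamation models; GRO
 * then flushes instead of merging. */
static int toy_gro_receive(const struct toy_skb *p, const struct toy_skb *skb)
{
	if (p->pp_recycle != skb->pp_recycle)
		return -ETOOMANYREFS;
	return 0; /* merge may proceed */
}
```

A page-pool skb paired with a refcounted skb is rejected in either direction, while two skbs of the same kind still merge, matching the symmetric `!=` test in the patch.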