Message ID | 20230313215553.1045175-4-aleksander.lobakin@intel.com |
---|---|
State | New |
Headers |
Series | xdp: recycle Page Pool backed skbs built from XDP frames |
Commit Message
Alexander Lobakin
March 13, 2023, 9:55 p.m. UTC
__xdp_build_skb_from_frame() state(d):
/* Until page_pool get SKB return path, release DMA here */
Page Pool gained skb page recycling in April 2021, but this function
was missed.
xdp_release_frame() is relevant only for Page Pool backed frames and it
detaches the page from the corresponding page_pool in order to make it
freeable via page_frag_free(). It can instead just mark the output skb
as eligible for recycling if the frame is backed by a pp. No change for
other memory model types (the same condition check as before).
cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
almost).
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
net/core/xdp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
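For context on the mechanism being swapped out and in: below is a trimmed
sketch of the two helpers involved, based on the kernel tree of roughly
this era (~v6.2). The bodies are simplified for illustration, not verbatim
copies.

/* Old path, removed by this patch: detach the page from its page_pool
 * so that a plain page_frag_free() can free it later. The DMA mapping
 * is dropped here, and the page has to be re-allocated and re-mapped
 * on the next RX. Trimmed from include/net/xdp.h / net/core/xdp.c.
 */
static __always_inline void xdp_release_frame(struct xdp_frame *xdpf)
{
	struct xdp_mem_info *mem = &xdpf->mem;

	/* Curr only page_pool needs this */
	if (mem->type == MEM_TYPE_PAGE_POOL)
		__xdp_release_frame(xdpf->data, mem); /* page_pool_release_page() */
}

/* New path: leave the page attached to its pool and flag the skb, so
 * that the regular skb free path returns the page to the pool instead
 * of the page allocator. Trimmed from include/linux/skbuff.h
 * (CONFIG_PAGE_POOL builds).
 */
static inline void skb_mark_for_recycle(struct sk_buff *skb)
{
	skb->pp_recycle = 1;
}

Keeping the page attached means it returns to the pool's cache with its
DMA mapping intact, which is where the "zero-alloc (or almost)" claim for
cpumap redirect and veth comes from.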
Comments
On 13/03/2023 22.55, Alexander Lobakin wrote:
> __xdp_build_skb_from_frame() state(d):
>
> /* Until page_pool get SKB return path, release DMA here */
>
> Page Pool gained skb page recycling in April 2021, but this function
> was missed.
>
> xdp_release_frame() is relevant only for Page Pool backed frames and it
> detaches the page from the corresponding page_pool in order to make it
> freeable via page_frag_free(). It can instead just mark the output skb
> as eligible for recycling if the frame is backed by a pp. No change for
> other memory model types (the same condition check as before).
> cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
> almost).
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  net/core/xdp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 8c92fc553317..a2237cfca8e9 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>  	 * - RX ring dev queue index (skb_record_rx_queue)
>  	 */
>
> -	/* Until page_pool get SKB return path, release DMA here */
> -	xdp_release_frame(xdpf);
> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
> +		skb_mark_for_recycle(skb);

I hope this is safe ;-) ... Meaning hopefully drivers do the correct
thing when XDP_REDIRECT'ing page_pool pages.

Looking for drivers doing weird refcnt tricks and XDP_REDIRECT'ing, I
noticed the driver aquantia/atlantic (in aq_get_rxpages_xdp), but I now
see this is not using page_pool, so it should not be affected by this
(though I worry whether the atlantic driver has a potential race
condition in its refcnt scheme).

>
>  	/* Allow SKB to reuse area used by xdp_frame */
>  	xdp_scrub_frame(xdpf);
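The safety Jesper is asking about is bounded on the free side: a page is
only ever handed back to a pool if it still carries the page_pool
signature. A trimmed sketch of that check from roughly the same tree
(~v6.2, net/core/page_pool.c; simplified, comments adapted):

bool page_pool_return_skb_page(struct page *page)
{
	struct page_pool *pp;

	page = compound_head(page);

	/* pp_magic is OR'ed with PP_SIGNATURE when the pool allocates the
	 * page; a page that was detached from its pool, or that never came
	 * from one, fails this check and is freed the ordinary way.
	 */
	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
		return false;

	pp = page->pp;
	page_pool_put_full_page(pp, page, false);

	return true;
}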
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Wed, 15 Mar 2023 15:55:44 +0100

> On 13/03/2023 22.55, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>> /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool gained skb page recycling in April 2021, but this function
>> was missed.
>>
>> xdp_release_frame() is relevant only for Page Pool backed frames and it
>> detaches the page from the corresponding page_pool in order to make it
>> freeable via page_frag_free(). It can instead just mark the output skb
>> as eligible for recycling if the frame is backed by a pp. No change for
>> other memory model types (the same condition check as before).
>> cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
>> almost).
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
>>  net/core/xdp.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 8c92fc553317..a2237cfca8e9 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct
>> xdp_frame *xdpf,
>>  	 * - RX ring dev queue index (skb_record_rx_queue)
>>  	 */
>>
>> -	/* Until page_pool get SKB return path, release DMA here */
>> -	xdp_release_frame(xdpf);
>> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
>> +		skb_mark_for_recycle(skb);
>
> I hope this is safe ;-) ... Meaning hopefully drivers do the correct
> thing when XDP_REDIRECT'ing page_pool pages.

Safe when it's done by the book. For now I'm observing only one syzbot
issue with test_run, because it assumes yet another bunch o'things I
wouldn't rely on :D (separate subthread)

>
> Looking for drivers doing weird refcnt tricks and XDP_REDIRECT'ing, I
> noticed the driver aquantia/atlantic (in aq_get_rxpages_xdp), but I now
> see this is not using page_pool, so it should not be affected by this
> (though I worry whether the atlantic driver has a potential race
> condition in its refcnt scheme).

If we encounter some driver using Page Pool but mangling refcounts on
redirect, we'll fix it ;)

>
>>  	/* Allow SKB to reuse area used by xdp_frame */
>>  	xdp_scrub_frame(xdpf);
>

Thanks,
Olek
On 15/03/2023 15.58, Alexander Lobakin wrote:
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Date: Wed, 15 Mar 2023 15:55:44 +0100
>
>> On 13/03/2023 22.55, Alexander Lobakin wrote:

[...]

>>>
>>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>>> index 8c92fc553317..a2237cfca8e9 100644
>>> --- a/net/core/xdp.c
>>> +++ b/net/core/xdp.c
>>> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct
>>> xdp_frame *xdpf,
>>>  	 * - RX ring dev queue index (skb_record_rx_queue)
>>>  	 */
>>>
>>> -	/* Until page_pool get SKB return path, release DMA here */
>>> -	xdp_release_frame(xdpf);
>>> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
>>> +		skb_mark_for_recycle(skb);
>>
>> I hope this is safe ;-) ... Meaning hopefully drivers do the correct
>> thing when XDP_REDIRECT'ing page_pool pages.
>
> Safe when it's done by the book. For now I'm observing only one syzbot
> issue with test_run, because it assumes yet another bunch o'things I
> wouldn't rely on :D (separate subthread)
>
>>
>> Looking for drivers doing weird refcnt tricks and XDP_REDIRECT'ing, I
>> noticed the driver aquantia/atlantic (in aq_get_rxpages_xdp), but I now
>> see this is not using page_pool, so it should not be affected by this
>> (though I worry whether the atlantic driver has a potential race
>> condition in its refcnt scheme).
>
> If we encounter some driver using Page Pool but mangling refcounts on
> redirect, we'll fix it ;)
>

Thanks for signing up to fix these issues down the road :-)

For what it's worth, I've rebased my testlab to include this patchset.

So far I've tested mlx5 with cpumap redirect and net stack processing;
everything seems to be working nicely. When disabling GRO/GSO, the cpumap
gets the same and sometimes better TCP throughput performance, even
though checksums have to be done in software. (Hopefully we can soon
close the missing HW checksum gap with XDP-hints.)

--Jesper
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Thu, 16 Mar 2023 18:10:26 +0100

> On 15/03/2023 15.58, Alexander Lobakin wrote:
>> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
>> Date: Wed, 15 Mar 2023 15:55:44 +0100

[...]

> Thanks for signing up to fix these issues down the road :-)

At some point, I wasn't sure which commits to point the Fixes: tags at.
From one PoV, it's not my patch which introduced them. From the other
side, there was no chance to have 0x42 overwritten in the metadata during
the selftest before that switch, and no one could even predict it (I
didn't expect XDP_PASS frames from test_run to reach neigh xmit at all),
so the original code is not buggy in itself either ._.

>
> For what it's worth, I've rebased my testlab to include this patchset.
>
> So far I've tested mlx5 with cpumap redirect and net stack processing;
> everything seems to be working nicely. When disabling GRO/GSO, the
> cpumap gets the same and sometimes better TCP throughput performance,
> even though checksums have to be done in software. (Hopefully we can
> soon close the missing HW checksum gap with XDP-hints.)

Yeah, I'm also looking forward to having some hints passed to
cpumap/veth, so that __xdp_build_skb_from_frame() could consume them.
Then I could finally pick a bunch of patches from my RFC back and switch
cpumap to GRO :D

>
> --Jesper
>

Thanks,
Olek
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8c92fc553317..a2237cfca8e9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	 * - RX ring dev queue index (skb_record_rx_queue)
 	 */
 
-	/* Until page_pool get SKB return path, release DMA here */
-	xdp_release_frame(xdpf);
+	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+		skb_mark_for_recycle(skb);
 
 	/* Allow SKB to reuse area used by xdp_frame */
 	xdp_scrub_frame(xdpf);
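For completeness, here is roughly how the flag set by
skb_mark_for_recycle() is consumed on the skb head-free path; a trimmed
sketch from the ~v6.2 net/core/skbuff.c, not a verbatim copy:

static bool skb_pp_recycle(struct sk_buff *skb, void *data)
{
	/* Only skbs flagged via skb_mark_for_recycle() take this path */
	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
		return false;
	return page_pool_return_skb_page(virt_to_page(data));
}

If the signature check in page_pool_return_skb_page() declines the page,
skb_pp_recycle() returns false and the stack falls back to the ordinary
frag/kfree free path, so a stray mark degrades gracefully instead of
corrupting a pool.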