Message ID: 20230301160315.1022488-2-aleksander.lobakin@intel.com
State:      New
Headers:
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Toke Høiland-Jørgensen <toke@redhat.com>, Song Liu <song@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>, Jakub Kicinski <kuba@kernel.org>,
	bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
Date: Wed, 1 Mar 2023 17:03:14 +0100
Message-Id: <20230301160315.1022488-2-aleksander.lobakin@intel.com>
In-Reply-To: <20230301160315.1022488-1-aleksander.lobakin@intel.com>
References: <20230301160315.1022488-1-aleksander.lobakin@intel.com>
Series: xdp: recycle Page Pool backed skbs built from XDP frames
Commit Message
Alexander Lobakin
March 1, 2023, 4:03 p.m. UTC
__xdp_build_skb_from_frame() state(d):
/* Until page_pool get SKB return path, release DMA here */
Page Pool got skb pages recycling in April 2021, but missed this
function.
xdp_release_frame() is relevant only for Page Pool backed frames and it
detaches the page from the corresponding Pool in order to make it
freeable via page_frag_free(). It can instead just mark the output skb
as eligible for recycling if the frame is backed by a PP. No change for
other memory model types (the same condition check as before).
cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
almost).
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
net/core/xdp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
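
For context, the helper being removed detached a Page Pool page from its pool so the skb could later be freed through the regular page_frag_free() path. A simplified sketch of the pre-patch code, reconstructed from memory rather than copied verbatim from include/net/xdp.h:

	/* Pre-patch behaviour, roughly: for MEM_TYPE_PAGE_POOL frames,
	 * unmap DMA and detach the page from its pool, so the page can
	 * only be freed via page_frag_free() -- i.e. no recycling.
	 */
	static inline void xdp_release_frame(struct xdp_frame *xdpf)
	{
		struct xdp_mem_info *mem = &xdpf->mem;

		/* Currently only page_pool needs this */
		if (mem->type == MEM_TYPE_PAGE_POOL)
			__xdp_release_frame(xdpf->data, mem); /* -> page_pool_release_page() */
	}

The patch replaces this release with skb_mark_for_recycle(), so the pages stay owned by the pool and get recycled when the skb is freed.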
Comments
Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230301160315.1022488-2-aleksander.lobakin%40intel.com
patch subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20230302/202303020331.PSFMFbXw-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a5ca5578e9bd35220be091fd02df96d492120ee3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
        git checkout a5ca5578e9bd35220be091fd02df96d492120ee3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303020331.PSFMFbXw-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/core/xdp.c: In function '__xdp_build_skb_from_frame':
>> net/core/xdp.c:662:17: error: implicit declaration of function 'skb_mark_for_recycle' [-Werror=implicit-function-declaration]
     662 |                 skb_mark_for_recycle(skb);
         |                 ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/skb_mark_for_recycle +662 net/core/xdp.c

   614	
   615	struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
   616						   struct sk_buff *skb,
   617						   struct net_device *dev)
   618	{
   619		struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
   620		unsigned int headroom, frame_size;
   621		void *hard_start;
   622		u8 nr_frags;
   623	
   624		/* xdp frags frame */
   625		if (unlikely(xdp_frame_has_frags(xdpf)))
   626			nr_frags = sinfo->nr_frags;
   627	
   628		/* Part of headroom was reserved to xdpf */
   629		headroom = sizeof(*xdpf) + xdpf->headroom;
   630	
   631		/* Memory size backing xdp_frame data already have reserved
   632		 * room for build_skb to place skb_shared_info in tailroom.
   633		 */
   634		frame_size = xdpf->frame_sz;
   635	
   636		hard_start = xdpf->data - headroom;
   637		skb = build_skb_around(skb, hard_start, frame_size);
   638		if (unlikely(!skb))
   639			return NULL;
   640	
   641		skb_reserve(skb, headroom);
   642		__skb_put(skb, xdpf->len);
   643		if (xdpf->metasize)
   644			skb_metadata_set(skb, xdpf->metasize);
   645	
   646		if (unlikely(xdp_frame_has_frags(xdpf)))
   647			xdp_update_skb_shared_info(skb, nr_frags,
   648						   sinfo->xdp_frags_size,
   649						   nr_frags * xdpf->frame_sz,
   650						   xdp_frame_is_frag_pfmemalloc(xdpf));
   651	
   652		/* Essential SKB info: protocol and skb->dev */
   653		skb->protocol = eth_type_trans(skb, dev);
   654	
   655		/* Optional SKB info, currently missing:
   656		 * - HW checksum info (skb->ip_summed)
   657		 * - HW RX hash (skb_set_hash)
   658		 * - RX ring dev queue index (skb_record_rx_queue)
   659		 */
   660	
   661		if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
 > 662			skb_mark_for_recycle(skb);
   663	
   664		/* Allow SKB to reuse area used by xdp_frame */
   665		xdp_scrub_frame(xdpf);
   666	
   667		return skb;
   668	}
   669	EXPORT_SYMBOL_GPL(__xdp_build_skb_from_frame);
   670	
Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230301160315.1022488-2-aleksander.lobakin%40intel.com
patch subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230302/202303020342.Wi2PRFFH-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a5ca5578e9bd35220be091fd02df96d492120ee3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
        git checkout a5ca5578e9bd35220be091fd02df96d492120ee3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303020342.Wi2PRFFH-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/core/xdp.c:662:3: error: implicit declaration of function 'skb_mark_for_recycle' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                   skb_mark_for_recycle(skb);
                   ^
   1 error generated.


vim +/skb_mark_for_recycle +662 net/core/xdp.c

   614	
   615	struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
   616						   struct sk_buff *skb,
   617						   struct net_device *dev)
   618	{
   619		struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
   620		unsigned int headroom, frame_size;
   621		void *hard_start;
   622		u8 nr_frags;
   623	
   624		/* xdp frags frame */
   625		if (unlikely(xdp_frame_has_frags(xdpf)))
   626			nr_frags = sinfo->nr_frags;
   627	
   628		/* Part of headroom was reserved to xdpf */
   629		headroom = sizeof(*xdpf) + xdpf->headroom;
   630	
   631		/* Memory size backing xdp_frame data already have reserved
   632		 * room for build_skb to place skb_shared_info in tailroom.
   633		 */
   634		frame_size = xdpf->frame_sz;
   635	
   636		hard_start = xdpf->data - headroom;
   637		skb = build_skb_around(skb, hard_start, frame_size);
   638		if (unlikely(!skb))
   639			return NULL;
   640	
   641		skb_reserve(skb, headroom);
   642		__skb_put(skb, xdpf->len);
   643		if (xdpf->metasize)
   644			skb_metadata_set(skb, xdpf->metasize);
   645	
   646		if (unlikely(xdp_frame_has_frags(xdpf)))
   647			xdp_update_skb_shared_info(skb, nr_frags,
   648						   sinfo->xdp_frags_size,
   649						   nr_frags * xdpf->frame_sz,
   650						   xdp_frame_is_frag_pfmemalloc(xdpf));
   651	
   652		/* Essential SKB info: protocol and skb->dev */
   653		skb->protocol = eth_type_trans(skb, dev);
   654	
   655		/* Optional SKB info, currently missing:
   656		 * - HW checksum info (skb->ip_summed)
   657		 * - HW RX hash (skb_set_hash)
   658		 * - RX ring dev queue index (skb_record_rx_queue)
   659		 */
   660	
   661		if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
 > 662			skb_mark_for_recycle(skb);
   663	
   664		/* Allow SKB to reuse area used by xdp_frame */
   665		xdp_scrub_frame(xdpf);
   666	
   667		return skb;
   668	}
   669	EXPORT_SYMBOL_GPL(__xdp_build_skb_from_frame);
   670	
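
Both failures are randconfig builds, which suggests the failing configs have CONFIG_PAGE_POOL disabled: skb_mark_for_recycle() is declared in include/linux/skbuff.h only under that option. Roughly (a sketch; check skbuff.h for the exact form):

	#ifdef CONFIG_PAGE_POOL
	static inline void skb_mark_for_recycle(struct sk_buff *skb)
	{
		skb->pp_recycle = 1;	/* per-skb "may carry PP pages" marker */
	}
	#endif

So the new call site presumably needs a CONFIG_PAGE_POOL guard, or a no-op stub for !CONFIG_PAGE_POOL builds.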
On 2023/3/2 0:03, Alexander Lobakin wrote:
> __xdp_build_skb_from_frame() state(d):
>
> /* Until page_pool get SKB return path, release DMA here */
>
> Page Pool got skb pages recycling in April 2021, but missed this
> function.
>
> xdp_release_frame() is relevant only for Page Pool backed frames and it
> detaches the page from the corresponding Pool in order to make it
> freeable via page_frag_free(). It can instead just mark the output skb
> as eligible for recycling if the frame is backed by a PP. No change for
> other memory model types (the same condition check as before).
> cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
> almost).
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  net/core/xdp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 8c92fc553317..a2237cfca8e9 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>  	 * - RX ring dev queue index (skb_record_rx_queue)
>  	 */
>  
> -	/* Until page_pool get SKB return path, release DMA here */
> -	xdp_release_frame(xdpf);
> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
> +		skb_mark_for_recycle(skb);

We rely on both skb->pp_recycle and page->pp_magic to decide whether
a page really comes from a page pool. So there were a few corner-case
problems when sharing a page between different skbs at the driver level
or when calling skb_clone() or skb_try_coalesce(), see:

https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/

The same applies here, as 'struct xdp_frame' also uses the sharable
'struct skb_shared_info', see xdp_get_shared_info_from_frame().

For now, xdpf_clone() does not seem to handle frag pages yet, so it
should be fine.

IMHO we should find a way to use a per-page marker, instead of both
per-skb and per-page markers, in order to avoid the above problem for
xdp if xdp gets similar processing to skbs, as suggested by Eric:

https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/

>
>  	/* Allow SKB to reuse area used by xdp_frame */
>  	xdp_scrub_frame(xdpf);
>
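
For reference, the two markers Yunsheng refers to are tested on the skb free path. A condensed sketch of that logic, based on net/core/skbuff.c and net/core/page_pool.c of that period (function and field names are real, bodies abridged):

	static bool skb_pp_recycle(struct sk_buff *skb, void *data)
	{
		/* Per-skb marker: cheap test, no struct page access needed */
		if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
			return false;
		return page_pool_return_skb_page(virt_to_page(data));
	}

	bool page_pool_return_skb_page(struct page *page)
	{
		struct page_pool *pp;

		page = compound_head(page);

		/* Per-page marker: only pool-owned pages carry PP_SIGNATURE */
		if (unlikely(page->pp_magic != PP_SIGNATURE))
			return false;

		pp = page->pp;
		page_pool_put_full_page(pp, page, false);
		return true;
	}

If skb->pp_recycle is false on an skb that does hold PP pages, those pages are put_page()'d behind the pool's back; if it is true on a non-PP page, only the pp_magic check prevents a bogus pool return. This asymmetry is what the rest of the thread discusses.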
On 02/03/2023 03.30, Yunsheng Lin wrote:
> On 2023/3/2 0:03, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>> /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool got skb pages recycling in April 2021, but missed this
>> function.

[...]

>> -	/* Until page_pool get SKB return path, release DMA here */
>> -	xdp_release_frame(xdpf);
>> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
>> +		skb_mark_for_recycle(skb);
>
> We rely on both skb->pp_recycle and page->pp_magic to decide whether
> a page really comes from a page pool. So there were a few corner-case
> problems when sharing a page between different skbs at the driver level
> or when calling skb_clone() or skb_try_coalesce(), see:

[...]

> IMHO we should find a way to use a per-page marker, instead of both
> per-skb and per-page markers, in order to avoid the above problem for
> xdp if xdp gets similar processing to skbs, as suggested by Eric.

Moving to a per-page marker can be *more* expensive if the struct-page
memory isn't cache-hot. So, if struct-page is accessed anyhow, then sure,
we can move it to a per-page marker.

> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/
>
>>
>>  	/* Allow SKB to reuse area used by xdp_frame */
>>  	xdp_scrub_frame(xdpf);
>>
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Thu, 2 Mar 2023 10:30:13 +0800

> On 2023/3/2 0:03, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>> /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool got skb pages recycling in April 2021, but missed this
>> function.

[...]

> We rely on both skb->pp_recycle and page->pp_magic to decide whether
> a page really comes from a page pool. So there were a few corner-case
> problems when sharing a page between different skbs at the driver level
> or when calling skb_clone() or skb_try_coalesce(), see:
>
> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/

And they are fixed :D
No drivers which currently use Page Pool mix PP pages with non-PP. And
it's impossible to trigger try_coalesce() or the like, at least on the
cpumap path, since we're only creating the skbs at that moment; they
don't come from anywhere else.

> The same applies here, as 'struct xdp_frame' also uses the sharable
> 'struct skb_shared_info', see xdp_get_shared_info_from_frame().
>
> For now, xdpf_clone() does not seem to handle frag pages yet, so it
> should be fine.

xdpf_clone() clones a frame to a new full page and doesn't copy its
skb_shared_info.

> IMHO we should find a way to use a per-page marker, instead of both
> per-skb and per-page markers, in order to avoid the above problem for
> xdp if xdp gets similar processing to skbs, as suggested by Eric:
>
> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/

As Jesper already pointed out, not having a quick way to check whether
we have to check ::pp_magic at all can decrease performance. So it's
rather a shortcut.

>
>>
>>  	/* Allow SKB to reuse area used by xdp_frame */
>>  	xdp_scrub_frame(xdpf);
>>

Thanks,
Olek
On 2023/3/3 19:22, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Thu, 2 Mar 2023 10:30:13 +0800

[...]

> And they are fixed :D
> No drivers which currently use Page Pool mix PP pages with non-PP. And

The wireless adapter which uses Page Pool *does* mix PP pages with
non-PP, see the discussion below:

https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/

> it's impossible to trigger try_coalesce() or the like, at least on the
> cpumap path, since we're only creating the skbs at that moment; they
> don't come from anywhere else.
>
>> For now, xdpf_clone() does not seem to handle frag pages yet, so it
>> should be fine.
>
> xdpf_clone() clones a frame to a new full page and doesn't copy its
> skb_shared_info.
>
> As Jesper already pointed out, not having a quick way to check whether
> we have to check ::pp_magic at all can decrease performance. So it's
> rather a shortcut.

When we free a page by updating the _refcount, I think we are already
touching the cache line of ::pp_magic.

Anyway, I am not sure checking ::pp_magic is correct when a page can
pass between different subsystems and come back to the network stack
eventually; checking ::pp_magic may not be correct if that happens.

Another way is to use the bottom two bits in bv_page, see:
https://www.spinics.net/lists/netdev/msg874099.html

[...]
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Fri, 3 Mar 2023 20:44:24 +0800

> On 2023/3/3 19:22, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Thu, 2 Mar 2023 10:30:13 +0800

[...]

> The wireless adapter which uses Page Pool *does* mix PP pages with
> non-PP, see the discussion below:
>
> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/

Ah right, I remember that (also was fixed).
Not that I think it is correct to mix them -- from my PoV, a driver
should either give *all* its Rx buffers as PP-backed or not use PP at
all.

[...]

>> As Jesper already pointed out, not having a quick way to check whether
>> we have to check ::pp_magic at all can decrease performance. So it's
>> rather a shortcut.
>
> When we free a page by updating the _refcount, I think we are already
> touching the cache line of ::pp_magic.

But no page freeing happens before checking for skb->pp_recycle, neither
in skb_pp_recycle() (skb_free_head() etc.) [0] nor in skb_frag_unref() [1].

> Anyway, I am not sure checking ::pp_magic is correct when a page can
> pass between different subsystems and come back to the network stack
> eventually; checking ::pp_magic may not be correct if that happens.
>
> Another way is to use the bottom two bits in bv_page, see:
> https://www.spinics.net/lists/netdev/msg874099.html

[0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
[1] https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385

Thanks,
Olek
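
The ordering Olek describes is visible in the frag-unref helper; roughly, as simplified from include/linux/skbuff.h of that era:

	static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
	{
		struct page *page = skb_frag_page(frag);

	#ifdef CONFIG_PAGE_POOL
		/* 'recycle' comes from skb->pp_recycle and is tested before
		 * any struct page field (such as ::pp_magic) is ever read
		 */
		if (recycle && page_pool_return_skb_page(page))
			return;
	#endif
		put_page(page);
	}

That is, with pp_recycle false, the free path short-circuits and never touches ::pp_magic at all.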
On 2023/3/3 21:26, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Fri, 3 Mar 2023 20:44:24 +0800

[...]

>> When we free a page by updating the _refcount, I think we are already
>> touching the cache line of ::pp_magic.
>
> But no page freeing happens before checking for skb->pp_recycle, neither
> in skb_pp_recycle() (skb_free_head() etc.) [0] nor in skb_frag_unref() [1].

If we move to a per-page marker, we probably do not need to check
skb->pp_recycle at all.

Note that both page_pool_return_skb_page() and skb_free_frag() can
reuse the cache line brought in by the per-page marker check if the
per-page marker is in 'struct page'.

> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
> [1] https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385

[...]
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Mon, 6 Mar 2023 09:09:31 +0800

> On 2023/3/3 21:26, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Fri, 3 Mar 2023 20:44:24 +0800

[...]

>> But no page freeing happens before checking for skb->pp_recycle, neither
>> in skb_pp_recycle() (skb_free_head() etc.) [0] nor in skb_frag_unref() [1].
>
> If we move to a per-page marker, we probably do not need to check
> skb->pp_recycle at all.
>
> Note that both page_pool_return_skb_page() and skb_free_frag() can
> reuse the cache line brought in by the per-page marker check if the
> per-page marker is in 'struct page'.

Ah, from that perspective. Yes, you're probably right, but it would need
to be tested anyway. I don't see any open problems with PP recycling on
the lists right now, but someone may try to change it one day.
Anyway, this flag is only there for a quick test. We do have
sk_buff::pfmemalloc, and that flag doesn't mean every page from the skb
was pfmemalloced either.

[...]

Thanks,
Olek
On 2023/3/6 19:58, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Mon, 6 Mar 2023 09:09:31 +0800

[...]

> Ah, from that perspective. Yes, you're probably right, but it would need
> to be tested anyway. I don't see any open problems with PP recycling on
> the lists right now, but someone may try to change it one day.
> Anyway, this flag is only there for a quick test. We do have
> sk_buff::pfmemalloc, and that flag doesn't mean every page from the skb
> was pfmemalloced either.

The point seems to be that sk_buff::pfmemalloc allows false positives:
skb->pfmemalloc can be set to true while no page from the skb is
actually pfmemalloced, as you mentioned.

skb->pp_recycle, however, cannot tolerate false positives: if one
happens, the reference counting of the page will not be handled properly
when PP and non-PP skbs share a page, as the wireless adapter does.

[...]
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Tue, 7 Mar 2023 10:50:34 +0800

> On 2023/3/6 19:58, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Mon, 6 Mar 2023 09:09:31 +0800

[...]

> The point seems to be that sk_buff::pfmemalloc allows false positives:
> skb->pfmemalloc can be set to true while no page from the skb is
> actually pfmemalloced, as you mentioned.
>
> skb->pp_recycle, however, cannot tolerate false positives: if one
> happens, the reference counting of the page will not be handled properly
> when PP and non-PP skbs share a page, as the wireless adapter does.

You mean false positives in both directions? Because if ->pp_recycle is
set, the stack can still free non-PP pages. In the opposite case, I mean
when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
there'll be issues.
But I think the deal is to propagate the flag when you want to attach a
PP-backed page to the skb? I mean, if someone decides to mix pages with
different memory models, it's their responsibility to make sure
everything is fine, because it's not a common/intended way. Isn't it?

>>>>> Another way is to use the bottom two bits in bv_page, see:
>>>>> https://www.spinics.net/lists/netdev/msg874099.html

This one is interesting, actually. We'd only need one bit -- which is
100% free and available in the case of page pointers.

[...]

Thanks,
Olek
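
A hypothetical illustration of the bv_page-style idea mentioned above (not existing kernel code; the names are invented for this sketch): struct page pointers are at least word-aligned, so the bottom bit of a stored pointer is always zero and could encode "this page came from a page_pool":

	#define FRAG_PP_BIT	1UL	/* hypothetical per-frag "is PP" tag */

	static inline unsigned long frag_encode(struct page *page, bool from_pp)
	{
		/* the low bit of an aligned pointer is free to carry the flag */
		return (unsigned long)page | (from_pp ? FRAG_PP_BIT : 0);
	}

	static inline struct page *frag_page_ptr(unsigned long tagged)
	{
		return (struct page *)(tagged & ~FRAG_PP_BIT);
	}

	static inline bool frag_from_pp(unsigned long tagged)
	{
		return tagged & FRAG_PP_BIT;
	}

This would make the marker per page reference rather than per skb, sidestepping the ->pp_recycle propagation problem entirely.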
On 2023/3/8 2:14, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Tue, 7 Mar 2023 10:50:34 +0800

[...]

> You mean false positives in both directions? Because if ->pp_recycle is
> set, the stack can still free non-PP pages. In the opposite case, I mean
> when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
> there'll be issues.

That may depend on what counts as a PP page and what counts as a non-PP
page; it seems hard to answer now.

For an skb with ->pp_recycle set and a frag page with page->pp_magic
equal to PP_SIGNATURE, calling skb_clone(), pskb_expand_head() or
skb_try_coalesce() may call __skb_frag_ref() on the frag page, which
means a page with page->pp_magic equal to PP_SIGNATURE can be both a PP
page and a non-PP page at the same time.

So it is important to set ->pp_recycle correctly, and past experience
suggests that is hard to get right; that's why a per-page marker is
suggested.

> But I think the deal is to propagate the flag when you want to attach a
> PP-backed page to the skb? I mean, if someone decides to mix pages with
> different memory models, it's their responsibility to make sure
> everything is fine, because it's not a common/intended way. Isn't it?

[...]
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 8 Mar 2023 14:27:13 +0800

> On 2023/3/8 2:14, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Tue, 7 Mar 2023 10:50:34 +0800

[...]

>> You mean false positives in both directions? Because if ->pp_recycle is
>> set, the stack can still free non-PP pages. In the opposite case, I mean
>> when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
>> there'll be issues.
>
> That may depend on what counts as a PP page and what counts as a non-PP
> page; it seems hard to answer now.
>
> For an skb with ->pp_recycle set and a frag page with page->pp_magic
> equal to PP_SIGNATURE, calling skb_clone(), pskb_expand_head() or
> skb_try_coalesce() may call __skb_frag_ref() on the frag page, which
> means a page with page->pp_magic equal to PP_SIGNATURE can be both a PP
> page and a non-PP page at the same time.
>
> So it is important to set ->pp_recycle correctly, and past experience
> suggests that is hard to get right; that's why a per-page marker is
> suggested.

Oh well, I didn't know that :s
Thanks for the explanation.

[...]

Olek
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8c92fc553317..a2237cfca8e9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	 * - RX ring dev queue index (skb_record_rx_queue)
 	 */
 
-	/* Until page_pool get SKB return path, release DMA here */
-	xdp_release_frame(xdpf);
+	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+		skb_mark_for_recycle(skb);
 
 	/* Allow SKB to reuse area used by xdp_frame */
 	xdp_scrub_frame(xdpf);