Message ID | ee5513e6384696147da9bdccd2e22ea27d690084.1698431765.git.dxu@dxuuu.xyz |
---|---|
State | New |
Headers |
From: Daniel Xu <dxu@dxuuu.xyz>
To: hawk@kernel.org, steffen.klassert@secunet.com, ast@kernel.org, pabeni@redhat.com, daniel@iogearbox.net, kuba@kernel.org, Herbert Xu <herbert@gondor.apana.org.au>, davem@davemloft.net, john.fastabend@gmail.com, edumazet@google.com, antony.antony@secunet.com
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, devel@linux-ipsec.org
Subject: [RFC bpf-next 1/6] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
Date: Fri, 27 Oct 2023 12:46:17 -0600
Message-ID: <ee5513e6384696147da9bdccd2e22ea27d690084.1698431765.git.dxu@dxuuu.xyz>
In-Reply-To: <cover.1698431765.git.dxu@dxuuu.xyz>
References: <cover.1698431765.git.dxu@dxuuu.xyz>
X-Mailer: git-send-email 2.42.0 |
Series | Add bpf_xdp_get_xfrm_state() kfunc |
Commit Message
Daniel Xu
Oct. 27, 2023, 6:46 p.m. UTC
This commit adds an unstable kfunc helper to access internal xfrm_state
associated with an SA. This is intended to be used for the upcoming
IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
words: for custom software RSS.

That being said, the function that this kfunc wraps is fairly generic
and used for a lot of xfrm tasks. I'm sure people will find uses
elsewhere over time.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
include/net/xfrm.h | 9 ++++
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 105 ++++++++++++++++++++++++++++++++++++++
4 files changed, 117 insertions(+)
create mode 100644 net/xfrm/xfrm_state_bpf.c
Comments
On Fri, Oct 27, 2023 at 11:46 AM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> This commit adds an unstable kfunc helper to access internal xfrm_state
> associated with an SA. This is intended to be used for the upcoming
> IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
> words: for custom software RSS.
>
> That being said, the function that this kfunc wraps is fairly generic
> and used for a lot of xfrm tasks. I'm sure people will find uses
> elsewhere over time.
>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  include/net/xfrm.h        |   9 ++++
>  net/xfrm/Makefile         |   1 +
>  net/xfrm/xfrm_policy.c    |   2 +
>  net/xfrm/xfrm_state_bpf.c | 105 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 117 insertions(+)
>  create mode 100644 net/xfrm/xfrm_state_bpf.c
>
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index 98d7aa78adda..ab4cf66480f3 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -2188,4 +2188,13 @@ static inline int register_xfrm_interface_bpf(void)
>
>  #endif
>
> +#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
> +int register_xfrm_state_bpf(void);
> +#else
> +static inline int register_xfrm_state_bpf(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #endif /* _NET_XFRM_H */
> diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
> index cd47f88921f5..547cec77ba03 100644
> --- a/net/xfrm/Makefile
> +++ b/net/xfrm/Makefile
> @@ -21,3 +21,4 @@ obj-$(CONFIG_XFRM_USER_COMPAT) += xfrm_compat.o
>  obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
>  obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
>  obj-$(CONFIG_XFRM_ESPINTCP) += espintcp.o
> +obj-$(CONFIG_DEBUG_INFO_BTF) += xfrm_state_bpf.o
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 5cdd3bca3637..62e64fa7ae5c 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -4267,6 +4267,8 @@ void __init xfrm_init(void)
>  #ifdef CONFIG_XFRM_ESPINTCP
>  	espintcp_init();
>  #endif
> +
> +	register_xfrm_state_bpf();
>  }
>
>  #ifdef CONFIG_AUDITSYSCALL
> diff --git a/net/xfrm/xfrm_state_bpf.c b/net/xfrm/xfrm_state_bpf.c
> new file mode 100644
> index 000000000000..a73a17a6497b
> --- /dev/null
> +++ b/net/xfrm/xfrm_state_bpf.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Unstable XFRM state BPF helpers.
> + *
> + * Note that it is allowed to break compatibility for these functions since the
> + * interface they are exposed through to BPF programs is explicitly unstable.
> + */
> +
> +#include <linux/bpf.h>
> +#include <linux/btf_ids.h>
> +#include <net/xdp.h>
> +#include <net/xfrm.h>
> +
> +/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
> + *
> + * Members:
> + * @error      - Out parameter, set for any errors encountered
> + *               Values:
> + *                 -EINVAL - netns_id is less than -1
> + *                 -EINVAL - Passed NULL for opts
> + *                 -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
> + *                 -ENONET - No network namespace found for netns_id
> + * @netns_id   - Specify the network namespace for lookup
> + *               Values:
> + *                 BPF_F_CURRENT_NETNS (-1)
> + *                   Use namespace associated with ctx
> + *                 [0, S32_MAX]
> + *                   Network Namespace ID
> + * @mark       - XFRM mark to match on
> + * @daddr      - Destination address to match on
> + * @spi        - Security parameter index to match on
> + * @proto      - L3 protocol to match on
> + * @family     - L3 protocol family to match on
> + */
> +struct bpf_xfrm_state_opts {
> +	s32 error;
> +	s32 netns_id;
> +	u32 mark;
> +	xfrm_address_t daddr;
> +	__be32 spi;
> +	u8 proto;
> +	u16 family;
> +};
> +
> +enum {
> +	BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
> +};
> +
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in xfrm_state BTF");
> +
> +/* bpf_xdp_get_xfrm_state - Get XFRM state
> + *
> + * Parameters:
> + * @ctx        - Pointer to ctx (xdp_md) in XDP program
> + *               Cannot be NULL
> + * @opts       - Options for lookup (documented above)
> + *               Cannot be NULL
> + * @opts__sz   - Length of the bpf_xfrm_state_opts structure
> + *               Must be BPF_XFRM_STATE_OPTS_SZ
> + */
> +__bpf_kfunc struct xfrm_state *
> +bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
> +{
> +	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> +	struct net *net = dev_net(xdp->rxq->dev);
> +
> +	if (!opts || opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
> +		opts->error = -EINVAL;
> +		return NULL;
> +	}
> +
> +	if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
> +		opts->error = -EINVAL;
> +		return NULL;
> +	}
> +
> +	if (opts->netns_id >= 0) {
> +		net = get_net_ns_by_id(net, opts->netns_id);
> +		if (unlikely(!net)) {
> +			opts->error = -ENONET;
> +			return NULL;
> +		}
> +	}
> +
> +	return xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
> +				 opts->proto, opts->family);
> +}

Patch 6 example does little to explain how this kfunc can be used.
Cover letter sounds promising, but no code to demonstrate the result.

The main issue is that this kfunc has to be KF_ACQUIRE,
otherwise bpf prog will keep leaking xfrm_state.
Plenty of red flags in this RFC.
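
A side note on the argument checks in the quoted function, as a userspace sketch only. In the patch as posted, the first `if` is also taken when opts is NULL, and its body then writes opts->error. Whether a NULL can actually reach the kfunc depends on verifier checks that are not shown in this thread. The model below separates the NULL check from the size and range checks; `struct lookup_opts`, `CURRENT_NETNS` and `validate_opts()` are hypothetical stand-ins invented for illustration, not kernel code and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct bpf_xfrm_state_opts. */
struct lookup_opts {
	int32_t error;
	int32_t netns_id;
};

/* Mirrors the BPF_F_CURRENT_NETNS (-1) value documented in the patch. */
#define CURRENT_NETNS (-1)

/*
 * Returns 0 if opts may be used for a lookup, -1 otherwise.
 * opts->error is only written once opts is known to be non-NULL.
 */
static int validate_opts(struct lookup_opts *opts, uint32_t opts_sz)
{
	if (!opts)
		return -1;	/* no struct to report the error into */

	if (opts_sz != sizeof(*opts) || opts->netns_id < CURRENT_NETNS) {
		opts->error = -EINVAL;
		return -1;
	}

	return 0;
}
```

A caller would pass `sizeof(struct lookup_opts)` as the size, the same way the BPF program in this thread passes `sizeof(opts)`.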
Hi Alexei,

On Sat, Oct 28, 2023 at 04:49:45PM -0700, Alexei Starovoitov wrote:
> On Fri, Oct 27, 2023 at 11:46 AM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > [...]
>
> Patch 6 example does little to explain how this kfunc can be used.
> Cover letter sounds promising, but no code to demonstrate the result.

Part of the reason for that is this kfunc is intended to be used with a
not-yet-upstreamed xfrm patchset. The other is that the usage is quite
trivial. This is the code the experiments were run with:

https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce2863ecb27/xdp-bench/xdp_redirect_cpumap.bpf.c#L385-L406

We intend to upstream that cpumap mode to xdp-tools as soon as the xfrm
patches are in. (Note the linked code is a little buggy but the
main idea is there).

Depending on your appetite for complex diagrams, I can also offer you a
sequence diagram that describes how everything fits together:

https://dxuuu.xyz/r/ipsec-pcpu.png

The TLDR is that all the magic comes from xfrm subsystem. This kfunc
just enables software RSS.

> The main issue is that this kfunc has to be KF_ACQUIRE,
> otherwise bpf prog will keep leaking xfrm_state.
> Plenty of red flags in this RFC.

Ack, will check on KF_ACQUIRE.

Thanks,
Daniel
On Sun, Oct 29, 2023 at 3:55 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> Hi Alexei,
>
> On Sat, Oct 28, 2023 at 04:49:45PM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 27, 2023 at 11:46 AM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > > [...]
> >
> > Patch 6 example does little to explain how this kfunc can be used.
> > Cover letter sounds promising, but no code to demonstrate the result.
>
> Part of the reason for that is this kfunc is intended to be used with a
> not-yet-upstreamed xfrm patchset. The other is that the usage is quite
> trivial. This is the code the experiments were run with:
>
> https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce2863ecb27/xdp-bench/xdp_redirect_cpumap.bpf.c#L385-L406
>
> We intend to upstream that cpumap mode to xdp-tools as soon as the xfrm
> patches are in. (Note the linked code is a little buggy but the
> main idea is there).

I don't understand how it survives anything, but sanity check.
To measure perf gains it needs to be under traffic for some time,
but
x = bpf_xdp_get_xfrm_state(ctx, &opts, sizeof(opts));
will keep refcnt++ that state for every packet.
Minimum -> memory leak or refcnt overflow.
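
The per-packet refcount growth described above can be modelled in plain userspace C. This is a minimal sketch under stated assumptions, not kernel code: `struct state`, `state_get()`, `state_put()` and the two handlers are hypothetical names that stand in for xfrm_state, a lookup that takes a reference, and its matching release. It only shows why an acquire with no paired release grows the count by one per packet, which is the behaviour the KF_ACQUIRE / KF_RELEASE discussion in this thread is about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as xfrm_state. */
struct state {
	int refcnt;
};

/* Models a lookup that takes a reference on success (acquire semantics). */
static struct state *state_get(struct state *s)
{
	if (s)
		s->refcnt++;
	return s;
}

/* Models the matching release; every state_get() needs one of these. */
static void state_put(struct state *s)
{
	assert(s && s->refcnt > 0);
	s->refcnt--;
}

/* Per-packet handler that never releases: the count grows by one per packet. */
static void handle_packet_leaky(struct state *s)
{
	(void)state_get(s);
}

/* Per-packet handler that pairs get with put: the count is unchanged afterwards. */
static void handle_packet_paired(struct state *s)
{
	struct state *x = state_get(s);

	if (x)
		state_put(x);
}

/* Runs n packets through a handler and returns the resulting refcount. */
static int refcnt_after(void (*handler)(struct state *), int n)
{
	struct state s = { .refcnt = 1 };	/* one reference held by the owner */

	for (int i = 0; i < n; i++)
		handler(&s);
	return s.refcnt;
}
```

Calling `refcnt_after(handle_packet_leaky, n)` models the program as posted, and `refcnt_after(handle_packet_paired, n)` models a program that releases each reference before returning.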
On Tue, Oct 31, 2023 at 03:38:26PM -0700, Alexei Starovoitov wrote:
> On Sun, Oct 29, 2023 at 3:55 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > [...]
> > This is the code the experiments were run with:
> >
> > https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce2863ecb27/xdp-bench/xdp_redirect_cpumap.bpf.c#L385-L406
> >
> > We intend to upstream that cpumap mode to xdp-tools as soon as the xfrm
> > patches are in. (Note the linked code is a little buggy but the
> > main idea is there).
>
> I don't understand how it survives anything, but sanity check.
> To measure perf gains it needs to be under traffic for some time,
> but
> x = bpf_xdp_get_xfrm_state(ctx, &opts, sizeof(opts));
> will keep refcnt++ that state for every packet.
> Minimum -> memory leak or refcnt overflow.

Yeah, I agree the code in this patchset is not correct. I have the fix
(a KF_RELEASE wrapper around xfrm_state_put()) ready to send.
I think Steffen was gonna chat w/ you about this at IETF next week. But I can
send it now if you'd like.

To answer your question why it doesn't blow up immediately:

* The test system only has ~33 inbound SAs and the test doesn't try to
  delete any. So leak is not noticed in the test. Oddly enough I recall
  `ip x s flush` working correctly... Could be misremembering.

* Refcnt overflow will indeed happen, but some rough math shows it'll
  take about 12 hrs receiving at 100Gbps for that to happen. 100Gbps =
  12.5 GB/s. 12.5GB / (32 CPUs) / (9000B) = 43k pps for each pcpu SA.
  INT_MAX = 2 billion. 2B / 43k = 46k. 46k seconds to hours is ~12 hrs.
  And I was only running traffic for ~1 hour.

At least I think that math is right.

Thanks,
Daniel
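
The back-of-the-envelope estimate above can be reproduced with a few lines of C. This is only a sketch of the arithmetic in the message, assuming a saturated 100Gbps link, 9000-byte packets, traffic spread evenly over 32 per-CPU SAs, and one reference gained per packet; the function names are made up for illustration. With the exact INT_MAX rather than the rounded 2 billion, the result lands a little above the ~12 hrs quoted, at somewhere between 13 and 14 hours.

```c
#include <limits.h>

/* Packets per second seen by one per-CPU SA, assuming the link is saturated
 * with fixed-size packets and traffic is spread evenly across the CPUs. */
static double pps_per_sa(double link_bits_per_sec, unsigned int cpus,
			 unsigned int pkt_bytes)
{
	double bytes_per_sec = link_bits_per_sec / 8.0;

	return bytes_per_sec / cpus / pkt_bytes;
}

/* Seconds until a counter that starts near zero and gains one reference per
 * packet reaches INT_MAX (the bound used in the message above). */
static double secs_to_overflow(double pps)
{
	return (double)INT_MAX / pps;
}
```

Calling `pps_per_sa(100e9, 32, 9000)` gives roughly 43k packets per second, and dividing `secs_to_overflow()` of that value by 3600 converts the result to hours.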
On Wed, Nov 1, 2023 at 10:51 AM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> Yeah, I agree the code in this patchset is not correct. I have the fix
> (a KF_RELEASE wrapper around xfrm_state_put()) ready to send. I think
> Steffen was gonna chat w/ you about this at IETF next week. But I can
> send it now if you'd like.

I say send a new version with all issues addressed now,
since it might help to frame the discussion at IETF.

>
> To answer your question why it doesn't blow up immediately:
>
> * The test system only has ~33 inbound SAs and the test doesn't try to
>   delete any. So leak is not noticed in the test. Oddly enough I recall
>   `ip x s flush` working correctly... Could be misremembering.
>
> * Refcnt overflow will indeed happen, but some rough math shows it'll
>   take about 12 hrs receiving at 100Gbps for that to happen. 100Gbps =
>   12.5 GB/s. 12.5GB / (32 CPUs) / (9000B) = 43k pps for each pcpu SA.
>   INT_MAX = 2 billion. 2B / 43k = 46k. 46k seconds to hours is ~12 hrs.
>   And I was only running traffic for ~1 hour.
>
> At least I think that math is right.

Makes sense.
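
To make the leak under discussion concrete, here is a small userspace model of the acquire/release pairing. It is not kernel code: `toy_state`, `toy_lookup()` and `toy_put()` are hypothetical stand-ins for an SA, a lookup that returns it with an extra reference, and the matching put. In the kernel, kfuncs flagged KF_ACQUIRE and KF_RELEASE let the BPF verifier enforce this pairing; the sketch only shows why an unpaired lookup grows the count by one per packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as an SA. */
struct toy_state {
	int refcnt;
};

/* Models a lookup that returns the object with an extra reference held. */
static struct toy_state *toy_lookup(struct toy_state *s)
{
	s->refcnt++;
	return s;
}

/* Models the release side; every successful lookup needs exactly one. */
static void toy_put(struct toy_state *s)
{
	assert(s->refcnt > 0);
	s->refcnt--;
}

/* Per-packet handler that never releases: leaks one reference per call. */
static void handle_packet_leaky(struct toy_state *s)
{
	struct toy_state *x = toy_lookup(s);

	(void)x;
}

/* Per-packet handler that pairs the lookup with a release. */
static void handle_packet_balanced(struct toy_state *s)
{
	struct toy_state *x = toy_lookup(s);

	if (x)
		toy_put(x);
}
```

Running the leaky handler N times on an object that starts with one reference leaves it at N + 1; the balanced handler leaves it at one.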
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 98d7aa78adda..ab4cf66480f3 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -2188,4 +2188,13 @@ static inline int register_xfrm_interface_bpf(void)
 
 #endif
 
+#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
+int register_xfrm_state_bpf(void);
+#else
+static inline int register_xfrm_state_bpf(void)
+{
+	return 0;
+}
+#endif
+
 #endif	/* _NET_XFRM_H */
diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index cd47f88921f5..547cec77ba03 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -21,3 +21,4 @@ obj-$(CONFIG_XFRM_USER_COMPAT) += xfrm_compat.o
 obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
 obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
 obj-$(CONFIG_XFRM_ESPINTCP) += espintcp.o
+obj-$(CONFIG_DEBUG_INFO_BTF) += xfrm_state_bpf.o
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 5cdd3bca3637..62e64fa7ae5c 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4267,6 +4267,8 @@ void __init xfrm_init(void)
 #ifdef CONFIG_XFRM_ESPINTCP
 	espintcp_init();
 #endif
+
+	register_xfrm_state_bpf();
 }
 
 #ifdef CONFIG_AUDITSYSCALL
diff --git a/net/xfrm/xfrm_state_bpf.c b/net/xfrm/xfrm_state_bpf.c
new file mode 100644
index 000000000000..a73a17a6497b
--- /dev/null
+++ b/net/xfrm/xfrm_state_bpf.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Unstable XFRM state BPF helpers.
+ *
+ * Note that it is allowed to break compatibility for these functions since the
+ * interface they are exposed through to BPF programs is explicitly unstable.
+ */
+
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <net/xdp.h>
+#include <net/xfrm.h>
+
+/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
+ *
+ * Members:
+ * @error      - Out parameter, set for any errors encountered
+ *               Values:
+ *                 -EINVAL - netns_id is less than -1
+ *                 -EINVAL - Passed NULL for opts
+ *                 -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
+ *                 -ENONET - No network namespace found for netns_id
+ * @netns_id   - Specify the network namespace for lookup
+ *               Values:
+ *                 BPF_F_CURRENT_NETNS (-1)
+ *                   Use namespace associated with ctx
+ *                 [0, S32_MAX]
+ *                   Network Namespace ID
+ * @mark       - XFRM mark to match on
+ * @daddr      - Destination address to match on
+ * @spi        - Security parameter index to match on
+ * @proto      - L3 protocol to match on
+ * @family     - L3 protocol family to match on
+ */
+struct bpf_xfrm_state_opts {
+	s32 error;
+	s32 netns_id;
+	u32 mark;
+	xfrm_address_t daddr;
+	__be32 spi;
+	u8 proto;
+	u16 family;
+};
+
+enum {
+	BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
+};
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in xfrm_state BTF");
+
+/* bpf_xdp_get_xfrm_state - Get XFRM state
+ *
+ * Parameters:
+ * @ctx        - Pointer to ctx (xdp_md) in XDP program
+ *                 Cannot be NULL
+ * @opts       - Options for lookup (documented above)
+ *                 Cannot be NULL
+ * @opts__sz   - Length of the bpf_xfrm_state_opts structure
+ *                 Must be BPF_XFRM_STATE_OPTS_SZ
+ */
+__bpf_kfunc struct xfrm_state *
+bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
+{
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct net *net = dev_net(xdp->rxq->dev);
+
+	if (!opts || opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
+		opts->error = -EINVAL;
+		return NULL;
+	}
+
+	if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
+		opts->error = -EINVAL;
+		return NULL;
+	}
+
+	if (opts->netns_id >= 0) {
+		net = get_net_ns_by_id(net, opts->netns_id);
+		if (unlikely(!net)) {
+			opts->error = -ENONET;
+			return NULL;
+		}
+	}
+
+	return xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
+				 opts->proto, opts->family);
+}
+
+__diag_pop()
+
+BTF_SET8_START(xfrm_state_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_get_xfrm_state, KF_RET_NULL)
+BTF_SET8_END(xfrm_state_kfunc_set)
+
+static const struct btf_kfunc_id_set xfrm_state_xdp_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &xfrm_state_kfunc_set,
+};
+
+int __init register_xfrm_state_bpf(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
+					 &xfrm_state_xdp_kfunc_set);
+}
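
One detail in the argument checks of the patch is worth spelling out: the struct comment lists -EINVAL for a NULL opts, yet the first branch stores to opts->error even when `!opts` is what triggered it. The following is a userspace sketch of an ordering that avoids the store, using a hypothetical trimmed-down `toy_opts` struct and made-up names. It is an assumption about how the checks could be arranged, not the code from any later revision of this series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down options struct; not the layout from the patch. */
struct toy_opts {
	int32_t error;
	int32_t netns_id;
};

#define TOY_OPTS_SZ		((uint32_t)sizeof(struct toy_opts))
#define TOY_CURRENT_NETNS	(-1)

/* Returns 0 if opts may be used for a lookup, -1 otherwise. opts->error is
 * written only once opts is known to be non-NULL and large enough to hold
 * the error field. */
static int toy_validate_opts(struct toy_opts *opts, uint32_t opts_sz)
{
	if (!opts || opts_sz < sizeof(opts->error))
		return -1;	/* nowhere safe to report the error */

	if (opts_sz != TOY_OPTS_SZ || opts->netns_id < TOY_CURRENT_NETNS) {
		opts->error = -EINVAL;
		return -1;
	}

	return 0;
}
```

With this ordering a NULL opts is rejected without being dereferenced, while a bad size or a netns_id below -1 is still reported through the error field.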