Message ID: cover.1687819413.git.dxu@dxuuu.xyz
Series: Support defragmenting IPv(4|6) packets in BPF
Message
Daniel Xu
June 26, 2023, 11:02 p.m. UTC
=== Context ===

In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment, which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which is very nice:

1. Enforce policy on the first fragment and accept all subsequent
   fragments. This works but may let in certain attacks or allow data
   exfiltration.

2. Enforce policy on the first fragment and drop all subsequent
   fragments. This does not really work because some protocols may rely
   on fragmentation. For example, DNS may rely on oversized UDP packets
   for large responses.

So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:

    Middleboxes [...] should process IP fragments in a manner that is
    consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
    must maintain state in order to achieve this goal.

=== BPF related bits ===

Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before the kernel's reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into the existing
netfilter reassembly infra.

The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.

=== Patchset details ===

There was an earlier attempt at providing defrag via kfuncs [1]. The
feedback was that we could end up doing too much stuff in prog
execution context (like sending ICMP error replies). However, I think
there is still some outstanding discussion w.r.t. performance when it
comes to netfilter vs the previous approach. I'll schedule some time
during office hours for this.

Patches 1 & 2 are stolen from Florian. Hopefully he doesn't mind. There
were some outstanding comments on the v2 [2], but it doesn't look like
a v3 was ever submitted. I've addressed the comments and put them in
this patchset because I needed them.

Finally, the new selftest seems to be a little flaky. I'm not quite
sure why the server will fail to `recvfrom()` occasionally. I'm fairly
sure it's a timing-related issue with creating veths. I'll keep
debugging, but I didn't want that to hold up discussion on this
patchset.
[0]: https://datatracker.ietf.org/doc/html/rfc8900
[1]: https://lore.kernel.org/bpf/cover.1677526810.git.dxu@dxuuu.xyz/
[2]: https://lore.kernel.org/bpf/20230525110100.8212-1-fw@strlen.de/

Daniel Xu (7):
  tools: libbpf: add netfilter link attach helper
  selftests/bpf: Add bpf_program__attach_netfilter helper test
  netfilter: defrag: Add glue hooks for enabling/disabling defrag
  netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
  bpf: selftests: Support not connecting client socket
  bpf: selftests: Support custom type and proto for client sockets
  bpf: selftests: Add defrag selftests

 include/linux/netfilter.h                     |  12 +
 include/uapi/linux/bpf.h                      |   5 +
 net/ipv4/netfilter/nf_defrag_ipv4.c           |   8 +
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c     |  10 +
 net/netfilter/core.c                          |   6 +
 net/netfilter/nf_bpf_link.c                   | 108 ++++++-
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/lib/bpf/bpf.c                           |   8 +
 tools/lib/bpf/bpf.h                           |   6 +
 tools/lib/bpf/libbpf.c                        |  47 +++
 tools/lib/bpf/libbpf.h                        |  15 +
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/testing/selftests/bpf/Makefile          |   4 +-
 .../selftests/bpf/generate_udp_fragments.py   |  90 ++++++
 .../selftests/bpf/ip_check_defrag_frags.h     |  57 ++++
 tools/testing/selftests/bpf/network_helpers.c |  26 +-
 tools/testing/selftests/bpf/network_helpers.h |   3 +
 .../bpf/prog_tests/ip_check_defrag.c          | 282 ++++++++++++++++++
 .../bpf/prog_tests/netfilter_basic.c          |  78 +++++
 .../selftests/bpf/progs/ip_check_defrag.c     | 104 +++++++
 .../bpf/progs/test_netfilter_link_attach.c    |  14 +
 21 files changed, 868 insertions(+), 21 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
 create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/netfilter_basic.c
 create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_netfilter_link_attach.c
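As a concrete sketch of the intended usage: the helper name comes from
patch 1 ("tools: libbpf: add netfilter link attach helper") and the flag
from patch 4, but the exact option-struct layout below is an assumption
based on the patch titles, not final UAPI.

```c
#include <bpf/libbpf.h>
#include <linux/netfilter.h>	/* NFPROTO_IPV4, NF_INET_PRE_ROUTING */

/* Sketch: attach a BPF_PROG_TYPE_NETFILTER program at PRE_ROUTING and
 * request kernel defrag, so the prog only ever sees reassembled
 * packets. Option fields mirror the series' libbpf helper and are
 * illustrative only.
 */
static struct bpf_link *attach_with_defrag(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_netfilter_opts, opts,
		.pf = NFPROTO_IPV4,
		.hooknum = NF_INET_PRE_ROUTING,
		.priority = 100,
		.flags = BPF_F_NETFILTER_IP_DEFRAG,
	);

	/* NULL (with errno set) on failure; caller owns the link. */
	return bpf_program__attach_netfilter(prog, &opts);
}
```

The program itself needs no fragment-handling logic: by the time it
runs, the defrag hooks have already reassembled the packet.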
Comments
Daniel Xu <dxu@dxuuu.xyz> wrote:
> Patches 1 & 2 are stolen from Florian. Hopefully he doesn't mind. There
> were some outstanding comments on the v2 [2], but it doesn't look like
> a v3 was ever submitted. I've addressed the comments and put them in
> this patchset because I needed them.

I did not submit a v3 because I had to wait for the bpf -> bpf-next
merge to get "bpf: netfilter: Add BPF_NETFILTER bpf_attach_type".

Now that has been done, so I will do v3 shortly.
Hi Florian,

On Tue, Jun 27, 2023 at 12:48:20PM +0200, Florian Westphal wrote:
> Daniel Xu <dxu@dxuuu.xyz> wrote:
> > Patches 1 & 2 are stolen from Florian. Hopefully he doesn't mind.
> > There were some outstanding comments on the v2 [2], but it doesn't
> > look like a v3 was ever submitted. I've addressed the comments and
> > put them in this patchset because I needed them.
>
> I did not submit a v3 because I had to wait for the bpf -> bpf-next
> merge to get "bpf: netfilter: Add BPF_NETFILTER bpf_attach_type".
>
> Now that has been done, so I will do v3 shortly.

Ack. Will wait for your patches to go in before sending my v2.

Thanks,
Daniel
> The basic idea is we bump a refcnt on the netfilter defrag module and
> then run the bpf prog after the defrag module runs. This allows bpf
> progs to transparently see full, reassembled packets. The nice thing
> about this is that progs don't have to carry around logic to detect
> fragments.

One high-level comment after glancing through the series: instead of
allocating a flag specifically for the defrag module, why not support
loading (and holding) arbitrary netfilter modules in the UAPI? If we
need to allocate a new flag every time someone wants to use a netfilter
module along with BPF, we'll run out of flags pretty quickly :)

-Toke
Hi Toke,

Thanks for taking a look at the patchset.

On Tue, Jun 27, 2023 at 04:25:13PM +0200, Toke Høiland-Jørgensen wrote:
> > The basic idea is we bump a refcnt on the netfilter defrag module and
> > then run the bpf prog after the defrag module runs. This allows bpf
> > progs to transparently see full, reassembled packets. The nice thing
> > about this is that progs don't have to carry around logic to detect
> > fragments.
>
> One high-level comment after glancing through the series: instead of
> allocating a flag specifically for the defrag module, why not support
> loading (and holding) arbitrary netfilter modules in the UAPI? If we
> need to allocate a new flag every time someone wants to use a netfilter
> module along with BPF, we'll run out of flags pretty quickly :)

I don't have enough context on netfilter in general to say if it'd be
generically useful -- perhaps Florian can comment on that.

However, I'm not sure such a mechanism removes the need for a flag. The
netfilter defrag modules still need to be called into to bump the
refcnt. The module could export some kfuncs to inc/dec the refcnt, but
it'd be rather odd for prog code to think about the lifetime of the
attachment (as inc/dec for _each_ prog execution seems wasteful and
slow). AFAIK all the other resource acquire/release APIs are for a
single prog execution. So a flag for link attach feels the most natural
to me.

We could always add a flag2 field or something, right?

[...]

Thanks,
Daniel
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > The basic idea is we bump a refcnt on the netfilter defrag module and
> > then run the bpf prog after the defrag module runs. This allows bpf
> > progs to transparently see full, reassembled packets. The nice thing
> > about this is that progs don't have to carry around logic to detect
> > fragments.
>
> One high-level comment after glancing through the series: instead of
> allocating a flag specifically for the defrag module, why not support
> loading (and holding) arbitrary netfilter modules in the UAPI?

How would that work, and what would it look like?

defrag (and conntrack) need special handling because loading these
modules has no effect on the datapath.

Traditionally, yes, loading was enough, but now with netns being
ubiquitous we don't want these to get enabled unless needed.

Ignoring bpf, this happens when the user adds nftables/iptables rules
that check for conntrack state, use some form of NAT, or use e.g.
tproxy.

For bpf, a flag during link attachment seemed like the best way to go.
At the moment I only see two flags for this, namely "need defrag" and
"need conntrack".

For conntrack, we MIGHT be able to not need a flag; maybe the verifier
could "guess" based on the kfuncs used. But for defrag, I don't think
it's good to add a dummy do-nothing kfunc just for expressing the
dependency on the bpf prog side.
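To make the two-flag shape concrete, a rough sketch of the attach-flag
bits being discussed; the defrag flag name is taken from patch 4's
title, while the conntrack flag is purely hypothetical:

```c
/* Netfilter bpf_link attach flags as sketched in this thread.
 * BPF_F_NETFILTER_IP_DEFRAG is from patch 4; BPF_F_NETFILTER_CT is a
 * hypothetical name for the "need conntrack" case and does not exist.
 */
enum {
	BPF_F_NETFILTER_IP_DEFRAG = (1U << 0),	/* enable + pin defrag */
	BPF_F_NETFILTER_CT        = (1U << 1),	/* hypothetical */
};
```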
Florian Westphal <fw@strlen.de> writes:
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> > The basic idea is we bump a refcnt on the netfilter defrag module
>> > and then run the bpf prog after the defrag module runs. This allows
>> > bpf progs to transparently see full, reassembled packets. The nice
>> > thing about this is that progs don't have to carry around logic to
>> > detect fragments.
>>
>> One high-level comment after glancing through the series: instead of
>> allocating a flag specifically for the defrag module, why not support
>> loading (and holding) arbitrary netfilter modules in the UAPI?
>
> How would that work, and what would it look like?
>
> defrag (and conntrack) need special handling because loading these
> modules has no effect on the datapath.
>
> Traditionally, yes, loading was enough, but now with netns being
> ubiquitous we don't want these to get enabled unless needed.
>
> Ignoring bpf, this happens when the user adds nftables/iptables rules
> that check for conntrack state, use some form of NAT, or use e.g.
> tproxy.
>
> For bpf, a flag during link attachment seemed like the best way
> to go.

Right, I wasn't disputing that having a flag to load a module was a
good idea. On the contrary, I was thinking we'd need many more of these
if/when BPF wants to take advantage of more netfilter code. Say, if a
BPF module wants to call into TPROXY: that module would also need to be
loaded and kept around, no?

I was thinking something along the lines of just having a field
'netfilter_modules[]' where userspace could put an arbitrary number of
module names into, and we'd load all of them and put a ref into the
bpf_link. In principle, we could just have that be a string array of
module names, but that's probably a bit cumbersome (and, well, building
a generic module loader interface into the bpf_link API is not
desirable either). But maybe with an explicit enum?

> At the moment I only see two flags for this, namely
> "need defrag" and "need conntrack".
>
> For conntrack, we MIGHT be able to not need a flag; maybe the verifier
> could "guess" based on the kfuncs used.

If the verifier can just identify the modules from the kfuncs and do
the whole thing automatically, that would of course be even better from
an ease-of-use PoV. Not sure what that would take, though? I seem to
recall having discussions along these lines before that fell down on
various points.

> But for defrag, I don't think it's good to add a dummy do-nothing
> kfunc just for expressing the dependency on the bpf prog side.

Agreed.

-Toke
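For contrast with the per-module flags above, the 'netfilter_modules[]'
idea might look roughly like the following; every identifier here is
hypothetical and only illustrates the proposed shape of such an API:

```c
#include <linux/types.h>

/* Hypothetical sketch of the "arbitrary module list" suggestion: an
 * enum-keyed array supplied at link creation, with the bpf_link
 * holding a reference on each listed module. None of these names
 * exist upstream.
 */
enum bpf_nf_module {
	BPF_NF_MODULE_DEFRAG_IPV4,
	BPF_NF_MODULE_DEFRAG_IPV6,
	BPF_NF_MODULE_CONNTRACK,
};

struct bpf_nf_link_modules {
	__u32 nr_modules;
	__u32 modules[8];	/* values from enum bpf_nf_module */
};
```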
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Florian Westphal <fw@strlen.de> writes:
> > For bpf, a flag during link attachment seemed like the best way
> > to go.
>
> Right, I wasn't disputing that having a flag to load a module was a
> good idea. On the contrary, I was thinking we'd need many more of
> these if/when BPF wants to take advantage of more netfilter code.
> Say, if a BPF module wants to call into TPROXY: that module would
> also need to be loaded and kept around, no?

That seems to be a different topic that has nothing to do with either
bpf_link or netfilter?

If the program calls into, say, TPROXY, then I'd expect that this needs
to be handled via kfuncs, no? Or, if I misunderstand, what do you mean
by "call into TPROXY"?

And if so, that's already handled at bpf_prog load time, not at link
creation time, or do I miss something here?

AFAIU, if a prog uses such kfuncs, the verifier will grab the needed
module ref, and if the module isn't loaded the kfuncs won't be found
and program load fails.

> I was thinking something along the lines of just having a field
> 'netfilter_modules[]' where userspace could put an arbitrary number of
> module names into, and we'd load all of them and put a ref into the
> bpf_link.

Why? I fail to understand the connection between bpf_link, netfilter
and modules. What makes netfilter so special that we need such a module
array, and what does that have to do with the bpf_link interface?

> In principle, we could just have that be a string array of module
> names, but that's probably a bit cumbersome (and, well, building a
> generic module loader interface into the bpf_link API is not
> desirable either). But maybe with an explicit enum?

What functionality does that provide? I can't think of a single module
where this functionality is needed.

Either we're talking about future kfuncs -- then, as far as I
understand how kfuncs work, this is handled at bpf_prog load time, not
when the bpf_link is created. Or we are talking about implicit
dependencies, where the program doesn't call function X but needs
functionality handled earlier in the pipeline.

The only two instances I know where this is the case for netfilter are
defrag + conntrack.

> > For conntrack, we MIGHT be able to not need a flag; maybe the
> > verifier could "guess" based on the kfuncs used.
>
> If the verifier can just identify the modules from the kfuncs and do
> the whole thing automatically, that would of course be even better
> from an ease-of-use PoV. Not sure what that would take, though? I seem
> to recall having discussions along these lines before that fell down
> on various points.

AFAICS the conntrack kfuncs are wired to nf_conntrack already, so I
would expect that the module has to be loaded already for the verifier
to accept the program.

Those kfuncs are not yet exposed to NETFILTER program types. Once they
are, all that would be needed is for the netfilter bpf_link to be able
to detect that the prog is calling into those kfuncs, and then make the
needed register/unregister calls to enable the conntrack hooks.

Whether that's better than using an explicit "please turn on conntrack
for me", I don't know. Perhaps future bpf programs could access
skb->_nfct directly without kfuncs, so I'd say the flag is a better
approach from a uapi point of view.
Florian Westphal <fw@strlen.de> writes:
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> Florian Westphal <fw@strlen.de> writes:
>> > For bpf, a flag during link attachment seemed like the best way
>> > to go.
>>
>> Right, I wasn't disputing that having a flag to load a module was a
>> good idea. On the contrary, I was thinking we'd need many more of
>> these if/when BPF wants to take advantage of more netfilter code.
>> Say, if a BPF module wants to call into TPROXY: that module would
>> also need to be loaded and kept around, no?
>
> That seems to be a different topic that has nothing to do with
> either bpf_link or netfilter?
>
> If the program calls into, say, TPROXY, then I'd expect that this
> needs to be handled via kfuncs, no? Or, if I misunderstand, what do
> you mean by "call into TPROXY"?
>
> And if so, that's already handled at bpf_prog load time, not
> at link creation time, or do I miss something here?
>
> AFAIU, if a prog uses such kfuncs, the verifier will grab the needed
> module ref, and if the module isn't loaded the kfuncs won't be found
> and program load fails.

...

> Or we are talking about implicit dependencies, where the program
> doesn't call function X but needs functionality handled earlier in
> the pipeline.
>
> The only two instances I know where this is the case for netfilter
> are defrag + conntrack.

Well, I was kinda mixing the two cases above, sorry about that. The
"kfuncs locking the module" part was not present in my mind when
starting to talk about that bit...

As for the original question, that's answered by your point above: if
those two modules are the only ones that are likely to need this, then
a flag for each is fine by me -- that was the key piece I was missing
(I'm not a netfilter expert, as you well know).

Thanks for clarifying, and apologies for the muddled thinking! :)

-Toke
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Florian Westphal <fw@strlen.de> writes:
> As for the original question, that's answered by your point above: if
> those two modules are the only ones that are likely to need this, then
> a flag for each is fine by me -- that was the key piece I was missing
> (I'm not a netfilter expert, as you well know).

No problem, I was worried I was missing an important piece of kfunc
plumbing :-)

You do raise a good point though. With kfuncs, the module is pinned.
So, should a "please turn on defrag for this bpf_link" pin the defrag
modules too?

For plain netfilter we don't do that, i.e. you can just do
"rmmod nf_defrag_ipv4". But I suspect that for the new bpf-link defrag
we probably should grab a reference to prevent unwanted functionality
breakage of the bpf prog.
On Thu, Jun 29, 2023 at 04:53:15PM +0200, Florian Westphal wrote:
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > Florian Westphal <fw@strlen.de> writes:
> > As for the original question, that's answered by your point above:
> > if those two modules are the only ones that are likely to need this,
> > then a flag for each is fine by me -- that was the key piece I was
> > missing (I'm not a netfilter expert, as you well know).
>
> No problem, I was worried I was missing an important piece of kfunc
> plumbing :-)
>
> You do raise a good point though. With kfuncs, the module is pinned.
> So, should a "please turn on defrag for this bpf_link" pin the defrag
> modules too?
>
> For plain netfilter we don't do that, i.e. you can just do
> "rmmod nf_defrag_ipv4". But I suspect that for the new bpf-link defrag
> we probably should grab a reference to prevent unwanted functionality
> breakage of the bpf prog.

Ack. Will add to v3.

Thanks,
Daniel
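A rough sketch of the module pinning agreed on above, modeled on the
enable/disable glue hooks from patch 3; the hook and field names here
(nf_defrag_v4_hook, .owner, .enable) are assumptions, not the final v3
code:

```c
#include <linux/module.h>
#include <linux/rcupdate.h>

/* Sketch: when a link is created with the defrag flag, pin the defrag
 * module so "rmmod nf_defrag_ipv4" cannot break a running prog. The
 * hook object is assumed to be published by the defrag module via
 * RCU, carrying its module owner and a per-netns enable callback.
 */
static int bpf_nf_enable_defrag(struct net *net)
{
	const struct nf_defrag_hook *hook;
	int err;

	rcu_read_lock();
	hook = rcu_dereference(nf_defrag_v4_hook);
	if (!hook || !try_module_get(hook->owner)) {
		rcu_read_unlock();
		return -EOPNOTSUPP;	/* module not loaded */
	}
	rcu_read_unlock();

	err = hook->enable(net);	/* bump per-netns defrag users */
	if (err)
		module_put(hook->owner);
	return err;
}
```

On link release, the inverse path would call the module's disable hook
and drop the reference with module_put().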