From patchwork Wed Dec 13 11:45:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 177929 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:3b04:b0:fb:cd0c:d3e with SMTP id c4csp7712597dys; Wed, 13 Dec 2023 03:46:15 -0800 (PST) X-Google-Smtp-Source: AGHT+IECND1v7265HGDqImNzd/4C9IF3qKuJ5OLs7oUhzJGQpShDJgYXTXwMxCUTLquCgAKj8IT+ X-Received: by 2002:a92:ca0c:0:b0:35d:a6af:5fc7 with SMTP id j12-20020a92ca0c000000b0035da6af5fc7mr10970051ils.53.1702467974735; Wed, 13 Dec 2023 03:46:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702467974; cv=none; d=google.com; s=arc-20160816; b=KDPSmQvXUdJtCNedjLj6LKxG7dwmgFH5wBWfau5oq/q/+qSaVFjWhMGermWs/5zlV6 iipK5nvR3UxLgQK3jMItvDfw4g0GlHIVnM5rB+C7KfLYNNvISXY6VhGvtK4iksALP20N B9ZqNBe5jLjJnN6SFuHMolFNJ0zqzWs7qWw5zRTOFYUGy1f50u+8LoPtbKkrvkIxD7MW zI1i8wgdyhfb5QwxA9MDJm00+PAoGgNr3/YM4Z5+nGQkrqaOL0NHknIuP2YVvg9jWEbS ysMk4k2cRp2Yh7pn2DBd0Jw2hGz3JfLj/NHGlLXS/vWyZ7Tv6q4DNfnLyenNs4aVGvh6 iCXg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from; bh=6Mwbaqqz+kT5qLL8LTKLmmm6pkVjdGNnVmaTSvOawtE=; fh=DtebtD3C5NalZZ7aacdAi1fxTfGxukEt+FlPM6A5a6s=; b=rA7Yp1CfUQyz6FqYT+h89N/IG59RDyzls5v5N8zQA6Mw+fjbGaXLu/hk6FojcUy3SZ ArBAwVDO10nDHNMI9K7uEjrdJml/xLVn1ksfS4aleXytXQ4y/MTNNSZR9v2AoZZEta7p kvKtsRHyaZu6mz3wcmtSHgaqpwwg/5O59hvNOeD3C+2d8Uii/KbceZxCiT9iwO+2Koah qDRqjrH8lSDugC2kz7wmUU0mH5uF+iAIZlXlxwv9EDgrsmcsEcRKNsPrmQrKpUeESWo0 9YUujNJa/CqMZdRiY67F3tokVaR/Wi8UEvqeeAHuDzEc6EDRltOi7qWFlvSr7U3sbcmy b3dQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id b21-20020a63eb55000000b005c220a94525si9313126pgk.90.2023.12.13.03.46.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Dec 2023 03:46:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id C4D1D80C0389; Wed, 13 Dec 2023 03:46:13 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378115AbjLMLqD (ORCPT + 99 others); Wed, 13 Dec 2023 06:46:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235172AbjLMLp7 (ORCPT ); Wed, 13 Dec 2023 06:45:59 -0500 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88F2AF5; Wed, 13 Dec 2023 03:46:04 -0800 (PST) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R941e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0VyQzxuB_1702467960; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0VyQzxuB_1702467960) by smtp.aliyun-inc.com; Wed, 13 Dec 2023 19:46:02 +0800 From: "D. Wythe" To: pablo@netfilter.org, kadlec@netfilter.org, fw@strlen.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, coreteam@netfilter.org, netfilter-devel@vger.kernel.org, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, "D. Wythe" Subject: [RFC nf-next 1/2] netfilter: bpf: support prog update Date: Wed, 13 Dec 2023 19:45:44 +0800 Message-Id: <1702467945-38866-2-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1702467945-38866-1-git-send-email-alibuda@linux.alibaba.com> References: <1702467945-38866-1-git-send-email-alibuda@linux.alibaba.com> X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 13 Dec 2023 03:46:13 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785167059360166641 X-GMAIL-MSGID: 1785167059360166641 From: "D. Wythe" To support the prog update, we need to ensure that the prog seen within the hook is always valid. Considering that hooks are always protected by rcu_read_lock(), which provide us the ability to use a new RCU-protected context to access the prog. Signed-off-by: D. Wythe --- net/netfilter/nf_bpf_link.c | 124 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 111 insertions(+), 13 deletions(-) diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c index e502ec0..918c470 100644 --- a/net/netfilter/nf_bpf_link.c +++ b/net/netfilter/nf_bpf_link.c @@ -8,17 +8,11 @@ #include #include -static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, - const struct nf_hook_state *s) +struct bpf_nf_hook_ctx { - const struct bpf_prog *prog = bpf_prog; - struct bpf_nf_ctx ctx = { - .state = s, - .skb = skb, - }; - - return bpf_prog_run(prog, &ctx); -} + struct bpf_prog *prog; + struct rcu_head rcu; +}; struct bpf_nf_link { struct bpf_link link; @@ -26,8 +20,59 @@ struct bpf_nf_link { struct net *net; u32 dead; const struct nf_defrag_hook *defrag_hook; + /* protect link update in parallel */ + struct mutex update_lock; + struct bpf_nf_hook_ctx __rcu *hook_ctx; }; +static struct bpf_nf_hook_ctx * +bpf_nf_hook_ctx_from_prog(struct bpf_prog *prog, gfp_t flags) +{ + struct bpf_nf_hook_ctx *hook_ctx; + + hook_ctx = kmalloc(sizeof(*hook_ctx), flags); + if (hook_ctx) { + hook_ctx->prog = prog; + bpf_prog_inc(prog); + } + return hook_ctx; +} + +static void bpf_nf_hook_ctx_free(struct bpf_nf_hook_ctx *hook_ctx) +{ + if (!hook_ctx) + return; + if (hook_ctx->prog) + bpf_prog_put(hook_ctx->prog); + kfree(hook_ctx); +} + +static void __bpf_nf_hook_ctx_free_rcu(struct rcu_head *head) +{ + struct bpf_nf_hook_ctx *hook_ctx = container_of(head, struct bpf_nf_hook_ctx, rcu); + + bpf_nf_hook_ctx_free(hook_ctx); +} + +static void bpf_nf_hook_ctx_free_rcu(struct bpf_nf_hook_ctx *hook_ctx) +{ + call_rcu(&hook_ctx->rcu, __bpf_nf_hook_ctx_free_rcu); +} + +static unsigned int nf_hook_run_bpf(void *bpf_link, struct sk_buff *skb, + const struct nf_hook_state *s) +{ + const struct bpf_nf_link *link = bpf_link; + struct bpf_nf_hook_ctx *hook_ctx; + struct bpf_nf_ctx ctx = { + .state = s, + .skb = skb, + }; + + hook_ctx = rcu_dereference(link->hook_ctx); + return bpf_prog_run(hook_ctx->prog, &ctx); +} + #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) || IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) static const struct nf_defrag_hook * get_proto_defrag_hook(struct bpf_nf_link *link, @@ -120,6 +165,10 @@ static void bpf_nf_link_release(struct bpf_link *link) if (!cmpxchg(&nf_link->dead, 0, 1)) { nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops); bpf_nf_disable_defrag(nf_link); + /* Wait for outstanding hook to complete before the + * link gets released. + */ + synchronize_rcu(); } } @@ -127,6 +176,7 @@ static void bpf_nf_link_dealloc(struct bpf_link *link) { struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); + bpf_nf_hook_ctx_free(nf_link->hook_ctx); kfree(nf_link); } @@ -162,7 +212,42 @@ static int bpf_nf_link_fill_link_info(const struct bpf_link *link, static int bpf_nf_link_update(struct bpf_link *link, struct bpf_prog *new_prog, struct bpf_prog *old_prog) { - return -EOPNOTSUPP; + struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); + struct bpf_nf_hook_ctx *hook_ctx; + int err = 0; + + mutex_lock(&nf_link->update_lock); + + /* target old_prog mismatch */ + if (old_prog && link->prog != old_prog) { + err = -EPERM; + goto out; + } + + old_prog = link->prog; + if (old_prog == new_prog) { + /* don't need update */ + bpf_prog_put(new_prog); + goto out; + } + + hook_ctx = bpf_nf_hook_ctx_from_prog(new_prog, GFP_USER); + if (!hook_ctx) { + err = -ENOMEM; + goto out; + } + + /* replace and get the old one */ + hook_ctx = rcu_replace_pointer(nf_link->hook_ctx, hook_ctx, + lockdep_is_held(&nf_link->update_lock)); + /* free old hook_ctx */ + bpf_nf_hook_ctx_free_rcu(hook_ctx); + + old_prog = xchg(&link->prog, new_prog); + bpf_prog_put(old_prog); +out: + mutex_unlock(&nf_link->update_lock); + return err; } static const struct bpf_link_ops bpf_nf_link_lops = { @@ -222,11 +307,22 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) if (!link) return -ENOMEM; + link->hook_ctx = bpf_nf_hook_ctx_from_prog(prog, GFP_USER); + if (!link->hook_ctx) { + kfree(link); + return -ENOMEM; + } + bpf_link_init(&link->link, BPF_LINK_TYPE_NETFILTER, &bpf_nf_link_lops, prog); link->hook_ops.hook = nf_hook_run_bpf; link->hook_ops.hook_ops_type = NF_HOOK_OP_BPF; - link->hook_ops.priv = prog; + + /* bpf_nf_link_release() ensures that after its execution, there will be + * no ongoing or upcoming execution of nf_hook_run_bpf() within any context. + * Therefore, within nf_hook_run_bpf(), the link remains valid at all times." + */ + link->hook_ops.priv = link; link->hook_ops.pf = attr->link_create.netfilter.pf; link->hook_ops.priority = attr->link_create.netfilter.priority; @@ -236,9 +332,11 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) link->dead = false; link->defrag_hook = NULL; + mutex_init(&link->update_lock); + err = bpf_link_prime(&link->link, &link_primer); if (err) { - kfree(link); + bpf_nf_link_dealloc(&link->link); return err; }