Message ID | cdbbc9df16044b568448ed9cd828d406f0851bfb.1690255889.git.yan@cloudflare.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp2253100vqg; Mon, 24 Jul 2023 22:40:58 -0700 (PDT) X-Google-Smtp-Source: APBJJlGZ4WcfmC413k3k2T76hKbSq+5aVbWzkQdvKplfebGpbZj3bTfr/yRbBf+ce+6amIT0G38l X-Received: by 2002:a17:90b:3907:b0:263:5c6a:1956 with SMTP id ob7-20020a17090b390700b002635c6a1956mr8246042pjb.25.1690263658449; Mon, 24 Jul 2023 22:40:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690263658; cv=none; d=google.com; s=arc-20160816; b=Lm2SLuzfl3HdmLy9IhYRseE5b92J3UH4Teof5H/6nkwyX7OS7Iw/vAK4VoSMZptY0b oCb657IUpU9yfCzIem9F9qizXBh6ajhgY7tnceecQdQhO9x092zGAFft429YHpRF7ZpG LBfYNq1kd2gUotWQyBFHzvOZfEIkQGoIgNmBF1+EtK3MFwZ9KaRnd9kIW6NqsffzpY/m Yl2lKmrtuEw2BvnAKCT1XIstVStlRsKhRnGVdNd2o7HIedERZ1/jTmdylSfcp2ZWeTgl g+WNC9EHA9lwzzS/wt6J16JNaf6GCoulsTvUjpF89eY/+LMJuc3O2udMf1tdMkhjIRuA V/jw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=dExAqqjX2sAceHiZkTjWc7ytvlJ4THivNCeY471el6A=; fh=HzHPdbt9AGkswnXXoAPFoSS2pL3/HPGqiE7o+b7+qyY=; b=bw6B9igm+UUGCDvfcByP94La0d6czfYtztV+5wqS4Z45x0fjnlnFzywk6VaDroDxNw 31sDKUIhFc8oWD7nA6sE7S12KZSdLAXwYLROeCn6vCWLxTs5u5Ze3PU8+xDtjmaplhUH +YidSRojVviBmpEFTsPEznC0bXC0y9xvIP5zabiNinRZpLDfYQjF3qrp/sTi3EHWuxzb fkGJPeZYiOZ84qbRziddNkrvm6tlk0vdGvvQWuWCLSojSlmUAFnJQBhy/2JH5AzmmWrv XgLYxTkVFRylDnCgCaFMwlLS811g/nuvxUtFXW0lUf8aidDAgpwuElP8vxdgA5iM3Tho uJsg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=a7JHbgZ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k191-20020a6384c8000000b0055c87771a7asi10803926pgd.149.2023.07.24.22.40.45; Mon, 24 Jul 2023 22:40:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=a7JHbgZ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230438AbjGYENR (ORCPT <rfc822;kautuk.consul.80@gmail.com> + 99 others); Tue, 25 Jul 2023 00:13:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230207AbjGYENQ (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Jul 2023 00:13:16 -0400 Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com [IPv6:2607:f8b0:4864:20::f30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3E9BE66 for <linux-kernel@vger.kernel.org>; Mon, 24 Jul 2023 21:13:14 -0700 (PDT) Received: by mail-qv1-xf30.google.com with SMTP id 6a1803df08f44-63cf40716ffso19475016d6.2 for <linux-kernel@vger.kernel.org>; Mon, 24 Jul 2023 21:13:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1690258394; x=1690863194; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=dExAqqjX2sAceHiZkTjWc7ytvlJ4THivNCeY471el6A=; b=a7JHbgZ44jJD9d5BBzwGMElBDFTGmE6VPiSm37KRstnL6yyrYlU04Jyxj8DJlTLI0k 1GmawVpW5PeKff+kV0qUwqnxLNbVXOXAJSJ/D7TPQnkWacQb7Z1ntVAdH+WW/s/Wxgkv Gz1dvEr/HcdiSu3pizPxhZC03nzUjhgq5jOHg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690258394; x=1690863194; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=dExAqqjX2sAceHiZkTjWc7ytvlJ4THivNCeY471el6A=; b=i6Rmm9OJ+z76IWCkzWb/0Bvu2r0sA+P8fCBdDcLDrGrfOZ7URBncmyZgK3KqDDzIEV lq9SyJf9bAwXqOvm1c9IEfs8J8N83oCHRZ4aRX9kUELBLBmmSEkONY/o3LDjm2PF9Y9a x3E8gA6kNIGvWh2UfY1T7ZLLxxmnmLAQVZmA42abu6EHIqjB5MyyiFL8oBqAx32OSlam 3vq5+lMLIEvAUM6OILy8mnFgE4ssR/Lpn0QTW+SIOI0Pj36gOR40s60OTdm6u6S8W/9m FeG3O1ojAp5ZYW6E5h87IwNrIBbmGzwugt/AxLs1DFl+THOMz0RAwy4RMgeZCv7i+udT Z2FQ== X-Gm-Message-State: ABy/qLYwVhtGGh7jXS+UfOjGY8Ua6zoDg2UqS87KMA8a+jc7r2Hp2YVA E5747ubl82GSb8jxy7WUpH39kQ== X-Received: by 2002:a05:6214:184d:b0:63d:a05:256a with SMTP id d13-20020a056214184d00b0063d0a05256amr1596951qvy.8.1690258393862; Mon, 24 Jul 2023 21:13:13 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id j8-20020a0cf308000000b0063cdbe73a05sm833186qvl.97.2023.07.24.21.13.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 24 Jul 2023 21:13:13 -0700 (PDT) Date: Mon, 24 Jul 2023 21:13:10 -0700 From: Yan Zhai <yan@cloudflare.com> To: bpf@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>, Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>, Yan Zhai <yan@cloudflare.com>, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-team@cloudflare.com, Jordan Griege <jgriege@cloudflare.com> Subject: [PATCH v3 bpf 1/2] bpf: fix skb_do_redirect return values Message-ID: <cdbbc9df16044b568448ed9cd828d406f0851bfb.1690255889.git.yan@cloudflare.com> References: <cover.1690255889.git.yan@cloudflare.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <cover.1690255889.git.yan@cloudflare.com> X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772369905963881324 X-GMAIL-MSGID: 1772369905963881324 |
Series |
bpf: return proper error codes for lwt redirect
|
|
Commit Message
Yan Zhai
July 25, 2023, 4:13 a.m. UTC
skb_do_redirect returns various of values: error code (negative), 0
(success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP.
Such code are not handled at lwt xmit hook in function ip_finish_output2
and ip6_finish_output, which can cause unexpected problems. This change
converts the positive status code to proper error code.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
---
v3: converts also RX side return value in addition to TX values
v2: code style change suggested by Stanislav Fomichev
---
net/core/filter.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
Comments
On Mon, Jul 24, 2023 at 09:13 PM -07, Yan Zhai wrote: > skb_do_redirect returns various of values: error code (negative), 0 > (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP. > Such code are not handled at lwt xmit hook in function ip_finish_output2 > and ip6_finish_output, which can cause unexpected problems. This change > converts the positive status code to proper error code. > > Suggested-by: Stanislav Fomichev <sdf@google.com> > Reported-by: Jordan Griege <jgriege@cloudflare.com> > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > --- > v3: converts also RX side return value in addition to TX values > v2: code style change suggested by Stanislav Fomichev > --- > net/core/filter.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 06ba0e56e369..3e232ce11ca0 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = { > > static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb) > { > - return dev_forward_skb_nomtu(dev, skb); > + int ret = dev_forward_skb_nomtu(dev, skb); > + > + if (unlikely(ret > 0)) > + return -ENETDOWN; > + > + return 0; > } > > static inline int __bpf_rx_skb_no_mac(struct net_device *dev, > @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev, > if (likely(!ret)) { > skb->dev = dev; > ret = netif_rx(skb); > + } else if (ret > 0) { > + return -ENETDOWN; > } > > return ret; > @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) > ret = dev_queue_xmit(skb); > dev_xmit_recursion_dec(); > > + if (unlikely(ret > 0)) > + ret = net_xmit_errno(ret); > + > return ret; > } net_xmit_errno maps NET_XMIT_DROP to -ENOBUFS. It would make sense to me to map NET_RX_DROP to -ENOBUFS as well, instead of -ENETDOWN, to be consistent. It looks like the Fixes tag for this should point to the change that introduced BPF for LWT: Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
On Tue, Jul 25, 2023 at 12:11 AM Markus Elfring <Markus.Elfring@web.de> wrote: > > > … unexpected problems. This change > > converts the positive status code to proper error code. > > Please choose a corresponding imperative change suggestion. > > See also: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n94 > > > Did you provide sufficient justification for a possible addition of the tag “Fixes”? > > > … > > v2: code style change suggested by Stanislav Fomichev > > --- > > net/core/filter.c | 12 +++++++++++- > … > > How do you think about to replace this marker by a line break? > > See also: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n711 > > Regards, > Markus Hi Markus, Thanks for the suggestions, those are what I could use more help with. Will address these in the next version. Yan
On Tue, Jul 25, 2023 at 4:14 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > On Mon, Jul 24, 2023 at 09:13 PM -07, Yan Zhai wrote: > > skb_do_redirect returns various of values: error code (negative), 0 > > (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP. > > Such code are not handled at lwt xmit hook in function ip_finish_output2 > > and ip6_finish_output, which can cause unexpected problems. This change > > converts the positive status code to proper error code. > > > > Suggested-by: Stanislav Fomichev <sdf@google.com> > > Reported-by: Jordan Griege <jgriege@cloudflare.com> > > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > > > --- > > v3: converts also RX side return value in addition to TX values > > v2: code style change suggested by Stanislav Fomichev > > --- > > net/core/filter.c | 12 +++++++++++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 06ba0e56e369..3e232ce11ca0 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = { > > > > static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb) > > { > > - return dev_forward_skb_nomtu(dev, skb); > > + int ret = dev_forward_skb_nomtu(dev, skb); > > + > > + if (unlikely(ret > 0)) > > + return -ENETDOWN; > > + > > + return 0; > > } > > > > static inline int __bpf_rx_skb_no_mac(struct net_device *dev, > > @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev, > > if (likely(!ret)) { > > skb->dev = dev; > > ret = netif_rx(skb); > > + } else if (ret > 0) { > > + return -ENETDOWN; > > } > > > > return ret; > > @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) > > ret = dev_queue_xmit(skb); > > dev_xmit_recursion_dec(); > > > > + if (unlikely(ret > 0)) > > + ret = net_xmit_errno(ret); > > + > > return ret; > > } > > net_xmit_errno maps NET_XMIT_DROP to -ENOBUFS. It would make sense to me > to map NET_RX_DROP to -ENOBUFS as well, instead of -ENETDOWN, to be > consistent. > In fact I looked at all those errno, but found none actually covers this situation completely. For the redirect ingress case, there are three reasons to fail: backlog full, dev down, and MTU issue. This won't be a problem for typical RX paths, since the error code is usually discarded by call sites of deliver_skb. But redirect ingress opens a call chain that would propagate this error to local sendmsg, which may be very confusing to troubleshoot in a complex environment (especially when backlog fills). That said I agree ENOBUF covers the most likely reason to fail (backlog). Let me change to that one in the next version if there are no new suggestions. > It looks like the Fixes tag for this should point to the change that > introduced BPF for LWT: > > Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") > Thanks for finding the tag. I was debating if it should be LWT commit or bpf_redirect commit: the error is not handled at LWT, but it seems actually innocent. The actual fix is the return value from the bpf redirect code. Let me incorporate both in the next one to justify better. -- Yan
On 07/25, Yan Zhai wrote: > On Tue, Jul 25, 2023 at 4:14 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > > > On Mon, Jul 24, 2023 at 09:13 PM -07, Yan Zhai wrote: > > > skb_do_redirect returns various of values: error code (negative), 0 > > > (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP. > > > Such code are not handled at lwt xmit hook in function ip_finish_output2 > > > and ip6_finish_output, which can cause unexpected problems. This change > > > converts the positive status code to proper error code. > > > > > > Suggested-by: Stanislav Fomichev <sdf@google.com> > > > Reported-by: Jordan Griege <jgriege@cloudflare.com> > > > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > > > > > --- > > > v3: converts also RX side return value in addition to TX values > > > v2: code style change suggested by Stanislav Fomichev > > > --- > > > net/core/filter.c | 12 +++++++++++- > > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > index 06ba0e56e369..3e232ce11ca0 100644 > > > --- a/net/core/filter.c > > > +++ b/net/core/filter.c > > > @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = { > > > > > > static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb) > > > { > > > - return dev_forward_skb_nomtu(dev, skb); > > > + int ret = dev_forward_skb_nomtu(dev, skb); > > > + > > > + if (unlikely(ret > 0)) > > > + return -ENETDOWN; > > > + > > > + return 0; > > > } > > > > > > static inline int __bpf_rx_skb_no_mac(struct net_device *dev, > > > @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev, > > > if (likely(!ret)) { > > > skb->dev = dev; > > > ret = netif_rx(skb); > > > + } else if (ret > 0) { > > > + return -ENETDOWN; > > > } > > > > > > return ret; > > > @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) > > > ret = dev_queue_xmit(skb); > > > dev_xmit_recursion_dec(); > > > > > > + if (unlikely(ret > 0)) > > > + ret = net_xmit_errno(ret); > > > + > > > return ret; > > > } > > > > net_xmit_errno maps NET_XMIT_DROP to -ENOBUFS. It would make sense to me > > to map NET_RX_DROP to -ENOBUFS as well, instead of -ENETDOWN, to be > > consistent. > > > In fact I looked at all those errno, but found none actually covers > this situation completely. For the redirect ingress case, there are > three reasons to fail: backlog full, dev down, and MTU issue. This > won't be a problem for typical RX paths, since the error code is > usually discarded by call sites of deliver_skb. But redirect ingress > opens a call chain that would propagate this error to local sendmsg, > which may be very confusing to troubleshoot in a complex environment > (especially when backlog fills). > > That said I agree ENOBUF covers the most likely reason to fail > (backlog). Let me change to that one in the next version if there are > no new suggestions. nit: also maybe wrap these rx paths into some new net_rx_errno ? To mirror the tx side. > > It looks like the Fixes tag for this should point to the change that > > introduced BPF for LWT: > > > > Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") > > > Thanks for finding the tag. I was debating if it should be LWT commit > or bpf_redirect commit: the error is not handled at LWT, but it seems > actually innocent. The actual fix is the return value from the bpf > redirect code. Let me incorporate both in the next one to justify > better. > > -- > Yan
diff --git a/net/core/filter.c b/net/core/filter.c index 06ba0e56e369..3e232ce11ca0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = { static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb) { - return dev_forward_skb_nomtu(dev, skb); + int ret = dev_forward_skb_nomtu(dev, skb); + + if (unlikely(ret > 0)) + return -ENETDOWN; + + return 0; } static inline int __bpf_rx_skb_no_mac(struct net_device *dev, @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev, if (likely(!ret)) { skb->dev = dev; ret = netif_rx(skb); + } else if (ret > 0) { + return -ENETDOWN; } return ret; @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) ret = dev_queue_xmit(skb); dev_xmit_recursion_dec(); + if (unlikely(ret > 0)) + ret = net_xmit_errno(ret); + return ret; }