Message ID | 20230313215553.1045175-1-aleksander.lobakin@intel.com |
---|---|
Headers |
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>, Maciej Fijalkowski <maciej.fijalkowski@intel.com>, Larysa Zaremba <larysa.zaremba@intel.com>, Toke Høiland-Jørgensen <toke@redhat.com>, Song Liu <song@kernel.org>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, Menglong Dong <imagedong@tencent.com>, Mykola Lysenko <mykolal@fb.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Eric Dumazet <edumazet@google.com>, Paolo Abeni <pabeni@redhat.com>, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames
Date: Mon, 13 Mar 2023 22:55:49 +0100
Message-Id: <20230313215553.1045175-1-aleksander.lobakin@intel.com>
X-Mailer: git-send-email 2.39.2 |
Series |
xdp: recycle Page Pool backed skbs built from XDP frames
|
|
Message
Alexander Lobakin
March 13, 2023, 9:55 p.m. UTC
Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.

__xdp_build_skb_from_frame() missed the moment when the networking stack
became able to recycle skb pages backed by a page_pool. This was making
e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
also affected in some scenarios.
A lot of drivers use skb_mark_for_recycle() already, it's been almost
two years, and it seems there are no issues with using it in the generic
code too. {__,}xdp_release_frame() can then be removed, as it has lost
its last user.
Page Pool then becomes zero-alloc (or almost) in the abovementioned
cases, too. Other memory type models (who needs them at this point)
have no changes.

Some numbers on 1 Xeon Platinum core bombed with 27 Mpps of 64-byte
IPv6 UDP, iavf w/ XDP[0] (CONFIG_PAGE_POOL_STATS is enabled):

                                src cpu Rx    drops       dst cpu Rx
Plain %XDP_PASS on baseline,
Page Pool driver                2.1 Mpps      N/A         2.1 Mpps

cpumap redirect (cross-core,
w/o leaving its NUMA node)
on baseline                     6.8 Mpps      5.0 Mpps    1.8 Mpps

cpumap redirect with skb
PP recycling                    7.9 Mpps      5.7 Mpps    2.2 Mpps
                                                          +22% (from cpumap
                                                          redir on baseline)

[0] https://github.com/alobakin/linux/commits/iavf-xdp

Alexander Lobakin (4):
  selftests/bpf: robustify test_xdp_do_redirect with more payload magics
  net: page_pool, skbuff: make skb_mark_for_recycle() always available
  xdp: recycle Page Pool backed skbs built from XDP frames
  xdp: remove unused {__,}xdp_release_frame()

 include/linux/skbuff.h                        |  4 +--
 include/net/xdp.h                             | 29 ---------------
 net/core/xdp.c                                | 19 ++--------
 .../bpf/progs/test_xdp_do_redirect.c          | 36 +++++++++++++------
 4 files changed, 30 insertions(+), 58 deletions(-)

---
From v2[1]:
* fix the test_xdp_do_redirect selftest failing after the series: it was
  relying on the fact that %XDP_PASS frames can't be recycled on veth
  (BPF CI, Alexei);
* explain "w/o leaving its node" in the cover letter (Jesper).

From v1[2]:
* make skb_mark_for_recycle() always available, otherwise there are
  build failures on non-PP systems (kbuild bot);
* 'Page Pool' -> 'page_pool' when it's about a page_pool instance, not
  the API (Jesper);
* expanded test system info a bit in the cover letter (Jesper).

[1] https://lore.kernel.org/bpf/20230303133232.2546004-1-aleksander.lobakin@intel.com
[2] https://lore.kernel.org/bpf/20230301160315.1022488-1-aleksander.lobakin@intel.com
Comments
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Mon, 13 Mar 2023 22:55:49 +0100

> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.
>
> __xdp_build_skb_from_frame() missed the moment when the networking stack
> became able to recycle skb pages backed by a page_pool. This was making
> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
> also affected in some scenarios.

[...]

Regarding the failing tests, here's a piece of the logs:

#288 xdp_devmap_attach:OK
[  156.324473] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
[  156.362859] bond2 (unregistering): Released all slaves
#289 xdp_do_redirect:OK
#290 xdp_info:OK

[...]

#297/1 xfrm_info/xfrm_info:OK
#297 xfrm_info:OK

All error logs:
libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad address
libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
The sequence of 8193 jumps is too complex.
verification time 77808 usec
stack depth 64
processed 156616 insns (limit 1000000) max_states_per_insn 8
total_states 1754 peak_states 1712 mark_read 12
-- END PROG LOAD LOG --
libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
libbpf: failed to load object 'loop6.bpf.o'
scale_test:FAIL:expect_success unexpected error: -14 (errno 14)
#257 verif_scale_loop6:FAIL
Summary: 288/1766 PASSED, 21 SKIPPED, 1 FAILED

So, xdp_do_redirect, which was previously failing, now works fine. OTOH,
"verif_scale_loop6" now fails, but from what I understand from the log,
it has nothing to do with the series ("8193 jumps is too complex" -- I
don't even touch program-related stuff). I don't know the reason for it
failing; could it be some CI issue or maybe some recent commits?

Thanks,
Olek
On Tue, Mar 14, 2023 at 4:58 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> All error logs:
> libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad
> address
> libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
> The sequence of 8193 jumps is too complex.
> verification time 77808 usec
> stack depth 64
> processed 156616 insns (limit 1000000) max_states_per_insn 8
> total_states 1754 peak_states 1712 mark_read 12
> -- END PROG LOAD LOG --
> libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
> libbpf: failed to load object 'loop6.bpf.o'
> scale_test:FAIL:expect_success unexpected error: -14 (errno 14)
> #257 verif_scale_loop6:FAIL
> Summary: 288/1766 PASSED, 21 SKIPPED, 1 FAILED
>
> So, xdp_do_redirect, which was previously failing, now works fine. OTOH,
> "verif_scale_loop6" now fails, but from what I understand from the log,
> it has nothing to do with the series ("8193 jumps is too complex" -- I
> don't even touch program-related stuff). I don't know the reason for it
> failing; could it be some CI issue or maybe some recent commits?

Yeah. It's an issue with the latest clang.
We don't have a workaround for this yet.
It's not a blocker for your patchset.
We didn't have time to look at it closely.
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 13 Mar 2023 22:55:49 +0100 you wrote:
> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.
>
> __xdp_build_skb_from_frame() missed the moment when the networking stack
> became able to recycle skb pages backed by a page_pool. This was making
> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
> also affected in some scenarios.
> A lot of drivers use skb_mark_for_recycle() already, it's been almost
> two years and seems like there are no issues in using it in the generic
> code too. {__,}xdp_release_frame() can be then removed as it losts its
> last user.
> Page Pool becomes then zero-alloc (or almost) in the abovementioned
> cases, too. Other memory type models (who needs them at this point)
> have no changes.
>
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/4] selftests/bpf: robustify test_xdp_do_redirect with more payload magics
    https://git.kernel.org/bpf/bpf-next/c/487deb3e3393
  - [bpf-next,v3,2/4] net: page_pool, skbuff: make skb_mark_for_recycle() always available
    https://git.kernel.org/bpf/bpf-next/c/2c854e5fcd7e
  - [bpf-next,v3,3/4] xdp: recycle Page Pool backed skbs built from XDP frames
    https://git.kernel.org/bpf/bpf-next/c/9c94bbf9a87b
  - [bpf-next,v3,4/4] xdp: remove unused {__,}xdp_release_frame()
    https://git.kernel.org/bpf/bpf-next/c/d4e492338d11

You are awesome, thank you!
On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Mar 14, 2023 at 4:58 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
> >
> > All error logs:
> > [...]
> > #257 verif_scale_loop6:FAIL
> > Summary: 288/1766 PASSED, 21 SKIPPED, 1 FAILED
> >
> > So, xdp_do_redirect, which was previously failing, now works fine. [...]
>
> Yeah. It's an issue with the latest clang.
> We don't have a workaround for this yet.
> It's not a blocker for your patchset.
> We didn't have time to look at it closely.

I applied the workaround for this test.
It's all green now except s390 where it fails with

test_xdp_do_redirect:PASS:prog_run 0 nsec
test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual 220 != expected 9998
test_max_pkt_size:PASS:prog_run_max_size 0 nsec
test_max_pkt_size:PASS:prog_run_too_big 0 nsec
close_netns:PASS:setns 0 nsec
#289 xdp_do_redirect:FAIL
Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED

Alex,
could you please take a look at why it's happening?

I suspect it's an endianness issue in:
        if (*metadata != 0x42)
                return XDP_ABORTED;
but your patch didn't change that,
so I'm not sure why it worked before.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Tue, 14 Mar 2023 16:54:25 -0700

> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:

[...]

> test_xdp_do_redirect:PASS:prog_run 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
> 220 != expected 9998
> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> close_netns:PASS:setns 0 nsec
> #289 xdp_do_redirect:FAIL
> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>
> Alex,
> could you please take a look at why it's happening?
>
> I suspect it's an endianness issue in:
>         if (*metadata != 0x42)
>                 return XDP_ABORTED;
> but your patch didn't change that,
> so I'm not sure why it worked before.

Sure, lemme fix it real quick.

Thanks,
Olek
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Wed, 15 Mar 2023 10:56:25 +0100

> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Tue, 14 Mar 2023 16:54:25 -0700
>
>> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>
> [...]
>
>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
>> 220 != expected 9998
>> #289 xdp_do_redirect:FAIL
>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>>
>> Alex,
>> could you please take a look at why it's happening?
>>
>> I suspect it's an endianness issue in:
>>         if (*metadata != 0x42)
>>                 return XDP_ABORTED;
>> but your patch didn't change that,
>> so I'm not sure why it worked before.
>
> Sure, lemme fix it real quick.

Hi Ilya,

Do you have an s390 testing setup? Maybe you could take a look, since I
don't have one and can't debug it? It doesn't seem to be an endianness
issue. I mean, I have this (the below patch), but I'm not sure it will
fix anything -- IIRC the eBPF arch always matches the host arch ._.
I can't figure out from the code what happens wrongly :s And it
happens only on s390.

Thanks,
Olek
---
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
index 662b6c6c5ed7..b21371668447 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
 			    .attach_point = BPF_TC_INGRESS);
 
 	memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
-	*((__u32 *)data) = 0x42; /* metadata test value */
+	*((__u32 *)data) = htonl(0x42); /* metadata test value */
 
 	skel = test_xdp_do_redirect__open();
 	if (!ASSERT_OK_PTR(skel, "skel"))
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
index cd2d4e3258b8..2475bc30ced2 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <vmlinux.h>
+#include <bpf/bpf_endian.h>
 #include <bpf/bpf_helpers.h>
 
 #define ETH_ALEN 6
@@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
 SEC("xdp")
 int xdp_redirect(struct xdp_md *xdp)
 {
-	__u32 *metadata = (void *)(long)xdp->data_meta;
+	__be32 *metadata = (void *)(long)xdp->data_meta;
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
 
@@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
 	if (metadata + 1 > data)
 		return XDP_ABORTED;
 
-	if (*metadata != 0x42)
+	if (*metadata != __bpf_htonl(0x42))
 		return XDP_ABORTED;
 
 	if (*payload == MARK_XMIT)
On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Wed, 15 Mar 2023 10:56:25 +0100
>
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Date: Tue, 14 Mar 2023 16:54:25 -0700
>
> [...]
>
> Hi Ilya,
>
> Do you have an s390 testing setup? Maybe you could take a look, since
> I don't have one and can't debug it? It doesn't seem to be an
> endianness issue.
> I mean, I have this (the below patch), but I'm not sure it will fix
> anything -- IIRC the eBPF arch always matches the host arch ._.
> I can't figure out from the code what happens wrongly :s And it
> happens only on s390.
>
> Thanks,
> Olek
> ---
> [...]

Okay, I'll take a look. Two quick observations for now:

- Unfortunately the above patch does not help.

- In dmesg I see:

    Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
    dev N/A, expect packet loss!
On Wed, Mar 15, 2023 at 3:55 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> Hi Ilya,
>
> Do you have an s390 testing setup? Maybe you could take a look, since I
> don't have one and can't debug it? It doesn't seem to be an endianness
> issue. I mean, I have this (the below patch), but I'm not sure it will
> fix anything -- IIRC the eBPF arch always matches the host arch ._.
> I can't figure out from the code what happens wrongly :s And it
> happens only on s390.
>
> [...]
>
> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
>  	if (metadata + 1 > data)
>  		return XDP_ABORTED;
>
> -	if (*metadata != 0x42)
> +	if (*metadata != __bpf_htonl(0x42))
>  		return XDP_ABORTED;

Looks sane to me.
I'd probably use 'u8 *metadata' instead. Both in bpf and user space,
just not to worry about endianness.
Could you please submit an official patch and let CI judge?
On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> > Hi Ilya,
> >
> > Do you have an s390 testing setup? Maybe you could take a look,
> > since I don't have one and can't debug it? It doesn't seem to be an
> > endianness issue.
> > I mean, I have this (the below patch), but I'm not sure it will fix
> > anything -- IIRC the eBPF arch always matches the host arch ._.
> > I can't figure out from the code what happens wrongly :s And it
> > happens only on s390.
> >
> > [...]
>
> Okay, I'll take a look. Two quick observations for now:
>
> - Unfortunately the above patch does not help.
>
> - In dmesg I see:
>
>     Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
>     dev N/A, expect packet loss!

I haven't identified the issue yet, but I have found a couple more
things that might be helpful:

- In problematic cases metadata contains 0, so this is not an
  endianness issue. data is still reasonable though. I'm trying to
  understand what is causing this.

- Applying the following diff:

--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
 
 	*payload = MARK_IN;
 
-	if (bpf_xdp_adjust_meta(xdp, 4))
+	if (false && bpf_xdp_adjust_meta(xdp, 4))
 		return XDP_ABORTED;
 
 	if (retcode > XDP_PASS)

  causes a kernel panic even on x86_64:

  BUG: kernel NULL pointer dereference, address: 0000000000000d28
  ...
  Call Trace:
   <TASK>
   build_skb_around+0x22/0xb0
   __xdp_build_skb_from_frame+0x4e/0x130
   bpf_test_run_xdp_live+0x65f/0x7c0
   ? __pfx_xdp_test_run_init_page+0x10/0x10
   bpf_prog_test_run_xdp+0x2ba/0x480
   bpf_prog_test_run+0xeb/0x110
   __sys_bpf+0x2b9/0x570
   __x64_sys_bpf+0x1c/0x30
   do_syscall_64+0x48/0xa0
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

  I haven't looked into this at all, but I believe this needs to be
  fixed - BPF should never cause kernel panics.
From: Ilya Leoshkevich <iii@linux.ibm.com>
Date: Wed, 15 Mar 2023 19:00:47 +0100

> On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
>> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Date: Wed, 15 Mar 2023 10:56:25 +0100
>>>
>>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>> Date: Tue, 14 Mar 2023 16:54:25 -0700
>>>>
>>>>> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> [...]
>>>>
>>>>> test_xdp_do_redirect:PASS:prog_run 0 nsec
>>>>> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
>>>>> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
>>>>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc:
>>>>> actual 220 != expected 9998
>>>>> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
>>>>> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
>>>>> close_netns:PASS:setns 0 nsec
>>>>> #289 xdp_do_redirect:FAIL
>>>>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>>>>>
>>>>> Alex,
>>>>> could you please take a look at why it's happening?
>>>>>
>>>>> I suspect it's an endianness issue in:
>>>>>         if (*metadata != 0x42)
>>>>>                 return XDP_ABORTED;
>>>>> but your patch didn't change that,
>>>>> so I'm not sure why it worked before.
>>>>
>>>> Sure, lemme fix it real quick.
>>>
>>> Hi Ilya,
>>>
>>> Do you have s390 testing setups? Maybe you could take a look, since I
>>> don't have one and can't debug it? Doesn't seem to be Endianness issue.
>>> I mean, I have this (the below patch), but not sure it will fix
>>> anything -- IIRC eBPF arch always matches the host arch ._.
>>> I can't figure out from the code what does happen wrongly :s And it
>>> happens only on s390.
>>>
>>> Thanks,
>>> Olek
>>> ---
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> index 662b6c6c5ed7..b21371668447 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
>>> @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
>>>  			    .attach_point = BPF_TC_INGRESS);
>>>
>>>  	memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
>>> -	*((__u32 *)data) = 0x42; /* metadata test value */
>>> +	*((__u32 *)data) = htonl(0x42); /* metadata test value */
>>>
>>>  	skel = test_xdp_do_redirect__open();
>>>  	if (!ASSERT_OK_PTR(skel, "skel"))
>>> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> index cd2d4e3258b8..2475bc30ced2 100644
>>> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
>>> @@ -1,5 +1,6 @@
>>>  // SPDX-License-Identifier: GPL-2.0
>>>  #include <vmlinux.h>
>>> +#include <bpf/bpf_endian.h>
>>>  #include <bpf/bpf_helpers.h>
>>>
>>>  #define ETH_ALEN 6
>>> @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
>>>  SEC("xdp")
>>>  int xdp_redirect(struct xdp_md *xdp)
>>>  {
>>> -	__u32 *metadata = (void *)(long)xdp->data_meta;
>>> +	__be32 *metadata = (void *)(long)xdp->data_meta;
>>>  	void *data_end = (void *)(long)xdp->data_end;
>>>  	void *data = (void *)(long)xdp->data;
>>>
>>> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
>>>  	if (metadata + 1 > data)
>>>  		return XDP_ABORTED;
>>>
>>> -	if (*metadata != 0x42)
>>> +	if (*metadata != __bpf_htonl(0x42))
>>>  		return XDP_ABORTED;
>>>
>>>  	if (*payload == MARK_XMIT)
>>
>> Okay, I'll take a look. Two quick observations for now:
>>
>> - Unfortunately the above patch does not help.
>> - In dmesg I see:
>>
>>     Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
>>     dev N/A, expect packet loss!
>
> I haven't identified the issue yet, but I have found a couple more
> things that might be helpful:
>
> - In problematic cases metadata contains 0, so this is not an
>   endianness issue. data is still reasonable though. I'm trying to
>   understand what is causing this.
>
> - Applying the following diff:
>
> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
>
>  	*payload = MARK_IN;
>
> -	if (bpf_xdp_adjust_meta(xdp, 4))
> +	if (false && bpf_xdp_adjust_meta(xdp, 4))
>  		return XDP_ABORTED;
>
>  	if (retcode > XDP_PASS)
>
> causes a kernel panic even on x86_64:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000d28
> ...
> Call Trace:
>  <TASK>
>  build_skb_around+0x22/0xb0
>  __xdp_build_skb_from_frame+0x4e/0x130
>  bpf_test_run_xdp_live+0x65f/0x7c0
>  ? __pfx_xdp_test_run_init_page+0x10/0x10
>  bpf_prog_test_run_xdp+0x2ba/0x480
>  bpf_prog_test_run+0xeb/0x110
>  __sys_bpf+0x2b9/0x570
>  __x64_sys_bpf+0x1c/0x30
>  do_syscall_64+0x48/0xa0
>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> I haven't looked into this at all, but I believe this needs to be
> fixed - BPF should never cause kernel panics.

This one is basically the same issue as the one syzbot reported today
(separate subthread). I'm waiting for feedback from Toke on which of the
fixes he'd prefer (I proposed two). If those zeroed metadata magics you
observe have the same root cause as the panic, one fix will squash both
issues.

Thanks,
Olek
On Wed, 2023-03-15 at 19:12 +0100, Alexander Lobakin wrote: > From: Ilya Leoshkevich <iii@linux.ibm.com> > Date: Wed, 15 Mar 2023 19:00:47 +0100 > > > On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote: > > > On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote: > > > > From: Alexander Lobakin <aleksander.lobakin@intel.com> > > > > Date: Wed, 15 Mar 2023 10:56:25 +0100 > > > > > > > > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com> > > > > > Date: Tue, 14 Mar 2023 16:54:25 -0700 > > > > > > > > > > > On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov > > > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > [...] > > > > > > > > > > > test_xdp_do_redirect:PASS:prog_run 0 nsec > > > > > > test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec > > > > > > test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec > > > > > > test_xdp_do_redirect:FAIL:pkt_count_tc unexpected > > > > > > pkt_count_tc: > > > > > > actual > > > > > > 220 != expected 9998 > > > > > > test_max_pkt_size:PASS:prog_run_max_size 0 nsec > > > > > > test_max_pkt_size:PASS:prog_run_too_big 0 nsec > > > > > > close_netns:PASS:setns 0 nsec > > > > > > #289 xdp_do_redirect:FAIL > > > > > > Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED > > > > > > > > > > > > Alex, > > > > > > could you please take a look at why it's happening? > > > > > > > > > > > > I suspect it's an endianness issue in: > > > > > > if (*metadata != 0x42) > > > > > > return XDP_ABORTED; > > > > > > but your patch didn't change that, > > > > > > so I'm not sure why it worked before. > > > > > > > > > > Sure, lemme fix it real quick. > > > > > > > > Hi Ilya, > > > > > > > > Do you have s390 testing setups? Maybe you could take a look, > > > > since > > > > I > > > > don't have one and can't debug it? Doesn't seem to be > > > > Endianness > > > > issue. > > > > I mean, I have this (the below patch), but not sure it will fix > > > > anything -- IIRC eBPF arch always matches the host arch ._. 
> > > > I can't figure out from the code what does happen wrongly :s > > > > And it > > > > happens only on s390. > > > > > > > > Thanks, > > > > Olek > > > > --- > > > > diff --git > > > > a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c > > > > b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c > > > > index 662b6c6c5ed7..b21371668447 100644 > > > > --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c > > > > +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c > > > > @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void) > > > > .attach_point = BPF_TC_INGRESS); > > > > > > > > memcpy(&data[sizeof(__u32)], &pkt_udp, > > > > sizeof(pkt_udp)); > > > > - *((__u32 *)data) = 0x42; /* metadata test value */ > > > > + *((__u32 *)data) = htonl(0x42); /* metadata test value > > > > */ > > > > > > > > skel = test_xdp_do_redirect__open(); > > > > if (!ASSERT_OK_PTR(skel, "skel")) > > > > diff --git > > > > a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c > > > > b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c > > > > index cd2d4e3258b8..2475bc30ced2 100644 > > > > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c > > > > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c > > > > @@ -1,5 +1,6 @@ > > > > // SPDX-License-Identifier: GPL-2.0 > > > > #include <vmlinux.h> > > > > +#include <bpf/bpf_endian.h> > > > > #include <bpf/bpf_helpers.h> > > > > > > > > #define ETH_ALEN 6 > > > > @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT; > > > > SEC("xdp") > > > > int xdp_redirect(struct xdp_md *xdp) > > > > { > > > > - __u32 *metadata = (void *)(long)xdp->data_meta; > > > > + __be32 *metadata = (void *)(long)xdp->data_meta; > > > > void *data_end = (void *)(long)xdp->data_end; > > > > void *data = (void *)(long)xdp->data; > > > > > > > > @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp) > > > > if (metadata + 1 > data) > > > > return XDP_ABORTED; > > > > > > > > - if (*metadata != 0x42) > > 
> > + if (*metadata != __bpf_htonl(0x42)) > > > > return XDP_ABORTED; > > > > > > > > if (*payload == MARK_XMIT) > > > > > > Okay, I'll take a look. Two quick observations for now: > > > > > > - Unfortunately the above patch does not help. > > > > > > - In dmesg I see: > > > > > > Driver unsupported XDP return value 0 on prog xdp_redirect > > > (id > > > 23) > > > dev N/A, expect packet loss! > > > > I haven't identified the issue yet, but I have found a couple more > > things that might be helpful: > > > > - In problematic cases metadata contains 0, so this is not an > > endianness issue. data is still reasonable though. I'm trying to > > understand what is causing this. > > > > - Applying the following diff: > > > > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c > > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c > > @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp) > > > > *payload = MARK_IN; > > > > - if (bpf_xdp_adjust_meta(xdp, 4)) > > + if (false && bpf_xdp_adjust_meta(xdp, 4)) > > return XDP_ABORTED; > > > > if (retcode > XDP_PASS) > > > > causes a kernel panic even on x86_64: > > > > BUG: kernel NULL pointer dereference, address: > > 0000000000000d28 > > ... > > Call Trace: > > <TASK> > > > > build_skb_around+0x22/0xb0 > > __xdp_build_skb_from_frame+0x4e/0x130 > > bpf_test_run_xdp_live+0x65f/0x7c0 > > ? __pfx_xdp_test_run_init_page+0x10/0x10 > > bpf_prog_test_run_xdp+0x2ba/0x480 > > bpf_prog_test_run+0xeb/0x110 > > __sys_bpf+0x2b9/0x570 > > __x64_sys_bpf+0x1c/0x30 > > do_syscall_64+0x48/0xa0 > > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > > > I haven't looked into this at all, but I believe this needs to be > > fixed - BPF should never cause kernel panics. > > This one is basically the same issue as syzbot mentioned today > (separate > subthread). I'm waiting for a feedback from Toke on which way of > fixing > he'd prefer (I proposed 2). 
> If those zeroed metadata magics that you observe have the same roots
> with the panic, one fix will smash 2 issues.
>
> Thanks,
> Olek

Sounds good, I will wait for an update then.

In the meantime, I found the code that overwrites the metadata:

#0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0, skb=0x88142200)
    at linux/include/net/neighbour.h:503
#1  0x0000000000ab2cda in neigh_output (skip_cache=false, skb=0x88142200,
    n=<optimized out>) at linux/include/net/neighbour.h:544
#2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
    skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
#3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0,
    net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
#4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200)
    at linux/net/ipv6/ip6_output.c:206
#5  0x0000000000ab5cbc in dst_input (skb=<optimized out>)
    at linux/include/net/dst.h:454
#6  ip6_sublist_rcv_finish (head=head@entry=0x38000dbf520)
    at linux/net/ipv6/ip6_input.c:88
#7  0x0000000000ab6104 in ip6_list_rcv_finish (net=<optimized out>,
    head=<optimized out>, sk=0x0) at linux/net/ipv6/ip6_input.c:145
#8  0x0000000000ab72bc in ipv6_list_rcv (head=0x38000dbf638,
    pt=<optimized out>, orig_dev=<optimized out>)
    at linux/net/ipv6/ip6_input.c:354
#9  0x00000000008b3710 in __netif_receive_skb_list_ptype
    (orig_dev=0x880b8000, pt_prev=0x176b7f8 <ipv6_packet_type>,
    head=0x38000dbf638) at linux/net/core/dev.c:5520
#10 __netif_receive_skb_list_core (head=head@entry=0x38000dbf7b8,
    pfmemalloc=pfmemalloc@entry=false) at linux/net/core/dev.c:5568
#11 0x00000000008b4390 in __netif_receive_skb_list (head=0x38000dbf7b8)
    at linux/net/core/dev.c:5620
#12 netif_receive_skb_list_internal (head=head@entry=0x38000dbf7b8)
    at linux/net/core/dev.c:5711
#13 0x00000000008b45ce in netif_receive_skb_list
    (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5763
#14 0x0000000000950782 in xdp_recv_frames (dev=<optimized out>,
    skbs=<optimized out>, nframes=62, frames=0x8587c600)
    at linux/net/bpf/test_run.c:256
#15 xdp_test_run_batch (xdp=xdp@entry=0x38000dbf900,
    prog=prog@entry=0x37fffe75000, repeat=<optimized out>)
    at linux/net/bpf/test_run.c:334

namely:

static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
...
	memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);

It's hard for me to see what is going on here, since I'm not familiar
with the networking code - since XDP metadata is located at the end of
the headroom, shouldn't there be something that prevents the network
stack from overwriting it? Or can it be that netif_receive_skb_list()
is free to do whatever it wants with that memory and we cannot expect
to receive it back intact?
From: Ilya Leoshkevich <iii@linux.ibm.com> Date: Wed, 15 Mar 2023 19:26:00 +0100 > On Wed, 2023-03-15 at 19:12 +0100, Alexander Lobakin wrote: >> From: Ilya Leoshkevich <iii@linux.ibm.com> >> Date: Wed, 15 Mar 2023 19:00:47 +0100 >> >>> On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote: >>>> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote: >>>>> From: Alexander Lobakin <aleksander.lobakin@intel.com> >>>>> Date: Wed, 15 Mar 2023 10:56:25 +0100 >>>>> >>>>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com> >>>>>> Date: Tue, 14 Mar 2023 16:54:25 -0700 >>>>>> >>>>>>> On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov >>>>>>> <alexei.starovoitov@gmail.com> wrote: >>>>>> >>>>>> [...] >>>>>> >>>>>>> test_xdp_do_redirect:PASS:prog_run 0 nsec >>>>>>> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec >>>>>>> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec >>>>>>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected >>>>>>> pkt_count_tc: >>>>>>> actual >>>>>>> 220 != expected 9998 >>>>>>> test_max_pkt_size:PASS:prog_run_max_size 0 nsec >>>>>>> test_max_pkt_size:PASS:prog_run_too_big 0 nsec >>>>>>> close_netns:PASS:setns 0 nsec >>>>>>> #289 xdp_do_redirect:FAIL >>>>>>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED >>>>>>> >>>>>>> Alex, >>>>>>> could you please take a look at why it's happening? >>>>>>> >>>>>>> I suspect it's an endianness issue in: >>>>>>> if (*metadata != 0x42) >>>>>>> return XDP_ABORTED; >>>>>>> but your patch didn't change that, >>>>>>> so I'm not sure why it worked before. >>>>>> >>>>>> Sure, lemme fix it real quick. >>>>> >>>>> Hi Ilya, >>>>> >>>>> Do you have s390 testing setups? Maybe you could take a look, >>>>> since >>>>> I >>>>> don't have one and can't debug it? Doesn't seem to be >>>>> Endianness >>>>> issue. >>>>> I mean, I have this (the below patch), but not sure it will fix >>>>> anything -- IIRC eBPF arch always matches the host arch ._. 
>>>>> I can't figure out from the code what does happen wrongly :s >>>>> And it >>>>> happens only on s390. >>>>> >>>>> Thanks, >>>>> Olek >>>>> --- >>>>> diff --git >>>>> a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c >>>>> b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c >>>>> index 662b6c6c5ed7..b21371668447 100644 >>>>> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c >>>>> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c >>>>> @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void) >>>>> .attach_point = BPF_TC_INGRESS); >>>>> >>>>> memcpy(&data[sizeof(__u32)], &pkt_udp, >>>>> sizeof(pkt_udp)); >>>>> - *((__u32 *)data) = 0x42; /* metadata test value */ >>>>> + *((__u32 *)data) = htonl(0x42); /* metadata test value >>>>> */ >>>>> >>>>> skel = test_xdp_do_redirect__open(); >>>>> if (!ASSERT_OK_PTR(skel, "skel")) >>>>> diff --git >>>>> a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c >>>>> b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c >>>>> index cd2d4e3258b8..2475bc30ced2 100644 >>>>> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c >>>>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c >>>>> @@ -1,5 +1,6 @@ >>>>> // SPDX-License-Identifier: GPL-2.0 >>>>> #include <vmlinux.h> >>>>> +#include <bpf/bpf_endian.h> >>>>> #include <bpf/bpf_helpers.h> >>>>> >>>>> #define ETH_ALEN 6 >>>>> @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT; >>>>> SEC("xdp") >>>>> int xdp_redirect(struct xdp_md *xdp) >>>>> { >>>>> - __u32 *metadata = (void *)(long)xdp->data_meta; >>>>> + __be32 *metadata = (void *)(long)xdp->data_meta; >>>>> void *data_end = (void *)(long)xdp->data_end; >>>>> void *data = (void *)(long)xdp->data; >>>>> >>>>> @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp) >>>>> if (metadata + 1 > data) >>>>> return XDP_ABORTED; >>>>> >>>>> - if (*metadata != 0x42) >>>>> + if (*metadata != __bpf_htonl(0x42)) >>>>> return XDP_ABORTED; >>>>> >>>>> if (*payload == 
MARK_XMIT) >>>> >>>> Okay, I'll take a look. Two quick observations for now: >>>> >>>> - Unfortunately the above patch does not help. >>>> >>>> - In dmesg I see: >>>> >>>> Driver unsupported XDP return value 0 on prog xdp_redirect >>>> (id >>>> 23) >>>> dev N/A, expect packet loss! >>> >>> I haven't identified the issue yet, but I have found a couple more >>> things that might be helpful: >>> >>> - In problematic cases metadata contains 0, so this is not an >>> endianness issue. data is still reasonable though. I'm trying to >>> understand what is causing this. >>> >>> - Applying the following diff: >>> >>> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c >>> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c >>> @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp) >>> >>> *payload = MARK_IN; >>> >>> - if (bpf_xdp_adjust_meta(xdp, 4)) >>> + if (false && bpf_xdp_adjust_meta(xdp, 4)) >>> return XDP_ABORTED; >>> >>> if (retcode > XDP_PASS) >>> >>> causes a kernel panic even on x86_64: >>> >>> BUG: kernel NULL pointer dereference, address: >>> 0000000000000d28 >>> ... >>> Call Trace: >>> <TASK> >>> >>> build_skb_around+0x22/0xb0 >>> __xdp_build_skb_from_frame+0x4e/0x130 >>> bpf_test_run_xdp_live+0x65f/0x7c0 >>> ? __pfx_xdp_test_run_init_page+0x10/0x10 >>> bpf_prog_test_run_xdp+0x2ba/0x480 >>> bpf_prog_test_run+0xeb/0x110 >>> __sys_bpf+0x2b9/0x570 >>> __x64_sys_bpf+0x1c/0x30 >>> do_syscall_64+0x48/0xa0 >>> entry_SYSCALL_64_after_hwframe+0x72/0xdc >>> >>> I haven't looked into this at all, but I believe this needs to be >>> fixed - BPF should never cause kernel panics. >> >> This one is basically the same issue as syzbot mentioned today >> (separate >> subthread). I'm waiting for a feedback from Toke on which way of >> fixing >> he'd prefer (I proposed 2). If those zeroed metadata magics that you >> observe have the same roots with the panic, one fix will smash 2 >> issues. >> >> Thanks, >> Olek > > Sounds good, I will wait for an update then. 
> > In the meantime, I found the code that overwrites the metadata: > > #0 0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0, > skb=0x88142200) at linux/include/net/neighbour.h:503 > #1 0x0000000000ab2cda in neigh_output (skip_cache=false, > skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544 > #2 ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0, > skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134 > #3 0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0, > net=0x88edba00) at linux/net/ipv6/ip6_output.c:195 > #4 ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at > linux/net/ipv6/ip6_output.c:206 > #5 0x0000000000ab5cbc in dst_input (skb=<optimized out>) at > linux/include/net/dst.h:454 > #6 ip6_sublist_rcv_finish (head=head@entry=0x38000dbf520) at > linux/net/ipv6/ip6_input.c:88 > #7 0x0000000000ab6104 in ip6_list_rcv_finish (net=<optimized out>, > head=<optimized out>, sk=0x0) at linux/net/ipv6/ip6_input.c:145 > #8 0x0000000000ab72bc in ipv6_list_rcv (head=0x38000dbf638, > pt=<optimized out>, orig_dev=<optimized out>) at > linux/net/ipv6/ip6_input.c:354 > #9 0x00000000008b3710 in __netif_receive_skb_list_ptype > (orig_dev=0x880b8000, pt_prev=0x176b7f8 <ipv6_packet_type>, > head=0x38000dbf638) at linux/net/core/dev.c:5520 > #10 __netif_receive_skb_list_core (head=head@entry=0x38000dbf7b8, > pfmemalloc=pfmemalloc@entry=false) at linux/net/core/dev.c:5568 > #11 0x00000000008b4390 in __netif_receive_skb_list (head=0x38000dbf7b8) > at linux/net/core/dev.c:5620 > #12 netif_receive_skb_list_internal (head=head@entry=0x38000dbf7b8) at > linux/net/core/dev.c:5711 > #13 0x00000000008b45ce in netif_receive_skb_list > (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5763 > #14 0x0000000000950782 in xdp_recv_frames (dev=<optimized out>, > skbs=<optimized out>, nframes=62, frames=0x8587c600) at > linux/net/bpf/test_run.c:256 > #15 xdp_test_run_batch (xdp=xdp@entry=0x38000dbf900, > 
> prog=prog@entry=0x37fffe75000, repeat=<optimized out>)
> at linux/net/bpf/test_run.c:334
>
> namely:
>
> static inline int neigh_hh_output(const struct hh_cache *hh, struct
> sk_buff *skb)
> ...
> 	memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
>
> It's hard for me to see what is going on here, since I'm not familiar
> with the networking code - since XDP metadata is located at the end of
> headroom, should not there be something that prevents the network stack
> from overwriting it? Or can it be that netif_receive_skb_list() is

Ok, I got it. Much thanks for the debugging!

It gets overwritten when neigh puts the MAC addr back when xmitting. It
works on LE, because 0x42 fits into one byte and it's stored at
`mac_addr - 4`. On BE this byte is at `mac_addr - 1`. Neigh rounds the
14 bytes of the Ethernet header up to 16, so two bytes of metadata get
wiped.

This is not the same bug as the one that syzbot reported, but they are
provoked by the same thing: the assumption that %XDP_PASS guarantees the
page won't come back to the pool.

So there are two ways: the first one is to fix those two bugs (two
oneliners, basically), the second one is to turn off the recycling for
bpf_test_run altogether. I like the first more, as it theoretically
keeps the perf boost for bpf_test_run gained from enabling the
recycling, but it can't guarantee similar stuff won't happen soon :D
Something else might get overwritten somewhere else and so on. The
second one will effectively revert the logic for bpf_test_run to the
pre-recycling state.

I'll submit the first way today, it will be a series of two. We'll see
then how it goes; I'm fine with both ways.

> free to do whatever it wants with that memory and we cannot expect to
> receive it back intact?

Ideally, yes: after a frame is passed outside the driver, you can't
assume anything on its page won't be changed. But that's one of
bpf_test_run's "features" that gives it nice perf :D So as long as the
assumptions don't get broken by some change, like this one with the
recycling, it just works.

Thanks,
Olek