Message ID | 20230725155403.796-1-andrew.kanner@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp2608330vqg; Tue, 25 Jul 2023 10:04:05 -0700 (PDT) X-Google-Smtp-Source: APBJJlFuhd5GFZkOAYvf1warmbhLk6Y8ohf4PLCzAUUpfTMGPLxFzUxFB5HewKF8hL9gfcyEuN8u X-Received: by 2002:a2e:9c93:0:b0:2b4:83c3:d285 with SMTP id x19-20020a2e9c93000000b002b483c3d285mr8858583lji.38.1690304644817; Tue, 25 Jul 2023 10:04:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690304644; cv=none; d=google.com; s=arc-20160816; b=Sf5rNKflrvjl2zXM/yqu3wybQLgyBetnsSw+M2DYWTBFuidoSmYd7qm8jgYTTCfVRg y9abFhytDPxG9gWNwyPyRMveQ0w9F/AkA17eHpN/WuwUB/4mOHSU9FjZY6bOMVVre0+N C/yMga6Bh/t7nM78ei3CVewvOAGX9NA/m5gPwe8rAX5trHcaOu/OWrool/N/UT3yipNJ 5EEuTqBrxA/rkcgDrFrC8mTaplDscyRwJcm2LFGHx3XQsFSa6qwZln3jF63HLqk32e+C WtQ6l0BDOkPsesQqRyO/6PP672cABrqOJNcL7S3Sy/hDPlHd/wtblSDsvyQQ3Uea2W/L sw4g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=+2ueYrBi52W2OkGXaF73xpgZBQIsvgeSihX41z0L7xE=; fh=neJKXSJhdh/dzWbfNkmI6hL7/RYA+C6ag08sPsn5SZA=; b=FeT+vIBekVQHM/MFykReLKEWNiOHo9fgTrI7/Dnb+yOCZHVsJ+1WMDB7gkB3qrv2Lj JfE5AKM1dLhLRAjdxdFXJuAAy4ecS3T2TT+VZ0qwKG/KXR1/1FOJ+dga64rbuUprtRe4 dpYeDRrOSaSB+86PuGZcsg1DCpnBE7q4/cOW3rMftia/ZKmsZrF1RiXD5bGarKvfbiJm TYvzERKB4q+N8ys7e3gUfyr54nGqtHwoqBcHiDKTrrFs1ekvUX2j2ByHqqP0WmtpI7sV Roe8paPHUpWLD9tuk3K0PqUYe9k6baQYx+hQh3n2N45Y7S8YpcYjKdnUUdxfn8Yb3LMC Hv8g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=L+StV56S; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mc17-20020a170906eb5100b00992f8116abdsi9231582ejb.480.2023.07.25.10.03.40; Tue, 25 Jul 2023 10:04:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=L+StV56S; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232793AbjGYPyc (ORCPT <rfc822;kautuk.consul.80@gmail.com> + 99 others); Tue, 25 Jul 2023 11:54:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232389AbjGYPya (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Jul 2023 11:54:30 -0400 Received: from mail-lj1-x232.google.com (mail-lj1-x232.google.com [IPv6:2a00:1450:4864:20::232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07F94211C; Tue, 25 Jul 2023 08:54:29 -0700 (PDT) Received: by mail-lj1-x232.google.com with SMTP id 38308e7fff4ca-2b9aa1d3029so9215311fa.2; Tue, 25 Jul 2023 08:54:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1690300467; x=1690905267; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=+2ueYrBi52W2OkGXaF73xpgZBQIsvgeSihX41z0L7xE=; b=L+StV56SO8Hq4MRlBfv5ewD58CUtuNhf3SladD8jctRRcL8zbKOJsdIzyNKM0XJ804 jLH3O4VhCEmWK1u6tUPh5jf0YG5dMHLu9yHparAuZpXH7rsAT9ZVHUMHcYtHDZzASR7b 3i3BXSwCf4JHyFWVVa5ojk+GqQLWDg4r1e1sxOQ1Tn++Guxd10G9SGo4Xr0rv+jmzODJ WVegOTzRbBVIglLKAn1sVUe7mfWQE0B9vb0drVotj5vI5YJeJz4hy+Qy7bpDfsXJd/ZN KrjAhhPwLx0XzQ/eK8ueSW9CGP3K9LXg87Hg2jeC3yiKtAJuuqB1/NxOePbT6bwnYdf8 cGYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690300467; x=1690905267; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=+2ueYrBi52W2OkGXaF73xpgZBQIsvgeSihX41z0L7xE=; b=U5tOeiApvtXPUNKyUKPMiW91jq1up57L2acjNAVzaFrJoKFQGC+Pfm27VAw1PWyTGz cDYk9vzPWGSCXhx4tFA0KUHZ9fxJThlPqcfGQ0zVF+/91P7tMuGX5whAmm81HY14AUcM q6glbYwj+FuY/BHD9uymKB2sKF0wRBBjC+gwWTy/x40bavNhb5AEM+5+4XmHuiUUoAEV 9V37ykm/vMGMU+2eKK60GkswBVS4KRnwauP+ff05cWlI1FWAhb3rLrNCLxg7pSPr3nk1 DhJVN6J3S/MamI9grH6/9A0G7T+BDJRWrN/yM85BY+XiiDAuAit8EEKyeEdOxcuQv0AR 6ZoQ== X-Gm-Message-State: ABy/qLa8bsOrQsgQ5h/9by74AkAl0nk++htX3e+qFq6GLu1wgYc868VA ufW5funMwA1yf49SFuDba40= X-Received: by 2002:a2e:99d9:0:b0:2b5:80e0:f18e with SMTP id l25-20020a2e99d9000000b002b580e0f18emr8957085ljj.3.1690300467033; Tue, 25 Jul 2023 08:54:27 -0700 (PDT) Received: from localhost.localdomain ([77.222.25.78]) by smtp.gmail.com with ESMTPSA id a24-20020a2e8618000000b002b96a3a87d5sm3510747lji.98.2023.07.25.08.54.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Jul 2023 08:54:26 -0700 (PDT) From: Andrew Kanner <andrew.kanner@gmail.com> To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, jasowang@redhat.com, netdev@vger.kernel.org, brouer@redhat.com, linux-kernel@vger.kernel.org Cc: linux-kernel-mentees@lists.linuxfoundation.org, syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com, Andrew Kanner <andrew.kanner@gmail.com> Subject: [PATCH v3] drivers: net: prevent tun_get_user() to exceed xdp size limits Date: Tue, 25 Jul 2023 18:54:04 +0300 Message-Id: <20230725155403.796-1-andrew.kanner@gmail.com> X-Mailer: git-send-email 2.39.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772411484206623810 X-GMAIL-MSGID: 1772412883458357331 |
Series |
[v3] drivers: net: prevent tun_get_user() to exceed xdp size limits
|
|
Commit Message
Andrew Kanner
July 25, 2023, 3:54 p.m. UTC
Syzkaller reported the following issue:
=======================================
Too BIG xdp->frame_sz = 131072
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
...
Call Trace:
<TASK>
bpf_prog_4add87e5301a4105+0x1a/0x1c
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run_xdp include/linux/filter.h:775 [inline]
bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
netif_receive_generic_xdp net/core/dev.c:4807 [inline]
do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
call_write_iter include/linux/fs.h:1871 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x650/0xe40 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87
("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But
tun_get_user() still provides an execution path with do_xdp_generic()
and exceed XDP limits for packet size.
Using the syzkaller repro with reduced packet size it was also
discovered that XDP_PACKET_HEADROOM is not checked in
tun_can_build_skb(), although pad may be incremented in
tun_build_skb().
If we move the limit check from tun_can_build_skb() to tun_build_skb()
we will make xdp to be used only in tun_build_skb(), without falling
in tun_alloc_skb(), etc. And moreover we will drop the packet which
can't be processed in tun_build_skb().
Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/
Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f
Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
---
Notes:
V2 -> V3:
* attach the forgotten changelog
V1 -> V2:
* merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with
syzkaller repro and missing XDP_PACKET_HEADROOM in pad
* changed the title and description of the execution path, suggested
by Jason Wang <jasowang@redhat.com>
* move the limit check from tun_can_build_skb() to tun_build_skb() to
remove duplication and locking issue, and also drop the packet in
case of a failed check - noted by Jason Wang <jasowang@redhat.com>
drivers/net/tun.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
Comments
On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner <andrew.kanner@gmail.com> wrote: > > Syzkaller reported the following issue: > ======================================= > Too BIG xdp->frame_sz = 131072 > WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 > ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline] > WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 > bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103 > ... > Call Trace: > <TASK> > bpf_prog_4add87e5301a4105+0x1a/0x1c > __bpf_prog_run include/linux/filter.h:600 [inline] > bpf_prog_run_xdp include/linux/filter.h:775 [inline] > bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721 > netif_receive_generic_xdp net/core/dev.c:4807 [inline] > do_xdp_generic+0x35c/0x770 net/core/dev.c:4866 > tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919 > tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043 > call_write_iter include/linux/fs.h:1871 [inline] > new_sync_write fs/read_write.c:491 [inline] > vfs_write+0x650/0xe40 fs/read_write.c:584 > ksys_write+0x12f/0x250 fs/read_write.c:637 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 > ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But > tun_get_user() still provides an execution path with do_xdp_generic() > and exceed XDP limits for packet size. > > Using the syzkaller repro with reduced packet size it was also > discovered that XDP_PACKET_HEADROOM is not checked in > tun_can_build_skb(), although pad may be incremented in > tun_build_skb(). > > If we move the limit check from tun_can_build_skb() to tun_build_skb() > we will make xdp to be used only in tun_build_skb(), without falling > in tun_alloc_skb(), etc. And moreover we will drop the packet which > can't be processed in tun_build_skb(). > > Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/ > Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f > Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") > Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> > --- > > Notes: > V2 -> V3: > * attach the forgotten changelog > V1 -> V2: > * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with > syzkaller repro and missing XDP_PACKET_HEADROOM in pad > * changed the title and description of the execution path, suggested > by Jason Wang <jasowang@redhat.com> > * move the limit check from tun_can_build_skb() to tun_build_skb() to > remove duplication and locking issue, and also drop the packet in > case of a failed check - noted by Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Thanks > > drivers/net/tun.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index d75456adc62a..7c2b05ce0421 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, > if (zerocopy) > return false; > > - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + > - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) > - return false; > - > return true; > } > > @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, > buflen += SKB_DATA_ALIGN(len + pad); > rcu_read_unlock(); > > + if (buflen > PAGE_SIZE) > + return ERR_PTR(-EFAULT); > + > alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); > if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) > return ERR_PTR(-ENOMEM); > -- > 2.39.3 >
On Wed, Jul 26, 2023 at 10:09:53AM +0800, Jason Wang wrote: > On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner <andrew.kanner@gmail.com> wrote: > > > > Syzkaller reported the following issue: > > ======================================= > > Too BIG xdp->frame_sz = 131072 > > WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 > > ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline] > > WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 > > bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103 > > ... > > Call Trace: > > <TASK> > > bpf_prog_4add87e5301a4105+0x1a/0x1c > > __bpf_prog_run include/linux/filter.h:600 [inline] > > bpf_prog_run_xdp include/linux/filter.h:775 [inline] > > bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721 > > netif_receive_generic_xdp net/core/dev.c:4807 [inline] > > do_xdp_generic+0x35c/0x770 net/core/dev.c:4866 > > tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919 > > tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043 > > call_write_iter include/linux/fs.h:1871 [inline] > > new_sync_write fs/read_write.c:491 [inline] > > vfs_write+0x650/0xe40 fs/read_write.c:584 > > ksys_write+0x12f/0x250 fs/read_write.c:637 > > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > > xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 > > ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But > > tun_get_user() still provides an execution path with do_xdp_generic() > > and exceed XDP limits for packet size. > > > > Using the syzkaller repro with reduced packet size it was also > > discovered that XDP_PACKET_HEADROOM is not checked in > > tun_can_build_skb(), although pad may be incremented in > > tun_build_skb(). > > > > If we move the limit check from tun_can_build_skb() to tun_build_skb() > > we will make xdp to be used only in tun_build_skb(), without falling > > in tun_alloc_skb(), etc. And moreover we will drop the packet which > > can't be processed in tun_build_skb(). > > > > Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com > > Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/ > > Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f > > Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") > > Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> > > --- > > > > Notes: > > V2 -> V3: > > * attach the forgotten changelog > > V1 -> V2: > > * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with > > syzkaller repro and missing XDP_PACKET_HEADROOM in pad > > * changed the title and description of the execution path, suggested > > by Jason Wang <jasowang@redhat.com> > > * move the limit check from tun_can_build_skb() to tun_build_skb() to > > remove duplication and locking issue, and also drop the packet in > > case of a failed check - noted by Jason Wang <jasowang@redhat.com> > > Acked-by: Jason Wang <jasowang@redhat.com> > > Thanks > > > > > drivers/net/tun.c | 7 +++---- > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index d75456adc62a..7c2b05ce0421 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, > > if (zerocopy) > > return false; > > > > - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + > > - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) > > - return false; > > - > > return true; > > } > > > > @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, > > buflen += SKB_DATA_ALIGN(len + pad); > > rcu_read_unlock(); > > > > + if (buflen > PAGE_SIZE) > > + return ERR_PTR(-EFAULT); > > + > > alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); > > if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) > > return ERR_PTR(-ENOMEM); > > -- > > 2.39.3 > > > Thanks, Jason. Can anyone point me to some tests other than tools/testing/selftests/net/tun.c? This one shows: PASSED: 5 / 5 tests passed. I'm trying to figure out if we're dropping more packets than expected with this patch. Not sure if the test above is enough.
Cc. John and Ahern On 26/07/2023 04.09, Jason Wang wrote: > On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner <andrew.kanner@gmail.com> wrote: >> >> Syzkaller reported the following issue: >> ======================================= >> Too BIG xdp->frame_sz = 131072 Is this a contiguous physical memory allocation? 131072 bytes equal order 5 page. Looking at tun.c code I cannot find a code path that could create order-5 skb->data, but only SKB with order-0 fragments. But I guess it is the netif_receive_generic_xdp() what will realloc to make this linear (via skb_linearize()) >> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 >> ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline] >> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 >> bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103 >> ... >> Call Trace: >> <TASK> >> bpf_prog_4add87e5301a4105+0x1a/0x1c >> __bpf_prog_run include/linux/filter.h:600 [inline] >> bpf_prog_run_xdp include/linux/filter.h:775 [inline] >> bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721 >> netif_receive_generic_xdp net/core/dev.c:4807 [inline] >> do_xdp_generic+0x35c/0x770 net/core/dev.c:4866 >> tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919 >> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043 >> call_write_iter include/linux/fs.h:1871 [inline] >> new_sync_write fs/read_write.c:491 [inline] >> vfs_write+0x650/0xe40 fs/read_write.c:584 >> ksys_write+0x12f/0x250 fs/read_write.c:637 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> >> xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 >> ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But >> tun_get_user() still provides an execution path with do_xdp_generic() >> and exceed XDP limits for packet size. I added this check and maybe it is too strict. XDP can work on higher order pages, as long as this is contiguous physical memory (e.g. a page). And An order 5 page (131072 bytes) seems excessive, but maybe TUN have a use-case for having such large packets? (Question to Ahern?) I'm considering we should change the size-limit to order-2 (16384) or order-3 (32768). Order-3 because netstack have: #define SKB_FRAG_PAGE_ORDER get_order(32768) And order-2 because netstack have: SKB_MAX_ALLOC (16KiB) - See discussion in commit 6306c1189e77 ("bpf: Remove MTU check in __bpf_skb_max_len"). - https://git.kernel.org/torvalds/c/6306c1189e77 >> >> Using the syzkaller repro with reduced packet size it was also >> discovered that XDP_PACKET_HEADROOM is not checked in >> tun_can_build_skb(), although pad may be incremented in >> tun_build_skb(). >> >> If we move the limit check from tun_can_build_skb() to tun_build_skb() >> we will make xdp to be used only in tun_build_skb(), without falling >> in tun_alloc_skb(), etc. And moreover we will drop the packet which >> can't be processed in tun_build_skb(). Looking at tun_build_skb() is uses the page_frag system, and can thus create up-to SKB_FRAG_PAGE_ORDER (size 32768 / order-3). >> >> Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com >> Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/ >> Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f >> Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") >> Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> >> --- >> >> Notes: >> V2 -> V3: >> * attach the forgotten changelog >> V1 -> V2: >> * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with >> syzkaller repro and missing XDP_PACKET_HEADROOM in pad >> * changed the title and description of the execution path, suggested >> by Jason Wang <jasowang@redhat.com> >> * move the limit check from tun_can_build_skb() to tun_build_skb() to >> remove duplication and locking issue, and also drop the packet in >> case of a failed check - noted by Jason Wang <jasowang@redhat.com> > > Acked-by: Jason Wang <jasowang@redhat.com> > > Thanks > >> >> drivers/net/tun.c | 7 +++---- >> 1 file changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >> index d75456adc62a..7c2b05ce0421 100644 >> --- a/drivers/net/tun.c >> +++ b/drivers/net/tun.c >> @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, >> if (zerocopy) >> return false; >> >> - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + >> - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) >> - return false; >> - >> return true; >> } >> >> @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, >> buflen += SKB_DATA_ALIGN(len + pad); >> rcu_read_unlock(); >> >> + if (buflen > PAGE_SIZE) >> + return ERR_PTR(-EFAULT); Concretely I'm saying maybe use SKB_FRAG_PAGE_ORDER "size" here? e.g. create SKB_FRAG_PAGE_SIZE define as below. if (buflen > SKB_FRAG_PAGE_SIZE) diff --git a/include/net/sock.h b/include/net/sock.h index 656ea89f60ff..4c4b3c257b52 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2886,7 +2886,8 @@ extern int sysctl_optmem_max; extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; -#define SKB_FRAG_PAGE_ORDER get_order(32768) +#define SKB_FRAG_PAGE_SIZE 32768 +#define SKB_FRAG_PAGE_ORDER get_order(SKB_FRAG_PAGE_SIZE) DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); >> + >> alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); >> if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) >> return ERR_PTR(-ENOMEM); --Jesper
On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote: > Cc. John and Ahern > > On 26/07/2023 04.09, Jason Wang wrote: >> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner >> <andrew.kanner@gmail.com> wrote: >>> >>> Syzkaller reported the following issue: >>> ======================================= >>> Too BIG xdp->frame_sz = 131072 > > Is this a contiguous physical memory allocation? > > 131072 bytes equal order 5 page. > > Looking at tun.c code I cannot find a code path that could create > order-5 skb->data, but only SKB with order-0 fragments. But I guess it > is the netif_receive_generic_xdp() what will realloc to make this linear > (via skb_linearize()) get_tun_user is passed an iov_iter with a single segment of 65007 total_len. The alloc_skb path is hit with an align size of only 64. That is insufficient for XDP so the netif_receive_generic_xdp hits the pskb_expand_head path. Something is off in the math in netif_receive_generic_xdp resulting in the skb markers being off. That causes bpf_prog_run_generic_xdp to compute the wrong frame_sz. > >>> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 >>> ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline] >>> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 >>> bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103 >>> ... >>> Call Trace: >>> <TASK> >>> bpf_prog_4add87e5301a4105+0x1a/0x1c >>> __bpf_prog_run include/linux/filter.h:600 [inline] >>> bpf_prog_run_xdp include/linux/filter.h:775 [inline] >>> bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721 >>> netif_receive_generic_xdp net/core/dev.c:4807 [inline] >>> do_xdp_generic+0x35c/0x770 net/core/dev.c:4866 >>> tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919 >>> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043 >>> call_write_iter include/linux/fs.h:1871 [inline] >>> new_sync_write fs/read_write.c:491 [inline] >>> vfs_write+0x650/0xe40 fs/read_write.c:584 >>> ksys_write+0x12f/0x250 fs/read_write.c:637 >>> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >>> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 >>> entry_SYSCALL_64_after_hwframe+0x63/0xcd >>> >>> xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 >>> ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But >>> tun_get_user() still provides an execution path with do_xdp_generic() >>> and exceed XDP limits for packet size. > > I added this check and maybe it is too strict. XDP can work on higher > order pages, as long as this is contiguous physical memory (e.g. a > page). And > > An order 5 page (131072 bytes) seems excessive, but maybe TUN have a > use-case for having such large packets? (Question to Ahern?) > > I'm considering we should change the size-limit to order-2 (16384) or > order-3 (32768). > > Order-3 because netstack have: > #define SKB_FRAG_PAGE_ORDER get_order(32768) > > And order-2 because netstack have: SKB_MAX_ALLOC (16KiB) > - See discussion in commit 6306c1189e77 ("bpf: Remove MTU check in > __bpf_skb_max_len"). > - https://git.kernel.org/torvalds/c/6306c1189e77 > > >>> >>> Using the syzkaller repro with reduced packet size it was also >>> discovered that XDP_PACKET_HEADROOM is not checked in >>> tun_can_build_skb(), although pad may be incremented in >>> tun_build_skb(). >>> >>> If we move the limit check from tun_can_build_skb() to tun_build_skb() >>> we will make xdp to be used only in tun_build_skb(), without falling >>> in tun_alloc_skb(), etc. And moreover we will drop the packet which >>> can't be processed in tun_build_skb(). > > Looking at tun_build_skb() is uses the page_frag system, and can thus > create up-to SKB_FRAG_PAGE_ORDER (size 32768 / order-3). > >>> >>> Reported-and-tested-by: >>> syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com >>> Closes: >>> https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/ >>> Link: >>> https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f >>> Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") >>> Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> >>> --- >>> >>> Notes: >>> V2 -> V3: >>> * attach the forgotten changelog >>> V1 -> V2: >>> * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with >>> syzkaller repro and missing XDP_PACKET_HEADROOM in pad >>> * changed the title and description of the execution path, >>> suggested >>> by Jason Wang <jasowang@redhat.com> >>> * move the limit check from tun_can_build_skb() to >>> tun_build_skb() to >>> remove duplication and locking issue, and also drop the packet in >>> case of a failed check - noted by Jason Wang >>> <jasowang@redhat.com> >> >> Acked-by: Jason Wang <jasowang@redhat.com> >> >> Thanks >> >>> >>> drivers/net/tun.c | 7 +++---- >>> 1 file changed, 3 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>> index d75456adc62a..7c2b05ce0421 100644 >>> --- a/drivers/net/tun.c >>> +++ b/drivers/net/tun.c >>> @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct >>> tun_struct *tun, struct tun_file *tfile, >>> if (zerocopy) >>> return false; >>> >>> - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + >>> - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) >>> - return false; >>> - >>> return true; >>> } >>> >>> @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct >>> tun_struct *tun, >>> buflen += SKB_DATA_ALIGN(len + pad); >>> rcu_read_unlock(); >>> >>> + if (buflen > PAGE_SIZE) >>> + return ERR_PTR(-EFAULT); > > Concretely I'm saying maybe use SKB_FRAG_PAGE_ORDER "size" here? > > e.g. create SKB_FRAG_PAGE_SIZE define as below. > if (buflen > SKB_FRAG_PAGE_SIZE) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 656ea89f60ff..4c4b3c257b52 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -2886,7 +2886,8 @@ extern int sysctl_optmem_max; > extern __u32 sysctl_wmem_default; > extern __u32 sysctl_rmem_default; > > -#define SKB_FRAG_PAGE_ORDER get_order(32768) > +#define SKB_FRAG_PAGE_SIZE 32768 > +#define SKB_FRAG_PAGE_ORDER get_order(SKB_FRAG_PAGE_SIZE) > DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); > >>> + >>> alloc_frag->offset = ALIGN((u64)alloc_frag->offset, >>> SMP_CACHE_BYTES); >>> if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, >>> GFP_KERNEL))) >>> return ERR_PTR(-ENOMEM); > > --Jesper >
On 7/26/23 1:37 PM, David Ahern wrote: > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote: >> Cc. John and Ahern >> >> On 26/07/2023 04.09, Jason Wang wrote: >>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner >>> <andrew.kanner@gmail.com> wrote: >>>> >>>> Syzkaller reported the following issue: >>>> ======================================= >>>> Too BIG xdp->frame_sz = 131072 >> >> Is this a contiguous physical memory allocation? >> >> 131072 bytes equal order 5 page. >> >> Looking at tun.c code I cannot find a code path that could create >> order-5 skb->data, but only SKB with order-0 fragments. But I guess it >> is the netif_receive_generic_xdp() what will realloc to make this linear >> (via skb_linearize()) > > > get_tun_user is passed an iov_iter with a single segment of 65007 > total_len. The alloc_skb path is hit with an align size of only 64. That > is insufficient for XDP so the netif_receive_generic_xdp hits the > pskb_expand_head path. Something is off in the math in > netif_receive_generic_xdp resulting in the skb markers being off. That > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz. BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB allocation. But the 128kB part is not relevant to the "bug" here really. The warn on getting tripped in bpf_xdp_adjust_tail is because xdp generic path is skb based and can have a frame_sz > 4kB. That's what the splat is about. Perhaps the solution is to remove the WARN_ON.
On Thu, Jul 27, 2023 at 8:27 AM David Ahern <dsahern@gmail.com> wrote: > > On 7/26/23 1:37 PM, David Ahern wrote: > > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote: > >> Cc. John and Ahern > >> > >> On 26/07/2023 04.09, Jason Wang wrote: > >>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner > >>> <andrew.kanner@gmail.com> wrote: > >>>> > >>>> Syzkaller reported the following issue: > >>>> ======================================= > >>>> Too BIG xdp->frame_sz = 131072 > >> > >> Is this a contiguous physical memory allocation? > >> > >> 131072 bytes equal order 5 page. > >> > >> Looking at tun.c code I cannot find a code path that could create > >> order-5 skb->data, but only SKB with order-0 fragments. But I guess it > >> is the netif_receive_generic_xdp() what will realloc to make this linear > >> (via skb_linearize()) > > > > > > get_tun_user is passed an iov_iter with a single segment of 65007 > > total_len. The alloc_skb path is hit with an align size of only 64. That > > is insufficient for XDP so the netif_receive_generic_xdp hits the > > pskb_expand_head path. Something is off in the math in > > netif_receive_generic_xdp resulting in the skb markers being off. That > > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz. > > > BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB > allocation. But the 128kB part is not relevant to the "bug" here really. > > The warn on getting tripped in bpf_xdp_adjust_tail is because xdp > generic path is skb based and can have a frame_sz > 4kB. That's what the > splat is about. Other possibility: tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up with producing a frame_sz which is greater than PAGE_SIZE as well in tun_build_skb(). And rethink this patch, it looks wrong since it basically drops all packets whose buflen is greater than PAGE_SIZE since it can't fall back to tun_alloc_skb(). > > Perhaps the solution is to remove the WARN_ON. Yes, that is what I'm asking if this warning still makes sense in V1. Thanks > >
On Thu, 2023-07-27 at 14:07 +0800, Jason Wang wrote: > On Thu, Jul 27, 2023 at 8:27 AM David Ahern <dsahern@gmail.com> wrote: > > > > On 7/26/23 1:37 PM, David Ahern wrote: > > > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote: > > > > Cc. John and Ahern > > > > > > > > On 26/07/2023 04.09, Jason Wang wrote: > > > > > On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner > > > > > <andrew.kanner@gmail.com> wrote: > > > > > > > > > > > > Syzkaller reported the following issue: > > > > > > ======================================= > > > > > > Too BIG xdp->frame_sz = 131072 > > > > > > > > Is this a contiguous physical memory allocation? > > > > > > > > 131072 bytes equal order 5 page. > > > > > > > > Looking at tun.c code I cannot find a code path that could create > > > > order-5 skb->data, but only SKB with order-0 fragments. But I guess it > > > > is the netif_receive_generic_xdp() what will realloc to make this linear > > > > (via skb_linearize()) > > > > > > > > > get_tun_user is passed an iov_iter with a single segment of 65007 > > > total_len. The alloc_skb path is hit with an align size of only 64. That > > > is insufficient for XDP so the netif_receive_generic_xdp hits the > > > pskb_expand_head path. Something is off in the math in > > > netif_receive_generic_xdp resulting in the skb markers being off. That > > > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz. > > > > > > BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB > > allocation. But the 128kB part is not relevant to the "bug" here really. > > > > The warn on getting tripped in bpf_xdp_adjust_tail is because xdp > > generic path is skb based and can have a frame_sz > 4kB. That's what the > > splat is about. > > Other possibility: > > tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up > with producing a frame_sz which is greater than PAGE_SIZE as well in > tun_build_skb(). > > And rethink this patch, it looks wrong since it basically drops all > packets whose buflen is greater than PAGE_SIZE since it can't fall > back to tun_alloc_skb(). > > > > > Perhaps the solution is to remove the WARN_ON. > > Yes, that is what I'm asking if this warning still makes sense in V1. I understand the consensus is solving the issue by changing/removing the WARN_ON() in XDP. I think it makes sense, I guess the same warn can be reached via packet socket xmit on veth or similar topology. Cheers, Paolo
On 27/07/2023 11.30, Paolo Abeni wrote: > On Thu, 2023-07-27 at 14:07 +0800, Jason Wang wrote: >> On Thu, Jul 27, 2023 at 8:27 AM David Ahern <dsahern@gmail.com> wrote: >>> >>> On 7/26/23 1:37 PM, David Ahern wrote: >>>> On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote: >>>>> Cc. John and Ahern >>>>> >>>>> On 26/07/2023 04.09, Jason Wang wrote: >>>>>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner >>>>>> <andrew.kanner@gmail.com> wrote: >>>>>>> >>>>>>> Syzkaller reported the following issue: >>>>>>> ======================================= >>>>>>> Too BIG xdp->frame_sz = 131072 >>>>> >>>>> Is this a contiguous physical memory allocation? >>>>> >>>>> 131072 bytes equal order 5 page. >>>>> >>>>> Looking at tun.c code I cannot find a code path that could create >>>>> order-5 skb->data, but only SKB with order-0 fragments. But I guess it >>>>> is the netif_receive_generic_xdp() what will realloc to make this linear >>>>> (via skb_linearize()) >>>> >>>> >>>> get_tun_user is passed an iov_iter with a single segment of 65007 >>>> total_len. The alloc_skb path is hit with an align size of only 64. That >>>> is insufficient for XDP so the netif_receive_generic_xdp hits the >>>> pskb_expand_head path. Something is off in the math in >>>> netif_receive_generic_xdp resulting in the skb markers being off. That >>>> causes bpf_prog_run_generic_xdp to compute the wrong frame_sz. >>> >>> >>> BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB >>> allocation. But the 128kB part is not relevant to the "bug" here really. >>> True, it is another "bug"/unexpected-behavior that SKB gets reallocated to be 128KiB. We should likely solve this in another patch. >>> The warn on getting tripped in bpf_xdp_adjust_tail is because xdp >>> generic path is skb based and can have a frame_sz > 4kB. That's what the >>> splat is about. Agree, that the warn condition should be changed, even removed. It is interesting that this warn caught this unexpected-behavior of expanding to 128KiB. >> >> Other possibility: >> >> tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up >> with producing a frame_sz which is greater than PAGE_SIZE as well in >> tun_build_skb(). True, and the way I read the tun_build_skb() code, via skb_page_frag_refill(), it can produce an SKB with data size (buflen) upto order-3 = 32KiB (SKB_FRAG_PAGE_ORDER). Thus, the existing check in tun_can_build_skb() for PAGE_SIZE can/should be relaxed? (Please correct me as I don't fully understand tun_get_user() code) >> >> And rethink this patch, it looks wrong since it basically drops all >> packets whose buflen is greater than PAGE_SIZE since it can't fall >> back to tun_alloc_skb(). >> I agree, this is why I reacted, as this version of the patch could potentially cause issues and packet drops. >>> >>> Perhaps the solution is to remove the WARN_ON. >> >> Yes, that is what I'm asking if this warning still makes sense in V1. > > I understand the consensus is solving the issue by changing/removing > the WARN_ON() in XDP. I think it makes sense, I guess the same warn can > be reached via packet socket xmit on veth or similar topology. > Yes, we can completely remove this check. The original intend was to catch cases where XDP drivers have not been updated to use xdp.frame_sz, but that is not longer a concern (since xdp_init_buff). It was added (by me) in commit: - c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size") - v5.8-rc1 - as part of merge 5cc5924d8315 I'm sure it is safe to remove since commit: - 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine") - v5.12-rc1 where we introduced xdp_init_buff() helper, which all XDP driver use today. Question is what "Fixes:" tag should the patch have? To Andrew, will you (1) send a new patch that removes this check instead? (2) have cycles to investigate why the unexpected-behavior of expanding to 128KiB happens? --Jesper
On Thu, Jul 27, 2023 at 01:13:10PM +0200, Jesper Dangaard Brouer wrote: > > > On 27/07/2023 11.30, Paolo Abeni wrote: > > On Thu, 2023-07-27 at 14:07 +0800, Jason Wang wrote: > > > On Thu, Jul 27, 2023 at 8:27 AM David Ahern <dsahern@gmail.com> wrote: > > > > > > > > On 7/26/23 1:37 PM, David Ahern wrote: > > > > > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote: > > > > > > Cc. John and Ahern > > > > > > > > > > > > On 26/07/2023 04.09, Jason Wang wrote: > > > > > > > On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner > > > > > > > <andrew.kanner@gmail.com> wrote: > > > > > > > > > > > > > > > > Syzkaller reported the following issue: > > > > > > > > ======================================= > > > > > > > > Too BIG xdp->frame_sz = 131072 > > > > > > > > > > > > Is this a contiguous physical memory allocation? > > > > > > > > > > > > 131072 bytes equal order 5 page. > > > > > > > > > > > > Looking at tun.c code I cannot find a code path that could create > > > > > > order-5 skb->data, but only SKB with order-0 fragments. But I guess it > > > > > > is the netif_receive_generic_xdp() what will realloc to make this linear > > > > > > (via skb_linearize()) > > > > > > > > > > > > > > > get_tun_user is passed an iov_iter with a single segment of 65007 > > > > > total_len. The alloc_skb path is hit with an align size of only 64. That > > > > > is insufficient for XDP so the netif_receive_generic_xdp hits the > > > > > pskb_expand_head path. Something is off in the math in > > > > > netif_receive_generic_xdp resulting in the skb markers being off. That > > > > > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz. > > > > > > > > > > > > BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB > > > > allocation. But the 128kB part is not relevant to the "bug" here really. > > > > > > True, it is another "bug"/unexpected-behavior that SKB gets reallocated > to be 128KiB. We should likely solve this in another patch. > > > > > The warn on getting tripped in bpf_xdp_adjust_tail is because xdp > > > > generic path is skb based and can have a frame_sz > 4kB. That's what the > > > > splat is about. > > Agree, that the warn condition should be changed, even removed. > It is interesting that this warn caught this unexpected-behavior of > expanding to 128KiB. > > > > > > > Other possibility: > > > > > > tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up > > > with producing a frame_sz which is greater than PAGE_SIZE as well in > > > tun_build_skb(). > > True, and the way I read the tun_build_skb() code, via > skb_page_frag_refill(), > it can produce an SKB with data size (buflen) upto order-3 = 32KiB > (SKB_FRAG_PAGE_ORDER). > > Thus, the existing check in tun_can_build_skb() for PAGE_SIZE can/should be > relaxed? > (Please correct me as I don't fully understand tun_get_user() code) > > > > > > > And rethink this patch, it looks wrong since it basically drops all > > > packets whose buflen is greater than PAGE_SIZE since it can't fall > > > back to tun_alloc_skb(). > > > > > I agree, this is why I reacted, as this version of the patch could > potentially cause issues and packet drops. > > > > > > > > > Perhaps the solution is to remove the WARN_ON. > > > > > > Yes, that is what I'm asking if this warning still makes sense in V1. > > > > I understand the consensus is solving the issue by changing/removing > > the WARN_ON() in XDP. I think it makes sense, I guess the same warn can > > be reached via packet socket xmit on veth or similar topology. > > > > Yes, we can completely remove this check. The original intend was to > catch cases where XDP drivers have not been updated to use xdp.frame_sz, > but that is not longer a concern (since xdp_init_buff). > > It was added (by me) in commit: > - c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size") > - v5.8-rc1 > - as part of merge 5cc5924d8315 > > I'm sure it is safe to remove since commit: > - 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine") > - v5.12-rc1 > > where we introduced xdp_init_buff() helper, which all XDP driver use today. > Question is what "Fixes:" tag should the patch have? > > To Andrew, will you > (1) send a new patch that removes this check instead? > (2) have cycles to investigate why the unexpected-behavior of > expanding to 128KiB happens? > > --Jesper > Thanks, everyone. If we summarize the discussion - there are 3 issues here: 1. tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM (minor and most trivial) 2. WARN_ON_ONCE from net/core/filter.c, which may be too strict / not needed at all. 3. strange behaviour with reallocationg SKB (65007 -> 131072) I can check these issues. I have to dive a little deeper with 2-3, most likely with kgdb and syzkaller repro. But seems this is not somewhat urgent and lives quite a long time without being noticed. BTW: Attached the ftrace logs using the original syzkaller repro (starting with tun_get_user()). They answer Jesper's question about contiguous physical memory allocation (kmem_cache_alloc_node() / kmalloc_reserve()). But I'll check it one more time before submitting a new PATCH V4 or another patch / patch series.
On 7/27/23 5:48 PM, Andrew Kanner wrote: > > Thanks, everyone. > > If we summarize the discussion - there are 3 issues here: > 1. tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM (minor and > most trivial) > 2. WARN_ON_ONCE from net/core/filter.c, which may be too strict / not > needed at all. > 3. strange behaviour with reallocationg SKB (65007 -> 131072) I believe that happens because of the current skb size and the need to expand it to account for the XDP headroom makes the allocation go over 64kB. Since tun is given the packet via a write call there are no header markers to allocate separate space for headers and data (e.g. like TCP does with 32kB data segments). > > I can check these issues. I have to dive a little deeper with 2-3, > most likely with kgdb and syzkaller repro. But seems this is not > somewhat urgent and lives quite a long time without being noticed. > > BTW: Attached the ftrace logs using the original syzkaller repro > (starting with tun_get_user()). They answer Jesper's question about > contiguous physical memory allocation (kmem_cache_alloc_node() / > kmalloc_reserve()). But I'll check it one more time before submitting > a new PATCH V4 or another patch / patch series. >
On Thu, Jul 27, 2023 at 06:11:57PM -0600, David Ahern wrote: > On 7/27/23 5:48 PM, Andrew Kanner wrote: > > > > Thanks, everyone. > > > > If we summarize the discussion - there are 3 issues here: > > 1. tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM (minor and > > most trivial) > > 2. WARN_ON_ONCE from net/core/filter.c, which may be too strict / not > > needed at all. > > 3. strange behaviour with reallocationg SKB (65007 -> 131072) > > I believe that happens because of the current skb size and the need to > expand it to account for the XDP headroom makes the allocation go over > 64kB. Since tun is given the packet via a write call there are no header > markers to allocate separate space for headers and data (e.g. like TCP > does with 32kB data segments). Yes, this is exactly what you suspected. In pskb_expand_head() -> kmalloc_reserve() I have these values initially: (gdb) p *size $13 = 65408 (gdb) p obj_size $16 = 65728 and it will do: data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); ... obj_size = SKB_HEAD_ALIGN(*size); ... *size = obj_size = kmalloc_size_roundup(obj_size); (gdb) p *size $22 = 131072 So this is kmalloc_size_roundup() doing this math with the following: /* Above the smaller buckets, size is a multiple of page size. */ │ if (size > KMALLOC_MAX_CACHE_SIZE) │ return PAGE_SIZE << get_order(size); > > > > I can check these issues. I have to dive a little deeper with 2-3, > > most likely with kgdb and syzkaller repro. But seems this is not > > somewhat urgent and lives quite a long time without being noticed. > > > > BTW: Attached the ftrace logs using the original syzkaller repro > > (starting with tun_get_user()). They answer Jesper's question about > > contiguous physical memory allocation (kmem_cache_alloc_node() / > > kmalloc_reserve()). But I'll check it one more time before submitting > > a new PATCH V4 or another patch / patch series. > > > I see no other bugs in math, so not sure wether it should be fixed. Is it ok and expected to roundup the memory allocation?
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d75456adc62a..7c2b05ce0421 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, if (zerocopy) return false; - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) - return false; - return true; } @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, buflen += SKB_DATA_ALIGN(len + pad); rcu_read_unlock(); + if (buflen > PAGE_SIZE) + return ERR_PTR(-EFAULT); + alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) return ERR_PTR(-ENOMEM);