Message ID | 20221101040440.3637007-1-zhongbaisong@huawei.com
State | New
From | Baisong Zhong <zhongbaisong@huawei.com>
To | edumazet@google.com, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc | linux-kernel@vger.kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org, zhongbaisong@huawei.com, ast@kernel.org, song@kernel.org, yhs@fb.com, haoluo@google.com
Date | Tue, 1 Nov 2022 12:04:40 +0800
Subject | [PATCH -next] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()
Series | [-next] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()
Commit Message
Baisong Zhong
Nov. 1, 2022, 4:04 a.m. UTC
Recently, we got a syzkaller-reported problem: an aarch64 alignment
fault when KFENCE is enabled.

When the size passed in from the user bpf program is an odd number,
like 399, 407, etc., it causes a misaligned access to the skb's
shared info, as seen below:
BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
__lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
__skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
skb_clone+0xf4/0x214 net/core/skbuff.c:1481
____bpf_clone_redirect net/core/filter.c:2433 [inline]
bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
bpf_prog_d3839dd9068ceb51+0x80/0x330
bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
__do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
__se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512
allocated by task 15074 on cpu 0 at 1342.585390s:
kmalloc include/linux/slab.h:568 [inline]
kzalloc include/linux/slab.h:675 [inline]
bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
__do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
__se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
__arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
To fix the problem, round up the allocation with kmalloc_size_roundup()
so that build_skb()'s use of ksize() is always aligned and no special
handling of the memory is needed by KFENCE.
Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
---
net/bpf/test_run.c | 1 +
1 file changed, 1 insertion(+)
Comments
[ +kfence folks ]

On 11/1/22 5:04 AM, Baisong Zhong wrote:
> [...]
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 13d578ce2a09..058b67108873 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
>  	if (user_size > size)
>  		return ERR_PTR(-EMSGSIZE);
>
> +	size = kmalloc_size_roundup(size);
>  	data = kzalloc(size + headroom + tailroom, GFP_USER);

The fact that you need to do this roundup on call sites feels broken, no?
Was there some discussion / consensus that now all k*alloc() call sites
would need to be fixed up? Couldn't this be done transparently in k*alloc()
when KFENCE is enabled? I presume there may be lots of other such occasions
in the kernel where similar issue triggers, fixing up all call-sites feels
like ton of churn compared to api-internal, generic fix.

>  	if (!data)
>  		return ERR_PTR(-ENOMEM);

Thanks,
Daniel
On 2022/11/2 0:45, Daniel Borkmann wrote:
> [ +kfence folks ]
> [...]

+ cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov

Do you have any suggestions about this problem?

Thanks.
On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
> On 2022/11/2 0:45, Daniel Borkmann wrote:
> > [ +kfence folks ]
>
> + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
>
> Do you have any suggestions about this problem?
> [...]

+ Kees who has been sending similar patches for drivers
On Tue, Nov 01, 2022 at 09:05:42PM -0700, Jakub Kicinski wrote:
> + Kees who has been sending similar patches for drivers
> [...]
> > > The fact that you need to do this roundup on call sites feels broken, no?
> > > Was there some discussion / consensus that now all k*alloc() call sites
> > > would need to be fixed up? Couldn't this be done transparently in k*alloc()
> > > when KFENCE is enabled? I presume there may be lots of other such occasions
> > > in the kernel where similar issue triggers, fixing up all call-sites feels
> > > like ton of churn compared to api-internal, generic fix.

I hope I answer this in more detail here:
https://lore.kernel.org/lkml/202211010937.4631CB1B0E@keescook/

The problem is that ksize() should never have existed in the first
place. :P Every runtime bounds checker has tripped over it, and with
the addition of the __alloc_size attribute, I had to start ripping
ksize() out: it can't be used to pretend an allocation grew in size.
Things need to either preallocate more or go through *realloc() like
everything else. Luckily, ksize() is rare.

FWIW, the above fix doesn't look correct to me -- I would expect this to
be:

	size_t alloc_size;
	...
	alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
	data = kzalloc(alloc_size, GFP_USER);
On Tue, Nov 1, 2022 at 9:27 PM Kees Cook <keescook@chromium.org> wrote:
> [...]
> FWIW, the above fix doesn't look correct to me -- I would expect this to
> be:
>
> 	size_t alloc_size;
> 	...
> 	alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
> 	data = kzalloc(alloc_size, GFP_USER);

Making sure the struct skb_shared_info is aligned to a cache line does
not need kmalloc_size_roundup().

What is needed is to adjust @size so that (@size + @headroom) is a
multiple of SMP_CACHE_BYTES.
On 2022/11/2 12:37, Eric Dumazet wrote:
> [...]
> Making sure the struct skb_shared_info is aligned to a cache line does
> not need kmalloc_size_roundup().
>
> What is needed is to adjust @size so that (@size + @headroom) is a
> multiple of SMP_CACHE_BYTES

ok, I'll fix it and send v2.

Thanks.
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..058b67108873 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 	if (user_size > size)
 		return ERR_PTR(-EMSGSIZE);
 
+	size = kmalloc_size_roundup(size);
 	data = kzalloc(size + headroom + tailroom, GFP_USER);
 	if (!data)
 		return ERR_PTR(-ENOMEM);