skbuff: Reallocate to ksize() in __build_skb_around()

Message ID 20221206231659.never.929-kees@kernel.org
State New
Series skbuff: Reallocate to ksize() in __build_skb_around()

Commit Message

Kees Cook Dec. 6, 2022, 11:17 p.m. UTC
  When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checks are informed of the true size. For example, syzkaller reported:

  BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
  Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

This was triggered by bpf_prog_test_run_skb(), which passes a
kmalloc()ed buffer to build_skb().

Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: pepsipu <soopthegoop@gmail.com>
Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: ast@kernel.org
Cc: bpf <bpf@vger.kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: jolsa@kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: martin.lau@linux.dev
Cc: Stanislav Fomichev <sdf@google.com>
Cc: song@kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/core/skbuff.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
  

Comments

Jakub Kicinski Dec. 7, 2022, 1:55 a.m. UTC | #1
On Tue,  6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> -	unsigned int size = frag_size ? : ksize(data);
> +	unsigned int size = frag_size;
> +
> +	/* When frag_size == 0, the buffer came from kmalloc, so we
> +	 * must find its true allocation size (and grow it to match).
> +	 */
> +	if (unlikely(size == 0)) {
> +		void *resized;
> +
> +		size = ksize(data);
> +	/* krealloc() will immediately return "data" when
> +	 * "ksize(data)" is requested: it is the existing upper
> +	 * bound. As a result, GFP_ATOMIC will be ignored.
> +		 */
> +		resized = krealloc(data, size, GFP_ATOMIC);
> +		if (WARN_ON(resized != data))
> +			data = resized;
> +	}
>  

Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
using kmalloc()'ed heads is large because GRO can't free the metadata.
So we end up carrying per-MTU skbs across to the application and then
freeing them one by one. With pages we just aggregate up to 64k of data
in a single skb.

I can only grep out 3 cases of build_skb(.. 0), could we instead
convert them into a new build_skb_slab(), and handle all the silliness
in such a new helper? That'd be a win both for the memory safety and one
fewer branch for the fast path.

I think it's worth doing, so LMK if you're okay to do this extra work,
otherwise I can help (unless e.g. Eric tells me I'm wrong..).
  
Kees Cook Dec. 7, 2022, 3:47 a.m. UTC | #2
On December 6, 2022 5:55:57 PM PST, Jakub Kicinski <kuba@kernel.org> wrote:
>On Tue,  6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
>> -	unsigned int size = frag_size ? : ksize(data);
>> +	unsigned int size = frag_size;
>> +
>> +	/* When frag_size == 0, the buffer came from kmalloc, so we
>> +	 * must find its true allocation size (and grow it to match).
>> +	 */
>> +	if (unlikely(size == 0)) {
>> +		void *resized;
>> +
>> +		size = ksize(data);
>> +	/* krealloc() will immediately return "data" when
>> +	 * "ksize(data)" is requested: it is the existing upper
>> +	 * bound. As a result, GFP_ATOMIC will be ignored.
>> +		 */
>> +		resized = krealloc(data, size, GFP_ATOMIC);
>> +		if (WARN_ON(resized != data))
>> +			data = resized;
>> +	}
>>  
>
>Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
>using kmalloc()'ed heads is large because GRO can't free the metadata.
>So we end up carrying per-MTU skbs across to the application and then
>freeing them one by one. With pages we just aggregate up to 64k of data
>in a single skb.

This isn't changed by this patch, though? The users of kmalloc+build_skb are pre-existing.

>I can only grep out 3 cases of build_skb(.. 0), could we instead
>convert them into a new build_skb_slab(), and handle all the silliness
>in such a new helper? That'd be a win both for the memory safety and one
>fewer branch for the fast path.

When I went through callers, it was many more than 3. Regardless, I don't see the point: my patch has no more branches than the original code. (In fact, it may actually be faster: the initial assignment is now unconditional, and a zero-test-after-assign is almost free, whereas before it tested before the assign.) And now it's marked as unlikely to keep it out of line.

>I think it's worth doing, so LMK if you're okay to do this extra work,
>otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I had been changing callers to round up (e.g. bnx2), but it seemed like centralizing this makes more sense. I don't think a different helper will clean this up.

-Kees
  
Jakub Kicinski Dec. 7, 2022, 4:04 a.m. UTC | #3
On Tue, 06 Dec 2022 19:47:13 -0800 Kees Cook wrote:
> >Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
> >using kmalloc()'ed heads is large because GRO can't free the metadata.
> >So we end up carrying per-MTU skbs across to the application and then
> >freeing them one by one. With pages we just aggregate up to 64k of data
> >in a single skb.  
> 
> This isn't changed by this patch, though? The users of
> kmalloc+build_skb are pre-existing.

Yes.

> >I can only grep out 3 cases of build_skb(.. 0), could we instead
> >convert them into a new build_skb_slab(), and handle all the silliness
> >in such a new helper? That'd be a win both for the memory safety and one
> >fewer branch for the fast path.  
> 
> When I went through callers, it was many more than 3. Regardless, I
> don't see the point: my patch has no more branches than the original
> code. (In fact, it may actually be faster: the initial assignment is
> now unconditional, and a zero-test-after-assign is almost free,
> whereas before it tested before the assign.) And now it's marked as
> unlikely to keep it out of line.

Maybe.

> >I think it's worth doing, so LMK if you're okay to do this extra
> >work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).  
> 
> I had been changing callers to round up (e.g. bnx2), but it seemed
> like centralizing this makes more sense. I don't think a different
> helper will clean this up.

It's a combination of the fact that I think "0 is magic" falls in 
the "garbage" category of APIs, and the fact that driver developers
have many things to worry about, so they often don't know that using
slab is a bad idea. So I want a helper out of the normal path, where 
I can put a kdoc warning that says "if you're doing this - GRO will
suck, use page frags".
  
Vlastimil Babka Dec. 7, 2022, 9:19 a.m. UTC | #4
On 12/7/22 00:17, Kees Cook wrote:
> When build_skb() is passed a frag_size of 0, it means the buffer came
> from kmalloc. In these cases, ksize() is used to find its actual size,
> but since the allocation may not have been made to that size, actually
> perform the krealloc() call so that all the associated buffer size
> checks are informed of the true size. For example, syzkaller reported:
> 
>   BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
>   Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
> 
> This was triggered by bpf_prog_test_run_skb(), which passes a
> kmalloc()ed buffer to build_skb().

Weren't all such kmalloc() users converted to kmalloc_size_roundup() to
prevent this?

> Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
> Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
> Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Pavel Begunkov <asml.silence@gmail.com>
> Cc: pepsipu <soopthegoop@gmail.com>
> Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: kasan-dev <kasan-dev@googlegroups.com>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: ast@kernel.org
> Cc: bpf <bpf@vger.kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: jolsa@kernel.org
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: martin.lau@linux.dev
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: song@kernel.org
> Cc: Yonghong Song <yhs@fb.com>
> Cc: netdev@vger.kernel.org
> Cc: LKML <linux-kernel@vger.kernel.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  net/core/skbuff.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1d9719e72f9d..b55d061ed8b4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
>  			       unsigned int frag_size)
>  {
>  	struct skb_shared_info *shinfo;
> -	unsigned int size = frag_size ? : ksize(data);
> +	unsigned int size = frag_size;
> +
> +	/* When frag_size == 0, the buffer came from kmalloc, so we
> +	 * must find its true allocation size (and grow it to match).
> +	 */
> +	if (unlikely(size == 0)) {
> +		void *resized;
> +
> +		size = ksize(data);
> +	/* krealloc() will immediately return "data" when
> +	 * "ksize(data)" is requested: it is the existing upper
> +	 * bound. As a result, GFP_ATOMIC will be ignored.
> +		 */
> +		resized = krealloc(data, size, GFP_ATOMIC);
> +		if (WARN_ON(resized != data))

WARN_ON_ONCE() could be sufficient as either this is impossible to hit by
definition, or something went very wrong (a patch screwed ksize/krealloc?)
and it can be hit many times?

> +			data = resized;

In that "impossible" case, this could also end up as NULL due to GFP_ATOMIC
allocation failure, but maybe it's really impractical to do anything about it...

> +	}
>  
>  	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
  
Eric Dumazet Dec. 7, 2022, 10:30 a.m. UTC | #5
On Wed, Dec 7, 2022 at 2:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> > -     unsigned int size = frag_size ? : ksize(data);
> > +     unsigned int size = frag_size;
> > +
> > +     /* When frag_size == 0, the buffer came from kmalloc, so we
> > +      * must find its true allocation size (and grow it to match).
> > +      */
> > +     if (unlikely(size == 0)) {
> > +             void *resized;
> > +
> > +             size = ksize(data);
> > +             /* krealloc() will immediately return "data" when
> > +              * "ksize(data)" is requested: it is the existing upper
> > +              * bound. As a result, GFP_ATOMIC will be ignored.
> > +              */
> > +             resized = krealloc(data, size, GFP_ATOMIC);
> > +             if (WARN_ON(resized != data))
> > +                     data = resized;
> > +     }
> >
>
> Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
> using kmalloc()'ed heads is large because GRO can't free the metadata.
> So we end up carrying per-MTU skbs across to the application and then
> freeing them one by one. With pages we just aggregate up to 64k of data
> in a single skb.
>
> I can only grep out 3 cases of build_skb(.. 0), could we instead
> convert them into a new build_skb_slab(), and handle all the silliness
> in such a new helper? That'd be a win both for the memory safety and one
> fewer branch for the fast path.
>
> I think it's worth doing, so LMK if you're okay to do this extra work,
> otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I totally agree. I would indeed remove the ksize() use completely:
let callers give us the size, and the head_frag boolean, instead of
inferring them from size == 0.
  

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1d9719e72f9d..b55d061ed8b4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -274,7 +274,23 @@  static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
 {
 	struct skb_shared_info *shinfo;
-	unsigned int size = frag_size ? : ksize(data);
+	unsigned int size = frag_size;
+
+	/* When frag_size == 0, the buffer came from kmalloc, so we
+	 * must find its true allocation size (and grow it to match).
+	 */
+	if (unlikely(size == 0)) {
+		void *resized;
+
+		size = ksize(data);
+	/* krealloc() will immediately return "data" when
+	 * "ksize(data)" is requested: it is the existing upper
+	 * bound. As a result, GFP_ATOMIC will be ignored.
+		 */
+		resized = krealloc(data, size, GFP_ATOMIC);
+		if (WARN_ON(resized != data))
+			data = resized;
+	}
 
 	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));