[bpf,v3] net/xdp: fix zero-size allocation warning in xskq_create()

Message ID 20231005193548.515-1-andrew.kanner@gmail.com
State New
Headers
Series [bpf,v3] net/xdp: fix zero-size allocation warning in xskq_create() |

Commit Message

Andrew Kanner Oct. 5, 2023, 7:35 p.m. UTC
  Syzkaller reported the following issue:
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361)
 Modules linked in:
 CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12
 Hardware name: Generic DT based system
 unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258)
 show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
 dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680)
 __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700)
 warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3))
 __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478)
 vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40)
 xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286)
 xsk_setsockopt from __sys_setsockopt (net/socket.c:2308)
 __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68)

xskq_get_ring_size() uses struct_size() macro to safely calculate the
size of struct xsk_queue and q->nentries of desc members. But the
syzkaller repro was able to set q->nentries with the value initially
taken from copy_from_sockptr() high enough to return SIZE_MAX by
struct_size(). The next PAGE_ALIGN(size) is such case will overflow
the size_t value and set it to 0. This will trigger WARN_ON_ONCE in
vmalloc_user() -> __vmalloc_node_range().

The issue is reproducible on 32-bit arm kernel.

Reported-and-tested-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c84b4705fb31741e@google.com/T/
Link: https://syzkaller.appspot.com/bug?extid=fae676d3cf469331fc89
Reported-by: syzbot+b132693e925cbbd89e26@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e20df20606ebab4f@google.com/T/
Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
---

Notes (akanner):
    v3:
      - free kzalloc-ed memory before return, the leak was noticed by
        Daniel Borkmann <daniel@iogearbox.net>
    v2: https://lore.kernel.org/all/20231002222939.1519-1-andrew.kanner@gmail.com/raw
      - use unlikely() optimization for the case with SIZE_MAX return from
        struct_size(), suggested by Alexander Lobakin
        <aleksander.lobakin@intel.com>
      - cc-ed 4 more maintainers, mentioned by cc_maintainers patchwork
        test
    
    v1: https://lore.kernel.org/all/20230928204440.543-1-andrew.kanner@gmail.com/T/
      - RFC notes:
        It was found that net/xdp/xsk.c:xsk_setsockopt() uses
        copy_from_sockptr() to get the number of entries (int) for cases
        with XDP_RX_RING / XDP_TX_RING and XDP_UMEM_FILL_RING /
        XDP_UMEM_COMPLETION_RING.
    
        Next in xsk_init_queue() there're 2 sanity checks (entries == 0)
        and (!is_power_of_2(entries)) for which -EINVAL will be returned.
    
        After that net/xdp/xsk_queue.c:xskq_create() will calculate the
        size multipling the number of entries (int) with the size of u64,
        at least.
    
        I wonder if there should be the upper bound (e.g. the 3rd sanity
        check inside xsk_init_queue()). It seems that without the upper
        limit it's quiet easy to overflow the allocated size (SIZE_MAX),
        especially for 32-bit architectures, for example arm nodes which
        were used by the syzkaller.
    
        In this patch I added a naive check for SIZE_MAX which helped to
        skip zero-size allocation after overflow, but maybe it's not quite
        right. Please, suggest if you have any thoughts about the
        appropriate limit for the size of these xdp rings.
    
        PS: the initial number of entries is 0x20000000 in syzkaller
        repro: syscall(__NR_setsockopt, (intptr_t)r[0], 0x11b, 3,
        0x20000040, 0x20);
    
        Link:
        https://syzkaller.appspot.com/text?tag=ReproC&x=10910f18280000

 net/xdp/xsk_queue.c | 5 +++++
 1 file changed, 5 insertions(+)
  

Comments

patchwork-bot+netdevbpf@kernel.org Oct. 6, 2023, 12:40 a.m. UTC | #1
Hello:

This patch was applied to bpf/bpf.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Thu,  5 Oct 2023 22:35:49 +0300 you wrote:
> Syzkaller reported the following issue:
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361)
>  Modules linked in:
>  CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12
>  Hardware name: Generic DT based system
>  unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258)
>  show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
>  dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680)
>  __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700)
>  warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3))
>  __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478)
>  vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40)
>  xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286)
>  xsk_setsockopt from __sys_setsockopt (net/socket.c:2308)
>  __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68)
> 
> [...]

Here is the summary with links:
  - [bpf,v3] net/xdp: fix zero-size allocation warning in xskq_create()
    https://git.kernel.org/bpf/bpf/c/90aeaa99f53e

You are awesome, thank you!
  
Martin KaFai Lau Oct. 6, 2023, 1 a.m. UTC | #2
On 10/5/23 12:35 PM, Andrew Kanner wrote:
> Syzkaller reported the following issue:
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361)
>   Modules linked in:
>   CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12
>   Hardware name: Generic DT based system
>   unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258)
>   show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
>   dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680)
>   __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700)
>   warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3))
>   __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478)
>   vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40)
>   xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286)
>   xsk_setsockopt from __sys_setsockopt (net/socket.c:2308)
>   __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68)
> 
> xskq_get_ring_size() uses struct_size() macro to safely calculate the
> size of struct xsk_queue and q->nentries of desc members. But the
> syzkaller repro was able to set q->nentries with the value initially
> taken from copy_from_sockptr() high enough to return SIZE_MAX by
> struct_size(). The next PAGE_ALIGN(size) is such case will overflow
> the size_t value and set it to 0. This will trigger WARN_ON_ONCE in

Please ignore the pw-bot email. A question just came to my mind after applying.

> diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
> index f8905400ee07..c7e8bbb12752 100644
> --- a/net/xdp/xsk_queue.c
> +++ b/net/xdp/xsk_queue.c
> @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
>   	q->ring_mask = nentries - 1;
>   
>   	size = xskq_get_ring_size(q, umem_queue);
> +	if (unlikely(size == SIZE_MAX)) {

What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below?

> +		kfree(q);
> +		return NULL;
> +	}
> +
>   	size = PAGE_ALIGN(size);
>   
>   	q->ring = vmalloc_user(size);
  
Andrew Kanner Oct. 6, 2023, 7:09 a.m. UTC | #3
On Thu, Oct 05, 2023 at 06:00:46PM -0700, Martin KaFai Lau wrote:
[...]
> > diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
> > index f8905400ee07..c7e8bbb12752 100644
> > --- a/net/xdp/xsk_queue.c
> > +++ b/net/xdp/xsk_queue.c
> > @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
> >   	q->ring_mask = nentries - 1;
> >   	size = xskq_get_ring_size(q, umem_queue);
> > +	if (unlikely(size == SIZE_MAX)) {
> 
> What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below?
> 
> > +		kfree(q);
> > +		return NULL;
> > +	}
> > +
> >   	size = PAGE_ALIGN(size);
> >   	q->ring = vmalloc_user(size);
> 

I asked myself the same question before v1. E.g. thinking about the
check: (size > SIZE_MAX - PAGE_SIZE + 1)

But xskq_create() is called after the check for
!is_power_of_2(entries) in xsk_init_queue(). So I tried the same
reproducer and divided the (nentries) value by 2 in a loop - it hits
either SIZE_MAX case or the normal cases without overflow (sometimes
throwing vmalloc error complaining about size which exceed total pages
in my arm setup).

So I can't see a way size will be SIZE_MAX-1, etc. Correct me if I'm
wrong, please.

PS: In the output below the first 2 values of (nentries) hit SIZE_MAX
case, the rest hit the normal case, vmalloc_user() is complaining
about 1 allocation:

0x20000000
0x10000000
0x8000000
[   41.759195][ T2807] pre PAGE_ALIGN size = 2147483968 (0x80000140), PAGE_SIZE = 4096 (0x1000)
[   41.759621][ T2807] repro-iter: vmalloc error: size 2147487744, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[...]
0x4000000
0x2000000
0x1000000
0x800000
0x400000
0x200000
0x100000
0x80000
0x40000
0x20000
0x10000
0x8000
0x4000
0x2000
0x1000
0x800
0x400
0x200
0x100
0x80
0x40
0x20
0x10
0x8
0x4
0x2
  
Martin KaFai Lau Oct. 6, 2023, 5:37 p.m. UTC | #4
On 10/6/23 12:09 AM, Andrew Kanner wrote:
> On Thu, Oct 05, 2023 at 06:00:46PM -0700, Martin KaFai Lau wrote:
> [...]
>>> diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
>>> index f8905400ee07..c7e8bbb12752 100644
>>> --- a/net/xdp/xsk_queue.c
>>> +++ b/net/xdp/xsk_queue.c
>>> @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
>>>    	q->ring_mask = nentries - 1;
>>>    	size = xskq_get_ring_size(q, umem_queue);
>>> +	if (unlikely(size == SIZE_MAX)) {
>>
>> What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below?
>>
>>> +		kfree(q);
>>> +		return NULL;
>>> +	}
>>> +
>>>    	size = PAGE_ALIGN(size);
>>>    	q->ring = vmalloc_user(size);
>>
> 
> I asked myself the same question before v1. E.g. thinking about the
> check: (size > SIZE_MAX - PAGE_SIZE + 1)
> 
> But xskq_create() is called after the check for
> !is_power_of_2(entries) in xsk_init_queue(). So I tried the same
> reproducer and divided the (nentries) value by 2 in a loop - it hits
> either SIZE_MAX case or the normal cases without overflow (sometimes
> throwing vmalloc error complaining about size which exceed total pages
> in my arm setup).
> 
> So I can't see a way size will be SIZE_MAX-1, etc. Correct me if I'm
> wrong, please.
> 
> PS: In the output below the first 2 values of (nentries) hit SIZE_MAX

Thanks for the explanation, so iiuc it means it will overflow the struct_size() 
first because of the is_power_of_2(nentries) requirement? Could you help adding 
some comment to explain? Thanks.

> case, the rest hit the normal case, vmalloc_user() is complaining
> about 1 allocation:
> 
> 0x20000000
> 0x10000000
> 0x8000000
> [   41.759195][ T2807] pre PAGE_ALIGN size = 2147483968 (0x80000140), PAGE_SIZE = 4096 (0x1000)
> [   41.759621][ T2807] repro-iter: vmalloc error: size 2147487744, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
> [...]
> 0x4000000
> 0x2000000
> 0x1000000
> 0x800000
> 0x400000
> 0x200000
> 0x100000
> 0x80000
> 0x40000
> 0x20000
> 0x10000
> 0x8000
> 0x4000
> 0x2000
> 0x1000
> 0x800
> 0x400
> 0x200
> 0x100
> 0x80
> 0x40
> 0x20
> 0x10
> 0x8
> 0x4
> 0x2
>
  
Andrew Kanner Oct. 6, 2023, 11:24 p.m. UTC | #5
On Fri, Oct 06, 2023 at 10:37:44AM -0700, Martin KaFai Lau wrote:
[...] 
> > > What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below?
> > > 
> > > > +		kfree(q);
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > >    	size = PAGE_ALIGN(size);
> > > >    	q->ring = vmalloc_user(size);
> > > 
> > 
> > I asked myself the same question before v1. E.g. thinking about the
> > check: (size > SIZE_MAX - PAGE_SIZE + 1)
> > 
> > But xskq_create() is called after the check for
> > !is_power_of_2(entries) in xsk_init_queue(). So I tried the same
> > reproducer and divided the (nentries) value by 2 in a loop - it hits
> > either SIZE_MAX case or the normal cases without overflow (sometimes
> > throwing vmalloc error complaining about size which exceed total pages
> > in my arm setup).
> > 
> > So I can't see a way size will be SIZE_MAX-1, etc. Correct me if I'm
> > wrong, please.
> > 
> > PS: In the output below the first 2 values of (nentries) hit SIZE_MAX
> 
> Thanks for the explanation, so iiuc it means it will overflow the
> struct_size() first because of the is_power_of_2(nentries) requirement?
> Could you help adding some comment to explain? Thanks.
>

The overflow happens because there's no upper limit for nentries
(userspace input). Let me add more context, e.g. from net/xdp/xsk.c:

static int xsk_setsockopt(struct socket *sock, int level, int optname,
                          sockptr_t optval, unsigned int optlen)
{
[...]
                if (copy_from_sockptr(&entries, optval, sizeof(entries)))
                        return -EFAULT;
[...]
                err = xsk_init_queue(entries, q, false);
[...]
}

'entries' is passed to xsk_init_queue() and there're 2 checks: for 0
and is_power_of_2() only, no upper bound check:

static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
                          bool umem_queue)
{
        struct xsk_queue *q;

        if (entries == 0 || *queue || !is_power_of_2(entries))
                return -EINVAL;

        q = xskq_create(entries, umem_queue);
        if (!q)
                return -ENOMEM;
[...]
}

The 'entries' value is next passed to struct_size() in
net/xdp/xsk_queue.c. If it's large enough - SIZE_MAX will be returned.

I'm not sure if some appropriate limit for the size of XDP_RX_RING /
XDP_TX_RING and XDP_UMEM_FILL_RING / XDP_UMEM_COMPLETION_RING rings
should be used. But anyway, vmalloc() will tell if it's not ok with
the requested allocation size.
  
Martin KaFai Lau Oct. 6, 2023, 11:58 p.m. UTC | #6
On 10/6/23 4:24 PM, Andrew Kanner wrote:
>> Thanks for the explanation, so iiuc it means it will overflow the
>> struct_size() first because of the is_power_of_2(nentries) requirement?
>> Could you help adding some comment to explain? Thanks.
>>
> The overflow happens because there's no upper limit for nentries
> (userspace input). Let me add more context, e.g. from net/xdp/xsk.c:
> 
> static int xsk_setsockopt(struct socket *sock, int level, int optname,
>                            sockptr_t optval, unsigned int optlen)
> {
> [...]
>                  if (copy_from_sockptr(&entries, optval, sizeof(entries)))
>                          return -EFAULT;
> [...]
>                  err = xsk_init_queue(entries, q, false);
> [...]
> }
> 
> 'entries' is passed to xsk_init_queue() and there're 2 checks: for 0
> and is_power_of_2() only, no upper bound check:
> 
> static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
>                            bool umem_queue)
> {
>          struct xsk_queue *q;
> 
>          if (entries == 0 || *queue || !is_power_of_2(entries))
>                  return -EINVAL;
> 
>          q = xskq_create(entries, umem_queue);
>          if (!q)
>                  return -ENOMEM;
> [...]
> }
> 
> The 'entries' value is next passed to struct_size() in
> net/xdp/xsk_queue.c. If it's large enough - SIZE_MAX will be returned.

All make sense. I was mostly asking to add a comment at the "if (unlikely(size 
== SIZE_MAX)" check to explain this details on why checking SIZE_MAX is enough.
  
Andrew Kanner Oct. 7, 2023, 6:56 a.m. UTC | #7
On Fri, Oct 06, 2023 at 04:58:18PM -0700, Martin KaFai Lau wrote:
> On 10/6/23 4:24 PM, Andrew Kanner wrote:
> > > Thanks for the explanation, so iiuc it means it will overflow the
> > > struct_size() first because of the is_power_of_2(nentries) requirement?
> > > Could you help adding some comment to explain? Thanks.
> > > 
> > The overflow happens because there's no upper limit for nentries
> > (userspace input). Let me add more context, e.g. from net/xdp/xsk.c:
> > 
> > static int xsk_setsockopt(struct socket *sock, int level, int optname,
> >                            sockptr_t optval, unsigned int optlen)
> > {
> > [...]
> >                  if (copy_from_sockptr(&entries, optval, sizeof(entries)))
> >                          return -EFAULT;
> > [...]
> >                  err = xsk_init_queue(entries, q, false);
> > [...]
> > }
> > 
> > 'entries' is passed to xsk_init_queue() and there're 2 checks: for 0
> > and is_power_of_2() only, no upper bound check:
> > 
> > static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
> >                            bool umem_queue)
> > {
> >          struct xsk_queue *q;
> > 
> >          if (entries == 0 || *queue || !is_power_of_2(entries))
> >                  return -EINVAL;
> > 
> >          q = xskq_create(entries, umem_queue);
> >          if (!q)
> >                  return -ENOMEM;
> > [...]
> > }
> > 
> > The 'entries' value is next passed to struct_size() in
> > net/xdp/xsk_queue.c. If it's large enough - SIZE_MAX will be returned.
> 
> All make sense. I was mostly asking to add a comment at the "if
> (unlikely(size == SIZE_MAX)" check to explain this details on why checking
> SIZE_MAX is enough.

Ok, I got it. Will add in v4.
Thanks.
  

Patch

diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index f8905400ee07..c7e8bbb12752 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -34,6 +34,11 @@  struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
 	q->ring_mask = nentries - 1;
 
 	size = xskq_get_ring_size(q, umem_queue);
+	if (unlikely(size == SIZE_MAX)) {
+		kfree(q);
+		return NULL;
+	}
+
 	size = PAGE_ALIGN(size);
 
 	q->ring = vmalloc_user(size);