net: sched: fix memory leak in tcindex_set_parms

Message ID 20221031060835.11722-1-yin31149@gmail.com
State New
Series net: sched: fix memory leak in tcindex_set_parms

Commit Message

Hawkins Jiawei Oct. 31, 2022, 6:08 a.m. UTC
  Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
  comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
    [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
    [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
    [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
    [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
    [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
    [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
    [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
    [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
    [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
    [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
    [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
    [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
    [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
    [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
    [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================

Kernel will uses tcindex_change() to change an existing
traffic-control-indices filter properties. During the
process of changing, kernel will clears the old
traffic-control-indices filter result, and updates it
by RCU assigning new traffic-control-indices data.

Yet the problem is that, kernel will clears the old
traffic-control-indices filter result, without destroying
its tcf_exts structure, which triggers the above
memory leak.

This patch solves it by using tcf_exts_destroy() to
destroy the tcf_exts structure in old
traffic-control-indices filter result.

Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
 net/sched/cls_tcindex.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
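The leak described in the commit message can be reproduced in a small userspace model. This is a sketch under stated assumptions, not kernel code. `struct model_exts`, `struct model_result`, `model_result_init()`, `model_result_reinit()` and the `live_arrays` counter are hypothetical stand-ins for `struct tcf_exts`, the tcindex filter result, `tcindex_filter_result_init()` and kmemleak. The model assumes that initialising the exts allocates an action array, and that re-initialising the result begins with a `memset()`. The 256-byte object in the report is consistent with `kcalloc()` of 32 pointers on a 64-bit build, assuming `TCA_ACT_MAX_PRIO` is 32.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct tcf_exts: owns a heap-allocated action array. */
struct model_exts {
	void **actions;
};

/* Hypothetical stand-in for the tcindex filter result. */
struct model_result {
	unsigned int classid;
	struct model_exts exts;
};

/* Number of action arrays currently allocated (a crude kmemleak). */
static int live_arrays;

static int model_exts_init(struct model_exts *e)
{
	assert(e != NULL);
	/* Assumed to mirror the kcalloc() in tcf_exts_init(); 32 slots assumed. */
	e->actions = calloc(32, sizeof(*e->actions));
	if (!e->actions)
		return -1;
	live_arrays++;
	return 0;
}

static void model_exts_destroy(struct model_exts *e)
{
	if (e->actions) {
		free(e->actions);
		e->actions = NULL;
		live_arrays--;
	}
}

/*
 * Shaped like tcindex_filter_result_init(): wipe the result, then init exts.
 * If r->exts.actions was already allocated, the memset() drops the only
 * pointer to it, which is the leak.
 */
static int model_result_init(struct model_result *r)
{
	memset(r, 0, sizeof(*r));
	return model_exts_init(&r->exts);
}

/* The fix in spirit: release the old exts before wiping the result. */
static int model_result_reinit(struct model_result *r)
{
	model_exts_destroy(&r->exts);
	return model_result_init(r);
}
```

Calling `model_result_init()` twice on the same result leaves one array unreachable, while going through `model_result_reinit()` does not. The model only captures the ownership bug; it says nothing about concurrent RCU readers, which the review below focuses on.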
  

Comments

Jakub Kicinski Nov. 3, 2022, 3:26 a.m. UTC | #1
On Mon, 31 Oct 2022 14:08:35 +0800 Hawkins Jiawei wrote:
> Kernel will uses tcindex_change() to change an existing

s/will//

> traffic-control-indices filter properties. During the
> process of changing, kernel will clears the old

s/will//

> traffic-control-indices filter result, and updates it
> by RCU assigning new traffic-control-indices data.
> 
> Yet the problem is that, kernel will clears the old

s/will//

> traffic-control-indices filter result, without destroying
> its tcf_exts structure, which triggers the above
> memory leak.
> 
> This patch solves it by using tcf_exts_destroy() to
> destroy the tcf_exts structure in old
> traffic-control-indices filter result.
> 

Please provide a Fixes tag to where the problem was introduced 
(or the initial git commit).

> Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> ---
>  net/sched/cls_tcindex.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..dc872a794337 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -338,6 +338,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	struct tcf_result cr = {};
>  	int err, balloc = 0;
>  	struct tcf_exts e;
> +#ifdef CONFIG_NET_CLS_ACT
> +	struct tcf_exts old_e = {};
> +#endif

Why all the ifdefs?

>  	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
>  	if (err < 0)
> @@ -479,6 +482,14 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	}
>  
>  	if (old_r && old_r != r) {
> +#ifdef CONFIG_NET_CLS_ACT
> +		/* r->exts is not copied from old_r->exts, and
> +		 * the following code will clears the old_r, so
> +		 * we need to destroy it after updating the tp->root,
> +		 * to avoid memory leak bug.
> +		 */
> +		old_e = old_r->exts;
> +#endif

Can't you localize all the changes to this if block?

Maybe add a function called tcindex_filter_result_reinit()
which will act more appropriately?

>  		err = tcindex_filter_result_init(old_r, cp, net);
>  		if (err < 0) {
>  			kfree(f);
> @@ -510,6 +521,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  		tcf_exts_destroy(&new_filter_result.exts);
>  	}
>  
> +#ifdef CONFIG_NET_CLS_ACT
> +	tcf_exts_destroy(&old_e);
> +#endif
>  	if (oldp)
>  		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
>  	return 0;
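The "Why all the ifdefs?" question points at a common kernel idiom: keep the configuration `#ifdef` inside the helper once, so every caller can invoke it unconditionally. The sketch below illustrates that idiom in userspace. `MODEL_CLS_ACT`, `struct model_exts`, `model_exts_destroy()` and `model_caller()` are hypothetical names. That `tcf_exts_destroy()` already compiles to a no-op without `CONFIG_NET_CLS_ACT` is an assumption here, worth confirming against net/sched/cls_api.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical switch standing in for CONFIG_NET_CLS_ACT. */
#define MODEL_CLS_ACT 1

struct model_exts {
#if MODEL_CLS_ACT
	void **actions;
	int nr_actions;
#endif
	int action;	/* fields present in every configuration */
	int police;
};

/* The #if lives here, once, instead of at every call site. */
static void model_exts_destroy(struct model_exts *e)
{
	assert(e != NULL);
#if MODEL_CLS_ACT
	free(e->actions);	/* free(NULL) is a no-op */
	e->actions = NULL;
	e->nr_actions = 0;
#endif
}

/* The caller needs no #ifdef: this builds with MODEL_CLS_ACT set to 0 or 1. */
static void model_caller(void)
{
	struct model_exts old_e = {0};	/* zeroed, so destroying it is harmless */

	model_exts_destroy(&old_e);
}
```

With this shape, a zero-initialised local such as `old_e` can be destroyed on every path without guarding either its declaration or the destroy call.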
  
Hawkins Jiawei Nov. 3, 2022, 4:07 p.m. UTC | #2
Hi Jakub,
On Thu, 3 Nov 2022 at 11:26, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 31 Oct 2022 14:08:35 +0800 Hawkins Jiawei wrote:
> > Kernel will uses tcindex_change() to change an existing
>
> s/will//
>
> > traffic-control-indices filter properties. During the
> > process of changing, kernel will clears the old
>
> s/will//
>
> > traffic-control-indices filter result, and updates it
> > by RCU assigning new traffic-control-indices data.
> >
> > Yet the problem is that, kernel will clears the old
>
> s/will//
Thanks for the suggestion. I will amend these in the v2 patch.

>
> > traffic-control-indices filter result, without destroying
> > its tcf_exts structure, which triggers the above
> > memory leak.
> >
> > This patch solves it by using tcf_exts_destroy() to
> > destroy the tcf_exts structure in old
> > traffic-control-indices filter result.
> >
>
> Please provide a Fixes tag to where the problem was introduced
> (or the initial git commit).
Thanks for the reminder. It seems that the problem was introduced by
commit b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()"),
because it was in this commit that the kernel began allocating a struct
tcf_exts for the new traffic-control-indices filter result in
tcindex_alloc_perfect_hash().

I will add the tag in the v2 patch.

>
> > Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> > Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> > ---
> >  net/sched/cls_tcindex.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..dc872a794337 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -338,6 +338,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >       struct tcf_result cr = {};
> >       int err, balloc = 0;
> >       struct tcf_exts e;
> > +#ifdef CONFIG_NET_CLS_ACT
> > +     struct tcf_exts old_e = {};
> > +#endif
>
> Why all the ifdefs?
Thanks for the suggestion. It seems these ifdefs are not needed;
I will remove them in the v2 patch.

>
> >       err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> >       if (err < 0)
> > @@ -479,6 +482,14 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >       }
> >
> >       if (old_r && old_r != r) {
> > +#ifdef CONFIG_NET_CLS_ACT
> > +             /* r->exts is not copied from old_r->exts, and
> > +              * the following code will clears the old_r, so
> > +              * we need to destroy it after updating the tp->root,
> > +              * to avoid memory leak bug.
> > +              */
> > +             old_e = old_r->exts;
> > +#endif
>
> Can't you localize all the changes to this if block?
>
> Maybe add a function called tcindex_filter_result_reinit()
> which will act more appropriately?
I think we shouldn't put the tcf_exts_destroy(&old_e)
into this if block, or other RCU readers may dereference the
freed memory (please correct me if I am wrong).

So I put the tcf_exts_destroy(&old_e) next to the tcindex
destroy work, after the RCU update.

>
> >               err = tcindex_filter_result_init(old_r, cp, net);
> >               if (err < 0) {
> >                       kfree(f);
> > @@ -510,6 +521,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >               tcf_exts_destroy(&new_filter_result.exts);
> >       }
> >
> > +#ifdef CONFIG_NET_CLS_ACT
> > +     tcf_exts_destroy(&old_e);
> > +#endif
> >       if (oldp)
> >               tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
> >       return 0;
  
Jakub Kicinski Nov. 4, 2022, 2:23 a.m. UTC | #3
On Fri,  4 Nov 2022 00:07:00 +0800 Hawkins Jiawei wrote:
> > Can't you localize all the changes to this if block?
> >
> > Maybe add a function called tcindex_filter_result_reinit()
> > which will act more appropriately?  
> 
> I think we shouldn't put the tcf_exts_destroy(&old_e)
> into this if block, or other RCU readers may derefer the
> freed memory (Please correct me If I am wrong).
> 
> So I put the tcf_exts_destroy(&old_e) near the tcindex 
> destroy work, after the RCU updateing.

I'm not sure what this code is trying to do, to be honest.
Your concern that there may be a concurrent reader is valid,
but then again tcindex_filter_result_init() just wipes the
entire structure with a memset() so concurrent readers are
already likely broken?

Maybe tcindex_filter_result_init() dates back to times when
exts were a list (see commit 22dc13c837c) and calling 
tcf_exts_init() wasn't that different than cleaning it up?
In other words this code is trying to destroy old_r, not
reinitialize it?

> >  
> > >               err = tcindex_filter_result_init(old_r, cp, net);
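Jakub's observation is that `tcindex_filter_result_init()` wipes `old_r` in place, so a concurrent reader holding `old_r` can see a reset result no matter when the exts are freed. The single-threaded sketch below contrasts an in-place reset with the copy-then-publish pattern RCU expects. `struct model_result`, `reset_in_place()` and `replace_by_copy()` are hypothetical, and a "reader" is modelled simply as a pointer obtained earlier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct model_result {
	unsigned int classid;
	unsigned int flags;
};

/*
 * In-place reset, shaped like the memset() in tcindex_filter_result_init().
 * Anyone already holding a pointer to r sees the zeroed fields immediately.
 */
static void reset_in_place(struct model_result *r)
{
	memset(r, 0, sizeof(*r));
}

/*
 * Copy-update: build a new object and return it for publication. The old
 * object stays intact for existing readers and must be freed only later.
 */
static struct model_result *replace_by_copy(const struct model_result *old,
					    unsigned int classid)
{
	struct model_result *n;

	assert(old != NULL);
	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	*n = *old;
	n->classid = classid;
	return n;
}
```

The in-place variant changes what an existing reader observes; the copy variant leaves the old object untouched until it is retired.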
  
Hawkins Jiawei Nov. 5, 2022, 2:11 p.m. UTC | #4
On Fri, 4 Nov 2022 at 10:23, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri,  4 Nov 2022 00:07:00 +0800 Hawkins Jiawei wrote:
> > > Can't you localize all the changes to this if block?
> > >
> > > Maybe add a function called tcindex_filter_result_reinit()
> > > which will act more appropriately? 
> >
> > I think we shouldn't put the tcf_exts_destroy(&old_e)
> > into this if block, or other RCU readers may derefer the
> > freed memory (Please correct me If I am wrong).
> >
> > So I put the tcf_exts_destroy(&old_e) near the tcindex
> > destroy work, after the RCU updateing.
>
> I'm not sure what this code is trying to do, to be honest.
> Your concern that there may be a concurrent reader is valid,
> but then again tcindex_filter_result_init() just wipes the
> entire structure with a memset() so concurrent readers are
> already likely broken?
>
> Maybe tcindex_filter_result_init() dates back to times when
> exts were a list (see commit 22dc13c837c) and calling
> tcf_exts_init() wasn't that different than cleaning it up?
> In other words this code is trying to destroy old_r, not
> reinitialize it?
Yes, I also think this code is just trying to destroy old_r.

In my opinion, the context here is roughly this: some of the filter's
properties have been changed, so the kernel should drop the old filter
result and install a new one.

Until the kernel finishes the RCU update, concurrent readers should
see either an empty result (cleared by tcindex_filter_result_init())
or a valid old result.

I think this did not trigger a memory leak before commit b9a24bb76bf6
("net_sched: properly handle failure case of tcf_exts_init()"),
because the new filter result still used old_r->exts.

After that commit, however, the kernel allocates a new struct tcf_exts
for the new filter result in tcindex_alloc_perfect_hash(), so cleaning
old_r without destroying its own, separately allocated struct tcf_exts
leaks memory.

As for the patch, I think we should free this struct tcf_exts after
the RCU update, so that concurrent readers can only see an empty result
or a valid old result before the update finishes (please correct me if
I am wrong).
>
> > > 
> > > >               err = tcindex_filter_result_init(old_r, cp, net);
  
Cong Wang Nov. 5, 2022, 7:50 p.m. UTC | #5
On Mon, Oct 31, 2022 at 02:08:35PM +0800, Hawkins Jiawei wrote:
> Syzkaller reports a memory leak as follows:
> ====================================
> BUG: memory leak
> unreferenced object 0xffff88810c287f00 (size 256):
>   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
>     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
>     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
>     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
>     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
>     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
>     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
>     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
>     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
>     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
>     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
>     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
>     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
>     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
>     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
>     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
>     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
>     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
>     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
>     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
>     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> ====================================
> 
> Kernel will uses tcindex_change() to change an existing
> traffic-control-indices filter properties. During the
> process of changing, kernel will clears the old
> traffic-control-indices filter result, and updates it
> by RCU assigning new traffic-control-indices data.
> 
> Yet the problem is that, kernel will clears the old
> traffic-control-indices filter result, without destroying
> its tcf_exts structure, which triggers the above
> memory leak.
> 
> This patch solves it by using tcf_exts_destroy() to
> destroy the tcf_exts structure in old
> traffic-control-indices filter result.

So... your patch can be just the following one-liner, right?


diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..00a6c04a4b42 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (old_r && old_r != r) {
+		tcf_exts_destroy(&old_r->exts);
 		err = tcindex_filter_result_init(old_r, cp, net);
 		if (err < 0) {
 			kfree(f);
  
Hawkins Jiawei Nov. 6, 2022, 2:55 p.m. UTC | #6
Hi Cong,

On Sun, 6 Nov 2022 at 03:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 02:08:35PM +0800, Hawkins Jiawei wrote:
> > Syzkaller reports a memory leak as follows:
> > ====================================
> > BUG: memory leak
> > unreferenced object 0xffff88810c287f00 (size 256):
> >   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
> >   hex dump (first 32 bytes):
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
> >     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
> >     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
> >     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
> >     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
> >     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
> >     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
> >     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
> >     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
> >     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
> >     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
> >     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
> >     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
> >     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
> >     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
> >     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
> >     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
> >     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
> >     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
> >     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
> >     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
> >     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> >     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > ====================================
> >
> > Kernel will uses tcindex_change() to change an existing
> > traffic-control-indices filter properties. During the
> > process of changing, kernel will clears the old
> > traffic-control-indices filter result, and updates it
> > by RCU assigning new traffic-control-indices data.
> >
> > Yet the problem is that, kernel will clears the old
> > traffic-control-indices filter result, without destroying
> > its tcf_exts structure, which triggers the above
> > memory leak.
> >
> > This patch solves it by using tcf_exts_destroy() to
> > destroy the tcf_exts structure in old
> > traffic-control-indices filter result.
>
> So... your patch can be just the following one-liner, right?

Yes, as you and Jakub point out, all the ifdefs can be removed,
and I will refactor this in the v2 patch.

>
>
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..00a6c04a4b42 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>         }
>
>         if (old_r && old_r != r) {
> +               tcf_exts_destroy(&old_r->exts);
>                 err = tcindex_filter_result_init(old_r, cp, net);
>                 if (err < 0) {
>                         kfree(f);

As for the position of tcf_exts_destroy(), should we
call it after the RCU update, i.e. after
`rcu_assign_pointer(tp->root, cp)`?

Otherwise, concurrent RCU readers may dereference this freed memory
(please correct me if I am wrong).
  
Cong Wang Nov. 6, 2022, 5:49 p.m. UTC | #7
On Sun, Nov 06, 2022 at 10:55:31PM +0800, Hawkins Jiawei wrote:
> Hi Cong,
> 
> >
> >
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..00a6c04a4b42 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >         }
> >
> >         if (old_r && old_r != r) {
> > +               tcf_exts_destroy(&old_r->exts);
> >                 err = tcindex_filter_result_init(old_r, cp, net);
> >                 if (err < 0) {
> >                         kfree(f);
> 
> As for the position of the tcf_exts_destroy(), should we
> call it after the RCU updating, after
> `rcu_assign_pointer(tp->root, cp)` ?
> 
> Or the concurrent RCU readers may derefer this freed memory
> (Please correct me If I am wrong).

I don't think so, because we already have tcf_exts_change() in multiple
places within tcindex_set_parms(). Even if this is really a problem,
moving it after rcu_assign_pointer() does not help, you need to wait for
a grace period.

Thanks.
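Cong's point, that `rcu_assign_pointer()` alone does not make the free safe, can be shown with a crude userspace model. This is neither liburcu nor the kernel API: `model_read_lock()`, `model_read_unlock()`, `model_publish()` and `model_grace_period_elapsed()` are hypothetical. A single global reader counter is used as a conservative stand-in for RCU read-side critical sections; unlike real RCU, it also waits for readers that started after the update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
	int value;
};

static _Atomic(struct node *) root;	/* plays the role of tp->root */
static atomic_int active_readers;	/* crude read-side section tracking */

/* Enter a read-side section and snapshot the currently published pointer. */
static struct node *model_read_lock(void)
{
	atomic_fetch_add(&active_readers, 1);
	return atomic_load(&root);
}

static void model_read_unlock(void)
{
	assert(atomic_load(&active_readers) > 0);
	atomic_fetch_sub(&active_readers, 1);
}

/* Publish the new object and hand back the old one, still unfreed. */
static struct node *model_publish(struct node *new_node)
{
	return atomic_exchange(&root, new_node);
}

/* The old object may be freed only once no reader that could hold it remains. */
static bool model_grace_period_elapsed(void)
{
	return atomic_load(&active_readers) == 0;
}
```

A reader that snapshotted the old pointer before `model_publish()` still holds it afterwards, so freeing immediately after publication would be a use-after-free. Blocking until the grace period has elapsed corresponds roughly to `synchronize_rcu()` in the kernel.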
  
Hawkins Jiawei Nov. 7, 2022, 4 p.m. UTC | #8
On Mon, 7 Nov 2022 at 01:49, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sun, Nov 06, 2022 at 10:55:31PM +0800, Hawkins Jiawei wrote:
> > Hi Cong,
> >
> > >
> > >
> > > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > > index 1c9eeb98d826..00a6c04a4b42 100644
> > > --- a/net/sched/cls_tcindex.c
> > > +++ b/net/sched/cls_tcindex.c
> > > @@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > >         }
> > >
> > >         if (old_r && old_r != r) {
> > > +               tcf_exts_destroy(&old_r->exts);
> > >                 err = tcindex_filter_result_init(old_r, cp, net);
> > >                 if (err < 0) {
> > >                         kfree(f);
> >
> > As for the position of the tcf_exts_destroy(), should we
> > call it after the RCU updating, after
> > `rcu_assign_pointer(tp->root, cp)` ?
> >
> > Or the concurrent RCU readers may derefer this freed memory
> > (Please correct me If I am wrong).
>
> I don't think so, because we already have tcf_exts_change() in multiple
> places within tcindex_set_parms(). Even if this is really a problem,

Do you mean that, if this were a problem, these tcf_exts_change()
calls should already have triggered a use-after-free? (Please correct
me if I got this wrong.)

But it seems that these tcf_exts_change() calls don't destroy old_r,
so they don't face the concurrency problem above.

I found two tcf_exts_change() calls in tcindex_set_parms().
One is

	oldp = p;
	r->res = cr;
	tcf_exts_change(&r->exts, &e);

	rcu_assign_pointer(tp->root, cp);

the other is

	f->result.res = r->res;
	tcf_exts_change(&f->result.exts, &r->exts);

	fp = cp->h + (handle % cp->hash);
	for (nfp = rtnl_dereference(*fp);
	     nfp;
	     fp = &nfp->next, nfp = rtnl_dereference(*fp))
			; /* nothing */

	rcu_assign_pointer(*fp, f);

*r->exts* and *f->result.exts* are both newly allocated in
`tcindex_set_parms()`, so concurrent RCU readers cannot read them
before the RCU update.
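This argument rests on what `tcf_exts_change()` does with its destination. As understood here (an assumption worth checking against net/sched/cls_api.c), it moves `src` into `dst` and then destroys whatever `dst` held before. The sketch below models that in userspace with hypothetical `model_*` names and an allocation counter; it does not model RCU readers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tcf_exts. */
struct model_exts {
	void **actions;
};

static int live_arrays;	/* action arrays currently allocated */

static int model_exts_init(struct model_exts *e)
{
	e->actions = calloc(32, sizeof(*e->actions));
	if (!e->actions)
		return -1;
	live_arrays++;
	return 0;
}

static void model_exts_destroy(struct model_exts *e)
{
	if (e->actions) {
		free(e->actions);
		e->actions = NULL;
		live_arrays--;
	}
}

/*
 * Modeled on tcf_exts_change(): move src into dst, then destroy what dst
 * held before. src is left aliasing dst, so the caller must not destroy it.
 */
static void model_exts_change(struct model_exts *dst, struct model_exts *src)
{
	struct model_exts old;

	assert(dst != NULL && src != NULL);
	old = *dst;
	*dst = *src;
	model_exts_destroy(&old);
}
```

The only memory such a change frees is what `dst` held beforehand. Per the argument above, when `dst` is `r->exts` or `f->result.exts` that memory is not yet reachable from `tp->root`, whereas `old_r` has already been published.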

> moving it after rcu_assign_pointer() does not help, you need to wait for
> a grace period.

Yes, you are right. So if this really is a problem, I wonder if we can
add synchronize_rcu() before freeing old_r->exts, like:

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..57d900c664cf 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
        struct tcf_result cr = {};
        int err, balloc = 0;
        struct tcf_exts e;
+       struct tcf_exts old_e = {};
 
        err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
        if (err < 0)
@@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
        }
 
        if (old_r && old_r != r) {
+               old_e = old_r->exts;
                err = tcindex_filter_result_init(old_r, cp, net);
                if (err < 0) {
                        kfree(f);
@@ -510,6 +512,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
                tcf_exts_destroy(&new_filter_result.exts);
        }
 
+       synchronize_rcu();
+       tcf_exts_destroy(&old_e);
+
        if (oldp)
                tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
        return 0;

>
> Thanks.
  

Patch

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..dc872a794337 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,9 @@  tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	struct tcf_result cr = {};
 	int err, balloc = 0;
 	struct tcf_exts e;
+#ifdef CONFIG_NET_CLS_ACT
+	struct tcf_exts old_e = {};
+#endif
 
 	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 	if (err < 0)
@@ -479,6 +482,14 @@  tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (old_r && old_r != r) {
+#ifdef CONFIG_NET_CLS_ACT
+		/* r->exts is not copied from old_r->exts, and
+		 * the following code will clears the old_r, so
+		 * we need to destroy it after updating the tp->root,
+		 * to avoid memory leak bug.
+		 */
+		old_e = old_r->exts;
+#endif
 		err = tcindex_filter_result_init(old_r, cp, net);
 		if (err < 0) {
 			kfree(f);
@@ -510,6 +521,9 @@  tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		tcf_exts_destroy(&new_filter_result.exts);
 	}
 
+#ifdef CONFIG_NET_CLS_ACT
+	tcf_exts_destroy(&old_e);
+#endif
 	if (oldp)
 		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
 	return 0;