[v3] net: sched: fix memory leak in tcindex_set_parms

Message ID 20221129025249.463833-1-yin31149@gmail.com
State New

Commit Message

Hawkins Jiawei Nov. 29, 2022, 2:52 a.m. UTC
  Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
  comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
    [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
    [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
    [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
    [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
    [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
    [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
    [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
    [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
    [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
    [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
    [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
    [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
    [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
    [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
    [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================

The kernel uses tcindex_change() to change the properties of
an existing filter. During the change, the kernel uses
tcindex_alloc_perfect_hash() to allocate new filter results
and tcindex_filter_result_init() to clear the old
filter result.

The problem is that the kernel clears the old
filter result without destroying its tcf_exts structure,
which triggers the memory leak above.
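
The pattern can be illustrated with a minimal userspace sketch. This is not
kernel code: every `leak_*` name is a hypothetical stand-in, and the sketch
assumes that tcindex_filter_result_init() zeroes the result and then
initializes a fresh tcf_exts, which is inferred from the backtrace above
rather than quoted from the source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct tcf_exts: it owns one heap array. */
struct leak_exts {
	void **actions;
};

/* Hypothetical stand-in for struct tcindex_filter_result. */
struct leak_result {
	struct leak_exts exts;
	int classid;
};

/* Number of action arrays allocated and not yet freed. */
static int leak_live;

static int leak_exts_init(struct leak_exts *e)
{
	e->actions = calloc(32, sizeof(void *));
	if (!e->actions)
		return -1;
	leak_live++;
	return 0;
}

static void leak_exts_destroy(struct leak_exts *e)
{
	if (e->actions) {
		free(e->actions);
		e->actions = NULL;
		leak_live--;
	}
}

/* Assumed shape of the problematic init: wipe, then allocate again,
 * without destroying the exts that were already there.
 */
static int leak_result_init(struct leak_result *r)
{
	memset(r, 0, sizeof(*r));
	return leak_exts_init(&r->exts);
}

/* Returns how many arrays are still allocated after the only
 * reachable one has been destroyed, or -1 on allocation failure.
 */
int leak_demo(void)
{
	struct leak_result r;

	if (leak_exts_init(&r.exts) < 0)
		return -1;
	if (leak_result_init(&r) < 0)
		return -1;
	/* Two arrays exist now, but only the second is reachable. */
	assert(leak_live == 2);
	leak_exts_destroy(&r.exts);
	return leak_live;
}
```

Calling leak_demo() once leaves one array allocated with no pointer to it,
which is the same shape as the unreferenced 256-byte object in the report.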

Considering that tcindex_partial_destroy_work() is already queued on
the tc_filter_wq workqueue at the end of tcindex_set_parms() to destroy
the old tcindex_data, this patch fixes the memory leak by removing
the code that clears the old filter result and delegating the cleanup
to the tc_filter_wq workqueue.

[Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
and Dmitry Vyukov]

Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com> 
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
v3:
  - refactor the commit message
  - delegate the tcf_exts_destroy() to tc_filter_wq workqueue,
suggested by Paolo Abeni and Dmitry Vyukov

v2: https://lore.kernel.org/all/20221113170507.8205-1-yin31149@gmail.com/

v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/

 net/sched/cls_tcindex.c | 8 --------
 1 file changed, 8 deletions(-)
  

Comments

Paolo Abeni Dec. 1, 2022, 10:24 a.m. UTC | #1
On Tue, 2022-11-29 at 10:52 +0800, Hawkins Jiawei wrote:
> Syzkaller reports a memory leak as follows:
> ====================================
> BUG: memory leak
> unreferenced object 0xffff88810c287f00 (size 256):
>   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
>     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
>     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
>     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
>     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
>     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
>     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
>     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
>     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
>     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
>     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
>     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
>     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
>     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
>     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
>     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
>     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
>     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
>     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
>     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
>     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> ====================================
> 
> Kernel uses tcindex_change() to change an existing
> filter properties. During the process of changing,
> kernel uses tcindex_alloc_perfect_hash() to newly
> allocate filter results, uses tcindex_filter_result_init()
> to clear the old filter result.
> 
> Yet the problem is that, kernel clears the old
> filter result, without destroying its tcf_exts structure,
> which triggers the above memory leak.
> 
> Considering that there already exists a tc_filter_wq workqueue
> to destroy the old tcindex_data by tcindex_partial_destroy_work()
> at the end of tcindex_set_parms(), this patch solves this memory
> leak bug by removing this old filter result clearing part,
> and delegating it to the tc_filter_wq workqueue.
> 
> [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
> and Dmitry Vyukov]
> 
> Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Cc: Cong Wang <cong.wang@bytedance.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Dmitry Vyukov <dvyukov@google.com> 
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>

The patch looks correct to me, but we are very late in this release
cycle, and I fear there is a chance of introducing a regression. The
issue addressed here has been present for quite some time, so I suggest
postponing this fix to the beginning of the next release cycle.

Please repost this patch after 6.1 is released, thanks! (And feel
free to add my Acked-by.)

Paolo
  
Hawkins Jiawei Dec. 1, 2022, 1:20 p.m. UTC | #2
On Thu, 1 Dec 2022 at 18:24, Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2022-11-29 at 10:52 +0800, Hawkins Jiawei wrote:
> > Syzkaller reports a memory leak as follows:
> > ====================================
> > BUG: memory leak
> > unreferenced object 0xffff88810c287f00 (size 256):
> >   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
> >   hex dump (first 32 bytes):
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
> >     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
> >     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
> >     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
> >     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
> >     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
> >     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
> >     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
> >     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
> >     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
> >     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
> >     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
> >     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
> >     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
> >     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
> >     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
> >     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
> >     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
> >     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
> >     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
> >     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
> >     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> >     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > ====================================
> >
> > Kernel uses tcindex_change() to change an existing
> > filter properties. During the process of changing,
> > kernel uses tcindex_alloc_perfect_hash() to newly
> > allocate filter results, uses tcindex_filter_result_init()
> > to clear the old filter result.
> >
> > Yet the problem is that, kernel clears the old
> > filter result, without destroying its tcf_exts structure,
> > which triggers the above memory leak.
> >
> > Considering that there already exists a tc_filter_wq workqueue
> > to destroy the old tcindex_data by tcindex_partial_destroy_work()
> > at the end of tcindex_set_parms(), this patch solves this memory
> > leak bug by removing this old filter result clearing part,
> > and delegating it to the tc_filter_wq workqueue.
> >
> > [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
> > and Dmitry Vyukov]
> >
> > Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> > Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> > Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Cc: Cong Wang <cong.wang@bytedance.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
>
> The patch looks correct to me, but we are very late in this release
> cycle, and I fear there is a chance of introducing some regression. The
> issue addressed here is present since quite some time, I suggest to
> postpone this fix to the beginning of the next release cycle.
>
> Please, repost this patch after that 6.1 is released, thanks! (And feel
> free to add my Acked-by).

Thanks for your review.

I will retest this patch after 6.1 is released, and repost it
if it still works fine.

>
> Paolo
>
  
Cong Wang Dec. 3, 2022, 8:19 p.m. UTC | #3
On Tue, Nov 29, 2022 at 10:52:49AM +0800, Hawkins Jiawei wrote:
> Kernel uses tcindex_change() to change an existing
> filter properties. During the process of changing,
> kernel uses tcindex_alloc_perfect_hash() to newly
> allocate filter results, uses tcindex_filter_result_init()
> to clear the old filter result.
> 
> Yet the problem is that, kernel clears the old
> filter result, without destroying its tcf_exts structure,
> which triggers the above memory leak.
> 
> Considering that there already exists a tc_filter_wq workqueue
> to destroy the old tcindex_data by tcindex_partial_destroy_work()
> at the end of tcindex_set_parms(), this patch solves this memory
> leak bug by removing this old filter result clearing part,
> and delegating it to the tc_filter_wq workqueue.

Hmm?? tcindex_partial_destroy_work() destroys 'oldp', which is
different from 'old_r'. I mean, you seem to be assuming that struct
tcindex_filter_result always comes from struct tcindex_data, which is not
true. Check the following tcindex_lookup(), which can also retrieve a
tcindex_filter_result from struct tcindex_filter.

static struct tcindex_filter_result *tcindex_lookup(struct tcindex_data *p,
                                                    u16 key)
{
        if (p->perfect) {
                struct tcindex_filter_result *f = p->perfect + key;

                return tcindex_filter_is_set(f) ? f : NULL;
        } else if (p->h) {
                struct tcindex_filter __rcu **fp;
                struct tcindex_filter *f;

                fp = &p->h[key % p->hash];
                for (f = rcu_dereference_bh_rtnl(*fp);
                     f;
                     fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
                        if (f->key == key)
                                return &f->result;
        }

        return NULL;
}

 
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..3f4e7a6cdd96 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -478,14 +478,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  		tcf_bind_filter(tp, &cr, base);
>  	}
>  
> -	if (old_r && old_r != r) {
> -		err = tcindex_filter_result_init(old_r, cp, net);
> -		if (err < 0) {
> -			kfree(f);
> -			goto errout_alloc;
> -		}
> -	}
> -

Even if your analysis above is correct, 'old_r' now becomes unused (set but
not used), so I think you should get a compiler warning.

Thanks.
  
Hawkins Jiawei Dec. 5, 2022, 3:19 p.m. UTC | #4
On Sun, 4 Dec 2022 at 04:19, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Nov 29, 2022 at 10:52:49AM +0800, Hawkins Jiawei wrote:
> > Kernel uses tcindex_change() to change an existing
> > filter properties. During the process of changing,
> > kernel uses tcindex_alloc_perfect_hash() to newly
> > allocate filter results, uses tcindex_filter_result_init()
> > to clear the old filter result.
> >
> > Yet the problem is that, kernel clears the old
> > filter result, without destroying its tcf_exts structure,
> > which triggers the above memory leak.
> >
> > Considering that there already exists a tc_filter_wq workqueue
> > to destroy the old tcindex_data by tcindex_partial_destroy_work()
> > at the end of tcindex_set_parms(), this patch solves this memory
> > leak bug by removing this old filter result clearing part,
> > and delegating it to the tc_filter_wq workqueue.
>
> Hmm?? The tcindex_partial_destroy_work() is to destroy 'oldp' which is
> different from 'old_r'. I mean, you seem assuming that struct
> tcindex_filter_result is always from struct tcindex_data, which is not
> true, check the following tcindex_lookup() which retrieves tcindex_filter_result
> from struct tcindex_filter.
>
> static struct tcindex_filter_result *tcindex_lookup(struct tcindex_data *p,
>                                                     u16 key)
> {
>         if (p->perfect) {
>                 struct tcindex_filter_result *f = p->perfect + key;
>
>                 return tcindex_filter_is_set(f) ? f : NULL;
>         } else if (p->h) {
>                 struct tcindex_filter __rcu **fp;
>                 struct tcindex_filter *f;
>
>                 fp = &p->h[key % p->hash];
>                 for (f = rcu_dereference_bh_rtnl(*fp);
>                      f;
>                      fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
>                         if (f->key == key)
>                                 return &f->result;
>         }
>
>         return NULL;
> }

Oh, thanks for correcting me! You are right, I wrongly assumed that
struct tcindex_filter_result always comes from the `perfect` field
of struct tcindex_data.

But after reviewing tcindex_set_parms(), I think this patch can still
fix the problem, because the code can reach the part deleted by this
patch only when the `tcindex_filter_result` comes from
`struct tcindex_data`.

To be more specific, the simplified logic of the original
tcindex_set_parms() is as follows:

static int
tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
		  u32 handle, struct tcindex_data *p,
		  struct tcindex_filter_result *r, struct nlattr **tb,
		  struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
{
	...
	if (p->perfect) {
		int i;

		if (tcindex_alloc_perfect_hash(net, cp) < 0)
			goto errout;
		cp->alloc_hash = cp->hash;
		for (i = 0; i < min(cp->hash, p->hash); i++)
			cp->perfect[i].res = p->perfect[i].res;
		balloc = 1;
	}
	cp->h = p->h;

	...

	if (cp->perfect)
		r = cp->perfect + handle;
	else
		r = tcindex_lookup(cp, handle) ? : &new_filter_result;

	if (old_r && old_r != r) {
		err = tcindex_filter_result_init(old_r, cp, net);
		if (err < 0) {
			kfree(f);
			goto errout_alloc;
		}
	}
	...
}

- cp's h field is directly copied from p's h field.

- If `old_r` is retrieved from struct tcindex_filter, in other words,
from p's h field, then `r` gets the same value
from `tcindex_lookup(cp, handle)`.

- So `old_r == r` is true, and the code never uses tcindex_filter_result_init()
to clear old_r in that case.
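
The three points above can be checked with a small userspace model of
tcindex_lookup(). This is only a sketch: the `hm_*` types are hypothetical
simplifications of the kernel structures, RCU is left out, and it assumes
the hash size is the same in the copy as in the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct hm_result {
	int set;
};

struct hm_filter {
	unsigned short key;
	struct hm_result result;
	struct hm_filter *next;
};

struct hm_data {
	struct hm_result *perfect;	/* perfect-hash array, or NULL */
	struct hm_filter **h;		/* hash buckets, or NULL */
	unsigned int hash;		/* number of buckets */
};

/* Same control flow as the tcindex_lookup() quoted earlier, minus RCU. */
static struct hm_result *hm_lookup(struct hm_data *p, unsigned short key)
{
	if (p->perfect) {
		struct hm_result *f = p->perfect + key;

		return f->set ? f : NULL;
	} else if (p->h) {
		struct hm_filter *f;

		for (f = p->h[key % p->hash]; f; f = f->next)
			if (f->key == key)
				return &f->result;
	}

	return NULL;
}

/* Returns 1 if a lookup through the copy yields the same pointer
 * as a lookup through the original, 0 otherwise.
 */
int hm_same_result(void)
{
	struct hm_filter f = { .key = 5, .result = { .set = 1 }, .next = NULL };
	struct hm_filter *buckets[4] = { NULL, &f, NULL, NULL };	/* 5 % 4 == 1 */
	struct hm_data p = { .perfect = NULL, .h = buckets, .hash = 4 };
	struct hm_data cp = p;		/* models cp->h = p->h */
	struct hm_result *old_r, *r;

	assert(cp.h == p.h);
	old_r = hm_lookup(&p, 5);
	r = hm_lookup(&cp, 5);

	return old_r != NULL && old_r == r;
}
```

Because both lookups walk the same bucket chain, they return the address of
the same embedded result, so the `old_r != r` condition is false in this case.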

So I think this patch can still fix the memory leak caused by
tcindex_filter_result_init(), but maybe I need to improve my
commit message.

Please correct me if I am wrong.

> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..3f4e7a6cdd96 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -478,14 +478,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >               tcf_bind_filter(tp, &cr, base);
> >       }
> > 
> > -     if (old_r && old_r != r) {
> > -             err = tcindex_filter_result_init(old_r, cp, net);
> > -             if (err < 0) {
> > -                     kfree(f);
> > -                     goto errout_alloc;
> > -             }
> > -     }
> > -
>
> Even if your above analysis is correct, 'old_r' becomes unused (set but not used)
> now, I think you should get some compiler warning.


Actually, it didn't trigger any compiler warning,
because `old_r` is still used in one place, as shown below:

static int
tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
		  u32 handle, struct tcindex_data *p,
		  struct tcindex_filter_result *r, struct nlattr **tb,
		  struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
{
	struct tcindex_filter_result new_filter_result, *old_r = r;
	...
	err = tcindex_filter_result_init(&new_filter_result, cp, net);
	if (err < 0)
		goto errout_alloc;
	if (old_r)
		cr = r->res;
	...
}

But `old_r` and `r` have the same value here, so we can just replace
`old_r` with `r` and delete `old_r`, as you suggested.

Thanks for your suggestion!

>
> Thanks.
  
Cong Wang Dec. 10, 2022, 9:29 p.m. UTC | #5
On Mon, Dec 05, 2022 at 11:19:56PM +0800, Hawkins Jiawei wrote:
> To be more specific, the simplified logic about original
> tcindex_set_parms() is as below:
> 
> static int
> tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> 		  u32 handle, struct tcindex_data *p,
> 		  struct tcindex_filter_result *r, struct nlattr **tb,
> 		  struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
> {
> 	...
> 	if (p->perfect) {
> 		int i;
> 
> 		if (tcindex_alloc_perfect_hash(net, cp) < 0)
> 			goto errout;
> 		cp->alloc_hash = cp->hash;
> 		for (i = 0; i < min(cp->hash, p->hash); i++)
> 			cp->perfect[i].res = p->perfect[i].res;
> 		balloc = 1;
> 	}
> 	cp->h = p->h;
> 
> 	...
> 
> 	if (cp->perfect)
> 		r = cp->perfect + handle;

We can reach here if p->perfect is non-NULL.

> 	else
> 		r = tcindex_lookup(cp, handle) ? : &new_filter_result;
> 
> 	if (old_r && old_r != r) {
> 		err = tcindex_filter_result_init(old_r, cp, net);
> 		if (err < 0) {
> 			kfree(f);
> 			goto errout_alloc;
> 		}
> 	}
> 	...
> }
> 
> - cp's h field is directly copied from p's h field
> 
> - if `old_r` is retrieved from struct tcindex_filter, in other word,
> is retrieved from p's h field. Then the `r` should get the same value
> from `tcindex_lookup(cp, handle)`.

See above, 'r' can be 'cp->perfect + handle' which is newly allocated,
hence different from 'old_r'.

> 
> - so `old_r == r` is true, code will never uses tcindex_filter_result_init()
> to clear the old_r in such case.

Not always.

> 
> So I think this patch still can fix this memory leak caused by 
> tcindex_filter_result_init(), But maybe I need to improve my
> commit message.
> 

I think your patch may introduce other memory leaks, and 'old_r' may
be left obsolete too.

Thanks.
  
Hawkins Jiawei Dec. 12, 2022, 4:14 p.m. UTC | #6
On Sun, 11 Dec 2022 at 05:29, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Dec 05, 2022 at 11:19:56PM +0800, Hawkins Jiawei wrote:
> > To be more specific, the simplified logic about original
> > tcindex_set_parms() is as below:
> >
> > static int
> > tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >                 u32 handle, struct tcindex_data *p,
> >                 struct tcindex_filter_result *r, struct nlattr **tb,
> >                 struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
> > {
> >       ...
> >       if (p->perfect) {
> >               int i;
> >
> >               if (tcindex_alloc_perfect_hash(net, cp) < 0)
> >                       goto errout;
> >               cp->alloc_hash = cp->hash;
> >               for (i = 0; i < min(cp->hash, p->hash); i++)
> >                       cp->perfect[i].res = p->perfect[i].res;
> >               balloc = 1;
> >       }
> >       cp->h = p->h;
> >
> >       ...
> >
> >       if (cp->perfect)
> >               r = cp->perfect + handle;
>
> We can reach here if p->perfect is non-NULL.
>
> >       else
> >               r = tcindex_lookup(cp, handle) ? : &new_filter_result;
> >
> >       if (old_r && old_r != r) {
> >               err = tcindex_filter_result_init(old_r, cp, net);
> >               if (err < 0) {
> >                       kfree(f);
> >                       goto errout_alloc;
> >               }
> >       }
> >       ...
> > }
> >
> > - cp's h field is directly copied from p's h field
> >
> > - if `old_r` is retrieved from struct tcindex_filter, in other word,
> > is retrieved from p's h field. Then the `r` should get the same value
> > from `tcindex_lookup(cp, handle)`.
>
> See above, 'r' can be 'cp->perfect + handle' which is newly allocated,
> hence different from 'old_r'.

But if `r` is `cp->perfect + handle`, then `cp->perfect` is not
NULL. So `p->perfect` is not NULL either, which means `old_r` is
`p->perfect + handle`, according to tcindex_lookup(). This contradicts
the assumption that `old_r` is retrieved from p's h field.

>
> >
> > - so `old_r == r` is true, code will never uses tcindex_filter_result_init()
> > to clear the old_r in such case.
>
> Not always.
>
> >
> > So I think this patch still can fix this memory leak caused by
> > tcindex_filter_result_init(), But maybe I need to improve my
> > commit message.
> >
>
> I think your patch may introduce other memory leaks and 'old_r' may
> be left as obsoleted too.

I still think this patch should not introduce any memory leaks.

* If `old_r` is not NULL, it can come from only two sources according
to tcindex_lookup(): either `old_r` is retrieved from `p->perfect`, or
`old_r` is retrieved from `p->h`. And if `old_r` is retrieved from `p->h`,
then `p->perfect` is NULL.


* If `old_r` is retrieved from `p->perfect`, the kernel uses
tcindex_alloc_perfect_hash() to allocate new filter results,
and `r` is `cp->perfect + handle`, which is newly allocated.

So `r != old_r` in this situation, but the kernel will clean up `old_r`
in tcindex_partial_destroy_work() on the tc_filter_wq workqueue, when it
destroys p->perfect. So the kernel doesn't need
tcindex_filter_result_init() to clear the old filter result here, and
there is no memory leak.


* If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL, as
discussed above. Considering that `cp->h` is directly copied from
`p->h`, `r` gets the same value as `old_r` from tcindex_lookup().

So `r == old_r`, and the code skips the part where the kernel uses
tcindex_filter_result_init() to clear the old filter result. Removing
this part of the code therefore has no effect in this situation.



So whether `old_r` is retrieved from `p->h` or `p->perfect`,
it is okay to directly delete the part where the kernel uses
tcindex_filter_result_init() to clear the old filter result, without
introducing any memory leak. And doing so fixes the memory leak caused by
tcindex_filter_result_init().
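
For the `p->perfect` case, the argument can be sketched in userspace as
follows. The `pm_*` names are hypothetical, pm_destroy() only stands in for
what tcindex_partial_destroy_work() is described as doing in this thread,
and the deferral through the workqueue is modeled as a plain function call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filter result that owns an exts array. */
struct pm_result {
	void **actions;
};

/* Hypothetical stand-in for struct tcindex_data in perfect-hash mode. */
struct pm_data {
	struct pm_result *perfect;
	unsigned int hash;
};

/* Number of exts arrays allocated and not yet freed. */
static int pm_live;

/* Frees every result's exts array, then the perfect array itself. */
static void pm_destroy(struct pm_data *d)
{
	unsigned int i;

	if (!d->perfect)
		return;
	for (i = 0; i < d->hash; i++) {
		if (d->perfect[i].actions) {
			free(d->perfect[i].actions);
			pm_live--;
		}
	}
	free(d->perfect);
	d->perfect = NULL;
}

/* Allocates a perfect array and one exts array per result. */
static int pm_alloc_perfect(struct pm_data *d)
{
	unsigned int i;

	d->perfect = calloc(d->hash, sizeof(*d->perfect));
	if (!d->perfect)
		return -1;
	for (i = 0; i < d->hash; i++) {
		d->perfect[i].actions = calloc(4, sizeof(void *));
		if (!d->perfect[i].actions) {
			pm_destroy(d);
			return -1;
		}
		pm_live++;
	}
	return 0;
}

/* Returns the number of exts arrays still allocated at the end,
 * or -1 on allocation failure.
 */
int pm_demo(void)
{
	struct pm_data p = { .perfect = NULL, .hash = 4 };
	struct pm_data cp = { .perfect = NULL, .hash = 4 };
	struct pm_result *old_r, *r;

	if (pm_alloc_perfect(&p) < 0)
		return -1;
	if (pm_alloc_perfect(&cp) < 0) {
		pm_destroy(&p);
		return -1;
	}

	old_r = p.perfect + 2;
	r = cp.perfect + 2;
	assert(old_r != r);	/* the new array is a separate allocation */

	/* old_r is never re-initialized; destroying the old data frees it. */
	pm_destroy(&p);
	pm_destroy(&cp);

	return pm_live;
}
```

In this model nothing stays allocated once the old data is destroyed, which
matches the claim that re-initializing `old_r` is unnecessary. It does not
prove anything about the real kernel paths, which also involve RCU and
workqueue ordering.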

As for `old_r` being left obsolete, do you mean that `old_r` becomes
unused (set but not used)? I think we can remove `old_r` directly.

>
> Thanks.
  

Patch

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..3f4e7a6cdd96 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -478,14 +478,6 @@  tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		tcf_bind_filter(tp, &cr, base);
 	}
 
-	if (old_r && old_r != r) {
-		err = tcindex_filter_result_init(old_r, cp, net);
-		if (err < 0) {
-			kfree(f);
-			goto errout_alloc;
-		}
-	}
-
 	oldp = p;
 	r->res = cr;
 	tcf_exts_change(&r->exts, &e);