Message ID | 20221031060835.11722-1-yin31149@gmail.com |
---|---|
State | New |
Headers |
From: Hawkins Jiawei <yin31149@gmail.com>
To: yin31149@gmail.com, Jamal Hadi Salim <jhs@mojatatu.com>, Cong Wang <xiyou.wangcong@gmail.com>, Jiri Pirko <jiri@resnulli.us>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com, syzkaller-bugs@googlegroups.com, 18801353760@163.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] net: sched: fix memory leak in tcindex_set_parms
Date: Mon, 31 Oct 2022 14:08:35 +0800
Message-Id: <20221031060835.11722-1-yin31149@gmail.com>
X-Mailer: git-send-email 2.25.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
net: sched: fix memory leak in tcindex_set_parms
|
|
Commit Message
Hawkins Jiawei
Oct. 31, 2022, 6:08 a.m. UTC
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
  comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
    [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
    [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
    [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
    [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
    [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
    [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
    [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
    [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
    [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
    [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
    [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
    [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
    [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
    [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
    [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================

Kernel will uses tcindex_change() to change an existing
traffic-control-indices filter properties. During the
process of changing, kernel will clears the old
traffic-control-indices filter result, and updates it
by RCU assigning new traffic-control-indices data.

Yet the problem is that, kernel will clears the old
traffic-control-indices filter result, without destroying
its tcf_exts structure, which triggers the above
memory leak.

This patch solves it by using tcf_exts_destroy() to
destroy the tcf_exts structure in old
traffic-control-indices filter result.

Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
 net/sched/cls_tcindex.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
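The leak pattern described in the commit message can be reproduced in userspace. The following is only a minimal sketch, not kernel code: `toy_exts`, `toy_result`, `live_allocs` and all `toy_*` helpers are hypothetical stand-ins, and it assumes the reinit helper zeroes the whole structure and then allocates a fresh extension array, so the previous array becomes unreachable unless it is destroyed first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct tcf_exts and the tcindex filter result. */
struct toy_exts {
	void **actions;		/* heap array owned by this structure */
};

struct toy_result {
	struct toy_exts exts;
	int classid;
};

static int live_allocs;		/* arrays allocated and not yet freed */

static int toy_exts_init(struct toy_exts *e)
{
	e->actions = calloc(32, sizeof(void *));
	if (!e->actions)
		return -1;
	live_allocs++;
	return 0;
}

static void toy_exts_destroy(struct toy_exts *e)
{
	if (e->actions) {
		free(e->actions);
		e->actions = NULL;
		live_allocs--;
	}
}

/* Wipe the whole result, then allocate a fresh exts array. */
static int toy_result_init(struct toy_result *r)
{
	memset(r, 0, sizeof(*r));
	return toy_exts_init(&r->exts);
}

/* Leaky reinit: the memset() overwrites the only pointer to the old array. */
static int toy_reinit_leaky(struct toy_result *r)
{
	return toy_result_init(r);
}

/* Fixed reinit: release the old array before the structure is wiped. */
static int toy_reinit_fixed(struct toy_result *r)
{
	assert(r != NULL);
	toy_exts_destroy(&r->exts);
	return toy_result_init(r);
}
```

Calling `toy_reinit_leaky()` on an initialized result raises `live_allocs` by one each time, while `toy_reinit_fixed()` keeps it constant. The sketch says nothing about concurrent readers; it only shows why clearing without destroying loses the allocation.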
Comments
On Mon, 31 Oct 2022 14:08:35 +0800 Hawkins Jiawei wrote:
> Kernel will uses tcindex_change() to change an existing

s/will//

> traffic-control-indices filter properties. During the
> process of changing, kernel will clears the old

s/will//

> traffic-control-indices filter result, and updates it
> by RCU assigning new traffic-control-indices data.
>
> Yet the problem is that, kernel will clears the old

s/will//

> traffic-control-indices filter result, without destroying
> its tcf_exts structure, which triggers the above
> memory leak.
>
> This patch solves it by using tcf_exts_destroy() to
> destroy the tcf_exts structure in old
> traffic-control-indices filter result.
>

Please provide a Fixes tag to where the problem was introduced
(or the initial git commit).

> Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> ---
>  net/sched/cls_tcindex.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..dc872a794337 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -338,6 +338,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	struct tcf_result cr = {};
>  	int err, balloc = 0;
>  	struct tcf_exts e;
> +#ifdef CONFIG_NET_CLS_ACT
> +	struct tcf_exts old_e = {};
> +#endif

Why all the ifdefs?

>  	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
>  	if (err < 0)
> @@ -479,6 +482,14 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	}
>
>  	if (old_r && old_r != r) {
> +#ifdef CONFIG_NET_CLS_ACT
> +		/* r->exts is not copied from old_r->exts, and
> +		 * the following code will clears the old_r, so
> +		 * we need to destroy it after updating the tp->root,
> +		 * to avoid memory leak bug.
> +		 */
> +		old_e = old_r->exts;
> +#endif

Can't you localize all the changes to this if block?

Maybe add a function called tcindex_filter_result_reinit()
which will act more appropriately?

>  		err = tcindex_filter_result_init(old_r, cp, net);
>  		if (err < 0) {
>  			kfree(f);
> @@ -510,6 +521,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  		tcf_exts_destroy(&new_filter_result.exts);
>  	}
>
> +#ifdef CONFIG_NET_CLS_ACT
> +	tcf_exts_destroy(&old_e);
> +#endif
>  	if (oldp)
>  		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
>  	return 0;
Hi Jakub,

On Thu, 3 Nov 2022 at 11:26, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 31 Oct 2022 14:08:35 +0800 Hawkins Jiawei wrote:
> > Kernel will uses tcindex_change() to change an existing
>
> s/will//
>
> > traffic-control-indices filter properties. During the
> > process of changing, kernel will clears the old
>
> s/will//
>
> > traffic-control-indices filter result, and updates it
> > by RCU assigning new traffic-control-indices data.
> >
> > Yet the problem is that, kernel will clears the old
>
> s/will//

Thanks for the suggestion. I will amend these in the v2 patch.

>
> > traffic-control-indices filter result, without destroying
> > its tcf_exts structure, which triggers the above
> > memory leak.
> >
> > This patch solves it by using tcf_exts_destroy() to
> > destroy the tcf_exts structure in old
> > traffic-control-indices filter result.
> >
>
> Please provide a Fixes tag to where the problem was introduced
> (or the initial git commit).

Thanks for the reminder. It seems that the problem was introduced by
commit b9a24bb76bf6 ("net_sched: properly handle failure case of
tcf_exts_init()"), because it was in this commit that the kernel
started allocating the struct tcf_exts for the new
traffic-control-indices filter result in tcindex_alloc_perfect_hash().

I will add the tag in the v2 patch.

>
> > Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> > Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> > ---
> >  net/sched/cls_tcindex.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..dc872a794337 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -338,6 +338,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	struct tcf_result cr = {};
> >  	int err, balloc = 0;
> >  	struct tcf_exts e;
> > +#ifdef CONFIG_NET_CLS_ACT
> > +	struct tcf_exts old_e = {};
> > +#endif
>
> Why all the ifdefs?

Thanks for the suggestion. It seems that these ifdefs are not needed.
I will delete them in the v2 patch.

>
> >  	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> >  	if (err < 0)
> > @@ -479,6 +482,14 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	}
> >
> >  	if (old_r && old_r != r) {
> > +#ifdef CONFIG_NET_CLS_ACT
> > +		/* r->exts is not copied from old_r->exts, and
> > +		 * the following code will clears the old_r, so
> > +		 * we need to destroy it after updating the tp->root,
> > +		 * to avoid memory leak bug.
> > +		 */
> > +		old_e = old_r->exts;
> > +#endif
>
> Can't you localize all the changes to this if block?
>
> Maybe add a function called tcindex_filter_result_reinit()
> which will act more appropriately?

I think we shouldn't put the tcf_exts_destroy(&old_e)
into this if block, or other RCU readers may dereference the
freed memory (please correct me if I am wrong).

So I put the tcf_exts_destroy(&old_e) near the tcindex
destroy work, after the RCU update.

>
> >  		err = tcindex_filter_result_init(old_r, cp, net);
> >  		if (err < 0) {
> >  			kfree(f);
> > @@ -510,6 +521,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  		tcf_exts_destroy(&new_filter_result.exts);
> >  	}
> >
> > +#ifdef CONFIG_NET_CLS_ACT
> > +	tcf_exts_destroy(&old_e);
> > +#endif
> >  	if (oldp)
> >  		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
> >  	return 0;
On Fri, 4 Nov 2022 00:07:00 +0800 Hawkins Jiawei wrote:
> > Can't you localize all the changes to this if block?
> >
> > Maybe add a function called tcindex_filter_result_reinit()
> > which will act more appropriately?
>
> I think we shouldn't put the tcf_exts_destroy(&old_e)
> into this if block, or other RCU readers may dereference the
> freed memory (please correct me if I am wrong).
>
> So I put the tcf_exts_destroy(&old_e) near the tcindex
> destroy work, after the RCU update.

I'm not sure what this code is trying to do, to be honest.
Your concern that there may be a concurrent reader is valid,
but then again tcindex_filter_result_init() just wipes the
entire structure with a memset() so concurrent readers are
already likely broken?

Maybe tcindex_filter_result_init() dates back to times when
exts were a list (see commit 22dc13c837c) and calling
tcf_exts_init() wasn't that different than cleaning it up?
In other words this code is trying to destroy old_r, not
reinitialize it?

> >
> > >  		err = tcindex_filter_result_init(old_r, cp, net);
On Fri, 4 Nov 2022 at 10:23, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 4 Nov 2022 00:07:00 +0800 Hawkins Jiawei wrote:
> > > Can't you localize all the changes to this if block?
> > >
> > > Maybe add a function called tcindex_filter_result_reinit()
> > > which will act more appropriately?
> >
> > I think we shouldn't put the tcf_exts_destroy(&old_e)
> > into this if block, or other RCU readers may dereference the
> > freed memory (please correct me if I am wrong).
> >
> > So I put the tcf_exts_destroy(&old_e) near the tcindex
> > destroy work, after the RCU update.
>
> I'm not sure what this code is trying to do, to be honest.
> Your concern that there may be a concurrent reader is valid,
> but then again tcindex_filter_result_init() just wipes the
> entire structure with a memset() so concurrent readers are
> already likely broken?
>
> Maybe tcindex_filter_result_init() dates back to times when
> exts were a list (see commit 22dc13c837c) and calling
> tcf_exts_init() wasn't that different than cleaning it up?
> In other words this code is trying to destroy old_r, not
> reinitialize it?

Yes, I also think this code is just trying to destroy old_r.

In my opinion, the context here is roughly this: some of this
filter's properties have been changed, so the kernel should drop
its old filter result and publish a new one. Before the kernel
finishes the RCU update, concurrent readers should see an empty
result (or a valid old result), cleaned by
tcindex_filter_result_init().

I think this did not trigger the memory leak before
commit b9a24bb76bf6 ("net_sched: properly handle failure case of
tcf_exts_init()"), because the new filter result still used
old_r->exts.

After this commit, the kernel allocates a new struct tcf_exts
for the new filter result in tcindex_alloc_perfect_hash(), which
triggers the memory leak if the kernel cleans old_r without
destroying its newly allocated struct tcf_exts.

As for the patch, I think we'd better free this struct tcf_exts
after the RCU update, to make sure that concurrent readers can
only see an empty result or a valid old result before the update
finishes (please correct me if I am wrong).

>
> > >
> > > >  		err = tcindex_filter_result_init(old_r, cp, net);
On Mon, Oct 31, 2022 at 02:08:35PM +0800, Hawkins Jiawei wrote:
> Syzkaller reports a memory leak as follows:
> ====================================
> BUG: memory leak
> unreferenced object 0xffff88810c287f00 (size 256):
> [...]
> ====================================
>
> Kernel will uses tcindex_change() to change an existing
> traffic-control-indices filter properties. During the
> process of changing, kernel will clears the old
> traffic-control-indices filter result, and updates it
> by RCU assigning new traffic-control-indices data.
>
> Yet the problem is that, kernel will clears the old
> traffic-control-indices filter result, without destroying
> its tcf_exts structure, which triggers the above
> memory leak.
>
> This patch solves it by using tcf_exts_destroy() to
> destroy the tcf_exts structure in old
> traffic-control-indices filter result.

So... your patch can be just the following one-liner, right?

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..00a6c04a4b42 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (old_r && old_r != r) {
+		tcf_exts_destroy(&old_r->exts);
 		err = tcindex_filter_result_init(old_r, cp, net);
 		if (err < 0) {
 			kfree(f);
Hi Cong,

On Sun, 6 Nov 2022 at 03:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 02:08:35PM +0800, Hawkins Jiawei wrote:
> > Syzkaller reports a memory leak as follows:
> > ====================================
> > BUG: memory leak
> > unreferenced object 0xffff88810c287f00 (size 256):
> > [...]
> > ====================================
> >
> > Kernel will uses tcindex_change() to change an existing
> > traffic-control-indices filter properties. During the
> > process of changing, kernel will clears the old
> > traffic-control-indices filter result, and updates it
> > by RCU assigning new traffic-control-indices data.
> >
> > Yet the problem is that, kernel will clears the old
> > traffic-control-indices filter result, without destroying
> > its tcf_exts structure, which triggers the above
> > memory leak.
> >
> > This patch solves it by using tcf_exts_destroy() to
> > destroy the tcf_exts structure in old
> > traffic-control-indices filter result.
>
> So... your patch can be just the following one-liner, right?

Yes, as you and Jakub point out, all the ifdefs can be removed,
and I will refactor those in the v2 patch.

>
>
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..00a6c04a4b42 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	}
>
>  	if (old_r && old_r != r) {
> +		tcf_exts_destroy(&old_r->exts);
>  		err = tcindex_filter_result_init(old_r, cp, net);
>  		if (err < 0) {
>  			kfree(f);

As for the position of tcf_exts_destroy(), should we
call it after the RCU update, after
`rcu_assign_pointer(tp->root, cp)`?

Otherwise concurrent RCU readers may dereference this freed memory
(please correct me if I am wrong).
On Sun, Nov 06, 2022 at 10:55:31PM +0800, Hawkins Jiawei wrote:
> Hi Cong,
>
> >
> >
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..00a6c04a4b42 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	}
> >
> >  	if (old_r && old_r != r) {
> > +		tcf_exts_destroy(&old_r->exts);
> >  		err = tcindex_filter_result_init(old_r, cp, net);
> >  		if (err < 0) {
> >  			kfree(f);
>
> As for the position of tcf_exts_destroy(), should we
> call it after the RCU update, after
> `rcu_assign_pointer(tp->root, cp)`?
>
> Otherwise concurrent RCU readers may dereference this freed memory
> (please correct me if I am wrong).

I don't think so, because we already have tcf_exts_change() in multiple
places within tcindex_set_parms(). Even if this is really a problem,
moving it after rcu_assign_pointer() does not help, you need to wait for
a grace period.

Thanks.
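The ordering point above (publishing the new pointer is not enough, the old data may only be freed once earlier readers are done) can be illustrated in userspace. This is a toy sketch under stated assumptions, not the kernel RCU API: `toy_cfg`, `toy_root`, `toy_readers` and every `toy_*` function are hypothetical, and the "grace period" is modelled by busy-waiting on a reader counter built from C11 atomics, which is far cruder than what `synchronize_rcu()` does in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical config object that owns a small heap allocation. */
struct toy_cfg {
	int *exts;
};

static _Atomic(struct toy_cfg *) toy_root;	/* currently published config */
static atomic_int toy_readers;			/* readers inside a read section */
static int toy_freed;				/* configs destroyed so far */

/* Enter a read section and fetch the currently published config. */
static struct toy_cfg *toy_read_lock(void)
{
	atomic_fetch_add(&toy_readers, 1);
	return atomic_load(&toy_root);
}

static void toy_read_unlock(void)
{
	atomic_fetch_sub(&toy_readers, 1);
}

/* Toy stand-in for a grace period: spin until no reader is active. */
static void toy_synchronize(void)
{
	while (atomic_load(&toy_readers) != 0)
		;
}

static struct toy_cfg *toy_cfg_new(int value)
{
	struct toy_cfg *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->exts = malloc(sizeof(*c->exts));
	if (!c->exts) {
		free(c);
		return NULL;
	}
	*c->exts = value;
	return c;
}

static void toy_cfg_destroy(struct toy_cfg *c)
{
	if (!c)
		return;
	free(c->exts);
	free(c);
	toy_freed++;
}

/* Publish a new config, wait for earlier readers, then free the old one. */
static int toy_update(int value)
{
	struct toy_cfg *new_cfg = toy_cfg_new(value);
	struct toy_cfg *old;

	if (!new_cfg)
		return -1;
	old = atomic_exchange(&toy_root, new_cfg);	/* publish */
	toy_synchronize();				/* old readers drain */
	toy_cfg_destroy(old);				/* now safe to free */
	return 0;
}
```

Dropping the `toy_synchronize()` call would reproduce the hazard raised in the thread: a reader that loaded the old pointer just before the exchange could still be using `old->exts` when it is freed. In the kernel the wait would be expressed with `synchronize_rcu()` or deferred work rather than a spin loop.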
On Mon, 7 Nov 2022 at 01:49, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sun, Nov 06, 2022 at 10:55:31PM +0800, Hawkins Jiawei wrote:
> > Hi Cong,
> >
> > >
> > >
> > > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > > index 1c9eeb98d826..00a6c04a4b42 100644
> > > --- a/net/sched/cls_tcindex.c
> > > +++ b/net/sched/cls_tcindex.c
> > > @@ -479,6 +479,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > >  	}
> > >
> > >  	if (old_r && old_r != r) {
> > > +		tcf_exts_destroy(&old_r->exts);
> > >  		err = tcindex_filter_result_init(old_r, cp, net);
> > >  		if (err < 0) {
> > >  			kfree(f);
> >
> > As for the position of tcf_exts_destroy(), should we
> > call it after the RCU update, after
> > `rcu_assign_pointer(tp->root, cp)`?
> >
> > Otherwise concurrent RCU readers may dereference this freed memory
> > (please correct me if I am wrong).
>
> I don't think so, because we already have tcf_exts_change() in multiple
> places within tcindex_set_parms(). Even if this is really a problem,

Do you mean that, if this is a problem, then these tcf_exts_change()
calls should have already triggered a use-after-free? (Please correct
me if I get this wrong.)

But it seems that these tcf_exts_change() calls don't destroy old_r,
so they don't face the concurrency problem above.
I find there are two tcf_exts_change() calls in tcindex_set_parms().
One is

	oldp = p;
	r->res = cr;
	tcf_exts_change(&r->exts, &e);

	rcu_assign_pointer(tp->root, cp);

the other is

	f->result.res = r->res;
	tcf_exts_change(&f->result.exts, &r->exts);

	fp = cp->h + (handle % cp->hash);
	for (nfp = rtnl_dereference(*fp);
	     nfp;
	     fp = &nfp->next, nfp = rtnl_dereference(*fp))
		; /* nothing */

	rcu_assign_pointer(*fp, f);

*r->exts* and *f->result.exts* are both newly allocated in
`tcindex_set_parms()`, so concurrent RCU readers won't read them
before the RCU update.

> moving it after rcu_assign_pointer() does not help, you need to wait for
> a grace period.

Yes, you are right. So if this is really a problem, I wonder if we can
add a synchronize_rcu() before freeing old_r->exts, like:

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..57d900c664cf 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	struct tcf_result cr = {};
 	int err, balloc = 0;
 	struct tcf_exts e;
+	struct tcf_exts old_e = {};
 
 	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 	if (err < 0)
@@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (old_r && old_r != r) {
+		old_e = old_r->exts;
 		err = tcindex_filter_result_init(old_r, cp, net);
 		if (err < 0) {
 			kfree(f);
@@ -510,6 +512,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		tcf_exts_destroy(&new_filter_result.exts);
 	}
 
+	synchronize_rcu();
+	tcf_exts_destroy(&old_e);
+
 	if (oldp)
 		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
 	return 0;

>
> Thanks.
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..dc872a794337 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	struct tcf_result cr = {};
 	int err, balloc = 0;
 	struct tcf_exts e;
+#ifdef CONFIG_NET_CLS_ACT
+	struct tcf_exts old_e = {};
+#endif
 
 	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 	if (err < 0)
@@ -479,6 +482,14 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (old_r && old_r != r) {
+#ifdef CONFIG_NET_CLS_ACT
+		/* r->exts is not copied from old_r->exts, and
+		 * the following code will clears the old_r, so
+		 * we need to destroy it after updating the tp->root,
+		 * to avoid memory leak bug.
+		 */
+		old_e = old_r->exts;
+#endif
 		err = tcindex_filter_result_init(old_r, cp, net);
 		if (err < 0) {
 			kfree(f);
@@ -510,6 +521,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		tcf_exts_destroy(&new_filter_result.exts);
 	}
 
+#ifdef CONFIG_NET_CLS_ACT
+	tcf_exts_destroy(&old_e);
+#endif
 	if (oldp)
 		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
 	return 0;