Message ID | 20221113170507.8205-1-yin31149@gmail.com |
---|---|
State | New |
Headers | From: Hawkins Jiawei <yin31149@gmail.com> To: yin31149@gmail.com, Jamal Hadi Salim <jhs@mojatatu.com>, Cong Wang <xiyou.wangcong@gmail.com>, Jiri Pirko <jiri@resnulli.us>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com> Cc: 18801353760@163.com, syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com, syzkaller-bugs@googlegroups.com, Cong Wang <cong.wang@bytedance.com>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms Date: Mon, 14 Nov 2022 01:05:08 +0800 Message-Id: <20221113170507.8205-1-yin31149@gmail.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[v2] net: sched: fix memory leak in tcindex_set_parms
Commit Message
Hawkins Jiawei
Nov. 13, 2022, 5:05 p.m. UTC
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
  comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
    [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
    [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
    [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
    [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
    [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
    [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
    [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
    [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
    [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
    [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
    [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
    [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
    [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
    [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
    [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================

Kernel uses tcindex_change() to change an existing
traffic-control-indices filter properties. During the
process of changing, kernel clears the old
traffic-control-indices filter result, and updates it
by RCU assigning new traffic-control-indices data.

Yet the problem is that, kernel clears the old
traffic-control-indices filter result, without destroying
its tcf_exts structure, which triggers the above
memory leak.

This patch solves it by using tcf_exts_destroy() to
destroy the tcf_exts structure in old
traffic-control-indices filter result, after the
RCU grace period.

[Thanks to the suggestion from Jakub Kicinski and Cong Wang]

Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
v2:
  - remove all 'will' in commit message according to Jakub Kicinski
  - add Fixes tag according to Jakub Kicinski
  - remove all ifdefs according to Jakub Kicinski and Cong Wang
  - add synchronize_rcu() before destroying old_e according to
    Cong Wang

v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/

 net/sched/cls_tcindex.c | 8 ++++++++
 1 file changed, 8 insertions(+)
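The leak described in the commit message can be modelled in user space. The sketch below is not kernel code: `exts_model`, `live_arrays`, `reinit_leaky()` and `reinit_fixed()` are hypothetical names, and the RCU grace period is only marked by a comment. It assumes, as the commit message states, that re-initialising a filter result overwrites the pointer to memory the old structure owned.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tcf_exts: it owns one heap array. */
struct exts_model {
	void **actions;
};

/* Number of action arrays currently allocated and not yet freed. */
int live_arrays;

/* Models tcf_exts_init(): allocates the array the structure owns. */
int exts_model_init(struct exts_model *e)
{
	e->actions = calloc(32, sizeof(*e->actions));
	if (!e->actions)
		return -1;
	live_arrays++;
	return 0;
}

/* Models tcf_exts_destroy(): releases the owned array. */
void exts_model_destroy(struct exts_model *e)
{
	if (!e->actions)
		return;
	free(e->actions);
	e->actions = NULL;
	live_arrays--;
}

/* Leaky variant: re-initialising in place drops the only pointer
 * to the previously allocated array.
 */
int reinit_leaky(struct exts_model *e)
{
	return exts_model_init(e);
}

/* Fixed variant: keep a shallow copy of the old structure, re-initialise,
 * then destroy the copy so the old array is released.
 */
int reinit_fixed(struct exts_model *e)
{
	struct exts_model old = *e;	/* shallow copy keeps the old pointer */

	if (exts_model_init(e) < 0) {
		*e = old;		/* leave the caller's state untouched */
		return -1;
	}
	/* The kernel patch waits for an RCU grace period at this point. */
	exts_model_destroy(&old);
	return 0;
}
```

In this model `reinit_fixed()` leaves the allocation count unchanged, while `reinit_leaky()` raises it by one for every call.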
Comments
On Mon, 2022-11-14 at 01:05 +0800, Hawkins Jiawei wrote:
> Syzkaller reports a memory leak as follows:
> ====================================
> BUG: memory leak
> unreferenced object 0xffff88810c287f00 (size 256):
>   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
>     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
>     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
>     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
>     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
>     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
>     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
>     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
>     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
>     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
>     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
>     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
>     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
>     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
>     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
>     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
>     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
>     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
>     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
>     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
>     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> ====================================
>
> Kernel uses tcindex_change() to change an existing
> traffic-control-indices filter properties. During the
> process of changing, kernel clears the old
> traffic-control-indices filter result, and updates it
> by RCU assigning new traffic-control-indices data.
>
> Yet the problem is that, kernel clears the old
> traffic-control-indices filter result, without destroying
> its tcf_exts structure, which triggers the above
> memory leak.
>
> This patch solves it by using tcf_exts_destroy() to
> destroy the tcf_exts structure in old
> traffic-control-indices filter result, after the
> RCU grace period.
>
> [Thanks to the suggestion from Jakub Kicinski and Cong Wang]
>
> Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Cc: Cong Wang <cong.wang@bytedance.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> ---
> v2:
>   - remove all 'will' in commit message according to Jakub Kicinski
>   - add Fixes tag according to Jakub Kicinski
>   - remove all ifdefs according to Jakub Kicinski and Cong Wang
>   - add synchronize_rcu() before destroying old_e according to
>     Cong Wang
>
> v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/
>  net/sched/cls_tcindex.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..d2fac9559d3e 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	struct tcf_result cr = {};
>  	int err, balloc = 0;
>  	struct tcf_exts e;
> +	struct tcf_exts old_e = {};
>
>  	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
>  	if (err < 0)
> @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	}
>
>  	if (old_r && old_r != r) {
> +		old_e = old_r->exts;
>  		err = tcindex_filter_result_init(old_r, cp, net);
>  		if (err < 0) {
>  			kfree(f);
> @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  		tcf_exts_destroy(&new_filter_result.exts);
>  	}
>
> +	/* Note: old_e should be destroyed after the RCU grace period,
> +	 * to avoid possible use-after-free by concurrent readers.
> +	 */
> +	synchronize_rcu();

this could make tc reconfiguration potentially very slow. I'm wondering
if we can delegate the tcf_exts_destroy() to some workqueue?

Thanks!

Paolo
On Tue, 15 Nov 2022 at 12:36, Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2022-11-14 at 01:05 +0800, Hawkins Jiawei wrote:
> > Syzkaller reports a memory leak as follows:
> > ====================================
> > BUG: memory leak
> > unreferenced object 0xffff88810c287f00 (size 256):
> >   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
> >   hex dump (first 32 bytes):
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
> >     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
> >     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
> >     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
> >     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
> >     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
> >     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
> >     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
> >     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
> >     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
> >     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
> >     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
> >     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
> >     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
> >     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
> >     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
> >     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
> >     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
> >     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
> >     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
> >     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
> >     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> >     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > ====================================
> >
> > Kernel uses tcindex_change() to change an existing
> > traffic-control-indices filter properties. During the
> > process of changing, kernel clears the old
> > traffic-control-indices filter result, and updates it
> > by RCU assigning new traffic-control-indices data.
> >
> > Yet the problem is that, kernel clears the old
> > traffic-control-indices filter result, without destroying
> > its tcf_exts structure, which triggers the above
> > memory leak.
> >
> > This patch solves it by using tcf_exts_destroy() to
> > destroy the tcf_exts structure in old
> > traffic-control-indices filter result, after the
> > RCU grace period.
> >
> > [Thanks to the suggestion from Jakub Kicinski and Cong Wang]
> >
> > Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> > Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> > Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> > Cc: Cong Wang <cong.wang@bytedance.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> > ---
> > v2:
> >   - remove all 'will' in commit message according to Jakub Kicinski
> >   - add Fixes tag according to Jakub Kicinski
> >   - remove all ifdefs according to Jakub Kicinski and Cong Wang
> >   - add synchronize_rcu() before destroying old_e according to
> >     Cong Wang
> >
> > v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/
> >  net/sched/cls_tcindex.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..d2fac9559d3e 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	struct tcf_result cr = {};
> >  	int err, balloc = 0;
> >  	struct tcf_exts e;
> > +	struct tcf_exts old_e = {};
> >
> >  	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> >  	if (err < 0)
> > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	}
> >
> >  	if (old_r && old_r != r) {
> > +		old_e = old_r->exts;
> >  		err = tcindex_filter_result_init(old_r, cp, net);
> >  		if (err < 0) {
> >  			kfree(f);
> > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  		tcf_exts_destroy(&new_filter_result.exts);
> >  	}
> >
> > +	/* Note: old_e should be destroyed after the RCU grace period,
> > +	 * to avoid possible use-after-free by concurrent readers.
> > +	 */
> > +	synchronize_rcu();
>
> this could make tc reconfiguration potentially very slow. I'm wondering
> if we can delegate the tcf_exts_destroy() to some workqueue?

call_rcu?
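The two suggestions above (a workqueue, or call_rcu) share one idea: instead of blocking the writer until readers are done, queue a callback that frees the old data later. A minimal user-space sketch of that idea follows. It is not the kernel's RCU API: `deferred_head`, `defer_call()`, `grace_period_elapsed()` and `old_exts` are hypothetical names, and the grace period is simulated by an explicit function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct rcu_head. */
struct deferred_head {
	struct deferred_head *next;
	void (*func)(struct deferred_head *head);
};

/* Callbacks waiting for the simulated grace period to end. */
struct deferred_queue {
	struct deferred_head *pending;
};

/* Queue func to run later; returns immediately, so the writer never blocks. */
void defer_call(struct deferred_queue *q, struct deferred_head *head,
		void (*func)(struct deferred_head *head))
{
	head->func = func;
	head->next = q->pending;
	q->pending = head;
}

/* Simulates the end of a grace period: runs every queued callback
 * and returns how many were run.
 */
int grace_period_elapsed(struct deferred_queue *q)
{
	struct deferred_head *head = q->pending;
	int count = 0;

	q->pending = NULL;
	while (head) {
		/* Read next first: the callback may free head. */
		struct deferred_head *next = head->next;

		head->func(head);
		head = next;
		count++;
	}
	return count;
}

/* Hypothetical container for the old extensions awaiting destruction. */
struct old_exts {
	void **actions;
	struct deferred_head deferred;
};

/* Number of old_exts objects destroyed so far. */
int destroyed_count;

/* Callback: recover the container from its embedded head and free it. */
void old_exts_free_cb(struct deferred_head *head)
{
	struct old_exts *old = (struct old_exts *)
		((char *)head - offsetof(struct old_exts, deferred));

	free(old->actions);
	free(old);
	destroyed_count++;
}
```

The trade-off in this model is that the old object must live on the heap and embed the callback head, whereas the patch under review keeps `old_e` on the stack and waits synchronously.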
On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote:
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..d2fac9559d3e 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	struct tcf_result cr = {};
>  	int err, balloc = 0;
>  	struct tcf_exts e;
> +	struct tcf_exts old_e = {};

This is not a valid way of initializing a structure.
tcf_exts_init() is supposed to be called.
If we add a list member to that structure this code will break, again.

>  	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
>  	if (err < 0)
> @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  	}
>
>  	if (old_r && old_r != r) {
> +		old_e = old_r->exts;
>  		err = tcindex_filter_result_init(old_r, cp, net);
>  		if (err < 0) {
>  			kfree(f);
> @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  		tcf_exts_destroy(&new_filter_result.exts);
>  	}
>
> +	/* Note: old_e should be destroyed after the RCU grace period,
> +	 * to avoid possible use-after-free by concurrent readers.
> +	 */
> +	synchronize_rcu();
> +	tcf_exts_destroy(&old_e);

I don't think this dance is required, @cp is a copy of the original
data, and the original (@p) is destroyed in a safe manner below.

>  	if (oldp)
>  		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
>  	return 0;
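The concern about a future list member can be illustrated in user space. The sketch below assumes a hypothetical `exts_with_list` structure that has gained a circular doubly linked list node, modelled on the kernel's list convention where an empty node points to itself; none of these names come from the kernel tree. A zero-filled instance then differs from one set up by its init function.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list node: an empty node points to itself. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

void list_node_init(struct list_node *node)
{
	node->next = node;
	node->prev = node;
}

int list_node_empty(const struct list_node *node)
{
	return node->next == node;
}

/* Hypothetical extensions structure after a list member was added. */
struct exts_with_list {
	int action;
	int police;
	struct list_node link;
};

/* The init function is the only place that knows how to set up every member. */
void exts_with_list_init(struct exts_with_list *e, int action, int police)
{
	e->action = action;
	e->police = police;
	list_node_init(&e->link);
}
```

A zero-initialised `struct exts_with_list` has NULL `next` and `prev` pointers, so `list_node_empty()` reports it as non-empty and any list traversal would dereference NULL. The initialised instance is a valid empty list.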
On Tue, 2022-11-15 at 09:02 -0800, Jakub Kicinski wrote:
> On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote:
> >
> > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	}
> >
> >  	if (old_r && old_r != r) {
> > +		old_e = old_r->exts;
> >  		err = tcindex_filter_result_init(old_r, cp, net);
> >  		if (err < 0) {
> >  			kfree(f);
> > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  		tcf_exts_destroy(&new_filter_result.exts);
> >  	}
> >
> > +	/* Note: old_e should be destroyed after the RCU grace period,
> > +	 * to avoid possible use-after-free by concurrent readers.
> > +	 */
> > +	synchronize_rcu();
> > +	tcf_exts_destroy(&old_e);
>
> I don't think this dance is required, @cp is a copy of the original
> data, and the original (@p) is destroyed in a safe manner below.

This code confuses me more than a bit, and I don't follow ?!?

it looks like that at this point:

* the data path could access 'old_r->exts' contents via 'p' just before
  the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
  potentially within the same RCU grace period

* 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
  exts from 'p' so that will not be freed by later
  tcindex_partial_destroy_work()

Overall it looks to me that we need to somehow wait for the RCU
grace period.

Somewhat side question: it looks like that the 'perfect hashing' usage
is the root cause of the issue addressed here, and very likely is
afflicted by other problems, e.g. the data corruption in 'err =
tcindex_filter_result_init(old_r, cp, net);'.

AFAICS 'perfect hashing' usage is a sort of optimization that the
user-space may trigger with some combination of the tcindex arguments.
I'm wondering if we could drop all perfect hashing related code?

Paolo
On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote:
> This code confuses me more than a bit, and I don't follow ?!?

It's very confusing :S

For starters I don't know when r != old_r. I mean now it triggers
randomly after the RCU-ification, but in the original code when
it was just a memset(). When would old_r ever not be null and yet
point to a different entry?

> it looks like that at this point:
>
> * the data path could access 'old_r->exts' contents via 'p' just before
>   the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
>   potentially within the same RCU grace period
>
> * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
>   exts from 'p' so that will not be freed by later
>   tcindex_partial_destroy_work()
>
> Overall it looks to me that we need to somehow wait for the RCU
> grace period.

Isn't it better to make @cp a deeper copy of @p ?
I thought it already is but we don't seem to be cloning p->h.
Also the cloning of p->perfect looks quite lossy.

> Somewhat side question: it looks like that the 'perfect hashing' usage
> is the root cause of the issue addressed here, and very likely is
> afflicted by other problems, e.g. the data corruption in 'err =
> tcindex_filter_result_init(old_r, cp, net);'.
>
> AFAICS 'perfect hashing' usage is a sort of optimization that the
> user-space may trigger with some combination of the tcindex arguments.
> I'm wondering if we could drop all perfect hashing related code?

The thought of "how much of this can we delete" did cross my mind :)
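The difference between a shallow copy and the "deeper copy" raised above can be sketched in user space. `table_model`, `table_copy_shallow()` and `table_copy_deep()` are hypothetical names chosen for illustration; they do not reflect how tcindex_set_parms() actually clones its data, and the deep copy assumes a non-empty source table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a table that owns an array of slots. */
struct table_model {
	size_t hash;	/* number of slots */
	int *slots;	/* owned array of hash entries */
};

/* Shallow copy: both tables end up pointing at the same slots array,
 * so a writer changing the copy is visible through the original.
 */
void table_copy_shallow(struct table_model *dst, const struct table_model *src)
{
	*dst = *src;
}

/* Deep copy: the copy gets its own array with the same contents.
 * Assumes src->hash > 0 and src->slots is valid.
 */
int table_copy_deep(struct table_model *dst, const struct table_model *src)
{
	int *slots = malloc(src->hash * sizeof(*slots));

	if (!slots)
		return -1;
	memcpy(slots, src->slots, src->hash * sizeof(*slots));
	dst->hash = src->hash;
	dst->slots = slots;
	return 0;
}
```

With a deep copy the original and the copy can each be destroyed independently, which is the property that makes "destroy the original later" safe in this model.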
Hi,

On Wed, 16 Nov 2022 at 01:02, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote:
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..d2fac9559d3e 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >  	struct tcf_result cr = {};
> >  	int err, balloc = 0;
> >  	struct tcf_exts e;
> > +	struct tcf_exts old_e = {};
>
> This is not a valid way of initializing a structure.
> tcf_exts_init() is supposed to be called.
> If we add a list member to that structure this code will break, again.

Yes, you are right. But the `old_e` variable here is used only for
freeing the old_r->exts resource; `old_e` will be overwritten by the
old_r->exts content as follows:

	struct tcf_exts old_e = {};
	...

	if (old_r && old_r != r) {
		old_e = old_r->exts;
		...
	}
	...
	synchronize_rcu();
	tcf_exts_destroy(&old_e);

So this patch uses `struct tcf_exts old_e = {}` here just for
a cleared space.
On Wed, 16 Nov 2022 at 10:44, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote:
> > This code confuses me more than a bit, and I don't follow ?!?
>
> It's very confusing :S
>
> For starters I don't know when r != old_r. I mean now it triggers
> randomly after the RCU-ification, but in the original code when
> it was just a memset(). When would old_r ever not be null and yet
> point to a different entry?

I am also confused about the code when I tried to fix this bug.

As for when `old_r != r`, according to the simplified
code below, this should probably be true if `p->perfect` is true
or `!p->perfect && !p->h` is true (please correct me if I am wrong)

	struct tcindex_filter_result new_filter_result, *old_r = r;
	struct tcindex_data *cp = NULL, *oldp;
	struct tcf_result cr = {};

	/* tcindex_data attributes must look atomic to classifier/lookup so
	 * allocate new tcindex data and RCU assign it onto root. Keeping
	 * perfect hash and hash pointers from old data.
	 */
	cp = kzalloc(sizeof(*cp), GFP_KERNEL);

	if (p->perfect) {
		if (tcindex_alloc_perfect_hash(net, cp) < 0)
			goto errout;
		...
	}
	cp->h = p->h;

	if (!cp->perfect && !cp->h) {
		if (valid_perfect_hash(cp)) {
			if (tcindex_alloc_perfect_hash(net, cp) < 0)
				goto errout_alloc;

		} else {
			struct tcindex_filter __rcu **hash;

			hash = kcalloc(cp->hash,
				       sizeof(struct tcindex_filter *),
				       GFP_KERNEL);

			if (!hash)
				goto errout_alloc;

			cp->h = hash;
		}
	}
	...

	if (cp->perfect)
		r = cp->perfect + handle;
	else
		r = tcindex_lookup(cp, handle) ? : &new_filter_result;

	if (old_r && old_r != r) {
		err = tcindex_filter_result_init(old_r, cp, net);
		if (err < 0) {
			kfree(f);
			goto errout_alloc;
		}
	}

* If `p->perfect` is true, tcindex_alloc_perfect_hash() newly
  allocates cp->perfect.

* If `!p->perfect && !p->h` is true, cp->perfect or cp->h is
  newly allocated.

In either case, r probably points to the newly allocated memory,
which should not be equal to old_r.

> > it looks like that at this point:
> >
> > * the data path could access 'old_r->exts' contents via 'p' just before
> >   the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
> >   potentially within the same RCU grace period
> >
> > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
> >   exts from 'p' so that will not be freed by later
> >   tcindex_partial_destroy_work()
> >
> > Overall it looks to me that we need to somehow wait for the RCU
> > grace period.
>
> Isn't it better to make @cp a deeper copy of @p ?
> I thought it already is but we don't seem to be cloning p->h.
> Also the cloning of p->perfect looks quite lossy.

Yes, I also think @cp should be a deeper copy of @p.

But it seems that in tcindex_alloc_perfect_hash(),
each @cp ->exts will be initialized by tcf_exts_init()
as below, and tcindex_set_parms() forgets to free the
old ->exts content, triggering this memory leak. (Please
correct me if I am wrong)

	static int tcindex_alloc_perfect_hash(struct net *net,
					      struct tcindex_data *cp)
	{
		int i, err = 0;

		cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result),
				      GFP_KERNEL | __GFP_NOWARN);

		for (i = 0; i < cp->hash; i++) {
			err = tcf_exts_init(&cp->perfect[i].exts, net,
					    TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
			if (err < 0)
				goto errout;
			cp->perfect[i].p = cp;
		}
	}

	static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
					int action, int police)
	{
	#ifdef CONFIG_NET_CLS_ACT
		exts->type = 0;
		exts->nr_actions = 0;
		/* Note: we do not own yet a reference on net.
		 * This reference might be taken later from tcf_exts_get_net().
		 */
		exts->net = net;
		exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
					GFP_KERNEL);
		if (!exts->actions)
			return -ENOMEM;
	#endif
		exts->action = action;
		exts->police = police;
		return 0;
	}

> > Somewhat side question: it looks like that the 'perfect hashing' usage
> > is the root cause of the issue addressed here, and very likely is
> > afflicted by other problems, e.g. the data corruption in 'err =
> > tcindex_filter_result_init(old_r, cp, net);'.
> >
> > AFAICS 'perfect hashing' usage is a sort of optimization that the
> > user-space may trigger with some combination of the tcindex arguments.
> > I'm wondering if we could drop all perfect hashing related code?
>
> The thought of "how much of this can we delete" did cross my mind :)
On Wed, 16 Nov 2022 at 20:10, Hawkins Jiawei <yin31149@gmail.com> wrote: > > On Wed, 16 Nov 2022 at 10:44, Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote: > > > This code confuses me more than a bit, and I don't follow ?!? > > > > It's very confusing :S > > > > For starters I don't know when r != old_r. I mean now it triggers > > randomly after the RCU-ification, but in the original code when > > it was just a memset(). When would old_r ever not be null and yet > > point to a different entry? > > I am also confused about the code when I tried to fix this bug. > > As for when `old_r != r`, according to the simplified > code below, this should be probably true if `p->perfect` is true > or `!p->perfect && !pc->h` is true(please correct me if I am wrong) > > struct tcindex_filter_result new_filter_result, *old_r = r; > struct tcindex_data *cp = NULL, *oldp; > struct tcf_result cr = {}; > > /* tcindex_data attributes must look atomic to classifier/lookup so > * allocate new tcindex data and RCU assign it onto root. Keeping > * perfect hash and hash pointers from old data. > */ > cp = kzalloc(sizeof(*cp), GFP_KERNEL); > > if (p->perfect) { > if (tcindex_alloc_perfect_hash(net, cp) < 0) > goto errout; > ... > } > cp->h = p->h; > > if (!cp->perfect && !cp->h) { > if (valid_perfect_hash(cp)) { > if (tcindex_alloc_perfect_hash(net, cp) < 0) > goto errout_alloc; > > } else { > struct tcindex_filter __rcu **hash; > > hash = kcalloc(cp->hash, > sizeof(struct tcindex_filter *), > GFP_KERNEL); > > if (!hash) > goto errout_alloc; > > cp->h = hash; > } > } > ... > > if (cp->perfect) > r = cp->perfect + handle; > else > r = tcindex_lookup(cp, handle) ? : &new_filter_result; > > if (old_r && old_r != r) { > err = tcindex_filter_result_init(old_r, cp, net); > if (err < 0) { > kfree(f); > goto errout_alloc; > } > } > > * If `p->perfect` is true, tcindex_alloc_perfect_hash() newly > alloctes cp->perfect. 
>
> * If `!p->perfect && !p->h` is true, cp->perfect or cp->h is
> newly allocated.
>
> In either case, r probably points to the newly allocated memory,
> which should not be equal to old_r.

Sorry for the error. In the second case, `r` may point to
`&new_filter_result`, which is the address of a stack variable,
and so should still not be equal to `old_r`.

>
> >
> > > it looks like that at this point:
> > >
> > > * the data path could access 'old_r->exts' contents via 'p' just before
> > > the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
> > > potentially within the same RCU grace period
> > >
> > > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
> > > exts from 'p' so that will not be freed by later
> > > tcindex_partial_destroy_work()
> > >
> > > Overall it looks to me that we need to somehow wait for the RCU
> > > grace period,
> >
> > Isn't it better to make @cp a deeper copy of @p ?
> > I thought it already is but we don't seem to be cloning p->h.
> > Also the cloning of p->perfect looks quite lossy.
>
> Yes, I also think @cp should be a deeper copy of @p.
>
> But it seems that in tcindex_alloc_perfect_hash(),
> each @cp ->exts will be initialized by tcf_exts_init()
> as below, and tcindex_set_parms() forgets to free the
> old ->exts content, triggering this memory leak. (Please
> correct me if I am wrong)
>
> static int tcindex_alloc_perfect_hash(struct net *net,
>                                       struct tcindex_data *cp)
> {
>         int i, err = 0;
>
>         cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result),
>                               GFP_KERNEL | __GFP_NOWARN);
>
>         for (i = 0; i < cp->hash; i++) {
>                 err = tcf_exts_init(&cp->perfect[i].exts, net,
>                                     TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
>                 if (err < 0)
>                         goto errout;
>                 cp->perfect[i].p = cp;
>         }
> }
>
> static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
>                                 int action, int police)
> {
> #ifdef CONFIG_NET_CLS_ACT
>         exts->type = 0;
>         exts->nr_actions = 0;
>         /* Note: we do not own yet a reference on net.
>          * This reference might be taken later from tcf_exts_get_net().
>          */
>         exts->net = net;
>         exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
>                                 GFP_KERNEL);
>         if (!exts->actions)
>                 return -ENOMEM;
> #endif
>         exts->action = action;
>         exts->police = police;
>         return 0;
> }
>
> >
> > > Somewhat side question: it looks like that the 'perfect hashing' usage
> > > is the root cause of the issue addressed here, and very likely is
> > > afflicted by other problems, e.g. the data corruption in 'err =
> > > tcindex_filter_result_init(old_r, cp, net);'.
> > >
> > > AFAICS 'perfect hashing' usage is a sort of optimization that the
> > > user-space may trigger with some combination of the tcindex arguments.
> > > I'm wondering if we could drop all perfect hashing related code?
> >
> > The thought of "how much of this can we delete" did cross my mind :)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..d2fac9559d3e 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	struct tcf_result cr = {};
 	int err, balloc = 0;
 	struct tcf_exts e;
+	struct tcf_exts old_e = {};
 
 	err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 	if (err < 0)
@@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (old_r && old_r != r) {
+		old_e = old_r->exts;
 		err = tcindex_filter_result_init(old_r, cp, net);
 		if (err < 0) {
 			kfree(f);
@@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		tcf_exts_destroy(&new_filter_result.exts);
 	}
 
+	/* Note: old_e should be destroyed after the RCU grace period,
+	 * to avoid possible use-after-free by concurrent readers.
+	 */
+	synchronize_rcu();
+	tcf_exts_destroy(&old_e);
+
 	if (oldp)
 		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
 	return 0;
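
For readers following the thread, the leak and the save-then-destroy approach taken by the patch can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not kernel code: `struct exts_model`, `exts_model_init()`, `exts_model_destroy()`, `reinit_leaky()`, `reinit_fixed()` and `leaked_after()` are hypothetical stand-ins invented for illustration, the `tracked[]` table exists only so the demo can count outstanding allocations, and the `synchronize_rcu()` wait is left out because this model is single-threaded and has no concurrent readers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ACTIONS 32 /* stand-in for TCA_ACT_MAX_PRIO; the value is arbitrary */
#define MAX_TRACKED 8  /* enough slots for this demo */

/* Hypothetical userspace model of struct tcf_exts: it owns one heap array. */
struct exts_model {
	void **actions;
};

/* Every array handed out and not yet freed, so leaks can be counted. */
static void *tracked[MAX_TRACKED];

static int live_allocs(void)
{
	int i, n = 0;

	for (i = 0; i < MAX_TRACKED; i++)
		if (tracked[i])
			n++;
	return n;
}

/* Models tcf_exts_init(): allocates unconditionally and overwrites
 * whatever pointer @e already held, without freeing it.
 */
static int exts_model_init(struct exts_model *e)
{
	int i;

	e->actions = NULL;
	for (i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i])
			continue;
		e->actions = calloc(MAX_ACTIONS, sizeof(*e->actions));
		tracked[i] = e->actions;
		break;
	}
	return e->actions ? 0 : -1;
}

/* Models tcf_exts_destroy(): frees the array; a zeroed struct is a no-op. */
static void exts_model_destroy(struct exts_model *e)
{
	int i;

	for (i = 0; i < MAX_TRACKED; i++)
		if (e->actions && tracked[i] == e->actions)
			tracked[i] = NULL;
	free(e->actions);
	e->actions = NULL;
}

/* Pre-patch behaviour: re-initialise in place, orphaning the old array. */
static int reinit_leaky(struct exts_model *e)
{
	return exts_model_init(e);
}

/* Patched behaviour: keep a shallow copy of the old exts and destroy it
 * after re-initialising. The kernel patch calls synchronize_rcu() before
 * the destroy; this single-threaded model has no readers, so it is omitted.
 */
static int reinit_fixed(struct exts_model *e)
{
	struct exts_model old = *e;
	int err = exts_model_init(e);

	exts_model_destroy(&old);
	return err;
}

/* Runs init, one re-init and a final destroy; returns the number of
 * arrays still allocated afterwards, then reclaims them.
 */
static int leaked_after(int (*reinit)(struct exts_model *))
{
	struct exts_model e = { NULL };
	int i, err, leaked;

	err = exts_model_init(&e);
	assert(err == 0);
	err = reinit(&e);
	assert(err == 0);
	exts_model_destroy(&e);

	leaked = live_allocs();
	for (i = 0; i < MAX_TRACKED; i++) {
		free(tracked[i]);
		tracked[i] = NULL;
	}
	return leaked;
}
```

In this model, calling `leaked_after(reinit_leaky)` leaves one array outstanding, while `leaked_after(reinit_fixed)` leaves none. The shallow copy works here for the same reason `old_e = old_r->exts` does in the patch: the struct only holds a pointer to the array, so the copy keeps the old array reachable after the original has been overwritten.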