[RESEND,net-next,2/5] net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode
Message ID | 20230602103750.2290132-3-vladimir.oltean@nxp.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Jamal Hadi Salim <jhs@mojatatu.com>, Cong Wang <xiyou.wangcong@gmail.com>, Jiri Pirko <jiri@resnulli.us>, Vinicius Costa Gomes <vinicius.gomes@intel.com>, linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>, Peilin Ye <yepeilin.cs@gmail.com>, Pedro Tammela <pctammela@mojatatu.com>
Subject: [PATCH RESEND net-next 2/5] net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode
Date: Fri, 2 Jun 2023 13:37:47 +0300
Message-Id: <20230602103750.2290132-3-vladimir.oltean@nxp.com>
In-Reply-To: <20230602103750.2290132-1-vladimir.oltean@nxp.com>
References: <20230602103750.2290132-1-vladimir.oltean@nxp.com>
Series |
Improve the taprio qdisc's relationship with its children
Commit Message
Vladimir Oltean
June 2, 2023, 10:37 a.m. UTC
The taprio qdisc currently lives in the following equilibrium.
In the software scheduling case, taprio attaches itself to all TXQs,
thus having a refcount of 1 + the number of TX queues. In this mode,
q->qdiscs[] is not visible directly to the Qdisc API. The lifetime of
the Qdiscs from this private array lasts until qdisc_destroy() ->
taprio_destroy().
In the fully offloaded case, the root taprio has a refcount of 1, and
all child q->qdiscs[] also have a refcount of 1. The child q->qdiscs[]
are visible to the Qdisc API (they are attached to the netdev TXQs
directly); however, taprio loses its reference to them very early, during
qdisc_graft(parent==NULL) -> taprio_attach(). At that time, taprio frees
the q->qdiscs[] array so as not to leak memory, but, interestingly, it does not
release a reference on these qdiscs, because it doesn't effectively own
them - they are created by taprio but owned by the Qdisc core, and will
be freed by qdisc_graft(parent==NULL, new==NULL) -> qdisc_put(old) when
the Qdisc is deleted or when the child Qdisc is replaced with something
else.
My interest is to change this equilibrium such that taprio also owns a
reference on the q->qdiscs[] child Qdiscs for the lifetime of the root
Qdisc, including in full offload mode. I want this because I would like
taprio_leaf(), taprio_dump_class(), taprio_dump_class_stats() to have
insight into q->qdiscs[] for the software scheduling mode - currently
they look at dev_queue->qdisc_sleeping, which is, as mentioned, the same
as the root taprio.
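Concretely, taprio_leaf() could end up looking roughly like this once it reads
q->qdiscs[] in both modes. This is only a sketch of the direction; the actual
change belongs to a later patch in this series and may differ:

	/* Sketch only: return the tracked child for class 'cl', regardless of
	 * whether taprio runs in software or full offload mode. Assumes
	 * q->qdiscs[] stays valid for the root qdisc's whole lifetime, which
	 * is what this patch arranges.
	 */
	static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
	{
		struct taprio_sched *q = qdisc_priv(sch);
		struct net_device *dev = qdisc_dev(sch);

		/* Class IDs map 1:1 to TX queues; out of range means no such class */
		if (!cl || cl > dev->num_tx_queues)
			return NULL;

		return q->qdiscs[cl - 1];
	}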
The following set of changes is necessary:
- don't free q->qdiscs[] early in taprio_attach(); free it late in
taprio_destroy() for consistency with software mode (the existing cleanup
path is sketched after this list). But:
- currently that's not possible, because taprio doesn't own a reference
on q->qdiscs[]. So hold that reference - once during the initial
attach() and once during subsequent graft() calls when the child is
changed.
- always keep track of the current child in q->qdiscs[], even for full
offload mode, so that we free in taprio_destroy() what we should, and
not something stale.
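For context, the existing cleanup path in taprio_destroy() that the late free
relies on looks roughly like this (a paraphrase of net/sched/sch_taprio.c, not
part of this patch's diff):

	/* Paraphrased sketch of the existing taprio_destroy() cleanup. With the
	 * child refcounts now elevated in offload mode too, this loop is what
	 * releases them, instead of the early kfree() in taprio_attach().
	 */
	static void taprio_destroy(struct Qdisc *sch)
	{
		struct taprio_sched *q = qdisc_priv(sch);
		struct net_device *dev = qdisc_dev(sch);
		unsigned int i;

		/* ... offload disable, list removal, etc. ... */

		if (q->qdiscs) {
			for (i = 0; i < dev->num_tx_queues; i++)
				if (q->qdiscs[i])
					qdisc_put(q->qdiscs[i]);
			kfree(q->qdiscs);
		}
		q->qdiscs = NULL;

		/* ... free the schedules ... */
	}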
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
net/sched/sch_taprio.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
Comments
On Fri, 2023-06-02 at 13:37 +0300, Vladimir Oltean wrote:
> [commit message and taprio_attach() hunk snipped]
>
> -	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> +	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
>  		*old = dev_graft_qdisc(dev_queue, new);
> -	} else {
> +	else
>  		*old = q->qdiscs[cl - 1];
> -		q->qdiscs[cl - 1] = new;
> -	}
>
> -	if (new)
> +	q->qdiscs[cl - 1] = new;
> +	if (new) {
> +		qdisc_refcount_inc(new);
>  		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
> +	}

Isn't the above leaking a reference to old with something alike:

tc qdisc replace dev eth0 handle 8001: parent root stab overhead 24 taprio flags 0x2 #...
# each q in q->qdiscs has refcnt == 2
tc qdisc replace dev eth0 parent 8001:8 cbs #...
# -> taprio_graft(..., cbs, ...)
# cbs refcnt is 2
# 'old' refcnt decreases by 1, refcount will not reach 0?!?

kmemleak should be able to see that.

BTW, what about including your tests from the cover letter somewhere under tc-testing?

Thanks!

Paolo
On Tue, Jun 06, 2023 at 12:27:09PM +0200, Paolo Abeni wrote:
> [quoted patch snipped]
>
> Isn't the above leaking a reference to old with something alike:
>
> tc qdisc replace dev eth0 handle 8001: parent root stab overhead 24 taprio flags 0x2 #...
> # each q in q->qdiscs has refcnt == 2
> tc qdisc replace dev eth0 parent 8001:8 cbs #...
> # -> taprio_graft(..., cbs, ...)
> # cbs refcnt is 2
> # 'old' refcnt decreases by 1, refcount will not reach 0?!?
>
> kmemleak should be able to see that.

You're right - in full offload mode, the refcount of "old" (pfifo, parent 8001:8)
does not drop to 0 after grafting something else to 8001:8.

I believe this other implementation should work in all cases.

static int taprio_graft(struct Qdisc *sch, unsigned long cl,
			struct Qdisc *new, struct Qdisc **old,
			struct netlink_ext_ack *extack)
{
	struct taprio_sched *q = qdisc_priv(sch);
	struct net_device *dev = qdisc_dev(sch);
	struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);

	if (!dev_queue)
		return -EINVAL;

	if (dev->flags & IFF_UP)
		dev_deactivate(dev);

	/* In offload mode, the child Qdisc is directly attached to the netdev
	 * TX queue, and thus, we need to keep its refcount elevated in order
	 * to counteract qdisc_graft()'s call to qdisc_put() once per TX queue.
	 * However, save the reference to the new qdisc in the private array in
	 * both software and offload cases, to have an up-to-date reference to
	 * our children.
	 */
	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
		*old = dev_graft_qdisc(dev_queue, new);
		if (new)
			qdisc_refcount_inc(new);
		if (*old)
			qdisc_put(*old);
	} else {
		*old = q->qdiscs[cl - 1];
	}

	q->qdiscs[cl - 1] = new;
	if (new)
		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;

	if (dev->flags & IFF_UP)
		dev_activate(dev);

	return 0;
}

> BTW, what about including your tests from the cover letter somewhere under tc-testing?

I don't know about that. Does it involve adding taprio hw offload to netdevsim,
so that both code paths are covered?

To be honest with you, I never ran tdc, and I worry about what I may find there,
unrelated to what I'm being asked to add (at a cursory glance I see overlapping
TXQs in taprio.json), and which has the possibility of turning this into a huge
time pit.
On Tue, 2023-06-06 at 18:56 +0300, Vladimir Oltean wrote:
> [quoted context snipped]
>
> I believe this other implementation should work in all cases.
>
> 	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> 		*old = dev_graft_qdisc(dev_queue, new);
> 		if (new)
> 			qdisc_refcount_inc(new);
> 		if (*old)
> 			qdisc_put(*old);
> 	} else {
> 		*old = q->qdiscs[cl - 1];
> 	}

Perhaps the above chunk could be:

	*old = q->qdiscs[cl - 1];
	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
		WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
		if (new)
			qdisc_refcount_inc(new);
		if (*old)
			qdisc_put(*old);
	}

(boldly assuming I'm not completely lost, which looks a wild bet ;)

> 	q->qdiscs[cl - 1] = new;
> 	if (new)
> 		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
>
> 	if (dev->flags & IFF_UP)
> 		dev_activate(dev);
>
> 	return 0;
> }
>
> > BTW, what about including your tests from the cover letter somewhere under tc-testing?
>
> I don't know about that. Does it involve adding taprio hw offload to netdevsim,
> so that both code paths are covered?

I guess I underlooked the needed effort and we could live without new
tests here.

Cheers,

Paolo
On Wed, Jun 07, 2023 at 12:05:19PM +0200, Paolo Abeni wrote:
> Perhaps the above chunk could be:
>
> 	*old = q->qdiscs[cl - 1];
> 	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> 		WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
> 		if (new)
> 			qdisc_refcount_inc(new);
> 		if (*old)
> 			qdisc_put(*old);
> 	}
>
> (boldly assuming I'm not completely lost, which looks a wild bet ;)

Yeah, could be like that. In full offload mode, q->qdiscs[cl - 1] is also what's
grafted to the TXQ, so the WARN_ON() would be indicative of a serious bug if it
triggered.

> > > BTW, what about including your tests from the cover letter somewhere under tc-testing?
> >
> > I don't know about that. Does it involve adding taprio hw offload to netdevsim,
> > so that both code paths are covered?
>
> I guess I underlooked the needed effort and we could live without new
> tests here.

Let's see how the discussion progresses.
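To summarize the thread, the graft path with Paolo's WARN_ON_ONCE suggestion
folded into the reworked implementation would look roughly like this. It is
just a merge of the two snippets quoted above, not necessarily the version
that was ultimately applied:

	/* Sketch merging Vladimir's reworked graft with Paolo's suggestion.
	 * In offload mode the child attached to the TXQ must match the one
	 * tracked in q->qdiscs[], so the two views are reconciled here.
	 */
	static int taprio_graft(struct Qdisc *sch, unsigned long cl,
				struct Qdisc *new, struct Qdisc **old,
				struct netlink_ext_ack *extack)
	{
		struct taprio_sched *q = qdisc_priv(sch);
		struct net_device *dev = qdisc_dev(sch);
		struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);

		if (!dev_queue)
			return -EINVAL;

		if (dev->flags & IFF_UP)
			dev_deactivate(dev);

		/* The old child is always the one tracked in q->qdiscs[]; in
		 * offload mode it is also the qdisc grafted to the TXQ.
		 */
		*old = q->qdiscs[cl - 1];
		if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
			WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
			if (new)
				qdisc_refcount_inc(new);
			if (*old)
				qdisc_put(*old);
		}

		q->qdiscs[cl - 1] = new;
		if (new)
			new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;

		if (dev->flags & IFF_UP)
			dev_activate(dev);

		return 0;
	}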
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b1c611c72aa4..8807fc915b79 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2138,23 +2138,20 @@ static void taprio_attach(struct Qdisc *sch)
 
 			qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 			old = dev_graft_qdisc(dev_queue, qdisc);
+			/* Keep refcount of q->qdiscs[ntx] at 2 */
+			qdisc_refcount_inc(qdisc);
 		} else {
 			/* In software mode, attach the root taprio qdisc
 			 * to all netdev TX queues, so that dev_qdisc_enqueue()
 			 * goes through taprio_enqueue().
 			 */
 			old = dev_graft_qdisc(dev_queue, sch);
+			/* Keep root refcount at 1 + num_tx_queues */
 			qdisc_refcount_inc(sch);
 		}
 		if (old)
 			qdisc_put(old);
 	}
-
-	/* access to the child qdiscs is not needed in offload mode */
-	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
-		kfree(q->qdiscs);
-		q->qdiscs = NULL;
-	}
 }
 
 static struct netdev_queue *taprio_queue_get(struct Qdisc *sch,
@@ -2183,15 +2180,24 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
 	if (dev->flags & IFF_UP)
 		dev_deactivate(dev);
 
-	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
+	/* In software mode, the root taprio qdisc is still the one attached to
+	 * all netdev TX queues, and hence responsible for taprio_enqueue() to
+	 * forward the skbs to the child qdiscs from the private q->qdiscs[]
+	 * array. So only attach the new qdisc to the netdev queue in offload
+	 * mode, where the enqueue must bypass taprio. However, save the
+	 * reference to the new qdisc in the private array in both cases, to
+	 * have an up-to-date reference to our children.
+	 */
+	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
 		*old = dev_graft_qdisc(dev_queue, new);
-	} else {
+	else
 		*old = q->qdiscs[cl - 1];
-		q->qdiscs[cl - 1] = new;
-	}
 
-	if (new)
+	q->qdiscs[cl - 1] = new;
+	if (new) {
+		qdisc_refcount_inc(new);
 		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+	}
 
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);