Message ID | 20230130005725.3517597-6-sdonthineni@nvidia.com |
---|---|
State | New |
Headers | From: Shanker Donthineni <sdonthineni@nvidia.com> To: Thomas Gleixner <tglx@linutronix.de>, Marc Zyngier <maz@kernel.org>, Michael Walle <michael@walle.cc> CC: Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Hans de Goede <hdegoede@redhat.com>, Wolfram Sang <wsa+renesas@sang-engineering.com>, Shanker Donthineni <sdonthineni@nvidia.com>, <linux-kernel@vger.kernel.org> Subject: [PATCH 5/5] genirq: Use the maple tree for IRQ descriptors management Date: Sun, 29 Jan 2023 18:57:25 -0600 Message-ID: <20230130005725.3517597-6-sdonthineni@nvidia.com> In-Reply-To: <20230130005725.3517597-1-sdonthineni@nvidia.com> References: <20230130005725.3517597-1-sdonthineni@nvidia.com> |
Series | Increase the number of IRQ descriptors for SPARSEIRQ |
Commit Message
Shanker Donthineni
Jan. 30, 2023, 12:57 a.m. UTC
The current implementation uses a static bitmap and a radix tree
to manage IRQ allocation and irq_desc pointer store respectively.
However, the size of the bitmap is constrained by the build time
macro MAX_SPARSE_IRQS, which may not be sufficient to support the
high-end servers, particularly those with GICv4.1 hardware, which
require a large interrupt space to cover LPIs and vSGIs.

The maple tree is a highly efficient data structure for storing
non-overlapping ranges and can handle a large number of entries,
up to ULONG_MAX. It can be utilized for both storing IRQD and
identifying available free spaces.

The IRQD management can be simplified by switching to a maple tree
data structure, which offers greater flexibility and scalability.
To support modern servers, the maximum number of IRQs has been
increased to INT_MAX, which provides a more adequate value than
the previous limit of NR_IRQS+8192.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
---
kernel/irq/internals.h | 2 +-
kernel/irq/irqdesc.c | 51 ++++++++++++++++++++++++------------------
2 files changed, 30 insertions(+), 23 deletions(-)
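
The "identifying available free spaces" part of the commit message can be illustrated with a small userspace sketch. This is a toy model only and not the maple tree API: the names irq_range, find_free_area() and MODEL_IRQ_LIMIT are hypothetical, and a sorted array stands in for the tree. It shows the operation the allocator needs, which is to find the lowest run of cnt unused numbers at or above a starting point, given a set of non-overlapping allocated ranges.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical upper bound, mirroring the INT_MAX limit named in the commit message. */
#define MODEL_IRQ_LIMIT INT_MAX

/* One allocated, inclusive range of interrupt numbers (toy model). */
struct irq_range {
	unsigned int first;
	unsigned int last;
};

/*
 * Return the lowest start >= from such that [start, start + cnt - 1] does
 * not overlap any entry of 'used' and does not exceed 'max', or -1 if no
 * such area exists. 'used' must be sorted by 'first' and non-overlapping.
 */
static long long find_free_area(const struct irq_range *used, size_t n,
				unsigned int from, unsigned int cnt,
				unsigned int max)
{
	unsigned long long start = from;

	if (cnt == 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (used[i].last < start)
			continue;	/* range lies entirely below the candidate */
		if (used[i].first >= start + cnt)
			break;		/* the gap in front of this range is big enough */
		start = (unsigned long long)used[i].last + 1;
	}

	if (start + cnt - 1 > max)
		return -1;
	return (long long)start;
}
```

With ranges 0-15, 16-23 and 40-47 allocated, a request for 8 numbers starting at 0 lands at 24, and a request for 32 numbers lands at 48. The linear scan is only for illustration; the point of a range-aware tree is to answer the same question without walking every entry and without a build-time cap on the number space.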
Comments
On Sun, Jan 29 2023 at 18:57, Shanker Donthineni wrote:
> The current implementation uses a static bitmap and a radix tree
> to manage IRQ allocation and irq_desc pointer store respectively.
> However, the size of the bitmap is constrained by the build time
> macro MAX_SPARSE_IRQS, which may not be sufficient to support the
> high-end servers, particularly those with GICv4.1 hardware, which
> require a large interrupt space to cover LPIs and vSGIs
>
> The maple tree is a highly efficient data structure for storing
> non-overlapping ranges and can handle a large number of entries,
> up to ULONG_MAX. It can be utilized for both storing IRQD and

IRQD ??. Please write it out: interrupt descriptors

Changelogs have no space constraints and there is zero value to
introduce unreadable acronyms.

>  static DEFINE_MUTEX(sparse_irq_lock);
> -static DECLARE_BITMAP(allocated_irqs, MAX_SPARSE_IRQS);
> +static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
> +                                       MT_FLAGS_ALLOC_RANGE |
> +                                       MT_FLAGS_LOCK_EXTERN |
> +                                       MT_FLAGS_USE_RCU, sparse_irq_lock);

Nit. Can we please format this properly:

static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
                                        MT_FLAGS_ALLOC_RANGE |
                                        MT_FLAGS_LOCK_EXTERN |
                                        MT_FLAGS_USE_RCU,
                                        sparse_irq_lock);

Other than that this looks really good.

Thanks,

        tglx

On 1/31/23 03:52, Thomas Gleixner wrote:
> External email: Use caution opening links or attachments
>
>
> On Sun, Jan 29 2023 at 18:57, Shanker Donthineni wrote:
>> The current implementation uses a static bitmap and a radix tree
>> to manage IRQ allocation and irq_desc pointer store respectively.
>> However, the size of the bitmap is constrained by the build time
>> macro MAX_SPARSE_IRQS, which may not be sufficient to support the
>> high-end servers, particularly those with GICv4.1 hardware, which
>> require a large interrupt space to cover LPIs and vSGIs
>>
>> The maple tree is a highly efficient data structure for storing
>> non-overlapping ranges and can handle a large number of entries,
>> up to ULONG_MAX. It can be utilized for both storing IRQD and
>
> IRQD ??. Please write it out: interrupt descriptors
>
> Changelogs have no space constraints and there is zero value to
> introduce unreadable acronyms.
>
>>  static DEFINE_MUTEX(sparse_irq_lock);
>> -static DECLARE_BITMAP(allocated_irqs, MAX_SPARSE_IRQS);
>> +static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
>> +                                       MT_FLAGS_ALLOC_RANGE |
>> +                                       MT_FLAGS_LOCK_EXTERN |
>> +                                       MT_FLAGS_USE_RCU, sparse_irq_lock);
>
> Nit. Can we please format this properly:
>
> static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
>                                         MT_FLAGS_ALLOC_RANGE |
>                                         MT_FLAGS_LOCK_EXTERN |
>                                         MT_FLAGS_USE_RCU,
>                                         sparse_irq_lock);
>
> Other than that this looks really good.
>

I'll update in v2 patch.

Thanks,
Shanker

Greeting,

FYI, we noticed WARNING:at_kernel/locking/lockdep.c:#lockdep_hardirqs_on_prepare due to commit (built with gcc-11):

commit: 02fb8013ee5f9b7d7bc35d54bf8bc5fe1179970c ("[PATCH 5/5] genirq: Use the maple tree for IRQ descriptors management")
url: https://github.com/intel-lab-lkp/linux/commits/Shanker-Donthineni/genirq-Use-hlist-for-managing-resend-handlers/20230130-085956
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 188a569658584e93930ab60334c5a1079c0330d8
patch link: https://lore.kernel.org/all/20230130005725.3517597-6-sdonthineni@nvidia.com/
patch subject: [PATCH 5/5] genirq: Use the maple tree for IRQ descriptors management

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Link: https://lore.kernel.org/oe-lkp/202302011308.f53123d2-oliver.sang@intel.com

[ 2.214554][ T0] ------------[ cut here ]------------
[ 2.215401][ T0] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
[ 2.215446][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4308 lockdep_hardirqs_on_prepare+0x2d4/0x350
[ 2.217975][ T0] Modules linked in:
[ 2.218526][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc3-00015-g02fb8013ee5f #1
[ 2.219803][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 2.221228][ T0] RIP: 0010:lockdep_hardirqs_on_prepare+0x2d4/0x350
[ 2.222207][ T0] Code: 11 38 d0 7c 04 84 d2 75 5e 8b 0d bf 8b f7 03 85 c9 0f 85 c9 fe ff ff 48 c7 c6 40 7d a9 83 48 c7 c7 60 4e a9 83 e8 60 7c 35 02 <0f> 0b e9 af fe ff ff e8 50 8d 62 00 e9 0c fe ff ff e8 e6 8d 62 00
[ 2.224923][ T0] RSP: 0000:ffffffff844075a0 EFLAGS: 00010082
[ 2.225792][ T0] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
[ 2.226889][ T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: fffffbfff0880ea6
[ 2.227955][ T0] RBP: ffff8883af23fac0 R08: 0000000000000000 R09: ffffffff844072df
[ 2.229068][ T0] R10: fffffbfff0880e5b R11: 0000000000000001 R12: 0000000000000002
[ 2.230147][ T0] R13: 0000000000000002 R14: ffff88810022b018 R15: ffff88810022b010
[ 2.231269][ T0] FS: 0000000000000000(0000) GS:ffff8883af200000(0000) knlGS:0000000000000000
[ 2.232522][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.233395][ T0] CR2: ffff88843ffff000 CR3: 000000000442a000 CR4: 00000000000406b0
[ 2.234504][ T0] Call Trace:
[ 2.234941][ T0] <TASK>
[ 2.235345][ T0] trace_hardirqs_on+0x40/0x140
[ 2.236029][ T0] __kmem_cache_alloc_bulk+0x22e/0x490
[ 2.236795][ T0] ? kasan_set_track+0x25/0x30
[ 2.237470][ T0] kmem_cache_alloc_bulk+0x159/0x2e0
[ 2.238225][ T0] mas_alloc_nodes+0x253/0x690
[ 2.238886][ T0] mas_split+0x30d/0x1580
[ 2.239561][ T0] ? mas_push_data+0x1a40/0x1a40
[ 2.240219][ T0] ? memset+0x24/0x50
[ 2.240782][ T0] ? blake2s_final+0x110/0x140
[ 2.241426][ T0] ? blake2s+0x115/0x150
[ 2.242143][ T0] ? wait_for_random_bytes+0xd0/0xd0
[ 2.242859][ T0] ? mas_mab_cp+0x2f6/0x890
[ 2.243527][ T0] ? memset+0x24/0x50
[ 2.244122][ T0] ? find_held_lock+0x2c/0x110
[ 2.244803][ T0] ? mas_store_b_node+0x54c/0x1180
[ 2.245510][ T0] ? rcu_read_lock_sched_held+0x16/0x80
[ 2.246282][ T0] mas_wr_bnode+0x14f/0x1d0
[ 2.246902][ T0] ? mas_commit_b_node+0x600/0x600
[ 2.247677][ T0] ? secondary_startup_64_no_verify+0xe0/0xeb
[ 2.248567][ T0] ? ___slab_alloc+0x70b/0xe00
[ 2.249251][ T0] ? mas_wr_store_entry+0x2e9/0xe30
[ 2.250088][ T0] ? rcu_read_lock_sched_held+0x16/0x80
[ 2.250864][ T0] mas_store_gfp+0xc2/0x190
[ 2.251516][ T0] ? mtree_erase+0x100/0x100
[ 2.252190][ T0] ? lockdep_init_map_type+0x2c7/0x780
[ 2.252924][ T0] irq_insert_desc+0xac/0xf0
[ 2.253562][ T0] ? irq_kobj_release+0x100/0x100
[ 2.254243][ T0] early_irq_init+0x81/0x8c
[ 2.254866][ T0] start_kernel+0x1c7/0x3a4
[ 2.255479][ T0] secondary_startup_64_no_verify+0xe0/0xeb
[ 2.256408][ T0] </TASK>
[ 2.256802][ T0] irq event stamp: 0
[ 2.257268][ T0] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 2.258177][ T0] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[ 2.259116][ T0] softirqs last enabled at (0): [<0000000000000000>] 0x0
[ 2.260044][ T0] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 2.260979][ T0] ---[ end trace 0000000000000000 ]---
[ 2.262190][ T0] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 2.263441][ T0] ------------[ cut here ]------------
[ 2.264180][ T0] Interrupts were enabled early
[ 2.264809][ T0] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x239/0x3a4
[ 2.265872][ T0] Modules linked in:
[ 2.266391][ T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.2.0-rc3-00015-g02fb8013ee5f #1
[ 2.267721][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 2.270166][ T0] RIP: 0010:start_kernel+0x239/0x3a4
[ 2.270938][ T0] Code: 48 89 05 f6 11 58 7a e8 b9 04 06 00 e8 f4 d2 d1 fd e8 40 75 05 00 9c 58 0f ba e0 09 73 0e 48 c7 c7 60 0e a0 83 e8 af bf bf fd <0f> 0b c6 05 2a 12 81 ff 00 e8 ad 96 ad fb fb e8 58 07 07 00 e8 49
[ 2.273782][ T0] RSP: 0000:ffffffff84407f38 EFLAGS: 00010286
[ 2.274637][ T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2.275771][ T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: fffffbfff0880fd9
[ 2.276858][ T0] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff84407c77
[ 2.277994][ T0] R10: fffffbfff0880f8e R11: 0000000000000001 R12: 0000000000000000
[ 2.279079][ T0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2.280185][ T0] FS: 0000000000000000(0000) GS:ffff8883af200000(0000) knlGS:0000000000000000
[ 2.281474][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.282441][ T0] CR2: ffff88843ffff000 CR3: 000000000442a000 CR4: 00000000000406b0
[ 2.283519][ T0] Call Trace:
[ 2.283930][ T0] <TASK>
[ 2.284328][ T0] secondary_startup_64_no_verify+0xe0/0xeb
[ 2.285143][ T0] </TASK>
[ 2.285517][ T0] irq event stamp: 0
[ 2.286011][ T0] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 2.286946][ T0] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[ 2.287873][ T0] softirqs last enabled at (0): [<0000000000000000>] 0x0
[ 2.288797][ T0] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 2.289618][ T0] ---[ end trace 0000000000000000 ]---

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

On Wed, Feb 01 2023 at 14:02, kernel test robot wrote:
> FYI, we noticed WARNING:at_kernel/locking/lockdep.c:#lockdep_hardirqs_on_prepare due to commit (built with gcc-11):
>
> commit: 02fb8013ee5f9b7d7bc35d54bf8bc5fe1179970c ("[PATCH 5/5] genirq: Use the maple tree for IRQ descriptors management")
> url: https://github.com/intel-lab-lkp/linux/commits/Shanker-Donthineni/genirq-Use-hlist-for-managing-resend-handlers/20230130-085956
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 188a569658584e93930ab60334c5a1079c0330d8
> patch link: https://lore.kernel.org/all/20230130005725.3517597-6-sdonthineni@nvidia.com/
> patch subject: [PATCH 5/5] genirq: Use the maple tree for IRQ
> descriptors management

> [ 2.214554][ T0] ------------[ cut here ]------------
> [ 2.215401][ T0] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
> [ 2.215446][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4308 lockdep_hardirqs_on_prepare+0x2d4/0x350
> [ 2.217975][ T0] Modules linked in:
> [ 2.218526][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc3-00015-g02fb8013ee5f #1
> [ 2.219803][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> [ 2.221228][ T0] RIP: 0010:lockdep_hardirqs_on_prepare+0x2d4/0x350
> [ 2.222207][ T0] Code: 11 38 d0 7c 04 84 d2 75 5e 8b 0d bf 8b f7 03 85 c9 0f 85 c9 fe ff ff 48 c7 c6 40 7d a9 83 48 c7 c7 60 4e a9 83 e8 60 7c 35 02 <0f> 0b e9 af fe ff ff e8 50 8d 62 00 e9 0c fe ff ff e8 e6 8d 62 00
> [ 2.224923][ T0] RSP: 0000:ffffffff844075a0 EFLAGS: 00010082
> [ 2.225792][ T0] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
> [ 2.226889][ T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: fffffbfff0880ea6
> [ 2.227955][ T0] RBP: ffff8883af23fac0 R08: 0000000000000000 R09: ffffffff844072df
> [ 2.229068][ T0] R10: fffffbfff0880e5b R11: 0000000000000001 R12: 0000000000000002
> [ 2.230147][ T0] R13: 0000000000000002 R14: ffff88810022b018 R15: ffff88810022b010
> [ 2.231269][ T0] FS: 0000000000000000(0000) GS:ffff8883af200000(0000) knlGS:0000000000000000
> [ 2.232522][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.233395][ T0] CR2: ffff88843ffff000 CR3: 000000000442a000 CR4: 00000000000406b0
> [ 2.234504][ T0] Call Trace:
> [ 2.234941][ T0] <TASK>
> [ 2.235345][ T0] trace_hardirqs_on+0x40/0x140
> [ 2.236029][ T0] __kmem_cache_alloc_bulk+0x22e/0x490
> [ 2.236795][ T0] ? kasan_set_track+0x25/0x30
> [ 2.237470][ T0] kmem_cache_alloc_bulk+0x159/0x2e0
> [ 2.238225][ T0] mas_alloc_nodes+0x253/0x690
> [ 2.238886][ T0] mas_split+0x30d/0x1580
> [ 2.239561][ T0] ? mas_push_data+0x1a40/0x1a40
> [ 2.240219][ T0] ? memset+0x24/0x50
> [ 2.240782][ T0] ? blake2s_final+0x110/0x140
> [ 2.241426][ T0] ? blake2s+0x115/0x150
> [ 2.242143][ T0] ? wait_for_random_bytes+0xd0/0xd0
> [ 2.242859][ T0] ? mas_mab_cp+0x2f6/0x890
> [ 2.243527][ T0] ? memset+0x24/0x50
> [ 2.244122][ T0] ? find_held_lock+0x2c/0x110
> [ 2.244803][ T0] ? mas_store_b_node+0x54c/0x1180
> [ 2.245510][ T0] ? rcu_read_lock_sched_held+0x16/0x80
> [ 2.246282][ T0] mas_wr_bnode+0x14f/0x1d0
> [ 2.246902][ T0] ? mas_commit_b_node+0x600/0x600
> [ 2.247677][ T0] ? secondary_startup_64_no_verify+0xe0/0xeb
> [ 2.248567][ T0] ? ___slab_alloc+0x70b/0xe00
> [ 2.249251][ T0] ? mas_wr_store_entry+0x2e9/0xe30
> [ 2.250088][ T0] ? rcu_read_lock_sched_held+0x16/0x80
> [ 2.250864][ T0] mas_store_gfp+0xc2/0x190
> [ 2.251516][ T0] ? mtree_erase+0x100/0x100
> [ 2.252190][ T0] ? lockdep_init_map_type+0x2c7/0x780
> [ 2.252924][ T0] irq_insert_desc+0xac/0xf0
> [ 2.253562][ T0] ? irq_kobj_release+0x100/0x100
> [ 2.254243][ T0] early_irq_init+0x81/0x8c
> [ 2.254866][ T0] start_kernel+0x1c7/0x3a4
> [ 2.255479][ T0] secondary_startup_64_no_verify+0xe0/0xeb

This triggers because __kmem_cache_alloc_bulk() uses
lock_irq()/unlock_irq(). Seems nobody used it during early boot stage
yet. Though the maple tree conversion of the interrupt descriptor
storage which is the purpose of the patch in question makes that happen.

Fix below.

Thanks,

        tglx
---
Subject: mm, slub: Take slab lock with irqsave()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 01 Feb 2023 14:14:00 +0100

<Add blurb>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/slub.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3913,6 +3913,7 @@ static inline int __kmem_cache_alloc_bul
 			size_t size, void **p, struct obj_cgroup *objcg)
 {
 	struct kmem_cache_cpu *c;
+	unsigned long irqflags;
 	int i;
 
 	/*
@@ -3921,7 +3922,7 @@ static inline int __kmem_cache_alloc_bul
 	 * handlers invoking normal fastpath.
 	 */
 	c = slub_get_cpu_ptr(s->cpu_slab);
-	local_lock_irq(&s->cpu_slab->lock);
+	local_lock_irqsave(&s->cpu_slab->lock, irqflags);
 
 	for (i = 0; i < size; i++) {
 		void *object = kfence_alloc(s, s->object_size, flags);
@@ -3942,7 +3943,7 @@ static inline int __kmem_cache_alloc_bul
 			 */
 			c->tid = next_tid(c->tid);
 
-			local_unlock_irq(&s->cpu_slab->lock);
+			local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
 
 			/*
 			 * Invoking slow path likely have side-effect
@@ -3956,7 +3957,7 @@ static inline int __kmem_cache_alloc_bul
 			c = this_cpu_ptr(s->cpu_slab);
 			maybe_wipe_obj_freeptr(s, p[i]);
 
-			local_lock_irq(&s->cpu_slab->lock);
+			local_lock_irqsave(&s->cpu_slab->lock, irqflags);
 
 			continue; /* goto for-loop */
 		}
@@ -3965,7 +3966,7 @@ static inline int __kmem_cache_alloc_bul
 		maybe_wipe_obj_freeptr(s, p[i]);
 	}
 	c->tid = next_tid(c->tid);
-	local_unlock_irq(&s->cpu_slab->lock);
+	local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
 	slub_put_cpu_ptr(s->cpu_slab);
 
 	return i;

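
The difference between the two lock variants in the fix above can be shown with a small userspace model. This is a sketch under stated assumptions, not kernel code: a plain bool stands in for the CPU interrupt-enable state, and every model_* name is hypothetical. It only demonstrates why an unconditional enable on unlock is wrong for a caller that entered with interrupts already disabled, as is the case during early boot.

```c
#include <stdbool.h>

/* Toy stand-in for the CPU interrupt-enable flag; not kernel code. */
static bool model_irqs_enabled;

static void model_irq_disable(void) { model_irqs_enabled = false; }
static void model_irq_enable(void)  { model_irqs_enabled = true; }

/* Save the current state, then disable. */
static void model_irq_save(bool *flags)
{
	*flags = model_irqs_enabled;
	model_irqs_enabled = false;
}

/* Put back whatever state was saved. */
static void model_irq_restore(bool flags)
{
	model_irqs_enabled = flags;
}

/* Models the lock_irq/unlock_irq pairing: unlock always enables. */
static bool section_with_irq_pair(bool enabled_on_entry)
{
	model_irqs_enabled = enabled_on_entry;
	model_irq_disable();
	/* ... critical section ... */
	model_irq_enable();
	return model_irqs_enabled;	/* state seen by the caller afterwards */
}

/* Models the irqsave/irqrestore pairing: unlock restores the entry state. */
static bool section_with_irqsave_pair(bool enabled_on_entry)
{
	bool flags;

	model_irqs_enabled = enabled_on_entry;
	model_irq_save(&flags);
	/* ... critical section ... */
	model_irq_restore(flags);
	return model_irqs_enabled;
}
```

Entering section_with_irq_pair() with interrupts disabled leaves them enabled on return, which matches the "Interrupts were enabled early" warning in the robot report. The irqsave variant returns with the state it was entered with, in both cases.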
On 2/1/23 14:27, Thomas Gleixner wrote: > On Wed, Feb 01 2023 at 14:02, kernel test robot wrote: >> FYI, we noticed WARNING:at_kernel/locking/lockdep.c:#lockdep_hardirqs_on_prepare due to commit (built with gcc-11): >> >> commit: 02fb8013ee5f9b7d7bc35d54bf8bc5fe1179970c ("[PATCH 5/5] genirq: Use the maple tree for IRQ descriptors management") >> url: https://github.com/intel-lab-lkp/linux/commits/Shanker-Donthineni/genirq-Use-hlist-for-managing-resend-handlers/20230130-085956 >> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 188a569658584e93930ab60334c5a1079c0330d8 >> patch link: https://lore.kernel.org/all/20230130005725.3517597-6-sdonthineni@nvidia.com/ >> patch subject: [PATCH 5/5] genirq: Use the maple tree for IRQ >> descriptors management > >> [ 2.214554][ T0] ------------[ cut here ]------------ >> [ 2.215401][ T0] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled) >> [ 2.215446][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4308 lockdep_hardirqs_on_prepare+0x2d4/0x350 >> [ 2.217975][ T0] Modules linked in: >> [ 2.218526][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc3-00015-g02fb8013ee5f #1 >> [ 2.219803][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 >> [ 2.221228][ T0] RIP: 0010:lockdep_hardirqs_on_prepare+0x2d4/0x350 >> [ 2.222207][ T0] Code: 11 38 d0 7c 04 84 d2 75 5e 8b 0d bf 8b f7 03 85 c9 0f 85 c9 fe ff ff 48 c7 c6 40 7d a9 83 48 c7 c7 60 4e a9 83 e8 60 7c 35 02 <0f> 0b e9 af fe ff ff e8 50 8d 62 00 e9 0c fe ff ff e8 e6 8d 62 00 >> [ 2.224923][ T0] RSP: 0000:ffffffff844075a0 EFLAGS: 00010082 >> [ 2.225792][ T0] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 >> [ 2.226889][ T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: fffffbfff0880ea6 >> [ 2.227955][ T0] RBP: ffff8883af23fac0 R08: 0000000000000000 R09: ffffffff844072df >> [ 2.229068][ T0] R10: fffffbfff0880e5b R11: 0000000000000001 R12: 0000000000000002 >> [ 2.230147][ T0] R13: 0000000000000002 
R14: ffff88810022b018 R15: ffff88810022b010 >> [ 2.231269][ T0] FS: 0000000000000000(0000) GS:ffff8883af200000(0000) knlGS:0000000000000000 >> [ 2.232522][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 2.233395][ T0] CR2: ffff88843ffff000 CR3: 000000000442a000 CR4: 00000000000406b0 >> [ 2.234504][ T0] Call Trace: >> [ 2.234941][ T0] <TASK> >> [ 2.235345][ T0] trace_hardirqs_on+0x40/0x140 >> [ 2.236029][ T0] __kmem_cache_alloc_bulk+0x22e/0x490 >> [ 2.236795][ T0] ? kasan_set_track+0x25/0x30 >> [ 2.237470][ T0] kmem_cache_alloc_bulk+0x159/0x2e0 >> [ 2.238225][ T0] mas_alloc_nodes+0x253/0x690 >> [ 2.238886][ T0] mas_split+0x30d/0x1580 >> [ 2.239561][ T0] ? mas_push_data+0x1a40/0x1a40 >> [ 2.240219][ T0] ? memset+0x24/0x50 >> [ 2.240782][ T0] ? blake2s_final+0x110/0x140 >> [ 2.241426][ T0] ? blake2s+0x115/0x150 >> [ 2.242143][ T0] ? wait_for_random_bytes+0xd0/0xd0 >> [ 2.242859][ T0] ? mas_mab_cp+0x2f6/0x890 >> [ 2.243527][ T0] ? memset+0x24/0x50 >> [ 2.244122][ T0] ? find_held_lock+0x2c/0x110 >> [ 2.244803][ T0] ? mas_store_b_node+0x54c/0x1180 >> [ 2.245510][ T0] ? rcu_read_lock_sched_held+0x16/0x80 >> [ 2.246282][ T0] mas_wr_bnode+0x14f/0x1d0 >> [ 2.246902][ T0] ? mas_commit_b_node+0x600/0x600 >> [ 2.247677][ T0] ? secondary_startup_64_no_verify+0xe0/0xeb >> [ 2.248567][ T0] ? ___slab_alloc+0x70b/0xe00 >> [ 2.249251][ T0] ? mas_wr_store_entry+0x2e9/0xe30 >> [ 2.250088][ T0] ? rcu_read_lock_sched_held+0x16/0x80 >> [ 2.250864][ T0] mas_store_gfp+0xc2/0x190 >> [ 2.251516][ T0] ? mtree_erase+0x100/0x100 >> [ 2.252190][ T0] ? lockdep_init_map_type+0x2c7/0x780 >> [ 2.252924][ T0] irq_insert_desc+0xac/0xf0 >> [ 2.253562][ T0] ? irq_kobj_release+0x100/0x100 >> [ 2.254243][ T0] early_irq_init+0x81/0x8c >> [ 2.254866][ T0] start_kernel+0x1c7/0x3a4 >> [ 2.255479][ T0] secondary_startup_64_no_verify+0xe0/0xeb > > This triggers because __kmem_cache_alloc_bulk() uses > lock_irq()/unlock_irq(). Seems nobody used it during early boot stage > yet. 
> Though the maple tree conversion of the interrupt descriptor
> storage which is the purpose of the patch in question makes that happen.
>
> Fix below.

Looks like it should work. But I think we also need to adjust SLAB's
mm/slab.c kmem_cache_alloc_bulk() which does local_irq_disable(); /
local_irq_enable(); right?

Also if we enter this with IRQ's disabled, then we should take care about
the proper gfp flags. Looking at the patch [1] I see

WARN_ON(mas_store_gfp(&mas, desc, GFP_KERNEL) != 0);

so GFP_KERNEL would be wrong with irqs disabled, looks like a case for
GFP_ATOMIC.
OTOH I can see the thing it replaces was:

static RADIX_TREE(irq_desc_tree, GFP_KERNEL);

so that's also a GFP_KERNEL and we haven't seen debug splats from
might_alloc() checks before in this code?. That's weird, or maybe the case
of "we didn't enable irqs yet on this cpu being bootstrapped" is handled
differently than "we have temporarily disabled irqs"? Sure, during early
boot we should have all the memory and no need to reclaim...

[1] https://lore.kernel.org/all/20230130005725.3517597-6-sdonthineni@nvidia.com/#t

> Thanks,
>
>         tglx
> ---
> Subject: mm, slub: Take slab lock with irqsave()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 01 Feb 2023 14:14:00 +0100
>
> <Add blurb>

Will you add the blurb, and the SLAB part, or should I? And once done should
I put it in slab tree for 6.3 or want to make it part of the series so it's
not blocked?

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/slub.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3913,6 +3913,7 @@ static inline int __kmem_cache_alloc_bul
>  			size_t size, void **p, struct obj_cgroup *objcg)
>  {
>  	struct kmem_cache_cpu *c;
> +	unsigned long irqflags;
>  	int i;
>
>  	/*
> @@ -3921,7 +3922,7 @@ static inline int __kmem_cache_alloc_bul
>  	 * handlers invoking normal fastpath.
>  	 */
>  	c = slub_get_cpu_ptr(s->cpu_slab);
> -	local_lock_irq(&s->cpu_slab->lock);
> +	local_lock_irqsave(&s->cpu_slab->lock, irqflags);
>
>  	for (i = 0; i < size; i++) {
>  		void *object = kfence_alloc(s, s->object_size, flags);
> @@ -3942,7 +3943,7 @@ static inline int __kmem_cache_alloc_bul
>  			 */
>  			c->tid = next_tid(c->tid);
>
> -			local_unlock_irq(&s->cpu_slab->lock);
> +			local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
>
>  			/*
>  			 * Invoking slow path likely have side-effect
> @@ -3956,7 +3957,7 @@ static inline int __kmem_cache_alloc_bul
>  			c = this_cpu_ptr(s->cpu_slab);
>  			maybe_wipe_obj_freeptr(s, p[i]);
>
> -			local_lock_irq(&s->cpu_slab->lock);
> +			local_lock_irqsave(&s->cpu_slab->lock, irqflags);
>
>  			continue; /* goto for-loop */
>  		}
> @@ -3965,7 +3966,7 @@ static inline int __kmem_cache_alloc_bul
>  		maybe_wipe_obj_freeptr(s, p[i]);
>  	}
>  	c->tid = next_tid(c->tid);
> -	local_unlock_irq(&s->cpu_slab->lock);
> +	local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
>  	slub_put_cpu_ptr(s->cpu_slab);
>
>  	return i;
>
>
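Editorial sketch (not part of the thread): the difference between the _irq and _irqsave lock variants discussed above can be modelled in plain userspace C. Everything named model_* below is hypothetical and only stands in for the CPU interrupt flag; it is not kernel code, and it assumes the non-RT behaviour where the unlock side of the _irq variant unconditionally enables interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the CPU's interrupt-enable flag. */
static bool model_irqs_enabled;

/* Models the _irq lock variant: disables interrupts, remembers nothing. */
static void model_lock_irq(void)
{
	model_irqs_enabled = false;
}

/* Models the _irq unlock variant: unconditionally enables interrupts. */
static void model_unlock_irq(void)
{
	model_irqs_enabled = true;
}

/* Models the _irqsave lock variant: records the previous state, then disables. */
static void model_lock_irqsave(bool *flags)
{
	*flags = model_irqs_enabled;
	model_irqs_enabled = false;
}

/* Models the _irqrestore unlock variant: puts the recorded state back. */
static void model_unlock_irqrestore(bool flags)
{
	model_irqs_enabled = flags;
}

/* Interrupt state after an _irq lock/unlock pair entered in state 'initial'. */
static bool state_after_irq_pair(bool initial)
{
	model_irqs_enabled = initial;
	model_lock_irq();
	model_unlock_irq();
	return model_irqs_enabled;
}

/* Interrupt state after an _irqsave/_irqrestore pair entered in state 'initial'. */
static bool state_after_irqsave_pair(bool initial)
{
	bool flags;

	model_irqs_enabled = initial;
	model_lock_irqsave(&flags);
	assert(!model_irqs_enabled);	/* critical section runs with irqs off */
	model_unlock_irqrestore(flags);
	return model_irqs_enabled;
}
```

In this model, entering with interrupts disabled (the early boot situation) and using the _irq pair leaves them enabled on exit, which is the condition the trace above complains about; the _irqsave pair returns to whatever state the caller had.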
On Mon, Feb 06 2023 at 15:24, Vlastimil Babka wrote:
> On 2/1/23 14:27, Thomas Gleixner wrote:
>> This triggers because __kmem_cache_alloc_bulk() uses
>> lock_irq()/unlock_irq(). Seems nobody used it during early boot stage
>> yet. Though the maple tree conversion of the interrupt descriptor
>> storage which is the purpose of the patch in question makes that happen.
>>
>> Fix below.
>
> Looks like it should work. But I think we also need to adjust SLAB's
> mm/slab.c kmem_cache_alloc_bulk() which does local_irq_disable(); /
> local_irq_enable(); right?

Yup.

> Also if we enter this with IRQ's disabled, then we should take care about
> the proper gfp flags. Looking at the patch [1] I see
>
> WARN_ON(mas_store_gfp(&mas, desc, GFP_KERNEL) != 0);
>
> so GFP_KERNEL would be wrong with irqs disabled, looks like a case for
> GFP_ATOMIC.
> OTOH I can see the thing it replaces was:
>
> static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
>
> so that's also a GFP_KERNEL and we haven't seen debug splats from
> might_alloc() checks before in this code?. That's weird, or maybe the
> case

might_alloc()
  might_sleep_if()
    __might_sleep()
      WARN_ON(task->state != RUNNING);  <- Does not trigger
      __might_resched()
        if (.... || system_state == SYSTEM_BOOTING || ...)
           return;

As system_state is SYSTEM_BOOTING at this point the splats are not
happening.

> of "we didn't enable irqs yet on this cpu being bootstrapped" is handled
> differently than "we have temporarily disabled irqs"? Sure, during early
> boot we should have all the memory and no need to reclaim...

The point is that interrupts are fully disabled during early boot and
there is no scheduler so there is no scheduling possible.

Quite some code in the kernel relies on GFP_KERNEL being functional
during that early boot stage. If the kernel runs out of memory that
early, then the chance of recovery is exactly 0.

Thanks,

        tglx
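Editorial sketch (not part of the thread): the early return described in the reply above can be illustrated with a small userspace model. The names model_system_state and model_would_splat are hypothetical, and the condition is deliberately reduced to the three inputs the thread talks about; the real check has more terms, which the reply itself elides with "....".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the kernel's system_state. */
enum model_system_state {
	MODEL_BOOTING,
	MODEL_RUNNING,
};

/*
 * Returns true if this simplified model would print a "sleeping in
 * invalid context" style warning for an allocation.
 *
 * may_sleep      - the gfp flags allow blocking (GFP_KERNEL-like)
 * irqs_disabled  - interrupts are off at the call site
 */
static bool model_would_splat(enum model_system_state state,
			      bool irqs_disabled, bool may_sleep)
{
	/* Non-blocking allocations are never checked. */
	if (!may_sleep)
		return false;

	/* During boot the check bails out early, as described above. */
	if (state == MODEL_BOOTING)
		return false;

	/* Otherwise a blocking allocation with irqs off is reported. */
	return irqs_disabled;
}

/* Self-check mirroring the two situations contrasted in the thread. */
static void model_self_check(void)
{
	/* Early boot: irqs off, blocking flags, no warning. */
	assert(!model_would_splat(MODEL_BOOTING, true, true));
	/* Runtime: temporarily disabled irqs, blocking flags, warning. */
	assert(model_would_splat(MODEL_RUNNING, true, true));
}
```

Under these assumptions the same call site is silent while booting and noisy at runtime, which matches the distinction between "irqs not yet enabled" and "irqs temporarily disabled" raised in the question.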
On Mon, Feb 06 2023 at 15:24, Vlastimil Babka wrote:
> On 2/1/23 14:27, Thomas Gleixner wrote:
>> Subject: mm, slub: Take slab lock with irqsave()
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Date: Wed, 01 Feb 2023 14:14:00 +0100
>>
>> <Add blurb>
>
> Will you add the blurb, and the SLAB part, or should I? And once done should
> I put it in slab tree for 6.3 or want to make it part of the series so it's
> not blocked?

Ooops. I missed that part.

Let me add slab and blurb and send it as a proper patch.

Just take it into the slab tree. The maple tree conversion has still
some issues, so I don't expect it to be 6.3 material.

Thanks,

        tglx
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5d741b0e7d5e..e35de737802c 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -12,7 +12,7 @@
 #include <linux/sched/clock.h>
 
 #ifdef CONFIG_SPARSE_IRQ
-# define MAX_SPARSE_IRQS	(NR_IRQS + 8196)
+# define MAX_SPARSE_IRQS	INT_MAX
 #else
 # define MAX_SPARSE_IRQS	NR_IRQS
 #endif
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 247a0718d028..06be5f924a7c 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -15,6 +15,7 @@
 #include <linux/radix-tree.h>
 #include <linux/bitmap.h>
 #include <linux/irqdomain.h>
+#include <linux/maple_tree.h>
 #include <linux/sysfs.h>
 
 #include "internals.h"
@@ -131,17 +132,37 @@ int nr_irqs = NR_IRQS;
 EXPORT_SYMBOL_GPL(nr_irqs);
 
 static DEFINE_MUTEX(sparse_irq_lock);
-static DECLARE_BITMAP(allocated_irqs, MAX_SPARSE_IRQS);
+static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
+					MT_FLAGS_ALLOC_RANGE |
+					MT_FLAGS_LOCK_EXTERN |
+					MT_FLAGS_USE_RCU, sparse_irq_lock);
 
 static int irq_find_free_area(unsigned int from, unsigned int cnt)
 {
-	return bitmap_find_next_zero_area(allocated_irqs, MAX_SPARSE_IRQS,
-					  from, cnt, 0);
+	MA_STATE(mas, &sparse_irqs, 0, 0);
+
+	if (mas_empty_area(&mas, from, MAX_SPARSE_IRQS, cnt))
+		return -ENOSPC;
+	return mas.index;
 }
 
 static unsigned int irq_find_next_irq(unsigned int offset)
 {
-	return find_next_bit(allocated_irqs, nr_irqs, offset);
+	struct irq_desc *desc = mt_next(&sparse_irqs, offset, nr_irqs);
+
+	return desc ? irq_desc_get_irq(desc) : nr_irqs;
+}
+
+static void irq_insert_desc(unsigned int irq, struct irq_desc *desc)
+{
+	MA_STATE(mas, &sparse_irqs, irq, irq);
+	WARN_ON(mas_store_gfp(&mas, desc, GFP_KERNEL) != 0);
+}
+
+static void delete_irq_desc(unsigned int irq)
+{
+	MA_STATE(mas, &sparse_irqs, irq, irq);
+	mas_erase(&mas);
 }
 
 static int irq_expand_nr_irqs(unsigned int nr)
@@ -363,26 +384,14 @@ static void irq_sysfs_del(struct irq_desc *desc) {}
 
 #endif /* CONFIG_SYSFS */
 
-static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
-
-static void irq_insert_desc(unsigned int irq, struct irq_desc *desc)
-{
-	radix_tree_insert(&irq_desc_tree, irq, desc);
-}
-
 struct irq_desc *irq_to_desc(unsigned int irq)
 {
-	return radix_tree_lookup(&irq_desc_tree, irq);
+	return mtree_load(&sparse_irqs, irq);
 }
 #ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
 EXPORT_SYMBOL_GPL(irq_to_desc);
 #endif
 
-static void delete_irq_desc(unsigned int irq)
-{
-	radix_tree_delete(&irq_desc_tree, irq);
-}
-
 #ifdef CONFIG_SMP
 static void free_masks(struct irq_desc *desc)
 {
@@ -527,7 +536,6 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
 		irq_sysfs_add(start + i, desc);
 		irq_add_debugfs_entry(start + i, desc);
 	}
-	bitmap_set(allocated_irqs, start, cnt);
 	return start;
 
 err:
@@ -559,7 +567,6 @@ int __init early_irq_init(void)
 
 	for (i = 0; i < initcnt; i++) {
 		desc = alloc_desc(i, node, 0, NULL, NULL);
-		set_bit(i, allocated_irqs);
 		irq_insert_desc(i, desc);
 	}
 	return arch_early_irq_init();
@@ -613,6 +620,7 @@ static void free_desc(unsigned int irq)
 	raw_spin_lock_irqsave(&desc->lock, flags);
 	desc_set_defaults(irq, desc, irq_desc_get_node(desc), NULL, NULL);
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
+	delete_irq_desc(irq);
 }
 
 static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
@@ -625,15 +633,15 @@ static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
 		struct irq_desc *desc = irq_to_desc(start + i);
 
 		desc->owner = owner;
+		irq_insert_desc(start + i, desc);
 	}
-	bitmap_set(allocated_irqs, start, cnt);
 	return start;
 }
 
 void irq_mark_irq(unsigned int irq)
 {
 	mutex_lock(&sparse_irq_lock);
-	bitmap_set(allocated_irqs, irq, 1);
+	irq_insert_desc(irq, irq_descs + irq);
 	mutex_unlock(&sparse_irq_lock);
 }
 
@@ -777,7 +785,6 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
 	for (i = 0; i < cnt; i++)
 		free_desc(from + i);
 
-	bitmap_clear(allocated_irqs, from, cnt);
 	mutex_unlock(&sparse_irq_lock);
 }
 EXPORT_SYMBOL_GPL(irq_free_descs);
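Editorial sketch (not part of the patch): irq_find_free_area() in the diff above asks for the lowest run of cnt unused interrupt numbers at or above from, first via a bitmap scan and then via a maple tree search. The userspace model below illustrates only that contract, using a sorted array of allocated numbers. The function model_find_free_area and its parameters are hypothetical, the upper bound max is treated as inclusive by assumption, and nothing here reflects how the maple tree performs the search internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Find the lowest start >= from such that the cnt numbers
 * start .. start + cnt - 1 are all unused and do not exceed max.
 *
 * used - allocated numbers, sorted ascending, no duplicates
 * n    - number of entries in used
 *
 * Returns the start of the free area, -EINVAL for cnt == 0,
 * or -ENOSPC if no such area exists.
 */
static long long model_find_free_area(const unsigned int *used, size_t n,
				      unsigned int from, unsigned int cnt,
				      unsigned int max)
{
	unsigned long long start = from;
	size_t i;

	assert(used != NULL || n == 0);

	if (cnt == 0)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		/* Entries below the candidate start cannot collide. */
		if (used[i] < start)
			continue;
		/* The gap in front of used[i] is large enough: done. */
		if (used[i] - start >= cnt)
			break;
		/* Otherwise retry just past this allocated number. */
		start = (unsigned long long)used[i] + 1;
	}

	/* The area must also fit below the (inclusive) upper bound. */
	if (start + cnt - 1 > max)
		return -ENOSPC;

	return (long long)start;
}
```

With the allocated set {0, 1, 2, 5, 6}, a request for two numbers starting at 0 lands in the gap at 3, while a request for three has to go past 6.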