Message ID | 20231023160018.164054-3-mario.limonciello@amd.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce89:0:b0:403:3b70:6f57 with SMTP id p9csp1392917vqx; Mon, 23 Oct 2023 09:01:22 -0700 (PDT) X-Google-Smtp-Source: AGHT+IELrCCfSFY62xy4h2nnKWLc8V9gM1t1ua7VR1tzrCvsYxxjguR46cWM8sW4k7AKHC/4dPe6 X-Received: by 2002:a05:6a00:134d:b0:6b8:69fa:a11 with SMTP id k13-20020a056a00134d00b006b869fa0a11mr11419730pfu.12.1698076882122; Mon, 23 Oct 2023 09:01:22 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1698076882; cv=pass; d=google.com; s=arc-20160816; b=PnynpkKxwlK5HwfMZU8bbvh+E497DDZPiMFBxAzvedl6YVNxI4WJLofMUJlai3rUju Hlmk+YmpefJh0hubaeakIBjUhBJSTlqnPszJnmPWE82dNXnzThDLjFM4rr7pjza0WVig WpBsAw3oZZJsIrsdgum3fNwGmvoGgc5Ahya/d1708zNkt8N+syM0R+wjt95uSnTIGyTQ Kj4Eqe8Qgig5Mqe0cE1K6hR5ejev95neb361O96M8PhG/wn63CqEp3tsqOf6bgn+d2kQ nt6Kn+CTJJ3sAOFWNCkYMyyRAG0Y+po+UsRctsBBYBrXQqQd8mLdx2zT21mW0sIqz9u6 iSQQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=jKbpWSI5rdNsAcVT0gIb+6jCCMCBeYe2GrpQggqSXNY=; fh=lZUIGCL4esAFHXKCK7ldk6VPLxmtiVSfIW7ZX9/ohAE=; b=FXfwpJTFtn44MGZchSgiei61XXlJDceyQftKR0Ycmj37pCzcjrmloKr2Efarz8oM7j G51ui6/UHQFFAo029zJyHCke2czTjj5RNbAskKm6EMDwTpIFQvQXAdlX94eeRtR6ku76 TrftJ0dzfG6xRkwWsOzL92z69vdgxMfbBM++wBS4di8eZ48dxxmhSp6cJAbjojHHi20i ydLOwQuUEO4d3mbtn1tRy/mL6M6sXNUxZh3yC6NTLZAbVw4cBt6S28Vt0ILWR80qbpmz P4mC1bTAVVs88dRthg4SBSQDm9UkCciV6mENv/lRrjWtCvd21sWoTUAOrWDPXjaMjjX9 MDqg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=GsnI0IQg; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id t5-20020aa79465000000b0068a6eb3b548si6397353pfq.401.2023.10.23.09.01.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Oct 2023 09:01:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=GsnI0IQg; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 4DEAE807C85B; Mon, 23 Oct 2023 09:01:14 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232464AbjJWQBC (ORCPT <rfc822;aposhian.dev@gmail.com> + 27 others); Mon, 23 Oct 2023 12:01:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47684 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233188AbjJWQAq (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 23 Oct 2023 12:00:46 -0400 Received: from NAM02-DM3-obe.outbound.protection.outlook.com (mail-dm3nam02on2082.outbound.protection.outlook.com [40.107.95.82]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DCB1610D1; Mon, 23 Oct 2023 09:00:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=jw5OKym8eb1sM120uqXUybkxNlNJO50aGr7QdjDKXaXYcxZkM474PofNkKVy/zv3PUw63WissS/FAbmnRRXqjPlZV1BGuFWFwtSCI9rF8GTraHDX8D25heRSFsbwuTiUvMRAnnrv59DQKrrUmJPuR40XW24LzAgdoNmYA4G05ZMN+UbiYFu39g2sN7XgD02bQ0fg+FNASkYXJ8zoUw+Yy8Wzsy48VihPWVF8+hYWuhqm/wgcY/O2n9qXPjq4EWPMSCSSp7eyMrJllHCjLb0Klmsasr6U2AaRSXIBkEwYKJu+vFxyz00bml9yhYo0YsgJSPuWJpck5qtES1HEiO4e7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=jKbpWSI5rdNsAcVT0gIb+6jCCMCBeYe2GrpQggqSXNY=; b=myoZGHjWdR8BZ4BcCjQo+Gro5ejchSJFlS5rhAXBqnPP1xk3XQ0lzDWyERlNHXLS4hJ6ElCfGLsIXnZJpEsxWLd96mRu1L6PMuJvw2dLKcsItYOQLCPxYsLAEvapXDKeWkOTwhV4Q0ShoXy3reg2emXd0S99aGOxRaquuXFMLZyBBpbuuX+ODB1LXXRxlMhzUd4jc7YbDBbVMrrw3pYx1akE4xhPfhOobdd/sI5folFtWYJrZFrmVRg/XQHXDX41ur06FDVrzLjIb8Hx9edlOlkaL8hsUhbDX+EOZIJpPHwAnZieL4AoPMlwNddvSi35BYzH/gMoPU92fB8enUqctA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=infradead.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=jKbpWSI5rdNsAcVT0gIb+6jCCMCBeYe2GrpQggqSXNY=; b=GsnI0IQg4KaJZEqUEjmLMHvOUW7Y2tcolj/BDPf7azdFvzmGAcYJz7KGZzwTdbFcNFTHoQbPse15jXQ4N1dqGtWQX7Il5gfb3m2TxUyfQ+BfE0d8Q756DJdS+npBmW++nypZKFk6JCJ1nMTV/zs/4U+22WS+Vnd4Bv6yIvqRpVE= Received: from CY8PR12CA0001.namprd12.prod.outlook.com (2603:10b6:930:4e::24) by MW4PR12MB5666.namprd12.prod.outlook.com (2603:10b6:303:188::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6907.26; Mon, 23 Oct 2023 16:00:41 +0000 Received: from CY4PEPF0000EE3C.namprd03.prod.outlook.com (2603:10b6:930:4e:cafe::62) by CY8PR12CA0001.outlook.office365.com (2603:10b6:930:4e::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6907.33 via Frontend Transport; Mon, 23 Oct 2023 16:00:40 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CY4PEPF0000EE3C.mail.protection.outlook.com (10.167.242.16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6933.15 via Frontend Transport; Mon, 23 Oct 2023 16:00:40 +0000 Received: from AUS-P9-MLIMONCI.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Mon, 23 Oct 2023 11:00:39 -0500 From: Mario Limonciello <mario.limonciello@amd.com> To: Peter Zijlstra <peterz@infradead.org>, Borislav Petkov <bp@alien8.de>, Thomas Gleixner <tglx@linutronix.de>, Dave Hansen <dave.hansen@linux.intel.com>, Sandipan Das <sandipan.das@amd.com>, "H . Peter Anvin" <hpa@zytor.com> CC: <linux-kernel@vger.kernel.org>, <x86@kernel.org>, <linux-pm@vger.kernel.org>, <rafael@kernel.org>, <pavel@ucw.cz>, <linux-perf-users@vger.kernel.org>, Ingo Molnar <mingo@redhat.com>, "Arnaldo Carvalho de Melo" <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, "Mario Limonciello" <mario.limonciello@amd.com> Subject: [PATCH 2/2] perf/x86/amd: Don't allow pre-emption in amd_pmu_lbr_reset() Date: Mon, 23 Oct 2023 11:00:18 -0500 Message-ID: <20231023160018.164054-3-mario.limonciello@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231023160018.164054-1-mario.limonciello@amd.com> References: <20231023160018.164054-1-mario.limonciello@amd.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CY4PEPF0000EE3C:EE_|MW4PR12MB5666:EE_ X-MS-Office365-Filtering-Correlation-Id: 3a31c1f0-96c5-4c2a-7682-08dbd3e134a3 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: qnDU8E0jMpXScECBwwcNfzHnALLCj5smgStDnug8REbs79EYSiTFNtUq0PXn5H4qQd+MC5yMeIeFx6ULyVJlvEr165Wt3NkqYtPzOjhYRNQeCRtCI7/+3TFG1PbUQEFaA2USxl8MRkp6ppVKB7wJtrNO1WxoV1WtlcirVLw5mLc3wsHyL0TnfWPEQeZDADpVP+Wz7x8AMKARotlhlnVBS49XCxtrik9PEtGDPZtPujzNUVClIyVTfBJeC0pfxViz5thLSkjZkkN2GvvgJ90BY0Kl28zstnqvQdCkn5tLRgdYQ960SWoajPXjGHZbAXtTA0GlN3PYN0sm5dJlvySaUksubWJtLiOIKjrMp/8A08OotW9HIffUcZSe+EmXpv3O/kekjlLIoT8LzB7BHACfIU4duRhMRnxUP/C3JyFGOGiiWP8xp8g//Z0NZVixE/m65ewjM7mgJufO6XQNjLWizBh18Mle43l604iPGMFkDuh68KYVhk2T9//xikhcmrzMljtc6pd9Vw71aaclW0dXwmZCjCULpUKzzRQkgekPe7e2OMy3dWbSCDV5NlR2lL3MUM6iP7vG+MZqOVr/9J5ZSOJyCvIHqHCPZrm46w3/PkQ/GujxnHEJp1600MFDs3XgrrFxS3zIcJ8gaYNfmuNw7JPgJF+lpgK7O9cTfoZCfysxw1AnD3ajuqJFFjD62y0Z6N0h8GfnXdZxa/SjNFYp2XbaF+amVDig/SMoX7DtadNjn2n0YvtgPdZa3xNQQjlBXN/Xa5nZk6RfG90IlkuMcg== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230031)(4636009)(39860400002)(396003)(136003)(376002)(346002)(230922051799003)(186009)(82310400011)(451199024)(64100799003)(1800799009)(36840700001)(46966006)(40470700004)(40480700001)(40460700003)(7416002)(36860700001)(5660300002)(82740400003)(2906002)(356005)(41300700001)(16526019)(7696005)(86362001)(1076003)(336012)(83380400001)(81166007)(26005)(426003)(4326008)(8936002)(47076005)(44832011)(2616005)(36756003)(8676002)(6666004)(478600001)(316002)(70586007)(70206006)(110136005)(54906003)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Oct 2023 16:00:40.6349 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 3a31c1f0-96c5-4c2a-7682-08dbd3e134a3 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CY4PEPF0000EE3C.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB5666 X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Mon, 23 Oct 2023 09:01:14 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780562664764596471 X-GMAIL-MSGID: 1780562664764596471 |
Series |
Fixes for s3 with parallel bootup
|
|
Commit Message
Mario Limonciello
Oct. 23, 2023, 4 p.m. UTC
Fixes a BUG reported during suspend to ram testing.
```
[ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948
[ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0
```
Cc: stable@vger.kernel.org # 6.1+
Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
arch/x86/events/amd/lbr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
* Mario Limonciello <mario.limonciello@amd.com> wrote: > Fixes a BUG reported during suspend to ram testing. > > ``` > [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948 > [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0 > ``` > > Cc: stable@vger.kernel.org # 6.1+ > Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > --- > arch/x86/events/amd/lbr.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c > index eb31f850841a..5b98e8c7d8b7 100644 > --- a/arch/x86/events/amd/lbr.c > +++ b/arch/x86/events/amd/lbr.c > @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) > > void amd_pmu_lbr_reset(void) > { > - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); > int i; > > if (!x86_pmu.lbr_nr) > @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) > > cpuc->last_task_ctx = NULL; > cpuc->last_log_id = 0; > + put_cpu_ptr(&cpu_hw_events); > wrmsrl(MSR_AMD64_LBR_SELECT, 0); > } Weird, amd_pmu_lbr_reset() is called from these places: - amd_pmu_lbr_sched_task(): during task sched-in during context-switching, this should already have preemption disabled. - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add (amd_pmu_add_event()), called by event_sched_in(), which too should have preemption disabled. I clearly must have missed some additional place it gets called in. Could you please cite the full log of the amd_pmu_lbr_reset() call that caused the critical section warning? Thanks, Ingo
On 10/24/2023 03:02, Ingo Molnar wrote: > > * Mario Limonciello <mario.limonciello@amd.com> wrote: > >> Fixes a BUG reported during suspend to ram testing. >> >> ``` >> [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948 >> [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0 >> ``` >> >> Cc: stable@vger.kernel.org # 6.1+ >> Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") >> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >> --- >> arch/x86/events/amd/lbr.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c >> index eb31f850841a..5b98e8c7d8b7 100644 >> --- a/arch/x86/events/amd/lbr.c >> +++ b/arch/x86/events/amd/lbr.c >> @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) >> >> void amd_pmu_lbr_reset(void) >> { >> - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); >> + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); >> int i; >> >> if (!x86_pmu.lbr_nr) >> @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) >> >> cpuc->last_task_ctx = NULL; >> cpuc->last_log_id = 0; >> + put_cpu_ptr(&cpu_hw_events); >> wrmsrl(MSR_AMD64_LBR_SELECT, 0); >> } > > Weird, amd_pmu_lbr_reset() is called from these places: > > - amd_pmu_lbr_sched_task(): during task sched-in during > context-switching, this should already have preemption disabled. > > - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add > (amd_pmu_add_event()), called by event_sched_in(), which too should have > preemption disabled. > > I clearly must have missed some additional place it gets called in. > > Could you please cite the full log of the amd_pmu_lbr_reset() call that > caused the critical section warning? > > Thanks, > > Ingo Below is the call trace in case you think it's better to disable preemption by the caller instead. If you think it's better to keep it in amd_pmu_lbr_reset() I'll add this trace to the commit message. Call Trace: <TASK> dump_stack_lvl+0x44/0x60 check_preemption_disabled+0xce/0xf0 ? __pfx_x86_pmu_dead_cpu+0x10/0x10 amd_pmu_lbr_reset+0x19/0xc0 ? __pfx_x86_pmu_dead_cpu+0x10/0x10 amd_pmu_cpu_reset.constprop.0+0x51/0x60 amd_pmu_cpu_dead+0x3e/0x90 x86_pmu_dead_cpu+0x13/0x20 cpuhp_invoke_callback+0x169/0x4b0 ? __pfx_virtnet_cpu_dead+0x10/0x10 __cpuhp_invoke_callback_range+0x76/0xe0 _cpu_down+0x112/0x270 freeze_secondary_cpus+0x8e/0x280 suspend_devices_and_enter+0x342/0x900 pm_suspend+0x2fd/0x690 state_store+0x71/0xd0 kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x2db/0x400 ksys_write+0x5f/0xe0 do_syscall_64+0x59/0x90
On Tue, Oct 24, 2023 at 10:32:27AM -0500, Mario Limonciello wrote: > On 10/24/2023 03:02, Ingo Molnar wrote: > > > > * Mario Limonciello <mario.limonciello@amd.com> wrote: > > > > > Fixes a BUG reported during suspend to ram testing. > > > > > > ``` > > > [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948 > > > [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0 > > > ``` > > > > > > Cc: stable@vger.kernel.org # 6.1+ > > > Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") > > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > > > --- > > > arch/x86/events/amd/lbr.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c > > > index eb31f850841a..5b98e8c7d8b7 100644 > > > --- a/arch/x86/events/amd/lbr.c > > > +++ b/arch/x86/events/amd/lbr.c > > > @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) > > > void amd_pmu_lbr_reset(void) > > > { > > > - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > > > + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); > > > int i; > > > if (!x86_pmu.lbr_nr) > > > @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) > > > cpuc->last_task_ctx = NULL; > > > cpuc->last_log_id = 0; > > > + put_cpu_ptr(&cpu_hw_events); > > > wrmsrl(MSR_AMD64_LBR_SELECT, 0); > > > } > > > > Weird, amd_pmu_lbr_reset() is called from these places: > > > > - amd_pmu_lbr_sched_task(): during task sched-in during > > context-switching, this should already have preemption disabled. > > > > - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add > > (amd_pmu_add_event()), called by event_sched_in(), which too should have > > preemption disabled. > > > > I clearly must have missed some additional place it gets called in. > > > > Could you please cite the full log of the amd_pmu_lbr_reset() call that > > caused the critical section warning? > > > > Thanks, > > > > Ingo > > Below is the call trace in case you think it's better to disable preemption > by the caller instead. If you think it's better to keep it in > amd_pmu_lbr_reset() I'll add this trace to the commit message. You cut too much; what task is running this? IIRC this is the hotplug thread running a teardown function on that CPU itself. It being a strict per-cpu thread should not trip smp_processor_id() wanrs. > > Call Trace: > <TASK> > dump_stack_lvl+0x44/0x60 > check_preemption_disabled+0xce/0xf0 > ? __pfx_x86_pmu_dead_cpu+0x10/0x10 > amd_pmu_lbr_reset+0x19/0xc0 > ? __pfx_x86_pmu_dead_cpu+0x10/0x10 > amd_pmu_cpu_reset.constprop.0+0x51/0x60 > amd_pmu_cpu_dead+0x3e/0x90 > x86_pmu_dead_cpu+0x13/0x20 > cpuhp_invoke_callback+0x169/0x4b0 > ? __pfx_virtnet_cpu_dead+0x10/0x10 > __cpuhp_invoke_callback_range+0x76/0xe0 > _cpu_down+0x112/0x270 > freeze_secondary_cpus+0x8e/0x280 > suspend_devices_and_enter+0x342/0x900 > pm_suspend+0x2fd/0x690 > state_store+0x71/0xd0 > kernfs_fop_write_iter+0x128/0x1c0 > vfs_write+0x2db/0x400 > ksys_write+0x5f/0xe0 > do_syscall_64+0x59/0x90 >
On 10/24/2023 10:59, Peter Zijlstra wrote: > On Tue, Oct 24, 2023 at 10:32:27AM -0500, Mario Limonciello wrote: >> On 10/24/2023 03:02, Ingo Molnar wrote: >>> >>> * Mario Limonciello <mario.limonciello@amd.com> wrote: >>> >>>> Fixes a BUG reported during suspend to ram testing. >>>> >>>> ``` >>>> [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948 >>>> [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0 >>>> ``` >>>> >>>> Cc: stable@vger.kernel.org # 6.1+ >>>> Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") >>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >>>> --- >>>> arch/x86/events/amd/lbr.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c >>>> index eb31f850841a..5b98e8c7d8b7 100644 >>>> --- a/arch/x86/events/amd/lbr.c >>>> +++ b/arch/x86/events/amd/lbr.c >>>> @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) >>>> void amd_pmu_lbr_reset(void) >>>> { >>>> - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); >>>> + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); >>>> int i; >>>> if (!x86_pmu.lbr_nr) >>>> @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) >>>> cpuc->last_task_ctx = NULL; >>>> cpuc->last_log_id = 0; >>>> + put_cpu_ptr(&cpu_hw_events); >>>> wrmsrl(MSR_AMD64_LBR_SELECT, 0); >>>> } >>> >>> Weird, amd_pmu_lbr_reset() is called from these places: >>> >>> - amd_pmu_lbr_sched_task(): during task sched-in during >>> context-switching, this should already have preemption disabled. >>> >>> - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add >>> (amd_pmu_add_event()), called by event_sched_in(), which too should have >>> preemption disabled. >>> >>> I clearly must have missed some additional place it gets called in. >>> >>> Could you please cite the full log of the amd_pmu_lbr_reset() call that >>> caused the critical section warning? >>> >>> Thanks, >>> >>> Ingo >> >> Below is the call trace in case you think it's better to disable preemption >> by the caller instead. If you think it's better to keep it in >> amd_pmu_lbr_reset() I'll add this trace to the commit message. > > You cut too much; what task is running this? > > IIRC this is the hotplug thread running a teardown function on that CPU > itself. It being a strict per-cpu thread should not trip > smp_processor_id() wanrs. > BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960 caller is amd_pmu_lbr_reset+0x19/0xc0 CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f #1025 Call Trace: <TASK> dump_stack_lvl+0x44/0x60 check_preemption_disabled+0xce/0xf0 ? __pfx_x86_pmu_dead_cpu+0x10/0x10 amd_pmu_lbr_reset+0x19/0xc0 ? __pfx_x86_pmu_dead_cpu+0x10/0x10 amd_pmu_cpu_reset.constprop.0+0x51/0x60 amd_pmu_cpu_dead+0x3e/0x90 x86_pmu_dead_cpu+0x13/0x20 cpuhp_invoke_callback+0x169/0x4b0 ? __pfx_virtnet_cpu_dead+0x10/0x10 __cpuhp_invoke_callback_range+0x76/0xe0 _cpu_down+0x112/0x270 freeze_secondary_cpus+0x8e/0x280 suspend_devices_and_enter+0x342/0x900 pm_suspend+0x2fd/0x690 state_store+0x71/0xd0 kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x2db/0x400 ksys_write+0x5f/0xe0 do_syscall_64+0x59/0x90 ? srso_alias_return_thunk+0x5/0x7f ? count_memcg_events.constprop.0+0x1a/0x30 ? srso_alias_return_thunk+0x5/0x7f ? handle_mm_fault+0x1e9/0x340 ? srso_alias_return_thunk+0x5/0x7f ? preempt_count_add+0x4d/0xa0 ? srso_alias_return_thunk+0x5/0x7f ? up_read+0x38/0x70 ? srso_alias_return_thunk+0x5/0x7f ? do_user_addr_fault+0x343/0x6b0 ? srso_alias_return_thunk+0x5/0x7f ? exc_page_fault+0x74/0x170 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f32f8d14a77 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffdc648de18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f32f8d14a77 RDX: 0000000000000004 RSI: 000055b2fc2a5670 RDI: 0000000000000004 RBP: 000055b2fc2a5670 R08: 0000000000000000 R09: 000055b2fc2a5670 R10: 00007f32f8e1a2f0 R11: 0000000000000246 R12: 0000000000000004 R13: 000055b2fc2a2480 R14: 00007f32f8e16600 R15: 00007f32f8e15a00 </TASK> >> >> Call Trace: >> <TASK> >> dump_stack_lvl+0x44/0x60 >> check_preemption_disabled+0xce/0xf0 >> ? __pfx_x86_pmu_dead_cpu+0x10/0x10 >> amd_pmu_lbr_reset+0x19/0xc0 >> ? __pfx_x86_pmu_dead_cpu+0x10/0x10 >> amd_pmu_cpu_reset.constprop.0+0x51/0x60 >> amd_pmu_cpu_dead+0x3e/0x90 >> x86_pmu_dead_cpu+0x13/0x20 >> cpuhp_invoke_callback+0x169/0x4b0 >> ? __pfx_virtnet_cpu_dead+0x10/0x10 >> __cpuhp_invoke_callback_range+0x76/0xe0 >> _cpu_down+0x112/0x270 >> freeze_secondary_cpus+0x8e/0x280 >> suspend_devices_and_enter+0x342/0x900 >> pm_suspend+0x2fd/0x690 >> state_store+0x71/0xd0 >> kernfs_fop_write_iter+0x128/0x1c0 >> vfs_write+0x2db/0x400 >> ksys_write+0x5f/0xe0 >> do_syscall_64+0x59/0x90 >>
On Tue, Oct 24, 2023 at 11:04:06AM -0500, Mario Limonciello wrote: > > IIRC this is the hotplug thread running a teardown function on that CPU > > itself. It being a strict per-cpu thread should not trip > > smp_processor_id() wanrs. > > > > BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960 > caller is amd_pmu_lbr_reset+0x19/0xc0 > CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f Very much not the cpuhp/%u thread :/, let me try and figure out how that happens. > #1025 > Call Trace: > <TASK> > dump_stack_lvl+0x44/0x60 > check_preemption_disabled+0xce/0xf0 > ? __pfx_x86_pmu_dead_cpu+0x10/0x10 > amd_pmu_lbr_reset+0x19/0xc0 > ? __pfx_x86_pmu_dead_cpu+0x10/0x10 > amd_pmu_cpu_reset.constprop.0+0x51/0x60 > amd_pmu_cpu_dead+0x3e/0x90 > x86_pmu_dead_cpu+0x13/0x20 > cpuhp_invoke_callback+0x169/0x4b0 > ? __pfx_virtnet_cpu_dead+0x10/0x10 > __cpuhp_invoke_callback_range+0x76/0xe0 > _cpu_down+0x112/0x270 > freeze_secondary_cpus+0x8e/0x280 > suspend_devices_and_enter+0x342/0x900 > pm_suspend+0x2fd/0x690 > state_store+0x71/0xd0 > kernfs_fop_write_iter+0x128/0x1c0 > vfs_write+0x2db/0x400 > ksys_write+0x5f/0xe0 > do_syscall_64+0x59/0x90 > ? srso_alias_return_thunk+0x5/0x7f > ? count_memcg_events.constprop.0+0x1a/0x30 > ? srso_alias_return_thunk+0x5/0x7f > ? handle_mm_fault+0x1e9/0x340 > ? srso_alias_return_thunk+0x5/0x7f > ? preempt_count_add+0x4d/0xa0 > ? srso_alias_return_thunk+0x5/0x7f > ? up_read+0x38/0x70 > ? srso_alias_return_thunk+0x5/0x7f > ? do_user_addr_fault+0x343/0x6b0 > ? srso_alias_return_thunk+0x5/0x7f > ? exc_page_fault+0x74/0x170 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > RIP: 0033:0x7f32f8d14a77 > Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa > 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff > 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > RSP: 002b:00007ffdc648de18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f32f8d14a77 > RDX: 0000000000000004 RSI: 000055b2fc2a5670 RDI: 0000000000000004 > RBP: 000055b2fc2a5670 R08: 0000000000000000 R09: 000055b2fc2a5670 > R10: 00007f32f8e1a2f0 R11: 0000000000000246 R12: 0000000000000004 > R13: 000055b2fc2a2480 R14: 00007f32f8e16600 R15: 00007f32f8e15a00 > </TASK>
On Tue, Oct 24, 2023 at 06:30:38PM +0200, Peter Zijlstra wrote: > On Tue, Oct 24, 2023 at 11:04:06AM -0500, Mario Limonciello wrote: > > > > IIRC this is the hotplug thread running a teardown function on that CPU > > > itself. It being a strict per-cpu thread should not trip > > > smp_processor_id() wanrs. > > > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960 > > caller is amd_pmu_lbr_reset+0x19/0xc0 > > CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f > > Very much not the cpuhp/%u thread :/, let me try and figure out how that > happens. Uhh, my bad, these are the PREPARE/DEAD handlers, they run before online and after dying. The CPU is completely dead. Running lbr_reset() here makes no sense. Did that want to be in amd_pmu_cpu_dying() ? > > > #1025 > > Call Trace: > > <TASK> > > dump_stack_lvl+0x44/0x60 > > check_preemption_disabled+0xce/0xf0 > > ? __pfx_x86_pmu_dead_cpu+0x10/0x10 > > amd_pmu_lbr_reset+0x19/0xc0 > > ? __pfx_x86_pmu_dead_cpu+0x10/0x10 > > amd_pmu_cpu_reset.constprop.0+0x51/0x60 > > amd_pmu_cpu_dead+0x3e/0x90 > > x86_pmu_dead_cpu+0x13/0x20 > > cpuhp_invoke_callback+0x169/0x4b0 > > ? __pfx_virtnet_cpu_dead+0x10/0x10 > > __cpuhp_invoke_callback_range+0x76/0xe0 > > _cpu_down+0x112/0x270 > > freeze_secondary_cpus+0x8e/0x280
* Ingo Molnar <mingo@kernel.org> wrote: > > * Mario Limonciello <mario.limonciello@amd.com> wrote: > > > Fixes a BUG reported during suspend to ram testing. > > > > ``` > > [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948 > > [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0 > > ``` > > > > Cc: stable@vger.kernel.org # 6.1+ > > Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > > --- > > arch/x86/events/amd/lbr.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c > > index eb31f850841a..5b98e8c7d8b7 100644 > > --- a/arch/x86/events/amd/lbr.c > > +++ b/arch/x86/events/amd/lbr.c > > @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) > > > > void amd_pmu_lbr_reset(void) > > { > > - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > > + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); > > int i; > > > > if (!x86_pmu.lbr_nr) > > @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) > > > > cpuc->last_task_ctx = NULL; > > cpuc->last_log_id = 0; > > + put_cpu_ptr(&cpu_hw_events); > > wrmsrl(MSR_AMD64_LBR_SELECT, 0); > > } > > Weird, amd_pmu_lbr_reset() is called from these places: > > - amd_pmu_lbr_sched_task(): during task sched-in during > context-switching, this should already have preemption disabled. > > - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add > (amd_pmu_add_event()), called by event_sched_in(), which too should have > preemption disabled. > > I clearly must have missed some additional place it gets called in. Just for completeness, the additional place I missed is amd_pmu_cpu_reset(): static_call(amd_pmu_branch_reset)(); ... and the amd_pmu_branch_reset static call is set up with amd_pmu_lbr_reset, which is why git grep missed it. Anyway, amd_pmu_cpu_reset() is very much something that should run non-preemptable to begin with, so your patch only papers over the real problem AFAICS. Thanks, Ingo
On 10/24/2023 11:51, Ingo Molnar wrote: > > * Ingo Molnar <mingo@kernel.org> wrote: > >> >> * Mario Limonciello <mario.limonciello@amd.com> wrote: >> >>> Fixes a BUG reported during suspend to ram testing. >>> >>> ``` >>> [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948 >>> [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0 >>> ``` >>> >>> Cc: stable@vger.kernel.org # 6.1+ >>> Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") >>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >>> --- >>> arch/x86/events/amd/lbr.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c >>> index eb31f850841a..5b98e8c7d8b7 100644 >>> --- a/arch/x86/events/amd/lbr.c >>> +++ b/arch/x86/events/amd/lbr.c >>> @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) >>> >>> void amd_pmu_lbr_reset(void) >>> { >>> - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); >>> + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); >>> int i; >>> >>> if (!x86_pmu.lbr_nr) >>> @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) >>> >>> cpuc->last_task_ctx = NULL; >>> cpuc->last_log_id = 0; >>> + put_cpu_ptr(&cpu_hw_events); >>> wrmsrl(MSR_AMD64_LBR_SELECT, 0); >>> } >> >> Weird, amd_pmu_lbr_reset() is called from these places: >> >> - amd_pmu_lbr_sched_task(): during task sched-in during >> context-switching, this should already have preemption disabled. >> >> - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add >> (amd_pmu_add_event()), called by event_sched_in(), which too should have >> preemption disabled. >> >> I clearly must have missed some additional place it gets called in. > > Just for completeness, the additional place I missed is > amd_pmu_cpu_reset(): > > static_call(amd_pmu_branch_reset)(); > > ... and the amd_pmu_branch_reset static call is set up with > amd_pmu_lbr_reset, which is why git grep missed it. > > Anyway, amd_pmu_cpu_reset() is very much something that should run > non-preemptable to begin with, so your patch only papers over the real > problem AFAICS. > > Thanks, > > Ingo In that case - should preemption be disabled for all of x86_pmu_dying_cpu() perhaps? For good measure x86_pmu_starting_cpu() too?
On Tue, Oct 24, 2023 at 01:30:59PM -0500, Mario Limonciello wrote: > In that case - should preemption be disabled for all of x86_pmu_dying_cpu() > perhaps? > > For good measure x86_pmu_starting_cpu() too? starting and dying are with IRQs disabled.
On 10/24/2023 10:04 PM, Peter Zijlstra wrote: > On Tue, Oct 24, 2023 at 06:30:38PM +0200, Peter Zijlstra wrote: >> On Tue, Oct 24, 2023 at 11:04:06AM -0500, Mario Limonciello wrote: >> >>>> IIRC this is the hotplug thread running a teardown function on that CPU >>>> itself. It being a strict per-cpu thread should not trip >>>> smp_processor_id() wanrs. >>>> >>> >>> BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960 >>> caller is amd_pmu_lbr_reset+0x19/0xc0 >>> CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f >> >> Very much not the cpuhp/%u thread :/, let me try and figure out how that >> happens. > > Uhh, my bad, these are the PREPARE/DEAD handlers, they run before online > and after dying. The CPU is completely dead. Running lbr_reset() here > makes no sense. > > Did that want to be in amd_pmu_cpu_dying() ? > Agreed, it should have gone into the cpu_dying() callback. lbr_reset() is called once from cpu_starting() so I wonder if its necessary to call it again in the CPU offline path.
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c index eb31f850841a..5b98e8c7d8b7 100644 --- a/arch/x86/events/amd/lbr.c +++ b/arch/x86/events/amd/lbr.c @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event) void amd_pmu_lbr_reset(void) { - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events); int i; if (!x86_pmu.lbr_nr) @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void) cpuc->last_task_ctx = NULL; cpuc->last_log_id = 0; + put_cpu_ptr(&cpu_hw_events); wrmsrl(MSR_AMD64_LBR_SELECT, 0); }