Message ID | 20230622131841.3153672-2-yazen.ghannam@amd.com |
---|---|
State | New |
Headers | From: Yazen Ghannam <yazen.ghannam@amd.com>; To: <linux-edac@vger.kernel.org>; CC: <linux-kernel@vger.kernel.org>, <tony.luck@intel.com>, <x86@kernel.org>, Yazen Ghannam <yazen.ghannam@amd.com>; Subject: [PATCH 1/2] x86/mce: Disable preemption for CPER decoding; Date: Thu, 22 Jun 2023 08:18:40 -0500; In-Reply-To: <20230622131841.3153672-1-yazen.ghannam@amd.com> |
Series | SMCA CPER Fixes |
Commit Message
Yazen Ghannam
June 22, 2023, 1:18 p.m. UTC
Scalable MCA systems may report errors found during boot-time polling
through the ACPI Boot Error Record Table (BERT). The errors are logged
in an "x86 Processor" Common Platform Error Record (CPER). The format of
the x86 CPER does not include a logical CPU number, but it does provide
the logical APIC ID for the logical CPU. Also, it does not explicitly
provide MCA error information, but it can share this information using
an "MSR Context" defined in the CPER format.
The MCA error information is parsed by
1) Checking that the context matches the Scalable MCA register space.
2) Finding the logical CPU that matches the logical APIC ID from the
CPER.
3) Filling in struct mce with the relevant data and logging it.
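The three parsing steps above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel implementation: `struct cpu_entry`, `struct ctx_info`, `struct mce_rec`, `find_cpu_by_apicid()`, `decode_smca_ctx()`, the `SMCA_MSR_*` constants, and the use of `regs[0]` as the status value are all hypothetical. Only the `reg_arr_size < 48` check mirrors the context lines of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cpu_entry {
	uint32_t apicid;	/* logical APIC ID of this logical CPU */
	uint32_t socketid;
	uint64_t ppin;
};

struct ctx_info {
	uint32_t msr_addr;	/* first MSR covered by the context */
	uint32_t reg_arr_size;	/* size of the register array in bytes */
	const uint64_t *regs;	/* register values */
};

struct mce_rec {
	int extcpu;
	uint32_t apicid;
	uint32_t socketid;
	uint64_t ppin;
	uint64_t status;
};

/* Assumed address window for the Scalable MCA MSRs (illustrative only). */
#define SMCA_MSR_BASE 0xC0002000u
#define SMCA_MSR_MASK 0xFFFFF000u

/* Step 2: map the APIC ID from the CPER to a logical CPU index, or -1. */
int find_cpu_by_apicid(const struct cpu_entry *cpus, size_t ncpus,
		       uint32_t apicid)
{
	for (size_t i = 0; i < ncpus; i++)
		if (cpus[i].apicid == apicid)
			return (int)i;
	return -1;
}

/* Returns 0 on success, -1 if the context cannot be decoded. */
int decode_smca_ctx(const struct ctx_info *ctx, uint32_t lapic_id,
		    const struct cpu_entry *cpus, size_t ncpus,
		    struct mce_rec *m)
{
	/* Step 1: the context must describe the Scalable MCA registers. */
	if ((ctx->msr_addr & SMCA_MSR_MASK) != SMCA_MSR_BASE)
		return -1;
	if (ctx->reg_arr_size < 48)
		return -1;

	int cpu = find_cpu_by_apicid(cpus, ncpus, lapic_id);
	if (cpu < 0)
		return -1;

	/* Step 3: fill the record from the CPU named in the CPER. */
	m->extcpu = cpu;
	m->apicid = cpus[cpu].apicid;
	m->socketid = cpus[cpu].socketid;
	m->ppin = cpus[cpu].ppin;
	m->status = ctx->regs[0];	/* hypothetical register layout */
	return 0;
}
```

In this model the per-CPU fields come from the CPU matched by APIC ID, never from whichever CPU happens to run the decoder.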
All the above is done when the BERT is processed during late init. This
can be scheduled on any CPU, and it may be preemptible.
This results in two issues.
1) mce_setup() includes a call to smp_processor_id(). This will throw a
warning if preemption is enabled.
2) mce_setup() will pull info from the executing CPU, so some info in
struct mce may be incorrect for the CPU with the error. For example,
in a dual-socket system, an error logged in socket 1 CPU but
processed by a socket 0 CPU will save the PPIN of the socket 0 CPU.
Fix the first issue by locally disabling preemption before calling
mce_setup().
Fixes: 4a24d80b8c3e ("x86/mce, cper: Pass x86 CPER through the MCA handling chain")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
arch/x86/kernel/cpu/mce/apei.c | 2 ++
1 file changed, 2 insertions(+)
Comments
> All the above is done when the BERT is processed during late init. This
> can be scheduled on any CPU, and it may be preemptible.

> 2) mce_setup() will pull info from the executing CPU, so some info in
> struct mce may be incorrect for the CPU with the error. For example,
> in a dual-socket system, an error logged in socket 1 CPU but
> processed by a socket 0 CPU will save the PPIN of the socket 0 CPU.

> Fix the first issue by locally disabling preemption before calling
> mce_setup().

It doesn't really fix the issue, it just makes the warnings go away.

The BERT record was created because some error crashed the
system. It's being parsed by a CPU that likely had nothing
to do with the actual error that occurred in the previous incarnation
of the OS.

If there is a CPER record in the BERT data that includes CPU
information, that would be the right thing to use. Alternatively
is there some invalid CPU value that could be loaded into the
"struct mce"?

-Tony
On 6/22/2023 11:35 AM, Luck, Tony wrote:
>> All the above is done when the BERT is processed during late init. This
>> can be scheduled on any CPU, and it may be preemptible.
>
>> 2) mce_setup() will pull info from the executing CPU, so some info in
>> struct mce may be incorrect for the CPU with the error. For example,
>> in a dual-socket system, an error logged in socket 1 CPU but
>> processed by a socket 0 CPU will save the PPIN of the socket 0 CPU.
>
>> Fix the first issue by locally disabling preemption before calling
>> mce_setup().
>
> It doesn't really fix the issue, it just makes the warnings go away.
>
> The BERT record was created because some error crashed the
> system. It's being parsed by a CPU that likely had nothing
> to do with the actual error that occurred in the previous incarnation
> of the OS.
>

Yes, these are true statements.

> If there is a CPER record in the BERT data that includes CPU
> information, that would be the right thing to use. Alternatively
> is there some invalid CPU value that could be loaded into the
> "struct mce"?
>

This is the reason we search for the logical CPU number using the Local
APIC ID provided in the CPER. And fill in relevant data using that CPU
number.

Thanks,
Yazen
> This is the reason we search for the logical CPU number using the Local
> APIC ID provided in the CPER. And fill in relevant data using that CPU
> number.

So you don't care which CPU number mce_setup() used because you are
going to update it with the right one from CPER?

Then maybe the fix for part 1 is just to use raw_smp_processor_id() instead of
smp_processor_id() to avoid the warning for calling with pre-emption enabled,
instead of disabling preemption with the get_cpu() ... put_cpu() wrap around the
call to mce_setup()?

-Tony
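The two alternatives in this message can be compared with a toy userspace model. This is only a sketch: the functions below reuse the kernel helper names but are local stubs, and `preempt_count`, `current_cpu`, and `warnings` are hypothetical variables. The real debug check behind smp_processor_id() considers more conditions than a single counter.

```c
#include <stdio.h>

/* Toy model, not kernel code. */
int preempt_count;	/* > 0 means preemption is disabled */
int current_cpu;	/* CPU the caller is running on */
int warnings;		/* number of debug warnings emitted */

/* Never checks the calling context. */
int raw_smp_processor_id(void)
{
	return current_cpu;
}

/* Warns when the caller could migrate to another CPU afterwards. */
int smp_processor_id(void)
{
	if (preempt_count == 0) {
		warnings++;
		fprintf(stderr, "warning: smp_processor_id() in preemptible code\n");
	}
	return current_cpu;
}

/* Disable preemption and return the current CPU. */
int get_cpu(void)
{
	preempt_count++;
	return current_cpu;
}

/* Re-enable preemption. */
void put_cpu(void)
{
	preempt_count--;
}
```

In this model both approaches silence the warning. Neither one makes the returned CPU number the CPU that reported the error, which is the point raised earlier in the thread.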
On 6/22/2023 1:05 PM, Luck, Tony wrote:
>> This is the reason we search for the logical CPU number using the Local
>> APIC ID provided in the CPER. And fill in relevant data using that CPU
>> number.
>
> So you don't care which CPU number mce_setup() used because you are
> going to update it with the right one from CPER?
>

That's right.

> Then maybe the fix for part 1 is just to use raw_smp_processor_id() instead of
> smp_processor_id() to avoid the warning for calling with pre-emption enabled,
> instead of disabling preemption with the get_cpu() ... put_cpu() wrap around the
> call to mce_setup()?

You mean use raw_smp_processor_id() in mce_setup()? I thought about
that, but decided against it. I figure the preemption warning is helpful
to catch issues when mce_setup() *is* supposed to run on the current CPU
but doesn't. This BERT decoding path is the only exception AFAIK. So I
didn't want to change the common code for a single exception.

I just noticed a similar potential issue with mce_setup() in
apei_mce_report_mem_error(). How is the CPU number decided there? Is it
always "don't care", since the mce record is "fake"?

Here are another couple of solutions for the preemption issue.

1) Don't use mce_setup() at all. Instead, do the memset(), etc. in the
   local function. This would result in some code duplication.
2) Split mce_setup() into global and per_cpu parts. The memset(), cpuid,
   etc. would be global, and the cpu_data()* and rdmsr() would be per_cpu.

Option #2 can also be used in apei_mce_report_mem_error(), I think.

Thanks,
Yazen
> 2) Split mce_setup() into global and per_cpu parts. The memset(), cpuid,
> etc. would be global, and the cpu_data()* and rdmsr() would be per_cpu.

That sounds good. So global is:

	memset(m, 0, sizeof(struct mce));
	/* need the internal __ version to avoid deadlocks */
	m->time = __ktime_get_real_seconds();
	m->cpuvendor = boot_cpu_data.x86_vendor;
	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
	m->microcode = boot_cpu_data.microcode;
	m->cpuid = cpuid_eax(1);

Though that last one is perhaps per-cpu if you want to allow for mixed-stepping systems.
Perhaps m->time also? Questionable whether it is useful to log time this record
was created, when it refers to something much earlier in the BERT case.

and per-cpu is:

	m->cpu = m->extcpu = smp_processor_id();
	m->socketid = cpu_data(m->extcpu).phys_proc_id;
	m->apicid = cpu_data(m->extcpu).initial_apicid;
	m->ppin = cpu_data(m->extcpu).ppin;

> Option #2 can also be used in apei_mce_report_mem_error(), I think.

Agreed.

-Tony
On 6/22/2023 3:42 PM, Luck, Tony wrote:
>> 2) Split mce_setup() into global and per_cpu parts. The memset(), cpuid,
>> etc. would be global, and the cpu_data()* and rdmsr() would be per_cpu.
>
> That sounds good. So global is:
>
> 	memset(m, 0, sizeof(struct mce));
> 	/* need the internal __ version to avoid deadlocks */
> 	m->time = __ktime_get_real_seconds();
> 	m->cpuvendor = boot_cpu_data.x86_vendor;
> 	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);

MCG_CAP would be per_cpu, because the bank count can vary. But I don't
think this matters in practice. So leaving it global is okay, I think.

> 	m->microcode = boot_cpu_data.microcode;
> 	m->cpuid = cpuid_eax(1);
>
> Though that last one is perhaps per-cpu if you want to allow for mixed-stepping systems.
> Perhaps m->time also? Questionable whether it is useful to log time this record
> was created, when it refers to something much earlier in the BERT case.
>

I agree about m->time. It doesn't seem useful in this case.

But I don't know about m->cpuid. Mixing processor revisions is not
allowed on AMD systems, and I don't know about other vendors. So I'd
leave m->cpuid as global unless there's a strong case otherwise.

> and per-cpu is:
>
> 	m->cpu = m->extcpu = smp_processor_id();
> 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
> 	m->apicid = cpu_data(m->extcpu).initial_apicid;
> 	m->ppin = cpu_data(m->extcpu).ppin;
>
>> Option #2 can also be used in apei_mce_report_mem_error(), I think.
>
> Agreed.
>

Okay, I'll update that too.

Thanks,
Yazen
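The split discussed in this exchange might look roughly like the userspace sketch below. It is an illustration under stated assumptions, not the eventual kernel change: `mce_setup_global()`, `mce_setup_for_cpu()`, `struct global_info`, `struct cpu_entry`, and `struct mce_rec` are hypothetical names, the field types are guesses, and `m->time` is left out for brevity. Putting `cpuid` and `mcgcap` in the global half follows the messages so far and is a judgment call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU data, standing in for cpu_data(cpu). */
struct cpu_entry {
	uint32_t socketid;
	uint32_t apicid;
	uint64_t ppin;
};

/* Hypothetical system-wide data, standing in for boot_cpu_data and MSR reads. */
struct global_info {
	uint32_t cpuvendor;
	uint32_t cpuid;
	uint32_t microcode;
	uint64_t mcgcap;
};

/* Simplified model of struct mce. */
struct mce_rec {
	uint32_t cpuvendor;
	uint32_t cpuid;
	uint32_t microcode;
	uint64_t mcgcap;
	int cpu;
	int extcpu;
	uint32_t socketid;
	uint32_t apicid;
	uint64_t ppin;
};

/* Global half: safe to call from any CPU, preemptible or not. */
void mce_setup_global(struct mce_rec *m, const struct global_info *g)
{
	memset(m, 0, sizeof(*m));
	m->cpuvendor = g->cpuvendor;
	m->cpuid = g->cpuid;
	m->microcode = g->microcode;
	m->mcgcap = g->mcgcap;
}

/* Per-CPU half: the caller names the CPU instead of using the current one. */
void mce_setup_for_cpu(struct mce_rec *m, int cpu,
		       const struct cpu_entry *cpus)
{
	m->cpu = m->extcpu = cpu;
	m->socketid = cpus[cpu].socketid;
	m->apicid = cpus[cpu].apicid;
	m->ppin = cpus[cpu].ppin;
}
```

With this shape, the BERT path would pass the CPU found from the CPER APIC ID to the per-CPU half, so the PPIN and socket ID describe the CPU that logged the error.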
> But I don't know about m->cpuid. Mixing processor revisions is not
> allowed on AMD systems, and I don't know about other vendors. So I'd
> leave m->cpuid as global unless there's a strong case otherwise.

There is (or was) support for mixed stepping in the microcode update
code. Not sure if Boris and Ashok came to any agreement on keeping it.

-Tony
On Fri, Jun 23, 2023 at 03:44:06PM +0000, Luck, Tony wrote:
> There is (or was) support for mixed stepping in the microcode update
> code. Not sure if Boris and Ashok came to any agreement on keeping it.

Yap, needs to stay on AMD as the loader has always supported it.

Btw, you might wanna update your bookmarks - bp@suse.de doesn't work
anymore. :-)
On 6/23/2023 12:01 PM, Borislav Petkov wrote:
> On Fri, Jun 23, 2023 at 03:44:06PM +0000, Luck, Tony wrote:
>> There is (or was) support for mixed stepping in the microcode update
>> code. Not sure if Boris and Ashok came to any agreement on keeping it.
>
> Yap, needs to stay on AMD as the loader has always supported it.
>

I don't understand this. Maybe it's a wording thing.

I see the following in a PPR document.

  Section: Mixed Processor Revision Supports AMD Family XXh Models XXh
  processors with different OPNs or different revisions cannot be mixed
  in a multiprocessor system. If the BIOS detects an unsupported
  configuration, the system will halt prior to X86 core release and
  signal a port 80 error code.

Is stepping not included in this statement?

Or do you mean that we can support mixed microcode systems? Meaning the
processors are identical but with different microcode versions.

Thanks,
Yazen
On Fri, Jun 23, 2023 at 12:14:00PM -0400, Yazen Ghannam wrote:
> Or do you mean that we can support mixed microcode systems?
We can and always have. I can chase down hw guys at some point and have
them make a definitive statement about mixed silicon steppings but we
have bigger fish to fry right now.
:-)
On 6/23/2023 12:42 PM, Borislav Petkov wrote:
> On Fri, Jun 23, 2023 at 12:14:00PM -0400, Yazen Ghannam wrote:
>> Or do you mean that we can support mixed microcode systems?
>
> We can and always have. I can chase down hw guys at some point and have
> them make a definitive statement about mixed silicon steppings but we
> have bigger fish to fry right now.
>
> :-)
>

No problem. I'll work on a solution that covers this case then.

Thanks,
Yazen
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 8ed341714686..2a7a51ca2995 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -97,7 +97,9 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (ctx_info->reg_arr_size < 48)
 		return -EINVAL;
 
+	get_cpu();
 	mce_setup(&m);
+	put_cpu();
 
 	m.extcpu = -1;
 	m.socketid = -1;