Message ID | 20221103175901.164783-4-ashok.raj@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp683551wru; Thu, 3 Nov 2022 11:02:53 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4H7iyOiRugNUTaXH7T+Vxcq8BgcZT+mxegCK2dZMl3ugsKXFoRmeCSat+kVKhEO/Orw0Sg X-Received: by 2002:a17:906:da86:b0:7ad:dc94:1b7 with SMTP id xh6-20020a170906da8600b007addc9401b7mr20208978ejb.288.1667498566108; Thu, 03 Nov 2022 11:02:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667498566; cv=none; d=google.com; s=arc-20160816; b=lIFGF/b5Vr7hn2p39uNO7J/SiAnAE42IC/oK8IgLjg4jqPeAYcxS0gNzmZ2Mm6AVZ1 TYBiYNnRWzjk8tSaIUHwByBMikfokc7acYJK+p7PkGeP0Cu5QDMLvf10UgwM4uldLIGl 8ODY5wQcMGm00V9hxcRoR1KA58O+W8sD6Vs+LM8cawCf2zrx5j+C5yvyWmdT2lACgsM5 LJjXxO5FtWKkej1uxJvYREh5nw5GdbKQd/Hkv8EdS2MGGsCXVbT1bCqQABPSy0Q9O69/ TbOOzFgMJDFQ8YUa10Wlk6WezYKaTJmfoleU5GYKWLeY5T3QRtmD6zHkn41x+uoPNt54 8zsg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=q/i7d/GFOXwLz0NcZsTXwIAcsUbDY8yxDLsOFsoNmaU=; b=nVnpC8H8/iDpKB6Ly2vklrO+tqpuSdfp3nmVMi9ia51cxDvVIVENGwMNd5ecJsDZpX irdL8X/tz8tJ1eXi0mZ89OX3ZppeThSMGtrZhPvkhoEBCJDh6GAwpMs+DwUnZx/ycCxk 2Z2FnN1TOsOp2pG3GmlXt64ezkr5HkL1WB0z6aqUDLazGaBI9I7bUey/qGxXzLUNZfh3 1y0DQOLkLCzb2ZpfBJRzZe6mEiOdXgPHBWE7zRmGblcEYzzecOU+NGTIr2FANmsYP9hv rEuUwwoDAkztbhWvovoTLeOW/tOzYnf81Ri+7g94l2OanFQgyw4Q5ES6XIno5j4w2YaW poKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fQxmtkdq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l9-20020a170906794900b0078dc5b2b6c4si2151929ejo.666.2022.11.03.11.02.21; Thu, 03 Nov 2022 11:02:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fQxmtkdq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231925AbiKCSAO (ORCPT <rfc822;yves.mi.zy@gmail.com> + 99 others); Thu, 3 Nov 2022 14:00:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232049AbiKCR73 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 3 Nov 2022 13:59:29 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 672FC2BC7 for <linux-kernel@vger.kernel.org>; Thu, 3 Nov 2022 10:59:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667498360; x=1699034360; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EQQs2oxMSF89nE0NdVO0VqxxpugPawYi9iOrJgEYQNQ=; b=fQxmtkdqLVlYUQtPsURxuiyIf8W2S/AVNxtMigVguLW0bGGZvd7q5pq3 ur6A4f9CaEZeu2LyHF4kh6lmv3UBXvVNRlccNjvJXhzFr5tSn5mGUsuKK 6I1msJSoLFsE6lwGeLFslrxV9f+l+cPh71wIW59M5+m1rcMJptssygFcs /5S6rfE5FxyLWq4Mr8QKFfX3oHb6D72wfMhwu16dXVpdxHRXmO+qtshka T7icyB4rP0JJa0PCJgiuIfCwmAlB7Bk0rXhG2DDZaonTKNZoAy9AVUzEo o/2FEpn4ItWoLPsMTfAEj1qvqLexQmK6STAqzOhc3xE0XhKn+ZBKiP05g g==; X-IronPort-AV: E=McAfee;i="6500,9779,10520"; a="308476966" X-IronPort-AV: E=Sophos;i="5.96,134,1665471600"; d="scan'208";a="308476966" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Nov 2022 10:59:18 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10520"; a="809762547" X-IronPort-AV: E=Sophos;i="5.96,134,1665471600"; d="scan'208";a="809762547" Received: from araj-dh-work.jf.intel.com ([10.165.157.158]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Nov 2022 10:59:17 -0700 From: Ashok Raj <ashok.raj@intel.com> To: Borislav Petkov <bp@alien8.de>, Thomas Gleixner <tglx@linutronix.de> Cc: "LKML Mailing List" <linux-kernel@vger.kernel.org>, X86-kernel <x86@kernel.org>, Tony Luck <tony.luck@intel.com>, Dave Hansen <dave.hansen@intel.com>, Arjan van de Ven <arjan.van.de.ven@intel.com>, Andy Lutomirski <luto@kernel.org>, Jacon Jun Pan <jacob.jun.pan@intel.com>, Tom Lendacky <thomas.lendacky@amd.com>, Kai Huang <kai.huang@intel.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Ashok Raj <ashok.raj@intel.com> Subject: [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails Date: Thu, 3 Nov 2022 17:58:51 +0000 Message-Id: <20221103175901.164783-4-ashok.raj@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221103175901.164783-1-ashok.raj@intel.com> References: <20221103175901.164783-1-ashok.raj@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748498975961117334?= X-GMAIL-MSGID: =?utf-8?q?1748498975961117334?= |
Series |
Make microcode late loading more robust
|
|
Commit Message
Ashok Raj
Nov. 3, 2022, 5:58 p.m. UTC
When early loading of microcode fails for any reason other than the wrong family-model-stepping, Linux can get into an infinite loop retrying the same failed load. A single retry is needed to handle any mixed stepping case. Assume we have a microcode that fails to load for some reason. load_ucode_ap() seems to retry if the loading fails. But it searches for a new rev, but ends up finding the same copy. Hence it appears to repeat the same load, retry loop for ever. load_ucode_intel_ap() { .. reget: if (!*iup) { patch = __load_ucode_intel(&uci); ^^^^^ Finds the same patch every time. if (!patch) return; *iup = patch; } uci.mc = *iup; if (apply_microcode_early(&uci, true)) { ^^^^^^^^^^^^ apply fails /* Mixed-silicon system? Try to refetch the proper patch: */ *iup = NULL; goto reget; ^^^^^ Rince repeat. } } Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading") Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> --- arch/x86/kernel/cpu/microcode/intel.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
Comments
On Thu, Nov 03, 2022 at 05:58:51PM +0000, Ashok Raj wrote: > When early loading of microcode fails for any reason other than the wrong > family-model-stepping, Linux can get into an infinite loop retrying the > same failed load. > > A single retry is needed to handle any mixed stepping case. > > Assume we have a microcode that fails to load for some reason. > load_ucode_ap() seems to retry if the loading fails. But it searches for Seems to retry because we were supporting mixed revisions. Which we do not now. And if you say "seems" then this sounds like the problem hasn't been analyzed properly. If this can happen with the current code, then this needs to be fixed in stable. So, how do you trigger exactly? I'd like to reproduce it myself. As to this patch: it should simply be removing the retrying instead of doing silly crap like bool retried = false; ... In light of how a lot has changed since last time, yes, please redo the patchset ontop of tip:x86/microcode, keeping in mind now that we don't support mixed revisions anymore. Just like dhansen said, you can split it in fixes and new features so that it is not too many patches at once - your call. Thx.
On Wed, Nov 09, 2022 at 12:25:02PM +0100, Borislav Petkov wrote: > On Thu, Nov 03, 2022 at 05:58:51PM +0000, Ashok Raj wrote: > > When early loading of microcode fails for any reason other than the wrong > > family-model-stepping, Linux can get into an infinite loop retrying the > > same failed load. > > > > A single retry is needed to handle any mixed stepping case. > > > > Assume we have a microcode that fails to load for some reason. > > load_ucode_ap() seems to retry if the loading fails. But it searches for > > Seems to retry because we were supporting mixed revisions. Which we do > not now. The retry wasn't the problem, but hitting the same failed microcode over and over is the problem. It is called out in the commit log. As part of dropping mixed stepping, we can drop this retry. Maybe the right way is to remember if the bsp failed, then there is no point in trying to apply on the AP's. reload_early_microcode->reload_ucode_intel() ->apply_microcode_intel() we aren't checking if early load failed for bsp, we should save and skip loading on all AP's. > > And if you say "seems" then this sounds like the problem hasn't been > analyzed properly. If this can happen with the current code, then this > needs to be fixed in stable. So, how do you trigger exactly? > > I'd like to reproduce it myself. Certainly, take the fms+pf of the platform you are testing. - Take a microcode file from the distribution for a different fms that didn't belong to the one you are testing. - You will have to fake the external header data and change it to the one you want microcode match to work - recompute all checksums and use that file instead of the original file. I accidently ran into it since I had a copy of debug uCode that require additional steps before loading. I have a tool that I can change to give you some production microcode that will fail in your platform. Just provide me with the fms+pf values, and I an provide one for your test. Let me know if you need one for testing. > > As to this patch: it should simply be removing the retrying instead of > doing silly crap like > > bool retried = false; > > ... > > In light of how a lot has changed since last time, yes, please redo the > patchset ontop of tip:x86/microcode, keeping in mind now that we don't > support mixed revisions anymore. > > Just like dhansen said, you can split it in fixes and new features so > that it is not too many patches at once - your call. That makes sense, I'll send the bug fix patches separately. Cheers, Ashok
On Wed, Nov 09, 2022 at 08:07:32AM -0800, Ashok Raj wrote: > - Take a microcode file from the distribution for a different fms that didn't > belong to the one you are testing. > - You will have to fake the external header data and change it to the one > you want microcode match to work > - recompute all checksums and use that file instead of the original file. This sounds like this cannot happen with officially released microcode - only with something "hacked". If so, I'm not interested in such "fixes".
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c index 733b5eac0444..8ef04447fcf0 100644 --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -606,6 +606,7 @@ void __init load_ucode_intel_bsp(void) { struct microcode_intel *patch; struct ucode_cpu_info uci; + int rev, ret; patch = __load_ucode_intel(&uci); if (!patch) @@ -613,13 +614,18 @@ void __init load_ucode_intel_bsp(void) uci.mc = patch; - apply_microcode_early(&uci, true); + ret = apply_microcode_early(&uci, true); + if (ret) { + rev = patch->hdr.rev; + pr_err("Revision 0x%x failed during early loading\n", rev); + } } void load_ucode_intel_ap(void) { struct microcode_intel *patch, **iup; struct ucode_cpu_info uci; + bool retried = false; if (IS_ENABLED(CONFIG_X86_32)) iup = (struct microcode_intel **) __pa_nodebug(&intel_ucode_patch); @@ -638,9 +644,13 @@ void load_ucode_intel_ap(void) uci.mc = *iup; if (apply_microcode_early(&uci, true)) { + if (retried) + return; + /* Mixed-silicon system? Try to refetch the proper patch: */ *iup = NULL; + retried = true; goto reget; } }