Message ID | 20221102074713.21493-1-jgross@suse.com |
---|---|
Headers |
From: Juergen Gross <jgross@suse.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org, linux-pm@vger.kernel.org
Cc: Juergen Gross <jgross@suse.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, "Rafael J. Wysocki" <rafael@kernel.org>, Pavel Machek <pavel@ucw.cz>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Subject: [PATCH v5 00/16] x86: make PAT and MTRR independent from each other
Date: Wed, 2 Nov 2022 08:46:57 +0100
Message-Id: <20221102074713.21493-1-jgross@suse.com> |
Series | x86: make PAT and MTRR independent from each other |
Message
Juergen Gross
Nov. 2, 2022, 7:46 a.m. UTC
Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as Xen PV
guest. In this case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.

The same applies to a kernel built with no MTRR support: it won't
allow to use the PAT MSR, even if there is no technical reason for
that, other than setting up PAT on all CPUs the same way (which is a
requirement of the processor's cache management) is relying on some
MTRR specific code.

Fix all of that by:

- moving the function needed by PAT from MTRR specific code one level up
- reworking the init sequences of MTRR and PAT to be more similar to
  each other without calling PAT from MTRR code
- removing the dependency of PAT on MTRR

There is some more cleanup done reducing code size.

Note that patches 1+2 have already been applied to tip.git x86/cpu.
They are included in this series only for reference.

Changes in V5:
- addressed comments

Changes in V4:
- new patches 10, 14, 15, 16
- split up old patch 4 into 3 patches
- addressed comments

Changes in V3:
- replace patch 1 by just adding a comment

Changes in V2:
- complete rework of the patches based on comments by Boris Petkov
- added several patches to the series

Juergen Gross (16):
  x86/mtrr: add comment for set_mtrr_state() serialization
  x86/mtrr: remove unused cyrix_set_all() function
  x86/mtrr: replace use_intel() with a local flag
  x86/mtrr: rename prepare_set() and post_set()
  x86/mtrr: split MTRR specific handling from cache dis/enabling
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86/mtrr: Disentangle MTRR init from PAT init.
  x86/mtrr: remove set_all callback from struct mtrr_ops
  x86/mtrr: simplify mtrr_bp_init()
  x86/mtrr: get rid of __mtrr_enabled bool
  x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init
  x86/mtrr: add a stop_machine() handler calling only cache_cpu_init()
  x86: decouple PAT and MTRR handling
  x86: switch cache_ap_init() to hotplug callback
  x86: do MTRR/PAT setup on all secondary CPUs in parallel
  x86/mtrr: simplify mtrr_ops initialization

 arch/x86/include/asm/cacheinfo.h   |  19 ++++
 arch/x86/include/asm/memtype.h     |   5 +-
 arch/x86/include/asm/mtrr.h        |  17 +--
 arch/x86/kernel/cpu/cacheinfo.c    | 173 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c       |   2 +-
 arch/x86/kernel/cpu/mtrr/amd.c     |   8 +-
 arch/x86/kernel/cpu/mtrr/centaur.c |   8 +-
 arch/x86/kernel/cpu/mtrr/cyrix.c   |  42 +------
 arch/x86/kernel/cpu/mtrr/generic.c | 127 ++++-----------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    | 171 ++++------------------------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |  15 +--
 arch/x86/kernel/setup.c            |  14 +--
 arch/x86/kernel/smpboot.c          |   9 +-
 arch/x86/mm/pat/memtype.c          | 152 +++++++++----------------
 arch/x86/power/cpu.c               |   3 +-
 include/linux/cpuhotplug.h         |   1 +
 16 files changed, 308 insertions(+), 458 deletions(-)
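[Editorial note: the shape of the reworked init path can be sketched in
plain C. The names cache_cpu_init(), memory_caching_control, CACHE_MTRR
and pat_cpu_init() appear in the series and in the diff discussed below;
the CACHE_PAT flag and all stub bodies here are illustrative assumptions,
not the actual kernel code.]

/*
 * Standalone sketch (compiles as userspace C): MTRR and PAT are driven
 * as independent steps by one common routine, gated by feature flags,
 * instead of PAT init being called from inside MTRR code.
 */
#include <stdio.h>

#define CACHE_MTRR 0x01
#define CACHE_PAT  0x02	/* assumed flag name, for illustration */

static unsigned int memory_caching_control;	/* set once on the BP */

static void cache_disable(void) { puts("disable caches, no-fill mode"); }
static void cache_enable(void)  { puts("flush, re-enable caches"); }
static void mtrr_generic_set_state(void) { puts("program MTRRs"); }
static void pat_cpu_init(void)  { puts("program PAT MSR"); }

/* Runs on every CPU; each step is now optional and independent. */
static void cache_cpu_init(void)
{
	cache_disable();

	if (memory_caching_control & CACHE_MTRR)
		mtrr_generic_set_state();
	if (memory_caching_control & CACHE_PAT)
		pat_cpu_init();

	cache_enable();
}

int main(void)
{
	/* e.g. PAT without MTRR -- the combination the series enables */
	memory_caching_control = CACHE_PAT;
	cache_cpu_init();
	return 0;
}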
Comments
On Wed, Nov 02, 2022 at 08:46:57AM +0100, Juergen Gross wrote:
> Today PAT can't be used without MTRR being available, unless MTRR is at
> least configured via CONFIG_MTRR and the system is running as Xen PV
> guest. In this case PAT is automatically available via the hypervisor,
> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>
> The same applies to a kernel built with no MTRR support: it won't
> allow to use the PAT MSR, even if there is no technical reason for
> that, other than setting up PAT on all CPUs the same way (which is a
> requirement of the processor's cache management) is relying on some
> MTRR specific code.
>
> Fix all of that by:

One of the AMD test boxes here says with this:

...
[ 0.863466] PCI: not using MMCONFIG
[ 0.863475] PCI: Using configuration type 1 for base access
[ 0.863478] PCI: Using configuration type 1 for extended access
[ 0.866733] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.866737] mtrr: probably your BIOS does not setup all CPUs.
[ 0.866740] mtrr: corrected configuration.
[ 0.869350] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
...

Previous logs don't have it:

PCI: not using MMCONFIG
PCI: Using configuration type 1 for base access
PCI: Using configuration type 1 for extended access
kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
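[Editorial note: the three mtrr: lines in that log come from the kernel's
MTRR consistency check, which compares each AP's MTRR state against the
boot CPU's and warns on any difference. Below is a standalone sketch of
the reporting logic, modeled on mtrr_state_warn(); the mask names and
message strings follow the kernel, the scaffolding around them is
invented for illustration.]

#include <stdio.h>

#define MTRR_CHANGE_MASK_FIXED    0x01
#define MTRR_CHANGE_MASK_VARIABLE 0x02
#define MTRR_CHANGE_MASK_DEFTYPE  0x04

/* Print which part of the MTRR state differed between CPUs, if any. */
static void mtrr_state_warn(unsigned int mask)
{
	if (!mask)
		return;
	if (mask & MTRR_CHANGE_MASK_FIXED)
		puts("mtrr: your CPUs had inconsistent fixed MTRR settings");
	if (mask & MTRR_CHANGE_MASK_VARIABLE)
		puts("mtrr: your CPUs had inconsistent variable MTRR settings");
	if (mask & MTRR_CHANGE_MASK_DEFTYPE)
		puts("mtrr: your CPUs had inconsistent MTRRdefType settings");
	puts("mtrr: probably your BIOS does not setup all CPUs.");
	puts("mtrr: corrected configuration.");
}

int main(void)
{
	/* The log above corresponds to a MTRRdefType mismatch. */
	mtrr_state_warn(MTRR_CHANGE_MASK_DEFTYPE);
	return 0;
}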
On 02.11.22 19:04, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 08:46:57AM +0100, Juergen Gross wrote:
>> Today PAT can't be used without MTRR being available, unless MTRR is at
>> least configured via CONFIG_MTRR and the system is running as Xen PV
>> guest. In this case PAT is automatically available via the hypervisor,
>> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>>
>> The same applies to a kernel built with no MTRR support: it won't
>> allow to use the PAT MSR, even if there is no technical reason for
>> that, other than setting up PAT on all CPUs the same way (which is a
>> requirement of the processor's cache management) is relying on some
>> MTRR specific code.
>>
>> Fix all of that by:
>
> One of the AMD test boxes here says with this:
>
> ...
> [ 0.863466] PCI: not using MMCONFIG
> [ 0.863475] PCI: Using configuration type 1 for base access
> [ 0.863478] PCI: Using configuration type 1 for extended access
> [ 0.866733] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 0.866737] mtrr: probably your BIOS does not setup all CPUs.
> [ 0.866740] mtrr: corrected configuration.
> [ 0.869350] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
> ...
>
> Previous logs don't have it:
>
> PCI: not using MMCONFIG
> PCI: Using configuration type 1 for base access
> PCI: Using configuration type 1 for extended access
> kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.

Weird. I can't spot any modification which could have caused that.

Would it be possible to identify the patch causing that?


Juergen
On Thu, Nov 03, 2022 at 09:40:32AM +0100, Juergen Gross wrote:
> Would it be possible to identify the patch causing that?
Lemme try to find a smaller box which shows that too - that one is a
pain to bisect on.
On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
> Lemme try to find a smaller box which shows that too - that one is a
> pain to bisect on.

Ok, couldn't find a smaller one (or maybe it had to be a big one to
tickle this out).

So I think it is the parallel setup thing:

  x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel

Note that before it, it would do the configuration sequentially on each
CPU:

[ 0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
[ 0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
[ 0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
[ 0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
...

and so on.

Now, it would do it all in parallel:

[ 0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
                                                                  ^^^^^^

Note that last thing. That comes from (with debug output added):

void mtrr_disable(struct cache_state *state)
{
	unsigned int cpu = smp_processor_id();
	u64 msrval;

	/* Save MTRR state */
	rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);

	/* Disable MTRRs, and set the default type to uncached */
	mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
		   state->mtrr_deftype_hi);

	rdmsrl(MSR_MTRRdefType, msrval);

	pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
		__func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
}

The "read: (0x0:0)" basically says that

	state->mtrr_deftype_lo, state->mtrr_deftype_hi

are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00
on most cores except a handful and it means that MTRRs and Fixed Range
are enabled. In total, they're these cores here:

[ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762247] MTRR: mtrr_disable: CPU26, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762685] MTRR: mtrr_disable: CPU68, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762725] MTRR: mtrr_disable: CPU17, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762685] MTRR: mtrr_disable: CPU69, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762800] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762734] MTRR: mtrr_disable: CPU13, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762720] MTRR: mtrr_disable: CPU24, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762696] MTRR: mtrr_disable: CPU66, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762716] MTRR: mtrr_disable: CPU48, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762693] MTRR: mtrr_disable: CPU57, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762519] MTRR: mtrr_disable: CPU87, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762532] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762755] MTRR: mtrr_disable: CPU32, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762693] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762861] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762724] MTRR: mtrr_disable: CPU21, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762640] MTRR: mtrr_disable: CPU15, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762615] MTRR: mtrr_disable: CPU50, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762741] MTRR: mtrr_disable: CPU40, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762738] MTRR: mtrr_disable: CPU37, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762716] MTRR: mtrr_disable: CPU25, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762512] MTRR: mtrr_disable: CPU59, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762721] MTRR: mtrr_disable: CPU45, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762682] MTRR: mtrr_disable: CPU56, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762583] MTRR: mtrr_disable: CPU124, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762751] MTRR: mtrr_disable: CPU12, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762741] MTRR: mtrr_disable: CPU9, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762575] MTRR: mtrr_disable: CPU51, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762632] MTRR: mtrr_disable: CPU100, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762688] MTRR: mtrr_disable: CPU61, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762610] MTRR: mtrr_disable: CPU105, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762721] MTRR: mtrr_disable: CPU20, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.762583] MTRR: mtrr_disable: CPU47, MSR_MTRRdefType: 0x0, read: (0x0:0)

Now, if I add MFENCEs around those RDMSRs:

void mtrr_disable(struct cache_state *state)
{
	unsigned int cpu = smp_processor_id();
	u64 msrval;

	/* Save MTRR state */
	rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);

	__mb();

	/* Disable MTRRs, and set the default type to uncached */
	mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
		   state->mtrr_deftype_hi);

	__mb();

	rdmsrl(MSR_MTRRdefType, msrval);

	pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
		__func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);

	__mb();
}

the amount of cores becomes less:

[ 0.765260] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765462] MTRR: mtrr_disable: CPU5, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765242] MTRR: mtrr_disable: CPU22, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765522] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765474] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765207] MTRR: mtrr_disable: CPU54, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765225] MTRR: mtrr_disable: CPU8, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765282] MTRR: mtrr_disable: CPU88, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765150] MTRR: mtrr_disable: CPU119, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765370] MTRR: mtrr_disable: CPU49, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765395] MTRR: mtrr_disable: CPU16, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765348] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
[ 0.765270] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)

which basically hints at some speculative fun where we end up reading
the MSR *after* the write to it has already happened. After this thing:

	/* Disable MTRRs, and set the default type to uncached */
	mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
		   state->mtrr_deftype_hi);

and thus when we read it, we already read the disabled state. But this
is only a conjecture because I still have no clear idea how TF would
that even happen?!?

Needless to say, this fixes it, ofc:

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 3805a6d32d37..4a685898caf3 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1116,12 +1116,14 @@ void cache_enable(struct cache_state *state)
 	__write_cr4(state->cr4);
 }
 
+static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
+
 static void cache_cpu_init(void)
 {
 	unsigned long flags;
 	struct cache_state state = { };
 
-	local_irq_save(flags);
+	raw_spin_lock_irqsave(&set_atomicity_lock, flags);
 	cache_disable(&state);
 
 	if (memory_caching_control & CACHE_MTRR)
@@ -1131,7 +1133,7 @@ static void cache_cpu_init(void)
 	pat_cpu_init();
 
 	cache_enable(&state);
-	local_irq_restore(flags);
+	raw_spin_unlock_irqrestore(&set_atomicity_lock, flags);
 }
 
 static bool cache_aps_delayed_init = true;

---

and frankly, considering how we have bigger fish to fry, I'd say we do
it the old way and leave that can'o'worms half-opened.

Unless you wanna continue poking at it. I can give you access to that
box at work...

Thx.
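[Editorial note: the thread leaves the root cause open. As a purely
illustrative aid, the kind of interleaving being suspected can be
reproduced in userspace. The sketch below runs two "CPUs" through the
unserialized save -> disable -> restore sequence against one shared
variable standing in for MSR_MTRRdefType; when the second reader runs
between the first one's disable and restore, it saves the
already-disabled value 0x0 instead of 0xc00, matching the
"read: (0x0:0)" lines above. That anything is actually shared between
the real CPUs here is an assumption, not something the thread
establishes. The data race on fake_msr is deliberate. Build with:
cc -O2 -pthread race.c]

#include <pthread.h>
#include <stdio.h>

static volatile unsigned long fake_msr;	/* stand-in for the MSR */
static unsigned long saved_val[2];

static void *cpu_init(void *arg)
{
	long cpu = (long)arg;
	unsigned long saved;

	saved = fake_msr;		/* "rdmsr": save current state */
	fake_msr = saved & ~0xcffUL;	/* "wrmsr": disable MTRRs */
	saved_val[cpu] = saved;
	fake_msr = saved;		/* "wrmsr": restore saved state */
	return NULL;
}

int main(void)
{
	pthread_t t[2];

	for (int round = 0; round < 100000; round++) {
		fake_msr = 0xc00;	/* MTRRs + fixed ranges enabled */
		for (long i = 0; i < 2; i++)
			pthread_create(&t[i], NULL, cpu_init, (void *)i);
		for (long i = 0; i < 2; i++)
			pthread_join(t[i], NULL);
		for (int i = 0; i < 2; i++)
			if (saved_val[i] == 0)
				printf("round %d: CPU%d saved the disabled state (0x0)\n",
				       round, i);
	}
	return 0;
}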
On 07.11.22 20:25, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
>> Lemme try to find a smaller box which shows that too - that one is a
>> pain to bisect on.
>
> Ok, couldn't find a smaller one (or maybe it had to be a big one to
> tickle this out).
>
> So I think it is the parallel setup thing:
>
>   x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel
>
> Note that before it, it would do the configuration sequentially on each
> CPU:
>
> [ 0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [ 0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
> [ 0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [ 0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
> ...
>
> and so on.
>
> Now, it would do it all in parallel:
>
> [ 0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
>                                                                   ^^^^^^
>
> Note that last thing. That comes from (with debug output added):
>
> void mtrr_disable(struct cache_state *state)
> {
> 	unsigned int cpu = smp_processor_id();
> 	u64 msrval;
>
> 	/* Save MTRR state */
> 	rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
>
> 	/* Disable MTRRs, and set the default type to uncached */
> 	mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
> 		   state->mtrr_deftype_hi);
>
> 	rdmsrl(MSR_MTRRdefType, msrval);
>
> 	pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
> 		__func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
> }
>
> The "read: (0x0:0)" basically says that
>
> 	state->mtrr_deftype_lo, state->mtrr_deftype_hi
>
> are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00
> on most cores except a handful and it means that MTRRs and Fixed Range
> are enabled. In total, they're these cores here:
>
> [ 0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762247] MTRR: mtrr_disable: CPU26, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762685] MTRR: mtrr_disable: CPU68, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762725] MTRR: mtrr_disable: CPU17, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762685] MTRR: mtrr_disable: CPU69, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762800] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762734] MTRR: mtrr_disable: CPU13, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762720] MTRR: mtrr_disable: CPU24, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762696] MTRR: mtrr_disable: CPU66, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762716] MTRR: mtrr_disable: CPU48, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762693] MTRR: mtrr_disable: CPU57, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762519] MTRR: mtrr_disable: CPU87, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762532] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762755] MTRR: mtrr_disable: CPU32, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762693] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762861] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762724] MTRR: mtrr_disable: CPU21, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762640] MTRR: mtrr_disable: CPU15, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762615] MTRR: mtrr_disable: CPU50, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762741] MTRR: mtrr_disable: CPU40, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762738] MTRR: mtrr_disable: CPU37, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762716] MTRR: mtrr_disable: CPU25, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762512] MTRR: mtrr_disable: CPU59, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762721] MTRR: mtrr_disable: CPU45, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762682] MTRR: mtrr_disable: CPU56, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762583] MTRR: mtrr_disable: CPU124, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762751] MTRR: mtrr_disable: CPU12, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762741] MTRR: mtrr_disable: CPU9, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762575] MTRR: mtrr_disable: CPU51, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762632] MTRR: mtrr_disable: CPU100, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762688] MTRR: mtrr_disable: CPU61, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762610] MTRR: mtrr_disable: CPU105, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762721] MTRR: mtrr_disable: CPU20, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.762583] MTRR: mtrr_disable: CPU47, MSR_MTRRdefType: 0x0, read: (0x0:0)
>
> Now, if I add MFENCEs around those RDMSRs:
>
> void mtrr_disable(struct cache_state *state)
> {
> 	unsigned int cpu = smp_processor_id();
> 	u64 msrval;
>
> 	/* Save MTRR state */
> 	rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
>
> 	__mb();
>
> 	/* Disable MTRRs, and set the default type to uncached */
> 	mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
> 		   state->mtrr_deftype_hi);
>
> 	__mb();
>
> 	rdmsrl(MSR_MTRRdefType, msrval);
>
> 	pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
> 		__func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
>
> 	__mb();
> }
>
> the amount of cores becomes less:

Probably not because of the fencing, but because of the different timing.

> [ 0.765260] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765462] MTRR: mtrr_disable: CPU5, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765242] MTRR: mtrr_disable: CPU22, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765522] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765474] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765207] MTRR: mtrr_disable: CPU54, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765225] MTRR: mtrr_disable: CPU8, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765282] MTRR: mtrr_disable: CPU88, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765150] MTRR: mtrr_disable: CPU119, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765370] MTRR: mtrr_disable: CPU49, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765395] MTRR: mtrr_disable: CPU16, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765348] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [ 0.765270] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
>
> which basically hints at some speculative fun where we end up reading
> the MSR *after* the write to it has already happened. After this thing:
>
> 	/* Disable MTRRs, and set the default type to uncached */
> 	mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
> 		   state->mtrr_deftype_hi);
>
> and thus when we read it, we already read the disabled state. But this
> is only a conjecture because I still have no clear idea how TF would
> that even happen?!?

Yeah, and why doesn't it happen when we handle only one cpu at a time?

There might be some interaction between the cpus ...

> Needless to say, this fixes it, ofc:
>
> diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> index 3805a6d32d37..4a685898caf3 100644
> --- a/arch/x86/kernel/cpu/cacheinfo.c
> +++ b/arch/x86/kernel/cpu/cacheinfo.c
> @@ -1116,12 +1116,14 @@ void cache_enable(struct cache_state *state)
>  	__write_cr4(state->cr4);
>  }
>  
> +static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
> +
>  static void cache_cpu_init(void)
>  {
>  	unsigned long flags;
>  	struct cache_state state = { };
>  
> -	local_irq_save(flags);
> +	raw_spin_lock_irqsave(&set_atomicity_lock, flags);
>  	cache_disable(&state);
>  
>  	if (memory_caching_control & CACHE_MTRR)
> @@ -1131,7 +1133,7 @@ static void cache_cpu_init(void)
>  	pat_cpu_init();
>  
>  	cache_enable(&state);
> -	local_irq_restore(flags);
> +	raw_spin_unlock_irqrestore(&set_atomicity_lock, flags);
>  }
>  
>  static bool cache_aps_delayed_init = true;
>
> ---
>
> and frankly, considering how we have bigger fish to fry, I'd say we do
> it the old way and leave that can'o'worms half-opened.

I agree to keep this patch out of the series for now.

> Unless you wanna continue poking at it. I can give you access to that
> box at work...

Yes, please.

I suspect there are some additional requirements for updating MTRR in
parallel, or this is "just" a cpu bug.


Juergen
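[Editorial note: for contrast with the race simulation above, here is a
userspace model of the serialized variant that Borislav's diff restores:
the same save -> disable -> program -> restore sequence, but under one
global lock. With the lock held, every "CPU" saves the enabled value
(0xc00), never another CPU's intermediate disabled state. All names
except set_atomicity_lock (taken from the diff) are invented for
illustration; this is not the kernel code.]

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t set_atomicity_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long fake_msr = 0xc00;	/* MTRRs + fixed ranges enabled */

static void *cache_cpu_init_model(void *arg)
{
	long cpu = (long)arg;
	unsigned long saved;

	pthread_mutex_lock(&set_atomicity_lock);
	saved = fake_msr;		/* save current state */
	fake_msr = saved & ~0xcffUL;	/* disable MTRRs */
	/* ... program MTRRs/PAT here ... */
	fake_msr = saved;		/* restore saved state */
	pthread_mutex_unlock(&set_atomicity_lock);

	printf("CPU%ld saved 0x%lx\n", cpu, saved);	/* always 0xc00 */
	return NULL;
}

int main(void)
{
	pthread_t t[8];

	for (long i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, cache_cpu_init_model, (void *)i);
	for (long i = 0; i < 8; i++)
		pthread_join(t[i], NULL);
	return 0;
}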