Message ID | 20230209072220.6836-3-jgross@suse.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp186547wrn; Wed, 8 Feb 2023 23:23:50 -0800 (PST) X-Google-Smtp-Source: AK7set+m+6MloGpmP8UMxZ8Pxi52tubsi/oVciY4G+UeOV/K8sQ2b2VIUhVSXBgDdJLmU3JdvAGw X-Received: by 2002:a17:906:cb92:b0:889:5ca0:146e with SMTP id mf18-20020a170906cb9200b008895ca0146emr10308364ejb.16.1675927430292; Wed, 08 Feb 2023 23:23:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675927430; cv=none; d=google.com; s=arc-20160816; b=Yi7hlM4jeJwbraYCPT0ZV/BUyyuVWAY94XAAIV/iN4FcZnBuZKRSEKv+WXTkqbBigo 42PozANr5JCS23VIaFYf/I6RabtO1pz9cgT1VMevXEHmV/fi+AwuD2z+4bgKQFmG2moU 3wuwWe7TE/d6eEMQFFmd23twO8Vb/AqyJ4fy7lyjhj+y1/Azi+vSyQx9F0ljQrAWYd4h K1Cos5pNdTqvKMO9mEGPRhbu6smtmZhzGB3gH06j63sdxgPJxwriPaZU1gPc1u1aV2WG Jtxz2Lfhx2WO/VBwmgXJctu8AZUszXZh07PhDhNzTESS6CgvttQk5MVXYafECox4BPUZ vqZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6KlzZtRVU/BktL0oxuQOwlCZm+eOW7c+Ihxdj/RddRU=; b=qFb8akTPkMYgQR2fdMgfvKpIva4lrXojAyEquug2Z9gl2Bl39hd3sX9C+gKxJJn+zK X1jbGyCKXUGu+ap9Gj0XR/9x9yV8GqHtlnadXPJJFFt+JHLQa3wrchfmDLkMjwM0nGOI y1HwjnI/eHAhsyA3a7q0NMcrZzxNtSS6OwCswVKtUvTJyWdrq9Tjo3jLBKGvbLXWxw53 rk+QCGzTZOW5NLAbKPp0WwAswnE3YzycsYwdLVxJ5l2+GEnef3D5XSBPdho6d1rzrTzC scFGjuAbBiHfUG8QnkrFWdLP0Uc3AgIGDUkOzssN44q/BKa7mG/CpPE/tegikZzvq7+g Ob7g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=SghSS7Pt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id fd24-20020a1709072a1800b008911a333905si1445299ejc.11.2023.02.08.23.23.27; Wed, 08 Feb 2023 23:23:50 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=SghSS7Pt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229853AbjBIHXL (ORCPT <rfc822;ybw1215001957@gmail.com> + 99 others); Thu, 9 Feb 2023 02:23:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34628 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229672AbjBIHXF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 9 Feb 2023 02:23:05 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5AB31C59E for <linux-kernel@vger.kernel.org>; Wed, 8 Feb 2023 23:22:35 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id A942F34DB4; Thu, 9 Feb 2023 07:22:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1675927354; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6KlzZtRVU/BktL0oxuQOwlCZm+eOW7c+Ihxdj/RddRU=; b=SghSS7PtDhKUO4VPKdXdgTYUXvRc6OBO3BazFVVGVDSKJq1l9SDT4wz6p9G9ynon3ndyEI UetnZPhGah5APTgvI58VZrC1vMRgjtivd3NkzGtzqDzXMtLOF+mwAYOpTh4b9s22cC8Fka YnfIYXeaPFKaGefmU0jzkBfIdtmECHk= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5CC2D1339E; Thu, 9 Feb 2023 07:22:34 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id SPNcFTqf5GPzeAAAMHmgww (envelope-from <jgross@suse.com>); Thu, 09 Feb 2023 07:22:34 +0000 From: Juergen Gross <jgross@suse.com> To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: lists@nerdbynature.de, mikelley@microsoft.com, torvalds@linux-foundation.org, Juergen Gross <jgross@suse.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com> Subject: [PATCH v2 2/8] x86/mtrr: support setting MTRR state for software defined MTRRs Date: Thu, 9 Feb 2023 08:22:14 +0100 Message-Id: <20230209072220.6836-3-jgross@suse.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230209072220.6836-1-jgross@suse.com> References: <20230209072220.6836-1-jgross@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757337281287459748?= X-GMAIL-MSGID: =?utf-8?q?1757337281287459748?= |
Series |
x86/mtrr: fix handling with PAT but without MTRR
|
|
Commit Message
Juergen Gross
Feb. 9, 2023, 7:22 a.m. UTC
When running virtualized, MTRR access can be reduced (e.g. in Xen PV
guests or when running as a SEV-SNP guest under Hyper-V). Typically
the hypervisor will reset the MTRR feature in cpuid data, resulting
in no MTRR memory type information being available for the kernel.
This has turned out to result in problems:
- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
- Xen PV dom0 mapping memory as WB which should be UC- instead
Solve those problems by supporting to set a fixed MTRR state,
overwriting the empty state used today. In case such a state has been
set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
will only be used by mtrr_type_lookup(), as in all other cases
mtrr_enabled() is being checked, which will return false. Accept the
overwrite call only in case of MTRRs being disabled in cpuid.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
---
arch/x86/include/asm/mtrr.h | 2 ++
arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++
3 files changed, 49 insertions(+)
Comments
From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM > > When running virtualized, MTRR access can be reduced (e.g. in Xen PV > guests or when running as a SEV-SNP guest under Hyper-V). Typically > the hypervisor will reset the MTRR feature in cpuid data, resulting > in no MTRR memory type information being available for the kernel. > > This has turned out to result in problems: > > - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't > - Xen PV dom0 mapping memory as WB which should be UC- instead > > Solve those problems by supporting to set a fixed MTRR state, > overwriting the empty state used today. In case such a state has been > set, don't call get_mtrr_state() in mtrr_bp_init(). The set state > will only be used by mtrr_type_lookup(), as in all other cases > mtrr_enabled() is being checked, which will return false. Accept the > overwrite call only in case of MTRRs being disabled in cpuid. > > Signed-off-by: Juergen Gross <jgross@suse.com> > --- > V2: > - new patch > --- > arch/x86/include/asm/mtrr.h | 2 ++ > arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ > arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ > 3 files changed, 49 insertions(+) > > diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h > index f0eeaf6e5f5f..0b8f51d683dc 100644 > --- a/arch/x86/include/asm/mtrr.h > +++ b/arch/x86/include/asm/mtrr.h > @@ -31,6 +31,8 @@ > */ > # ifdef CONFIG_MTRR > void mtrr_bp_init(void); > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > + mtrr_type *fixed, mtrr_type def_type); Could you add a stub for the !CONFIG_MTRR case? Then the #ifdef CONFIG_MTRR could be removed in Patch 3 of this series. Michael
On 13.02.23 02:07, Michael Kelley (LINUX) wrote: > From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM >> >> When running virtualized, MTRR access can be reduced (e.g. in Xen PV >> guests or when running as a SEV-SNP guest under Hyper-V). Typically >> the hypervisor will reset the MTRR feature in cpuid data, resulting >> in no MTRR memory type information being available for the kernel. >> >> This has turned out to result in problems: >> >> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't >> - Xen PV dom0 mapping memory as WB which should be UC- instead >> >> Solve those problems by supporting to set a fixed MTRR state, >> overwriting the empty state used today. In case such a state has been >> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state >> will only be used by mtrr_type_lookup(), as in all other cases >> mtrr_enabled() is being checked, which will return false. Accept the >> overwrite call only in case of MTRRs being disabled in cpuid. >> >> Signed-off-by: Juergen Gross <jgross@suse.com> >> --- >> V2: >> - new patch >> --- >> arch/x86/include/asm/mtrr.h | 2 ++ >> arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ >> arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ >> 3 files changed, 49 insertions(+) >> >> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h >> index f0eeaf6e5f5f..0b8f51d683dc 100644 >> --- a/arch/x86/include/asm/mtrr.h >> +++ b/arch/x86/include/asm/mtrr.h >> @@ -31,6 +31,8 @@ >> */ >> # ifdef CONFIG_MTRR >> void mtrr_bp_init(void); >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, >> + mtrr_type *fixed, mtrr_type def_type); > > Could you add a stub for the !CONFIG_MTRR case? Then the > #ifdef CONFIG_MTRR could be removed in Patch 3 of this series. I was on the edge whether to add a stub. The Xen use case strongly suggests that the code wants to be inside an #ifdef, while the Hyper-V case is so simple, that it would benefit from the stub. As there was another #ifdef just above the added code in mshyperv.c I believed it would be fine without a stub. As you seem to like it better with the stub, I can add it. Juergen
From: Juergen Gross <jgross@suse.com> Sent: Sunday, February 12, 2023 10:27 PM > > On 13.02.23 02:07, Michael Kelley (LINUX) wrote: > > From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM > >> > >> When running virtualized, MTRR access can be reduced (e.g. in Xen PV > >> guests or when running as a SEV-SNP guest under Hyper-V). Typically > >> the hypervisor will reset the MTRR feature in cpuid data, resulting > >> in no MTRR memory type information being available for the kernel. > >> > >> This has turned out to result in problems: > >> > >> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't > >> - Xen PV dom0 mapping memory as WB which should be UC- instead > >> > >> Solve those problems by supporting to set a fixed MTRR state, > >> overwriting the empty state used today. In case such a state has been > >> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state > >> will only be used by mtrr_type_lookup(), as in all other cases > >> mtrr_enabled() is being checked, which will return false. Accept the > >> overwrite call only in case of MTRRs being disabled in cpuid. > >> > >> Signed-off-by: Juergen Gross <jgross@suse.com> > >> --- > >> V2: > >> - new patch > >> --- > >> arch/x86/include/asm/mtrr.h | 2 ++ > >> arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ > >> arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ > >> 3 files changed, 49 insertions(+) > >> > >> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h > >> index f0eeaf6e5f5f..0b8f51d683dc 100644 > >> --- a/arch/x86/include/asm/mtrr.h > >> +++ b/arch/x86/include/asm/mtrr.h > >> @@ -31,6 +31,8 @@ > >> */ > >> # ifdef CONFIG_MTRR > >> void mtrr_bp_init(void); > >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > >> + mtrr_type *fixed, mtrr_type def_type); > > > > Could you add a stub for the !CONFIG_MTRR case? Then the > > #ifdef CONFIG_MTRR could be removed in Patch 3 of this series. > > I was on the edge whether to add a stub. The Xen use case strongly > suggests that the code wants to be inside an #ifdef, while the Hyper-V > case is so simple, that it would benefit from the stub. As there was > another #ifdef just above the added code in mshyperv.c I believed it > would be fine without a stub. As you seem to like it better with the > stub, I can add it. > Thanks. And that other #ifdef is going away soon ... Michael
On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote: > When running virtualized, MTRR access can be reduced (e.g. in Xen PV > guests or when running as a SEV-SNP guest under Hyper-V). Typically > the hypervisor will reset the MTRR feature in cpuid data, resulting > in no MTRR memory type information being available for the kernel. > > This has turned out to result in problems: > > - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't > - Xen PV dom0 mapping memory as WB which should be UC- instead > > Solve those problems by supporting to set a fixed MTRR state, > overwriting the empty state used today. In case such a state has been > set, don't call get_mtrr_state() in mtrr_bp_init(). The set state > will only be used by mtrr_type_lookup(), as in all other cases > mtrr_enabled() is being checked, which will return false. Accept the > overwrite call only in case of MTRRs being disabled in cpuid. s/cpuid/CPUID/g > Signed-off-by: Juergen Gross <jgross@suse.com> > --- > V2: > - new patch > --- > arch/x86/include/asm/mtrr.h | 2 ++ > arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ > arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ > 3 files changed, 49 insertions(+) > > diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h > index f0eeaf6e5f5f..0b8f51d683dc 100644 > --- a/arch/x86/include/asm/mtrr.h > +++ b/arch/x86/include/asm/mtrr.h > @@ -31,6 +31,8 @@ > */ > # ifdef CONFIG_MTRR > void mtrr_bp_init(void); > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > + mtrr_type *fixed, mtrr_type def_type); > extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform); > extern void mtrr_save_fixed_ranges(void *); > extern void mtrr_save_state(void); > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c > index ee09d359e08f..788bc16888a5 100644 > --- a/arch/x86/kernel/cpu/mtrr/generic.c > +++ b/arch/x86/kernel/cpu/mtrr/generic.c > @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, > return mtrr_state.def_type; > } > > +/** > + * mtrr_overwrite_state - set fixed MTRR state fixed only? You pass in variable too... > + * > + * Used to set MTRR state via different means (e.g. with data obtained from > + * a hypervisor). > + */ > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > + mtrr_type *fixed, mtrr_type def_type) > +{ > + unsigned int i; > + > + if (boot_cpu_has(X86_FEATURE_MTRR)) check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. > + return; So this here needs to check: if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) || cpu_feature_enabled(X86_FEATURE_XENPV))) { WARN_ON_ONCE(1); return; } as we don't want this to be called somewhere or by something else. The SEV_SNP flag can be used from: https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com I'm assuming here HyperV SEV-SNP guests really do set that feature flag (they better). We can expedite that patch ofc. And for dom0 I *think* we use X86_FEATURE_XENPV but I leave that to you. > + > + if (var) { > + if (num_var > MTRR_MAX_VAR_RANGES) { > + pr_warn("Trying to overwrite MTRR state with %u variable entries\n", > + num_var); What's that check for? Sanity of callers? > + num_var = MTRR_MAX_VAR_RANGES; > + } > + for (i = 0; i < num_var; i++) > + mtrr_state.var_ranges[i] = var[i]; > + num_var_ranges = num_var; > + } > + > + if (fixed) { > + for (i = 0; i < MTRR_NUM_FIXED_RANGES; i++) You're not doing this sanity check here, expecting that callers would know what they're doing... > + mtrr_state.fixed_ranges[i] = fixed[i]; > + mtrr_state.enabled |= MTRR_STATE_MTRR_FIXED_ENABLED; > + mtrr_state.have_fixed = 1; > + } > + > + mtrr_state.def_type = def_type; > + mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED; > + > + mtrr_state_set = 1; > +} I can't say that I'm crazy about the call sites: mtrr_overwrite_state(NULL, 0, NULL, MTRR_TYPE_WRBACK); This looks like it wants a mtrr_override_def_type(MTRR_TYPE_WRBACK); instead of passing in all those nulls as params. This: mtrr_overwrite_state(var, reg, NULL, MTRR_TYPE_UNCACHABLE); I guess is a bit better. Dunno, if it is only those two callers we can say, meh, whatever, this interface is not pretty but does the job at least. But if more users start popping up then I guess we can do mtrr_override_fixed() mtrr_override_variable() mtrr_override_def_type() ... > /** > * mtrr_type_lookup - look up memory type in MTRR > * > diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c > index 542ca5639dfd..b73fe243c7fd 100644 > --- a/arch/x86/kernel/cpu/mtrr/mtrr.c > +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c > @@ -668,6 +668,15 @@ void __init mtrr_bp_init(void) > const char *why = "(not available)"; > unsigned int phys_addr; > > + if (mtrr_state.enabled) { Not crazy about this either: this relies on the fragile boot ordering where init_hypervisor_platform() runs before this so it has a chance that mtrr_state.enabled will be already set. Yeah, yeah, cache_bp_init() and all the MTRR BSP setup stuff happens after it but there should at least be a comment over init_hypervisor_platform()'s call site in setup_arch() stating that cache_bp_init() needs to happen *after* it because <reason>. I think we should also check x86_hyper_type here and not do anything if not set. As this is all HV-related muck. Xen I guess is a bit better because that call there happens even earlier but we need the comments to say that the ordering matters because future reorganization could cause it to blow up and people would search themselves crazy why in the hell it breaks... Can Xen use x86_hyper_type() too? Thx.
On 13.02.23 12:39, Borislav Petkov wrote: > On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote: >> When running virtualized, MTRR access can be reduced (e.g. in Xen PV >> guests or when running as a SEV-SNP guest under Hyper-V). Typically >> the hypervisor will reset the MTRR feature in cpuid data, resulting >> in no MTRR memory type information being available for the kernel. >> >> This has turned out to result in problems: >> >> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't >> - Xen PV dom0 mapping memory as WB which should be UC- instead >> >> Solve those problems by supporting to set a fixed MTRR state, >> overwriting the empty state used today. In case such a state has been >> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state >> will only be used by mtrr_type_lookup(), as in all other cases >> mtrr_enabled() is being checked, which will return false. Accept the >> overwrite call only in case of MTRRs being disabled in cpuid. > > s/cpuid/CPUID/g Okay. > >> Signed-off-by: Juergen Gross <jgross@suse.com> >> --- >> V2: >> - new patch >> --- >> arch/x86/include/asm/mtrr.h | 2 ++ >> arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ >> arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ >> 3 files changed, 49 insertions(+) >> >> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h >> index f0eeaf6e5f5f..0b8f51d683dc 100644 >> --- a/arch/x86/include/asm/mtrr.h >> +++ b/arch/x86/include/asm/mtrr.h >> @@ -31,6 +31,8 @@ >> */ >> # ifdef CONFIG_MTRR >> void mtrr_bp_init(void); >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, >> + mtrr_type *fixed, mtrr_type def_type); >> extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform); >> extern void mtrr_save_fixed_ranges(void *); >> extern void mtrr_save_state(void); >> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c >> index ee09d359e08f..788bc16888a5 100644 >> --- a/arch/x86/kernel/cpu/mtrr/generic.c >> +++ b/arch/x86/kernel/cpu/mtrr/generic.c >> @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, >> return mtrr_state.def_type; >> } >> >> +/** >> + * mtrr_overwrite_state - set fixed MTRR state > > fixed only? You pass in variable too... Fixed in the sense of static. > >> + * >> + * Used to set MTRR state via different means (e.g. with data obtained from >> + * a hypervisor). >> + */ >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, >> + mtrr_type *fixed, mtrr_type def_type) >> +{ >> + unsigned int i; >> + >> + if (boot_cpu_has(X86_FEATURE_MTRR)) > > check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. Okay. > >> + return; > > So this here needs to check: > > if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && > !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) || > cpu_feature_enabled(X86_FEATURE_XENPV))) { > WARN_ON_ONCE(1); > return; > } > > as we don't want this to be called somewhere or by something else. Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? I'm not sure we won't need that for TDX guests, too. > > The SEV_SNP flag can be used from: > > https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com > > I'm assuming here HyperV SEV-SNP guests really do set that feature flag > (they better). We can expedite that patch ofc. > > And for dom0 I *think* we use X86_FEATURE_XENPV but I leave that to you. Yes, it is only relevant for PV dom0. > >> + >> + if (var) { >> + if (num_var > MTRR_MAX_VAR_RANGES) { >> + pr_warn("Trying to overwrite MTRR state with %u variable entries\n", >> + num_var); > > What's that check for? Sanity of callers? Yes. > >> + num_var = MTRR_MAX_VAR_RANGES; >> + } >> + for (i = 0; i < num_var; i++) >> + mtrr_state.var_ranges[i] = var[i]; >> + num_var_ranges = num_var; >> + } >> + >> + if (fixed) { >> + for (i = 0; i < MTRR_NUM_FIXED_RANGES; i++) > > You're not doing this sanity check here, expecting that callers would > know what they're doing... The number of fixed MTRRs is not dynamic AFAIK. > >> + mtrr_state.fixed_ranges[i] = fixed[i]; >> + mtrr_state.enabled |= MTRR_STATE_MTRR_FIXED_ENABLED; >> + mtrr_state.have_fixed = 1; >> + } >> + >> + mtrr_state.def_type = def_type; >> + mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED; >> + >> + mtrr_state_set = 1; >> +} > > I can't say that I'm crazy about the call sites: > > mtrr_overwrite_state(NULL, 0, NULL, MTRR_TYPE_WRBACK); > > This looks like it wants a > > mtrr_override_def_type(MTRR_TYPE_WRBACK); > > instead of passing in all those nulls as params. > > This: > > mtrr_overwrite_state(var, reg, NULL, MTRR_TYPE_UNCACHABLE); > > I guess is a bit better. > > Dunno, if it is only those two callers we can say, meh, whatever, this > interface is not pretty but does the job at least. But if more users > start popping up then I guess we can do > > mtrr_override_fixed() > mtrr_override_variable() > mtrr_override_def_type() A single interface makes it easier to avoid multiple calls. In the end I'm fine with either way. > > ... > > >> /** >> * mtrr_type_lookup - look up memory type in MTRR >> * >> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c >> index 542ca5639dfd..b73fe243c7fd 100644 >> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c >> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c >> @@ -668,6 +668,15 @@ void __init mtrr_bp_init(void) >> const char *why = "(not available)"; >> unsigned int phys_addr; >> >> + if (mtrr_state.enabled) { > > Not crazy about this either: this relies on the fragile boot ordering > where init_hypervisor_platform() runs before this so it has a chance > that mtrr_state.enabled will be already set. > > Yeah, yeah, cache_bp_init() and all the MTRR BSP setup stuff happens > after it but there should at least be a comment over > init_hypervisor_platform()'s call site in setup_arch() stating that > cache_bp_init() needs to happen *after* it because <reason>. I'll add that comment. > > I think we should also check > > x86_hyper_type > > here and not do anything if not set. As this is all HV-related muck. > > Xen I guess is a bit better because that call there happens even earlier > but we need the comments to say that the ordering matters because future > reorganization could cause it to blow up and people would search > themselves crazy why in the hell it breaks... > > Can Xen use x86_hyper_type() too? It does. Juergen
On Mon, Feb 13, 2023 at 03:07:07PM +0100, Juergen Gross wrote: > Fixed in the sense of static. Well, you can't use "fixed" to say "static" when former means something very specific already in MTRR land. > Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? > > I'm not sure we won't need that for TDX guests, too. See, that's the problem. I wanna have it simple too. Lemme check with dhansen. > Yes, it is only relevant for PV dom0. Right, I was asking whether "PV dom0" == X86_FEATURE_XENPV? :) > The number of fixed MTRRs is not dynamic AFAIK. But nothing guarantees that the caller would pass an array "mtrr_type *fixed" of size MTRR_NUM_FIXED_RANGES, right? > A single interface makes it easier to avoid multiple calls. > > In the end I'm fine with either way. Yeah, I know. Question is, how much of this functionality will be needed/used so that we can go all out on the interface design or we can do a single one and forget about it... > > Can Xen use x86_hyper_type() too? > > It does. Then pls add a x86_hyper_type check too to make sure a potential move of this call is caught in the future. Thx.
On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote: > > Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? > > > > I'm not sure we won't need that for TDX guests, too. > > See, that's the problem. I wanna have it simple too. Lemme check with > dhansen. He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to 1 in TDX guests." So we will have to do the more finer-grained check I guess.
On 13.02.23 16:11, Borislav Petkov wrote: > On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote: >>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? >>> >>> I'm not sure we won't need that for TDX guests, too. >> >> See, that's the problem. I wanna have it simple too. Lemme check with >> dhansen. > > He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to > 1 in TDX guests." > > So we will have to do the more finer-grained check I guess. Isn't the check for !X86_FEATURE_MTRR && X86_FEATURE_HYPERVISOR enough then? Yes, you still could construct cases where it would go wrong, but I don't think we should over-engineer it. Juergen
On 2/13/23 07:11, Borislav Petkov wrote: > On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote: >>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? >>> >>> I'm not sure we won't need that for TDX guests, too. >> See, that's the problem. I wanna have it simple too. Lemme check with >> dhansen. > He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to > 1 in TDX guests." > > So we will have to do the more finer-grained check I guess. Yes, TDX guests see MTRRs as being supported. But, the TDX module also appears to inject a #VE for all RDMSR or WRMSR's to the MTRRs. That makes them effectively useless. I actually don't know what the heck TDX guests are supposed to do if they feel like mucking with the MSRs. The architecture (CPUID) is essentially telling them: "Sure, go ahead MTRRs are fiiiiiiine". But the TDX module is sitting there throwing exceptions (#VE) if the guest tries to touch MTRRs. It sounds like there are some guest<->host ABIs on Xen to help the guests do this. But I don't see anything in the TDX "GHCI" about it.
On 13.02.23 16:03, Borislav Petkov wrote: > On Mon, Feb 13, 2023 at 03:07:07PM +0100, Juergen Gross wrote: >> Fixed in the sense of static. > > Well, you can't use "fixed" to say "static" when former means something > very specific already in MTRR land. > >> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? >> >> I'm not sure we won't need that for TDX guests, too. > > See, that's the problem. I wanna have it simple too. Lemme check with > dhansen. > >> Yes, it is only relevant for PV dom0. > > Right, I was asking whether "PV dom0" == X86_FEATURE_XENPV? No, you can have PV guests not being dom0. > > :) > >> The number of fixed MTRRs is not dynamic AFAIK. > > But nothing guarantees that the caller would pass an array "mtrr_type > *fixed" of size MTRR_NUM_FIXED_RANGES, right? Right. In the end I wouldn't mind dropping the fixed MTRRs from the interface, as they are currently not needed at all. > >> A single interface makes it easier to avoid multiple calls. >> >> In the end I'm fine with either way. > > Yeah, I know. Question is, how much of this functionality will be > needed/used so that we can go all out on the interface design or we can > do a single one and forget about it... I'd say we go with what is needed right now. And having a single interface makes all the sanity checking you are asking for easier. > >>> Can Xen use x86_hyper_type() too? >> >> It does. > > Then pls add a x86_hyper_type check too to make sure a potential move of > this call is caught in the future. What are you especially asking for? With my current patches Xen PV dom0 will call mtrr_overwrite_state() before x86_hyper_type is set, while a Hyper-V SEV-SNP guest will make the call after it has been set. Both calls happen before cache_bp_init(). So I could move the mtrr_overwrite_state() call for Xen PV dom0 into its init_platform() callback and check in mtrr_overwrite_state() x86_hyper_type to be set, or I could reject a call of mtrr_overwrite_state() after the call of cache_bp_init() has happened, or I could do both. Juergen
On 13.02.23 16:27, Dave Hansen wrote: > On 2/13/23 07:11, Borislav Petkov wrote: >> On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote: >>>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? >>>> >>>> I'm not sure we won't need that for TDX guests, too. >>> See, that's the problem. I wanna have it simple too. Lemme check with >>> dhansen. >> He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to >> 1 in TDX guests." >> >> So we will have to do the more finer-grained check I guess. > > Yes, TDX guests see MTRRs as being supported. But, the TDX module also > appears to inject a #VE for all RDMSR or WRMSR's to the MTRRs. That > makes them effectively useless. > > I actually don't know what the heck TDX guests are supposed to do if > they feel like mucking with the MSRs. The architecture (CPUID) is > essentially telling them: "Sure, go ahead MTRRs are fiiiiiiine". But > the TDX module is sitting there throwing exceptions (#VE) if the guest > tries to touch MTRRs. > > It sounds like there are some guest<->host ABIs on Xen to help the > guests do this. But I don't see anything in the TDX "GHCI" about it. This is in line of the PAT init sequence of TDX guests. PAT is said to be supported, but a TDX guest can't use the sequence as written in the SDM for setting the PAT MSR (disable caches, etc.). Juergen
On Mon, Feb 13, 2023 at 04:18:48PM +0100, Juergen Gross wrote: > Yes, you still could construct cases where it would go wrong, but I don't > think we should over-engineer it. Actually, we should allow only those for which we know they get special treatment for MTRRs settings and warn for all the rest. And judging by Dave's reply, I think TDX should be in that category too since it throws #VEs...
On 13.02.23 16:40, Borislav Petkov wrote: > On Mon, Feb 13, 2023 at 04:18:48PM +0100, Juergen Gross wrote: >> Yes, you still could construct cases where it would go wrong, but I don't >> think we should over-engineer it. > > Actually, we should allow only those for which we know they get special > treatment for MTRRs settings and warn for all the rest. > > And judging by Dave's reply, I think TDX should be in that category too > since it throws #VEs... > Okay, and it has MTRRs enabled (as Hyper-V SEV-SNP guests), so I shouldn't test that, I guess (or we should disable the feature before calling the overwrite function). Juergen
On Mon, Feb 13, 2023 at 04:36:12PM +0100, Juergen Gross wrote: > In the end I wouldn't mind dropping the fixed MTRRs from the interface, as > they are currently not needed at all. Yes, the less the better. > I'd say we go with what is needed right now. And having a single interface > makes all the sanity checking you are asking for easier. I guess I need to remember to finish designing this if more users appear... > What are you especially asking for? > > With my current patches Xen PV dom0 will call mtrr_overwrite_state() before > x86_hyper_type is set, while a Hyper-V SEV-SNP guest will make the call after > it has been set. Both calls happen before cache_bp_init(). > > So I could move the mtrr_overwrite_state() call for Xen PV dom0 into its > init_platform() callback and check in mtrr_overwrite_state() x86_hyper_type > to be set, I believe that is good enough, see below. > or I could reject a call of mtrr_overwrite_state() after the call of > cache_bp_init() has happened, or I could do both. I think one thing is enough as we'll be loud enough. --- diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c index b73fe243c7fd..2dbe2c10e959 100644 --- a/arch/x86/kernel/cpu/mtrr/mtrr.c +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c @@ -49,6 +49,7 @@ #include <asm/cacheinfo.h> #include <asm/cpufeature.h> #include <asm/e820/api.h> +#include <asm/hypervisor.h> #include <asm/mtrr.h> #include <asm/msr.h> #include <asm/memtype.h> @@ -668,7 +669,12 @@ void __init mtrr_bp_init(void) const char *why = "(not available)"; unsigned int phys_addr; +#ifdef CONFIG_HYPERVISOR_GUEST if (mtrr_state.enabled) { + + /* This should not happen without a hypervisor present. */ + WARN_ON_ONCE(!x86_hyper_type); + /* Software overwrite of MTRR state, only for generic case. */ mtrr_calc_physbits(true); init_table(); @@ -676,6 +682,7 @@ void __init mtrr_bp_init(void) return; } +#endif phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR));
On Mon, Feb 13, 2023 at 04:44:09PM +0100, Juergen Gross wrote: > Okay, and it has MTRRs enabled (as Hyper-V SEV-SNP guests), so I shouldn't > test that, I guess (or we should disable the feature before calling the > overwrite function). I think we should handle TDX the same way - as if the MTRRs are read-only there. So you can check X86_FEATURE_TDX_GUEST in addition. Thx.
On Mon, Feb 13, 2023 at 03:07:07PM +0100, Juergen Gross wrote: > > So this here needs to check: > > > > if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && > > !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) || > > cpu_feature_enabled(X86_FEATURE_XENPV))) { > > WARN_ON_ONCE(1); > > return; > > } > > > > as we don't want this to be called somewhere or by something else. > > Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough? > > I'm not sure we won't need that for TDX guests, too. TDX guests are covered by X86_FEATURE_HYPERVISOR.
On 13.02.23 19:43, Borislav Petkov wrote: > On Mon, Feb 13, 2023 at 04:36:12PM +0100, Juergen Gross wrote: >> In the end I wouldn't mind dropping the fixed MTRRs from the interface, as >> they are currently not needed at all. > > Yes, the less the better. > >> I'd say we go with what is needed right now. And having a single interface >> makes all the sanity checking you are asking for easier. > > I guess I need to remember to finish designing this if more users > appear... > >> What are you especially asking for? >> >> With my current patches Xen PV dom0 will call mtrr_overwrite_state() before >> x86_hyper_type is set, while a Hyper-V SEV-SNP guest will make the call after >> it has been set. Both calls happen before cache_bp_init(). >> >> So I could move the mtrr_overwrite_state() call for Xen PV dom0 into its >> init_platform() callback and check in mtrr_overwrite_state() x86_hyper_type >> to be set, > > I believe that is good enough, see below. > >> or I could reject a call of mtrr_overwrite_state() after the call of >> cache_bp_init() has happened, or I could do both. > > I think one thing is enough as we'll be loud enough. > > --- > diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c > index b73fe243c7fd..2dbe2c10e959 100644 > --- a/arch/x86/kernel/cpu/mtrr/mtrr.c > +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c > @@ -49,6 +49,7 @@ > #include <asm/cacheinfo.h> > #include <asm/cpufeature.h> > #include <asm/e820/api.h> > +#include <asm/hypervisor.h> > #include <asm/mtrr.h> > #include <asm/msr.h> > #include <asm/memtype.h> > @@ -668,7 +669,12 @@ void __init mtrr_bp_init(void) > const char *why = "(not available)"; > unsigned int phys_addr; > > +#ifdef CONFIG_HYPERVISOR_GUEST > if (mtrr_state.enabled) { > + > + /* This should not happen without a hypervisor present. */ > + WARN_ON_ONCE(!x86_hyper_type); > + > /* Software overwrite of MTRR state, only for generic case. */ > mtrr_calc_physbits(true); > init_table(); > @@ -676,6 +682,7 @@ void __init mtrr_bp_init(void) > > return; > } > +#endif I will change this a little bit in order to avoid the #ifdef by using "WARN_ON(hypervisor_is_type() == X86_HYPER_NATIVE);" Juergen
On 13.02.23 19:53, Borislav Petkov wrote: > On Mon, Feb 13, 2023 at 04:44:09PM +0100, Juergen Gross wrote: >> Okay, and it has MTRRs enabled (as Hyper-V SEV-SNP guests), so I shouldn't >> test that, I guess (or we should disable the feature before calling the >> overwrite function). > > I think we should handle TDX the same way - as if the MTRRs are > read-only there. So you can check X86_FEATURE_TDX_GUEST in addition. Okay, if you really want to dictate the allowed use cases (this seems to be a layering violation), but you are the maintainer of that code. Juergen
On Tue, Feb 14, 2023 at 08:04:47AM +0100, Juergen Gross wrote: > Okay, if you really want to dictate the allowed use cases (this seems to be Read upthread - TDX guests cause #VEs for MTRR accesses. #VEs which are unneeded and should be avoided if possible. > a layering violation), but you are the maintainer of that code. And why are you so against catching misuses of this, which should absolutely *not* be needed by anything else?
On 14.02.23 09:58, Borislav Petkov wrote: > On Tue, Feb 14, 2023 at 08:04:47AM +0100, Juergen Gross wrote: >> Okay, if you really want to dictate the allowed use cases (this seems to be > > Read upthread - TDX guests cause #VEs for MTRR accesses. #VEs which are > unneeded and should be avoided if possible. Of course, I don't question the need for TDX guests to use the overwrite. > >> a layering violation), but you are the maintainer of that code. > > And why are you so against catching misuses of this, which should > absolutely *not* be needed by anything else I just don't like the idea of trying to catch all possible misuses in lower levels, at the same time introducing the need to modify those tests in case a new valid use case is popping up. But as said, you are the maintainer, so its your final decision. Juergen
On Tue, Feb 14, 2023 at 10:02:51AM +0100, Juergen Gross wrote: > I just don't like the idea of trying to catch all possible misuses in > lower levels, at the same time introducing the need to modify those > tests in case a new valid use case is popping up. So what would you do: generally allow this so that potentially other guest configurations misuse it? And when we decide to change it, all those users will come running and complaining that we broke it? And then we're stuck with a nasty workaround in the tree because we have to support them too? See, all we do here is because of such misguided (or maybe didn't know better) decisions which have happened a long time ago.
On 14.02.23 10:10, Borislav Petkov wrote: > On Tue, Feb 14, 2023 at 10:02:51AM +0100, Juergen Gross wrote: >> I just don't like the idea of trying to catch all possible misuses in >> lower levels, at the same time introducing the need to modify those >> tests in case a new valid use case is popping up. > > So what would you do: generally allow this so that potentially other > guest configurations misuse it? I guess this largely depends on the functionality. I don't see why anyone would try to use MTRR overwrite functionality without really needing it. But maybe I'm wrong here and I'm under-estimating the "creativity" of kernel hackers. > And when we decide to change it, all those users will come running and > complaining that we broke it? > > And then we're stuck with a nasty workaround in the tree because we have > to support them too? > > See, all we do here is because of such misguided (or maybe didn't know > better) decisions which have happened a long time ago. I can see your point. Maybe I haven't seen enough crazy hacks yet. :-) No need to further discuss this topic from my side, as I have voiced my opinion and you did so, too. I will add the tests you are asking for. Juergen
On Tue, Feb 14, 2023 at 10:17:12AM +0100, Juergen Gross wrote: > I guess this largely depends on the functionality. I don't see why anyone > would try to use MTRR overwrite functionality without really needing it. > > But maybe I'm wrong here and I'm under-estimating the "creativity" of > kernel hackers. This is exactly it - if it is there, it will get used eventually. Think of it this way: this is a special, well, kinda hack, if you will, which *nothing* else would need. We can always relax the condition for using it if something else appears with a valid use case. What we can't do nearly as easily is the reverse: remove it or tighten the check later. So the general policy is: workarounds like this need to be as specialized as possible. > Maybe I haven't seen enough crazy hacks yet. :-) You're kidding, right? You hack on Xen for a long time... :-P > No need to further discuss this topic from my side, as I have voiced my > opinion and you did so, too. I will add the tests you are asking for. Thanks!
On 13.02.23 12:39, Borislav Petkov wrote: > On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote: >> When running virtualized, MTRR access can be reduced (e.g. in Xen PV >> guests or when running as a SEV-SNP guest under Hyper-V). Typically >> the hypervisor will reset the MTRR feature in cpuid data, resulting >> in no MTRR memory type information being available for the kernel. >> >> This has turned out to result in problems: >> >> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't >> - Xen PV dom0 mapping memory as WB which should be UC- instead >> >> Solve those problems by supporting to set a fixed MTRR state, >> overwriting the empty state used today. In case such a state has been >> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state >> will only be used by mtrr_type_lookup(), as in all other cases >> mtrr_enabled() is being checked, which will return false. Accept the >> overwrite call only in case of MTRRs being disabled in cpuid. > > s/cpuid/CPUID/g > >> Signed-off-by: Juergen Gross <jgross@suse.com> >> --- >> V2: >> - new patch >> --- >> arch/x86/include/asm/mtrr.h | 2 ++ >> arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ >> arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ >> 3 files changed, 49 insertions(+) >> >> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h >> index f0eeaf6e5f5f..0b8f51d683dc 100644 >> --- a/arch/x86/include/asm/mtrr.h >> +++ b/arch/x86/include/asm/mtrr.h >> @@ -31,6 +31,8 @@ >> */ >> # ifdef CONFIG_MTRR >> void mtrr_bp_init(void); >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, >> + mtrr_type *fixed, mtrr_type def_type); >> extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform); >> extern void mtrr_save_fixed_ranges(void *); >> extern void mtrr_save_state(void); >> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c >> index ee09d359e08f..788bc16888a5 100644 >> --- a/arch/x86/kernel/cpu/mtrr/generic.c >> +++ b/arch/x86/kernel/cpu/mtrr/generic.c >> @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, >> return mtrr_state.def_type; >> } >> >> +/** >> + * mtrr_overwrite_state - set fixed MTRR state > > fixed only? You pass in variable too... > >> + * >> + * Used to set MTRR state via different means (e.g. with data obtained from >> + * a hypervisor). >> + */ >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, >> + mtrr_type *fixed, mtrr_type def_type) >> +{ >> + unsigned int i; >> + >> + if (boot_cpu_has(X86_FEATURE_MTRR)) > > check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. > >> + return; > > So this here needs to check: > > if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && > !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) || > cpu_feature_enabled(X86_FEATURE_XENPV))) { > WARN_ON_ONCE(1); > return; > } > > as we don't want this to be called somewhere or by something else. > > The SEV_SNP flag can be used from: > > https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com > > I'm assuming here HyperV SEV-SNP guests really do set that feature flag > (they better). We can expedite that patch ofc. Is that flag _really_ meant to indicate we are running as a SEV-SNP guest? Given that the referenced patch is part of the SEV-SNP host support series, I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests. And who is setting it for KVM SEV-SNP guests? Juergen
On Thu, Feb 16, 2023 at 10:32:28AM +0100, Juergen Gross wrote: > On 13.02.23 12:39, Borislav Petkov wrote: > >On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote: > >>When running virtualized, MTRR access can be reduced (e.g. in Xen PV > >>guests or when running as a SEV-SNP guest under Hyper-V). Typically > >>the hypervisor will reset the MTRR feature in cpuid data, resulting > >>in no MTRR memory type information being available for the kernel. > >> > >>This has turned out to result in problems: > >> > >>- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't > >>- Xen PV dom0 mapping memory as WB which should be UC- instead > >> > >>Solve those problems by supporting to set a fixed MTRR state, > >>overwriting the empty state used today. In case such a state has been > >>set, don't call get_mtrr_state() in mtrr_bp_init(). The set state > >>will only be used by mtrr_type_lookup(), as in all other cases > >>mtrr_enabled() is being checked, which will return false. Accept the > >>overwrite call only in case of MTRRs being disabled in cpuid. > > > >s/cpuid/CPUID/g > > > >>Signed-off-by: Juergen Gross <jgross@suse.com> > >>--- > >>V2: > >>- new patch > >>--- > >> arch/x86/include/asm/mtrr.h | 2 ++ > >> arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ > >> arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ > >> 3 files changed, 49 insertions(+) > >> > >>diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h > >>index f0eeaf6e5f5f..0b8f51d683dc 100644 > >>--- a/arch/x86/include/asm/mtrr.h > >>+++ b/arch/x86/include/asm/mtrr.h > >>@@ -31,6 +31,8 @@ > >> */ > >> # ifdef CONFIG_MTRR > >> void mtrr_bp_init(void); > >>+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > >>+ mtrr_type *fixed, mtrr_type def_type); > >> extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform); > >> extern void mtrr_save_fixed_ranges(void *); > >> extern void mtrr_save_state(void); > >>diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c > >>index ee09d359e08f..788bc16888a5 100644 > >>--- a/arch/x86/kernel/cpu/mtrr/generic.c > >>+++ b/arch/x86/kernel/cpu/mtrr/generic.c > >>@@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, > >> return mtrr_state.def_type; > >> } > >>+/** > >>+ * mtrr_overwrite_state - set fixed MTRR state > > > >fixed only? You pass in variable too... > > > >>+ * > >>+ * Used to set MTRR state via different means (e.g. with data obtained from > >>+ * a hypervisor). > >>+ */ > >>+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > >>+ mtrr_type *fixed, mtrr_type def_type) > >>+{ > >>+ unsigned int i; > >>+ > >>+ if (boot_cpu_has(X86_FEATURE_MTRR)) > > > >check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. > > > >>+ return; > > > >So this here needs to check: > > > > if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && > > !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) || > > cpu_feature_enabled(X86_FEATURE_XENPV))) { > > WARN_ON_ONCE(1); > > return; > > } > > > >as we don't want this to be called somewhere or by something else. > > > >The SEV_SNP flag can be used from: > > > >https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com > > > >I'm assuming here HyperV SEV-SNP guests really do set that feature flag > >(they better). We can expedite that patch ofc. > > Is that flag _really_ meant to indicate we are running as a SEV-SNP guest? > > Given that the referenced patch is part of the SEV-SNP host support series, > I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests. > And who is setting it for KVM SEV-SNP guests? > > > Juergen Initially both guest and host side have X86_FEATURE_SEV_SNP, but early_detect_mem_encrypt() clears it for the guest side. You would want cc_platform_has(CC_ATTR_GUEST_SEV_SNP), but: * there are two kinds of Hyper-V SEV-SNP VMs: "unenlightened" (vTOM+paravisor) and "enlightened" * the kernel currently supports the "unenlightened" kind which do not set CC_ATTR_GUEST_SEV_SNP (because it implies codepaths which do not apply to vTOM mode) * the patchset that adds support for "enlightened" Hyper-V SEV-SNP VMs sets CC_ATTR_GUEST_SEV_SNP [1] The closest you can get is: cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && hypervisor_is_type(X86_HYPER_MS_HYPERV) but that would soon cover TDX guests too so <shrug>... [1]: https://lore.kernel.org/lkml/1673559753-94403-8-git-send-email-mikelley@microsoft.com/
On Mon, Feb 13, 2023 at 12:39:40PM +0100, Borislav Petkov wrote: > On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote: > > When running virtualized, MTRR access can be reduced (e.g. in Xen PV > > guests or when running as a SEV-SNP guest under Hyper-V). Typically > > the hypervisor will reset the MTRR feature in cpuid data, resulting > > in no MTRR memory type information being available for the kernel. > > > > This has turned out to result in problems: > > > > - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't > > - Xen PV dom0 mapping memory as WB which should be UC- instead > > > > Solve those problems by supporting to set a fixed MTRR state, > > overwriting the empty state used today. In case such a state has been > > set, don't call get_mtrr_state() in mtrr_bp_init(). The set state > > will only be used by mtrr_type_lookup(), as in all other cases > > mtrr_enabled() is being checked, which will return false. Accept the > > overwrite call only in case of MTRRs being disabled in cpuid. > > s/cpuid/CPUID/g > > > Signed-off-by: Juergen Gross <jgross@suse.com> > > --- > > V2: > > - new patch > > --- > > arch/x86/include/asm/mtrr.h | 2 ++ > > arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++ > > arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++ > > 3 files changed, 49 insertions(+) > > > > diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h > > index f0eeaf6e5f5f..0b8f51d683dc 100644 > > --- a/arch/x86/include/asm/mtrr.h > > +++ b/arch/x86/include/asm/mtrr.h > > @@ -31,6 +31,8 @@ > > */ > > # ifdef CONFIG_MTRR > > void mtrr_bp_init(void); > > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > > + mtrr_type *fixed, mtrr_type def_type); > > extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform); > > extern void mtrr_save_fixed_ranges(void *); > > extern void mtrr_save_state(void); > > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c > > index ee09d359e08f..788bc16888a5 100644 > > --- a/arch/x86/kernel/cpu/mtrr/generic.c > > +++ b/arch/x86/kernel/cpu/mtrr/generic.c > > @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, > > return mtrr_state.def_type; > > } > > > > +/** > > + * mtrr_overwrite_state - set fixed MTRR state > > fixed only? You pass in variable too... > > > + * > > + * Used to set MTRR state via different means (e.g. with data obtained from > > + * a hypervisor). > > + */ > > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > > + mtrr_type *fixed, mtrr_type def_type) > > +{ > > + unsigned int i; > > + > > + if (boot_cpu_has(X86_FEATURE_MTRR)) > > check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. > Hi Boris, Where does this check come from? I can't find a source for it. Jeremi
On Thu, Feb 16, 2023 at 10:32:28AM +0100, Juergen Gross wrote: > Is that flag _really_ meant to indicate we are running as a SEV-SNP guest? Yes. > Given that the referenced patch is part of the SEV-SNP host support series, > I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests. It better be. If it is a modified guest - no matter how modified - it should set that flag. The vTOM thing is still being discussed. > And who is setting it for KVM SEV-SNP guests? That same patch does.
On Thu, Feb 16, 2023 at 03:07:28AM -0800, Jeremi Piotrowski wrote:
> Where does this check come from? I can't find a source for it.
That's the patch checker I've been writing while reviewing patches:
https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=vp
It is, ofc, WIP.
On 16.02.23 12:25, Borislav Petkov wrote: > On Thu, Feb 16, 2023 at 10:32:28AM +0100, Juergen Gross wrote: >> Is that flag _really_ meant to indicate we are running as a SEV-SNP guest? > > Yes. > >> Given that the referenced patch is part of the SEV-SNP host support series, >> I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests. > > It better be. If it is a modified guest - no matter how modified - it > should set that flag. The vTOM thing is still being discussed. > >> And who is setting it for KVM SEV-SNP guests? > > That same patch does. Hmm, I must be blind. I can't spot it. I'm seeing only the feature bit #define and a call of setup_clear_cpu_cap(X86_FEATURE_SEV_SNP) in this patch. Or is it done by hardware or the hypervisor? Juergen
On Thu, Feb 16, 2023 at 01:19:22PM +0100, Juergen Gross wrote: > Hmm, I must be blind. I can't spot it. > > I'm seeing only the feature bit #define and a call of > setup_clear_cpu_cap(X86_FEATURE_SEV_SNP) in this patch. > > Or is it done by hardware or the hypervisor? Correction - I meant CC_ATTR_GUEST_SEV_SNP not the CPUID feature flag. Sorry for the confusion folks.
From: Borislav Petkov <bp@alien8.de> Sent: Thursday, February 16, 2023 4:29 AM > > On Thu, Feb 16, 2023 at 01:19:22PM +0100, Juergen Gross wrote: > > Hmm, I must be blind. I can't spot it. > > > > I'm seeing only the feature bit #define and a call of > > setup_clear_cpu_cap(X86_FEATURE_SEV_SNP) in this patch. > > > > Or is it done by hardware or the hypervisor? > > Correction - I meant CC_ATTR_GUEST_SEV_SNP not the CPUID feature flag. > In current upstream code, Hyper-V vTOM VMs aren't participating in the CC_ATTR_* scheme at all, so CC_ATTR_GUEST_SEV_SNP won't be set. Getting Hyper-V vTOM VMs integrated into that scheme is a key part of my big patch set[1] that we're separately trying to resolve the last issues with. Michael [1] https://lore.kernel.org/linux-hyperv/1673559753-94403-1-git-send-email-mikelley@microsoft.com/
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index f0eeaf6e5f5f..0b8f51d683dc 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -31,6 +31,8 @@ */ # ifdef CONFIG_MTRR void mtrr_bp_init(void); +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, + mtrr_type *fixed, mtrr_type def_type); extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform); extern void mtrr_save_fixed_ranges(void *); extern void mtrr_save_state(void); diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c index ee09d359e08f..788bc16888a5 100644 --- a/arch/x86/kernel/cpu/mtrr/generic.c +++ b/arch/x86/kernel/cpu/mtrr/generic.c @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, return mtrr_state.def_type; } +/** + * mtrr_overwrite_state - set fixed MTRR state + * + * Used to set MTRR state via different means (e.g. with data obtained from + * a hypervisor). + */ +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, + mtrr_type *fixed, mtrr_type def_type) +{ + unsigned int i; + + if (boot_cpu_has(X86_FEATURE_MTRR)) + return; + + if (var) { + if (num_var > MTRR_MAX_VAR_RANGES) { + pr_warn("Trying to overwrite MTRR state with %u variable entries\n", + num_var); + num_var = MTRR_MAX_VAR_RANGES; + } + for (i = 0; i < num_var; i++) + mtrr_state.var_ranges[i] = var[i]; + num_var_ranges = num_var; + } + + if (fixed) { + for (i = 0; i < MTRR_NUM_FIXED_RANGES; i++) + mtrr_state.fixed_ranges[i] = fixed[i]; + mtrr_state.enabled |= MTRR_STATE_MTRR_FIXED_ENABLED; + mtrr_state.have_fixed = 1; + } + + mtrr_state.def_type = def_type; + mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED; + + mtrr_state_set = 1; +} + /** * mtrr_type_lookup - look up memory type in MTRR * diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c index 542ca5639dfd..b73fe243c7fd 100644 --- a/arch/x86/kernel/cpu/mtrr/mtrr.c +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c @@ -668,6 +668,15 @@ void __init mtrr_bp_init(void) const char *why = "(not available)"; unsigned int phys_addr; + if (mtrr_state.enabled) { + /* Software overwrite of MTRR state, only for generic case. */ + mtrr_calc_physbits(true); + init_table(); + pr_info("MTRRs set to read-only\n"); + + return; + } + phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR)); if (boot_cpu_has(X86_FEATURE_MTRR)) {