[v2,2/8] x86/mtrr: support setting MTRR state for software defined MTRRs

Message ID 20230209072220.6836-3-jgross@suse.com
State New
Headers
Series x86/mtrr: fix handling with PAT but without MTRR |

Commit Message

Juergen Gross Feb. 9, 2023, 7:22 a.m. UTC
  When running virtualized, MTRR access can be reduced (e.g. in Xen PV
guests or when running as a SEV-SNP guest under Hyper-V). Typically
the hypervisor will reset the MTRR feature in cpuid data, resulting
in no MTRR memory type information being available for the kernel.

This has turned out to result in problems:

- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
- Xen PV dom0 mapping memory as WB which should be UC- instead

Solve those problems by supporting to set a fixed MTRR state,
overwriting the empty state used today. In case such a state has been
set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
will only be used by mtrr_type_lookup(), as in all other cases
mtrr_enabled() is being checked, which will return false. Accept the
overwrite call only in case of MTRRs being disabled in cpuid.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
---
 arch/x86/include/asm/mtrr.h        |  2 ++
 arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
 3 files changed, 49 insertions(+)
  

Comments

Michael Kelley (LINUX) Feb. 13, 2023, 1:07 a.m. UTC | #1
From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
> 
> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> guests or when running as a SEV-SNP guest under Hyper-V). Typically
> the hypervisor will reset the MTRR feature in cpuid data, resulting
> in no MTRR memory type information being available for the kernel.
> 
> This has turned out to result in problems:
> 
> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> - Xen PV dom0 mapping memory as WB which should be UC- instead
> 
> Solve those problems by supporting to set a fixed MTRR state,
> overwriting the empty state used today. In case such a state has been
> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> will only be used by mtrr_type_lookup(), as in all other cases
> mtrr_enabled() is being checked, which will return false. Accept the
> overwrite call only in case of MTRRs being disabled in cpuid.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - new patch
> ---
>  arch/x86/include/asm/mtrr.h        |  2 ++
>  arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
>  3 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index f0eeaf6e5f5f..0b8f51d683dc 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -31,6 +31,8 @@
>   */
>  # ifdef CONFIG_MTRR
>  void mtrr_bp_init(void);
> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type *fixed, mtrr_type def_type);

Could you add a stub for the !CONFIG_MTRR case?  Then the
#ifdef CONFIG_MTRR could be removed in Patch 3 of this series.

Michael
  
Juergen Gross Feb. 13, 2023, 6:27 a.m. UTC | #2
On 13.02.23 02:07, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>>
>> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
>> guests or when running as a SEV-SNP guest under Hyper-V). Typically
>> the hypervisor will reset the MTRR feature in cpuid data, resulting
>> in no MTRR memory type information being available for the kernel.
>>
>> This has turned out to result in problems:
>>
>> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
>> - Xen PV dom0 mapping memory as WB which should be UC- instead
>>
>> Solve those problems by supporting to set a fixed MTRR state,
>> overwriting the empty state used today. In case such a state has been
>> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
>> will only be used by mtrr_type_lookup(), as in all other cases
>> mtrr_enabled() is being checked, which will return false. Accept the
>> overwrite call only in case of MTRRs being disabled in cpuid.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - new patch
>> ---
>>   arch/x86/include/asm/mtrr.h        |  2 ++
>>   arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
>>   arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
>>   3 files changed, 49 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
>> index f0eeaf6e5f5f..0b8f51d683dc 100644
>> --- a/arch/x86/include/asm/mtrr.h
>> +++ b/arch/x86/include/asm/mtrr.h
>> @@ -31,6 +31,8 @@
>>    */
>>   # ifdef CONFIG_MTRR
>>   void mtrr_bp_init(void);
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type *fixed, mtrr_type def_type);
> 
> Could you add a stub for the !CONFIG_MTRR case?  Then the
> #ifdef CONFIG_MTRR could be removed in Patch 3 of this series.

I was on the edge whether to add a stub. The Xen use case strongly
suggests that the code wants to be inside an #ifdef, while the Hyper-V
case is so simple, that it would benefit from the stub. As there was
another #ifdef just above the added code in mshyperv.c I believed it
would be fine without a stub. As you seem to like it better with the
stub, I can add it.


Juergen
  
Michael Kelley (LINUX) Feb. 13, 2023, 6:43 a.m. UTC | #3
From: Juergen Gross <jgross@suse.com> Sent: Sunday, February 12, 2023 10:27 PM
> 
> On 13.02.23 02:07, Michael Kelley (LINUX) wrote:
> > From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
> >>
> >> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> >> guests or when running as a SEV-SNP guest under Hyper-V). Typically
> >> the hypervisor will reset the MTRR feature in cpuid data, resulting
> >> in no MTRR memory type information being available for the kernel.
> >>
> >> This has turned out to result in problems:
> >>
> >> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> >> - Xen PV dom0 mapping memory as WB which should be UC- instead
> >>
> >> Solve those problems by supporting to set a fixed MTRR state,
> >> overwriting the empty state used today. In case such a state has been
> >> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> >> will only be used by mtrr_type_lookup(), as in all other cases
> >> mtrr_enabled() is being checked, which will return false. Accept the
> >> overwrite call only in case of MTRRs being disabled in cpuid.
> >>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> >> ---
> >> V2:
> >> - new patch
> >> ---
> >>   arch/x86/include/asm/mtrr.h        |  2 ++
> >>   arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
> >>   arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
> >>   3 files changed, 49 insertions(+)
> >>
> >> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> >> index f0eeaf6e5f5f..0b8f51d683dc 100644
> >> --- a/arch/x86/include/asm/mtrr.h
> >> +++ b/arch/x86/include/asm/mtrr.h
> >> @@ -31,6 +31,8 @@
> >>    */
> >>   # ifdef CONFIG_MTRR
> >>   void mtrr_bp_init(void);
> >> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> >> +			  mtrr_type *fixed, mtrr_type def_type);
> >
> > Could you add a stub for the !CONFIG_MTRR case?  Then the
> > #ifdef CONFIG_MTRR could be removed in Patch 3 of this series.
> 
> I was on the edge whether to add a stub. The Xen use case strongly
> suggests that the code wants to be inside an #ifdef, while the Hyper-V
> case is so simple, that it would benefit from the stub. As there was
> another #ifdef just above the added code in mshyperv.c I believed it
> would be fine without a stub. As you seem to like it better with the
> stub, I can add it.
> 

Thanks.  And that other #ifdef is going away soon ...

Michael
  
Borislav Petkov Feb. 13, 2023, 11:39 a.m. UTC | #4
On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote:
> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> guests or when running as a SEV-SNP guest under Hyper-V). Typically
> the hypervisor will reset the MTRR feature in cpuid data, resulting
> in no MTRR memory type information being available for the kernel.
> 
> This has turned out to result in problems:
> 
> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> - Xen PV dom0 mapping memory as WB which should be UC- instead
> 
> Solve those problems by supporting to set a fixed MTRR state,
> overwriting the empty state used today. In case such a state has been
> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> will only be used by mtrr_type_lookup(), as in all other cases
> mtrr_enabled() is being checked, which will return false. Accept the
> overwrite call only in case of MTRRs being disabled in cpuid.

s/cpuid/CPUID/g

> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - new patch
> ---
>  arch/x86/include/asm/mtrr.h        |  2 ++
>  arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
>  3 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index f0eeaf6e5f5f..0b8f51d683dc 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -31,6 +31,8 @@
>   */
>  # ifdef CONFIG_MTRR
>  void mtrr_bp_init(void);
> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type *fixed, mtrr_type def_type);
>  extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
>  extern void mtrr_save_fixed_ranges(void *);
>  extern void mtrr_save_state(void);
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index ee09d359e08f..788bc16888a5 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
>  	return mtrr_state.def_type;
>  }
>  
> +/**
> + * mtrr_overwrite_state - set fixed MTRR state

fixed only? You pass in variable too...

> + *
> + * Used to set MTRR state via different means (e.g. with data obtained from
> + * a hypervisor).
> + */
> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type *fixed, mtrr_type def_type)
> +{
> +	unsigned int i;
> +
> +	if (boot_cpu_has(X86_FEATURE_MTRR))

check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead.

> +		return;

So this here needs to check:

	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) &&
	    !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) ||
	      cpu_feature_enabled(X86_FEATURE_XENPV))) {
		WARN_ON_ONCE(1);
		return;
	}

as we don't want this to be called somewhere or by something else.

The SEV_SNP flag can be used from:

https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com

I'm assuming here HyperV SEV-SNP guests really do set that feature flag
(they better). We can expedite that patch ofc.

And for dom0 I *think* we use X86_FEATURE_XENPV but I leave that to you.

> +
> +	if (var) {
> +		if (num_var > MTRR_MAX_VAR_RANGES) {
> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
> +				num_var);

What's that check for? Sanity of callers?

> +			num_var = MTRR_MAX_VAR_RANGES;
> +		}
> +		for (i = 0; i < num_var; i++)
> +			mtrr_state.var_ranges[i] = var[i];
> +		num_var_ranges = num_var;
> +	}
> +
> +	if (fixed) {
> +		for (i = 0; i < MTRR_NUM_FIXED_RANGES; i++)

You're not doing this sanity check here, expecting that callers would
know what they're doing...

> +			mtrr_state.fixed_ranges[i] = fixed[i];
> +		mtrr_state.enabled |= MTRR_STATE_MTRR_FIXED_ENABLED;
> +		mtrr_state.have_fixed = 1;
> +	}
> +
> +	mtrr_state.def_type = def_type;
> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
> +
> +	mtrr_state_set = 1;
> +}

I can't say that I'm crazy about the call sites:

	mtrr_overwrite_state(NULL, 0, NULL, MTRR_TYPE_WRBACK);

This looks like it wants a

	mtrr_override_def_type(MTRR_TYPE_WRBACK);

instead of passing in all those nulls as params.

This:

	mtrr_overwrite_state(var, reg, NULL, MTRR_TYPE_UNCACHABLE);

I guess is a bit better.

Dunno, if it is only those two callers we can say, meh, whatever, this
interface is not pretty but does the job at least. But if more users
start popping up then I guess we can do

	mtrr_override_fixed()
	mtrr_override_variable()
	mtrr_override_def_type()

...


>  /**
>   * mtrr_type_lookup - look up memory type in MTRR
>   *
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 542ca5639dfd..b73fe243c7fd 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -668,6 +668,15 @@ void __init mtrr_bp_init(void)
>  	const char *why = "(not available)";
>  	unsigned int phys_addr;
>  
> +	if (mtrr_state.enabled) {

Not crazy about this either: this relies on the fragile boot ordering
where init_hypervisor_platform() runs before this so it has a chance
that mtrr_state.enabled will be already set.

Yeah, yeah, cache_bp_init() and all the MTRR BSP setup stuff happens
after it but there should at least be a comment over
init_hypervisor_platform()'s call site in setup_arch() stating that
cache_bp_init() needs to happen *after* it because <reason>.

I think we should also check

	x86_hyper_type

here and not do anything if not set. As this is all HV-related muck.

Xen I guess is a bit better because that call there happens even earlier
but we need the comments to say that the ordering matters because future
reorganization could cause it to blow up and people would search
themselves crazy why in the hell it breaks...

Can Xen use x86_hyper_type() too?

Thx.
  
Juergen Gross Feb. 13, 2023, 2:07 p.m. UTC | #5
On 13.02.23 12:39, Borislav Petkov wrote:
> On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote:
>> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
>> guests or when running as a SEV-SNP guest under Hyper-V). Typically
>> the hypervisor will reset the MTRR feature in cpuid data, resulting
>> in no MTRR memory type information being available for the kernel.
>>
>> This has turned out to result in problems:
>>
>> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
>> - Xen PV dom0 mapping memory as WB which should be UC- instead
>>
>> Solve those problems by supporting to set a fixed MTRR state,
>> overwriting the empty state used today. In case such a state has been
>> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
>> will only be used by mtrr_type_lookup(), as in all other cases
>> mtrr_enabled() is being checked, which will return false. Accept the
>> overwrite call only in case of MTRRs being disabled in cpuid.
> 
> s/cpuid/CPUID/g

Okay.

> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - new patch
>> ---
>>   arch/x86/include/asm/mtrr.h        |  2 ++
>>   arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
>>   arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
>>   3 files changed, 49 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
>> index f0eeaf6e5f5f..0b8f51d683dc 100644
>> --- a/arch/x86/include/asm/mtrr.h
>> +++ b/arch/x86/include/asm/mtrr.h
>> @@ -31,6 +31,8 @@
>>    */
>>   # ifdef CONFIG_MTRR
>>   void mtrr_bp_init(void);
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type *fixed, mtrr_type def_type);
>>   extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
>>   extern void mtrr_save_fixed_ranges(void *);
>>   extern void mtrr_save_state(void);
>> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
>> index ee09d359e08f..788bc16888a5 100644
>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>> @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
>>   	return mtrr_state.def_type;
>>   }
>>   
>> +/**
>> + * mtrr_overwrite_state - set fixed MTRR state
> 
> fixed only? You pass in variable too...

Fixed in the sense of static.

> 
>> + *
>> + * Used to set MTRR state via different means (e.g. with data obtained from
>> + * a hypervisor).
>> + */
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type *fixed, mtrr_type def_type)
>> +{
>> +	unsigned int i;
>> +
>> +	if (boot_cpu_has(X86_FEATURE_MTRR))
> 
> check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead.

Okay.

> 
>> +		return;
> 
> So this here needs to check:
> 
> 	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) &&
> 	    !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) ||
> 	      cpu_feature_enabled(X86_FEATURE_XENPV))) {
> 		WARN_ON_ONCE(1);
> 		return;
> 	}
> 
> as we don't want this to be called somewhere or by something else.

Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?

I'm not sure we won't need that for TDX guests, too.

> 
> The SEV_SNP flag can be used from:
> 
> https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com
> 
> I'm assuming here HyperV SEV-SNP guests really do set that feature flag
> (they better). We can expedite that patch ofc.
> 
> And for dom0 I *think* we use X86_FEATURE_XENPV but I leave that to you.

Yes, it is only relevant for PV dom0.

> 
>> +
>> +	if (var) {
>> +		if (num_var > MTRR_MAX_VAR_RANGES) {
>> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
>> +				num_var);
> 
> What's that check for? Sanity of callers?

Yes.

> 
>> +			num_var = MTRR_MAX_VAR_RANGES;
>> +		}
>> +		for (i = 0; i < num_var; i++)
>> +			mtrr_state.var_ranges[i] = var[i];
>> +		num_var_ranges = num_var;
>> +	}
>> +
>> +	if (fixed) {
>> +		for (i = 0; i < MTRR_NUM_FIXED_RANGES; i++)
> 
> You're not doing this sanity check here, expecting that callers would
> know what they're doing...

The number of fixed MTRRs is not dynamic AFAIK.

> 
>> +			mtrr_state.fixed_ranges[i] = fixed[i];
>> +		mtrr_state.enabled |= MTRR_STATE_MTRR_FIXED_ENABLED;
>> +		mtrr_state.have_fixed = 1;
>> +	}
>> +
>> +	mtrr_state.def_type = def_type;
>> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
>> +
>> +	mtrr_state_set = 1;
>> +}
> 
> I can't say that I'm crazy about the call sites:
> 
> 	mtrr_overwrite_state(NULL, 0, NULL, MTRR_TYPE_WRBACK);
> 
> This looks like it wants a
> 
> 	mtrr_override_def_type(MTRR_TYPE_WRBACK);
> 
> instead of passing in all those nulls as params.
> 
> This:
> 
> 	mtrr_overwrite_state(var, reg, NULL, MTRR_TYPE_UNCACHABLE);
> 
> I guess is a bit better.
> 
> Dunno, if it is only those two callers we can say, meh, whatever, this
> interface is not pretty but does the job at least. But if more users
> start popping up then I guess we can do
> 
> 	mtrr_override_fixed()
> 	mtrr_override_variable()
> 	mtrr_override_def_type()

A single interface makes it easier to avoid multiple calls.

In the end I'm fine with either way.

> 
> ...
> 
> 
>>   /**
>>    * mtrr_type_lookup - look up memory type in MTRR
>>    *
>> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
>> index 542ca5639dfd..b73fe243c7fd 100644
>> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
>> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
>> @@ -668,6 +668,15 @@ void __init mtrr_bp_init(void)
>>   	const char *why = "(not available)";
>>   	unsigned int phys_addr;
>>   
>> +	if (mtrr_state.enabled) {
> 
> Not crazy about this either: this relies on the fragile boot ordering
> where init_hypervisor_platform() runs before this so it has a chance
> that mtrr_state.enabled will be already set.
> 
> Yeah, yeah, cache_bp_init() and all the MTRR BSP setup stuff happens
> after it but there should at least be a comment over
> init_hypervisor_platform()'s call site in setup_arch() stating that
> cache_bp_init() needs to happen *after* it because <reason>.

I'll add that comment.

> 
> I think we should also check
> 
> 	x86_hyper_type
> 
> here and not do anything if not set. As this is all HV-related muck.
> 
> Xen I guess is a bit better because that call there happens even earlier
> but we need the comments to say that the ordering matters because future
> reorganization could cause it to blow up and people would search
> themselves crazy why in the hell it breaks...
> 
> Can Xen use x86_hyper_type() too?

It does.


Juergen
  
Borislav Petkov Feb. 13, 2023, 3:03 p.m. UTC | #6
On Mon, Feb 13, 2023 at 03:07:07PM +0100, Juergen Gross wrote:
> Fixed in the sense of static.

Well, you can't use "fixed" to say "static" when former means something
very specific already in MTRR land.

> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
> 
> I'm not sure we won't need that for TDX guests, too.

See, that's the problem. I wanna have it simple too. Lemme check with
dhansen.

> Yes, it is only relevant for PV dom0.

Right, I was asking whether "PV dom0" == X86_FEATURE_XENPV?

:)

> The number of fixed MTRRs is not dynamic AFAIK.

But nothing guarantees that the caller would pass an array "mtrr_type
*fixed" of size MTRR_NUM_FIXED_RANGES, right?

> A single interface makes it easier to avoid multiple calls.
> 
> In the end I'm fine with either way.

Yeah, I know. Question is, how much of this functionality will be
needed/used so that we can go all out on the interface design or we can
do a single one and forget about it...

> > Can Xen use x86_hyper_type() too?
> 
> It does.

Then pls add a x86_hyper_type check too to make sure a potential move of
this call is caught in the future.

Thx.
  
Borislav Petkov Feb. 13, 2023, 3:11 p.m. UTC | #7
On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote:
> > Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
> >
> > I'm not sure we won't need that for TDX guests, too.
>
> See, that's the problem. I wanna have it simple too. Lemme check with
> dhansen.

He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to
1 in TDX guests."

So we will have to do the more finer-grained check I guess.
  
Juergen Gross Feb. 13, 2023, 3:18 p.m. UTC | #8
On 13.02.23 16:11, Borislav Petkov wrote:
> On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote:
>>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
>>>
>>> I'm not sure we won't need that for TDX guests, too.
>>
>> See, that's the problem. I wanna have it simple too. Lemme check with
>> dhansen.
> 
> He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to
> 1 in TDX guests."
> 
> So we will have to do the more finer-grained check I guess.

Isn't the check for !X86_FEATURE_MTRR && X86_FEATURE_HYPERVISOR enough
then?

Yes, you still could construct cases where it would go wrong, but I don't
think we should over-engineer it.


Juergen
  
Dave Hansen Feb. 13, 2023, 3:27 p.m. UTC | #9
On 2/13/23 07:11, Borislav Petkov wrote:
> On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote:
>>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
>>>
>>> I'm not sure we won't need that for TDX guests, too.
>> See, that's the problem. I wanna have it simple too. Lemme check with
>> dhansen.
> He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to
> 1 in TDX guests."
> 
> So we will have to do the more finer-grained check I guess.

Yes, TDX guests see MTRRs as being supported.  But, the TDX module also
appears to inject a #VE for all RDMSR or WRMSR's to the MTRRs.  That
makes them effectively useless.

I actually don't know what the heck TDX guests are supposed to do if
they feel like mucking with the MSRs.  The architecture (CPUID) is
essentially telling them: "Sure, go ahead MTRRs are fiiiiiiine".  But
the TDX module is sitting there throwing exceptions (#VE) if the guest
tries to touch MTRRs.

It sounds like there are some guest<->host ABIs on Xen to help the
guests do this.  But I don't see anything in the TDX "GHCI" about it.
  
Juergen Gross Feb. 13, 2023, 3:36 p.m. UTC | #10
On 13.02.23 16:03, Borislav Petkov wrote:
> On Mon, Feb 13, 2023 at 03:07:07PM +0100, Juergen Gross wrote:
>> Fixed in the sense of static.
> 
> Well, you can't use "fixed" to say "static" when former means something
> very specific already in MTRR land.
> 
>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
>>
>> I'm not sure we won't need that for TDX guests, too.
> 
> See, that's the problem. I wanna have it simple too. Lemme check with
> dhansen.
> 
>> Yes, it is only relevant for PV dom0.
> 
> Right, I was asking whether "PV dom0" == X86_FEATURE_XENPV?

No, you can have PV guests not being dom0.

> 
> :)
> 
>> The number of fixed MTRRs is not dynamic AFAIK.
> 
> But nothing guarantees that the caller would pass an array "mtrr_type
> *fixed" of size MTRR_NUM_FIXED_RANGES, right?

Right.

In the end I wouldn't mind dropping the fixed MTRRs from the interface, as
they are currently not needed at all.

> 
>> A single interface makes it easier to avoid multiple calls.
>>
>> In the end I'm fine with either way.
> 
> Yeah, I know. Question is, how much of this functionality will be
> needed/used so that we can go all out on the interface design or we can
> do a single one and forget about it...

I'd say we go with what is needed right now. And having a single interface
makes all the sanity checking you are asking for easier.

> 
>>> Can Xen use x86_hyper_type() too?
>>
>> It does.
> 
> Then pls add a x86_hyper_type check too to make sure a potential move of
> this call is caught in the future.

What are you especially asking for?

With my current patches Xen PV dom0 will call mtrr_overwrite_state() before
x86_hyper_type is set, while a Hyper-V SEV-SNP guest will make the call after
it has been set. Both calls happen before cache_bp_init().

So I could move the mtrr_overwrite_state() call for Xen PV dom0 into its
init_platform() callback and check in mtrr_overwrite_state() x86_hyper_type
to be set, or I could reject a call of mtrr_overwrite_state() after the call
of cache_bp_init() has happened, or I could do both.


Juergen
  
Juergen Gross Feb. 13, 2023, 3:38 p.m. UTC | #11
On 13.02.23 16:27, Dave Hansen wrote:
> On 2/13/23 07:11, Borislav Petkov wrote:
>> On Mon, Feb 13, 2023 at 04:03:07PM +0100, Borislav Petkov wrote:
>>>> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
>>>>
>>>> I'm not sure we won't need that for TDX guests, too.
>>> See, that's the problem. I wanna have it simple too. Lemme check with
>>> dhansen.
>> He says MTRRs are enabled in TDX guests: "X86_FEATURE_MTRR is fixed to
>> 1 in TDX guests."
>>
>> So we will have to do the more finer-grained check I guess.
> 
> Yes, TDX guests see MTRRs as being supported.  But, the TDX module also
> appears to inject a #VE for all RDMSR or WRMSR's to the MTRRs.  That
> makes them effectively useless.
> 
> I actually don't know what the heck TDX guests are supposed to do if
> they feel like mucking with the MSRs.  The architecture (CPUID) is
> essentially telling them: "Sure, go ahead MTRRs are fiiiiiiine".  But
> the TDX module is sitting there throwing exceptions (#VE) if the guest
> tries to touch MTRRs.
> 
> It sounds like there are some guest<->host ABIs on Xen to help the
> guests do this.  But I don't see anything in the TDX "GHCI" about it.

This is in line of the PAT init sequence of TDX guests. PAT is said to
be supported, but a TDX guest can't use the sequence as written in the
SDM for setting the PAT MSR (disable caches, etc.).


Juergen
  
Borislav Petkov Feb. 13, 2023, 3:40 p.m. UTC | #12
On Mon, Feb 13, 2023 at 04:18:48PM +0100, Juergen Gross wrote:
> Yes, you still could construct cases where it would go wrong, but I don't
> think we should over-engineer it.

Actually, we should allow only those for which we know they get special
treatment for MTRRs settings and warn for all the rest.

And judging by Dave's reply, I think TDX should be in that category too
since it throws #VEs...
  
Juergen Gross Feb. 13, 2023, 3:44 p.m. UTC | #13
On 13.02.23 16:40, Borislav Petkov wrote:
> On Mon, Feb 13, 2023 at 04:18:48PM +0100, Juergen Gross wrote:
>> Yes, you still could construct cases where it would go wrong, but I don't
>> think we should over-engineer it.
> 
> Actually, we should allow only those for which we know they get special
> treatment for MTRRs settings and warn for all the rest.
> 
> And judging by Dave's reply, I think TDX should be in that category too
> since it throws #VEs...
> 

Okay, and it has MTRRs enabled (as Hyper-V SEV-SNP guests), so I shouldn't
test that, I guess (or we should disable the feature before calling the
overwrite function).


Juergen
  
Borislav Petkov Feb. 13, 2023, 6:43 p.m. UTC | #14
On Mon, Feb 13, 2023 at 04:36:12PM +0100, Juergen Gross wrote:
> In the end I wouldn't mind dropping the fixed MTRRs from the interface, as
> they are currently not needed at all.

Yes, the less the better.

> I'd say we go with what is needed right now. And having a single interface
> makes all the sanity checking you are asking for easier.

I guess I need to remember to finish designing this if more users
appear...

> What are you especially asking for?
> 
> With my current patches Xen PV dom0 will call mtrr_overwrite_state() before
> x86_hyper_type is set, while a Hyper-V SEV-SNP guest will make the call after
> it has been set. Both calls happen before cache_bp_init().
> 
> So I could move the mtrr_overwrite_state() call for Xen PV dom0 into its
> init_platform() callback and check in mtrr_overwrite_state() x86_hyper_type
> to be set,

I believe that is good enough, see below.

> or I could reject a call of mtrr_overwrite_state() after the call of
> cache_bp_init() has happened, or I could do both.

I think one thing is enough as we'll be loud enough.

---
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index b73fe243c7fd..2dbe2c10e959 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -49,6 +49,7 @@
 #include <asm/cacheinfo.h>
 #include <asm/cpufeature.h>
 #include <asm/e820/api.h>
+#include <asm/hypervisor.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
 #include <asm/memtype.h>
@@ -668,7 +669,12 @@ void __init mtrr_bp_init(void)
 	const char *why = "(not available)";
 	unsigned int phys_addr;
 
+#ifdef CONFIG_HYPERVISOR_GUEST
 	if (mtrr_state.enabled) {
+
+		/* This should not happen without a hypervisor present. */
+		WARN_ON_ONCE(!x86_hyper_type);
+
 		/* Software overwrite of MTRR state, only for generic case. */
 		mtrr_calc_physbits(true);
 		init_table();
@@ -676,6 +682,7 @@ void __init mtrr_bp_init(void)
 
 		return;
 	}
+#endif
 
 	phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR));
  
Borislav Petkov Feb. 13, 2023, 6:53 p.m. UTC | #15
On Mon, Feb 13, 2023 at 04:44:09PM +0100, Juergen Gross wrote:
> Okay, and it has MTRRs enabled (as Hyper-V SEV-SNP guests), so I shouldn't
> test that, I guess (or we should disable the feature before calling the
> overwrite function).

I think we should handle TDX the same way - as if the MTRRs are
read-only there. So you can check X86_FEATURE_TDX_GUEST in addition.

Thx.
  
Kirill A. Shutemov Feb. 14, 2023, 12:45 a.m. UTC | #16
On Mon, Feb 13, 2023 at 03:07:07PM +0100, Juergen Gross wrote:
> > So this here needs to check:
> > 
> > 	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) &&
> > 	    !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) ||
> > 	      cpu_feature_enabled(X86_FEATURE_XENPV))) {
> > 		WARN_ON_ONCE(1);
> > 		return;
> > 	}
> > 
> > as we don't want this to be called somewhere or by something else.
> 
> Wouldn't !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) be enough?
> 
> I'm not sure we won't need that for TDX guests, too.

TDX guests are covered by X86_FEATURE_HYPERVISOR.
  
Juergen Gross Feb. 14, 2023, 7:01 a.m. UTC | #17
On 13.02.23 19:43, Borislav Petkov wrote:
> On Mon, Feb 13, 2023 at 04:36:12PM +0100, Juergen Gross wrote:
>> In the end I wouldn't mind dropping the fixed MTRRs from the interface, as
>> they are currently not needed at all.
> 
> Yes, the less the better.
> 
>> I'd say we go with what is needed right now. And having a single interface
>> makes all the sanity checking you are asking for easier.
> 
> I guess I need to remember to finish designing this if more users
> appear...
> 
>> What are you especially asking for?
>>
>> With my current patches Xen PV dom0 will call mtrr_overwrite_state() before
>> x86_hyper_type is set, while a Hyper-V SEV-SNP guest will make the call after
>> it has been set. Both calls happen before cache_bp_init().
>>
>> So I could move the mtrr_overwrite_state() call for Xen PV dom0 into its
>> init_platform() callback and check in mtrr_overwrite_state() x86_hyper_type
>> to be set,
> 
> I believe that is good enough, see below.
> 
>> or I could reject a call of mtrr_overwrite_state() after the call of
>> cache_bp_init() has happened, or I could do both.
> 
> I think one thing is enough as we'll be loud enough.
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index b73fe243c7fd..2dbe2c10e959 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -49,6 +49,7 @@
>   #include <asm/cacheinfo.h>
>   #include <asm/cpufeature.h>
>   #include <asm/e820/api.h>
> +#include <asm/hypervisor.h>
>   #include <asm/mtrr.h>
>   #include <asm/msr.h>
>   #include <asm/memtype.h>
> @@ -668,7 +669,12 @@ void __init mtrr_bp_init(void)
>   	const char *why = "(not available)";
>   	unsigned int phys_addr;
>   
> +#ifdef CONFIG_HYPERVISOR_GUEST
>   	if (mtrr_state.enabled) {
> +
> +		/* This should not happen without a hypervisor present. */
> +		WARN_ON_ONCE(!x86_hyper_type);
> +
>   		/* Software overwrite of MTRR state, only for generic case. */
>   		mtrr_calc_physbits(true);
>   		init_table();
> @@ -676,6 +682,7 @@ void __init mtrr_bp_init(void)
>   
>   		return;
>   	}
> +#endif

I will change this a little bit in order to avoid the #ifdef by using
"WARN_ON(hypervisor_is_type() == X86_HYPER_NATIVE);"


Juergen
  
Juergen Gross Feb. 14, 2023, 7:04 a.m. UTC | #18
On 13.02.23 19:53, Borislav Petkov wrote:
> On Mon, Feb 13, 2023 at 04:44:09PM +0100, Juergen Gross wrote:
>> Okay, and it has MTRRs enabled (as Hyper-V SEV-SNP guests), so I shouldn't
>> test that, I guess (or we should disable the feature before calling the
>> overwrite function).
> 
> I think we should handle TDX the same way - as if the MTRRs are
> read-only there. So you can check X86_FEATURE_TDX_GUEST in addition.

Okay, if you really want to dictate the allowed use cases (this seems to be
a layering violation), but you are the maintainer of that code.


Juergen
  
Borislav Petkov Feb. 14, 2023, 8:58 a.m. UTC | #19
On Tue, Feb 14, 2023 at 08:04:47AM +0100, Juergen Gross wrote:
> Okay, if you really want to dictate the allowed use cases (this seems to be

Read upthread - TDX guests cause #VEs for MTRR accesses. #VEs which are
unneeded and should be avoided if possible.

> a layering violation), but you are the maintainer of that code.

And why are you so against catching misuses of this, which should
absolutely *not* be needed by anything else?
  
Juergen Gross Feb. 14, 2023, 9:02 a.m. UTC | #20
On 14.02.23 09:58, Borislav Petkov wrote:
> On Tue, Feb 14, 2023 at 08:04:47AM +0100, Juergen Gross wrote:
>> Okay, if you really want to dictate the allowed use cases (this seems to be
> 
> Read upthread - TDX guests cause #VEs for MTRR accesses. #VEs which are
> unneeded and should be avoided if possible.

Of course, I don't question the need for TDX guests to use the overwrite.

> 
>> a layering violation), but you are the maintainer of that code.
> 
> And why are you so against catching misuses of this, which should
> absolutely *not* be needed by anything else

I just don't like the idea of trying to catch all possible misuses in
lower levels, at the same time introducing the need to modify those
tests in case a new valid use case is popping up.

But as said, you are the maintainer, so its your final decision.


Juergen
  
Borislav Petkov Feb. 14, 2023, 9:10 a.m. UTC | #21
On Tue, Feb 14, 2023 at 10:02:51AM +0100, Juergen Gross wrote:
> I just don't like the idea of trying to catch all possible misuses in
> lower levels, at the same time introducing the need to modify those
> tests in case a new valid use case is popping up.

So what would you do: generally allow this so that potentially other
guest configurations misuse it?

And when we decide to change it, all those users will come running and
complaining that we broke it?

And then we're stuck with a nasty workaround in the tree because we have
to support them too?

See, all we do here is because of such misguided (or maybe didn't know
better) decisions which have happened a long time ago.
  
Juergen Gross Feb. 14, 2023, 9:17 a.m. UTC | #22
On 14.02.23 10:10, Borislav Petkov wrote:
> On Tue, Feb 14, 2023 at 10:02:51AM +0100, Juergen Gross wrote:
>> I just don't like the idea of trying to catch all possible misuses in
>> lower levels, at the same time introducing the need to modify those
>> tests in case a new valid use case is popping up.
> 
> So what would you do: generally allow this so that potentially other
> guest configurations misuse it?

I guess this largely depends on the functionality. I don't see why anyone
would try to use MTRR overwrite functionality without really needing it.

But maybe I'm wrong here and I'm under-estimating the "creativity" of
kernel hackers.

> And when we decide to change it, all those users will come running and
> complaining that we broke it?
> 
> And then we're stuck with a nasty workaround in the tree because we have
> to support them too?
> 
> See, all we do here is because of such misguided (or maybe didn't know
> better) decisions which have happened a long time ago.

I can see your point.

Maybe I haven't seen enough crazy hacks yet. :-)

No need to further discuss this topic from my side, as I have voiced my
opinion and you did so, too. I will add the tests you are asking for.


Juergen
  
Borislav Petkov Feb. 14, 2023, 9:32 a.m. UTC | #23
On Tue, Feb 14, 2023 at 10:17:12AM +0100, Juergen Gross wrote:
> I guess this largely depends on the functionality. I don't see why anyone
> would try to use MTRR overwrite functionality without really needing it.
>
> But maybe I'm wrong here and I'm under-estimating the "creativity" of
> kernel hackers.

This is exactly it - if it is there, it will get used eventually.

Think of it this way: this is a special, well, kinda hack, if you will,
which *nothing* else would need. We can always relax the condition for
using it if something else appears with a valid use case.

What we can't do nearly as easily is the reverse: remove it or tighten
the check later.

So the general policy is: workarounds like this need to be as
specialized as possible.

> Maybe I haven't seen enough crazy hacks yet. :-)

You're kidding, right? You hack on Xen for a long time... :-P

> No need to further discuss this topic from my side, as I have voiced my
> opinion and you did so, too. I will add the tests you are asking for.

Thanks!
  
Juergen Gross Feb. 16, 2023, 9:32 a.m. UTC | #24
On 13.02.23 12:39, Borislav Petkov wrote:
> On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote:
>> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
>> guests or when running as a SEV-SNP guest under Hyper-V). Typically
>> the hypervisor will reset the MTRR feature in cpuid data, resulting
>> in no MTRR memory type information being available for the kernel.
>>
>> This has turned out to result in problems:
>>
>> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
>> - Xen PV dom0 mapping memory as WB which should be UC- instead
>>
>> Solve those problems by supporting to set a fixed MTRR state,
>> overwriting the empty state used today. In case such a state has been
>> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
>> will only be used by mtrr_type_lookup(), as in all other cases
>> mtrr_enabled() is being checked, which will return false. Accept the
>> overwrite call only in case of MTRRs being disabled in cpuid.
> 
> s/cpuid/CPUID/g
> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - new patch
>> ---
>>   arch/x86/include/asm/mtrr.h        |  2 ++
>>   arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
>>   arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
>>   3 files changed, 49 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
>> index f0eeaf6e5f5f..0b8f51d683dc 100644
>> --- a/arch/x86/include/asm/mtrr.h
>> +++ b/arch/x86/include/asm/mtrr.h
>> @@ -31,6 +31,8 @@
>>    */
>>   # ifdef CONFIG_MTRR
>>   void mtrr_bp_init(void);
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type *fixed, mtrr_type def_type);
>>   extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
>>   extern void mtrr_save_fixed_ranges(void *);
>>   extern void mtrr_save_state(void);
>> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
>> index ee09d359e08f..788bc16888a5 100644
>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>> @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
>>   	return mtrr_state.def_type;
>>   }
>>   
>> +/**
>> + * mtrr_overwrite_state - set fixed MTRR state
> 
> fixed only? You pass in variable too...
> 
>> + *
>> + * Used to set MTRR state via different means (e.g. with data obtained from
>> + * a hypervisor).
>> + */
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type *fixed, mtrr_type def_type)
>> +{
>> +	unsigned int i;
>> +
>> +	if (boot_cpu_has(X86_FEATURE_MTRR))
> 
> check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead.
> 
>> +		return;
> 
> So this here needs to check:
> 
> 	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) &&
> 	    !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) ||
> 	      cpu_feature_enabled(X86_FEATURE_XENPV))) {
> 		WARN_ON_ONCE(1);
> 		return;
> 	}
> 
> as we don't want this to be called somewhere or by something else.
> 
> The SEV_SNP flag can be used from:
> 
> https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com
> 
> I'm assuming here HyperV SEV-SNP guests really do set that feature flag
> (they better). We can expedite that patch ofc.

Is that flag _really_ meant to indicate we are running as a SEV-SNP guest?

Given that the referenced patch is part of the SEV-SNP host support series,
I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests.
And who is setting it for KVM SEV-SNP guests?


Juergen
  
Jeremi Piotrowski Feb. 16, 2023, 11:02 a.m. UTC | #25
On Thu, Feb 16, 2023 at 10:32:28AM +0100, Juergen Gross wrote:
> On 13.02.23 12:39, Borislav Petkov wrote:
> >On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote:
> >>When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> >>guests or when running as a SEV-SNP guest under Hyper-V). Typically
> >>the hypervisor will reset the MTRR feature in cpuid data, resulting
> >>in no MTRR memory type information being available for the kernel.
> >>
> >>This has turned out to result in problems:
> >>
> >>- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> >>- Xen PV dom0 mapping memory as WB which should be UC- instead
> >>
> >>Solve those problems by supporting to set a fixed MTRR state,
> >>overwriting the empty state used today. In case such a state has been
> >>set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> >>will only be used by mtrr_type_lookup(), as in all other cases
> >>mtrr_enabled() is being checked, which will return false. Accept the
> >>overwrite call only in case of MTRRs being disabled in cpuid.
> >
> >s/cpuid/CPUID/g
> >
> >>Signed-off-by: Juergen Gross <jgross@suse.com>
> >>---
> >>V2:
> >>- new patch
> >>---
> >>  arch/x86/include/asm/mtrr.h        |  2 ++
> >>  arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
> >>  arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
> >>  3 files changed, 49 insertions(+)
> >>
> >>diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> >>index f0eeaf6e5f5f..0b8f51d683dc 100644
> >>--- a/arch/x86/include/asm/mtrr.h
> >>+++ b/arch/x86/include/asm/mtrr.h
> >>@@ -31,6 +31,8 @@
> >>   */
> >>  # ifdef CONFIG_MTRR
> >>  void mtrr_bp_init(void);
> >>+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> >>+			  mtrr_type *fixed, mtrr_type def_type);
> >>  extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
> >>  extern void mtrr_save_fixed_ranges(void *);
> >>  extern void mtrr_save_state(void);
> >>diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> >>index ee09d359e08f..788bc16888a5 100644
> >>--- a/arch/x86/kernel/cpu/mtrr/generic.c
> >>+++ b/arch/x86/kernel/cpu/mtrr/generic.c
> >>@@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> >>  	return mtrr_state.def_type;
> >>  }
> >>+/**
> >>+ * mtrr_overwrite_state - set fixed MTRR state
> >
> >fixed only? You pass in variable too...
> >
> >>+ *
> >>+ * Used to set MTRR state via different means (e.g. with data obtained from
> >>+ * a hypervisor).
> >>+ */
> >>+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> >>+			  mtrr_type *fixed, mtrr_type def_type)
> >>+{
> >>+	unsigned int i;
> >>+
> >>+	if (boot_cpu_has(X86_FEATURE_MTRR))
> >
> >check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead.
> >
> >>+		return;
> >
> >So this here needs to check:
> >
> >	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) &&
> >	    !(cpu_feature_enabled(X86_FEATURE_SEV_SNP) ||
> >	      cpu_feature_enabled(X86_FEATURE_XENPV))) {
> >		WARN_ON_ONCE(1);
> >		return;
> >	}
> >
> >as we don't want this to be called somewhere or by something else.
> >
> >The SEV_SNP flag can be used from:
> >
> >https://lore.kernel.org/r/20221214194056.161492-14-michael.roth@amd.com
> >
> >I'm assuming here HyperV SEV-SNP guests really do set that feature flag
> >(they better). We can expedite that patch ofc.
> 
> Is that flag _really_ meant to indicate we are running as a SEV-SNP guest?
> 
> Given that the referenced patch is part of the SEV-SNP host support series,
> I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests.
> And who is setting it for KVM SEV-SNP guests?
> 
> 
> Juergen

Initially both guest and host side have X86_FEATURE_SEV_SNP, but
early_detect_mem_encrypt() clears it for the guest side.

You would want cc_platform_has(CC_ATTR_GUEST_SEV_SNP), but:

* there are two kinds of Hyper-V SEV-SNP VMs: "unenlightened" (vTOM+paravisor)
  and "enlightened"
* the kernel currently supports the "unenlightened" kind which do not set
  CC_ATTR_GUEST_SEV_SNP (because it implies codepaths which do not apply to
  vTOM mode)
* the patchset that adds support for "enlightened" Hyper-V SEV-SNP VMs sets
  CC_ATTR_GUEST_SEV_SNP [1]

The closest you can get is:

  cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && hypervisor_is_type(X86_HYPER_MS_HYPERV)

but that would soon cover TDX guests too so <shrug>...

[1]: https://lore.kernel.org/lkml/1673559753-94403-8-git-send-email-mikelley@microsoft.com/
  
Jeremi Piotrowski Feb. 16, 2023, 11:07 a.m. UTC | #26
On Mon, Feb 13, 2023 at 12:39:40PM +0100, Borislav Petkov wrote:
> On Thu, Feb 09, 2023 at 08:22:14AM +0100, Juergen Gross wrote:
> > When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> > guests or when running as a SEV-SNP guest under Hyper-V). Typically
> > the hypervisor will reset the MTRR feature in cpuid data, resulting
> > in no MTRR memory type information being available for the kernel.
> > 
> > This has turned out to result in problems:
> > 
> > - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> > - Xen PV dom0 mapping memory as WB which should be UC- instead
> > 
> > Solve those problems by supporting to set a fixed MTRR state,
> > overwriting the empty state used today. In case such a state has been
> > set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> > will only be used by mtrr_type_lookup(), as in all other cases
> > mtrr_enabled() is being checked, which will return false. Accept the
> > overwrite call only in case of MTRRs being disabled in cpuid.
> 
> s/cpuid/CPUID/g
> 
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > ---
> > V2:
> > - new patch
> > ---
> >  arch/x86/include/asm/mtrr.h        |  2 ++
> >  arch/x86/kernel/cpu/mtrr/generic.c | 38 ++++++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++++
> >  3 files changed, 49 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> > index f0eeaf6e5f5f..0b8f51d683dc 100644
> > --- a/arch/x86/include/asm/mtrr.h
> > +++ b/arch/x86/include/asm/mtrr.h
> > @@ -31,6 +31,8 @@
> >   */
> >  # ifdef CONFIG_MTRR
> >  void mtrr_bp_init(void);
> > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> > +			  mtrr_type *fixed, mtrr_type def_type);
> >  extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
> >  extern void mtrr_save_fixed_ranges(void *);
> >  extern void mtrr_save_state(void);
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index ee09d359e08f..788bc16888a5 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -240,6 +240,44 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> >  	return mtrr_state.def_type;
> >  }
> >  
> > +/**
> > + * mtrr_overwrite_state - set fixed MTRR state
> 
> fixed only? You pass in variable too...
> 
> > + *
> > + * Used to set MTRR state via different means (e.g. with data obtained from
> > + * a hypervisor).
> > + */
> > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> > +			  mtrr_type *fixed, mtrr_type def_type)
> > +{
> > +	unsigned int i;
> > +
> > +	if (boot_cpu_has(X86_FEATURE_MTRR))
> 
> check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/mtrr/generic.c:254: Do not use boot_cpu_has() - use cpu_feature_enabled() instead.
> 

Hi Boris,

Where does this check come from? I can't find a source for it.

Jeremi
  
Borislav Petkov Feb. 16, 2023, 11:25 a.m. UTC | #27
On Thu, Feb 16, 2023 at 10:32:28AM +0100, Juergen Gross wrote:
> Is that flag _really_ meant to indicate we are running as a SEV-SNP guest?

Yes.

> Given that the referenced patch is part of the SEV-SNP host support series,
> I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests.

It better be. If it is a modified guest - no matter how modified - it
should set that flag. The vTOM thing is still being discussed.

> And who is setting it for KVM SEV-SNP guests?

That same patch does.
  
Borislav Petkov Feb. 16, 2023, 11:27 a.m. UTC | #28
On Thu, Feb 16, 2023 at 03:07:28AM -0800, Jeremi Piotrowski wrote:
> Where does this check come from? I can't find a source for it.

That's the patch checker I've been writing while reviewing patches:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=vp

It is, ofc, WIP.
  
Juergen Gross Feb. 16, 2023, 12:19 p.m. UTC | #29
On 16.02.23 12:25, Borislav Petkov wrote:
> On Thu, Feb 16, 2023 at 10:32:28AM +0100, Juergen Gross wrote:
>> Is that flag _really_ meant to indicate we are running as a SEV-SNP guest?
> 
> Yes.
> 
>> Given that the referenced patch is part of the SEV-SNP host support series,
>> I'm inclined to suspect it won't be set for sure in HyperV SEV-SNP guests.
> 
> It better be. If it is a modified guest - no matter how modified - it
> should set that flag. The vTOM thing is still being discussed.
> 
>> And who is setting it for KVM SEV-SNP guests?
> 
> That same patch does.

Hmm, I must be blind. I can't spot it.

I'm seeing only the feature bit #define and a call of
setup_clear_cpu_cap(X86_FEATURE_SEV_SNP) in this patch.

Or is it done by hardware or the hypervisor?


Juergen
  
Borislav Petkov Feb. 16, 2023, 12:29 p.m. UTC | #30
On Thu, Feb 16, 2023 at 01:19:22PM +0100, Juergen Gross wrote:
> Hmm, I must be blind. I can't spot it.
> 
> I'm seeing only the feature bit #define and a call of
> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP) in this patch.
> 
> Or is it done by hardware or the hypervisor?

Correction - I meant CC_ATTR_GUEST_SEV_SNP not the CPUID feature flag.

Sorry for the confusion folks.
  
Michael Kelley (LINUX) Feb. 16, 2023, 4:04 p.m. UTC | #31
From: Borislav Petkov <bp@alien8.de> Sent: Thursday, February 16, 2023 4:29 AM
>
> On Thu, Feb 16, 2023 at 01:19:22PM +0100, Juergen Gross wrote:
> > Hmm, I must be blind. I can't spot it.
> >
> > I'm seeing only the feature bit #define and a call of
> > setup_clear_cpu_cap(X86_FEATURE_SEV_SNP) in this patch.
> >
> > Or is it done by hardware or the hypervisor?
> 
> Correction - I meant CC_ATTR_GUEST_SEV_SNP not the CPUID feature flag.
> 

In current upstream code, Hyper-V vTOM VMs aren't participating in
the CC_ATTR_* scheme at all, so CC_ATTR_GUEST_SEV_SNP won't be
set.   Getting Hyper-V vTOM VMs integrated into that scheme is a key
part of my big patch set[1] that we're separately trying to resolve the
last issues with.

Michael

[1] https://lore.kernel.org/linux-hyperv/1673559753-94403-1-git-send-email-mikelley@microsoft.com/
  

Patch

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f0eeaf6e5f5f..0b8f51d683dc 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,6 +31,8 @@ 
  */
 # ifdef CONFIG_MTRR
 void mtrr_bp_init(void);
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+			  mtrr_type *fixed, mtrr_type def_type);
 extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index ee09d359e08f..788bc16888a5 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -240,6 +240,44 @@  static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 	return mtrr_state.def_type;
 }
 
+/**
+ * mtrr_overwrite_state - set fixed MTRR state
+ *
+ * Used to set MTRR state via different means (e.g. with data obtained from
+ * a hypervisor).
+ */
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+			  mtrr_type *fixed, mtrr_type def_type)
+{
+	unsigned int i;
+
+	if (boot_cpu_has(X86_FEATURE_MTRR))
+		return;
+
+	if (var) {
+		if (num_var > MTRR_MAX_VAR_RANGES) {
+			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
+				num_var);
+			num_var = MTRR_MAX_VAR_RANGES;
+		}
+		for (i = 0; i < num_var; i++)
+			mtrr_state.var_ranges[i] = var[i];
+		num_var_ranges = num_var;
+	}
+
+	if (fixed) {
+		for (i = 0; i < MTRR_NUM_FIXED_RANGES; i++)
+			mtrr_state.fixed_ranges[i] = fixed[i];
+		mtrr_state.enabled |= MTRR_STATE_MTRR_FIXED_ENABLED;
+		mtrr_state.have_fixed = 1;
+	}
+
+	mtrr_state.def_type = def_type;
+	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
+
+	mtrr_state_set = 1;
+}
+
 /**
  * mtrr_type_lookup - look up memory type in MTRR
  *
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 542ca5639dfd..b73fe243c7fd 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -668,6 +668,15 @@  void __init mtrr_bp_init(void)
 	const char *why = "(not available)";
 	unsigned int phys_addr;
 
+	if (mtrr_state.enabled) {
+		/* Software overwrite of MTRR state, only for generic case. */
+		mtrr_calc_physbits(true);
+		init_table();
+		pr_info("MTRRs set to read-only\n");
+
+		return;
+	}
+
 	phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR));
 
 	if (boot_cpu_has(X86_FEATURE_MTRR)) {