[v5,04/15] x86/mtrr: support setting MTRR state for software defined MTRRs

Message ID 20230401063652.23522-5-jgross@suse.com
State New
Headers
Series x86/mtrr: fix handling with PAT but without MTRR |

Commit Message

Juergen Gross April 1, 2023, 6:36 a.m. UTC
  When running virtualized, MTRR access can be reduced (e.g. in Xen PV
guests or when running as a SEV-SNP guest under Hyper-V). Typically
the hypervisor will reset the MTRR feature in CPUID data, resulting
in no MTRR memory type information being available for the kernel.

This has turned out to result in problems:

- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
- Xen PV dom0 mapping memory as WB which should be UC- instead

Solve those problems by supporting to set a static MTRR state,
overwriting the empty state used today. In case such a state has been
set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
will only be used by mtrr_type_lookup(), as in all other cases
mtrr_enabled() is being checked, which will return false. Accept the
overwrite call only for selected cases when running as a guest.
Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by
just refusing them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
---
V2:
- new patch
V3:
- omit fixed MTRRs, as those are currently not needed
- disable X86_FEATURE_MTRR instead of testing it
- provide a stub for !CONFIG_MTRR (Michael Kelley)
- use cpu_feature_enabled() (Boris Petkov)
- add tests for mtrr_overwrite_state() being allowed (Boris Petkov)
V4:
- add test for hv_is_isolation_supported() (Michael Kelley)
V5:
- drop test for running as native (Boris Petkov)
- split large complex test into multiple simple ones (Boris Petkov)
- enhance test in mtrr_bp_init() (Boris Petkov)
---
 arch/x86/include/asm/mtrr.h        |  8 +++++
 arch/x86/kernel/cpu/mtrr/generic.c | 58 +++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 +++++
 arch/x86/kernel/setup.c            |  2 ++
 4 files changed, 76 insertions(+), 1 deletion(-)
  

Comments

Kai Huang April 3, 2023, 2:25 a.m. UTC | #1
On Sat, 2023-04-01 at 08:36 +0200, Juergen Gross wrote:
> +/**
> + * mtrr_overwrite_state - set static MTRR state
> + *
> + * Used to set MTRR state via different means (e.g. with data obtained from
> + * a hypervisor).
> + * Is allowed only for special cases when running virtualized. Must be called
> + * from the x86_init.hyper.init_platform() hook.
> + */
> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type def_type)
> +{
> +	unsigned int i;
> +
> +	/* Only allowed to be called once before mtrr_bp_init(). */
> +	if (WARN_ON(mtrr_state_set))
> +		return;
> +
> +	/* Only allowed when running virtualized. */
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return;
> +
> +	/*
> +	 * Only allowed for special virtualization cases:
> +	 * - when running as SEV-SNP guest
> +	 * - when running as Hyper-V isolated guest
> +	 * - when running as Xen PV guest
> +	 * - when running as TSX guest
			     ^
			     TDX guest

> +	 */
> +	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
> +	    !hv_is_isolation_supported() &&
> +	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> +	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> +		return;
> +
> +	/* Disable MTRR in order to disable MTRR modifications. */
> +	setup_clear_cpu_cap(X86_FEATURE_MTRR);
> +
> +	if (var) {
> +		if (num_var > MTRR_MAX_VAR_RANGES) {
> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
> +				num_var);
> +			num_var = MTRR_MAX_VAR_RANGES;
> +		}
> +		for (i = 0; i < num_var; i++)
> +			mtrr_state.var_ranges[i] = var[i];
> +		num_var_ranges = num_var;
> +	}
> +
> +	mtrr_state.def_type = def_type;
> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
> +
> +	mtrr_state_set = 1;
> +}
> +
>  /**
>   * mtrr_type_lookup - look up memory type in MTRR
>   *
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 1beb38f7a7a3..1c19d67ddab3 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
>  	const char *why = "(not available)";
>  	unsigned int phys_addr;
>  
> +	if (!generic_mtrrs && mtrr_state.enabled) {
> +		/* Software overwrite of MTRR state, only for generic case. */
							      ^
							      !generic case?

> +		mtrr_calc_physbits(true);
> +		init_table();
> +		pr_info("MTRRs set to read-only\n");
> +
> +		return;
> +	}
> +
>  	phys_addr = mtrr_calc_physbits(generic_mtrrs);
  
Juergen Gross April 3, 2023, 7:10 a.m. UTC | #2
On 03.04.23 04:25, Huang, Kai wrote:
> On Sat, 2023-04-01 at 08:36 +0200, Juergen Gross wrote:
>> +/**
>> + * mtrr_overwrite_state - set static MTRR state
>> + *
>> + * Used to set MTRR state via different means (e.g. with data obtained from
>> + * a hypervisor).
>> + * Is allowed only for special cases when running virtualized. Must be called
>> + * from the x86_init.hyper.init_platform() hook.
>> + */
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type def_type)
>> +{
>> +	unsigned int i;
>> +
>> +	/* Only allowed to be called once before mtrr_bp_init(). */
>> +	if (WARN_ON(mtrr_state_set))
>> +		return;
>> +
>> +	/* Only allowed when running virtualized. */
>> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>> +		return;
>> +
>> +	/*
>> +	 * Only allowed for special virtualization cases:
>> +	 * - when running as SEV-SNP guest
>> +	 * - when running as Hyper-V isolated guest
>> +	 * - when running as Xen PV guest
>> +	 * - when running as TSX guest
> 			     ^
> 			     TDX guest

Thanks.

> 
>> +	 */
>> +	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
>> +	    !hv_is_isolation_supported() &&
>> +	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
>> +	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
>> +		return;
>> +
>> +	/* Disable MTRR in order to disable MTRR modifications. */
>> +	setup_clear_cpu_cap(X86_FEATURE_MTRR);
>> +
>> +	if (var) {
>> +		if (num_var > MTRR_MAX_VAR_RANGES) {
>> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
>> +				num_var);
>> +			num_var = MTRR_MAX_VAR_RANGES;
>> +		}
>> +		for (i = 0; i < num_var; i++)
>> +			mtrr_state.var_ranges[i] = var[i];
>> +		num_var_ranges = num_var;
>> +	}
>> +
>> +	mtrr_state.def_type = def_type;
>> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
>> +
>> +	mtrr_state_set = 1;
>> +}
>> +
>>   /**
>>    * mtrr_type_lookup - look up memory type in MTRR
>>    *
>> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
>> index 1beb38f7a7a3..1c19d67ddab3 100644
>> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
>> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
>> @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
>>   	const char *why = "(not available)";
>>   	unsigned int phys_addr;
>>   
>> +	if (!generic_mtrrs && mtrr_state.enabled) {
>> +		/* Software overwrite of MTRR state, only for generic case. */
> 							      ^
> 							      !generic case?

No. This test just verifies that the (visible) MTRR feature is switched off,
as there are no ways to modify any MTRR registers in the overwrite case.

I can make this more obvious in a comment.


Juergen
  
Kai Huang April 3, 2023, 9:27 a.m. UTC | #3
> > >   /**
> > >    * mtrr_type_lookup - look up memory type in MTRR
> > >    *
> > > diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > index 1beb38f7a7a3..1c19d67ddab3 100644
> > > --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
> > >   	const char *why = "(not available)";
> > >   	unsigned int phys_addr;
> > >   
> > > +	if (!generic_mtrrs && mtrr_state.enabled) {
> > > +		/* Software overwrite of MTRR state, only for generic case. */
> > 							      ^
> > 							      !generic case?
> 
> No. This test just verifies that the (visible) MTRR feature is switched off,
> as there are no ways to modify any MTRR registers in the overwrite case.
> 
> I can make this more obvious in a comment.

Should the comment say something like (because it applies to the code inside the
check):


	If we have a static (synthetic) MTRR already established for special 
	VMs, we still need to calculate the physical address bits using
generic 
	way, because the hardware to run those special VMs indeed has MTRR.

That explains why 'true' is passed to mtrr_calc_physbits().

?

> 
> 
> Juergen
  
Juergen Gross April 3, 2023, 9:35 a.m. UTC | #4
On 03.04.23 11:27, Huang, Kai wrote:
> 
>>>>    /**
>>>>     * mtrr_type_lookup - look up memory type in MTRR
>>>>     *
>>>> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
>>>> index 1beb38f7a7a3..1c19d67ddab3 100644
>>>> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
>>>> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
>>>> @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
>>>>    	const char *why = "(not available)";
>>>>    	unsigned int phys_addr;
>>>>    
>>>> +	if (!generic_mtrrs && mtrr_state.enabled) {
>>>> +		/* Software overwrite of MTRR state, only for generic case. */
>>> 							      ^
>>> 							      !generic case?
>>
>> No. This test just verifies that the (visible) MTRR feature is switched off,
>> as there are no ways to modify any MTRR registers in the overwrite case.
>>
>> I can make this more obvious in a comment.
> 
> Should the comment say something like (because it applies to the code inside the
> check):
> 
> 
> 	If we have a static (synthetic) MTRR already established for special
> 	VMs, we still need to calculate the physical address bits using
> generic
> 	way, because the hardware to run those special VMs indeed has MTRR.
> 
> That explains why 'true' is passed to mtrr_calc_physbits().

I'd rather say that the interface of mtrr_overwrite_state() is based on the
interface of generic MTRRs.


Juergen
  
Kai Huang April 3, 2023, 9:43 a.m. UTC | #5
On Mon, 2023-04-03 at 09:27 +0000, Huang, Kai wrote:
> > > >   /**
> > > >    * mtrr_type_lookup - look up memory type in MTRR
> > > >    *
> > > > diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > b/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > index 1beb38f7a7a3..1c19d67ddab3 100644
> > > > --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
> > > >   	const char *why = "(not available)";
> > > >   	unsigned int phys_addr;
> > > >   
> > > > +	if (!generic_mtrrs && mtrr_state.enabled) {
> > > > +		/* Software overwrite of MTRR state, only for generic
> > > > case. */
> > > 							      ^
> > > 							      !generic
> > > case?
> > 
> > No. This test just verifies that the (visible) MTRR feature is switched off,
> > as there are no ways to modify any MTRR registers in the overwrite case.
> > 
> > I can make this more obvious in a comment.
> 
> Should the comment say something like (because it applies to the code inside
> the
> check):
> 
> 
> 	If we have a static (synthetic) MTRR already established for special 
> 	VMs, we still need to calculate the physical address bits using
> generic 
> 	way, because the hardware to run those special VMs indeed has MTRR.
> 
> That explains why 'true' is passed to mtrr_calc_physbits().
> 
> ?
> 
> 

And yes I guess it would be better to point out we cannot modify any MTRR
registers in the overwrite case, but this is also clear to me.  So no opinion on
this point, but I do find the original comment is a little bit confusing.
  
Kai Huang April 3, 2023, 9:44 a.m. UTC | #6
On Mon, 2023-04-03 at 11:35 +0200, Juergen Gross wrote:
> On 03.04.23 11:27, Huang, Kai wrote:
> > 
> > > > >    /**
> > > > >     * mtrr_type_lookup - look up memory type in MTRR
> > > > >     *
> > > > > diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > > index 1beb38f7a7a3..1c19d67ddab3 100644
> > > > > --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > > +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> > > > > @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
> > > > >    	const char *why = "(not available)";
> > > > >    	unsigned int phys_addr;
> > > > >    
> > > > > +	if (!generic_mtrrs && mtrr_state.enabled) {
> > > > > +		/* Software overwrite of MTRR state, only for generic case. */
> > > > 							      ^
> > > > 							      !generic case?
> > > 
> > > No. This test just verifies that the (visible) MTRR feature is switched off,
> > > as there are no ways to modify any MTRR registers in the overwrite case.
> > > 
> > > I can make this more obvious in a comment.
> > 
> > Should the comment say something like (because it applies to the code inside the
> > check):
> > 
> > 
> > 	If we have a static (synthetic) MTRR already established for special
> > 	VMs, we still need to calculate the physical address bits using
> > generic
> > 	way, because the hardware to run those special VMs indeed has MTRR.
> > 
> > That explains why 'true' is passed to mtrr_calc_physbits().
> 
> I'd rather say that the interface of mtrr_overwrite_state() is based on the
> interface of generic MTRRs.

Sure fine to me too.  Thanks.
  
Borislav Petkov April 11, 2023, 1:20 p.m. UTC | #7
On Sat, Apr 01, 2023 at 08:36:41AM +0200, Juergen Gross wrote:
> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> guests or when running as a SEV-SNP guest under Hyper-V). Typically
> the hypervisor will reset the MTRR feature in CPUID data, resulting
> in no MTRR memory type information being available for the kernel.
> 
> This has turned out to result in problems:

Let's add the links to those problems:

> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't

I can't find Michael's original report, do you have a link?

> - Xen PV dom0 mapping memory as WB which should be UC- instead

Link: https://lore.kernel.org/all/4fe9541e-4d4c-2b2a-f8c8-2d34a7284930@nerdbynature.de/

> 
> Solve those problems by supporting to set a static MTRR state,

s/by supporting to set a/allowing an MTRR static state override/

> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type def_type)
> +{
> +	unsigned int i;
> +
> +	/* Only allowed to be called once before mtrr_bp_init(). */
> +	if (WARN_ON(mtrr_state_set))

WARN_ON_ONCE() is probably better.

> +		return;
> +
> +	/* Only allowed when running virtualized. */
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return;
> +
> +	/*
> +	 * Only allowed for special virtualization cases:
> +	 * - when running as SEV-SNP guest
> +	 * - when running as Hyper-V isolated guest

	when running as a SEV-SNP guest on a HyperV with vTOM enabled

that's a single condition.

> +	 * - when running as Xen PV guest
> +	 * - when running as TSX guest
> +	 */
> +	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
> +	    !hv_is_isolation_supported() &&
> +	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> +	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))

IOW:

	if (!(hv_is_isolation_supported() && cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
  
Juergen Gross April 11, 2023, 1:31 p.m. UTC | #8
On 11.04.23 15:20, Borislav Petkov wrote:
> On Sat, Apr 01, 2023 at 08:36:41AM +0200, Juergen Gross wrote:
>> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
>> guests or when running as a SEV-SNP guest under Hyper-V). Typically
>> the hypervisor will reset the MTRR feature in CPUID data, resulting
>> in no MTRR memory type information being available for the kernel.
>>
>> This has turned out to result in problems:
> 
> Let's add the links to those problems:
> 
>> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> 
> I can't find Michael's original report, do you have a link?

DYM 
https://lore.kernel.org/lkml/BYAPR21MB16883ABC186566BD4D2A1451D7FE9@BYAPR21MB1688.namprd21.prod.outlook.com/ 
?

> 
>> - Xen PV dom0 mapping memory as WB which should be UC- instead
> 
> Link: https://lore.kernel.org/all/4fe9541e-4d4c-2b2a-f8c8-2d34a7284930@nerdbynature.de/
> 
>>
>> Solve those problems by supporting to set a static MTRR state,
> 
> s/by supporting to set a/allowing an MTRR static state override/
> 
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type def_type)
>> +{
>> +	unsigned int i;
>> +
>> +	/* Only allowed to be called once before mtrr_bp_init(). */
>> +	if (WARN_ON(mtrr_state_set))
> 
> WARN_ON_ONCE() is probably better.

If you like that better (I don't see the real benefit, as something would
be severely broken if this triggers more than once, but I don't really
mind).

> 
>> +		return;
>> +
>> +	/* Only allowed when running virtualized. */
>> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>> +		return;
>> +
>> +	/*
>> +	 * Only allowed for special virtualization cases:
>> +	 * - when running as SEV-SNP guest
>> +	 * - when running as Hyper-V isolated guest
> 
> 	when running as a SEV-SNP guest on a HyperV with vTOM enabled
> 
> that's a single condition.
> 
>> +	 * - when running as Xen PV guest
>> +	 * - when running as TSX guest
>> +	 */
>> +	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
>> +	    !hv_is_isolation_supported() &&
>> +	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
>> +	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> 
> IOW:
> 
> 	if (!(hv_is_isolation_supported() && cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
> 	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> 	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))

Okay.


Juergen
  
Michael Kelley (LINUX) April 11, 2023, 1:59 p.m. UTC | #9
From: Borislav Petkov <bp@alien8.de> Sent: Tuesday, April 11, 2023 6:21 AM
> 
> On Sat, Apr 01, 2023 at 08:36:41AM +0200, Juergen Gross wrote:

[snip]

> >
> > +
> > +	/*
> > +	 * Only allowed for special virtualization cases:
> > +	 * - when running as SEV-SNP guest
> > +	 * - when running as Hyper-V isolated guest
> 
> 	when running as a SEV-SNP guest on a HyperV with vTOM enabled
> 
> that's a single condition.
> 
> > +	 * - when running as Xen PV guest
> > +	 * - when running as TSX guest
> > +	 */
> > +	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
> > +	    !hv_is_isolation_supported() &&
> > +	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> > +	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> 
> IOW:
> 
> 	if (!(hv_is_isolation_supported() && cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
> 	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> 	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> 

That's doesn't work.  Hyper-V guests with vTOM don't have
CC_ATTR_GUEST_SEV_SNP.   As previously discussed, the SEV-SNP
machinery is handled by the paravisor, and the Linux guest doesn't
see it.  Enabling CC_ATTR_GUEST_SEV_SNP in a vTOM guest would
trigger Linux to do a bunch of SNP stuff that the paravisor has already
done and would break things.   The standalone hv_is_isolation_supported()
test is sufficient to detect this case.

I really wanted to avoid calls to hv_is_isolation_supported() outside
of Hyper-V specific code in the kernel.  The alternative is to create
another CC_ATTR_ value that is set in the vTOM case, but that reopens
the naming can-of-worms.

Michael
  
Juergen Gross April 11, 2023, 2:04 p.m. UTC | #10
On 11.04.23 15:59, Michael Kelley (LINUX) wrote:
> From: Borislav Petkov <bp@alien8.de> Sent: Tuesday, April 11, 2023 6:21 AM
>>
>> On Sat, Apr 01, 2023 at 08:36:41AM +0200, Juergen Gross wrote:
> 
> [snip]
> 
>>>
>>> +
>>> +	/*
>>> +	 * Only allowed for special virtualization cases:
>>> +	 * - when running as SEV-SNP guest
>>> +	 * - when running as Hyper-V isolated guest
>>
>> 	when running as a SEV-SNP guest on a HyperV with vTOM enabled
>>
>> that's a single condition.
>>
>>> +	 * - when running as Xen PV guest
>>> +	 * - when running as TSX guest
>>> +	 */
>>> +	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
>>> +	    !hv_is_isolation_supported() &&
>>> +	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
>>> +	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
>>
>> IOW:
>>
>> 	if (!(hv_is_isolation_supported() && cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
>> 	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
>> 	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
>>
> 
> That's doesn't work.  Hyper-V guests with vTOM don't have
> CC_ATTR_GUEST_SEV_SNP.

Yeah, the condition needs to be:

	if (!(hv_is_isolation_supported() ||
	      cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
  	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
  	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))

This is equivalent to the condition in my patch.


Juergen
  
Borislav Petkov April 11, 2023, 2:25 p.m. UTC | #11
On Tue, Apr 11, 2023 at 01:59:36PM +0000, Michael Kelley (LINUX) wrote:
> That's doesn't work.  Hyper-V guests with vTOM don't have
> CC_ATTR_GUEST_SEV_SNP. As previously discussed, the SEV-SNP
> machinery is handled by the paravisor,

Do you see now why I want for all kinds of guest types to not deviate
from doing the proper/default checks for their type? This is a good
example, case-in-point.

All "special" guests would get broken in the future, otherwise.

> I really wanted to avoid calls to hv_is_isolation_supported() outside
> of Hyper-V specific code in the kernel.  The alternative is to create
> another CC_ATTR_ value that is set in the vTOM case, but that reopens
> the naming can-of-worms.

Now is the time for you to settle on what would be the official way to
query those guest types, before it propagates everywhere.
  
Borislav Petkov April 11, 2023, 2:26 p.m. UTC | #12
On Tue, Apr 11, 2023 at 04:04:00PM +0200, Juergen Gross wrote:
> Yeah, the condition needs to be:
> 
> 	if (!(hv_is_isolation_supported() ||
> 	      cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
	      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why is that needed at all?
  
Juergen Gross April 11, 2023, 3:57 p.m. UTC | #13
On 11.04.23 16:26, Borislav Petkov wrote:
> On Tue, Apr 11, 2023 at 04:04:00PM +0200, Juergen Gross wrote:
>> Yeah, the condition needs to be:
>>
>> 	if (!(hv_is_isolation_supported() ||
>> 	      cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) &&
> 	      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Why is that needed at all?
> 

Short answer: You asked me to add it in V2 of the series.

Longer answer: SEV_SNP guests and TDX guests would cause #VE when writing
   MTRR MSRs.


Juergen
  
Borislav Petkov April 11, 2023, 5:14 p.m. UTC | #14
On Tue, Apr 11, 2023 at 03:31:06PM +0200, Juergen Gross wrote:
> DYM https://lore.kernel.org/lkml/BYAPR21MB16883ABC186566BD4D2A1451D7FE9@BYAPR21MB1688.namprd21.prod.outlook.com/

Yap, looks like the one. Thx.

> If you like that better (I don't see the real benefit, as something would
> be severely broken if this triggers more than once, but I don't really
> mind).

No need to spam the console with the same thing, especially if something
else broken is wanting to tell us what it is, or some other assertion
fires.

But yeah, ok, this is on the BSP path so should *not* happen more than
once anyway.

Thx.
  
Borislav Petkov April 11, 2023, 5:15 p.m. UTC | #15
On Tue, Apr 11, 2023 at 05:57:07PM +0200, Juergen Gross wrote:
> Short answer: You asked me to add it in V2 of the series.
> 
> Longer answer: SEV_SNP guests and TDX guests would cause #VE when writing
>   MTRR MSRs.

Ah, sorry. Let's extend the comment then and use your original check:

       /*
        * Only allowed for special virtualization cases:
        * - when running as Hyper-V, SEV-SNP guest using vTOM
        * - when running as Xen PV guest
        * - when running as SEV-SNP or TSX guest to avoid unnecessary
	*   VMM communication/Virtualization exceptions (#VC, #VE)
        */
	if (!hv_is_isolation_supported() &&
	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
	    !cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
		return;

Thx.
  
Juergen Gross April 12, 2023, 8:30 a.m. UTC | #16
On 11.04.23 19:15, Borislav Petkov wrote:
> On Tue, Apr 11, 2023 at 05:57:07PM +0200, Juergen Gross wrote:
>> Short answer: You asked me to add it in V2 of the series.
>>
>> Longer answer: SEV_SNP guests and TDX guests would cause #VE when writing
>>    MTRR MSRs.
> 
> Ah, sorry. Let's extend the comment then and use your original check:
> 
>         /*
>          * Only allowed for special virtualization cases:
>          * - when running as Hyper-V, SEV-SNP guest using vTOM
>          * - when running as Xen PV guest
>          * - when running as SEV-SNP or TSX guest to avoid unnecessary
> 	*   VMM communication/Virtualization exceptions (#VC, #VE)
>          */
> 	if (!hv_is_isolation_supported() &&
> 	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> 	    !cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
> 	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> 		return;

Okay.


Juergen
  

Patch

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 4e59f7854950..6decb18e22ed 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -49,6 +49,8 @@ 
  */
 # ifdef CONFIG_MTRR
 void mtrr_bp_init(void);
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+			  mtrr_type def_type);
 extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
@@ -66,6 +68,12 @@  void mtrr_disable(void);
 void mtrr_enable(void);
 void mtrr_generic_set_state(void);
 #  else
+static inline void mtrr_overwrite_state(struct mtrr_var_range *var,
+					unsigned int num_var,
+					mtrr_type def_type)
+{
+}
+
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 9a12da76635c..0794f3f1cc27 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -8,10 +8,12 @@ 
 #include <linux/init.h>
 #include <linux/io.h>
 #include <linux/mm.h>
-
+#include <linux/cc_platform.h>
 #include <asm/processor-flags.h>
 #include <asm/cacheinfo.h>
 #include <asm/cpufeature.h>
+#include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
 #include <asm/tlbflush.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
@@ -241,6 +243,60 @@  static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 	return mtrr_state.def_type;
 }
 
+/**
+ * mtrr_overwrite_state - set static MTRR state
+ *
+ * Used to set MTRR state via different means (e.g. with data obtained from
+ * a hypervisor).
+ * Is allowed only for special cases when running virtualized. Must be called
+ * from the x86_init.hyper.init_platform() hook.
+ */
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+			  mtrr_type def_type)
+{
+	unsigned int i;
+
+	/* Only allowed to be called once before mtrr_bp_init(). */
+	if (WARN_ON(mtrr_state_set))
+		return;
+
+	/* Only allowed when running virtualized. */
+	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return;
+
+	/*
+	 * Only allowed for special virtualization cases:
+	 * - when running as SEV-SNP guest
+	 * - when running as Hyper-V isolated guest
+	 * - when running as Xen PV guest
+	 * - when running as TSX guest
+	 */
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
+	    !hv_is_isolation_supported() &&
+	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
+	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+		return;
+
+	/* Disable MTRR in order to disable MTRR modifications. */
+	setup_clear_cpu_cap(X86_FEATURE_MTRR);
+
+	if (var) {
+		if (num_var > MTRR_MAX_VAR_RANGES) {
+			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
+				num_var);
+			num_var = MTRR_MAX_VAR_RANGES;
+		}
+		for (i = 0; i < num_var; i++)
+			mtrr_state.var_ranges[i] = var[i];
+		num_var_ranges = num_var;
+	}
+
+	mtrr_state.def_type = def_type;
+	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
+
+	mtrr_state_set = 1;
+}
+
 /**
  * mtrr_type_lookup - look up memory type in MTRR
  *
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 1beb38f7a7a3..1c19d67ddab3 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -666,6 +666,15 @@  void __init mtrr_bp_init(void)
 	const char *why = "(not available)";
 	unsigned int phys_addr;
 
+	if (!generic_mtrrs && mtrr_state.enabled) {
+		/* Software overwrite of MTRR state, only for generic case. */
+		mtrr_calc_physbits(true);
+		init_table();
+		pr_info("MTRRs set to read-only\n");
+
+		return;
+	}
+
 	phys_addr = mtrr_calc_physbits(generic_mtrrs);
 
 	if (generic_mtrrs) {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 16babff771bd..0cccfeb67c3a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1037,6 +1037,8 @@  void __init setup_arch(char **cmdline_p)
 	/*
 	 * VMware detection requires dmi to be available, so this
 	 * needs to be done after dmi_setup(), for the boot CPU.
+	 * For some guest types (Xen PV, SEV-SNP, TDX) it is required to be
+	 * called before cache_bp_init() for setting up MTRR state.
 	 */
 	init_hypervisor_platform();