[v4,03/12] x86/mtrr: support setting MTRR state for software defined MTRRs

Message ID 20230306163425.8324-4-jgross@suse.com
State New
Headers
Series x86/mtrr: fix handling with PAT but without MTRR |

Commit Message

Juergen Gross March 6, 2023, 4:34 p.m. UTC
  When running virtualized, MTRR access can be reduced (e.g. in Xen PV
guests or when running as a SEV-SNP guest under Hyper-V). Typically
the hypervisor will reset the MTRR feature in CPUID data, resulting
in no MTRR memory type information being available for the kernel.

This has turned out to result in problems:

- Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
- Xen PV dom0 mapping memory as WB which should be UC- instead

Solve those problems by supporting to set a static MTRR state,
overwriting the empty state used today. In case such a state has been
set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
will only be used by mtrr_type_lookup(), as in all other cases
mtrr_enabled() is being checked, which will return false. Accept the
overwrite call only for selected cases when running as a guest.
Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by
just refusing them.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
V3:
- omit fixed MTRRs, as those are currently not needed
- disable X86_FEATURE_MTRR instead of testing it
- provide a stub for !CONFIG_MTRR (Michael Kelley)
- use cpu_feature_enabled() (Boris Petkov)
- add tests for mtrr_overwrite_state() being allowed (Boris Petkov)
V4:
- add test for hv_is_isolation_supported() (Michael Kelley)
---
 arch/x86/include/asm/mtrr.h        |  8 ++++++
 arch/x86/kernel/cpu/mtrr/generic.c | 46 +++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  9 ++++++
 arch/x86/kernel/setup.c            |  2 ++
 4 files changed, 64 insertions(+), 1 deletion(-)
  

Comments

Kai Huang March 20, 2023, 12:59 p.m. UTC | #1
On Mon, 2023-03-06 at 17:34 +0100, Juergen Gross wrote:
> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> guests or when running as a SEV-SNP guest under Hyper-V). Typically
> the hypervisor will reset the MTRR feature in CPUID data, resulting
> in no MTRR memory type information being available for the kernel.
> 
> This has turned out to result in problems:
> 
> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> - Xen PV dom0 mapping memory as WB which should be UC- instead
> 
> Solve those problems by supporting to set a static MTRR state,
> overwriting the empty state used today. In case such a state has been
> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> will only be used by mtrr_type_lookup(), as in all other cases
> mtrr_enabled() is being checked, which will return false. Accept the
> overwrite call only for selected cases when running as a guest.
> Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by
> just refusing them.
> 
> 
[...]

> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index ee09d359e08f..49b4cc923312 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -8,10 +8,12 @@
>  #include <linux/init.h>
>  #include <linux/io.h>
>  #include <linux/mm.h>
> -
> +#include <linux/cc_platform.h>
>  #include <asm/processor-flags.h>
>  #include <asm/cacheinfo.h>
>  #include <asm/cpufeature.h>
> +#include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>

Is <asm/mshyperv.h> needed here?

>  #include <asm/tlbflush.h>
>  #include <asm/mtrr.h>
>  #include <asm/msr.h>
> @@ -240,6 +242,48 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
>  	return mtrr_state.def_type;
>  }
>  
> +/**
> + * mtrr_overwrite_state - set static MTRR state
> + *
> + * Used to set MTRR state via different means (e.g. with data obtained from
> + * a hypervisor).

+KVM list and KVM maintainers,

IIUC in the next patch, SEV-SNP guest only sets a synthetic MTRR w/o telling the
hypervisor (hyperv).  I think this works for SEV-SNP running on top of hyperv
because they have mutual understanding?

What about the SNP guest running on other hypervisors such as KVM?

Since this code covers TDX guest too, I think eventually it makes sense for TDX
guest to use this function too (to avoid #VE IIUC).  If want to do that, then I
think TDX guest should have the same mutual understanding with *ALL* hypervisor,
as I am not sure what's the point of making the TDX guest's MTRR behaviour
depending on specific hypervisor.

For now I don't see there's any use case for TDX guest to use non-WB memory type
(in fact, KVM always maps guest memory as WB if there's no non-coherent DMA to
the guest memory), so to me it seems it's OK to make a universal mutual
understanding that TDX guest will always have WB memory type for all memory.

But, I am not sure whether it's better to have a standard hypercall between
guest & hypervisor for this purpose so things can be more flexible?

> + * Is allowed only for special cases when running virtualized. Must be called
> + * from the x86_init.hyper.init_platform() hook. X86_FEATURE_MTRR must be off.
> + */
> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type def_type)
> +{
> +	unsigned int i;
> +
> +	if (WARN_ON(mtrr_state_set ||
> +		    hypervisor_is_type(X86_HYPER_NATIVE) ||
> +		    !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) ||
> +		    (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
> +		     !hv_is_isolation_supported() &&
> +		     !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> +		     !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))))
> +		return;
> +
> +	/* Disable MTRR in order to disable MTRR modifications. */
> +	setup_clear_cpu_cap(X86_FEATURE_MTRR);
> +
> +	if (var) {
> +		if (num_var > MTRR_MAX_VAR_RANGES) {
> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
> +				num_var);
> +			num_var = MTRR_MAX_VAR_RANGES;
> +		}
> +		for (i = 0; i < num_var; i++)
> +			mtrr_state.var_ranges[i] = var[i];
> +		num_var_ranges = num_var;
> +	}
> +
> +	mtrr_state.def_type = def_type;
> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
> +
> +	mtrr_state_set = 1;
> +}
> +
>  /**
>   * mtrr_type_lookup - look up memory type in MTRR
>   *
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 7596ebeab929..5fe62ee0361b 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
>  	const char *why = "(not available)";
>  	unsigned int phys_addr;
>  
> +	if (mtrr_state.enabled) {
> +		/* Software overwrite of MTRR state, only for generic case. */
> +		mtrr_calc_physbits(true);
> +		init_table();
> +		pr_info("MTRRs set to read-only\n");
> +
> +		return;
> +	}
> +
>  	phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR));
>  
>  	if (boot_cpu_has(X86_FEATURE_MTRR)) {
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 16babff771bd..0cccfeb67c3a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1037,6 +1037,8 @@ void __init setup_arch(char **cmdline_p)
>  	/*
>  	 * VMware detection requires dmi to be available, so this
>  	 * needs to be done after dmi_setup(), for the boot CPU.
> +	 * For some guest types (Xen PV, SEV-SNP, TDX) it is required to be
> +	 * called before cache_bp_init() for setting up MTRR state.
>  	 */
>  	init_hypervisor_platform();
>
  
Juergen Gross March 20, 2023, 1:47 p.m. UTC | #2
On 20.03.23 13:59, Huang, Kai wrote:
> On Mon, 2023-03-06 at 17:34 +0100, Juergen Gross wrote:
>> When running virtualized, MTRR access can be reduced (e.g. in Xen PV
>> guests or when running as a SEV-SNP guest under Hyper-V). Typically
>> the hypervisor will reset the MTRR feature in CPUID data, resulting
>> in no MTRR memory type information being available for the kernel.
>>
>> This has turned out to result in problems:
>>
>> - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
>> - Xen PV dom0 mapping memory as WB which should be UC- instead
>>
>> Solve those problems by supporting to set a static MTRR state,
>> overwriting the empty state used today. In case such a state has been
>> set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
>> will only be used by mtrr_type_lookup(), as in all other cases
>> mtrr_enabled() is being checked, which will return false. Accept the
>> overwrite call only for selected cases when running as a guest.
>> Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by
>> just refusing them.
>>
>>
> [...]
> 
>> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
>> index ee09d359e08f..49b4cc923312 100644
>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>> @@ -8,10 +8,12 @@
>>   #include <linux/init.h>
>>   #include <linux/io.h>
>>   #include <linux/mm.h>
>> -
>> +#include <linux/cc_platform.h>
>>   #include <asm/processor-flags.h>
>>   #include <asm/cacheinfo.h>
>>   #include <asm/cpufeature.h>
>> +#include <asm/hypervisor.h>
>> +#include <asm/mshyperv.h>
> 
> Is <asm/mshyperv.h> needed here?

Yes, for hv_is_isolation_supported().

> 
>>   #include <asm/tlbflush.h>
>>   #include <asm/mtrr.h>
>>   #include <asm/msr.h>
>> @@ -240,6 +242,48 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
>>   	return mtrr_state.def_type;
>>   }
>>   
>> +/**
>> + * mtrr_overwrite_state - set static MTRR state
>> + *
>> + * Used to set MTRR state via different means (e.g. with data obtained from
>> + * a hypervisor).
> 
> +KVM list and KVM maintainers,
> 
> IIUC in the next patch, SEV-SNP guest only sets a synthetic MTRR w/o telling the
> hypervisor (hyperv).  I think this works for SEV-SNP running on top of hyperv
> because they have mutual understanding?
> 
> What about the SNP guest running on other hypervisors such as KVM?
> 
> Since this code covers TDX guest too, I think eventually it makes sense for TDX
> guest to use this function too (to avoid #VE IIUC).  If want to do that, then I
> think TDX guest should have the same mutual understanding with *ALL* hypervisor,
> as I am not sure what's the point of making the TDX guest's MTRR behaviour
> depending on specific hypervisor.

This series tries to fix the current fallout.

Boris Petkov asked for the hypervisor specific tests to be added, so I've
added them after discussing the topic with him (he is the maintainer of
this code after all).

> For now I don't see there's any use case for TDX guest to use non-WB memory type
> (in fact, KVM always maps guest memory as WB if there's no non-coherent DMA to
> the guest memory), so to me it seems it's OK to make a universal mutual
> understanding that TDX guest will always have WB memory type for all memory.

I agree.

> But, I am not sure whether it's better to have a standard hypercall between
> guest & hypervisor for this purpose so things can be more flexible?

Maybe. But for now we need to handle the current situation.


Juergen
  
Borislav Petkov March 20, 2023, 7:05 p.m. UTC | #3
On Mon, Mar 06, 2023 at 05:34:16PM +0100, Juergen Gross wrote:
> +/**
> + * mtrr_overwrite_state - set static MTRR state
> + *
> + * Used to set MTRR state via different means (e.g. with data obtained from
> + * a hypervisor).
> + * Is allowed only for special cases when running virtualized. Must be called
> + * from the x86_init.hyper.init_platform() hook. X86_FEATURE_MTRR must be off.
> + */
> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> +			  mtrr_type def_type)
> +{
> +	unsigned int i;
> +
> +	if (WARN_ON(mtrr_state_set ||
> +		    hypervisor_is_type(X86_HYPER_NATIVE) ||

Why that check?

> +		    !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) ||
> +		    (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
> +		     !hv_is_isolation_supported() &&
> +		     !cpu_feature_enabled(X86_FEATURE_XENPV) &&
> +		     !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))))

This is unparseable. Please split it into separate checks:

	if (WARN_ON(mtrr_state_set))
		return;

	if (WARN_ON(!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)))
		return;

	...

and add comments above each one why we're testing this.


> +		return;
> +
> +	/* Disable MTRR in order to disable MTRR modifications. */
> +	setup_clear_cpu_cap(X86_FEATURE_MTRR);
> +
> +	if (var) {
> +		if (num_var > MTRR_MAX_VAR_RANGES) {
> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
> +				num_var);
> +			num_var = MTRR_MAX_VAR_RANGES;
> +		}
> +		for (i = 0; i < num_var; i++)
> +			mtrr_state.var_ranges[i] = var[i];
> +		num_var_ranges = num_var;
> +	}
> +
> +	mtrr_state.def_type = def_type;
> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
> +
> +	mtrr_state_set = 1;
> +}
> +
>  /**
>   * mtrr_type_lookup - look up memory type in MTRR
>   *
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 7596ebeab929..5fe62ee0361b 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
>  	const char *why = "(not available)";
>  	unsigned int phys_addr;
>  
> +	if (mtrr_state.enabled) {

I'm guessing the proper detection of that weird state should be:

	/*
	 * Check for the software overwrite of MTRR state, only for generic case.
	 * See mtrr_overwrite_state().
	 */
	if (!cpu_feature_enabled(X86_FEATURE_MTRR) &&
	    mtrr_state.enabled) {
		...


> +		/* Software overwrite of MTRR state, only for generic case. */
> +		mtrr_calc_physbits(true);
> +		init_table();
> +		pr_info("MTRRs set to read-only\n");
> +
> +		return;
> +	}
> +
  
Kai Huang March 20, 2023, 9:34 p.m. UTC | #4
On Mon, 2023-03-20 at 14:47 +0100, Juergen Gross wrote:
> On 20.03.23 13:59, Huang, Kai wrote:
> > On Mon, 2023-03-06 at 17:34 +0100, Juergen Gross wrote:
> > > When running virtualized, MTRR access can be reduced (e.g. in Xen PV
> > > guests or when running as a SEV-SNP guest under Hyper-V). Typically
> > > the hypervisor will reset the MTRR feature in CPUID data, resulting
> > > in no MTRR memory type information being available for the kernel.
> > > 
> > > This has turned out to result in problems:
> > > 
> > > - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't
> > > - Xen PV dom0 mapping memory as WB which should be UC- instead
> > > 
> > > Solve those problems by supporting to set a static MTRR state,
> > > overwriting the empty state used today. In case such a state has been
> > > set, don't call get_mtrr_state() in mtrr_bp_init(). The set state
> > > will only be used by mtrr_type_lookup(), as in all other cases
> > > mtrr_enabled() is being checked, which will return false. Accept the
> > > overwrite call only for selected cases when running as a guest.
> > > Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by
> > > just refusing them.
> > > 
> > > 
> > [...]
> > 
> > > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > > index ee09d359e08f..49b4cc923312 100644
> > > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > > @@ -8,10 +8,12 @@
> > >   #include <linux/init.h>
> > >   #include <linux/io.h>
> > >   #include <linux/mm.h>
> > > -
> > > +#include <linux/cc_platform.h>
> > >   #include <asm/processor-flags.h>
> > >   #include <asm/cacheinfo.h>
> > >   #include <asm/cpufeature.h>
> > > +#include <asm/hypervisor.h>
> > > +#include <asm/mshyperv.h>
> > 
> > Is <asm/mshyperv.h> needed here?
> 
> Yes, for hv_is_isolation_supported().
> 
> > 
> > >   #include <asm/tlbflush.h>
> > >   #include <asm/mtrr.h>
> > >   #include <asm/msr.h>
> > > @@ -240,6 +242,48 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> > >   	return mtrr_state.def_type;
> > >   }
> > >   
> > > +/**
> > > + * mtrr_overwrite_state - set static MTRR state
> > > + *
> > > + * Used to set MTRR state via different means (e.g. with data obtained from
> > > + * a hypervisor).
> > 
> > +KVM list and KVM maintainers,
> > 
> > IIUC in the next patch, SEV-SNP guest only sets a synthetic MTRR w/o telling the
> > hypervisor (hyperv).  I think this works for SEV-SNP running on top of hyperv
> > because they have mutual understanding?
> > 
> > What about the SNP guest running on other hypervisors such as KVM?
> > 
> > Since this code covers TDX guest too, I think eventually it makes sense for TDX
> > guest to use this function too (to avoid #VE IIUC).  If want to do that, then I
> > think TDX guest should have the same mutual understanding with *ALL* hypervisor,
> > as I am not sure what's the point of making the TDX guest's MTRR behaviour
> > depending on specific hypervisor.
> 
> This series tries to fix the current fallout.
> 
> Boris Petkov asked for the hypervisor specific tests to be added, so I've
> added them after discussing the topic with him (he is the maintainer of
> this code after all).
> 
> > For now I don't see there's any use case for TDX guest to use non-WB memory type
> > (in fact, KVM always maps guest memory as WB if there's no non-coherent DMA to
> > the guest memory), so to me it seems it's OK to make a universal mutual
> > understanding that TDX guest will always have WB memory type for all memory.
> 
> I agree.
> 
> > But, I am not sure whether it's better to have a standard hypercall between
> > guest & hypervisor for this purpose so things can be more flexible?
> 
> Maybe. But for now we need to handle the current situation.
> 
> 

Agreed.  Thanks for explaining.
  
Borislav Petkov March 20, 2023, 10:42 p.m. UTC | #5
On Mon, Mar 20, 2023 at 02:47:30PM +0100, Juergen Gross wrote:
> > Since this code covers TDX guest too, I think eventually it makes sense for TDX
> > guest to use this function too (to avoid #VE IIUC).  If want to do that, then I
> > think TDX guest should have the same mutual understanding with *ALL* hypervisor,
> > as I am not sure what's the point of making the TDX guest's MTRR behaviour
> > depending on specific hypervisor.
> 
> This series tries to fix the current fallout.

We can relax the check so that it runs on TDX too. Along with a comment
above it why it needs to run on TDX.
  
Juergen Gross March 21, 2023, 6 a.m. UTC | #6
On 20.03.23 20:05, Borislav Petkov wrote:
> On Mon, Mar 06, 2023 at 05:34:16PM +0100, Juergen Gross wrote:
>> +/**
>> + * mtrr_overwrite_state - set static MTRR state
>> + *
>> + * Used to set MTRR state via different means (e.g. with data obtained from
>> + * a hypervisor).
>> + * Is allowed only for special cases when running virtualized. Must be called
>> + * from the x86_init.hyper.init_platform() hook. X86_FEATURE_MTRR must be off.
>> + */
>> +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
>> +			  mtrr_type def_type)
>> +{
>> +	unsigned int i;
>> +
>> +	if (WARN_ON(mtrr_state_set ||
>> +		    hypervisor_is_type(X86_HYPER_NATIVE) ||
> 
> Why that check?

I guess you are asking because the next test seems to catch the same case?

I think it doesn't, e.g. for the case of unknown hypervisors (which shows that
X86_HYPER_NATIVE in theory should be named X86_HYPER_NATIVE_OR_UNKNOWN, or it
should be split into X86_HYPER_NATIVE and X86_HYPER_UNKNOWN).

> 
>> +		    !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) ||
>> +		    (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
>> +		     !hv_is_isolation_supported() &&
>> +		     !cpu_feature_enabled(X86_FEATURE_XENPV) &&
>> +		     !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))))
> 
> This is unparseable. Please split it into separate checks:
> 
> 	if (WARN_ON(mtrr_state_set))
> 		return;
> 
> 	if (WARN_ON(!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)))
> 		return;
> 
> 	...
> 
> and add comments above each one why we're testing this.

Okay.

> 
> 
>> +		return;
>> +
>> +	/* Disable MTRR in order to disable MTRR modifications. */
>> +	setup_clear_cpu_cap(X86_FEATURE_MTRR);
>> +
>> +	if (var) {
>> +		if (num_var > MTRR_MAX_VAR_RANGES) {
>> +			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
>> +				num_var);
>> +			num_var = MTRR_MAX_VAR_RANGES;
>> +		}
>> +		for (i = 0; i < num_var; i++)
>> +			mtrr_state.var_ranges[i] = var[i];
>> +		num_var_ranges = num_var;
>> +	}
>> +
>> +	mtrr_state.def_type = def_type;
>> +	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
>> +
>> +	mtrr_state_set = 1;
>> +}
>> +
>>   /**
>>    * mtrr_type_lookup - look up memory type in MTRR
>>    *
>> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
>> index 7596ebeab929..5fe62ee0361b 100644
>> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
>> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
>> @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void)
>>   	const char *why = "(not available)";
>>   	unsigned int phys_addr;
>>   
>> +	if (mtrr_state.enabled) {
> 
> I'm guessing the proper detection of that weird state should be:
> 
> 	/*
> 	 * Check for the software overwrite of MTRR state, only for generic case.
> 	 * See mtrr_overwrite_state().
> 	 */
> 	if (!cpu_feature_enabled(X86_FEATURE_MTRR) &&
> 	    mtrr_state.enabled) {
> 		...

It basically doesn't matter.

The only possibility of mtrr_state.enabled to be set at this point is a
previous call of mtrr_overwrite_state().


Juergen
  
Juergen Gross March 21, 2023, 6:01 a.m. UTC | #7
On 20.03.23 23:42, Borislav Petkov wrote:
> On Mon, Mar 20, 2023 at 02:47:30PM +0100, Juergen Gross wrote:
>>> Since this code covers TDX guest too, I think eventually it makes sense for TDX
>>> guest to use this function too (to avoid #VE IIUC).  If want to do that, then I
>>> think TDX guest should have the same mutual understanding with *ALL* hypervisor,
>>> as I am not sure what's the point of making the TDX guest's MTRR behaviour
>>> depending on specific hypervisor.
>>
>> This series tries to fix the current fallout.
> 
> We can relax the check so that it runs on TDX too. Along with a comment
> above it why it needs to run on TDX.
> 

Okay, fine with me.


Juergen
  
Borislav Petkov March 21, 2023, 10:30 a.m. UTC | #8
On Tue, Mar 21, 2023 at 07:00:58AM +0100, Juergen Gross wrote:
> I guess you are asking because the next test seems to catch the same case?
> 
> I think it doesn't, e.g. for the case of unknown hypervisors (which shows that
> X86_HYPER_NATIVE in theory should be named X86_HYPER_NATIVE_OR_UNKNOWN, or it
> should be split into X86_HYPER_NATIVE and X86_HYPER_UNKNOWN).

Yeah, we don't care about unknown hypervisors. They'll crash'n'burn
anyway.

My intent is to have every case properly documented with a comment above it
instead of one huge compound conditional.

> It basically doesn't matter.

It doesn't matter now. Until someone decides to "redefine" how MTRRs
should be done again because the next representative from the virt zoo
decided to do magic pink ponies.

I'm not taking any chances anymore judging by the amount of crap that
gets sent into arch/x86/ to support some weird guest contraption.

> The only possibility of mtrr_state.enabled to be set at this point is a
> previous call of mtrr_overwrite_state().

Sure, pls make it explicit and defensive so that it is perfectly clear
what's going on.

Thx.
  
Juergen Gross March 21, 2023, 3:49 p.m. UTC | #9
On 21.03.23 11:30, Borislav Petkov wrote:
> On Tue, Mar 21, 2023 at 07:00:58AM +0100, Juergen Gross wrote:
>> I guess you are asking because the next test seems to catch the same case?
>>
>> I think it doesn't, e.g. for the case of unknown hypervisors (which shows that
>> X86_HYPER_NATIVE in theory should be named X86_HYPER_NATIVE_OR_UNKNOWN, or it
>> should be split into X86_HYPER_NATIVE and X86_HYPER_UNKNOWN).
> 
> Yeah, we don't care about unknown hypervisors. They'll crash'n'burn
> anyway.

Okay, I'll drop that test.

> My intent is to have every case properly documented with a comment above it
> instead of one huge compound conditional.
> 
>> It basically doesn't matter.
> 
> It doesn't matter now. Until someone decides to "redefine" how MTRRs
> should be done again because the next representative from the virt zoo
> decided to do magic pink ponies.
> 
> I'm not taking any chances anymore judging by the amount of crap that
> gets sent into arch/x86/ to support some weird guest contraption.
> 
>> The only possibility of mtrr_state.enabled to be set at this point is a
>> previous call of mtrr_overwrite_state().
> 
> Sure, pls make it explicit and defensive so that it is perfectly clear
> what's going on.

Okay, will do the modification you were suggesting.


Juergen
  

Patch

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f0eeaf6e5f5f..f1cb81330a64 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,6 +31,8 @@ 
  */
 # ifdef CONFIG_MTRR
 void mtrr_bp_init(void);
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+			  mtrr_type def_type);
 extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
@@ -48,6 +50,12 @@  void mtrr_disable(void);
 void mtrr_enable(void);
 void mtrr_generic_set_state(void);
 #  else
+static inline void mtrr_overwrite_state(struct mtrr_var_range *var,
+					unsigned int num_var,
+					mtrr_type def_type)
+{
+}
+
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index ee09d359e08f..49b4cc923312 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -8,10 +8,12 @@ 
 #include <linux/init.h>
 #include <linux/io.h>
 #include <linux/mm.h>
-
+#include <linux/cc_platform.h>
 #include <asm/processor-flags.h>
 #include <asm/cacheinfo.h>
 #include <asm/cpufeature.h>
+#include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
 #include <asm/tlbflush.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
@@ -240,6 +242,48 @@  static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 	return mtrr_state.def_type;
 }
 
+/**
+ * mtrr_overwrite_state - set static MTRR state
+ *
+ * Used to set MTRR state via different means (e.g. with data obtained from
+ * a hypervisor).
+ * Is allowed only for special cases when running virtualized. Must be called
+ * from the x86_init.hyper.init_platform() hook. X86_FEATURE_MTRR must be off.
+ */
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+			  mtrr_type def_type)
+{
+	unsigned int i;
+
+	if (WARN_ON(mtrr_state_set ||
+		    hypervisor_is_type(X86_HYPER_NATIVE) ||
+		    !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) ||
+		    (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
+		     !hv_is_isolation_supported() &&
+		     !cpu_feature_enabled(X86_FEATURE_XENPV) &&
+		     !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))))
+		return;
+
+	/* Disable MTRR in order to disable MTRR modifications. */
+	setup_clear_cpu_cap(X86_FEATURE_MTRR);
+
+	if (var) {
+		if (num_var > MTRR_MAX_VAR_RANGES) {
+			pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
+				num_var);
+			num_var = MTRR_MAX_VAR_RANGES;
+		}
+		for (i = 0; i < num_var; i++)
+			mtrr_state.var_ranges[i] = var[i];
+		num_var_ranges = num_var;
+	}
+
+	mtrr_state.def_type = def_type;
+	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
+
+	mtrr_state_set = 1;
+}
+
 /**
  * mtrr_type_lookup - look up memory type in MTRR
  *
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 7596ebeab929..5fe62ee0361b 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -666,6 +666,15 @@  void __init mtrr_bp_init(void)
 	const char *why = "(not available)";
 	unsigned int phys_addr;
 
+	if (mtrr_state.enabled) {
+		/* Software overwrite of MTRR state, only for generic case. */
+		mtrr_calc_physbits(true);
+		init_table();
+		pr_info("MTRRs set to read-only\n");
+
+		return;
+	}
+
 	phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR));
 
 	if (boot_cpu_has(X86_FEATURE_MTRR)) {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 16babff771bd..0cccfeb67c3a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1037,6 +1037,8 @@  void __init setup_arch(char **cmdline_p)
 	/*
 	 * VMware detection requires dmi to be available, so this
 	 * needs to be done after dmi_setup(), for the boot CPU.
+	 * For some guest types (Xen PV, SEV-SNP, TDX) it is required to be
+	 * called before cache_bp_init() for setting up MTRR state.
 	 */
 	init_hypervisor_platform();