[6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

Message ID 20231010200220.897953-7-john.allen@amd.com
State New
Headers
Series SVM guest shadow stack support |

Commit Message

John Allen Oct. 10, 2023, 8:02 p.m. UTC
  When a guest issues a cpuid instruction for Fn0000000D_x0B
(CetUserOffset), KVM will intercept and need to access the guest
MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
included in the GHCB to be visible to the hypervisor.

Signed-off-by: John Allen <john.allen@amd.com>
---
 arch/x86/include/asm/svm.h |  1 +
 arch/x86/kvm/svm/sev.c     | 12 ++++++++++--
 arch/x86/kvm/svm/svm.c     |  1 +
 arch/x86/kvm/svm/svm.h     |  3 ++-
 4 files changed, 14 insertions(+), 3 deletions(-)
  

Comments

Sean Christopherson Oct. 14, 2023, 12:31 a.m. UTC | #1
On Tue, Oct 10, 2023, John Allen wrote:
> When a guest issues a cpuid instruction for Fn0000000D_x0B
> (CetUserOffset), KVM will intercept and need to access the guest
> MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
> included in the GHCB to be visible to the hypervisor.
> 
> Signed-off-by: John Allen <john.allen@amd.com>
> ---
> @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
>  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
>  	}
> +
> +	if (kvm_caps.supported_xss)
> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);

This creates a giant gaping virtualization hole for the guest to walk through.
Want to hide shadow stacks from the guest because hardware is broken?  Too bad,
the guest can still set any and all XFeature bits it wants!  I realize "KVM"
already creates such a hole by disabling interception of XSETBV, but that doesn't
make it right.  In quotes because it's not like KVM has a choice for SEV-ES guests.

This is another example of SEV-SNP and beyond clearly being designed to work with
a paravisor, SVM_VMGEXIT_AP_CREATE being the other big one that comes to mind.

KVM should at least block CET by killing the VM if the guest illegally sets
CR4.CET.  Ugh, and is that even possible with SVM_VMGEXIT_AP_CREATE?  The guest
can shove in whatever CR4 it wants, so long as it the value is supported by
hardware.

At what point do we bite the bullet and require a paravisor?  Because the more I
see of SNP, the more I'm convinced that it's not entirely safe to run untrusted
guest code at VMPL0.
  
Maxim Levitsky Nov. 2, 2023, 6:10 p.m. UTC | #2
On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> When a guest issues a cpuid instruction for Fn0000000D_x0B
> (CetUserOffset), KVM will intercept and need to access the guest
> MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
> included in the GHCB to be visible to the hypervisor.
> 
> Signed-off-by: John Allen <john.allen@amd.com>
> ---
>  arch/x86/include/asm/svm.h |  1 +
>  arch/x86/kvm/svm/sev.c     | 12 ++++++++++--
>  arch/x86/kvm/svm/svm.c     |  1 +
>  arch/x86/kvm/svm/svm.h     |  3 ++-
>  4 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 568d97084e44..5afc9e03379d 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -678,5 +678,6 @@ DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
>  DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
>  DEFINE_GHCB_ACCESSORS(sw_scratch)
>  DEFINE_GHCB_ACCESSORS(xcr0)
> +DEFINE_GHCB_ACCESSORS(xss)

I don't see anywhere in the patch adding xss to ghcb_save_area.
What kernel version/commit these patches are based on?

>  
>  #endif
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index bb4b18baa6f7..94ab7203525f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2445,8 +2445,13 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>  
>  	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
>  
> -	if (kvm_ghcb_xcr0_is_valid(svm)) {
> -		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> +	if (kvm_ghcb_xcr0_is_valid(svm) || kvm_ghcb_xss_is_valid(svm)) {
> +		if (kvm_ghcb_xcr0_is_valid(svm))
> +			vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> +
> +		if (kvm_ghcb_xss_is_valid(svm))
> +			vcpu->arch.ia32_xss = ghcb_get_xss(ghcb);
> +
>  		kvm_update_cpuid_runtime(vcpu);
>  	}
>  
> @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
>  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
>  	}
> +
> +	if (kvm_caps.supported_xss)
> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);

This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
to whatever value it wants, and thus it might allow XSAVES to access some host msrs
that guest must not be able to access.

AMD might not yet have such msrs, but on Intel side I do see various components
like 'HDC State', 'HWP state' and such.

I understand that this is needed so that #VC handler could read this msr, and trying
to read it will cause another #VC which is probably not allowed (I don't know this detail of SEV-ES)

I guess #VC handler should instead use a kernel cached value of this msr instead, or at least
KVM should only allow reads and not writes to it.

In addition to that, if we decide to open the read access to the IA32_XSS from the guest,
this IMHO should be done in a separate patch.

Best regards,
	Maxim Levitsky


>  }
>  
>  void sev_init_vmcb(struct vcpu_svm *svm)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 984e89d7a734..ee7c7d0a09ab 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -146,6 +146,7 @@ static const struct svm_direct_access_msrs {
>  	{ .index = MSR_IA32_PL1_SSP,                    .always = false },
>  	{ .index = MSR_IA32_PL2_SSP,                    .always = false },
>  	{ .index = MSR_IA32_PL3_SSP,                    .always = false },
> +	{ .index = MSR_IA32_XSS,                        .always = false },
>  	{ .index = MSR_INVALID,				.always = false },
>  };
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index bdc39003b955..2011456d2e9f 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -30,7 +30,7 @@
>  #define	IOPM_SIZE PAGE_SIZE * 3
>  #define	MSRPM_SIZE PAGE_SIZE * 2
>  
> -#define MAX_DIRECT_ACCESS_MSRS	53
> +#define MAX_DIRECT_ACCESS_MSRS	54
>  #define MSRPM_OFFSETS	32
>  extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>  extern bool npt_enabled;
> @@ -720,5 +720,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
>  DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
>  DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
>  DEFINE_KVM_GHCB_ACCESSORS(xcr0)
> +DEFINE_KVM_GHCB_ACCESSORS(xss)
>  
>  #endif
  
Sean Christopherson Nov. 2, 2023, 11:22 p.m. UTC | #3
On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> >  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> >  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> >  	}
> > +
> > +	if (kvm_caps.supported_xss)
> > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> 
> This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> that guest must not be able to access.
> 
> AMD might not yet have such msrs, but on Intel side I do see various components
> like 'HDC State', 'HWP state' and such.

The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
that the guest can access.  So, in theory, if/when AMD adds more XCR0/XSS-based
features, that state will also be context switched.

Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
just horrific.

> I understand that this is needed so that #VC handler could read this msr, and
> trying to read it will cause another #VC which is probably not allowed (I
> don't know this detail of SEV-ES)
> 
> I guess #VC handler should instead use a kernel cached value of this msr
> instead, or at least KVM should only allow reads and not writes to it.

Nope, doesn't work.  In addition to automatically context switching state, SEV-ES
also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
for the guest, because KVM *can't* load the guest's desired value into hardware.

The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
or XCR0, and there's not a damn thing KVM can do to service the request.
  
Maxim Levitsky Nov. 7, 2023, 6:20 p.m. UTC | #4
On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > >  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > >  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > >  	}
> > > +
> > > +	if (kvm_caps.supported_xss)
> > > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > 
> > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > that guest must not be able to access.
> > 
> > AMD might not yet have such msrs, but on Intel side I do see various components
> > like 'HDC State', 'HWP state' and such.
> 
> The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> that the guest can access.  So, in theory, if/when AMD adds more XCR0/XSS-based
> features, that state will also be context switched.
> 
> Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> just horrific.
> 
> > I understand that this is needed so that #VC handler could read this msr, and
> > trying to read it will cause another #VC which is probably not allowed (I
> > don't know this detail of SEV-ES)
> > 
> > I guess #VC handler should instead use a kernel cached value of this msr
> > instead, or at least KVM should only allow reads and not writes to it.
> 
> Nope, doesn't work.  In addition to automatically context switching state, SEV-ES
> also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> for the guest, because KVM *can't* load the guest's desired value into hardware.
> 
> The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> or XCR0, and there's not a damn thing KVM can do to service the request.
> 

Ah, I understand now. Everything makes sense, and yes, this is really ugly.

Best regards,
	Maxim Levitsky
  
John Allen Feb. 15, 2024, 5:39 p.m. UTC | #5
On Tue, Nov 07, 2023 at 08:20:52PM +0200, Maxim Levitsky wrote:
> On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> > On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > > >  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > > >  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > > >  	}
> > > > +
> > > > +	if (kvm_caps.supported_xss)
> > > > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > > 
> > > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > > that guest must not be able to access.
> > > 
> > > AMD might not yet have such msrs, but on Intel side I do see various components
> > > like 'HDC State', 'HWP state' and such.
> > 
> > The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> > that the guest can access.  So, in theory, if/when AMD adds more XCR0/XSS-based
> > features, that state will also be context switched.
> > 
> > Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> > just horrific.
> > 
> > > I understand that this is needed so that #VC handler could read this msr, and
> > > trying to read it will cause another #VC which is probably not allowed (I
> > > don't know this detail of SEV-ES)
> > > 
> > > I guess #VC handler should instead use a kernel cached value of this msr
> > > instead, or at least KVM should only allow reads and not writes to it.
> > 
> > Nope, doesn't work.  In addition to automatically context switching state, SEV-ES
> > also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> > for the guest, because KVM *can't* load the guest's desired value into hardware.
> > 
> > The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> > or XCR0, and there's not a damn thing KVM can do to service the request.
> > 
> 
> Ah, I understand now. Everything makes sense, and yes, this is really ugly.

Hi Maxim and Sean,

It looks as though there are some broad changes that will need to happen
over the long term WRT to SEV-ES/SEV-SNP. In the short term, how would
you suggest I proceed with the SVM shstk series? Can we omit the SEV-ES
changes for now with an additional patch that disallows guest shstk when
SEV-ES is enabled? Subsequently, when we have a proper solution for the
concerns discussed here, we could submit another series for SEV-ES
support.

Thanks,
John
  
Sean Christopherson Feb. 20, 2024, 4:20 p.m. UTC | #6
On Thu, Feb 15, 2024, John Allen wrote:
> On Tue, Nov 07, 2023 at 08:20:52PM +0200, Maxim Levitsky wrote:
> > On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> > > On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > > > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > > > >  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > > > >  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > > > >  	}
> > > > > +
> > > > > +	if (kvm_caps.supported_xss)
> > > > > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > > > 
> > > > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > > > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > > > that guest must not be able to access.
> > > > 
> > > > AMD might not yet have such msrs, but on Intel side I do see various components
> > > > like 'HDC State', 'HWP state' and such.
> > > 
> > > The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> > > that the guest can access.  So, in theory, if/when AMD adds more XCR0/XSS-based
> > > features, that state will also be context switched.
> > > 
> > > Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> > > just horrific.
> > > 
> > > > I understand that this is needed so that #VC handler could read this msr, and
> > > > trying to read it will cause another #VC which is probably not allowed (I
> > > > don't know this detail of SEV-ES)
> > > > 
> > > > I guess #VC handler should instead use a kernel cached value of this msr
> > > > instead, or at least KVM should only allow reads and not writes to it.
> > > 
> > > Nope, doesn't work.  In addition to automatically context switching state, SEV-ES
> > > also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> > > for the guest, because KVM *can't* load the guest's desired value into hardware.
> > > 
> > > The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> > > or XCR0, and there's not a damn thing KVM can do to service the request.
> > > 
> > 
> > Ah, I understand now. Everything makes sense, and yes, this is really ugly.
> 
> Hi Maxim and Sean,
> 
> It looks as though there are some broad changes that will need to happen
> over the long term WRT to SEV-ES/SEV-SNP. In the short term, how would
> you suggest I proceed with the SVM shstk series? Can we omit the SEV-ES
> changes for now with an additional patch that disallows guest shstk when
> SEV-ES is enabled? Subsequently, when we have a proper solution for the
> concerns discussed here, we could submit another series for SEV-ES
> support.

The SEV-ES mess was already addressed by commit a26b7cd22546 ("KVM: SEV: Do not
intercept accesses to MSR_IA32_XSS for SEV-ES guests").  Or is there more that's
needed for shadow stacks?
  
John Allen Feb. 20, 2024, 4:33 p.m. UTC | #7
On Tue, Feb 20, 2024 at 08:20:36AM -0800, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, John Allen wrote:
> > On Tue, Nov 07, 2023 at 08:20:52PM +0200, Maxim Levitsky wrote:
> > > On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> > > > On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > > > > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > > > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > > > > >  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > > > > >  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > > > > >  	}
> > > > > > +
> > > > > > +	if (kvm_caps.supported_xss)
> > > > > > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > > > > 
> > > > > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > > > > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > > > > that guest must not be able to access.
> > > > > 
> > > > > AMD might not yet have such msrs, but on Intel side I do see various components
> > > > > like 'HDC State', 'HWP state' and such.
> > > > 
> > > > The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> > > > that the guest can access.  So, in theory, if/when AMD adds more XCR0/XSS-based
> > > > features, that state will also be context switched.
> > > > 
> > > > Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> > > > just horrific.
> > > > 
> > > > > I understand that this is needed so that #VC handler could read this msr, and
> > > > > trying to read it will cause another #VC which is probably not allowed (I
> > > > > don't know this detail of SEV-ES)
> > > > > 
> > > > > I guess #VC handler should instead use a kernel cached value of this msr
> > > > > instead, or at least KVM should only allow reads and not writes to it.
> > > > 
> > > > Nope, doesn't work.  In addition to automatically context switching state, SEV-ES
> > > > also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> > > > for the guest, because KVM *can't* load the guest's desired value into hardware.
> > > > 
> > > > The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> > > > or XCR0, and there's not a damn thing KVM can do to service the request.
> > > > 
> > > 
> > > Ah, I understand now. Everything makes sense, and yes, this is really ugly.
> > 
> > Hi Maxim and Sean,
> > 
> > It looks as though there are some broad changes that will need to happen
> > over the long term WRT to SEV-ES/SEV-SNP. In the short term, how would
> > you suggest I proceed with the SVM shstk series? Can we omit the SEV-ES
> > changes for now with an additional patch that disallows guest shstk when
> > SEV-ES is enabled? Subsequently, when we have a proper solution for the
> > concerns discussed here, we could submit another series for SEV-ES
> > support.
> 
> The SEV-ES mess was already addressed by commit a26b7cd22546 ("KVM: SEV: Do not
> intercept accesses to MSR_IA32_XSS for SEV-ES guests").  Or is there more that's
> needed for shadow stacks?

Ah, yes, you are right. That patch should address the controversial
change discussed above at least. Patch 5/9 and 7/9 of this series also
address different SEV-ES issues and will still need to included.

Thanks,
John
  
John Allen Feb. 21, 2024, 4:38 p.m. UTC | #8
On Thu, Nov 02, 2023 at 08:10:58PM +0200, Maxim Levitsky wrote:
> On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > When a guest issues a cpuid instruction for Fn0000000D_x0B
> > (CetUserOffset), KVM will intercept and need to access the guest
> > MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
> > included in the GHCB to be visible to the hypervisor.
> > 
> > Signed-off-by: John Allen <john.allen@amd.com>
> > ---
> >  arch/x86/include/asm/svm.h |  1 +
> >  arch/x86/kvm/svm/sev.c     | 12 ++++++++++--
> >  arch/x86/kvm/svm/svm.c     |  1 +
> >  arch/x86/kvm/svm/svm.h     |  3 ++-
> >  4 files changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > index 568d97084e44..5afc9e03379d 100644
> > --- a/arch/x86/include/asm/svm.h
> > +++ b/arch/x86/include/asm/svm.h
> > @@ -678,5 +678,6 @@ DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
> >  DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
> >  DEFINE_GHCB_ACCESSORS(sw_scratch)
> >  DEFINE_GHCB_ACCESSORS(xcr0)
> > +DEFINE_GHCB_ACCESSORS(xss)
> 
> I don't see anywhere in the patch adding xss to ghcb_save_area.
> What kernel version/commit these patches are based on?

Looks like it first got added to the vmcb save area here:
861377730aa9db4cbaa0f3bd3f4d295c152732c4

It was included in the ghcb save area when it was created it looks
like:
a4690359eaec985a1351786da887df1ba92440a0

Unless I'm misunderstanding the ask. Is there somewhere else this needs
to be added?

Thanks,
John

> 
> >  
> >  #endif
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index bb4b18baa6f7..94ab7203525f 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -2445,8 +2445,13 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >  
> >  	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
> >  
> > -	if (kvm_ghcb_xcr0_is_valid(svm)) {
> > -		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> > +	if (kvm_ghcb_xcr0_is_valid(svm) || kvm_ghcb_xss_is_valid(svm)) {
> > +		if (kvm_ghcb_xcr0_is_valid(svm))
> > +			vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> > +
> > +		if (kvm_ghcb_xss_is_valid(svm))
> > +			vcpu->arch.ia32_xss = ghcb_get_xss(ghcb);
> > +
> >  		kvm_update_cpuid_runtime(vcpu);
> >  	}
> >  
> > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> >  		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> >  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> >  	}
> > +
> > +	if (kvm_caps.supported_xss)
> > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> 
> This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> that guest must not be able to access.
> 
> AMD might not yet have such msrs, but on Intel side I do see various components
> like 'HDC State', 'HWP state' and such.
> 
> I understand that this is needed so that #VC handler could read this msr, and trying
> to read it will cause another #VC which is probably not allowed (I don't know this detail of SEV-ES)
> 
> I guess #VC handler should instead use a kernel cached value of this msr instead, or at least
> KVM should only allow reads and not writes to it.
> 
> In addition to that, if we decide to open the read access to the IA32_XSS from the guest,
> this IMHO should be done in a separate patch.
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> >  }
> >  
> >  void sev_init_vmcb(struct vcpu_svm *svm)
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 984e89d7a734..ee7c7d0a09ab 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -146,6 +146,7 @@ static const struct svm_direct_access_msrs {
> >  	{ .index = MSR_IA32_PL1_SSP,                    .always = false },
> >  	{ .index = MSR_IA32_PL2_SSP,                    .always = false },
> >  	{ .index = MSR_IA32_PL3_SSP,                    .always = false },
> > +	{ .index = MSR_IA32_XSS,                        .always = false },
> >  	{ .index = MSR_INVALID,				.always = false },
> >  };
> >  
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index bdc39003b955..2011456d2e9f 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -30,7 +30,7 @@
> >  #define	IOPM_SIZE PAGE_SIZE * 3
> >  #define	MSRPM_SIZE PAGE_SIZE * 2
> >  
> > -#define MAX_DIRECT_ACCESS_MSRS	53
> > +#define MAX_DIRECT_ACCESS_MSRS	54
> >  #define MSRPM_OFFSETS	32
> >  extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> >  extern bool npt_enabled;
> > @@ -720,5 +720,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
> >  DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
> >  DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
> >  DEFINE_KVM_GHCB_ACCESSORS(xcr0)
> > +DEFINE_KVM_GHCB_ACCESSORS(xss)
> >  
> >  #endif
> 
> 
> 
>
  

Patch

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 568d97084e44..5afc9e03379d 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -678,5 +678,6 @@  DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
 DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
 DEFINE_GHCB_ACCESSORS(sw_scratch)
 DEFINE_GHCB_ACCESSORS(xcr0)
+DEFINE_GHCB_ACCESSORS(xss)
 
 #endif
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bb4b18baa6f7..94ab7203525f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2445,8 +2445,13 @@  static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 
 	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
 
-	if (kvm_ghcb_xcr0_is_valid(svm)) {
-		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+	if (kvm_ghcb_xcr0_is_valid(svm) || kvm_ghcb_xss_is_valid(svm)) {
+		if (kvm_ghcb_xcr0_is_valid(svm))
+			vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+
+		if (kvm_ghcb_xss_is_valid(svm))
+			vcpu->arch.ia32_xss = ghcb_get_xss(ghcb);
+
 		kvm_update_cpuid_runtime(vcpu);
 	}
 
@@ -3032,6 +3037,9 @@  static void sev_es_init_vmcb(struct vcpu_svm *svm)
 		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
 			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
 	}
+
+	if (kvm_caps.supported_xss)
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 984e89d7a734..ee7c7d0a09ab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -146,6 +146,7 @@  static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_PL1_SSP,                    .always = false },
 	{ .index = MSR_IA32_PL2_SSP,                    .always = false },
 	{ .index = MSR_IA32_PL3_SSP,                    .always = false },
+	{ .index = MSR_IA32_XSS,                        .always = false },
 	{ .index = MSR_INVALID,				.always = false },
 };
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bdc39003b955..2011456d2e9f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@ 
 #define	IOPM_SIZE PAGE_SIZE * 3
 #define	MSRPM_SIZE PAGE_SIZE * 2
 
-#define MAX_DIRECT_ACCESS_MSRS	53
+#define MAX_DIRECT_ACCESS_MSRS	54
 #define MSRPM_OFFSETS	32
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
@@ -720,5 +720,6 @@  DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
 DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
 DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
 DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)
 
 #endif