[kernel,v3,2/3] KVM: SEV: Enable data breakpoints in SEV-ES

Message ID 20230120031047.628097-3-aik@amd.com
State New
Headers
Series KVM: SEV: Enable AMD SEV-ES DebugSwap |

Commit Message

Alexey Kardashevskiy Jan. 20, 2023, 3:10 a.m. UTC
  Prior to SEV-ES, KVM stored/loaded host debug registers upon switching
to/from a VM. Changing those registers inside a running SEV VM
triggered #VC exit to KVM.

SEV-ES added the encrypted state (ES) which uses an encrypted guest page
for the VM state (VMSA). The hardware saves/restores certain registers on
VMRUN/VMEXIT according to a swap type (A, B, C), see
"Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
volume 2.

AMD Milan (Fam 19h) introduces support for the debug registers swapping.
DR6 and DR7 are always swapped. DR[0-3] and DR[0-3]_ADDR_MASK are swapped
a type B when SEV_FEATURES[5] ("DebugSwap") is set.

Enable DebugSwap in VMSA. But only do so if CPUID Fn80000021_EAX[0]
("NoNestedDataBp", "Processor ignores nested data breakpoints") is
supported by the SOC as otherwise a malicious SEV-ES guest can set up
data breakpoints on the #VC IDT entry/stack and cause an infinite loop.

Eliminate DR7 and #DB intercepts as:
- they are not needed when DebugSwap is supported;
- #VC for these intercepts is most likely not supported anyway and
kills the VM.
Keep DR7 intercepted unless DebugSwap enabled to prevent the infinite #DB
loop DoS.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
Changes:
v3:
* rewrote the commit log again
* rebased on tip/master to use recently defined X86_FEATURE_NO_NESTED_DATA_BP
* s/boot_cpu_has/cpu_feature_enabled/

v2:
* debug_swap moved from vcpu to module_param
* rewrote commit log

---
Tested with:
===
int x;
int main(int argc, char *argv[])
{
        x = 1;
        return 0;
}
===
gcc -g a.c
rsync a.out ruby-954vm:~/
ssh -t ruby-954vm 'gdb -ex "file a.out" -ex "watch x" -ex r'

where ruby-954vm is a VM.

With "/sys/module/kvm_amd/parameters/debug_swap = 0", gdb does not stop
on the watchpoint, with "= 1" - gdb does.
---
 arch/x86/include/asm/svm.h |  1 +
 arch/x86/kvm/svm/svm.h     | 16 ++++++++---
 arch/x86/kvm/svm/sev.c     | 29 ++++++++++++++++++++
 arch/x86/kvm/svm/svm.c     |  3 +-
 4 files changed, 44 insertions(+), 5 deletions(-)
  

Comments

Borislav Petkov Jan. 31, 2023, 7:22 p.m. UTC | #1
Hey Sean,

On Fri, Jan 20, 2023 at 02:10:46PM +1100, Alexey Kardashevskiy wrote:
> Prior to SEV-ES, KVM stored/loaded host debug registers upon switching
> to/from a VM. Changing those registers inside a running SEV VM
> triggered #VC exit to KVM.
> 
> SEV-ES added the encrypted state (ES) which uses an encrypted guest page
> for the VM state (VMSA). The hardware saves/restores certain registers on
> VMRUN/VMEXIT according to a swap type (A, B, C), see
> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> volume 2.
> 
> AMD Milan (Fam 19h) introduces support for the debug registers swapping.
> DR6 and DR7 are always swapped. DR[0-3] and DR[0-3]_ADDR_MASK are swapped
> a type B when SEV_FEATURES[5] ("DebugSwap") is set.
> 
> Enable DebugSwap in VMSA. But only do so if CPUID Fn80000021_EAX[0]
> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> supported by the SOC as otherwise a malicious SEV-ES guest can set up
> data breakpoints on the #VC IDT entry/stack and cause an infinite loop.
> 
> Eliminate DR7 and #DB intercepts as:
> - they are not needed when DebugSwap is supported;
> - #VC for these intercepts is most likely not supported anyway and
> kills the VM.
> Keep DR7 intercepted unless DebugSwap enabled to prevent the infinite #DB
> loop DoS.

...

ok to take this through the tip tree?

Thx.
  
Sean Christopherson Feb. 1, 2023, 2:18 a.m. UTC | #2
On Fri, Jan 20, 2023, Alexey Kardashevskiy wrote:
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 4826e6cc611b..61f2cad1cbaf 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -389,6 +389,8 @@ static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u3
>  	return test_bit(bit, (unsigned long *)&control->intercepts);
>  }
>  
> +extern bool sev_es_is_debug_swap_enabled(void);
> +
>  static inline void set_dr_intercepts(struct vcpu_svm *svm)
>  {
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
> @@ -410,8 +412,10 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
>  		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
>  	}
>  
> -	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> -	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> +	if (!sev_es_guest(svm->vcpu.kvm) || !sev_es_is_debug_swap_enabled()) {

Looking below, doesn't this do the wrong thing if set_dr_intercepts() is called
before SVM_SEV_FEAT_DEBUG_SWAP is set?  I.e. when this is called before LAUNCH_UPDATE?
Seems like this should check SVM_SEV_FEAT_DEBUG_SWAP in sev_features regardless
of when SVM_SEV_FEAT_DEBUG_SWAP is set.

And if KVM checks sev_features, then I _think_ we can avoid having to expose
sev_es_debug_swap_enabled to svm.{c,h} (though why on earth {set,clr}_dr_intercepts()
is in svm.h is another question for the future).

Follow-up question: does KVM _have_ to wait until KVM_SEV_LAUNCH_UPDATE_VMSA to
set the flag?

> +		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> +		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> +	}
>  
>  	recalc_intercepts(svm);
>  }
> @@ -422,8 +426,12 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
>  
>  	vmcb->control.intercepts[INTERCEPT_DR] = 0;
>  
> -	/* DR7 access must remain intercepted for an SEV-ES guest */
> -	if (sev_es_guest(svm->vcpu.kvm)) {
> +	/*
> +	 * DR7 access must remain intercepted for an SEV-ES guest unless DebugSwap
> +	 * (depends on NO_NESTED_DATA_BP) is enabled as otherwise a VM writing to DR7
> +	 * from the #DB handler may trigger infinite loop of #DB's.
> +	 */
> +	if (sev_es_guest(svm->vcpu.kvm) && !sev_es_is_debug_swap_enabled()) {
>  		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
>  		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
>  	}
>
> @@ -52,11 +53,21 @@ module_param_named(sev, sev_enabled, bool, 0444);
>  /* enable/disable SEV-ES support */
>  static bool sev_es_enabled = true;
>  module_param_named(sev_es, sev_es_enabled, bool, 0444);
> +
> +/* enable/disable SEV-ES DebugSwap support */
> +static bool sev_es_debug_swap_enabled = true;
> +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0644);

Module param needs 0444 permissions, i.e. shouldn't be writable after KVM is
loaded.  Though I don't know that providing a module param is warranted in this
case.  KVM provides module params for SEV and SEV-ES because there are legitimate
reasons to turn them off, but at a glance, I don't see why we'd want that for this
feature.

>  #else
>  #define sev_enabled false
>  #define sev_es_enabled false
> +#define sev_es_debug_swap false

This needs to be sev_es_debug_swap_enabled, otherwise things fall apart with
CONFIG_KVM_AMD_SEV=n.

arch/x86/kvm/svm/sev.c: In function ‘sev_es_is_debug_swap_enabled’:
arch/x86/kvm/svm/sev.c:69:16: error: ‘sev_es_debug_swap_enabled’ undeclared (first use in this function); did you mean ‘sev_es_is_debug_swap_enabled’?
   69 |         return sev_es_debug_swap_enabled;
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                sev_es_is_debug_swap_enabled


>  #endif /* CONFIG_KVM_AMD_SEV */
>  
> +bool sev_es_is_debug_swap_enabled(void)
> +{
> +	return sev_es_debug_swap_enabled;
> +}

...

> @@ -604,6 +615,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>  	save->xss  = svm->vcpu.arch.ia32_xss;
>  	save->dr6  = svm->vcpu.arch.dr6;
>  
> +	if (sev_es_is_debug_swap_enabled())
> +		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
> +
>  	pr_debug("Virtual Machine Save Area (VMSA):\n");
>  	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>  
> @@ -2249,6 +2263,9 @@ void __init sev_hardware_setup(void)
>  out:
>  	sev_enabled = sev_supported;
>  	sev_es_enabled = sev_es_supported;
> +	if (sev_es_debug_swap_enabled)
> +		sev_es_debug_swap_enabled = sev_es_enabled &&
> +			cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP);

Slight preference for:

	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
		sev_es_debug_swap_enabled = false;

KVM does short-circuit some checks on module param values, but usually only to
avoid additional setup.

>  #endif
>  }
>  
> @@ -3027,6 +3044,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>  
>  	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>  	hostsa->xss = host_xss;
> +
> +	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> +	if (sev_es_is_debug_swap_enabled()) {
> +		hostsa->dr0 = native_get_debugreg(0);
> +		hostsa->dr1 = native_get_debugreg(1);
> +		hostsa->dr2 = native_get_debugreg(2);
> +		hostsa->dr3 = native_get_debugreg(3);
> +		hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
> +		hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
> +		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
> +		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
> +	}
>  }
>  
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 60c7c880266b..6c54a3c9d442 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1190,7 +1190,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  	set_exception_intercept(svm, UD_VECTOR);
>  	set_exception_intercept(svm, MC_VECTOR);
>  	set_exception_intercept(svm, AC_VECTOR);
> -	set_exception_intercept(svm, DB_VECTOR);
> +	if (!sev_es_is_debug_swap_enabled())
> +		set_exception_intercept(svm, DB_VECTOR);

This is wrong.  KVM needs to intercept #DBs when debugging non-SEV-ES VMs.
This _could_ be tied to X86_FEATURE_NO_NESTED_DATA_BP, but the KVM would need to
toggle the intercept depending on whether or not userspace wants to debug the
guest.

Similar to the DR7 interception, can this check sev_features directly?
  
Sean Christopherson Feb. 1, 2023, 2:20 a.m. UTC | #3
On Tue, Jan 31, 2023, Borislav Petkov wrote:
> Hey Sean,
> 
> On Fri, Jan 20, 2023 at 02:10:46PM +1100, Alexey Kardashevskiy wrote:
> > Prior to SEV-ES, KVM stored/loaded host debug registers upon switching
> > to/from a VM. Changing those registers inside a running SEV VM
> > triggered #VC exit to KVM.
> > 
> > SEV-ES added the encrypted state (ES) which uses an encrypted guest page
> > for the VM state (VMSA). The hardware saves/restores certain registers on
> > VMRUN/VMEXIT according to a swap type (A, B, C), see
> > "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> > volume 2.
> > 
> > AMD Milan (Fam 19h) introduces support for the debug registers swapping.
> > DR6 and DR7 are always swapped. DR[0-3] and DR[0-3]_ADDR_MASK are swapped
> > a type B when SEV_FEATURES[5] ("DebugSwap") is set.
> > 
> > Enable DebugSwap in VMSA. But only do so if CPUID Fn80000021_EAX[0]
> > ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> > supported by the SOC as otherwise a malicious SEV-ES guest can set up
> > data breakpoints on the #VC IDT entry/stack and cause an infinite loop.
> > 
> > Eliminate DR7 and #DB intercepts as:
> > - they are not needed when DebugSwap is supported;
> > - #VC for these intercepts is most likely not supported anyway and
> > kills the VM.
> > Keep DR7 intercepted unless DebugSwap enabled to prevent the infinite #DB
> > loop DoS.
> 
> ...
> 
> ok to take this through the tip tree?

I would prefer to take this through KVM, there's enough subtle complexity in this
code that it'd be nice to have it close by.

If you're happy with patch 1, maybe ack that one and take it through KVM, and
route patch 3 through tip?
  
Sean Christopherson Feb. 1, 2023, 7:32 p.m. UTC | #4
On Wed, Feb 01, 2023, Sean Christopherson wrote:
> On Tue, Jan 31, 2023, Borislav Petkov wrote:
> > Hey Sean,
> > 
> > On Fri, Jan 20, 2023 at 02:10:46PM +1100, Alexey Kardashevskiy wrote:
> > > Prior to SEV-ES, KVM stored/loaded host debug registers upon switching
> > > to/from a VM. Changing those registers inside a running SEV VM
> > > triggered #VC exit to KVM.
> > > 
> > > SEV-ES added the encrypted state (ES) which uses an encrypted guest page
> > > for the VM state (VMSA). The hardware saves/restores certain registers on
> > > VMRUN/VMEXIT according to a swap type (A, B, C), see
> > > "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> > > volume 2.
> > > 
> > > AMD Milan (Fam 19h) introduces support for the debug registers swapping.
> > > DR6 and DR7 are always swapped. DR[0-3] and DR[0-3]_ADDR_MASK are swapped
> > > a type B when SEV_FEATURES[5] ("DebugSwap") is set.
> > > 
> > > Enable DebugSwap in VMSA. But only do so if CPUID Fn80000021_EAX[0]
> > > ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> > > supported by the SOC as otherwise a malicious SEV-ES guest can set up
> > > data breakpoints on the #VC IDT entry/stack and cause an infinite loop.
> > > 
> > > Eliminate DR7 and #DB intercepts as:
> > > - they are not needed when DebugSwap is supported;
> > > - #VC for these intercepts is most likely not supported anyway and
> > > kills the VM.
> > > Keep DR7 intercepted unless DebugSwap enabled to prevent the infinite #DB
> > > loop DoS.
> > 
> > ...
> > 
> > ok to take this through the tip tree?
> 
> I would prefer to take this through KVM, there's enough subtle complexity in this
> code that it'd be nice to have it close by.
> 
> If you're happy with patch 1, maybe ack that one and take it through KVM, and
> route patch 3 through tip?

Ah, you've already applied 1.  That works too.  I don't think KVM support for
DebugSwap is going to make v6.3 no matter who takes what, so just base on the
next version of this patch on tip/x86/cpu and I'll make a mental note to not try
to grab this until after v6.3-rc1.
  
Alexey Kardashevskiy Feb. 3, 2023, 3:37 a.m. UTC | #5
On 01/02/2023 13:18, Sean Christopherson wrote:
> On Fri, Jan 20, 2023, Alexey Kardashevskiy wrote:
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 4826e6cc611b..61f2cad1cbaf 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -389,6 +389,8 @@ static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u3
>>   	return test_bit(bit, (unsigned long *)&control->intercepts);
>>   }
>>   
>> +extern bool sev_es_is_debug_swap_enabled(void);
>> +
>>   static inline void set_dr_intercepts(struct vcpu_svm *svm)
>>   {
>>   	struct vmcb *vmcb = svm->vmcb01.ptr;
>> @@ -410,8 +412,10 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
>>   		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
>>   	}
>>   
>> -	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
>> -	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
>> +	if (!sev_es_guest(svm->vcpu.kvm) || !sev_es_is_debug_swap_enabled()) {
> 
> Looking below, doesn't this do the wrong thing if set_dr_intercepts() is called
> before SVM_SEV_FEAT_DEBUG_SWAP is set?  I.e. when this is called before LAUNCH_UPDATE?
> Seems like this should check SVM_SEV_FEAT_DEBUG_SWAP in sev_features regardless
> of when SVM_SEV_FEAT_DEBUG_SWAP is set.
> 
> And if KVM checks sev_features, then I _think_ we can avoid having to expose
> sev_es_debug_swap_enabled to svm.{c,h} (though why on earth {set,clr}_dr_intercepts()
> is in svm.h is another question for the future).


883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c") 
did that. Makes sense for things like vmcb_set_intercept() but 
{set,clr}_dr_intercepts() are still only called from svm.c so I'll move 
them there (btw do I need a separate patch for that? usually yes)

> 
> Follow-up question: does KVM _have_ to wait until KVM_SEV_LAUNCH_UPDATE_VMSA to
> set the flag?

Nope. Will repost soon as a reply to this mail.

>> +		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
>> +		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
>> +	}
>>   
>>   	recalc_intercepts(svm);
>>   }
>> @@ -422,8 +426,12 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
>>   
>>   	vmcb->control.intercepts[INTERCEPT_DR] = 0;
>>   
>> -	/* DR7 access must remain intercepted for an SEV-ES guest */
>> -	if (sev_es_guest(svm->vcpu.kvm)) {
>> +	/*
>> +	 * DR7 access must remain intercepted for an SEV-ES guest unless DebugSwap
>> +	 * (depends on NO_NESTED_DATA_BP) is enabled as otherwise a VM writing to DR7
>> +	 * from the #DB handler may trigger infinite loop of #DB's.
>> +	 */
>> +	if (sev_es_guest(svm->vcpu.kvm) && !sev_es_is_debug_swap_enabled()) {
>>   		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
>>   		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
>>   	}
>>
>> @@ -52,11 +53,21 @@ module_param_named(sev, sev_enabled, bool, 0444);
>>   /* enable/disable SEV-ES support */
>>   static bool sev_es_enabled = true;
>>   module_param_named(sev_es, sev_es_enabled, bool, 0444);
>> +
>> +/* enable/disable SEV-ES DebugSwap support */
>> +static bool sev_es_debug_swap_enabled = true;
>> +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0644);
> 
> Module param needs 0444 permissions, i.e. shouldn't be writable after KVM is
> loaded.  Though I don't know that providing a module param is warranted in this
> case.

> KVM provides module params for SEV and SEV-ES because there are legitimate
> reasons to turn them off, but at a glance, I don't see why we'd want that for this
> feature.


/me confused. You suggested this in the first place for (I think) for 
the case if the feature is found to be broken later on so admins can 
disable it.

And I was using it to verify "x86/debug: Fix stack recursion caused by 
DR7 accesses" which is convenient but it is a minor thing.



>>   #else
>>   #define sev_enabled false
>>   #define sev_es_enabled false
>> +#define sev_es_debug_swap false
> 
> This needs to be sev_es_debug_swap_enabled, otherwise things fall apart with
> CONFIG_KVM_AMD_SEV=n.
> 
> arch/x86/kvm/svm/sev.c: In function ‘sev_es_is_debug_swap_enabled’:
> arch/x86/kvm/svm/sev.c:69:16: error: ‘sev_es_debug_swap_enabled’ undeclared (first use in this function); did you mean ‘sev_es_is_debug_swap_enabled’?
>     69 |         return sev_es_debug_swap_enabled;
>        |                ^~~~~~~~~~~~~~~~~~~~~~~~~
>        |                sev_es_is_debug_swap_enabled
> 
> 
>>   #endif /* CONFIG_KVM_AMD_SEV */
>>   
>> +bool sev_es_is_debug_swap_enabled(void)
>> +{
>> +	return sev_es_debug_swap_enabled;
>> +}
> 
> ...
> 
>> @@ -604,6 +615,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>>   	save->xss  = svm->vcpu.arch.ia32_xss;
>>   	save->dr6  = svm->vcpu.arch.dr6;
>>   
>> +	if (sev_es_is_debug_swap_enabled())
>> +		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
>> +
>>   	pr_debug("Virtual Machine Save Area (VMSA):\n");
>>   	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>>   
>> @@ -2249,6 +2263,9 @@ void __init sev_hardware_setup(void)
>>   out:
>>   	sev_enabled = sev_supported;
>>   	sev_es_enabled = sev_es_supported;
>> +	if (sev_es_debug_swap_enabled)
>> +		sev_es_debug_swap_enabled = sev_es_enabled &&
>> +			cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP);
> 
> Slight preference for:
> 
> 	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
> 		sev_es_debug_swap_enabled = false;
> 
> KVM does short-circuit some checks on module param values, but usually only to
> avoid additional setup.
> 
>>   #endif
>>   }
>>   
>> @@ -3027,6 +3044,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>>   
>>   	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>>   	hostsa->xss = host_xss;
>> +
>> +	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
>> +	if (sev_es_is_debug_swap_enabled()) {
>> +		hostsa->dr0 = native_get_debugreg(0);
>> +		hostsa->dr1 = native_get_debugreg(1);
>> +		hostsa->dr2 = native_get_debugreg(2);
>> +		hostsa->dr3 = native_get_debugreg(3);
>> +		hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
>> +		hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
>> +		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
>> +		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
>> +	}
>>   }
>>   
>>   void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 60c7c880266b..6c54a3c9d442 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1190,7 +1190,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>>   	set_exception_intercept(svm, UD_VECTOR);
>>   	set_exception_intercept(svm, MC_VECTOR);
>>   	set_exception_intercept(svm, AC_VECTOR);
>> -	set_exception_intercept(svm, DB_VECTOR);
>> +	if (!sev_es_is_debug_swap_enabled())
>> +		set_exception_intercept(svm, DB_VECTOR);
> 
> This is wrong.  KVM needs to intercept #DBs when debugging non-SEV-ES VMs.

Sorry, not following. The #DB intercept for non-SEV-ES is enabled here. 
Thanks,


> This _could_ be tied to X86_FEATURE_NO_NESTED_DATA_BP, but the KVM would need to
> toggle the intercept depending on whether or not userspace wants to debug the
> guest.
> 
> Similar to the DR7 interception, can this check sev_features directly?
  
Borislav Petkov Feb. 3, 2023, 12:26 p.m. UTC | #6
On Wed, Feb 01, 2023 at 07:32:29PM +0000, Sean Christopherson wrote:
> Ah, you've already applied 1.  That works too.  I don't think KVM support for
> DebugSwap is going to make v6.3 no matter who takes what, so just base on the
> next version of this patch on tip/x86/cpu and I'll make a mental note to not try
> to grab this until after v6.3-rc1.

Yeah, I was thinking the same thing, seeing how you guys are swamped
with stuff. (Who isn't?! ;-\). So yeah, let's do that and if there's
change of plans, just ping me.

Thx.
  
Sean Christopherson March 23, 2023, 4:39 p.m. UTC | #7
On Fri, Feb 03, 2023, Alexey Kardashevskiy wrote:
> > Follow-up question: does KVM _have_ to wait until KVM_SEV_LAUNCH_UPDATE_VMSA to
> > set the flag?
> 
> Nope. Will repost soon as a reply to this mail.

Please, please do not post new versions In-Reply-To the previous version, and
especially not In-Reply-To a random mail buried deep in the thread.  b4, which
is imperfect but makes my life sooo much easier, gets confused by all the threading
and partial rerolls.  The next version also buries _this_ reply, which is partly
why I haven't responded until now.  I simply missed this the below questions because
I saw v4 and assumed all my feedback was addressed, i.e. that I could handle this
in the context of 6.4 and not earlier.

Continuing on that topic, please do not post a new version until open questions
from the previous version are resolved.  Posting a new version when there are
unresolved questions might feel like it helps things move faster, but more often
than not it has the comlete opposite effect.

> > > +/* enable/disable SEV-ES DebugSwap support */
> > > +static bool sev_es_debug_swap_enabled = true;
> > > +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0644);
> > 
> > Module param needs 0444 permissions, i.e. shouldn't be writable after KVM is
> > loaded.  Though I don't know that providing a module param is warranted in this
> > case.
> > KVM provides module params for SEV and SEV-ES because there are legitimate
> > reasons to turn them off, but at a glance, I don't see why we'd want that for this
> > feature.
> 
> 
> /me confused. You suggested this in the first place for (I think) for the
> case if the feature is found to be broken later on so admins can disable it.

Hrm, so I did.  Right, IIUC, this has guest visible effects, i.e. guest can
read/write DRs, and so the admin might want the ability to disable the feature.

Speaking of past me, no one answered my question about how this will interact
with SNP, where the VM can maniuplate the VMSA.

  : > @@ -604,6 +607,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
  : >       save->xss  = svm->vcpu.arch.ia32_xss;
  : >       save->dr6  = svm->vcpu.arch.dr6;
  : > 
  : > +     if (sev->debug_swap)
  : > +             save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
  : 
  : Resurrecting my objection to "AP Creation NAE event"[*], what happens if a hypervisor
  : supports GHCB_HV_FT_SNP_AP_CREATION but not DebugSwap?  IIUC, a guest can corrupt
  : host DRs by enabling DebugSwap in the VMSA of an AP vCPU, e.g. the CPU will load
  : zeros on VM-Exit if the host hasn't stuffed the host save area fields.
  : 
  : KVM can obviously just make sure to save its DRs if hardware supports DebugSwap,
  : but what if DebugSwap is buggy and needs to be disabled?  And what about the next
  : feature that can apparently be enabled by the guest?
  : 
  : [*] https://lore.kernel.org/all/YWnbfCet84Vup6q9@google.com

> And I was using it to verify "x86/debug: Fix stack recursion caused by DR7
> accesses" which is convenient but it is a minor thing.

...

> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 60c7c880266b..6c54a3c9d442 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -1190,7 +1190,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> > >   	set_exception_intercept(svm, UD_VECTOR);
> > >   	set_exception_intercept(svm, MC_VECTOR);
> > >   	set_exception_intercept(svm, AC_VECTOR);
> > > -	set_exception_intercept(svm, DB_VECTOR);
> > > +	if (!sev_es_is_debug_swap_enabled())
> > > +		set_exception_intercept(svm, DB_VECTOR);
> > 
> > This is wrong.  KVM needs to intercept #DBs when debugging non-SEV-ES VMs.
> 
> Sorry, not following. The #DB intercept for non-SEV-ES is enabled here.

The helper in this version (v3) just queries whether or not the feature is enabled,
it doesn't differentiate between SEV-ES and other VM types.  I.e. loading KVM with
SEV-ES and DebugSwap enabled would break non-SEV-ES VMs running on the same host.

 +bool sev_es_is_debug_swap_enabled(void)
 +{
 +     return sev_es_debug_swap_enabled;
 +}

Looks like this was fixed in v4.

> > This _could_ be tied to X86_FEATURE_NO_NESTED_DATA_BP, but the KVM would need to
> > toggle the intercept depending on whether or not userspace wants to debug the
> > guest.
> > 
> > Similar to the DR7 interception, can this check sev_features directly?
  
Alexey Kardashevskiy March 24, 2023, 4:05 a.m. UTC | #8
On 24/3/23 03:39, Sean Christopherson wrote:
> On Fri, Feb 03, 2023, Alexey Kardashevskiy wrote:
>>> Follow-up question: does KVM _have_ to wait until KVM_SEV_LAUNCH_UPDATE_VMSA to
>>> set the flag?
>>
>> Nope. Will repost soon as a reply to this mail.
> 
> Please, please do not post new versions In-Reply-To the previous version, and
> especially not In-Reply-To a random mail buried deep in the thread.  b4, which
> is imperfect but makes my life sooo much easier, gets confused by all the threading
> and partial rerolls.  The next version also buries _this_ reply, which is partly
> why I haven't responded until now.  I simply missed this the below questions because
> I saw v4 and assumed all my feedback was addressed, i.e. that I could handle this
> in the context of 6.4 and not earlier.

Ok, noted. Sorry for the mess.

> Continuing on that topic, please do not post a new version until open questions
> from the previous version are resolved.  Posting a new version when there are
> unresolved questions might feel like it helps things move faster, but more often
> than not it has the comlete opposite effect.

Well I thought keeping the history in one place/thread is helping. Oh 
well...


>>>> +/* enable/disable SEV-ES DebugSwap support */
>>>> +static bool sev_es_debug_swap_enabled = true;
>>>> +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0644);
>>>
>>> Module param needs 0444 permissions, i.e. shouldn't be writable after KVM is
>>> loaded.  Though I don't know that providing a module param is warranted in this
>>> case.
>>> KVM provides module params for SEV and SEV-ES because there are legitimate
>>> reasons to turn them off, but at a glance, I don't see why we'd want that for this
>>> feature.
>>
>>
>> /me confused. You suggested this in the first place for (I think) for the
>> case if the feature is found to be broken later on so admins can disable it.
> 
> Hrm, so I did.  Right, IIUC, this has guest visible effects, i.e. guest can
> read/write DRs, and so the admin might want the ability to disable the feature.
> 
> Speaking of past me, no one answered my question about how this will interact
> with SNP, where the VM can maniuplate the VMSA.
> 
>    : > @@ -604,6 +607,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>    : >       save->xss  = svm->vcpu.arch.ia32_xss;
>    : >       save->dr6  = svm->vcpu.arch.dr6;
>    : >
>    : > +     if (sev->debug_swap)
>    : > +             save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
>    :
>    : Resurrecting my objection to "AP Creation NAE event"[*], what happens if a hypervisor
>    : supports GHCB_HV_FT_SNP_AP_CREATION but not DebugSwap?  IIUC, a guest can corrupt
>    : host DRs by enabling DebugSwap in the VMSA of an AP vCPU, e.g. the CPU will load
>    : zeros on VM-Exit if the host hasn't stuffed the host save area fields.

I was convinced it was answered (by Tom). (...10min later...) now I can 
see it was an internal email :-/

The host will have to save DRs and masks to the host save area for SNP 
and non-SNP guests (until we get the hw which allows masking features). 
Which patchset will do this - it depends on what gets accepted first - 
this or the SNP support.



>    : KVM can obviously just make sure to save its DRs if hardware supports DebugSwap,
>    : but what if DebugSwap is buggy and needs to be disabled?  And what about the next
>    : feature that can apparently be enabled by the guest?
>    :
>    : [*] https://lore.kernel.org/all/YWnbfCet84Vup6q9@google.com
> 
>> And I was using it to verify "x86/debug: Fix stack recursion caused by DR7
>> accesses" which is convenient but it is a minor thing.
> 
> ...
> 
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index 60c7c880266b..6c54a3c9d442 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -1190,7 +1190,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>>>>    	set_exception_intercept(svm, UD_VECTOR);
>>>>    	set_exception_intercept(svm, MC_VECTOR);
>>>>    	set_exception_intercept(svm, AC_VECTOR);
>>>> -	set_exception_intercept(svm, DB_VECTOR);
>>>> +	if (!sev_es_is_debug_swap_enabled())
>>>> +		set_exception_intercept(svm, DB_VECTOR);
>>>
>>> This is wrong.  KVM needs to intercept #DBs when debugging non-SEV-ES VMs.
>>
>> Sorry, not following. The #DB intercept for non-SEV-ES is enabled here.
> 
> The helper in this version (v3) just queries whether or not the feature is enabled,
> it doesn't differentiate between SEV-ES and other VM types.  I.e. loading KVM with
> SEV-ES and DebugSwap enabled would break non-SEV-ES VMs running on the same host.
> 
>   +bool sev_es_is_debug_swap_enabled(void)
>   +{
>   +     return sev_es_debug_swap_enabled;
>   +}
> 
> Looks like this was fixed in v4.

Right. I'll try addressing the comments from the other big reply of 
yours and send the patches. Thanks for all comments!


>>> This _could_ be tied to X86_FEATURE_NO_NESTED_DATA_BP, but the KVM would need to
>>> toggle the intercept depending on whether or not userspace wants to debug the
>>> guest.
>>>
>>> Similar to the DR7 interception, can this check sev_features directly?
  

Patch

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b1..665515c7edae 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -278,6 +278,7 @@  enum avic_ipi_failure_cause {
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 #define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
 
+#define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4826e6cc611b..61f2cad1cbaf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -389,6 +389,8 @@  static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u3
 	return test_bit(bit, (unsigned long *)&control->intercepts);
 }
 
+extern bool sev_es_is_debug_swap_enabled(void);
+
 static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb01.ptr;
@@ -410,8 +412,10 @@  static inline void set_dr_intercepts(struct vcpu_svm *svm)
 		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
 	}
 
-	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
-	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+	if (!sev_es_guest(svm->vcpu.kvm) || !sev_es_is_debug_swap_enabled()) {
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+	}
 
 	recalc_intercepts(svm);
 }
@@ -422,8 +426,12 @@  static inline void clr_dr_intercepts(struct vcpu_svm *svm)
 
 	vmcb->control.intercepts[INTERCEPT_DR] = 0;
 
-	/* DR7 access must remain intercepted for an SEV-ES guest */
-	if (sev_es_guest(svm->vcpu.kvm)) {
+	/*
+	 * DR7 access must remain intercepted for an SEV-ES guest unless DebugSwap
+	 * (depends on NO_NESTED_DATA_BP) is enabled as otherwise a VM writing to DR7
+	 * from the #DB handler may trigger infinite loop of #DB's.
+	 */
+	if (sev_es_guest(svm->vcpu.kvm) && !sev_es_is_debug_swap_enabled()) {
 		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
 		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
 	}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 86d6897f4806..e989a6f4953d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -21,6 +21,7 @@ 
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
 #include <asm/fpu/xcr.h>
+#include <asm/debugreg.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -52,11 +53,21 @@  module_param_named(sev, sev_enabled, bool, 0444);
 /* enable/disable SEV-ES support */
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-ES DebugSwap support */
+static bool sev_es_debug_swap_enabled = true;
+module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0644);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
+#define sev_es_debug_swap false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+bool sev_es_is_debug_swap_enabled(void)
+{
+	return sev_es_debug_swap_enabled;
+}
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -604,6 +615,9 @@  static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
+	if (sev_es_is_debug_swap_enabled())
+		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -2249,6 +2263,9 @@  void __init sev_hardware_setup(void)
 out:
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	if (sev_es_debug_swap_enabled)
+		sev_es_debug_swap_enabled = sev_es_enabled &&
+			cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP);
 #endif
 }
 
@@ -3027,6 +3044,18 @@  void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
 
 	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
 	hostsa->xss = host_xss;
+
+	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
+	if (sev_es_is_debug_swap_enabled()) {
+		hostsa->dr0 = native_get_debugreg(0);
+		hostsa->dr1 = native_get_debugreg(1);
+		hostsa->dr2 = native_get_debugreg(2);
+		hostsa->dr3 = native_get_debugreg(3);
+		hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
+		hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
+		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
+		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
+	}
 }
 
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 60c7c880266b..6c54a3c9d442 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1190,7 +1190,8 @@  static void init_vmcb(struct kvm_vcpu *vcpu)
 	set_exception_intercept(svm, UD_VECTOR);
 	set_exception_intercept(svm, MC_VECTOR);
 	set_exception_intercept(svm, AC_VECTOR);
-	set_exception_intercept(svm, DB_VECTOR);
+	if (!sev_es_is_debug_swap_enabled())
+		set_exception_intercept(svm, DB_VECTOR);
 	/*
 	 * Guest access to VMware backdoor ports could legitimately
 	 * trigger #GP because of TSS I/O permission bitmap.