[02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit

Message ID 20221117143242.102721-3-mlevitsk@redhat.com
State New
Series SVM: vNMI (with my fixes)

Commit Message

Maxim Levitsky Nov. 17, 2022, 2:32 p.m. UTC
Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not)
was an attempt to keep the int_ctl field in the vmcb12 cache
up to date on each VM exit.

However, none of the other fields in the vmcb12 cache are kept up to date,
so for consistency it is better to do this only on a nested VM exit.

No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 17 ++++++++---------
 arch/x86/kvm/svm/svm.c    |  2 --
 arch/x86/kvm/svm/svm.h    |  1 -
 3 files changed, 8 insertions(+), 12 deletions(-)
  

Comments

Sean Christopherson Nov. 17, 2022, 8:04 p.m. UTC | #1
On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not),
> was an attempt to keep the int_ctl field in the vmcb12 cache
> up to date on each VM exit.

This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
VMCB controls updated by the processor on every vmexit"), which states that the
goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
cache you are referring to?

> However all other fields in the vmcb12 cache are not kept up to date,

IIUC, this isn't technically true.  They are up-to-date because they're never
modified by hardware.

> therefore for consistency it is better to do this on a nested VM exit only.

Again, IIUC, this actually introduces an inconsistency because it leaves stale
state in svm->nested.ctl, whereas the existing code ensures all state in
svm->nested.ctl is fresh immediately after non-nested VM-Exit.
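
The masked copy being discussed can be sketched outside the kernel as follows. This is a stand-alone model, not KVM code: `struct demo_ctl`, `demo_sync_from_hw()`, and the `DEMO_*` masks are hypothetical names, and the mask values are placeholders chosen for illustration. It only shows that the bits selected by the mask are refreshed from the hardware copy while the other cached bits are left alone. If the call is deferred, the cached copy keeps its old bits until the next sync.

```c
#include <stdint.h>

/* Placeholder masks for illustration only (hypothetical names and values). */
#define DEMO_V_TPR_MASK 0x0fu
#define DEMO_V_IRQ_MASK (1u << 8)

/* Hypothetical stand-in for the cached control fields. */
struct demo_ctl {
	uint32_t int_ctl;
	uint32_t event_inj;
};

/*
 * Refresh the cached copy from the copy the processor was running on.
 * event_inj is copied whole; for int_ctl only the bits in 'mask' are
 * taken from hardware, every other bit of the cached value is preserved.
 */
static void demo_sync_from_hw(struct demo_ctl *cache, const struct demo_ctl *hw)
{
	uint32_t mask = DEMO_V_IRQ_MASK | DEMO_V_TPR_MASK;

	cache->event_inj = hw->event_inj;

	cache->int_ctl &= ~mask;
	cache->int_ctl |= hw->int_ctl & mask;
}
```

In this model, calling `demo_sync_from_hw()` after every exit keeps `cache` usable at any time, whereas calling it only at the final copy-back leaves `cache->int_ctl` and `cache->event_inj` holding pre-run values in between.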
  
Maxim Levitsky Nov. 21, 2022, 11:07 a.m. UTC | #2
On Thu, 2022-11-17 at 20:04 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not),
> > was an attempt to keep the int_ctl field in the vmcb12 cache
> > up to date on each VM exit.
> 
> This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
> VMCB controls updated by the processor on every vmexit"), which states that the
> goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
> cache you are referring to?

Thanks for digging that commit out.

My reasoning was that the cache contains both control and 'save' fields, and
we don't update the 'save' fields on each VM exit.

For control it indeed looks like int_ctl and eventinj are the only fields
that are updated by the CPU, although IMHO they don't *need* to be updated
until we do a nested VM exit, because the VM isn't supposed to look at the vmcb
while it is in use by the CPU; its state is undefined.

For migration though, this does look like a problem. It could be fixed when
reading the nested state, but that is a hack as well.

My idea, as you saw in the patches, was to unify int_ctl handling,
since some bits might need to be copied to vmcb12 but some to vmcb01,
and we happened to have none of these so far, and it "happened" to work.

Do you have an idea on how to do this cleanly? I can just leave this as is
and only sync the bits of int_ctl from vmcb02 to vmcb01 on nested VM exit.
Ugly but would work.

> 
> > However all other fields in the vmcb12 cache are not kept up to date,
> 
> IIUC, this isn't technically true.  They are up-to-date because they're never
> modified by hardware.

I meant both the save and control caches. In the control cache it does indeed
look like the fields are kept up to date.

Best regards,
	Maxim Levitsky

> 
> > therefore for consistency it is better to do this on a nested VM exit only.
> 
> Again, IIUC, this actually introduces an inconsistency because it leaves stale
> state in svm->nested.ctl, whereas the existing code ensures all state in
> svm->nested.ctl is fresh immediately after non-nested VM-Exit.
>
  
Sean Christopherson Nov. 21, 2022, 5:51 p.m. UTC | #3
On Mon, Nov 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-11-17 at 20:04 +0000, Sean Christopherson wrote:
> > On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > > Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not),
> > > was an attempt to keep the int_ctl field in the vmcb12 cache
> > > up to date on each VM exit.
> > 
> > This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
> > VMCB controls updated by the processor on every vmexit"), which states that the
> > goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
> > cache you are referring to?
> 
> Thanks for digging that commit out.
> 
> My reasoning was that the cache contains both control and 'save' fields, and
> we don't update the 'save' fields on each VM exit.
>
> For control it indeed looks like int_ctl and eventinj are the only fields
> that are updated by the CPU, although IMHO they don't *need* to be updated
> until we do a nested VM exit, because the VM isn't supposed to look at the vmcb
> while it is in use by the CPU; its state is undefined.

The point of the cache isn't to forward info to L2 though, it's so that KVM can
query the effective VMCB state without having to read guest memory and/or track
where the current state lives.

> For migration though, this does look like a problem. It could be fixed when
> reading the nested state, but that is a hack as well.
>
> My idea, as you saw in the patches, was to unify int_ctl handling,
> since some bits might need to be copied to vmcb12 but some to vmcb01,
> and we happened to have none of these so far, and it "happened" to work.
> 
> Do you have an idea on how to do this cleanly? I can just leave this as is
> and only sync the bits of int_ctl from vmcb02 to vmcb01 on nested VM exit.
> Ugly but would work.

That honestly seems like the best option to me.  The ugly part isn't as much KVM's
caching as it is the mixed, conditional behavior of int_ctl.  E.g. VMX has even
more caching and synchronization (eVMCS, shadow VMCS, etc...), but off the top of
my head I can't think of any scenarios where KVM needs to splice/split VMCS fields.
KVM needs to conditionally sync fields, but not split like this.

In other words, I think this particular code is going to be rather ugly no matter
what KVM does.
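
The "split" that makes this awkward can be illustrated with a small stand-alone sketch. It is not KVM code: `demo_split_int_ctl()` and its parameters are hypothetical, and which bits would go to which destination is left to the caller because the thread does not settle that. The sketch only shows one source field being routed to two destinations through two masks, each destination keeping its bits outside its own mask.

```c
#include <stdint.h>

/*
 * Route bits of a single source field to two destinations.
 * Bits in mask_to_vmcb12 are copied into *vmcb12_int_ctl, bits in
 * mask_to_vmcb01 into *vmcb01_int_ctl. Bits outside a destination's
 * mask are left unchanged in that destination.
 * All names and the choice of masks are hypothetical.
 */
static void demo_split_int_ctl(uint32_t vmcb02_int_ctl,
			       uint32_t mask_to_vmcb12,
			       uint32_t mask_to_vmcb01,
			       uint32_t *vmcb12_int_ctl,
			       uint32_t *vmcb01_int_ctl)
{
	*vmcb12_int_ctl = (*vmcb12_int_ctl & ~mask_to_vmcb12) |
			  (vmcb02_int_ctl & mask_to_vmcb12);

	*vmcb01_int_ctl = (*vmcb01_int_ctl & ~mask_to_vmcb01) |
			  (vmcb02_int_ctl & mask_to_vmcb01);
}
```

Keeping both masked copies in one helper at least puts the conditional routing in a single place, which is about as tidy as a field with mixed ownership can be made.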
  

Patch

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 43cc4a5d22e012..91a51e75717dca 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -407,11 +407,12 @@  void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
  * Synchronize fields that are written by the processor, so that
  * they can be copied back into the vmcb12.
  */
-void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
+static void nested_sync_control_from_vmcb02(struct vcpu_svm *svm,
+					    struct vmcb *vmcb12)
 {
 	u32 mask;
-	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
-	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
+	vmcb12->control.event_inj      = svm->vmcb->control.event_inj;
+	vmcb12->control.event_inj_err  = svm->vmcb->control.event_inj_err;
 
 	/* Only a few fields of int_ctl are written by the processor.  */
 	mask = V_IRQ_MASK | V_TPR_MASK;
@@ -431,8 +432,8 @@  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 	if (nested_vgif_enabled(svm))
 		mask |= V_GIF_MASK;
 
-	svm->nested.ctl.int_ctl        &= ~mask;
-	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
+	vmcb12->control.int_ctl        &= ~mask;
+	vmcb12->control.int_ctl        |= svm->vmcb->control.int_ctl & mask;
 }
 
 /*
@@ -985,13 +986,11 @@  int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (vmcb12->control.exit_code != SVM_EXIT_ERR)
 		nested_save_pending_event_to_vmcb12(svm, vmcb12);
 
+	nested_sync_control_from_vmcb02(svm, vmcb12);
+
 	if (svm->nrips_enabled)
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
-	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
-	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
-	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
-
 	if (!kvm_pause_in_guest(vcpu->kvm)) {
 		vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
 		vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 527f18d8cc4489..03acbe8ff34edb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4016,8 +4016,6 @@  static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	svm->next_rip = 0;
 	if (is_guest_mode(vcpu)) {
-		nested_sync_control_from_vmcb02(svm);
-
 		/* Track VMRUNs that have made past consistency checking */
 		if (svm->nested.nested_run_pending &&
 		    svm->vmcb->control.exit_code != SVM_EXIT_ERR)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 199a2ecef1cec6..f5383104d00580 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -618,7 +618,6 @@  void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
 				       struct vmcb_control_area *control);
 void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 				    struct vmcb_save_area *save);
-void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
 void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);