[2/3] KVM: x86: Don't update KVM PV feature CPUID during vCPU running

Message ID 9fbf5b4022d67157d6305bc1811f36d9096c26fc.1680179693.git.houwenlong.hwl@antgroup.com
State New
Series [1/3] KVM: x86: Disallow enable KVM_CAP_X86_DISABLE_EXITS capability after vCPUs have been created

Commit Message

Hou Wenlong March 30, 2023, 12:35 p.m. UTC
  __kvm_update_cpuid_runtime() may be called while the vCPU is running,
and it updates the KVM PV feature CPUID entry as well, but the cached
KVM PV feature bitmap is not refreshed to match. In fact, the KVM PV
feature CPUID should not be changed at runtime at all, otherwise the PV
features would break in the guest. Currently only KVM_FEATURE_PV_UNHALT
is cleared this way, and that can no longer happen once disabling HLT
exits after vCPU creation is disallowed. Therefore, update the KVM PV
feature CPUID only in the KVM_SET_CPUID{,2} ioctl.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/kvm/cpuid.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
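
For context, the KVM PV feature bits live in the KVM CPUID signature leaf
(CPUID.0x40000001:EAX when the hypervisor base sits at the default
0x40000000), and KVM also caches that EAX value in a per-vCPU PV feature
bitmap when CPUID is set. A guest normally samples the leaf once at boot, so
rewriting it at runtime only makes the entry and the cache disagree. A
minimal guest-side probe of KVM_FEATURE_PV_UNHALT might look like the sketch
below; it assumes the default hypervisor base and skips the KVM signature
check a real guest would do first.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>

#define KVM_CPUID_FEATURES	0x40000001
#define KVM_FEATURE_PV_UNHALT	7

/*
 * Read the KVM PV feature leaf and test KVM_FEATURE_PV_UNHALT.  Hypervisor
 * leaves are outside the range __get_cpuid() validates, so use the raw
 * __cpuid() helper from <cpuid.h>.
 */
static bool guest_has_pv_unhalt(void)
{
	uint32_t eax, ebx, ecx, edx;

	__cpuid(KVM_CPUID_FEATURES, eax, ebx, ecx, edx);
	return eax & (1u << KVM_FEATURE_PV_UNHALT);
}

int main(void)
{
	printf("PV_UNHALT advertised: %s\n",
	       guest_has_pv_unhalt() ? "yes" : "no");
	return 0;
}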
  

Comments

Sean Christopherson April 6, 2023, 3:29 a.m. UTC | #1
+Kechen

On Thu, Mar 30, 2023, Hou Wenlong wrote:
> __kvm_update_cpuid_runtime() may be called during vCPU running and KVM
> PV feature CPUID is updated too. But the cached KVM PV feature bitmap is
> not updated. Actually, KVM PV feature CPUID shouldn't be updated,
> otherwise, KVM PV feature would be broken in guest. Currently, only
> KVM_FEATURE_PV_UNHALT is updated, and it's impossible after disallow
> disable HLT exits. However, KVM PV feature CPUID should be updated only
> in KVM_SET_CPUID{,2} ioctl.
> 
> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> ---
>  arch/x86/kvm/cpuid.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6972e0be60fa..af92d3422c79 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -222,6 +222,17 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
>  					     vcpu->arch.cpuid_nent);
>  }
>  
> +static void kvm_update_pv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
> +				int nent)
> +{
> +	struct kvm_cpuid_entry2 *best;
> +
> +	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> +	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> +		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> +		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> +}
> +
>  void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
> @@ -280,11 +291,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>  
> -	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> -	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> -		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> -		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> -
>  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
>  		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  		if (best)
> @@ -402,6 +408,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	int r;
>  
>  	__kvm_update_cpuid_runtime(vcpu, e2, nent);
> +	kvm_update_pv_cpuid(vcpu, e2, nent);

Hrm, this will silently conflict with the proposed per-vCPU controls[*].  Though
arguably that patch is buggy and "needs" to toggle PV_UNHALT when userspace
messes with HLT passthrough.  But that doesn't really make sense either because
no guest will react kindly to KVM_FEATURE_PV_UNHALT disappearing.

I really wish this code didn't exist, i.e. that KVM let/forced userspace deal
with correctly defining guest CPUID.

Kechen, is it feasible for your userspace to clear PV_UNHALT when it (might) use
the per-vCPU control?  I.e. can KVM do as this series proposes and update guest
CPUID only on KVM_SET_CPUID{2}?  Dropping the behavior for the per-VM control
is probably not an option as I gotta assume that'd break userspace, but I would
really like to avoid carrying that over to the per-vCPU control, which would get
quite messy and probably can't work anyways.

[*] https://lkml.kernel.org/r/20230121020738.2973-6-kechenl%40nvidia.com
  
Kechen Lu April 6, 2023, 6:30 p.m. UTC | #2
Hi Sean,

> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Wednesday, April 5, 2023 8:29 PM
> To: Hou Wenlong <houwenlong.hwl@antgroup.com>
> Cc: kvm@vger.kernel.org; Paolo Bonzini <pbonzini@redhat.com>; Thomas
> Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; Borislav
> Petkov <bp@alien8.de>; Dave Hansen <dave.hansen@linux.intel.com>;
> x86@kernel.org; H. Peter Anvin <hpa@zytor.com>; linux-
> kernel@vger.kernel.org; Kechen Lu <kechenl@nvidia.com>
> Subject: Re: [PATCH 2/3] KVM: x86: Don't update KVM PV feature CPUID
> during vCPU running
> 
> +Kechen
> 
> On Thu, Mar 30, 2023, Hou Wenlong wrote:
> > __kvm_update_cpuid_runtime() may be called during vCPU running and KVM
> > PV feature CPUID is updated too. But the cached KVM PV feature bitmap is
> > not updated. Actually, KVM PV feature CPUID shouldn't be updated,
> > otherwise, KVM PV feature would be broken in guest. Currently, only
> > KVM_FEATURE_PV_UNHALT is updated, and it's impossible after disallow
> > disable HLT exits. However, KVM PV feature CPUID should be updated only
> > in KVM_SET_CPUID{,2} ioctl.
> >
> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 6972e0be60fa..af92d3422c79 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -222,6 +222,17 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
> >  					     vcpu->arch.cpuid_nent);
> >  }
> >
> > +static void kvm_update_pv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
> > +				int nent)
> > +{
> > +	struct kvm_cpuid_entry2 *best;
> > +
> > +	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> > +	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> > +		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> > +		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> > +}
> > +
> >  void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
> > @@ -280,11 +291,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
> >  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> >  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> >
> > -	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> > -	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> > -		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> > -		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> > -
> >  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
> >  		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> >  		if (best)
> > @@ -402,6 +408,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >  	int r;
> >
> >  	__kvm_update_cpuid_runtime(vcpu, e2, nent);
> > +	kvm_update_pv_cpuid(vcpu, e2, nent);
> 
> Hrm, this will silently conflict with the proposed per-vCPU controls[*].
> Though arguably that patch is buggy and "needs" to toggle PV_UNHALT
> when userspace messes with HLT passthrough.  But that doesn't really make
> sense either because no guest will react kindly to
> KVM_FEATURE_PV_UNHALT disappearing.

Yes, agreed: toggling PV_UNHALT via the per-vCPU control doesn't make
sense to me either. And since the PV features are per-VM, having the
per-vCPU control touch PV feature toggling would probably cause a lot
of messes.

> 
> I really wish this code didn't exist, i.e. that KVM let/forced userspace deal
> with correctly defining guest CPUID.
> 
> Kechen, is it feasible for your userspace to clear PV_UNHALT when it (might)
> use the per-vCPU control?  I.e. can KVM do as this series proposes and
> update guest CPUID only on KVM_SET_CPUID{2}?  Dropping the behavior for
> the per-VM control is probably not an option as I gotta assume that'd break
> userspace, but I would really like to avoid carrying that over to the per-vCPU
> control, which would get quite messy and probably can't work anyways.

Yes, in our use cases it's feasible to clear PV_UNHALT when using the
per-vCPU control. I think it makes sense to put the responsibility on
userspace to clear the PV_UNHALT bit when it wants to use the per-vCPU
control for HLT passthrough. We could add a note/requirement after this
line in Documentation/virt/kvm/api.rst:
"Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits."

Best Regards,
Kechen

> 
> [*] https://lkml.kernel.org/r/20230121020738.2973-6-kechenl%40nvidia.com
  

Patch

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 6972e0be60fa..af92d3422c79 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -222,6 +222,17 @@  static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
 					     vcpu->arch.cpuid_nent);
 }
 
+static void kvm_update_pv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
+				int nent)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
+	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
+		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
+		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
+}
+
 void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
@@ -280,11 +291,6 @@  static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
-	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
-	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
-		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
-		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
-
 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
 		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 		if (best)
@@ -402,6 +408,7 @@  static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	int r;
 
 	__kvm_update_cpuid_runtime(vcpu, e2, nent);
+	kvm_update_pv_cpuid(vcpu, e2, nent);
 
 	/*
 	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as