[v5] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT

Message ID 20231102162128.2353459-1-paul@xen.org
State New
Headers
Series [v5] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT |

Commit Message

Paul Durrant Nov. 2, 2023, 4:21 p.m. UTC
  From: Paul Durrant <pdurrant@amazon.com>

Unless explicitly told to do so (by passing 'clocksource=tsc' and
'tsc=stable:socket', and then jumping through some hoops concerning
potential CPU hotplug) Xen will never use TSC as its clocksource.
Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
in either the primary or secondary pvclock memory areas. This has
led to bugs in some guest kernels which only become evident if
PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
such guests, give the VMM a new Xen HVM config flag to tell KVM to
forcibly clear the bit in the Xen pvclocks.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---

v5:
 - Fix warning reported by kernel test robot.

v4:
 - Re-base.
 - Re-work 'update_pvclock' test as requested.

v3:
 - Moved clearing of PVCLOCK_TSC_STABLE_BIT the right side of the
   memcpy().
 - Added an all-vCPUs KVM_REQ_CLOCK_UPDATE when the HVM config
   flag is changed.
---
 Documentation/virt/kvm/api.rst |  6 ++++++
 arch/x86/kvm/x86.c             | 28 +++++++++++++++++++++++-----
 arch/x86/kvm/xen.c             |  9 ++++++++-
 include/uapi/linux/kvm.h       |  1 +
 4 files changed, 38 insertions(+), 6 deletions(-)


base-commit: 45b890f7689eb0aba454fc5831d2d79763781677
  

Comments

David Woodhouse Nov. 8, 2023, 6:39 p.m. UTC | #1
On Thu, 2023-11-02 at 16:21 +0000, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Unless explicitly told to do so (by passing 'clocksource=tsc' and
> 'tsc=stable:socket', and then jumping through some hoops concerning
> potential CPU hotplug) Xen will never use TSC as its clocksource.
> Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> in either the primary or secondary pvclock memory areas. This has
> led to bugs in some guest kernels which only become evident if
> PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
> such guests, give the VMM a new Xen HVM config flag to tell KVM to
> forcibly clear the bit in the Xen pvclocks.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
  
Paul Durrant Nov. 30, 2023, 3:49 p.m. UTC | #2
On 08/11/2023 18:39, David Woodhouse wrote:
> On Thu, 2023-11-02 at 16:21 +0000, Paul Durrant wrote:
>> From: Paul Durrant <pdurrant@amazon.com>
>>
>> Unless explicitly told to do so (by passing 'clocksource=tsc' and
>> 'tsc=stable:socket', and then jumping through some hoops concerning
>> potential CPU hotplug) Xen will never use TSC as its clocksource.
>> Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
>> in either the primary or secondary pvclock memory areas. This has
>> led to bugs in some guest kernels which only become evident if
>> PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
>> such guests, give the VMM a new Xen HVM config flag to tell KVM to
>> forcibly clear the bit in the Xen pvclocks.
>>
>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

Sean,

   Is any more work needed on this?

   Paul
  
Sean Christopherson Nov. 30, 2023, 4:36 p.m. UTC | #3
+Andrew

On Thu, Nov 02, 2023, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Unless explicitly told to do so (by passing 'clocksource=tsc' and
> 'tsc=stable:socket', and then jumping through some hoops concerning
> potential CPU hotplug) Xen will never use TSC as its clocksource.
> Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> in either the primary or secondary pvclock memory areas. This has
> led to bugs in some guest kernels which only become evident if
> PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
> such guests, give the VMM a new Xen HVM config flag to tell KVM to
> forcibly clear the bit in the Xen pvclocks.

...

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7025b3751027..a9bdd25826d1 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8374,6 +8374,7 @@ PVHVM guests. Valid flags are::
>    #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL		(1 << 4)
>    #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
>    #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
> +  #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)

Does Xen actually support PVCLOCK_TSC_STABLE_BIT?  I.e. do we need new uAPI to
fix this, or can/should KVM simply _never_ set PVCLOCK_TSC_STABLE_BIT for Xen
clocks?  At a glance, PVCLOCK_TSC_STABLE_BIT looks like it was added as a purely
Linux/KVM-only thing.
  
Paul Durrant Nov. 30, 2023, 4:41 p.m. UTC | #4
On 30/11/2023 16:36, Sean Christopherson wrote:
> +Andrew
> 
> On Thu, Nov 02, 2023, Paul Durrant wrote:
>> From: Paul Durrant <pdurrant@amazon.com>
>>
>> Unless explicitly told to do so (by passing 'clocksource=tsc' and
>> 'tsc=stable:socket', and then jumping through some hoops concerning
>> potential CPU hotplug) Xen will never use TSC as its clocksource.
>> Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
>> in either the primary or secondary pvclock memory areas. This has
>> led to bugs in some guest kernels which only become evident if
>> PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
>> such guests, give the VMM a new Xen HVM config flag to tell KVM to
>> forcibly clear the bit in the Xen pvclocks.
> 
> ...
> 
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 7025b3751027..a9bdd25826d1 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8374,6 +8374,7 @@ PVHVM guests. Valid flags are::
>>     #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL		(1 << 4)
>>     #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
>>     #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
>> +  #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
> 
> Does Xen actually support PVCLOCK_TSC_STABLE_BIT?  I.e. do we need new uAPI to
> fix this, or can/should KVM simply _never_ set PVCLOCK_TSC_STABLE_BIT for Xen
> clocks?  At a glance, PVCLOCK_TSC_STABLE_BIT looks like it was added as a purely
> Linux/KVM-only thing.

It's certainly tested in arch/x86/xen/time.c, in 
xen_setup_vsyscall_time_info() and xen_time_init(), so I'd guess it is 
considered to be supported.

   Paul
  
David Woodhouse Nov. 30, 2023, 7:41 p.m. UTC | #5
On Thu, 2023-11-30 at 16:41 +0000, Paul Durrant wrote:
> On 30/11/2023 16:36, Sean Christopherson wrote:
> > +Andrew
> > 
> > On Thu, Nov 02, 2023, Paul Durrant wrote:
> > > From: Paul Durrant <pdurrant@amazon.com>
> > > 
> > > Unless explicitly told to do so (by passing 'clocksource=tsc' and
> > > 'tsc=stable:socket', and then jumping through some hoops concerning
> > > potential CPU hotplug) Xen will never use TSC as its clocksource.
> > > Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> > > in either the primary or secondary pvclock memory areas. This has
> > > led to bugs in some guest kernels which only become evident if
> > > PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
> > > such guests, give the VMM a new Xen HVM config flag to tell KVM to
> > > forcibly clear the bit in the Xen pvclocks.
> > 
> > ...
> > 
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index 7025b3751027..a9bdd25826d1 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -8374,6 +8374,7 @@ PVHVM guests. Valid flags are::
> > >     #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL            (1 << 4)
> > >     #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND              (1 << 5)
> > >     #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG     (1 << 6)
> > > +  #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE      (1 << 7)
> > 
> > Does Xen actually support PVCLOCK_TSC_STABLE_BIT?  I.e. do we need new uAPI to
> > fix this, or can/should KVM simply _never_ set PVCLOCK_TSC_STABLE_BIT for Xen
> > clocks?  At a glance, PVCLOCK_TSC_STABLE_BIT looks like it was added as a purely
> > Linux/KVM-only thing.
> 
> It's certainly tested in arch/x86/xen/time.c, in 
> xen_setup_vsyscall_time_info() and xen_time_init(), so I'd guess it is 
> considered to be supported.

And yes, Xen does set it, if you jump through the right hoops to make
Xen actually use the TSC as its clocksource.

The new uAPI is just a single bit in the KVM_XEN_HVM_CONFIG
capabilities; I think it's reasonable enough.
  
Sean Christopherson Dec. 1, 2023, 10:42 p.m. UTC | #6
On Thu, Nov 30, 2023, David Woodhouse wrote:
> On Thu, 2023-11-30 at 16:41 +0000, Paul Durrant wrote:
> > On 30/11/2023 16:36, Sean Christopherson wrote:
> > > +Andrew
> > > 
> > > On Thu, Nov 02, 2023, Paul Durrant wrote:
> > > > From: Paul Durrant <pdurrant@amazon.com>
> > > > 
> > > > Unless explicitly told to do so (by passing 'clocksource=tsc' and
> > > > 'tsc=stable:socket', and then jumping through some hoops concerning
> > > > potential CPU hotplug) Xen will never use TSC as its clocksource.
> > > > Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> > > > in either the primary or secondary pvclock memory areas. This has
> > > > led to bugs in some guest kernels which only become evident if
> > > > PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
> > > > such guests, give the VMM a new Xen HVM config flag to tell KVM to
> > > > forcibly clear the bit in the Xen pvclocks.
> > > 
> > > ...
> > > 
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index 7025b3751027..a9bdd25826d1 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -8374,6 +8374,7 @@ PVHVM guests. Valid flags are::
> > > >     #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL            (1 << 4)
> > > >     #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND              (1 << 5)
> > > >     #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG     (1 << 6)
> > > > +  #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE      (1 << 7)
> > > 
> > > Does Xen actually support PVCLOCK_TSC_STABLE_BIT?  I.e. do we need new uAPI to
> > > fix this, or can/should KVM simply _never_ set PVCLOCK_TSC_STABLE_BIT for Xen
> > > clocks?  At a glance, PVCLOCK_TSC_STABLE_BIT looks like it was added as a purely
> > > Linux/KVM-only thing.
> > 
> > It's certainly tested in arch/x86/xen/time.c, in 
> > xen_setup_vsyscall_time_info() and xen_time_init(), so I'd guess it is 
> > considered to be supported.
> 
> And yes, Xen does set it, if you jump through the right hoops to make
> Xen actually use the TSC as its clocksource.
> 
> The new uAPI is just a single bit in the KVM_XEN_HVM_CONFIG
> capabilities; I think it's reasonable enough.

Yeah, I was just hoping that maybe we could squeak by without it.  I'll get this
queued up next week, purely because I try to avoid (but often fail) pushing to
-next on Fridays.
  
Sean Christopherson Dec. 8, 2023, 2:17 a.m. UTC | #7
On Thu, 02 Nov 2023 16:21:28 +0000, Paul Durrant wrote:
> Unless explicitly told to do so (by passing 'clocksource=tsc' and
> 'tsc=stable:socket', and then jumping through some hoops concerning
> potential CPU hotplug) Xen will never use TSC as its clocksource.
> Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> in either the primary or secondary pvclock memory areas. This has
> led to bugs in some guest kernels which only become evident if
> PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
> such guests, give the VMM a new Xen HVM config flag to tell KVM to
> forcibly clear the bit in the Xen pvclocks.
> 
> [...]

Applied to kvm-x86 xen (and not on a Friday!), thanks!

[1/1] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT
      https://github.com/kvm-x86/linux/commit/6d7228352609

--
https://github.com/kvm-x86/linux/tree/next
  

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7025b3751027..a9bdd25826d1 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8374,6 +8374,7 @@  PVHVM guests. Valid flags are::
   #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL		(1 << 4)
   #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
   #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
+  #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
 
 The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
 ioctl is available, for the guest to set its hypercall page.
@@ -8417,6 +8418,11 @@  behave more correctly, not using the XEN_RUNSTATE_UPDATE flag until/unless
 specifically enabled (by the guest making the hypercall, causing the VMM
 to enable the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute).
 
+The KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag indicates that KVM supports
+clearing the PVCLOCK_TSC_STABLE_BIT flag in Xen pvclock sources. This will be
+done when the KVM_CAP_XEN_HVM ioctl sets the
+KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag.
+
 8.31 KVM_CAP_PPC_MULTITCE
 -------------------------
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..cc8d1ae29be3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3104,7 +3104,8 @@  u64 get_kvmclock_ns(struct kvm *kvm)
 
 static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 				    struct gfn_to_pfn_cache *gpc,
-				    unsigned int offset)
+				    unsigned int offset,
+				    bool force_tsc_unstable)
 {
 	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct pvclock_vcpu_time_info *guest_hv_clock;
@@ -3141,6 +3142,10 @@  static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	}
 
 	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
+
+	if (force_tsc_unstable)
+		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
+
 	smp_wmb();
 
 	guest_hv_clock->version = ++vcpu->hv_clock.version;
@@ -3161,6 +3166,16 @@  static int kvm_guest_time_update(struct kvm_vcpu *v)
 	u64 tsc_timestamp, host_tsc;
 	u8 pvclock_flags;
 	bool use_master_clock;
+#ifdef CONFIG_KVM_XEN
+	/*
+	 * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
+	 * explicitly told to use TSC as its clocksource Xen will not set this bit.
+	 * This default behaviour led to bugs in some guest kernels which cause
+	 * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
+	 */
+	bool xen_pvclock_tsc_unstable =
+		ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
+#endif
 
 	kernel_ns = 0;
 	host_tsc = 0;
@@ -3239,13 +3254,15 @@  static int kvm_guest_time_update(struct kvm_vcpu *v)
 	vcpu->hv_clock.flags = pvclock_flags;
 
 	if (vcpu->pv_time.active)
-		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0);
+		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
-					offsetof(struct compat_vcpu_info, time));
+					offsetof(struct compat_vcpu_info, time),
+					xen_pvclock_tsc_unstable);
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0);
+		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0,
+					xen_pvclock_tsc_unstable);
 #endif
 	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
 	return 0;
@@ -4638,7 +4655,8 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		    KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL |
 		    KVM_XEN_HVM_CONFIG_SHARED_INFO |
 		    KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL |
-		    KVM_XEN_HVM_CONFIG_EVTCHN_SEND;
+		    KVM_XEN_HVM_CONFIG_EVTCHN_SEND |
+		    KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
 		if (sched_info_on())
 			r |= KVM_XEN_HVM_CONFIG_RUNSTATE |
 			     KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index e53fad915a62..e43948b87f94 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1162,7 +1162,9 @@  int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
 {
 	/* Only some feature flags need to be *enabled* by userspace */
 	u32 permitted_flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL |
-		KVM_XEN_HVM_CONFIG_EVTCHN_SEND;
+		KVM_XEN_HVM_CONFIG_EVTCHN_SEND |
+		KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
+	u32 old_flags;
 
 	if (xhc->flags & ~permitted_flags)
 		return -EINVAL;
@@ -1183,9 +1185,14 @@  int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
 	else if (!xhc->msr && kvm->arch.xen_hvm_config.msr)
 		static_branch_slow_dec_deferred(&kvm_xen_enabled);
 
+	old_flags = kvm->arch.xen_hvm_config.flags;
 	memcpy(&kvm->arch.xen_hvm_config, xhc, sizeof(*xhc));
 
 	mutex_unlock(&kvm->arch.xen.xen_lock);
+
+	if ((old_flags ^ xhc->flags) & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE)
+		kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE);
+
 	return 0;
 }
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 211b86de35ac..ae90294456df 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1291,6 +1291,7 @@  struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
 #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
 #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
+#define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
 
 struct kvm_xen_hvm_config {
 	__u32 flags;