From patchwork Thu Nov 2 16:21:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paul Durrant X-Patchwork-Id: 160976 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp474146vqu; Thu, 2 Nov 2023 09:22:20 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHCJlXGTHQvvEBk75pXVGLBQt9kRXwUMDnfXhyDyDF293tZYK1hwgtj6ft2hs5SXGz7Q5AC X-Received: by 2002:a05:6a00:44c1:b0:6c3:3989:ca15 with SMTP id cv1-20020a056a0044c100b006c33989ca15mr1263328pfb.32.1698942140451; Thu, 02 Nov 2023 09:22:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698942140; cv=none; d=google.com; s=arc-20160816; b=wRXCk0rE37SlPzy73yJilaDuu4P7g70jqSKnXzC2i/7wrMhBVp86UbCkHyK1C80DT4 wlMmC8WCC8dJuJBzOH5xnAl++3RZKJ2s9qB7Wei0ItivOl9CSvg/NMvA6I2zYXpR4+dn 29G6li1vkUMyp1mYQmPvHAw76GF0G0gsyTW7hquX5VnzY3G5w20+diFdCNONmXBmmAC6 D80pjBra6pFerNzjYrxKsvjDRyOBBbGRAGrZqEyGsv7MwNqX+spBkuzjvXRwxuX+ioFl sSXzR6NAx7Ho2uSdCItaLTvHt5roWqWNqCEfeq66+NdJKsC75nhH84mNi0VJe+xrOPUE vcXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:to:from:dkim-signature; bh=ypWhZ+TKbQ7/ZkY7xWEAWru4yXqntW8dv6C3cB3xvx0=; fh=/NbIdVsjiVtuUXo6JKFkhHYQCAksfSdBPWHKTMf6bDs=; b=P6YMu8ISfUP2c1l8UTcslWgSifFrsoLyNwxoTgitIhQA4O0w9NlriHR6na5Q1ROHnQ /UF1ehD08SVuYE/ik1O+esuM5AOlYfxlaAkHHT+E8flQTCIN8+AhgzxAn1QoiVX+4ZBZ sCdfP9A+vW+Cv+fXG+7v3itMKHNMl+9urF5+8VC7NpFgJQQyibaCGSkym5llI99ceQBh 9L/1qQLao+zoC/Sz7Zu75OwZVy8XtiU81JP9awzPqfKHuwtOEdO5gdJSbgQhnDgFq9Kv evSxpe5u/vAF+MsNYokag+4Nk5vEsJLY82U0WLiABRVoxISeO2/QMTk1haIb/Ym/Nmx0 MPOw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@xen.org header.s=20200302mail header.b=pgSDHfra; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id c4-20020a656744000000b005b8f0c8ddbasi2227494pgu.243.2023.11.02.09.22.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Nov 2023 09:22:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; dkim=pass header.i=@xen.org header.s=20200302mail header.b=pgSDHfra; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 3B8C0802284B; Thu, 2 Nov 2023 09:22:17 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376572AbjKBQWF (ORCPT + 35 others); Thu, 2 Nov 2023 12:22:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60628 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234506AbjKBQWD (ORCPT ); Thu, 2 Nov 2023 12:22:03 -0400 Received: from mail.xenproject.org (mail.xenproject.org [104.130.215.37]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A521212D; Thu, 2 Nov 2023 09:21:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=xen.org; s=20200302mail; h=Content-Transfer-Encoding:MIME-Version:Message-Id:Date: Subject:To:From; bh=ypWhZ+TKbQ7/ZkY7xWEAWru4yXqntW8dv6C3cB3xvx0=; b=pgSDHfrab xNkCqsrmw+kXGBJ0a36S9+XoCL1X+N4d5ov15BPOLNqwCYrjuNFMaF4tsOzG13VAcSQ9Sjcfv1COB RpwqTiVsrOAS+V6ZDUXIeE1VNSahLSXgBI1O+P5cnYROXM2Kbjw3bH2hOMcPiI559/K8XoE1hc3O0 cSvYuSLQ=; Received: from xenbits.xenproject.org ([104.239.192.120]) by mail.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1qyaRs-0001tA-Fv; Thu, 02 Nov 2023 16:21:36 +0000 Received: from ec2-63-33-11-17.eu-west-1.compute.amazonaws.com ([63.33.11.17] helo=REM-PW02S00X.ant.amazon.com) by xenbits.xenproject.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qyaRs-0000Vf-6Y; Thu, 02 Nov 2023 16:21:36 +0000 From: Paul Durrant To: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , David Woodhouse , Paul Durrant , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v5] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT Date: Thu, 2 Nov 2023 16:21:28 +0000 Message-Id: <20231102162128.2353459-1-paul@xen.org> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Thu, 02 Nov 2023 09:22:17 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781469954042528444 X-GMAIL-MSGID: 1781469954042528444 From: Paul Durrant Unless explicitly told to do so (by passing 'clocksource=tsc' and 'tsc=stable:socket', and then jumping through some hoops concerning potential CPU hotplug) Xen will never use TSC as its clocksource. Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set in either the primary or secondary pvclock memory areas. This has led to bugs in some guest kernels which only become evident if PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support such guests, give the VMM a new Xen HVM config flag to tell KVM to forcibly clear the bit in the Xen pvclocks. Signed-off-by: Paul Durrant Reviewed-by: David Woodhouse --- v5: - Fix warning reported by kernel test robot. v4: - Re-base. - Re-work 'update_pvclock' test as requested. v3: - Moved clearing of PVCLOCK_TSC_STABLE_BIT the right side of the memcpy(). - Added an all-vCPUs KVM_REQ_CLOCK_UPDATE when the HVM config flag is changed. --- Documentation/virt/kvm/api.rst | 6 ++++++ arch/x86/kvm/x86.c | 28 +++++++++++++++++++++++----- arch/x86/kvm/xen.c | 9 ++++++++- include/uapi/linux/kvm.h | 1 + 4 files changed, 38 insertions(+), 6 deletions(-) base-commit: 45b890f7689eb0aba454fc5831d2d79763781677 diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 7025b3751027..a9bdd25826d1 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8374,6 +8374,7 @@ PVHVM guests. Valid flags are:: #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6) + #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE (1 << 7) The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG ioctl is available, for the guest to set its hypercall page. @@ -8417,6 +8418,11 @@ behave more correctly, not using the XEN_RUNSTATE_UPDATE flag until/unless specifically enabled (by the guest making the hypercall, causing the VMM to enable the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute). +The KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag indicates that KVM supports +clearing the PVCLOCK_TSC_STABLE_BIT flag in Xen pvclock sources. This will be +done when the KVM_CAP_XEN_HVM ioctl sets the +KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag. + 8.31 KVM_CAP_PPC_MULTITCE ------------------------- diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2c924075f6f1..cc8d1ae29be3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3104,7 +3104,8 @@ u64 get_kvmclock_ns(struct kvm *kvm) static void kvm_setup_guest_pvclock(struct kvm_vcpu *v, struct gfn_to_pfn_cache *gpc, - unsigned int offset) + unsigned int offset, + bool force_tsc_unstable) { struct kvm_vcpu_arch *vcpu = &v->arch; struct pvclock_vcpu_time_info *guest_hv_clock; @@ -3141,6 +3142,10 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v, } memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock)); + + if (force_tsc_unstable) + guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT; + smp_wmb(); guest_hv_clock->version = ++vcpu->hv_clock.version; @@ -3161,6 +3166,16 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) u64 tsc_timestamp, host_tsc; u8 pvclock_flags; bool use_master_clock; +#ifdef CONFIG_KVM_XEN + /* + * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless + * explicitly told to use TSC as its clocksource Xen will not set this bit. + * This default behaviour led to bugs in some guest kernels which cause + * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags. + */ + bool xen_pvclock_tsc_unstable = + ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE; +#endif kernel_ns = 0; host_tsc = 0; @@ -3239,13 +3254,15 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) vcpu->hv_clock.flags = pvclock_flags; if (vcpu->pv_time.active) - kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0); + kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false); #ifdef CONFIG_KVM_XEN if (vcpu->xen.vcpu_info_cache.active) kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache, - offsetof(struct compat_vcpu_info, time)); + offsetof(struct compat_vcpu_info, time), + xen_pvclock_tsc_unstable); if (vcpu->xen.vcpu_time_info_cache.active) - kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0); + kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0, + xen_pvclock_tsc_unstable); #endif kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock); return 0; @@ -4638,7 +4655,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO | KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL | - KVM_XEN_HVM_CONFIG_EVTCHN_SEND; + KVM_XEN_HVM_CONFIG_EVTCHN_SEND | + KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE; if (sched_info_on()) r |= KVM_XEN_HVM_CONFIG_RUNSTATE | KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG; diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index e53fad915a62..e43948b87f94 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -1162,7 +1162,9 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc) { /* Only some feature flags need to be *enabled* by userspace */ u32 permitted_flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | - KVM_XEN_HVM_CONFIG_EVTCHN_SEND; + KVM_XEN_HVM_CONFIG_EVTCHN_SEND | + KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE; + u32 old_flags; if (xhc->flags & ~permitted_flags) return -EINVAL; @@ -1183,9 +1185,14 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc) else if (!xhc->msr && kvm->arch.xen_hvm_config.msr) static_branch_slow_dec_deferred(&kvm_xen_enabled); + old_flags = kvm->arch.xen_hvm_config.flags; memcpy(&kvm->arch.xen_hvm_config, xhc, sizeof(*xhc)); mutex_unlock(&kvm->arch.xen.xen_lock); + + if ((old_flags ^ xhc->flags) & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE) + kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); + return 0; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 211b86de35ac..ae90294456df 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1291,6 +1291,7 @@ struct kvm_x86_mce { #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6) +#define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE (1 << 7) struct kvm_xen_hvm_config { __u32 flags;