Message ID | 20230524061634.54141-2-chao.gao@intel.com |
State | New |
Headers |
From: Chao Gao <chao.gao@intel.com>
To: kvm@vger.kernel.org, x86@kernel.org
Cc: xiaoyao.li@intel.com, Chao Gao <chao.gao@intel.com>, Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
Date: Wed, 24 May 2023 14:16:31 +0800
Message-Id: <20230524061634.54141-2-chao.gao@intel.com>
In-Reply-To: <20230524061634.54141-1-chao.gao@intel.com>
References: <20230524061634.54141-1-chao.gao@intel.com>
Series | MSR_IA32_ARCH_CAPABILITIES cleanups |
Commit Message
Chao Gao
May 24, 2023, 6:16 a.m. UTC
Track the supported ARCH_CAPABILITIES in kvm_caps to avoid computing the
supported value at runtime every time.

Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
is modified, to achieve the same result as computing it at runtime.

Opportunistically, add a comment documenting the problem with allowing
the supported value of ARCH_CAPABILITIES to change, and why we don't
fix it.
No functional change intended.
Link: https://lore.kernel.org/all/ZGZhW%2Fx5OWPmx1qD@google.com/
Link: https://lore.kernel.org/all/ZGeU9sYTPxqNGSqI@google.com/
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++--
arch/x86/kvm/x86.c | 7 ++++---
arch/x86/kvm/x86.h | 1 +
3 files changed, 28 insertions(+), 5 deletions(-)
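At a glance, the mechanics of the change reduce to the scheme below. This is a minimal standalone C sketch of the idea only, not the kernel code: kvm_caps.supported_arch_cap, kvm_get_arch_capabilities() and vmx_setup_l1d_flush() are replaced by simplified stand-ins, and the bit value is hard-coded for illustration (the real diff follows at the end of the page).

/*
 * Sketch: compute the supported ARCH_CAPABILITIES once at vendor init,
 * then keep only the L1D-flush bit in sync when the mitigation state
 * changes, instead of recomputing the whole value on every read.
 */
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH (1ULL << 3)

/* stand-in for kvm_caps.supported_arch_cap */
static uint64_t supported_arch_cap;

/* stand-in for kvm_get_arch_capabilities(): host MSR masked by KVM's list */
static uint64_t get_arch_capabilities(void)
{
	uint64_t host_msr = 0;	/* rdmsrl(MSR_IA32_ARCH_CAPABILITIES) in reality */
	return host_msr;
}

/* done once, in __kvm_x86_vendor_init() in the patch */
static void vendor_init(void)
{
	supported_arch_cap = get_arch_capabilities();
}

/*
 * Mirrors the new vmx_setup_l1d_flush() logic: when KVM flushes the L1D
 * cache on VM-entry to L2 on the guest's behalf, an L1 guest may skip its
 * own flush, so the bit is advertised; otherwise it is cleared.
 */
static void update_l1d_flush_state(bool kvm_does_flush)
{
	if (kvm_does_flush)
		supported_arch_cap |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
	else
		supported_arch_cap &= ~ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
}

int main(void)
{
	vendor_init();			/* module load: compute once */
	update_l1d_flush_state(true);	/* e.g. admin enables the module param */
	return 0;
}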
Comments
On 5/24/2023 2:16 PM, Chao Gao wrote:
> to avoid computing the supported value at runtime every time.
>
> Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
> is modified to achieve the same result as runtime computing.

It's not the same result. In kvm_get_arch_capabilities(), host's value is
honored. I.e., when host supports ARCH_CAP_SKIP_VMENTRY_L1DFLUSH,
l1tf_vmx_mitigation doesn't make any difference to the result.

[snip: remainder of the patch quoted without further comment]
On Wed, May 24, 2023 at 04:14:10PM +0800, Xiaoyao Li wrote:
>On 5/24/2023 2:16 PM, Chao Gao wrote:
>> Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
>> is modified to achieve the same result as runtime computing.
>
>It's not the same result.

it is because ...

>In kvm_get_arch_capabilities(), host's value is honored. I.e., when host
>supports ARCH_CAP_SKIP_VMENTRY_L1DFLUSH, l1tf_vmx_mitigation doesn't make any
>difference to the result.

... l1tf_vmx_mitigation should be VMENTER_L1D_FLUSH_NOT_REQUIRED in this
case. l1tf_vmx_mitigation cannot be VMENTER_L1D_FLUSH_NEVER.

[snip: remainder of the patch quoted without further comment]
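For reference, the early return Chao describes sits at the top of the pre-patch vmx_setup_l1d_flush(); the same fragment appears as removed lines in Sean's counter-proposal further down the thread (abridged excerpt, comment added):

	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
		u64 msr;

		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
		/*
		 * If the host itself can skip the VM-entry L1D flush, the
		 * mitigation state is pinned to NOT_REQUIRED here and the
		 * function returns before VMENTER_L1D_FLUSH_NEVER could
		 * ever be assigned.
		 */
		if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
			return 0;
		}
	}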
On 5/24/2023 4:32 PM, Chao Gao wrote:
> On Wed, May 24, 2023 at 04:14:10PM +0800, Xiaoyao Li wrote:
>> It's not the same result.
>
> it is because ...
>
>> In kvm_get_arch_capabilities(), host's value is honored. I.e., when host
>> supports ARCH_CAP_SKIP_VMENTRY_L1DFLUSH, l1tf_vmx_mitigation doesn't make
>> any difference to the result.
>
> ... l1tf_vmx_mitigation should be VMENTER_L1D_FLUSH_NOT_REQUIRED in this
> case. l1tf_vmx_mitigation cannot be VMENTER_L1D_FLUSH_NEVER.

Yes, you are right. Maybe we can clarify it in the changelog.

Anyway,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

[snip: remainder of the patch quoted without further comment]
On Wed, May 24, 2023, Chao Gao wrote:
> @@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  		kvm_caps.max_guest_tsc_khz = max;
>  	}
>  	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
> +	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
>  	kvm_init_msr_lists();
>  	return 0;
>
> @@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	if (r)
>  		goto free_guest_fpu;
>
> -	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
> +	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
>  	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>  	kvm_xen_init_vcpu(vcpu);
>  	kvm_vcpu_mtrr_init(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index c544602d07a3..d3e524bcc169 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -29,6 +29,7 @@ struct kvm_caps {
>  	u64 supported_xcr0;
>  	u64 supported_xss;
>  	u64 supported_perf_cap;
> +	u64 supported_arch_cap;

Hrm, I take back my earlier vote about using a dynamic snapshot.

"supported" isn't quite right. KVM always "supports" advertising
SKIP_VMENTRY_L1DFLUSH to the guest. And KVM really does treat the MSR like a
CPUID leaf, in that KVM doesn't sanity check the value coming in from
userspace. Whether or not that's a good idea is debatable, but it is what it
is. The value is more like KVM's current default.

Looking at all the uses of both the default/supported value and the host MSR,
I think it makes more sense to snapshot the host value than it does to
snapshot and update the default/supported value. The default value is used
only when a vCPU is created and when userspace does a system-scoped
KVM_GET_MSRS, i.e. avoiding the RDMSR is nice, but making the read super fast
isn't necessary, e.g. the overhead of the boot_cpu_has() and
boot_cpu_has_bug() checks is negligible. And if KVM snapshots the MSR, the
other usage of the host value can be cleaned up too.
I'm leaning towards doing this instead of patches [1/3] and [3/3]:

From: Sean Christopherson <seanjc@google.com>
Date: Tue, 6 Jun 2023 09:20:31 -0700
Subject: [PATCH 1/2] KVM: x86: Snapshot host's MSR_IA32_ARCH_CAPABILITIES

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 22 ++++++----------------
 arch/x86/kvm/x86.c     | 13 +++++++------
 arch/x86/kvm/x86.h     |  1 +
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2d9d155691a7..42d1148f933c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -255,14 +255,9 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 		return 0;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
-		u64 msr;
-
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
-		if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
-			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
-			return 0;
-		}
+	if (host_arch_capabilities & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
+		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
+		return 0;
 	}
 
 	/* If set to auto use the default l1tf mitigation method */
@@ -373,15 +368,10 @@ static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
 
 static void vmx_setup_fb_clear_ctrl(void)
 {
-	u64 msr;
-
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) &&
+	if ((host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
 	    !boot_cpu_has_bug(X86_BUG_MDS) &&
-	    !boot_cpu_has_bug(X86_BUG_TAA)) {
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
-		if (msr & ARCH_CAP_FB_CLEAR_CTRL)
-			vmx_fb_clear_ctrl_available = true;
-	}
+	    !boot_cpu_has_bug(X86_BUG_TAA))
+		vmx_fb_clear_ctrl_available = true;
 }
 
 static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c7be4815eaa..7c2e796fa460 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -237,6 +237,9 @@ EXPORT_SYMBOL_GPL(enable_apicv);
 u64 __read_mostly host_xss;
 EXPORT_SYMBOL_GPL(host_xss);
 
+u64 __read_mostly host_arch_capabilities;
+EXPORT_SYMBOL_GPL(host_arch_capabilities);
+
 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
@@ -1612,12 +1615,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
 
 static u64 kvm_get_arch_capabilities(void)
 {
-	u64 data = 0;
-
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
-		data &= KVM_SUPPORTED_ARCH_CAP;
-	}
+	u64 data = host_arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
 
 	/*
 	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
@@ -9492,6 +9490,9 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	kvm_init_pmu_capability(ops->pmu_ops);
 
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
+
 	r = ops->hardware_setup();
 	if (r != 0)
 		goto out_mmu_exit;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..1e7be1f6ab29 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -323,6 +323,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
 
 extern u64 host_xcr0;
 extern u64 host_xss;
+extern u64 host_arch_capabilities;
 
 extern struct kvm_caps kvm_caps;
 
base-commit: 02f1b0b736606f9870595b3089d9c124f9da8be9
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..8274ef5e89e5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -309,10 +309,31 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 
 	l1tf_vmx_mitigation = l1tf;
 
-	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
+	/*
+	 * Update static keys and supported arch capabilities according to
+	 * the new mitigation state.
+	 *
+	 * ARCH_CAP_SKIP_VMENTRY_L1DFLUSH is toggled because if we do cache
+	 * flushes for L1 guests on (nested) vmlaunch/vmresume to L2, L1
+	 * guests can skip the flush and if we don't, then L1 guests need
+	 * to do a flush.
+	 *
+	 * Toggling ARCH_CAP_SKIP_VMENTRY_L1DFLUSH may present inconsistent
+	 * model to the guest, e.g., if userspace isn't careful, a VM can
+	 * have vCPUs with different values for ARCH_CAPABILITIES. But
+	 * there is almost no chance to fix the issue. Because, to present
+	 * a consistent model, KVM essentially needs to disallow changing
+	 * the module param after VMs/vCPUs have been created, but that
+	 * would prevent userspace from toggling the param while VMs are
+	 * running, e.g., in response to a new vulnerability.
+	 */
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
 		static_branch_enable(&vmx_l1d_should_flush);
-	else
+		kvm_caps.supported_arch_cap |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+	} else {
 		static_branch_disable(&vmx_l1d_should_flush);
+		kvm_caps.supported_arch_cap &= ~ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+	}
 
 	if (l1tf == VMENTER_L1D_FLUSH_COND)
 		static_branch_enable(&vmx_l1d_flush_cond);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0778ca39650..2408b5f554b7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1672,7 +1672,7 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
 	case MSR_IA32_ARCH_CAPABILITIES:
-		msr->data = kvm_get_arch_capabilities();
+		msr->data = kvm_caps.supported_arch_cap;
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		msr->data = kvm_caps.supported_perf_cap;
@@ -7156,7 +7156,7 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 			return;
 		break;
 	case MSR_IA32_TSX_CTRL:
-		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
+		if (!(kvm_caps.supported_arch_cap & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
 	default:
@@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_caps.max_guest_tsc_khz = max;
 	}
 	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
+	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
 	kvm_init_msr_lists();
 	return 0;
 
@@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	if (r)
 		goto free_guest_fpu;
 
-	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_xen_init_vcpu(vcpu);
 	kvm_vcpu_mtrr_init(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c544602d07a3..d3e524bcc169 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -29,6 +29,7 @@ struct kvm_caps {
 	u64 supported_xcr0;
 	u64 supported_xss;
 	u64 supported_perf_cap;
+	u64 supported_arch_cap;
 };
 
 void kvm_spurious_fault(void);