Message ID | 20231108183003.5981-7-xin3.li@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:aa0b:0:b0:403:3b70:6f57 with SMTP id k11csp1125499vqo; Wed, 8 Nov 2023 11:04:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IFspqjVsleKElEyiwlfh23y/wKNikt/y9YkYuDlztjnSqGhahCXsAfGjXSKB+qivpffNaH2 X-Received: by 2002:a17:90b:2551:b0:280:a002:be85 with SMTP id nw17-20020a17090b255100b00280a002be85mr4112966pjb.20.1699470275679; Wed, 08 Nov 2023 11:04:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699470275; cv=none; d=google.com; s=arc-20160816; b=pGwLUni46xrPmLqu3VY8pJPzQb2SDclzX0+kQLrIsnyU2RnlSN+zMcOx/Wa+Ucjbmn wcsaznXBI7yHqkZ2s/NhyZlPEmtIAid310/IQE0QxquHovjLDDiGX2JquGZXIYRJQjvP RiaK906ixusU4sQBAckpAJcA//6MV3xjdCwsRnjE8BiEjlOZj63XQ1sio82jgRpzDDmZ eQexrkdfbb4StZdGxyfxYGzcN5CW2I+uAKy6gHy0Ou45U6kfvK95mwvnyjY5L01vlj7n npaItI/Bo11X7zOHKMmlGjrsx9mBubN8fuCXvm2YLwovGWLo3hKuV1B8yLewcz6ERMj9 5pTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=QsbPjh4EuFtOjAwoFni9OherG17rDfUIOLQZ50G8BuI=; fh=jyCs2STAghiemYCMqutk2CMon3BCEX6sSSqaVLqDaaU=; b=uqxmjjnBIvufCM4cg0nO9/F8KWxcraaN9+lPUwqK5GhZ9ZsAqSraQgaCarRBJ4RP4k 30sMNREmNvD89H1a2TFqZ2ydRIg4UrPgE7AT5+S2JxhX8aCp2bVsJ0j3EYB27fPFiOBg rl1iT8rePEPk0RHtesjYWw1cYzCuTMCX4mjH4azAZ0RG9oawX2vV8RhDhYcvkNcYqPvC Ww3J66zDp1HEwUZf3i6i4WBV49FcFxUetewAOJmBahkxRzB2GVW7+3NCrUFal6HUEBC3 Nv9FByo/nAU6KxOrxqvEXZUpVq7xF5oYEUZWESET0NVjQ6IEiAXbIfLD8sc48C79PyYW VnDg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=eX3fdQK8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id gm10-20020a17090b100a00b0028098f4dc5fsi2799825pjb.106.2023.11.08.11.04.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 Nov 2023 11:04:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=eX3fdQK8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id E49B681B75CD; Wed, 8 Nov 2023 11:01:12 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233122AbjKHTAo (ORCPT <rfc822;jaysivo@gmail.com> + 32 others); Wed, 8 Nov 2023 14:00:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231961AbjKHTAV (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 8 Nov 2023 14:00:21 -0500 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C52DB2115; Wed, 8 Nov 2023 11:00:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1699470020; x=1731006020; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=IRPQ5ioteqwh4CfCqIjfG/Rnvw8uAQhRAapN4CI6IQw=; b=eX3fdQK8JsApP9z8bxI4pPxxR05RM8YyOiRfWXbjw2rJEqxlm7zRdp9W pQbg8cgk1w2aSDFDt5Y4anHI8VTAaBNkINTyhDQb/1mE+LlI8/TzxAfab GcHEJgtZTmsFsgDJ74D8FrD7720XbMkvK2pNziOaqqiYfvo7KQ+c6+ZAo 06tssZmc1nDnA3IQAvb7maZv28mcRfP0B7eIOzcT/aDpfeX3JaRsUWD0y uQYc0Y3h7d7v+D7RTtTxkal8uOfy9RKjxXpDSuDPe0UHamVYQhigFf+/4 HixVkH5BZq+xFt5qkdY89sd433ufE0dX1ansoz59eZeHG/a+AhtPv+QvK A==; X-IronPort-AV: E=McAfee;i="6600,9927,10888"; a="8486267" X-IronPort-AV: E=Sophos;i="6.03,287,1694761200"; d="scan'208";a="8486267" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Nov 2023 11:00:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.03,287,1694761200"; d="scan'208";a="10892433" Received: from unknown (HELO fred..) ([172.25.112.68]) by orviesa001.jf.intel.com with ESMTP; 08 Nov 2023 11:00:18 -0800 From: Xin Li <xin3.li@intel.com> To: kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, corbet@lwn.net, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, vkuznets@redhat.com, peterz@infradead.org, ravi.v.shankar@intel.com Subject: [PATCH v1 06/23] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID Date: Wed, 8 Nov 2023 10:29:46 -0800 Message-ID: <20231108183003.5981-7-xin3.li@intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231108183003.5981-1-xin3.li@intel.com> References: <20231108183003.5981-1-xin3.li@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 08 Nov 2023 11:01:13 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782023744183480738 X-GMAIL-MSGID: 1782023744183480738 |
Series |
Enable FRED with KVM VMX
|
|
Commit Message
Li, Xin3
Nov. 8, 2023, 6:29 p.m. UTC
Clear FRED VM entry/exit controls when initializing a vCPU, and set these controls only if FRED is enumerated after set CPUID. FRED VM entry/exit controls need to be set to establish context sufficient to support FRED event delivery immediately after VM entry and exit. However it is not required to save/load FRED MSRs for a non-FRED guest, which aren't supposed to access FRED MSRs. A non-FRED guest should get #GP upon accessing FRED MSRs, otherwise it corrupts host FRED MSRs. Tested-by: Shan Kang <shan.kang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> --- arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-)
Comments
On Wed, Nov 08, 2023 at 10:29:46AM -0800, Xin Li wrote: >Clear FRED VM entry/exit controls when initializing a vCPU, and set >these controls only if FRED is enumerated after set CPUID. > >FRED VM entry/exit controls need to be set to establish context >sufficient to support FRED event delivery immediately after VM entry >and exit. However it is not required to save/load FRED MSRs for >a non-FRED guest, which aren't supposed to access FRED MSRs. > >A non-FRED guest should get #GP upon accessing FRED MSRs, otherwise >it corrupts host FRED MSRs. > >Tested-by: Shan Kang <shan.kang@intel.com> >Signed-off-by: Xin Li <xin3.li@intel.com> >--- > arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++++++- > 1 file changed, 33 insertions(+), 1 deletion(-) > >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >index 9186f41974ab..5d4786812664 100644 >--- a/arch/x86/kvm/vmx/vmx.c >+++ b/arch/x86/kvm/vmx/vmx.c >@@ -4423,6 +4423,9 @@ static u32 vmx_vmentry_ctrl(void) > if (cpu_has_perf_global_ctrl_bug()) > vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; > >+ /* Whether to load guest FRED MSRs is deferred until after set CPUID */ >+ vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_FRED; >+ > return vmentry_ctrl; > } > >@@ -4458,7 +4461,13 @@ static u32 vmx_vmexit_ctrl(void) > > static u64 vmx_secondary_vmexit_ctrl(void) > { >- return vmcs_config.secondary_vmexit_ctrl; >+ u64 secondary_vmexit_ctrl = vmcs_config.secondary_vmexit_ctrl; >+ >+ /* Whether to save/load FRED MSRs is deferred until after set CPUID */ >+ secondary_vmexit_ctrl &= ~(SECONDARY_VM_EXIT_SAVE_IA32_FRED | >+ SECONDARY_VM_EXIT_LOAD_IA32_FRED); >+ >+ return secondary_vmexit_ctrl; > } > > static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) >@@ -7785,10 +7794,33 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) > vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); > } > >+static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu) >+{ >+ struct vcpu_vmx *vmx = to_vmx(vcpu); >+ >+ if (!cpu_feature_enabled(X86_FEATURE_FRED) || >+ !guest_cpuid_has(vcpu, X86_FEATURE_FRED)) >+ return; >+ >+ /* Enable loading guest FRED MSRs from VMCS */ >+ vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED); >+ >+ /* >+ * Enable saving guest FRED MSRs into VMCS and loading host FRED MSRs >+ * from VMCS. >+ */ >+ vm_exit_controls_setbit(vmx, VM_EXIT_ACTIVATE_SECONDARY_CONTROLS); >+ secondary_vm_exit_controls_setbit(vmx, >+ SECONDARY_VM_EXIT_SAVE_IA32_FRED | >+ SECONDARY_VM_EXIT_LOAD_IA32_FRED); all above vmcs controls need to be cleared if guest doesn't enumerate FRED, see https://lore.kernel.org/all/ZJYzPn7ipYfO0fLZ@google.com/ Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when new bits are added to secondary vmcs controls. Why not keep VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or you see any perf impact? >+} >+ > static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > >+ vmx_vcpu_config_fred_after_set_cpuid(vcpu); >+ > /* > * XSAVES is effectively enabled if and only if XSAVE is also exposed > * to the guest. XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be >-- >2.42.0 > >
> >+static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu) > >+{ > >+ struct vcpu_vmx *vmx = to_vmx(vcpu); > >+ > >+ if (!cpu_feature_enabled(X86_FEATURE_FRED) || > >+ !guest_cpuid_has(vcpu, X86_FEATURE_FRED)) > >+ return; > >+ > >+ /* Enable loading guest FRED MSRs from VMCS */ > >+ vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED); > >+ > >+ /* > >+ * Enable saving guest FRED MSRs into VMCS and loading host FRED MSRs > >+ * from VMCS. > >+ */ > >+ vm_exit_controls_setbit(vmx, > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS); > >+ secondary_vm_exit_controls_setbit(vmx, > >+ SECONDARY_VM_EXIT_SAVE_IA32_FRED > | > >+ > SECONDARY_VM_EXIT_LOAD_IA32_FRED); > > all above vmcs controls need to be cleared if guest doesn't enumerate FRED, see > > https://lore.kernel.org/all/ZJYzPn7ipYfO0fLZ@google.com/ Good point, the user space could set cpuid multiple times... > Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when > new bits are added to secondary vmcs controls. Why not keep > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or you > see any perf impact? I think it from the other way, why keeps hw loading it on every vmentry even if it's not used by a guest? Different CPUs may implement it in different ways, which we can't assume. Other features needing it should set it separately, say with a refcount.
On Thu, Nov 09, 2023, Xin3 Li wrote: > > >+static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu) > > >+{ > > >+ struct vcpu_vmx *vmx = to_vmx(vcpu); > > >+ > > >+ if (!cpu_feature_enabled(X86_FEATURE_FRED) || > > >+ !guest_cpuid_has(vcpu, X86_FEATURE_FRED)) > > >+ return; > > >+ > > >+ /* Enable loading guest FRED MSRs from VMCS */ > > >+ vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED); > > >+ > > >+ /* > > >+ * Enable saving guest FRED MSRs into VMCS and loading host FRED MSRs > > >+ * from VMCS. > > >+ */ > > >+ vm_exit_controls_setbit(vmx, > > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS); > > >+ secondary_vm_exit_controls_setbit(vmx, > > >+ SECONDARY_VM_EXIT_SAVE_IA32_FRED > > | > > >+ > > SECONDARY_VM_EXIT_LOAD_IA32_FRED); > > > > all above vmcs controls need to be cleared if guest doesn't enumerate FRED, see > > > > https://lore.kernel.org/all/ZJYzPn7ipYfO0fLZ@google.com/ > > Good point, the user space could set cpuid multiple times... > > > Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when > > new bits are added to secondary vmcs controls. Why not keep > > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or you > > see any perf impact? > > I think it from the other way, why keeps hw loading it on every vmentry > even if it's not used by a guest? Oh, yeesh, this is clearing the activation control. Yeah, NAK to that, just leave it set. If it weren't for the fact that there is apparently a metrict ton of FRED state (I thought the whole point of FRED was to kill off legacy crap like CPL1 and CPL2 stacks?) _and_ that KVM needs to toggle MSR intercepts, I'd probably push back on toggling even the FRED controls. > Different CPUs may implement it in different ways, which we can't assume. Implement what in a different way? The VMCS fields and FRED are architectural. The internal layout of the VMCS is uarch specific, but the encodings and semantics absolutely cannot change without breaking software. And if Intel does something asinine like make a control active-low then we have far, far bigger problems. > Other features needing it should set it separately, say with a refcount. Absolutely not. Unless Intel screwed up the implementation, the cost of leaving VM_EXIT_ACTIVATE_SECONDARY_CONTROLS set when it's supported shouldn't even be measurable.
> > > Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when > > > new bits are added to secondary vmcs controls. Why not keep > > > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or > > > you see any perf impact? > > > > I think it from the other way, why keeps hw loading it on every > > vmentry even if it's not used by a guest? > > Oh, yeesh, this is clearing the activation control. Yeah, NAK to that, just leave it > set. If it weren't for the fact that there is apparently a metrict ton of FRED state (I > thought the whole point of FRED was to kill off legacy crap like > CPL1 and CPL2 stacks?) _and_ that KVM needs to toggle MSR intercepts, I'd > probably push back on toggling even the FRED controls. To me, FRED is to _architecturally_ do the right thing for x86 event handling. I don't think FRED's major purpose is to kill legacy craps, but longer term it paves the way for that. Yeah, I would like to discuss whether to toggle FRED controls. > > > Different CPUs may implement it in different ways, which we can't assume. > > Implement what in a different way? The VMCS fields and FRED are architectural. > The internal layout of the VMCS is uarch specific, but the encodings and semantics > absolutely cannot change without breaking software. And if Intel does something > asinine like make a control active-low then we have far, far bigger problems. I should have made it clear that I wasn't talking at the ISA level. And of course CPU uarch implementations should be transparent to software. I mean a CPU uarch could choose to check the activation bit in the VM exit controls first and then decide whether to load the 2nd VM exit controls. While if resources allow, a CPU uarch could always load the 2nd VM exit controls. BTW, I believe the active-low controls are really gone with new features. All new controls are all 0s by default. > > Other features needing it should set it separately, say with a refcount. > > Absolutely not. Unless Intel screwed up the implementation, the cost of leaving > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS set when it's supported shouldn't > even be measurable. I do hope so. However, I don't know whether this is guaranteed or not on all uarch implementations. A decision to leave it set is good enough for now. Thanks! Xin
On Tue, Nov 14, 2023, Xin3 Li wrote: > > Implement what in a different way? The VMCS fields and FRED are architectural. > > The internal layout of the VMCS is uarch specific, but the encodings and semantics > > absolutely cannot change without breaking software. And if Intel does something > > asinine like make a control active-low then we have far, far bigger problems. > > I should have made it clear that I wasn't talking at the ISA level. And > of course CPU uarch implementations should be transparent to software. > > I mean a CPU uarch could choose to check the activation bit in the VM exit > controls first and then decide whether to load the 2nd VM exit controls. > While if resources allow, a CPU uarch could always load the 2nd VM exit > controls. And why does that matter? Loading a field speculatively/out-of-order is fine, consuming it when it architecturally is supposed to be ignored is not.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 9186f41974ab..5d4786812664 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4423,6 +4423,9 @@ static u32 vmx_vmentry_ctrl(void) if (cpu_has_perf_global_ctrl_bug()) vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; + /* Whether to load guest FRED MSRs is deferred until after set CPUID */ + vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_FRED; + return vmentry_ctrl; } @@ -4458,7 +4461,13 @@ static u32 vmx_vmexit_ctrl(void) static u64 vmx_secondary_vmexit_ctrl(void) { - return vmcs_config.secondary_vmexit_ctrl; + u64 secondary_vmexit_ctrl = vmcs_config.secondary_vmexit_ctrl; + + /* Whether to save/load FRED MSRs is deferred until after set CPUID */ + secondary_vmexit_ctrl &= ~(SECONDARY_VM_EXIT_SAVE_IA32_FRED | + SECONDARY_VM_EXIT_LOAD_IA32_FRED); + + return secondary_vmexit_ctrl; } static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) @@ -7785,10 +7794,33 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } +static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (!cpu_feature_enabled(X86_FEATURE_FRED) || + !guest_cpuid_has(vcpu, X86_FEATURE_FRED)) + return; + + /* Enable loading guest FRED MSRs from VMCS */ + vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED); + + /* + * Enable saving guest FRED MSRs into VMCS and loading host FRED MSRs + * from VMCS. + */ + vm_exit_controls_setbit(vmx, VM_EXIT_ACTIVATE_SECONDARY_CONTROLS); + secondary_vm_exit_controls_setbit(vmx, + SECONDARY_VM_EXIT_SAVE_IA32_FRED | + SECONDARY_VM_EXIT_LOAD_IA32_FRED); +} + static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); + vmx_vcpu_config_fred_after_set_cpuid(vcpu); + /* * XSAVES is effectively enabled if and only if XSAVE is also exposed * to the guest. XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be