Message ID | 20231124055330.138870-23-weijiang.yang@intel.com |
---|---|
State | New |
Headers |
From: Yang Weijiang <weijiang.yang@intel.com>
To: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, chao.gao@intel.com, rick.p.edgecombe@intel.com, mlevitsk@redhat.com, john.allen@amd.com, weijiang.yang@intel.com
Subject: [PATCH v7 22/26] KVM: VMX: Set up interception for CET MSRs
Date: Fri, 24 Nov 2023 00:53:26 -0500
Message-Id: <20231124055330.138870-23-weijiang.yang@intel.com>
In-Reply-To: <20231124055330.138870-1-weijiang.yang@intel.com>
References: <20231124055330.138870-1-weijiang.yang@intel.com>
Series |
Enable CET Virtualization
Commit Message
Yang, Weijiang
Nov. 24, 2023, 5:53 a.m. UTC
Enable/disable CET MSR interception according to the associated feature's
configuration. The Shadow Stack feature requires all CET MSRs to be passed
through to the guest in order to support it in both user and supervisor
mode, while the IBT feature only depends on MSR_IA32_{U,S}_CET to enable
user and supervisor IBT.
Note, this MSR design introduces an architectural limitation on SHSTK and
IBT control for the guest: when SHSTK is exposed, IBT is also available to
the guest from the architectural perspective, since IBT relies on a subset
of the SHSTK-relevant MSRs.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
Comments
On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote: > Enable/disable CET MSRs interception per associated feature configuration. > Shadow Stack feature requires all CET MSRs passed through to guest to make > it supported in user and supervisor mode while IBT feature only depends on > MSR_IA32_{U,S}_CETS_CET to enable user and supervisor IBT. > > Note, this MSR design introduced an architectural limitation of SHSTK and > IBT control for guest, i.e., when SHSTK is exposed, IBT is also available > to guest from architectual perspective since IBT relies on subset of SHSTK > relevant MSRs. > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> > --- > arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 42 insertions(+) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 554f665e59c3..e484333eddb0 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -699,6 +699,10 @@ static bool is_valid_passthrough_msr(u32 msr) > case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: > /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ > return true; > + case MSR_IA32_U_CET: > + case MSR_IA32_S_CET: > + case MSR_IA32_PL0_SSP ... 
MSR_IA32_INT_SSP_TAB: > + return true; > } > > r = possible_passthrough_msr_slot(msr) != -ENOENT; > @@ -7766,6 +7770,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) > vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); > } > > +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) > +{ > + bool incpt; > + > + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { > + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); > + > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, > + MSR_TYPE_RW, incpt); > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, > + MSR_TYPE_RW, incpt); > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, > + MSR_TYPE_RW, incpt); > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, > + MSR_TYPE_RW, incpt); > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, > + MSR_TYPE_RW, incpt); > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, > + MSR_TYPE_RW, incpt); > + if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, > + MSR_TYPE_RW, incpt); > + if (!incpt) > + return; > + } > + > + if (kvm_cpu_cap_has(X86_FEATURE_IBT)) { > + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT); > + > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, > + MSR_TYPE_RW, incpt); > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, > + MSR_TYPE_RW, incpt); > + } > +} > + > static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > @@ -7843,6 +7883,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > /* Refresh #PF interception to account for MAXPHYADDR changes. 
*/ > vmx_update_exception_bitmap(vcpu); > + > + vmx_update_intercept_for_cet_msr(vcpu); > } > > static u64 vmx_get_perf_capabilities(void)

My review feedback from the previous patch still applies as well. I still think that we should either try a best-effort approach to plug this virtualization hole, or we should at least fail guest creation if the virtualization hole is present, as I said:

"Another, much simpler option is to fail the guest creation if the shadow stack + indirect branch tracking state differs between host and the guest, unless both are disabled in the guest. (in essence don't let the guest be created if (2) or (3) happen)"

Please at least tell me what you think about this.

Best regards,
	Maxim Levitsky
On Thu, Nov 30, 2023 at 07:44:45PM +0200, Maxim Levitsky wrote: >On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote: >> Enable/disable CET MSRs interception per associated feature configuration. >> Shadow Stack feature requires all CET MSRs passed through to guest to make >> it supported in user and supervisor mode while IBT feature only depends on >> MSR_IA32_{U,S}_CETS_CET to enable user and supervisor IBT. >> >> Note, this MSR design introduced an architectural limitation of SHSTK and >> IBT control for guest, i.e., when SHSTK is exposed, IBT is also available >> to guest from architectual perspective since IBT relies on subset of SHSTK >> relevant MSRs. >> >> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> >> --- >> arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 42 insertions(+) >> >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index 554f665e59c3..e484333eddb0 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -699,6 +699,10 @@ static bool is_valid_passthrough_msr(u32 msr) >> case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: >> /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ >> return true; >> + case MSR_IA32_U_CET: >> + case MSR_IA32_S_CET: >> + case MSR_IA32_PL0_SSP ... 
MSR_IA32_INT_SSP_TAB: >> + return true; >> } >> >> r = possible_passthrough_msr_slot(msr) != -ENOENT; >> @@ -7766,6 +7770,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) >> vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); >> } >> >> +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) >> +{ >> + bool incpt; >> + >> + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { >> + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); >> + >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, >> + MSR_TYPE_RW, incpt); >> + if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, >> + MSR_TYPE_RW, incpt); >> + if (!incpt) >> + return; >> + } >> + >> + if (kvm_cpu_cap_has(X86_FEATURE_IBT)) { >> + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT); >> + >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, >> + MSR_TYPE_RW, incpt); >> + } >> +} >> + >> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) >> { >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> @@ -7843,6 +7883,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) >> >> /* Refresh #PF interception to account for MAXPHYADDR changes. 
*/ >> vmx_update_exception_bitmap(vcpu); >> + >> + vmx_update_intercept_for_cet_msr(vcpu); >> } >> >> static u64 vmx_get_perf_capabilities(void) > >My review feedback from the previous patch still applies as well, > >I still think that we should either try a best effort approach to plug >this virtualization hole, or we at least should fail guest creation >if the virtualization hole is present as I said: > >"Another, much simpler option is to fail the guest creation if the shadow stack + indirect branch tracking >state differs between host and the guest, unless both are disabled in the guest. >(in essence don't let the guest be created if (2) or (3) happen)"

Enforcing a "none" or "all" policy is a temporary solution. In the future, if some reserved bits in the S/U_CET MSRs are extended for new features, there will be:

	platform A supports SS + IBT
	platform B supports SS + IBT + new feature

Guests running on B inevitably have the same virtualization hole, and if KVM continues enforcing the policy on B, then VM migration from A to B would be impossible.

To me, intercepting the S/U_CET MSRs and the CET_S/U xsave components is intricate and yields marginal benefit. And I also doubt any reasonable OS implementation would depend on a #GP from WRMSR to the S/U_CET MSRs for functionality. So, I vote to leave the patch as-is.

>Please at least tell me what do you think about this.
>
>Best regards,
>	Maxim Levitsky
On 12/1/2023 1:44 AM, Maxim Levitsky wrote: > On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote: >> Enable/disable CET MSRs interception per associated feature configuration. >> Shadow Stack feature requires all CET MSRs passed through to guest to make >> it supported in user and supervisor mode while IBT feature only depends on >> MSR_IA32_{U,S}_CETS_CET to enable user and supervisor IBT. >> >> Note, this MSR design introduced an architectural limitation of SHSTK and >> IBT control for guest, i.e., when SHSTK is exposed, IBT is also available >> to guest from architectual perspective since IBT relies on subset of SHSTK >> relevant MSRs. >> >> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> >> --- >> arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 42 insertions(+) >> >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index 554f665e59c3..e484333eddb0 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -699,6 +699,10 @@ static bool is_valid_passthrough_msr(u32 msr) >> case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: >> /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ >> return true; >> + case MSR_IA32_U_CET: >> + case MSR_IA32_S_CET: >> + case MSR_IA32_PL0_SSP ... 
MSR_IA32_INT_SSP_TAB: >> + return true; >> } >> >> r = possible_passthrough_msr_slot(msr) != -ENOENT; >> @@ -7766,6 +7770,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) >> vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); >> } >> >> +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) >> +{ >> + bool incpt; >> + >> + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { >> + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); >> + >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, >> + MSR_TYPE_RW, incpt); >> + if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, >> + MSR_TYPE_RW, incpt); >> + if (!incpt) >> + return; >> + } >> + >> + if (kvm_cpu_cap_has(X86_FEATURE_IBT)) { >> + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT); >> + >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, >> + MSR_TYPE_RW, incpt); >> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, >> + MSR_TYPE_RW, incpt); >> + } >> +} >> + >> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) >> { >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> @@ -7843,6 +7883,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) >> >> /* Refresh #PF interception to account for MAXPHYADDR changes. 
*/ >> vmx_update_exception_bitmap(vcpu); >> + >> + vmx_update_intercept_for_cet_msr(vcpu); >> } >> >> static u64 vmx_get_perf_capabilities(void) > My review feedback from the previous patch still applies as well, > > I still think that we should either try a best effort approach to plug > this virtualization hole, or we at least should fail guest creation > if the virtualization hole is present as I said: > > "Another, much simpler option is to fail the guest creation if the shadow stack + indirect branch tracking > state differs between host and the guest, unless both are disabled in the guest. > (in essence don't let the guest be created if (2) or (3) happen)" > > Please at least tell me what do you think about this.

Oh, I thought I had replied to this patch in v6, but I failed to send it out!

Let me explain a bit: at an early stage of this series, I thought of checking the relevant host feature enabling status before exposing the guest CET features, but that proved unnecessary and user-unfriendly.

E.g., we frequently disable host CET features on the host for whatever reason, and then the features cannot be used/tested in the guest at all. Technically, the guest should be allowed to run the features as long as the dependencies (i.e., xsave-related support) are enabled on the host and no risks are introduced by use of the features in the guest.

I think cloud computing shares a similar pain point when deploying CET in virtualization usages.
On Fri, 2023-12-01 at 14:33 +0800, Chao Gao wrote: > On Thu, Nov 30, 2023 at 07:44:45PM +0200, Maxim Levitsky wrote: > > On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote: > > > Enable/disable CET MSRs interception per associated feature configuration. > > > Shadow Stack feature requires all CET MSRs passed through to guest to make > > > it supported in user and supervisor mode while IBT feature only depends on > > > MSR_IA32_{U,S}_CETS_CET to enable user and supervisor IBT. > > > > > > Note, this MSR design introduced an architectural limitation of SHSTK and > > > IBT control for guest, i.e., when SHSTK is exposed, IBT is also available > > > to guest from architectual perspective since IBT relies on subset of SHSTK > > > relevant MSRs. > > > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> > > > --- > > > arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 42 insertions(+) > > > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 554f665e59c3..e484333eddb0 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -699,6 +699,10 @@ static bool is_valid_passthrough_msr(u32 msr) > > > case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: > > > /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ > > > return true; > > > + case MSR_IA32_U_CET: > > > + case MSR_IA32_S_CET: > > > + case MSR_IA32_PL0_SSP ... 
MSR_IA32_INT_SSP_TAB: > > > + return true; > > > } > > > > > > r = possible_passthrough_msr_slot(msr) != -ENOENT; > > > @@ -7766,6 +7770,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) > > > vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); > > > } > > > > > > +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) > > > +{ > > > + bool incpt; > > > + > > > + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { > > > + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); > > > + > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, > > > + MSR_TYPE_RW, incpt); > > > + if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, > > > + MSR_TYPE_RW, incpt); > > > + if (!incpt) > > > + return; > > > + } > > > + > > > + if (kvm_cpu_cap_has(X86_FEATURE_IBT)) { > > > + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT); > > > + > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, > > > + MSR_TYPE_RW, incpt); > > > + } > > > +} > > > + > > > static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > { > > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > > @@ -7843,6 +7883,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > > > > /* Refresh #PF interception to account for MAXPHYADDR changes. 
*/ > > > vmx_update_exception_bitmap(vcpu); > > > + > > > + vmx_update_intercept_for_cet_msr(vcpu); > > > } > > > > > > static u64 vmx_get_perf_capabilities(void) > > > > My review feedback from the previous patch still applies as well, > > > > I still think that we should either try a best effort approach to plug > > this virtualization hole, or we at least should fail guest creation > > if the virtualization hole is present as I said: > > > > "Another, much simpler option is to fail the guest creation if the shadow stack + indirect branch tracking > > state differs between host and the guest, unless both are disabled in the guest. > > (in essence don't let the guest be created if (2) or (3) happen)" > > Enforcing a "none" or "all" policy is a temporary solution. in future, if some > reserved bits in S/U_CET MSRs are extended for new features, there will be: > > platform A supports SS + IBT > platform B supports SS + IBT + new feature > > Guests running on B inevitably have the same virtualization hole. and if kvm > continues enforcing the policy on B, then VM migration from A to B would be > impossible. > > To me, intercepting S/U_CET MSR and CET_S/U xsave components is intricate and > yields marginal benefits. And I also doubt any reasonable OS implementation > would depend on #GP of WRMSR to S/U_CET MSRs for functionalities. So, I vote > to leave the patch as-is.

To some extent I do agree with you, but this can become a huge mess in the future. I think we need at least to tell Intel/AMD about this, to ensure that they don't make this thing worse than it already is.

Also, the very least we can do, if we opt to keep things as-is, is to document this virtualization hole - we have Documentation/virt/kvm/x86/errata.rst for that.

Best regards,
	Maxim Levitsky

> > Please at least tell me what do you think about this.
> >
> > Best regards,
> > 	Maxim Levitsky
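If the thread settles on documenting rather than plugging the hole, a minimal entry in Documentation/virt/kvm/x86/errata.rst (the file mentioned above) might read something like the following; the wording is only a suggestion, not an agreed-upon text:

```rst
CET MSR interception with shadow stack exposed
----------------------------------------------

When shadow stack (SHSTK) is exposed to a guest, ``MSR_IA32_U_CET`` and
``MSR_IA32_S_CET`` are passed through to the guest, so a guest whose
CPUID hides IBT can still toggle the IBT enable bits in those MSRs.
Architecturally, IBT control cannot be fully hidden from a guest that
has SHSTK, because IBT relies on a subset of the SHSTK-relevant MSRs.
```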
On Fri, 2023-12-01 at 17:45 +0800, Yang, Weijiang wrote: > On 12/1/2023 1:44 AM, Maxim Levitsky wrote: > > On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote: > > > Enable/disable CET MSRs interception per associated feature configuration. > > > Shadow Stack feature requires all CET MSRs passed through to guest to make > > > it supported in user and supervisor mode while IBT feature only depends on > > > MSR_IA32_{U,S}_CETS_CET to enable user and supervisor IBT. > > > > > > Note, this MSR design introduced an architectural limitation of SHSTK and > > > IBT control for guest, i.e., when SHSTK is exposed, IBT is also available > > > to guest from architectual perspective since IBT relies on subset of SHSTK > > > relevant MSRs. > > > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> > > > --- > > > arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 42 insertions(+) > > > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 554f665e59c3..e484333eddb0 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -699,6 +699,10 @@ static bool is_valid_passthrough_msr(u32 msr) > > > case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: > > > /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ > > > return true; > > > + case MSR_IA32_U_CET: > > > + case MSR_IA32_S_CET: > > > + case MSR_IA32_PL0_SSP ... 
MSR_IA32_INT_SSP_TAB: > > > + return true; > > > } > > > > > > r = possible_passthrough_msr_slot(msr) != -ENOENT; > > > @@ -7766,6 +7770,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) > > > vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); > > > } > > > > > > +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) > > > +{ > > > + bool incpt; > > > + > > > + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { > > > + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); > > > + > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, > > > + MSR_TYPE_RW, incpt); > > > + if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, > > > + MSR_TYPE_RW, incpt); > > > + if (!incpt) > > > + return; > > > + } > > > + > > > + if (kvm_cpu_cap_has(X86_FEATURE_IBT)) { > > > + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT); > > > + > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, > > > + MSR_TYPE_RW, incpt); > > > + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, > > > + MSR_TYPE_RW, incpt); > > > + } > > > +} > > > + > > > static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > { > > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > > @@ -7843,6 +7883,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > > > > /* Refresh #PF interception to account for MAXPHYADDR changes. 
*/ > > > vmx_update_exception_bitmap(vcpu); > > > + > > > + vmx_update_intercept_for_cet_msr(vcpu); > > > } > > > > > > static u64 vmx_get_perf_capabilities(void) > > My review feedback from the previous patch still applies as well, > > > > I still think that we should either try a best effort approach to plug > > this virtualization hole, or we at least should fail guest creation > > if the virtualization hole is present as I said: > > > > "Another, much simpler option is to fail the guest creation if the shadow stack + indirect branch tracking > > state differs between host and the guest, unless both are disabled in the guest. > > (in essence don't let the guest be created if (2) or (3) happen)" > > > > Please at least tell me what do you think about this. > > Oh, I thought I had replied this patch in v6 but I failed to send it out! > Let me explain it a bit, at early stage of this series, I thought of checking relevant host > feature enabling status before exposing guest CET features, but it's proved > unnecessary and user unfriendly. > > E.g., we frequently disable host CET features due to whatever reasons on host, then > the features cannot be used/tested in guest at all. Technically, guest should be allowed > to run the features so long as the dependencies(i.e., xsave related support) are enabled > on host and there're no risks brought up by using of the features in guest.

To be honest, this is a dangerous point of view with regard to guest migration: if the VMM is lax about the features it exposes to the guest, guests will start to make assumptions instead of checking CPUID, and then a guest will mysteriously fail when migrated to a machine which actually lacks the features, in addition to not having them in CPUID. In other words, leaving features "undocumented" opens a slippery slope of later having to support this undocumented behavior.

I understand, though, that CET is problematic, and overall I won't object much to leaving things as-is, but a part of me thinks that we will regret it later.

Best regards,
	Maxim Levitsky

> I think cloud-computing should share the similar pain point when deploy CET into virtualization
> usages.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 554f665e59c3..e484333eddb0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -699,6 +699,10 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
 		return true;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		return true;
 	}
 
 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -7766,6 +7770,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
+{
+	bool incpt;
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
+					  MSR_TYPE_RW, incpt);
+		if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+			vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
+						  MSR_TYPE_RW, incpt);
+		if (!incpt)
+			return;
+	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
+					  MSR_TYPE_RW, incpt);
+	}
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7843,6 +7883,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
+
+	vmx_update_intercept_for_cet_msr(vcpu);
 }
 
 static u64 vmx_get_perf_capabilities(void)