Message ID | 20230914063325.85503-12-weijiang.yang@intel.com |
---|---|
State | New |
Headers |
From: Yang Weijiang <weijiang.yang@intel.com> To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: dave.hansen@intel.com, peterz@infradead.org, chao.gao@intel.com, rick.p.edgecombe@intel.com, weijiang.yang@intel.com, john.allen@amd.com Subject: [PATCH v6 11/25] KVM: x86: Report XSS as to-be-saved if there are supported features Date: Thu, 14 Sep 2023 02:33:11 -0400 Message-Id: <20230914063325.85503-12-weijiang.yang@intel.com> In-Reply-To: <20230914063325.85503-1-weijiang.yang@intel.com> References: <20230914063325.85503-1-weijiang.yang@intel.com> List-ID: <linux-kernel.vger.kernel.org> |
Series | Enable CET Virtualization |
Commit Message
Yang, Weijiang
Sept. 14, 2023, 6:33 a.m. UTC
From: Sean Christopherson <seanjc@google.com>

Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss
is non-zero, i.e. KVM supports at least one XSS based feature.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 5 +++++
 1 file changed, 5 insertions(+)
Comments
On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss
> is non-zero, i.e. KVM supports at least one XSS based feature.

I can't believe that CET is the first supervisor feature that KVM supports...

Ah, now I understand why:

1. XSAVES on AMD can't really be intercepted (other than clearing CR4.OSXSAVE
   bit, which isn't an option if you want to support AVX for example).
   On VMX however you can intercept XSAVES and even intercept it only when it
   touches specific bits of state that you don't want the guest to read/write
   freely.

2. Even if it was possible to intercept it, guests use XSAVES on every
   context switch if available and emulating it might be costly.

3. Emulating XSAVES is also not that easy to do correctly.

However XSAVES touches various MSRs, thus letting the guest use it
unintercepted means giving access to host MSRs, which might be wrong
security-wise in some cases.

Thus I see that KVM hardcodes the IA32_XSS to 0, and that makes the XSAVES
work exactly like XSAVE.

And for some features which would benefit from XSAVES state components,
KVM likely won't even be able to expose them due to this limitation
(thankfully, CPUID allows this), forcing the guests to use
rdmsr/wrmsr instead.

However it is possible to enable IA32_XSS bits in case the MSRs XSAVES
reads/writes can't do harm to the host, and then KVM can context switch these
MSRs when the guest exits and that is what is done here with CET.

If you think that a short summary of the above can help the future reader to
understand why IA32_XSS support is added only now, it might be a good idea to
add a few lines to the changelog of this patch.

>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/x86.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e0b55c043dab..1258d1d6dd52 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1464,6 +1464,7 @@ static const u32 msrs_to_save_base[] = {
>  	MSR_IA32_UMWAIT_CONTROL,
>
>  	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
> +	MSR_IA32_XSS,
>  };
>
>  static const u32 msrs_to_save_pmu[] = {
> @@ -7195,6 +7196,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>  		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
>  			return;
>  		break;
> +	case MSR_IA32_XSS:
> +		if (!kvm_caps.supported_xss)
> +			return;
> +		break;
>  	default:
>  		break;
>  	}

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
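
The remark above that hardcoding IA32_XSS to 0 makes XSAVES behave like XSAVE can be illustrated with a small user-space model of how the set of saved state components is selected. This is a sketch, not KVM code: `xsave_rfbm`, `xsaves_rfbm` and `rfbm_example` are hypothetical names, and the model covers only which components are selected, not the layout of the save area (XSAVES writes the compacted format, so the equivalence holds for the component set rather than the memory image).

```c
#include <assert.h>
#include <stdint.h>

/* XSTATE component bits, at their architectural positions. */
#define XFEATURE_MASK_FP         (1ULL << 0)
#define XFEATURE_MASK_SSE        (1ULL << 1)
#define XFEATURE_MASK_YMM        (1ULL << 2)
#define XFEATURE_MASK_CET_USER   (1ULL << 11)  /* supervisor state */
#define XFEATURE_MASK_CET_KERNEL (1ULL << 12)  /* supervisor state */

/* XSAVE: components = instruction mask (EDX:EAX) AND XCR0. */
static inline uint64_t xsave_rfbm(uint64_t instr_mask, uint64_t xcr0)
{
	return instr_mask & xcr0;
}

/* XSAVES: supervisor components enabled in IA32_XSS are selectable too. */
static inline uint64_t xsaves_rfbm(uint64_t instr_mask, uint64_t xcr0,
				   uint64_t xss)
{
	return instr_mask & (xcr0 | xss);
}

/* Hypothetical usage: a guest with XSS forced to 0 vs. one with CET bits. */
void rfbm_example(void)
{
	const uint64_t xcr0 = XFEATURE_MASK_FP | XFEATURE_MASK_SSE |
			      XFEATURE_MASK_YMM;
	const uint64_t cet = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;

	/* IA32_XSS == 0: XSAVES selects the same components as XSAVE. */
	assert(xsaves_rfbm(~0ULL, xcr0, 0) == xsave_rfbm(~0ULL, xcr0));

	/* IA32_XSS has the CET bits: XSAVES also covers the CET components. */
	assert(xsaves_rfbm(~0ULL, xcr0, cet) == (xcr0 | cet));
}
```

With a non-zero IA32_XSS the guest can save and restore supervisor state without a VM exit, which is why the value KVM permits there matters.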
On Tue, Oct 31, 2023, Maxim Levitsky wrote:
> On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
> > From: Sean Christopherson <seanjc@google.com>
> >
> > Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss
> > is non-zero, i.e. KVM supports at least one XSS based feature.
>
> I can't believe that CET is the first supervisor feature that KVM supports...
>
> Ah, now I understand why:
>
> 1. XSAVES on AMD can't really be intercepted (other than clearing CR4.OSXSAVE
>    bit, which isn't an option if you want to support AVX for example).
>    On VMX however you can intercept XSAVES and even intercept it only when it
>    touches specific bits of state that you don't want the guest to read/write
>    freely.
>
> 2. Even if it was possible to intercept it, guests use XSAVES on every
>    context switch if available and emulating it might be costly.
>
> 3. Emulating XSAVES is also not that easy to do correctly.
>
> However XSAVES touches various MSRs, thus letting the guest use it
> unintercepted means giving access to host MSRs, which might be wrong
> security-wise in some cases.
>
> Thus I see that KVM hardcodes the IA32_XSS to 0, and that makes the XSAVES
> work exactly like XSAVE.
>
> And for some features which would benefit from XSAVES state components,
> KVM likely won't even be able to expose them due to this limitation
> (thankfully, CPUID allows this), forcing the guests to use
> rdmsr/wrmsr instead.

Sort of? KVM doesn't (yet) virtualize PASID, HDC, HWP, or arch LBRs (wow,
there's a lot of stuff getting thrown into XSTATE), so naturally those aren't
supported in XSS.

KVM does virtualize Processor Trace (PT), but PT is a bit of a special snowflake.
E.g. the host kernel elects NOT to manage PT MSRs via XSTATE, but it would be
possible for KVM to allow the guest to manage PT MSRs via XSTATE.

I suspect the answer to PT is threefold:

 1. Exposing a feature that isn't "supported" by the host kernel is scary.
 2. No one has pushed for the support, e.g. Linux guests obviously don't complain
    about lack of XSS support for PT.
 3. Toggling PT MSR passthrough on XSAVES/XRSTORS accesses would be more complex
    and less performant than KVM's current approach.

Re: #3, KVM does pass through PT MSRs, but only when the guest is actively using
PT. PT is basically a super fancy PMU feature, and so KVM "needs" to load guest
state as late as possible before VM-Entry, and load host state as early as possible
after VM-Exit. I.e. the context switch happens on *every* entry/exit pair.

By passing through PT MSRs only when needed, KVM avoids a rather large pile of
RDMSRs and WRMSRs on every entry/exit, as the host values can be kept resident in
hardware so long as the main enable bit is cleared in the guest's control MSR
(which is context switched via a dedicated VMCS field).

XSAVES isn't subject to MSR intercepts, but KVM could utilize VMX's XSS-exiting
bitmap to effectively intercept reads and writes to PT MSRs. Except that as you
note, KVM would either need to emulate XSAVES (oof) or save/load PT MSRs much more
frequently.

So it's kind of an emulation thing, but I honestly doubt that emulating XSAVES
was ever seriously considered when KVM support for PT was added.

CET is different than PT because the MSRs that need to be context switched at
every entry/exit have dedicated VMCS fields. The IA32_PLx_SSP MSRs don't have
VMCS fields, but they are consumed only in privilege level changes, i.e. can be
safely deferred until guest "FPU" state is put.

> However it is possible to enable IA32_XSS bits in case the MSRs XSAVES
> reads/writes can't do harm to the host, and then KVM can context switch these
> MSRs when the guest exits and that is what is done here with CET.

This isn't really true. It's not a safety or correctness issue so much as it's
a performance issue. E.g. KVM could let the guest use XSS for any virtualized
feature, but it would effectively require context switching related state that
the host needs loaded "immediately" after VM-Exit. And for MSRs, that gets
very expensive without dedicated VMCS fields.

I mean, yeah, it's a correctness thing to not consume guest state in the host
and vice versa, but that's not unique to XSS in any way.
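
The XSS-exiting bitmap mentioned above can be modelled in a few lines. This is a sketch under stated assumptions: it encodes the architectural rule that XSAVES/XRSTORS cause a VM exit when any bit is set in the AND of the instruction mask (EDX:EAX), the guest's IA32_XSS, and the XSS-exiting bitmap, and it assumes the "enable XSAVES/XRSTORS" control is already set. `xsaves_causes_vmexit` and `xss_exiting_example` are hypothetical names, not KVM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_PT        (1ULL << 8)   /* Processor Trace state */
#define XFEATURE_MASK_CET_USER  (1ULL << 11)  /* CET user state */

/*
 * Model of the VMX exit condition for XSAVES/XRSTORS, assuming the
 * "enable XSAVES/XRSTORS" execution control is 1.
 */
static inline bool xsaves_causes_vmexit(uint64_t instr_mask,
					uint64_t guest_xss,
					uint64_t xss_exiting_bitmap)
{
	return (instr_mask & guest_xss & xss_exiting_bitmap) != 0;
}

/* Hypothetical usage: intercept only accesses that touch PT state. */
void xss_exiting_example(void)
{
	const uint64_t guest_xss = XFEATURE_MASK_PT | XFEATURE_MASK_CET_USER;
	const uint64_t bitmap = XFEATURE_MASK_PT;

	/* Saving CET state alone runs without an exit. */
	assert(!xsaves_causes_vmexit(XFEATURE_MASK_CET_USER, guest_xss, bitmap));

	/* Any request that includes PT state exits to the hypervisor. */
	assert(xsaves_causes_vmexit(~0ULL, guest_xss, bitmap));
}
```

An exit alone is not enough: after it, the hypervisor still has to emulate the instruction or swap the affected MSRs, which is the cost discussed above.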
On Wed, 2023-11-01 at 12:18 -0700, Sean Christopherson wrote:
> On Tue, Oct 31, 2023, Maxim Levitsky wrote:
> > On Thu, 2023-09-14 at 02:33 -0400, Yang Weijiang wrote:
> > > From: Sean Christopherson <seanjc@google.com>
> > >
> > > Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss
> > > is non-zero, i.e. KVM supports at least one XSS based feature.
> >
> > I can't believe that CET is the first supervisor feature that KVM supports...
> >
> > Ah, now I understand why:
> >
> > 1. XSAVES on AMD can't really be intercepted (other than clearing CR4.OSXSAVE
> >    bit, which isn't an option if you want to support AVX for example).
> >    On VMX however you can intercept XSAVES and even intercept it only when it
> >    touches specific bits of state that you don't want the guest to read/write
> >    freely.
> >
> > 2. Even if it was possible to intercept it, guests use XSAVES on every
> >    context switch if available and emulating it might be costly.
> >
> > 3. Emulating XSAVES is also not that easy to do correctly.
> >
> > However XSAVES touches various MSRs, thus letting the guest use it
> > unintercepted means giving access to host MSRs, which might be wrong
> > security-wise in some cases.
> >
> > Thus I see that KVM hardcodes the IA32_XSS to 0, and that makes the XSAVES
> > work exactly like XSAVE.
> >
> > And for some features which would benefit from XSAVES state components,
> > KVM likely won't even be able to expose them due to this limitation
> > (thankfully, CPUID allows this), forcing the guests to use
> > rdmsr/wrmsr instead.
>
> Sort of? KVM doesn't (yet) virtualize PASID, HDC, HWP, or arch LBRs (wow,
> there's a lot of stuff getting thrown into XSTATE), so naturally those aren't
> supported in XSS.
>
> KVM does virtualize Processor Trace (PT), but PT is a bit of a special snowflake.
> E.g. the host kernel elects NOT to manage PT MSRs via XSTATE, but it would be
> possible for KVM to allow the guest to manage PT MSRs via XSTATE.

I must also note that PT doesn't always use guest physical addresses to write
its trace output, because that depends on the secondary execution control
'Intel PT uses guest physical addresses'. However, I see that KVM requires it,
so yes, we could likely have supported the PT XSAVES component.

>
> I suspect the answer to PT is threefold:
>
>  1. Exposing a feature that isn't "supported" by the host kernel is scary.
>  2. No one has pushed for the support, e.g. Linux guests obviously don't complain
>     about lack of XSS support for PT.
>  3. Toggling PT MSR passthrough on XSAVES/XRSTORS accesses would be more complex
>     and less performant than KVM's current approach.
>
> Re: #3, KVM does pass through PT MSRs, but only when the guest is actively using
> PT. PT is basically a super fancy PMU feature, and so KVM "needs" to load guest
> state as late as possible before VM-Entry, and load host state as early as possible
> after VM-Exit. I.e. the context switch happens on *every* entry/exit pair.

Makes sense.

> By passing through PT MSRs only when needed, KVM avoids a rather large pile of
> RDMSRs and WRMSRs on every entry/exit, as the host values can be kept resident in
> hardware so long as the main enable bit is cleared in the guest's control MSR
> (which is context switched via a dedicated VMCS field).
>
> XSAVES isn't subject to MSR intercepts, but KVM could utilize VMX's XSS-exiting
> bitmap to effectively intercept reads and writes to PT MSRs. Except that as you
> note, KVM would either need to emulate XSAVES (oof) or save/load PT MSRs much more
> frequently.
>
> So it's kind of an emulation thing, but I honestly doubt that emulating XSAVES
> was ever seriously considered when KVM support for PT was added.
>
> CET is different than PT because the MSRs that need to be context switched at
> every entry/exit have dedicated VMCS fields. The IA32_PLx_SSP MSRs don't have
> VMCS fields, but they are consumed only in privilege level changes, i.e. can be
> safely deferred until guest "FPU" state is put.
>
> > However it is possible to enable IA32_XSS bits in case the MSRs XSAVES
> > reads/writes can't do harm to the host, and then KVM can context switch these
> > MSRs when the guest exits and that is what is done here with CET.
>
> This isn't really true. It's not a safety or correctness issue so much as it's
> a performance issue.

True as well, I haven't thought about it from this POV.

> E.g. KVM could let the guest use XSS for any virtualized
> feature, but it would effectively require context switching related state that
> the host needs loaded "immediately" after VM-Exit. And for MSRs, that gets
> very expensive without dedicated VMCS fields.

Yes, unless allowing the guest to set an MSR via XRSTORS causes harm to the
host (for example, an MSR that has a physical address in it).

Such MSRs cannot be allowed to be set by the guest even for the duration of
the guest run, and that means that we cannot pass through the corresponding
XSS state component.

>
> I mean, yeah, it's a correctness thing to not consume guest state in the host
> and vice versa, but that's not unique to XSS in any way.
>

Best regards,
	Maxim Levitsky
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0b55c043dab..1258d1d6dd52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1464,6 +1464,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7195,6 +7196,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
 	default:
 		break;
 	}
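
The effect of the two hunks can be sketched as a stand-alone user-space model: a candidate MSR from the base list is reported only if its feature-specific check passes, and MSR_IA32_XSS passes only when `supported_xss` is non-zero. This is a simplified illustration, not the kernel function. `struct caps_model`, `msr_is_reported` and `build_msr_list` are hypothetical names, and the real `kvm_probe_msr_to_save()` performs further checks that are omitted here. The MSR indices are the architectural values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_XFD      0x000001c4
#define MSR_IA32_XFD_ERR  0x000001c5
#define MSR_IA32_XSS      0x00000da0

/* Hypothetical stand-in for kvm_caps; only the field used here. */
struct caps_model {
	uint64_t supported_xss;
};

/* Returns 1 if msr_index should be reported to userspace, 0 otherwise. */
static int msr_is_reported(const struct caps_model *caps, uint32_t msr_index)
{
	switch (msr_index) {
	case MSR_IA32_XSS:
		/* Report XSS only if at least one XSS feature is supported. */
		return caps->supported_xss != 0;
	default:
		return 1;
	}
}

/* Filters the n entries of base into out and returns how many were kept. */
size_t build_msr_list(const struct caps_model *caps, const uint32_t *base,
		      size_t n, uint32_t *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (msr_is_reported(caps, base[i]))
			out[count++] = base[i];
	}
	return count;
}
```

In this model a host with no supported XSS bits reports the same list as before the patch, so userspace that saves and restores MSRs from the reported list sees no change there.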