Message ID | 20231122170106.270266-2-jpiotrowski@linux.microsoft.com |
---|---|
State | New |
Headers |
From: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
To: linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Michael Kelley <mhkelley58@gmail.com>, Nikolay Borisov <nik.borisov@suse.com>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Tom Lendacky <thomas.lendacky@amd.com>, x86@kernel.org, Dexuan Cui <decui@microsoft.com>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>, linux-hyperv@vger.kernel.org, stefan.bader@canonical.com, tim.gardner@canonical.com, roxana.nicolescu@canonical.com, cascardo@canonical.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, sashal@kernel.org, stable@vger.kernel.org
Subject: [PATCH v1 2/3] x86/coco: Disable TDX module calls when TD partitioning is active
Date: Wed, 22 Nov 2023 18:01:05 +0100
Message-Id: <20231122170106.270266-2-jpiotrowski@linux.microsoft.com>
In-Reply-To: <20231122170106.270266-1-jpiotrowski@linux.microsoft.com>
References: <20231122170106.270266-1-jpiotrowski@linux.microsoft.com>
Series | [v1,1/3] x86/tdx: Check for TDX partitioning during early TDX init |
Commit Message
Jeremi Piotrowski
Nov. 22, 2023, 5:01 p.m. UTC
Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
calls are available. When TD partitioning is enabled, an L1 TD VMM handles most
TDX facilities, and the kernel running as an L2 TD VM does not have access to
TDX module calls. The kernel still has access to TDVMCALL(0), which is
forwarded to the VMM (the L1 TD VM in this case) for processing.
Cc: <stable@vger.kernel.org> # v6.5+
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
---
 arch/x86/coco/core.c                     | 3 +++
 arch/x86/include/asm/unaccepted_memory.h | 2 +-
 drivers/virt/coco/tdx-guest/tdx-guest.c  | 3 +++
 include/linux/cc_platform.h              | 8 ++++++++
 4 files changed, 15 insertions(+), 1 deletion(-)
Comments
On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
> calls are available. When TD partitioning is enabled, a L1 TD VMM handles most
> TDX facilities and the kernel running as an L2 TD VM does not have access to
> TDX module calls. The kernel still has access to TDVMCALL(0) which is forwarded
> to the VMM for processing, which is the L1 TD VM in this case.

Sounds like a problem introduced by patch 1/3 :/
On 23/11/2023 15:13, Kirill A. Shutemov wrote:
> On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
>> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
>> calls are available. When TD partitioning is enabled, a L1 TD VMM handles most
>> TDX facilities and the kernel running as an L2 TD VM does not have access to
>> TDX module calls. The kernel still has access to TDVMCALL(0) which is forwarded
>> to the VMM for processing, which is the L1 TD VM in this case.

Correction: it turns out TDVMCALL(0) is handled by L0 VMM.

> Sounds like a problem introduced by patch 1/3 :/

What problem are you referring to? This patch is making the kernel aware of
which subfeatures of TDX are available to it. This patch is needed once you
make the kernel aware of X86_FEATURE_TDX_GUEST, which is applicable because
we're dealing with a TDX guest.
On Fri, 2023-11-24 at 11:38 +0100, Jeremi Piotrowski wrote:
> On 23/11/2023 15:13, Kirill A. Shutemov wrote:
> > On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
> > > Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
> > > calls are available. When TD partitioning is enabled, a L1 TD VMM handles most
> > > TDX facilities and the kernel running as an L2 TD VM does not have access to
> > > TDX module calls. The kernel still has access to TDVMCALL(0) which is forwarded
> > > to the VMM for processing, which is the L1 TD VM in this case.
>
> Correction: it turns out TDVMCALL(0) is handled by L0 VMM.

Some thoughts after checking the spec more to make sure we don't have
misunderstanding on each other:

The TDX module will unconditionally exit to L1 for any TDCALL (except the
TDVMCALL) from the L2. This is expected behaviour. Because the L2 isn't a true
TDX guest, L1 is expected to inject a #UD or #GP or whatever error to L2 based
on the hardware spec to make sure L2 gets an correct architectural behaviour
for the TDCALL instruction.

I believe this is also the reason you mentioned "L2 TD VM does not have access
to TDX module calls".

However TDX module actually allows the L1 to control whether the L2 is allowed
to execute TDVMCALL by controlling whether the TDVMCALL from L2 will exit to L0
or L1.

I believe you mentioned "TDVMCALL(0) is handled by L0 VMM" is because the L1
hypervisor -- specifically, hyperv -- chooses to let the TDVMCALL from L2 exit
to L0?

But IMHO this is purely the hyperv's implementation, i.e., KVM can choose not
to do so, and simply handle TDVMCALL in the same way as it handles normal
TDCALL -- inject the architecture defined error to L2.

Also AFAICT there's no architectural thing that controlled by L2 to allow the
L1 know whether L2 is expecting to use TDVMCALL or not. In other words,
whether to support TDVMCALL is purely L1 hypervisor implementation specific.

So to me this whole series is hyperv specific enlightenment for the L2 running
on TDX guest hyperv L1. And because of that, perhaps a better way to do is:

1) The default L2 should just be a normal VM that any TDX guest L1 hypervisor
should be able to handle (guaranteed by the TDX partitioning architecture).

2) Different L2/L1 hypervisor can have it's own enlightenments. We can even
have common enlightenments across different implementation of L1 hypervisors,
but that requires cross-hypervisor cooperation.

But IMHO it's not a good idea to say:

  L2 is running on a TDX partitioning enabled environment, let us mark it
  as a TDX guest but mark it as "TDX partitioning" to disable couple of
  TDX functionalities.

Instead, perhaps it's better to let L2 explicitly opt-in TDX facilities that
the underneath hypervisor supports. TDVMCALL can be the first facility to
begin with.

At last, even TDVMCALL has bunch of leafs, and hypervisor can choose to
support them or not. Use a single "tdx_partitioning_active" to select what TDX
facilities are supported doesn't seem a good idea.

That's my 2cents w/o knowing details of hyperv enlightenments.
On 29/11/2023 11:37, Huang, Kai wrote:
> On Fri, 2023-11-24 at 11:38 +0100, Jeremi Piotrowski wrote:
>> On 23/11/2023 15:13, Kirill A. Shutemov wrote:
>>> On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
>>>> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
>>>> calls are available. When TD partitioning is enabled, a L1 TD VMM handles most
>>>> TDX facilities and the kernel running as an L2 TD VM does not have access to
>>>> TDX module calls. The kernel still has access to TDVMCALL(0) which is forwarded
>>>> to the VMM for processing, which is the L1 TD VM in this case.
>>
>> Correction: it turns out TDVMCALL(0) is handled by L0 VMM.
>
> Some thoughts after checking the spec more to make sure we don't have
> misunderstanding on each other:
>
> The TDX module will unconditionally exit to L1 for any TDCALL (except the
> TDVMCALL) from the L2. This is expected behaviour. Because the L2 isn't a true
> TDX guest, L1 is expected to inject a #UD or #GP or whatever error to L2 based
> on the hardware spec to make sure L2 gets an correct architectural behaviour
> for the TDCALL instruction.
>
> I believe this is also the reason you mentioned "L2 TD VM does not have access
> to TDX module calls".

Right. Injecting #UD/#GP/returning an error (?) might be desirable but the L2
guest would still not be guaranteed to be able to rely on the functionality
provided by these TDCALLS. Here the TDCALLs lead to guest termination, but the
kernel would panic if some of them would return an error.

> However TDX module actually allows the L1 to control whether the L2 is allowed
> to execute TDVMCALL by controlling whether the TDVMCALL from L2 will exit to L0
> or L1.
>
> I believe you mentioned "TDVMCALL(0) is handled by L0 VMM" is because the L1
> hypervisor -- specifically, hyperv -- chooses to let the TDVMCALL from L2 exit
> to L0?

That is correct. The L1 hypervisor here (it's not hyperv, so maybe lets keep
referring to it as paravisor?) enables ENABLE_TDVMCALL so that TDVMCALLs exit
straight to L0. The TDVMCALLs are used for the I/O path which is not emulated
or intercepted by the L1 hypervisor at all.

> But IMHO this is purely the hyperv's implementation, i.e., KVM can choose not
> to do so, and simply handle TDVMCALL in the same way as it handles normal
> TDCALL -- inject the architecture defined error to L2.
>
> Also AFAICT there's no architectural thing that controlled by L2 to allow the
> L1 know whether L2 is expecting to use TDVMCALL or not. In other words,
> whether to support TDVMCALL is purely L1 hypervisor implementation specific.

Right, the only way to know whether TDVMCALL/TDCALL is allowed is to identify
the L1 hypervisor and use that knowledge.

> So to me this whole series is hyperv specific enlightenment for the L2 running
> on TDX guest hyperv L1. And because of that, perhaps a better way to do is:
>
> 1) The default L2 should just be a normal VM that any TDX guest L1 hypervisor
> should be able to handle (guaranteed by the TDX partitioning architecture).

When you say "normal VM" you mean "legacy VM"? 'Any TDX guest L1 hypervisor'
is a bit of a reach: the only TDX guest L1 hypervisor implementation that I
know exists does not support guests that are entirely unaware of TDX.

Maybe it's best if we avoid the name "TDX guest L1 hypervisor" altogether and
refer to is like AMD calls it: "Secure VM Service Module" because that more
accurately reflects the intention: providing certain targeted services needed
in the context of a confidential VM. No one is interested in running a full
blown hypervisor implementation in there.

> 2) Different L2/L1 hypervisor can have it's own enlightenments. We can even
> have common enlightenments across different implementation of L1 hypervisors,
> but that requires cross-hypervisor cooperation.
>
> But IMHO it's not a good idea to say:
>
>   L2 is running on a TDX partitioning enabled environment, let us mark it
>   as a TDX guest but mark it as "TDX partitioning" to disable couple of
>   TDX functionalities.
>
> Instead, perhaps it's better to let L2 explicitly opt-in TDX facilities that
> the underneath hypervisor supports.
>
> TDVMCALL can be the first facility to begin with.
>
> At last, even TDVMCALL has bunch of leafs, and hypervisor can choose to
> support them or not. Use a single "tdx_partitioning_active" to select what TDX
> facilities are supported doesn't seem a good idea.
>
> That's my 2cents w/o knowing details of hyperv enlightenments.

I think on the whole we are on the same page. Let me rephrase what I hear you
saying: 'tdx_partitioning_active' as a "catch all" is bad, but
CC_ATTR_TDX_MODULE_CALLS is in the spirit of what we would like to have.

So something like:

	case CC_ATTR_TDX_MODULE_CALLS:
		return tdx_status & TDCALL;

and

	if (no_td_partitioning)
		tdx_status |= TDCALL;
	if (l1_td_vmm_supports_tdcalls)
		tdx_status |= TDCALL;

would be ok?

I can directly tell you that the next facility would control tdx_safe_halt()
because that doesn't operate as intended (hlt traps to L1, tdx_safe_halt is a
TDVMCALL and goes to L0). The other important goal of the patchset is ensuring
that X86_FEATURE_TDX_GUEST is set.
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..2c1116da4d54 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -12,6 +12,7 @@
 #include <asm/coco.h>
 #include <asm/processor.h>
+#include <asm/tdx.h>
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 
 static u64 cc_mask __ro_after_init;
@@ -24,6 +25,8 @@ static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 	case CC_ATTR_MEM_ENCRYPT:
 		return true;
+	case CC_ATTR_TDX_MODULE_CALLS:
+		return !tdx_partitioning_active;
 	default:
 		return false;
 	}
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index f5937e9866ac..b666062555ac 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -8,7 +8,7 @@ static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
 {
 	/* Platform-specific memory-acceptance call goes here */
-	if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+	if (cc_platform_has(CC_ATTR_TDX_MODULE_CALLS)) {
 		if (!tdx_accept_memory(start, end))
 			panic("TDX: Failed to accept memory\n");
 	} else if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
diff --git a/drivers/virt/coco/tdx-guest/tdx-guest.c b/drivers/virt/coco/tdx-guest/tdx-guest.c
index 5e44a0fa69bd..2f995df8c795 100644
--- a/drivers/virt/coco/tdx-guest/tdx-guest.c
+++ b/drivers/virt/coco/tdx-guest/tdx-guest.c
@@ -87,6 +87,9 @@ static int __init tdx_guest_init(void)
 	if (!x86_match_cpu(tdx_guest_ids))
 		return -ENODEV;
 
+	if (!cc_platform_has(CC_ATTR_TDX_MODULE_CALLS))
+		return -ENODEV;
+
 	return misc_register(&tdx_misc_dev);
 }
 module_init(tdx_guest_init);
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..d3c57e86773d 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -90,6 +90,14 @@ enum cc_attr {
 	 * Examples include TDX Guest.
 	 */
 	CC_ATTR_HOTPLUG_DISABLED,
+
+	/**
+	 * @CC_ATTR_TDX_MODULE_CALLS: TDX module calls are available.
+	 *
+	 * The platform supports issuing calls directly to the TDX module.
+	 * This is not a given when TD partitioning is active.
+	 */
+	CC_ATTR_TDX_MODULE_CALLS,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM