Message ID | d26254af8e5b3dcca8a070703c5d6d04f48d47a9.1668988357.git.kai.huang@intel.com |
---|---|
State | New |
Headers |
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 04/20] x86/virt/tdx: Add skeleton to initialize TDX on demand
Date: Mon, 21 Nov 2022 13:26:26 +1300
Message-Id: <d26254af8e5b3dcca8a070703c5d6d04f48d47a9.1668988357.git.kai.huang@intel.com>
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com> |
Series | TDX host kernel support |
Commit Message
Kai Huang
Nov. 21, 2022, 12:26 a.m. UTC
Before the TDX module can be used to create and run TDX guests, it must be loaded and properly initialized. The TDX module is expected to be loaded by the BIOS and to be initialized by the kernel.

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). The host kernel communicates with the TDX module via a new SEAMCALL instruction. The TDX module implements a set of SEAMCALL leaf functions to allow the host kernel to initialize it.

The TDX module can be initialized only once in its lifetime. Instead of always initializing it at boot time, this implementation chooses an "on demand" approach and defers initializing TDX until there is a real need (e.g. when requested by KVM). This approach has the following pros:

1) It avoids consuming the memory that must be allocated by the kernel and given to the TDX module as metadata (~1/256th of the TDX-usable memory), and also saves the CPU cycles of initializing the TDX module (and the metadata) when TDX is not used at all.

2) It is more flexible to support TDX module runtime updating in the future (after updating the TDX module, it needs to be initialized again).

3) It avoids having to do a "temporary" solution to handle VMXON in the core (non-KVM) kernel for now. This is because SEAMCALL requires the CPU to be in VMX operation (VMXON is done), but currently only KVM handles VMXON. Adding VMXON support to the core kernel isn't trivial. More importantly, in the long term a reference-based approach is likely needed in the core kernel, as more kernel components are likely to need TDX support as well. Allowing KVM to initialize the TDX module avoids having to handle VMXON during kernel boot for now.

Add a placeholder tdx_enable() to detect and initialize the TDX module on demand, with a state machine protected by a mutex to support concurrent calls from multiple callers.
The TDX module will be initialized in multiple steps defined by the TDX module:

1) Global initialization;
2) Logical-CPU scope initialization;
3) Enumerate the TDX module capabilities and platform configuration;
4) Configure the TDX module about TDX usable memory ranges and global
   KeyID information;
5) Package-scope configuration for the global KeyID;
6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime. In case of any error during the initialization process, shut down the module. It's pointless to leave the module in any intermediate state during the initialization.

Both logical CPU scope initialization and shutting down the TDX module require calling SEAMCALL on all boot-time present CPUs. For simplicity just temporarily disable CPU hotplug during the module initialization.

Note TDX architecturally doesn't support physical CPU hot-add/removal. A non-buggy BIOS should never support ACPI CPU hot-add/removal. This implementation doesn't explicitly handle ACPI CPU hot-add/removal but depends on the BIOS to do the right thing.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to set status to TDX_MODULE_NONE if TDX module is not
   loaded (Chao)
 - Added Chao's Reviewed-by.
 - Improved comments around cpus_read_lock().

v3 -> v5 (no feedback on v4):
 - Removed the check that SEAMRR and TDX KeyID have been detected on
   all present cpus.
 - Removed tdx_detect().
 - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
   hotplug lock and return early with error message.
 - Improved dmesg printing for TDX module detection and initialization.

---
 arch/x86/include/asm/tdx.h  |   2 +
 arch/x86/virt/vmx/tdx/tdx.c | 150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+)
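The on-demand state machine the changelog describes can be sketched in plain C. This is an illustrative userspace model, not the kernel code: init_tdx_module() is a stub, and the kernel's tdx_module_lock mutex and cpus_read_lock() protection are only noted in comments.

```c
#include <errno.h>

/* Userspace sketch of the on-demand state machine. The names mirror
 * the patch (tdx_module_status, tdx_enable()), but the initialization
 * itself is a stub and all locking is elided.
 */
enum tdx_module_status_t {
	TDX_MODULE_UNKNOWN,	/* not yet detected/initialized */
	TDX_MODULE_NONE,	/* module not loaded by the BIOS */
	TDX_MODULE_INITIALIZED,	/* one-time initialization succeeded */
	TDX_MODULE_SHUTDOWN,	/* init failed, module was shut down */
};

static enum tdx_module_status_t tdx_module_status = TDX_MODULE_UNKNOWN;

/* Stand-in for the real multi-step initialization; 0 = success. */
static int init_tdx_module(void) { return 0; }

int tdx_enable(void)
{
	int ret;

	/* The kernel takes tdx_module_lock here to serialize callers. */
	switch (tdx_module_status) {
	case TDX_MODULE_UNKNOWN:
		ret = init_tdx_module();
		if (ret == -ENODEV) {
			tdx_module_status = TDX_MODULE_NONE;
		} else if (ret) {
			/* shutdown_tdx_module() in the real code */
			tdx_module_status = TDX_MODULE_SHUTDOWN;
			ret = -EFAULT;
		} else {
			tdx_module_status = TDX_MODULE_INITIALIZED;
		}
		break;
	case TDX_MODULE_INITIALIZED:
		ret = 0;		/* one-time init is sticky */
		break;
	case TDX_MODULE_NONE:
		ret = -ENODEV;		/* nothing to initialize */
		break;
	default:			/* TDX_MODULE_SHUTDOWN */
		ret = -EFAULT;		/* earlier attempt failed */
		break;
	}
	return ret;
}
```

The key property, visible even in this toy version, is that a second caller never re-runs the initialization: every terminal state is sticky once reached.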
Comments
On Mon, Nov 21, 2022 at 01:26:26PM +1300, Kai Huang wrote:
> +static int __tdx_enable(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Initializing the TDX module requires doing SEAMCALL on all
> +	 * boot-time present CPUs. For simplicity temporarily disable
> +	 * CPU hotplug to prevent any CPU from going offline during
> +	 * the initialization.
> +	 */
> +	cpus_read_lock();
> +
> +	/*
> +	 * Check whether all boot-time present CPUs are online and
> +	 * return early with a message so the user can be aware.
> +	 *
> +	 * Note a non-buggy BIOS should never support physical (ACPI)
> +	 * CPU hotplug when TDX is enabled, and all boot-time present
> +	 * CPU should be enabled in MADT, so there should be no
> +	 * disabled_cpus and num_processors won't change at runtime
> +	 * either.
> +	 */
> +	if (disabled_cpus || num_online_cpus() != num_processors) {
> +		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = init_tdx_module();
> +	if (ret == -ENODEV) {
> +		pr_info("TDX module is not loaded.\n");
> +		tdx_module_status = TDX_MODULE_NONE;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process. It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 *
> +	 * Shutting down the module also requires doing SEAMCALL on all
> +	 * MADT-enabled CPUs. Do it while CPU hotplug is disabled.
> +	 *
> +	 * Return all errors during the initialization as -EFAULT as the
> +	 * module is always shut down.
> +	 */
> +	if (ret) {
> +		pr_info("Failed to initialize TDX module. Shut it down.\n");
> +		shutdown_tdx_module();
> +		tdx_module_status = TDX_MODULE_SHUTDOWN;
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	pr_info("TDX module initialized.\n");
> +	tdx_module_status = TDX_MODULE_INITIALIZED;
> +out:
> +	cpus_read_unlock();
> +
> +	return ret;
> +}

Uhm.. so if we've offlined all the SMT siblings because of some speculation fail or other, this TDX thing will fail to initialize?

Because as I understand it; this TDX initialization happens some random time after boot, when the first (TDX using) KVM instance gets created, long after the speculation mitigations are enforced.
On Tue, Nov 22 2022 at 10:02, Peter Zijlstra wrote:
> On Mon, Nov 21, 2022 at 01:26:26PM +1300, Kai Huang wrote:
>> +	cpus_read_unlock();
>> +
>> +	return ret;
>> +}
>
> Uhm.. so if we've offlined all the SMT siblings because of some
> speculation fail or other, this TDX thing will fail to initialize?
>
> Because as I understand it; this TDX initialization happens some random
> time after boot, when the first (TDX using) KVM instance gets created,
> long after the speculation mitigations are enforced.

Correct.

Aside of that it's completely unclear from the changelog why TDX needs to run the seamcall on _all_ present CPUs and why it cannot handle CPUs being hotplugged later.

It's pretty much obvious that a TDX guest can only run on CPUs where the seam module has been initialized, but where does the requirement come from that _ALL_ CPUs must be initialized and _ALL_ CPUs must be able to run TDX guests?

I just went and read through the documentation again.

 "1. After loading the Intel TDX module, the host VMM should call the
     TDH.SYS.INIT function to globally initialize the module.

  2. The host VMM should then call the TDH.SYS.LP.INIT function on each
     logical processor. TDH.SYS.LP.INIT is intended to initialize the
     module within the scope of the Logical Processor (LP)."

This clearly tells me, that:

  1) TDX must be globally initialized (once)

  2) TDX must be initialized on each logical processor on which TDX
     root/non-root operation should be executed

But it does not define any requirement for doing this on all logical processors and for preventing physical hotplug (neither for CPUs nor for memory). Nothing in the TDX specs and docs mentions physical hotplug or a requirement for invoking seamcall on the world.

Thanks,

        tglx
On 11/22/22 02:31, Thomas Gleixner wrote:
> Nothing in the TDX specs and docs mentions physical hotplug or a
> requirement for invoking seamcall on the world.

The TDX module source is actually out there[1] for us to look at. It's in a lovely, convenient zip file, but you can read it if sufficiently motivated. It has this lovely nugget in it:

WARNING!!! Proprietary License!! Avert your virgin eyes!!!

> if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)
> {
>     TDX_ERROR("Num of initialized lps %d is smaller than total num of lps %d\n",
>               tdx_global_data_ptr->num_of_init_lps, tdx_global_data_ptr->num_of_lps);
>     retval = TDX_SYS_CONFIG_NOT_PENDING;
>     goto EXIT;
> }

tdx_global_data_ptr->num_of_init_lps is incremented at TDH.SYS.INIT time. That if() is called at TDH.SYS.CONFIG time to help bring the module up.

So, I think you're right. I don't see the docs that actually *explain* this "you must seamcall all the things" requirement.

1. https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
On 11/20/22 16:26, Kai Huang wrote:
> 2) It is more flexible to support TDX module runtime updating in the
>    future (after updating the TDX module, it needs to be initialized
>    again).

I hate this generic blabber about "more flexible". There's a *REASON* it's more flexible, so let's talk about the reasons, please.

It's really something like this, right?

	The TDX module design allows it to be updated while the system
	is running. The update procedure shares quite a few steps with
	this "on demand" loading mechanism. The hope is that much of
	this "on demand" mechanism can be shared with a future "update"
	mechanism. A boot-time TDX module implementation would not be
	able to share much code with the update mechanism.

> 3) It avoids having to do a "temporary" solution to handle VMXON in the
>    core (non-KVM) kernel for now. This is because SEAMCALL requires CPU
>    being in VMX operation (VMXON is done), but currently only KVM handles
>    VMXON. Adding VMXON support to the core kernel isn't trivial. More
>    importantly, from long-term a reference-based approach is likely needed
>    in the core kernel as more kernel components are likely needed to
>    support TDX as well. Allow KVM to initialize the TDX module avoids
>    having to handle VMXON during kernel boot for now.

There are a lot of words in there.

3) Loading the TDX module requires VMX to be enabled. Currently, only
   the kernel KVM code mucks with VMX enabling. If the TDX module were
   to be initialized separately from KVM (like at boot), the boot code
   would need to be taught how to muck with VMX enabling and KVM would
   need to be taught how to cope with that. Making KVM itself
   responsible for TDX initialization lets the rest of the kernel stay
   blissfully unaware of VMX.

> Add a placeholder tdx_enable() to detect and initialize the TDX module
> on demand, with a state machine protected by mutex to support concurrent
> calls from multiple callers.

As opposed to concurrent calls from one caller? ;)

> The TDX module will be initialized in multi-steps defined by the TDX
> module:
>
> 1) Global initialization;
> 2) Logical-CPU scope initialization;
> 3) Enumerate the TDX module capabilities and platform configuration;
> 4) Configure the TDX module about TDX usable memory ranges and global
>    KeyID information;
> 5) Package-scope configuration for the global KeyID;
> 6) Initialize usable memory ranges based on 4).

This would actually be a nice place to call out the SEAMCALL names and mention that each of these steps involves a set of SEAMCALLs.

> The TDX module can also be shut down at any time during its lifetime.
> In case of any error during the initialization process, shut down the
> module. It's pointless to leave the module in any intermediate state
> during the initialization.
>
> Both logical CPU scope initialization and shutting down the TDX module
> require calling SEAMCALL on all boot-time present CPUs. For simplicity
> just temporarily disable CPU hotplug during the module initialization.

You might want to more precisely define "boot-time present CPUs". The boot of *what*?
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 8d943bdc8335..28c187b8726f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -10,15 +10,34 @@
>  #include <linux/types.h>
>  #include <linux/init.h>
>  #include <linux/printk.h>
> +#include <linux/mutex.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/apic.h>
>  #include <asm/tdx.h>
>  #include "tdx.h"
>
> +/* TDX module status during initialization */
> +enum tdx_module_status_t {
> +	/* TDX module hasn't been detected and initialized */
> +	TDX_MODULE_UNKNOWN,
> +	/* TDX module is not loaded */
> +	TDX_MODULE_NONE,
> +	/* TDX module is initialized */
> +	TDX_MODULE_INITIALIZED,
> +	/* TDX module is shut down due to initialization error */
> +	TDX_MODULE_SHUTDOWN,
> +};

Are these part of the ABI or just a purely OS-side construct?

>  static u32 tdx_keyid_start __ro_after_init;
>  static u32 tdx_keyid_num __ro_after_init;
>
> +static enum tdx_module_status_t tdx_module_status;
> +/* Prevent concurrent attempts on TDX detection and initialization */
> +static DEFINE_MUTEX(tdx_module_lock);
> +
>  /*
>   * Detect TDX private KeyIDs to see whether TDX has been enabled by the
>   * BIOS. Both initializing the TDX module and running TDX guest require
> @@ -104,3 +123,134 @@ bool platform_tdx_enabled(void)
>  {
>  	return !!tdx_keyid_num;
>  }
> +
> +/*
> + * Detect and initialize the TDX module.
> + *
> + * Return -ENODEV when the TDX module is not loaded, 0 when it
> + * is successfully initialized, or other error when it fails to
> + * initialize.
> + */
> +static int init_tdx_module(void)
> +{
> +	/* The TDX module hasn't been detected */
> +	return -ENODEV;
> +}
> +
> +static void shutdown_tdx_module(void)
> +{
> +	/* TODO: Shut down the TDX module */
> +}
> +
> +static int __tdx_enable(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Initializing the TDX module requires doing SEAMCALL on all
> +	 * boot-time present CPUs. For simplicity temporarily disable
> +	 * CPU hotplug to prevent any CPU from going offline during
> +	 * the initialization.
> +	 */
> +	cpus_read_lock();
> +
> +	/*
> +	 * Check whether all boot-time present CPUs are online and
> +	 * return early with a message so the user can be aware.
> +	 *
> +	 * Note a non-buggy BIOS should never support physical (ACPI)
> +	 * CPU hotplug when TDX is enabled, and all boot-time present
> +	 * CPU should be enabled in MADT, so there should be no
> +	 * disabled_cpus and num_processors won't change at runtime
> +	 * either.
> +	 */

Again, there are a lot of words in that comment, but I'm not sure why it's here. Despite all the whinging about ACPI, doesn't it boil down to:

	The TDX module itself establishes its own concept of how many
	logical CPUs there are in the system when it is loaded. The
	module will reject initialization attempts unless the kernel
	runs TDX initialization code on every last CPU.

	Ensure that the kernel is able to run code on all known logical
	CPUs.

and these checks are just to see if the kernel has shot itself in the foot and *KNOWS* that it is currently unable to run code on some logical CPU?

> +	if (disabled_cpus || num_online_cpus() != num_processors) {
> +		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = init_tdx_module();
> +	if (ret == -ENODEV) {

Why check for -ENODEV exclusively? Is there some other nonzero error code that indicates success?

> +		pr_info("TDX module is not loaded.\n");
> +		tdx_module_status = TDX_MODULE_NONE;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process. It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 *
> +	 * Shutting down the module also requires doing SEAMCALL on all
> +	 * MADT-enabled CPUs. Do it while CPU hotplug is disabled.
> +	 *
> +	 * Return all errors during the initialization as -EFAULT as the
> +	 * module is always shut down.
> +	 */
> +	if (ret) {
> +		pr_info("Failed to initialize TDX module. Shut it down.\n");

"Shut it down" seems wrong here. That could be interpreted as "I have already shut it down". "Shutting down" seems better.

> +		shutdown_tdx_module();
> +		tdx_module_status = TDX_MODULE_SHUTDOWN;
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	pr_info("TDX module initialized.\n");
> +	tdx_module_status = TDX_MODULE_INITIALIZED;
> +out:
> +	cpus_read_unlock();
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_enable - Enable TDX by initializing the TDX module
> + *
> + * Caller to make sure all CPUs are online and in VMX operation before
> + * calling this function. CPU hotplug is temporarily disabled internally
> + * to prevent any cpu from going offline.

"cpu" or "CPU"?

> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * 0:		The TDX module has been successfully initialized.
> + * * -ENODEV:	The TDX module is not loaded, or TDX is not supported.
> + * * -EINVAL:	The TDX module cannot be initialized due to certain
> + *		conditions are not met (i.e. when not all MADT-enabled
> + *		CPUs are not online).
> + * * -EFAULT:	Other internal fatal errors, or the TDX module is in
> + *		shutdown mode due to it failed to initialize in previous
> + *		attempts.
> + */

I honestly don't think all these error codes mean anything. They're plumbed nowhere and the use of -EFAULT is just plain wrong. Nobody can *DO* anything with these anyway. Just give one error code and make sure that you have pr_info()'s around to make it clear what went wrong. Then just do -EINVAL universally. Remove all the nonsense comments.

> +int tdx_enable(void)
> +{
> +	int ret;
> +
> +	if (!platform_tdx_enabled())
> +		return -ENODEV;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_UNKNOWN:
> +		ret = __tdx_enable();
> +		break;
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;

TDX_MODULE_NONE should probably be called TDX_MODULE_NOT_LOADED. A comment would also be nice:

	/* The BIOS did not load the module. No way to fix that. */

> +	case TDX_MODULE_INITIALIZED:

	/* Already initialized, great, tell the caller: */

> +		ret = 0;
> +		break;
> +	default:
> +		WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
> +		ret = -EFAULT;
> +		break;
> +	}

I don't get what that default: is for or what it has to do with TDX_MODULE_SHUTDOWN.

> +	mutex_unlock(&tdx_module_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_enable);
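The error-code point in the review can be made concrete with a sketch: a hypothetical reworked __tdx_enable() that logs the precise failure (printf stands in for pr_err()/pr_info() here) but reports a single error code to the caller, instead of the -ENODEV/-EINVAL/-EFAULT mix. The helper names are stand-ins, not the real kernel code.

```c
#include <stdio.h>
#include <errno.h>

/* Stand-ins for the real kernel helpers; both "succeed" here. */
static int all_boot_cpus_online(void) { return 1; }
static int init_tdx_module(void) { return 0; }
static void shutdown_tdx_module(void) { }

/* Hypothetical rework along the lines of the review: log the exact
 * reason, but collapse every failure into one caller-visible code.
 */
int __tdx_enable_sketch(void)
{
	if (!all_boot_cpus_online()) {
		printf("tdx: offline CPU(s), cannot initialize module\n");
		return -EINVAL;
	}
	if (init_tdx_module()) {
		printf("tdx: module initialization failed, shutting down\n");
		shutdown_tdx_module();
		return -EINVAL;	/* one code for all failures */
	}
	printf("tdx: module initialized\n");
	return 0;
}
```

The rationale, as the review states, is that callers cannot act differently on the distinct codes anyway; the dmesg line is what a human actually uses to diagnose the failure.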
On Tue, Nov 22 2022 at 07:35, Dave Hansen wrote:
> On 11/22/22 02:31, Thomas Gleixner wrote:
>> Nothing in the TDX specs and docs mentions physical hotplug or a
>> requirement for invoking seamcall on the world.
>
> The TDX module source is actually out there[1] for us to look at. It's
> in a lovely, convenient zip file, but you can read it if sufficiently
> motivated.

zip file? Version control from the last millennium?

The whole thing wants to be @github with a proper change history if Intel wants anyone to trust this and take it seriously.

/me refrains from ranting about the outrageous license choice.

> It has this lovely nugget in it:
>
> WARNING!!! Proprietary License!! Avert your virgin eyes!!!

It's probably not the only reason to avert the eyes.

>> if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)
>> {
>>     TDX_ERROR("Num of initialized lps %d is smaller than total num of lps %d\n",
>>               tdx_global_data_ptr->num_of_init_lps, tdx_global_data_ptr->num_of_lps);
>>     retval = TDX_SYS_CONFIG_NOT_PENDING;
>>     goto EXIT;
>> }
>
> tdx_global_data_ptr->num_of_init_lps is incremented at TDH.SYS.INIT
> time. That if() is called at TDH.SYS.CONFIG time to help bring the
> module up.
>
> So, I think you're right. I don't see the docs that actually *explain*
> this "you must seamcall all the things" requirement.

The code actually enforces this.

At TDH.SYS.INIT, which is the first operation, it gets the total number of LPs from the sysinfo table:

  src/vmm_dispatcher/api_calls/tdh_sys_init.c:

    tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps;

Then TDH.SYS.LP.INIT increments the count of initialized LPs.

  src/vmm_dispatcher/api_calls/tdh_sys_lp_init.c:

    increment_num_of_lps(tdx_global_data_ptr)
      _lock_xadd_32b(&tdx_global_data_ptr->num_of_init_lps, 1);

Finally TDH.SYS.CONFIG checks whether _ALL_ LPs have been initialized.

  src/vmm_dispatcher/api_calls/tdh_sys_config.c:

    if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)

Clearly that's nowhere spelled out in the documentation, but I don't buy the 'architecturally required' argument at all. It's an implementation detail of the TDX module.

Technically there is IMO ZERO requirement to do so.

1) The TDX module is global

2) Seam-root and Seam-non-root operation are strictly a LP property.

   The only architectural prerequisite for using Seam on a LP is that
   obviously the encryption/decryption mechanics have been initialized
   on the package to which the LP belongs.

I can see why it might be complicated to add/remove an LP after initialization, but technically it should be possible. TDX/Seam is not that special.

But what's absolutely annoying is that the documentation lacks any information about the choice of enforcement which has been hardcoded into the Seam module for whatever reasons.

Maybe I overlooked it, but then it's definitely well hidden.

Thanks,

        tglx
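The enforcement chain traced above (TDH.SYS.INIT records the total LP count, TDH.SYS.LP.INIT bumps a counter, TDH.SYS.CONFIG refuses to proceed until the two match) reduces to a few lines. This is a toy model for illustration: the names echo the module source, but the real code lives behind SEAMCALL dispatch and uses a locked xadd for the increment.

```c
/* Toy model of the TDX module's LP-count enforcement. */
struct tdx_global_data {
	unsigned int num_of_lps;	/* set once at TDH.SYS.INIT */
	unsigned int num_of_init_lps;	/* bumped by TDH.SYS.LP.INIT */
};

void tdh_sys_init(struct tdx_global_data *g, unsigned int tot_num_lps)
{
	/* The real module reads mcheck_fields.tot_num_lps here. */
	g->num_of_lps = tot_num_lps;
	g->num_of_init_lps = 0;
}

void tdh_sys_lp_init(struct tdx_global_data *g)
{
	g->num_of_init_lps++;	/* _lock_xadd_32b() in the module */
}

/* 0 on success; -1 stands in for TDX_SYS_CONFIG_NOT_PENDING. */
int tdh_sys_config(const struct tdx_global_data *g)
{
	return g->num_of_init_lps < g->num_of_lps ? -1 : 0;
}
```

Running the model makes the kernel-side consequence obvious: if even one LP never executes TDH.SYS.LP.INIT (e.g. an offlined SMT sibling), TDH.SYS.CONFIG keeps failing and the module can never be brought up.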
On Tue, Nov 22, 2022, Thomas Gleixner wrote:
> On Tue, Nov 22 2022 at 07:35, Dave Hansen wrote:
>> On 11/22/22 02:31, Thomas Gleixner wrote:
>>> Nothing in the TDX specs and docs mentions physical hotplug or a
>>> requirement for invoking seamcall on the world.
>>
>> The TDX module source is actually out there[1] for us to look at. It's
>> in a lovely, convenient zip file, but you can read it if sufficiently
>> motivated.
>
> zip file? Version control from the last millennium?
>
> The whole thing wants to be @github with a proper change history if
> Intel wants anyone to trust this and take it seriously.
>
> /me refrains from ranting about the outrageous license choice.

Let me know if you grab pitchforks and torches, I'll join the mob :-)
On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote: > > > if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr- > > > >num_of_lps) > > > { > > > TDX_ERROR("Num of initialized lps %d is smaller than total num of > > > lps %d\n", > > > tdx_global_data_ptr->num_of_init_lps, > > > tdx_global_data_ptr->num_of_lps); > > > retval = TDX_SYS_CONFIG_NOT_PENDING; > > > goto EXIT; > > > } > > > > tdx_global_data_ptr->num_of_init_lps is incremented at TDH.SYS.INIT > > time. That if() is called at TDH.SYS.CONFIG time to help bring the > > module up. > > > > So, I think you're right. I don't see the docs that actually *explain* > > this "you must seamcall all the things" requirement. > > The code actually enforces this. > > At TDH.SYS.INIT which is the first operation it gets the total number > of LPs from the sysinfo table: > > src/vmm_dispatcher/api_calls/tdh_sys_init.c: > > tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr- > >mcheck_fields.tot_num_lps; > > Then TDH.SYS.LP.INIT increments the count of initialized LPs. > > src/vmm_dispatcher/api_calls/tdh_sys_lp_init.c: > > increment_num_of_lps(tdx_global_data_ptr) > _lock_xadd_32b(&tdx_global_data_ptr->num_of_init_lps, 1); > > Finally TDH.SYS.CONFIG checks whether _ALL_ LPs have been initialized. > > src/vmm_dispatcher/api_calls/tdh_sys_config.c: > > if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr- > >num_of_lps) > > Clearly that's nowhere spelled out in the documentation, but I don't > buy the 'architecturaly required' argument not at all. It's an > implementation detail of the TDX module. Hi Thomas, Thanks for review! I agree on hardware level there shouldn't be such requirement (not 100% sure though), but I guess from kernel's perspective, "the implementation detail of the TDX module" is sort of "architectural requirement" -- at least Intel arch guys think so I guess. > > Technically there is IMO ZERO requirement to do so. 
> > 1) The TDX module is global > > 2) Seam-root and Seam-non-root operation are strictly a LP property. > > The only architectural prerequisite for using Seam on a LP is that > obviously the encryption/decryption mechanics have been initialized > on the package to which the LP belongs. > > I can see why it might be complicated to add/remove an LP after > initialization fact, but technically it should be possible. "kernel soft offline" actually isn't an issue. We can bring down a logical cpu after it gets initialized and then bring it up again. Only add/removal of physical cpu will cause problem: TDX MCHECK verifies all boot-time present cpus to make sure they are TDX- compatible before it enables TDX in hardware. MCHECK cannot run on hot-added CPU, so TDX cannot support physical CPU hotplug. We tried to get it clarified in the specification, and below is what TDX/module arch guys agreed to put to the TDX module spec (just checked it's not in latest public spec yet, but they said it will be in next release): " 4.1.3.2. CPU Configuration During platform boot, MCHECK verifies all logical CPUs to ensure they meet TDX’s security and certain functionality requirements, and MCHECK passes the following CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module: · Total number of logical processors in the platform. · Total number of installed packages in the platform. · A table of per-package CPU family, model and stepping etc. identification, as enumerated by CPUID(1).EAX. The above information is static and does not change after platform boot and MCHECK run. Note: TDX doesn’t support adding or removing CPUs from TDX security perimeter, as checked my MCHECK. BIOS should prevent CPUs from being hot-added or hot-removed after platform boots. The TDX module performs additional checks of the CPU’s configuration and supported features, by reading MSRs and CPUID information as described in the following sections. " > > TDX/Seam is not that special. 
> > But what's absolutely annoying is that the documentation lacks any > information about the choice of enforcement which has been hardcoded > into the Seam module for whatever reasons. > > Maybe I overlooked it, but then it's definitely well hidden. It depends on the TDX module implementation, which the TDX arch guys consider "architectural", I think.
On Wed, 2022-11-23 at 00:30 +0000, Huang, Kai wrote: > > > > Clearly that's nowhere spelled out in the documentation, but I don't > > buy the 'architecturaly required' argument not at all. It's an > > implementation detail of the TDX module. > > Hi Thomas, > > Thanks for review! > > I agree on hardware level there shouldn't be such requirement (not 100% sure > though), but I guess from kernel's perspective, "the implementation detail of > the TDX module" is sort of "architectural requirement" -- at least Intel arch > guys think so I guess. Let me double check with the TDX module folks and figure out the root of the requirement. Thanks.
On Tue, 2022-11-22 at 10:05 -0800, Dave Hansen wrote: > On 11/20/22 16:26, Kai Huang wrote: > > 2) It is more flexible to support TDX module runtime updating in the > > future (after updating the TDX module, it needs to be initialized > > again). > > I hate this generic blabber about "more flexible". There's a *REASON* > it's more flexible, so let's talk about the reasons, please. > > It's really something like this, right? > > The TDX module design allows it to be updated while the system > is running. The update procedure shares quite a few steps with > this "on demand" loading mechanism. The hope is that much of > this "on demand" mechanism can be shared with a future "update" > mechanism. A boot-time TDX module implementation would not be > able to share much code with the update mechanism. Yes. Thanks. > > > > 3) It avoids having to do a "temporary" solution to handle VMXON in the > > core (non-KVM) kernel for now. This is because SEAMCALL requires CPU > > being in VMX operation (VMXON is done), but currently only KVM handles > > VMXON. Adding VMXON support to the core kernel isn't trivial. More > > importantly, from long-term a reference-based approach is likely needed > > in the core kernel as more kernel components are likely needed to > > support TDX as well. Allow KVM to initialize the TDX module avoids > > having to handle VMXON during kernel boot for now. > > There are a lot of words in there. > > 3) Loading the TDX module requires VMX to be enabled. Currently, only > the kernel KVM code mucks with VMX enabling. If the TDX module were > to be initialized separately from KVM (like at boot), the boot code > would need to be taught how to muck with VMX enabling and KVM would > need to be taught how to cope with that. Making KVM itself > responsible for TDX initialization lets the rest of the kernel stay > blissfully unaware of VMX. Thanks. 
> > > Add a placeholder tdx_enable() to detect and initialize the TDX module > > on demand, with a state machine protected by mutex to support concurrent > > calls from multiple callers. > > As opposed to concurrent calls from one caller? ;) How about below? " Add a placeholder tdx_enable() to initialize the TDX module on demand. So far KVM will be the only caller, but other kernel components will need to use it too in the future. Just use a mutex protected state machine to make sure the module initialization can only be done once. " > > > The TDX module will be initialized in multi-steps defined by the TDX > > module: > > > > 1) Global initialization; > > 2) Logical-CPU scope initialization; > > 3) Enumerate the TDX module capabilities and platform configuration; > > 4) Configure the TDX module about TDX usable memory ranges and global > > KeyID information; > > 5) Package-scope configuration for the global KeyID; > > 6) Initialize usable memory ranges based on 4). > > This would actually be a nice place to call out the SEAMCALL names and > mention that each of these steps involves a set of SEAMCALLs. How about below? " The TDX module will be initialized in multi-steps defined by the TDX module and each of those steps involves a specific SEAMCALL: 1) Global initialization using TDH.SYS.INIT. 2) Logical-CPU scope initialization using TDH.SYS.LP.INIT. 3) Enumerate the TDX module capabilities and TDX-capable memory information using TDH.SYS.INFO. 4) Configure the TDX module with TDX-usable memory regions and the global KeyID information using TDH.SYS.CONFIG. 5) Package-scope configuration for the global KeyID using TDH.SYS.KEY.CONFIG. 6) Initialize TDX-usable memory regions using TDH.SYS.TDMR.INIT. Before step 4), the kernel needs to build a set of TDX-usable memory regions, and construct data structures to cover those regions. " > > > The TDX module can also be shut down at any time during its lifetime. 
> > In case of any error during the initialization process, shut down the > > module. It's pointless to leave the module in any intermediate state > > during the initialization. > > > > Both logical CPU scope initialization and shutting down the TDX module > > require calling SEAMCALL on all boot-time present CPUs. For simplicity > > just temporarily disable CPU hotplug during the module initialization. > > You might want to more precisely define "boot-time present CPUs". The > boot of *what*? How about use "BIOS-enabled CPUs" instead? If OK I'll use it consistently across this series. > > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > > index 8d943bdc8335..28c187b8726f 100644 > > --- a/arch/x86/virt/vmx/tdx/tdx.c > > +++ b/arch/x86/virt/vmx/tdx/tdx.c > > @@ -10,15 +10,34 @@ > > #include <linux/types.h> > > #include <linux/init.h> > > #include <linux/printk.h> > > +#include <linux/mutex.h> > > +#include <linux/cpu.h> > > +#include <linux/cpumask.h> > > #include <asm/msr-index.h> > > #include <asm/msr.h> > > #include <asm/apic.h> > > #include <asm/tdx.h> > > #include "tdx.h" > > > > +/* TDX module status during initialization */ > > +enum tdx_module_status_t { > > + /* TDX module hasn't been detected and initialized */ > > + TDX_MODULE_UNKNOWN, > > + /* TDX module is not loaded */ > > + TDX_MODULE_NONE, > > + /* TDX module is initialized */ > > + TDX_MODULE_INITIALIZED, > > + /* TDX module is shut down due to initialization error */ > > + TDX_MODULE_SHUTDOWN, > > +}; > > Are these part of the ABI or just a purely OS-side construct? Purely OS-side construct. I'll explicitly call out in the comment. 
> > > static u32 tdx_keyid_start __ro_after_init; > > static u32 tdx_keyid_num __ro_after_init; > > > > +static enum tdx_module_status_t tdx_module_status; > > +/* Prevent concurrent attempts on TDX detection and initialization */ > > +static DEFINE_MUTEX(tdx_module_lock); > > + > > /* > > * Detect TDX private KeyIDs to see whether TDX has been enabled by the > > * BIOS. Both initializing the TDX module and running TDX guest require > > @@ -104,3 +123,134 @@ bool platform_tdx_enabled(void) > > { > > return !!tdx_keyid_num; > > } > > + > > +/* > > + * Detect and initialize the TDX module. > > + * > > + * Return -ENODEV when the TDX module is not loaded, 0 when it > > + * is successfully initialized, or other error when it fails to > > + * initialize. > > + */ > > +static int init_tdx_module(void) > > +{ > > + /* The TDX module hasn't been detected */ > > + return -ENODEV; > > +} > > + > > +static void shutdown_tdx_module(void) > > +{ > > + /* TODO: Shut down the TDX module */ > > +} > > + > > +static int __tdx_enable(void) > > +{ > > + int ret; > > + > > + /* > > + * Initializing the TDX module requires doing SEAMCALL on all > > + * boot-time present CPUs. For simplicity temporarily disable > > + * CPU hotplug to prevent any CPU from going offline during > > + * the initialization. > > + */ > > + cpus_read_lock(); > > + > > + /* > > + * Check whether all boot-time present CPUs are online and > > + * return early with a message so the user can be aware. > > + * > > + * Note a non-buggy BIOS should never support physical (ACPI) > > + * CPU hotplug when TDX is enabled, and all boot-time present > > + * CPU should be enabled in MADT, so there should be no > > + * disabled_cpus and num_processors won't change at runtime > > + * either. > > + */ > > Again, there are a lot of words in that comment, but I'm not sure why > it's here. 
Despite all the whinging about ACPI, doesn't it boil down to: > > The TDX module itself establishes its own concept of how many > logical CPUs there are in the system when it is loaded. > This isn't accurate. TDX MCHECK records the total number of logical CPUs when the BIOS enables TDX. This happens before the TDX module is loaded. In fact the TDX module only gets this information from a secret location. > The > module will reject initialization attempts unless the kernel > runs TDX initialization code on every last CPU. > > Ensure that the kernel is able to run code on all known logical > CPUs. How about: TDX itself establishes its own concept of how many logical CPUs there are in the system when it gets enabled by the BIOS. The module will reject initialization attempts unless the kernel runs TDX initialization code on every last CPU. Ensure that the kernel is able to run code on all known logical CPUs. > > and these checks are just to see if the kernel has shot itself in the > foot and is *KNOWS* that it is currently unable to run code on some > logical CPU? Yes. > > > + if (disabled_cpus || num_online_cpus() != num_processors) { > > + pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n"); > > + ret = -EINVAL; > > + goto out; > > + } > > + > > + ret = init_tdx_module(); > > + if (ret == -ENODEV) { > > Why check for -ENODEV exclusively? Is there some other error nonzero > code that indicates success? The idea is to print out "TDX module not loaded" to separate it from other errors, so that the user can get a better idea when something goes wrong. > > > + pr_info("TDX module is not loaded.\n"); > > + tdx_module_status = TDX_MODULE_NONE; > > + goto out; > > + } > > + > > + /* > > + * Shut down the TDX module in case of any error during the > > + * initialization process. It's meaningless to leave the TDX > > + * module in any middle state of the initialization process. 
> > + * > > + * Shutting down the module also requires doing SEAMCALL on all > > + * MADT-enabled CPUs. Do it while CPU hotplug is disabled. > > + * > > + * Return all errors during the initialization as -EFAULT as the > > + * module is always shut down. > > + */ > > + if (ret) { > > + pr_info("Failed to initialize TDX module. Shut it down.\n"); > > "Shut it down" seems wrong here. That could be interpreted as "I have > already shut it down". "Shutting down" seems better. Will change to "Shutting down" if we still want to keep the shutdown patch (please see my other reply to Sean). > > > + shutdown_tdx_module(); > > + tdx_module_status = TDX_MODULE_SHUTDOWN; > > + ret = -EFAULT; > > + goto out; > > + } > > + > > + pr_info("TDX module initialized.\n"); > > + tdx_module_status = TDX_MODULE_INITIALIZED; > > +out: > > + cpus_read_unlock(); > > + > > + return ret; > > +} > > + > > +/** > > + * tdx_enable - Enable TDX by initializing the TDX module > > + * > > + * Caller to make sure all CPUs are online and in VMX operation before > > + * calling this function. CPU hotplug is temporarily disabled internally > > + * to prevent any cpu from going offline. > > "cpu" or "CPU"? > > > + * This function can be called in parallel by multiple callers. > > + * > > + * Return: > > + * > > + * * 0: The TDX module has been successfully initialized. > > + * * -ENODEV: The TDX module is not loaded, or TDX is not supported. > > + * * -EINVAL: The TDX module cannot be initialized due to certain > > + * conditions are not met (i.e. when not all MADT-enabled > > + * CPUs are not online). > > + * * -EFAULT: Other internal fatal errors, or the TDX module is in > > + * shutdown mode due to it failed to initialize in previous > > + * attempts. > > + */ > > I honestly don't think all these error codes mean anything. They're > plumbed nowhere and the use of -EFAULT is just plain wrong. > > Nobody can *DO* anything with these anyway. 
> > Just give one error code and make sure that you have pr_info()'s around > to make it clear what went wrong. Then just do -EINVAL universally. > Remove all the nonsense comments. OK. > > +int tdx_enable(void) > > +{ > > + int ret; > > + > > + if (!platform_tdx_enabled()) > > + return -ENODEV; > > + > > + mutex_lock(&tdx_module_lock); > > + > > + switch (tdx_module_status) { > > + case TDX_MODULE_UNKNOWN: > > + ret = __tdx_enable(); > > + break; > > + case TDX_MODULE_NONE: > > + ret = -ENODEV; > > + break; > > TDX_MODULE_NONE should probably be called TDX_MODULE_NOT_LOADED. A > comment would also be nice: > > /* The BIOS did not load the module. No way to fix that. */ > > > + case TDX_MODULE_INITIALIZED: > > /* Already initialized, great, tell the caller: */ Will do. > > > + ret = 0; > > + break; > > + default: > > + WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN); > > + ret = -EFAULT; > > + break; > > + } > > I don't get what that default: is for or what it has to do with > TDX_MODULE_SHUTDOWN. I meant we can only have 4 possible status, and the default case must be the TDX_MODULE_SHUTDOWN state. I think I can just remove that WARN()? > > > > + mutex_unlock(&tdx_module_lock); > > + > > + return ret; > > +} > > +EXPORT_SYMBOL_GPL(tdx_enable); >
Kai! On Wed, Nov 23 2022 at 00:30, Kai Huang wrote: > On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote: >> Clearly that's nowhere spelled out in the documentation, but I don't >> buy the 'architecturaly required' argument not at all. It's an >> implementation detail of the TDX module. > > I agree on hardware level there shouldn't be such requirement (not 100% sure > though), but I guess from kernel's perspective, "the implementation detail of > the TDX module" is sort of "architectural requirement" Sure, but then it needs to be clearly documented so. > -- at least Intel arch guys think so I guess. Intel "arch" guys? You might look up the meanings of "arch" in a dictionary. LKML is not twatter. >> Technically there is IMO ZERO requirement to do so. >> >> 1) The TDX module is global >> >> 2) Seam-root and Seam-non-root operation are strictly a LP property. >> >> The only architectural prerequisite for using Seam on a LP is that >> obviously the encryption/decryption mechanics have been initialized >> on the package to which the LP belongs. >> >> I can see why it might be complicated to add/remove an LP after >> initialization fact, but technically it should be possible. > > "kernel soft offline" actually isn't an issue. We can bring down a logical cpu > after it gets initialized and then bring it up again. That's the whole point where this discussion started: _AFTER_ it gets initialized. Which means that, e.g. adding "nosmt" to the kernel command line will make TDX fail hard because at the point where TDX is initialized the hyperthreads are not 'soft' online and cannot respond to anything, but the BIOS already accounted them. This is just wrong as we all know that "nosmt" is sadly one of the obvious counter measures for the never ending flood of speculation issues. > Only add/removal of physical cpu will cause problem: You wish. > TDX MCHECK verifies all boot-time present cpus to make sure they are TDX- > compatible before it enables TDX in hardware. 
MCHECK cannot run on hot-added > CPU, so TDX cannot support physical CPU hotplug. TDX can rightfully impose the limitation that it only executes on CPUs, which are known at boot/initialization time, and only utilizes "known" memory. That's it, but that does not enforce to prevent physical hotplug in general. > We tried to get it clarified in the specification, and below is what TDX/module > arch guys agreed to put to the TDX module spec (just checked it's not in latest > public spec yet, but they said it will be in next release): > > " > 4.1.3.2. CPU Configuration > > During platform boot, MCHECK verifies all logical CPUs to ensure they > meet TDX’s That MCHECK falls into the category of security voodoo. It needs to verify _ALL_ logical CPUs to ensure that Intel did not put different models and steppings into a package or what? > security and certain functionality requirements, and MCHECK passes the following > CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module: > > · Total number of logical processors in the platform. You surely need MCHECK for this > · Total number of installed packages in the platform. and for this... > · A table of per-package CPU family, model and stepping etc. > identification, as enumerated by CPUID(1).EAX. > The above information is static and does not change after platform boot and > MCHECK run. > > Note: TDX doesn’t support adding or removing CPUs from TDX security > perimeter, as checked my MCHECK. More security voodoo. The TDX security perimeter has nothing to do with adding or removing CPUs on a system. If that'd be true then TDX is a complete fail. > BIOS should prevent CPUs from being hot-added or hot-removed after > platform boots. If the BIOS does not prevent it, then TDX and the Seam module will not even notice unless the OS tries to invoke seamcall() on a newly plugged CPU. A newly added CPU and newly added memory should not have any impact on the TDX integrity of the already running system. 
If they have then again, TDX is broken by design. > The TDX module performs additional checks of the CPU’s configuration and > supported features, by reading MSRs and CPUID information as described in the > following sections. to ensure that the MCHECK generated table is still valid at the point where TDX is initialized? That said, this documentation is at least better than the existing void, but that does not make it technically correct. Thanks, tglx
On Wed, 2022-11-23 at 12:05 +0100, Thomas Gleixner wrote: > Kai! > > On Wed, Nov 23 2022 at 00:30, Kai Huang wrote: > > On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote: > > > Clearly that's nowhere spelled out in the documentation, but I don't > > > buy the 'architecturaly required' argument not at all. It's an > > > implementation detail of the TDX module. > > > > I agree on hardware level there shouldn't be such requirement (not 100% sure > > though), but I guess from kernel's perspective, "the implementation detail of > > the TDX module" is sort of "architectural requirement" > > Sure, but then it needs to be clearly documented so. > > > -- at least Intel arch guys think so I guess. > > Intel "arch" guys? You might look up the meanings of "arch" in a > dictionary. LKML is not twatter. > > > > Technically there is IMO ZERO requirement to do so. > > > > > > 1) The TDX module is global > > > > > > 2) Seam-root and Seam-non-root operation are strictly a LP property. > > > > > > The only architectural prerequisite for using Seam on a LP is that > > > obviously the encryption/decryption mechanics have been initialized > > > on the package to which the LP belongs. > > > > > > I can see why it might be complicated to add/remove an LP after > > > initialization fact, but technically it should be possible. > > > > "kernel soft offline" actually isn't an issue. We can bring down a logical cpu > > after it gets initialized and then bring it up again. > > That's the whole point where this discussion started: _AFTER_ it gets > initialized. > > Which means that, e.g. adding "nosmt" to the kernel command line will > make TDX fail hard because at the point where TDX is initialized the > hyperthreads are not 'soft' online and cannot respond to anything, but > the BIOS already accounted them. > > This is just wrong as we all know that "nosmt" is sadly one of the > obvious counter measures for the never ending flood of speculation > issues. Agree. 
As said in my other replies, I'll follow up with TDX module guys on this. > > > Only add/removal of physical cpu will cause problem: > > You wish. > > > TDX MCHECK verifies all boot-time present cpus to make sure they are TDX- > > compatible before it enables TDX in hardware. MCHECK cannot run on hot-added > > CPU, so TDX cannot support physical CPU hotplug. > > TDX can rightfully impose the limitation that it only executes on CPUs, > which are known at boot/initialization time, and only utilizes "known" > memory. That's it, but that does not enforce to prevent physical hotplug > in general. Adding physical CPUs should cause no problem, I guess; they just cannot run TDX. The TDX architecture doesn't expect them to run TDX code anyway. But would physical removal of a boot-time present CPU cause a problem? TDX MCHECK checks/records boot-time present CPUs. If a CPU is removed and then a new one is added, TDX still treats it as TDX-compatible, but it may actually not be. So if this happens, from a functionality point of view, it can break. I think TDX still wants to guarantee that TDX code works correctly on "TDX recorded" CPUs. Also, I am not sure whether there's any security issue if a malicious kernel tries to run TDX code on such a removed-then-added CPU. This seems to be a TDX architecture problem rather than a kernel policy issue. > > > We tried to get it clarified in the specification, and below is what TDX/module > > arch guys agreed to put to the TDX module spec (just checked it's not in latest > > public spec yet, but they said it will be in next release): > > > > " > > 4.1.3.2. CPU Configuration > > > > During platform boot, MCHECK verifies all logical CPUs to ensure they > > meet TDX’s > > That MCHECK falls into the category of security voodoo. > > It needs to verify _ALL_ logical CPUs to ensure that Intel did not put > different models and steppings into a package or what? I am guessing so. 
> > security and certain functionality requirements, and MCHECK passes the following > > CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module: > > > > · Total number of logical processors in the platform. > > You surely need MCHECK for this > > > · Total number of installed packages in the platform. > > and for this... > > > · A table of per-package CPU family, model and stepping etc. > > identification, as enumerated by CPUID(1).EAX. > > The above information is static and does not change after platform boot and > > MCHECK run. > > > > Note: TDX doesn’t support adding or removing CPUs from TDX security > > perimeter, as checked my MCHECK. > > More security voodoo. The TDX security perimeter has nothing to do with > adding or removing CPUs on a system. If that'd be true then TDX is a > complete fail. No argument here. > > BIOS should prevent CPUs from being hot-added or hot-removed after > > platform boots. > > If the BIOS does not prevent it, then TDX and the Seam module will not > even notice unless the OS tries to invoke seamcall() on a newly plugged > CPU. > > A newly added CPU and newly added memory should not have any impact on > the TDX integrity of the already running system. If they have then > again, TDX is broken by design. No argument here either. > > > The TDX module performs additional checks of the CPU’s configuration and > > supported features, by reading MSRs and CPUID information as described in the > > following sections. > > to ensure that the MCHECK generated table is still valid at the point > where TDX is initialized? I think it is trying to say: MCHECK doesn't do all the verifications. Some verifications are deferred to the TDX module to check when it gets initialized. > > That said, this documentation is at least better than the existing void, > but that does not make it technically correct. > > Thanks, > > tglx
On 11/23/22 02:18, Huang, Kai wrote: >> Again, there are a lot of words in that comment, but I'm not sure why >> it's here. Despite all the whinging about ACPI, doesn't it boil down to: >> >> The TDX module itself establishes its own concept of how many >> logical CPUs there are in the system when it is loaded. >> > This isn't accurate. TDX MCHECK records the total number of logical CPUs when > the BIOS enables TDX. This happens before the TDX module is loaded. In fact > the TDX module only gets this information from a secret location. Kai, this is the point where I lose patience with the conversation around this series. I'll paste you the line of code where the TDX module literally "establishes its own concept of how many logical CPUs there are in the system": > //NUM_LPS > tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps; Yes, this is derived directly from MCHECK. But, this concept is separate from MCHECK. We have seen zero actual facts from you or other folks at Intel that this is anything other than an arbitrary choice made for the convenience of the TDX module. It _might_ not be this way. I'm open to hearing those facts and changing my position on this. But, please bring facts, not random references to "secret locations" that aren't that secret.
On Wed, 2022-11-23 at 08:58 -0800, Dave Hansen wrote: > On 11/23/22 02:18, Huang, Kai wrote: > > > Again, there are a lot of words in that comment, but I'm not sure why > > > it's here. Despite all the whinging about ACPI, doesn't it boil down to: > > > > > > The TDX module itself establishes its own concept of how many > > > logical CPUs there are in the system when it is loaded. > > > > > This isn't accurate. TDX MCHECK records the total number of logical CPUs when > > the BIOS enables TDX. This happens before the TDX module is loaded. In fact > > the TDX module only gets this information from a secret location. > > Kai, this is the point where I lose patience with the conversation > around this series. I'll you paste you the line of code where the TDX > module literally "establishes its own concept of how many logical CPUs > there are in the system": > > > //NUM_LPS > > tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps; > > Yes, this is derived directly from MCHECK. But, this concept is > separate from MCHECK. We have seen zero actual facts from you or other > folks at Intel that this is anything other than an arbitrary choice made > for the convenience of the TDX module. It _might_ not be this way. I'm > open to hearing those facts and changing my position on this. > > But, please bring facts, not random references to "secret locations" > that aren't that secret. Agreed. Thanks for providing the comment and will use it.
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 51c4222a13ae..05fc89d9742a 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -101,8 +101,10 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, #ifdef CONFIG_INTEL_TDX_HOST bool platform_tdx_enabled(void); +int tdx_enable(void); #else /* !CONFIG_INTEL_TDX_HOST */ static inline bool platform_tdx_enabled(void) { return false; } +static inline int tdx_enable(void) { return -ENODEV; } #endif /* CONFIG_INTEL_TDX_HOST */ #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 8d943bdc8335..28c187b8726f 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -10,15 +10,34 @@ #include <linux/types.h> #include <linux/init.h> #include <linux/printk.h> +#include <linux/mutex.h> +#include <linux/cpu.h> +#include <linux/cpumask.h> #include <asm/msr-index.h> #include <asm/msr.h> #include <asm/apic.h> #include <asm/tdx.h> #include "tdx.h" +/* TDX module status during initialization */ +enum tdx_module_status_t { + /* TDX module hasn't been detected and initialized */ + TDX_MODULE_UNKNOWN, + /* TDX module is not loaded */ + TDX_MODULE_NONE, + /* TDX module is initialized */ + TDX_MODULE_INITIALIZED, + /* TDX module is shut down due to initialization error */ + TDX_MODULE_SHUTDOWN, +}; + static u32 tdx_keyid_start __ro_after_init; static u32 tdx_keyid_num __ro_after_init; +static enum tdx_module_status_t tdx_module_status; +/* Prevent concurrent attempts on TDX detection and initialization */ +static DEFINE_MUTEX(tdx_module_lock); + /* * Detect TDX private KeyIDs to see whether TDX has been enabled by the * BIOS. Both initializing the TDX module and running TDX guest require @@ -104,3 +123,134 @@ bool platform_tdx_enabled(void) { return !!tdx_keyid_num; } + +/* + * Detect and initialize the TDX module. 
+ * + * Return -ENODEV when the TDX module is not loaded, 0 when it + * is successfully initialized, or other error when it fails to + * initialize. + */ +static int init_tdx_module(void) +{ + /* The TDX module hasn't been detected */ + return -ENODEV; +} + +static void shutdown_tdx_module(void) +{ + /* TODO: Shut down the TDX module */ +} + +static int __tdx_enable(void) +{ + int ret; + + /* + * Initializing the TDX module requires doing SEAMCALL on all + * boot-time present CPUs. For simplicity temporarily disable + * CPU hotplug to prevent any CPU from going offline during + * the initialization. + */ + cpus_read_lock(); + + /* + * Check whether all boot-time present CPUs are online and + * return early with a message so the user can be aware. + * + * Note a non-buggy BIOS should never support physical (ACPI) + * CPU hotplug when TDX is enabled, and all boot-time present + * CPU should be enabled in MADT, so there should be no + * disabled_cpus and num_processors won't change at runtime + * either. + */ + if (disabled_cpus || num_online_cpus() != num_processors) { + pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n"); + ret = -EINVAL; + goto out; + } + + ret = init_tdx_module(); + if (ret == -ENODEV) { + pr_info("TDX module is not loaded.\n"); + tdx_module_status = TDX_MODULE_NONE; + goto out; + } + + /* + * Shut down the TDX module in case of any error during the + * initialization process. It's meaningless to leave the TDX + * module in any middle state of the initialization process. + * + * Shutting down the module also requires doing SEAMCALL on all + * MADT-enabled CPUs. Do it while CPU hotplug is disabled. + * + * Return all errors during the initialization as -EFAULT as the + * module is always shut down. + */ + if (ret) { + pr_info("Failed to initialize TDX module. 
Shut it down.\n"); + shutdown_tdx_module(); + tdx_module_status = TDX_MODULE_SHUTDOWN; + ret = -EFAULT; + goto out; + } + + pr_info("TDX module initialized.\n"); + tdx_module_status = TDX_MODULE_INITIALIZED; +out: + cpus_read_unlock(); + + return ret; +} + +/** + * tdx_enable - Enable TDX by initializing the TDX module + * + * Caller to make sure all CPUs are online and in VMX operation before + * calling this function. CPU hotplug is temporarily disabled internally + * to prevent any cpu from going offline. + * + * This function can be called in parallel by multiple callers. + * + * Return: + * + * * 0: The TDX module has been successfully initialized. + * * -ENODEV: The TDX module is not loaded, or TDX is not supported. + * * -EINVAL: The TDX module cannot be initialized due to certain + * conditions are not met (i.e. when not all MADT-enabled + * CPUs are not online). + * * -EFAULT: Other internal fatal errors, or the TDX module is in + * shutdown mode due to it failed to initialize in previous + * attempts. + */ +int tdx_enable(void) +{ + int ret; + + if (!platform_tdx_enabled()) + return -ENODEV; + + mutex_lock(&tdx_module_lock); + + switch (tdx_module_status) { + case TDX_MODULE_UNKNOWN: + ret = __tdx_enable(); + break; + case TDX_MODULE_NONE: + ret = -ENODEV; + break; + case TDX_MODULE_INITIALIZED: + ret = 0; + break; + default: + WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN); + ret = -EFAULT; + break; + } + + mutex_unlock(&tdx_module_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(tdx_enable);