[v7,04/20] x86/virt/tdx: Add skeleton to initialize TDX on demand

Message ID d26254af8e5b3dcca8a070703c5d6d04f48d47a9.1668988357.git.kai.huang@intel.com
State: New
Series: TDX host kernel support

Commit Message

Kai Huang Nov. 21, 2022, 12:26 a.m. UTC
  Before the TDX module can be used to create and run TDX guests, it must
be loaded and properly initialized.  The TDX module is expected to be
loaded by the BIOS, and to be initialized by the kernel.

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM).  The host
kernel communicates with the TDX module via a new SEAMCALL instruction.
The TDX module implements a set of SEAMCALL leaf functions to allow the
host kernel to initialize it.

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation chooses an
"on demand" approach, deferring TDX initialization until there is a real
need (e.g. when requested by KVM).  This approach has the following pros:

1) It avoids consuming the memory that must be allocated by the kernel
and given to the TDX module as metadata (~1/256th of the TDX-usable
memory), and also saves the CPU cycles of initializing the TDX module
(and the metadata) when TDX is not used at all.

2) It is more flexible to support TDX module runtime updating in the
future (after updating the TDX module, it needs to be initialized
again).

3) It avoids having to do a "temporary" solution to handle VMXON in the
core (non-KVM) kernel for now.  This is because SEAMCALL requires CPU
being in VMX operation (VMXON is done), but currently only KVM handles
VMXON.  Adding VMXON support to the core kernel isn't trivial.  More
importantly, from long-term a reference-based approach is likely needed
in the core kernel as more kernel components are likely needed to
support TDX as well.  Allow KVM to initialize the TDX module avoids
having to handle VMXON during kernel boot for now.

Add a placeholder tdx_enable() to detect and initialize the TDX module
on demand, with a state machine protected by mutex to support concurrent
calls from multiple callers.

The TDX module will be initialized in multi-steps defined by the TDX
module:

  1) Global initialization;
  2) Logical-CPU scope initialization;
  3) Enumerate the TDX module capabilities and platform configuration;
  4) Configure the TDX module about TDX usable memory ranges and global
     KeyID information;
  5) Package-scope configuration for the global KeyID;
  6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime.
In case of any error during the initialization process, shut down the
module.  It's pointless to leave the module in any intermediate state
during the initialization.

Both logical CPU scope initialization and shutting down the TDX module
require calling SEAMCALL on all boot-time present CPUs.  For simplicity
just temporarily disable CPU hotplug during the module initialization.

Note TDX architecturally doesn't support physical CPU hot-add/removal.
A non-buggy BIOS should never support ACPI CPU hot-add/removal.  This
implementation doesn't explicitly handle ACPI CPU hot-add/removal but
depends on the BIOS to do the right thing.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to set status to TDX_MODULE_NONE if TDX module is not
   loaded (Chao)
 - Added Chao's Reviewed-by.
 - Improved comments around cpus_read_lock().

v3 -> v5 (no feedback on v4):
 - Removed the check that SEAMRR and TDX KeyID have been detected on
   all present cpus.
 - Removed tdx_detect().
 - Added a num_online_cpus() vs. MADT-enabled CPUs check within the CPU
   hotplug lock; return early with an error message if they don't match.
 - Improved dmesg printing for TDX module detection and initialization.

---
 arch/x86/include/asm/tdx.h  |   2 +
 arch/x86/virt/vmx/tdx/tdx.c | 150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+)
  

Comments

Peter Zijlstra Nov. 22, 2022, 9:02 a.m. UTC | #1
On Mon, Nov 21, 2022 at 01:26:26PM +1300, Kai Huang wrote:
> +static int __tdx_enable(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Initializing the TDX module requires doing SEAMCALL on all
> +	 * boot-time present CPUs.  For simplicity temporarily disable
> +	 * CPU hotplug to prevent any CPU from going offline during
> +	 * the initialization.
> +	 */
> +	cpus_read_lock();
> +
> +	/*
> +	 * Check whether all boot-time present CPUs are online and
> +	 * return early with a message so the user can be aware.
> +	 *
> +	 * Note a non-buggy BIOS should never support physical (ACPI)
> +	 * CPU hotplug when TDX is enabled, and all boot-time present
> +	 * CPU should be enabled in MADT, so there should be no
> +	 * disabled_cpus and num_processors won't change at runtime
> +	 * either.
> +	 */
> +	if (disabled_cpus || num_online_cpus() != num_processors) {
> +		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = init_tdx_module();
> +	if (ret == -ENODEV) {
> +		pr_info("TDX module is not loaded.\n");
> +		tdx_module_status = TDX_MODULE_NONE;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process.  It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 *
> +	 * Shutting down the module also requires doing SEAMCALL on all
> +	 * MADT-enabled CPUs.  Do it while CPU hotplug is disabled.
> +	 *
> +	 * Return all errors during the initialization as -EFAULT as the
> +	 * module is always shut down.
> +	 */
> +	if (ret) {
> +		pr_info("Failed to initialize TDX module. Shut it down.\n");
> +		shutdown_tdx_module();
> +		tdx_module_status = TDX_MODULE_SHUTDOWN;
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	pr_info("TDX module initialized.\n");
> +	tdx_module_status = TDX_MODULE_INITIALIZED;
> +out:
> +	cpus_read_unlock();
> +
> +	return ret;
> +}

Uhm.. so if we've offlined all the SMT siblings because of some
speculation fail or other, this TDX thing will fail to initialize?

Because as I understand it; this TDX initialization happens some random
time after boot, when the first (TDX using) KVM instance gets created,
long after the speculation mitigations are enforced.
  
Thomas Gleixner Nov. 22, 2022, 10:31 a.m. UTC | #2
On Tue, Nov 22 2022 at 10:02, Peter Zijlstra wrote:
> On Mon, Nov 21, 2022 at 01:26:26PM +1300, Kai Huang wrote:
>> +	cpus_read_unlock();
>> +
>> +	return ret;
>> +}
>
> Uhm.. so if we've offlined all the SMT siblings because of some
> speculation fail or other, this TDX thing will fail to initialize?
>
> Because as I understand it; this TDX initialization happens some random
> time after boot, when the first (TDX using) KVM instance gets created,
> long after the speculation mitigations are enforced.

Correct. Aside from that, it's completely unclear from the changelog why
TDX needs to run the seamcall on _all_ present CPUs and why it cannot
handle a CPU being hotplugged later.

It's pretty much obvious that a TDX guest can only run on CPUs where
the seam module has been initialized, but where does the requirement
come from that _ALL_ CPUs must be initialized and _ALL_ CPUs must be
able to run TDX guests?

I just went and read through the documentation again.

  "1. After loading the Intel TDX module, the host VMM should call the
      TDH.SYS.INIT function to globally initialize the module.

   2. The host VMM should then call the TDH.SYS.LP.INIT function on each
      logical processor. TDH.SYS.LP.INIT is intended to initialize the
      module within the scope of the Logical Processor (LP)."

This clearly tells me, that:

  1) TDX must be globally initialized (once)

  2) TDX must be initialized on each logical processor on which TDX
     root/non-root operation should be executed

But it does not define any requirement for doing this on all logical
processors and for preventing physical hotplug (Neither for CPUs nor for
memory).

Nothing in the TDX specs and docs mentions physical hotplug or a
requirement for invoking seamcall on the world.

Thanks,

        tglx
  
Dave Hansen Nov. 22, 2022, 3:35 p.m. UTC | #3
On 11/22/22 02:31, Thomas Gleixner wrote:
> Nothing in the TDX specs and docs mentions physical hotplug or a
> requirement for invoking seamcall on the world.

The TDX module source is actually out there[1] for us to look at.  It's
in a lovely, convenient zip file, but you can read it if sufficiently
motivated.

It has this lovely nugget in it:

WARNING!!! Proprietary License!!  Avert your virgin eyes!!!

>     if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)
>     {
>         TDX_ERROR("Num of initialized lps %d is smaller than total num of lps %d\n",
>                     tdx_global_data_ptr->num_of_init_lps, tdx_global_data_ptr->num_of_lps);
>         retval = TDX_SYS_CONFIG_NOT_PENDING;
>         goto EXIT;
>     }

tdx_global_data_ptr->num_of_init_lps is incremented at TDH.SYS.LP.INIT
time.  That if() is called at TDH.SYS.CONFIG time to help bring the
module up.

So, I think you're right.  I don't see the docs that actually *explain*
this "you must seamcall all the things" requirement.

1.
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
  
Dave Hansen Nov. 22, 2022, 6:05 p.m. UTC | #4
On 11/20/22 16:26, Kai Huang wrote:
> 2) It is more flexible to support TDX module runtime updating in the
> future (after updating the TDX module, it needs to be initialized
> again).

I hate this generic blabber about "more flexible".  There's a *REASON*
it's more flexible, so let's talk about the reasons, please.

It's really something like this, right?

	The TDX module design allows it to be updated while the system
	is running.  The update procedure shares quite a few steps with
	this "on demand" loading mechanism.  The hope is that much of
	this "on demand" mechanism can be shared with a future "update"
	mechanism.  A boot-time TDX module implementation would not be
	able to share much code with the update mechanism.


> 3) It avoids having to do a "temporary" solution to handle VMXON in the
> core (non-KVM) kernel for now.  This is because SEAMCALL requires CPU
> being in VMX operation (VMXON is done), but currently only KVM handles
> VMXON.  Adding VMXON support to the core kernel isn't trivial.  More
> importantly, from long-term a reference-based approach is likely needed
> in the core kernel as more kernel components are likely needed to
> support TDX as well.  Allow KVM to initialize the TDX module avoids
> having to handle VMXON during kernel boot for now.

There are a lot of words in there.

3) Loading the TDX module requires VMX to be enabled.  Currently, only
   the kernel KVM code mucks with VMX enabling.  If the TDX module were
   to be initialized separately from KVM (like at boot), the boot code
   would need to be taught how to muck with VMX enabling and KVM would
   need to be taught how to cope with that.  Making KVM itself
   responsible for TDX initialization lets the rest of the kernel stay
   blissfully unaware of VMX.

> Add a placeholder tdx_enable() to detect and initialize the TDX module
> on demand, with a state machine protected by mutex to support concurrent
> calls from multiple callers.

As opposed to concurrent calls from one caller? ;)

> The TDX module will be initialized in multi-steps defined by the TDX
> module:
> 
>   1) Global initialization;
>   2) Logical-CPU scope initialization;
>   3) Enumerate the TDX module capabilities and platform configuration;
>   4) Configure the TDX module about TDX usable memory ranges and global
>      KeyID information;
>   5) Package-scope configuration for the global KeyID;
>   6) Initialize usable memory ranges based on 4).

This would actually be a nice place to call out the SEAMCALL names and
mention that each of these steps involves a set of SEAMCALLs.

> The TDX module can also be shut down at any time during its lifetime.
> In case of any error during the initialization process, shut down the
> module.  It's pointless to leave the module in any intermediate state
> during the initialization.
> 
> Both logical CPU scope initialization and shutting down the TDX module
> require calling SEAMCALL on all boot-time present CPUs.  For simplicity
> just temporarily disable CPU hotplug during the module initialization.

You might want to more precisely define "boot-time present CPUs".  The
boot of *what*?

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 8d943bdc8335..28c187b8726f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -10,15 +10,34 @@
>  #include <linux/types.h>
>  #include <linux/init.h>
>  #include <linux/printk.h>
> +#include <linux/mutex.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/apic.h>
>  #include <asm/tdx.h>
>  #include "tdx.h"
>  
> +/* TDX module status during initialization */
> +enum tdx_module_status_t {
> +	/* TDX module hasn't been detected and initialized */
> +	TDX_MODULE_UNKNOWN,
> +	/* TDX module is not loaded */
> +	TDX_MODULE_NONE,
> +	/* TDX module is initialized */
> +	TDX_MODULE_INITIALIZED,
> +	/* TDX module is shut down due to initialization error */
> +	TDX_MODULE_SHUTDOWN,
> +};

Are these part of the ABI or just a purely OS-side construct?

>  static u32 tdx_keyid_start __ro_after_init;
>  static u32 tdx_keyid_num __ro_after_init;
>  
> +static enum tdx_module_status_t tdx_module_status;
> +/* Prevent concurrent attempts on TDX detection and initialization */
> +static DEFINE_MUTEX(tdx_module_lock);
> +
>  /*
>   * Detect TDX private KeyIDs to see whether TDX has been enabled by the
>   * BIOS.  Both initializing the TDX module and running TDX guest require
> @@ -104,3 +123,134 @@ bool platform_tdx_enabled(void)
>  {
>  	return !!tdx_keyid_num;
>  }
> +
> +/*
> + * Detect and initialize the TDX module.
> + *
> + * Return -ENODEV when the TDX module is not loaded, 0 when it
> + * is successfully initialized, or other error when it fails to
> + * initialize.
> + */
> +static int init_tdx_module(void)
> +{
> +	/* The TDX module hasn't been detected */
> +	return -ENODEV;
> +}
> +
> +static void shutdown_tdx_module(void)
> +{
> +	/* TODO: Shut down the TDX module */
> +}
> +
> +static int __tdx_enable(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Initializing the TDX module requires doing SEAMCALL on all
> +	 * boot-time present CPUs.  For simplicity temporarily disable
> +	 * CPU hotplug to prevent any CPU from going offline during
> +	 * the initialization.
> +	 */
> +	cpus_read_lock();
> +
> +	/*
> +	 * Check whether all boot-time present CPUs are online and
> +	 * return early with a message so the user can be aware.
> +	 *
> +	 * Note a non-buggy BIOS should never support physical (ACPI)
> +	 * CPU hotplug when TDX is enabled, and all boot-time present
> +	 * CPU should be enabled in MADT, so there should be no
> +	 * disabled_cpus and num_processors won't change at runtime
> +	 * either.
> +	 */

Again, there are a lot of words in that comment, but I'm not sure why
it's here.  Despite all the whinging about ACPI, doesn't it boil down to:

	The TDX module itself establishes its own concept of how many
	logical CPUs there are in the system when it is loaded.  The
	module will reject initialization attempts unless the kernel
	runs TDX initialization code on every last CPU.

	Ensure that the kernel is able to run code on all known logical
	CPUs.

and these checks are just to see if the kernel has shot itself in the
foot and is *KNOWS* that it is currently unable to run code on some
logical CPU?

> +	if (disabled_cpus || num_online_cpus() != num_processors) {
> +		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = init_tdx_module();
> +	if (ret == -ENODEV) {

Why check for -ENODEV exclusively?  Is there some other error nonzero
code that indicates success?

> +		pr_info("TDX module is not loaded.\n");
> +		tdx_module_status = TDX_MODULE_NONE;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process.  It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 *
> +	 * Shutting down the module also requires doing SEAMCALL on all
> +	 * MADT-enabled CPUs.  Do it while CPU hotplug is disabled.
> +	 *
> +	 * Return all errors during the initialization as -EFAULT as the
> +	 * module is always shut down.
> +	 */
> +	if (ret) {
> +		pr_info("Failed to initialize TDX module. Shut it down.\n");

"Shut it down" seems wrong here.  That could be interpreted as "I have
already shut it down".  "Shutting down" seems better.

> +		shutdown_tdx_module();
> +		tdx_module_status = TDX_MODULE_SHUTDOWN;
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	pr_info("TDX module initialized.\n");
> +	tdx_module_status = TDX_MODULE_INITIALIZED;
> +out:
> +	cpus_read_unlock();
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_enable - Enable TDX by initializing the TDX module
> + *
> + * Caller to make sure all CPUs are online and in VMX operation before
> + * calling this function.  CPU hotplug is temporarily disabled internally
> + * to prevent any cpu from going offline.

"cpu" or "CPU"?

> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * 0:		The TDX module has been successfully initialized.
> + * * -ENODEV:	The TDX module is not loaded, or TDX is not supported.
> + * * -EINVAL:	The TDX module cannot be initialized due to certain
> + *		conditions are not met (i.e. when not all MADT-enabled
> + *		CPUs are not online).
> + * * -EFAULT:	Other internal fatal errors, or the TDX module is in
> + *		shutdown mode due to it failed to initialize in previous
> + *		attempts.
> + */

I honestly don't think all these error codes mean anything.  They're
plumbed nowhere and the use of -EFAULT is just plain wrong.

Nobody can *DO* anything with these anyway.

Just give one error code and make sure that you have pr_info()'s around
to make it clear what went wrong.  Then just do -EINVAL universally.
Remove all the nonsense comments.

> +int tdx_enable(void)
> +{
> +	int ret;
> +
> +	if (!platform_tdx_enabled())
> +		return -ENODEV;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_UNKNOWN:
> +		ret = __tdx_enable();
> +		break;
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;

TDX_MODULE_NONE should probably be called TDX_MODULE_NOT_LOADED.  A
comment would also be nice:

	/* The BIOS did not load the module.  No way to fix that. */

> +	case TDX_MODULE_INITIALIZED:

		/* Already initialized, great, tell the caller: */

> +		ret = 0;
> +		break;
> +	default:
> +		WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
> +		ret = -EFAULT;
> +		break;
> +	}

I don't get what that default: is for or what it has to do with
TDX_MODULE_SHUTDOWN.


> +	mutex_unlock(&tdx_module_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_enable);
  
Thomas Gleixner Nov. 22, 2022, 8:03 p.m. UTC | #5
On Tue, Nov 22 2022 at 07:35, Dave Hansen wrote:

> On 11/22/22 02:31, Thomas Gleixner wrote:
>> Nothing in the TDX specs and docs mentions physical hotplug or a
>> requirement for invoking seamcall on the world.
>
> The TDX module source is actually out there[1] for us to look at.  It's
> in a lovely, convenient zip file, but you can read it if sufficiently
> motivated.

zip file? Version control from the last millennium?

The whole thing wants to be @github with a proper change history if
Intel wants anyone to trust this and take it serious.

/me refrains from ranting about the outrageous license choice.

> It has this lovely nugget in it:
>
> WARNING!!! Proprietary License!!  Avert your virgin eyes!!!

It's probably not the only reasons to avert the eyes.

>>     if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)
>>     {
>>         TDX_ERROR("Num of initialized lps %d is smaller than total num of lps %d\n",
>>                     tdx_global_data_ptr->num_of_init_lps, tdx_global_data_ptr->num_of_lps);
>>         retval = TDX_SYS_CONFIG_NOT_PENDING;
>>         goto EXIT;
>>     }
>
> tdx_global_data_ptr->num_of_init_lps is incremented at TDH.SYS.LP.INIT
> time.  That if() is called at TDH.SYS.CONFIG time to help bring the
> module up.
>
> So, I think you're right.  I don't see the docs that actually *explain*
> this "you must seamcall all the things" requirement.

The code actually enforces this.

At TDH.SYS.INIT which is the first operation it gets the total number
of LPs from the sysinfo table:

src/vmm_dispatcher/api_calls/tdh_sys_init.c:

    tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps;

Then TDH.SYS.LP.INIT increments the count of initialized LPs.

src/vmm_dispatcher/api_calls/tdh_sys_lp_init.c:

    increment_num_of_lps(tdx_global_data_ptr)
       _lock_xadd_32b(&tdx_global_data_ptr->num_of_init_lps, 1);

Finally TDH.SYS.CONFIG checks whether _ALL_ LPs have been initialized.

src/vmm_dispatcher/api_calls/tdh_sys_config.c:

    if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)

Clearly that's nowhere spelled out in the documentation, but I don't
buy the 'architecturally required' argument at all.  It's an
implementation detail of the TDX module.

Technically there is IMO ZERO requirement to do so.

 1) The TDX module is global

 2) Seam-root and Seam-non-root operation are strictly a LP property.

    The only architectural prerequisite for using Seam on a LP is that
    obviously the encryption/decryption mechanics have been initialized
    on the package to which the LP belongs.

I can see why it might be complicated to add/remove an LP after
initialization, but technically it should be possible.

TDX/Seam is not that special.

But what's absolutely annoying is that the documentation lacks any
information about the choice of enforcement which has been hardcoded
into the Seam module for whatever reasons.

Maybe I overlooked it, but then it's definitely well hidden.

Thanks,

        tglx
  
Sean Christopherson Nov. 22, 2022, 8:11 p.m. UTC | #6
On Tue, Nov 22, 2022, Thomas Gleixner wrote:
> On Tue, Nov 22 2022 at 07:35, Dave Hansen wrote:
> 
> > On 11/22/22 02:31, Thomas Gleixner wrote:
> >> Nothing in the TDX specs and docs mentions physical hotplug or a
> >> requirement for invoking seamcall on the world.
> >
> > The TDX module source is actually out there[1] for us to look at.  It's
> > in a lovely, convenient zip file, but you can read it if sufficiently
> > motivated.
> 
> zip file? Version control from the last millennium?
> 
> The whole thing wants to be @github with a proper change history if
> Intel wants anyone to trust this and take it serious.
> 
> /me refrains from ranting about the outrageous license choice.

Let me know if you grab pitchforks and torches, I'll join the mob :-)
  
Kai Huang Nov. 23, 2022, 12:30 a.m. UTC | #7
On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote:
> > >      if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)
> > >      {
> > >          TDX_ERROR("Num of initialized lps %d is smaller than total num of lps %d\n",
> > >                      tdx_global_data_ptr->num_of_init_lps, tdx_global_data_ptr->num_of_lps);
> > >          retval = TDX_SYS_CONFIG_NOT_PENDING;
> > >          goto EXIT;
> > >      }
> > 
> > tdx_global_data_ptr->num_of_init_lps is incremented at TDH.SYS.INIT
> > time.  That if() is called at TDH.SYS.CONFIG time to help bring the
> > module up.
> > 
> > So, I think you're right.  I don't see the docs that actually *explain*
> > this "you must seamcall all the things" requirement.
> 
> The code actually enforces this.
> 
> At TDH.SYS.INIT which is the first operation it gets the total number
> of LPs from the sysinfo table:
> 
> src/vmm_dispatcher/api_calls/tdh_sys_init.c:
> 
>     tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps;
> 
> Then TDH.SYS.LP.INIT increments the count of initialized LPs.
> 
> src/vmm_dispatcher/api_calls/tdh_sys_lp_init.c:
> 
>     increment_num_of_lps(tdx_global_data_ptr)
>        _lock_xadd_32b(&tdx_global_data_ptr->num_of_init_lps, 1);
> 
> Finally TDH.SYS.CONFIG checks whether _ALL_ LPs have been initialized.
> 
> src/vmm_dispatcher/api_calls/tdh_sys_config.c:
> 
>     if (tdx_global_data_ptr->num_of_init_lps < tdx_global_data_ptr->num_of_lps)
> 
> Clearly that's nowhere spelled out in the documentation, but I don't
> buy the 'architecturally required' argument at all. It's an
> implementation detail of the TDX module.

Hi Thomas,

Thanks for review!

I agree that at the hardware level there shouldn't be such a requirement (not
100% sure though), but I guess from the kernel's perspective, "the
implementation detail of the TDX module" is sort of an "architectural
requirement" -- at least the Intel arch guys think so, I guess.

> 
> Technically there is IMO ZERO requirement to do so.
> 
>  1) The TDX module is global
> 
>  2) Seam-root and Seam-non-root operation are strictly a LP property.
> 
>     The only architectural prerequisite for using Seam on a LP is that
>     obviously the encryption/decryption mechanics have been initialized
>     on the package to which the LP belongs.
> 
> I can see why it might be complicated to add/remove an LP after
> initialization, but technically it should be possible.

"kernel soft offline" actually isn't an issue.  We can bring down a logical cpu
after it gets initialized and then bring it up again.

Only add/removal of a physical cpu will cause problems:

TDX MCHECK verifies all boot-time present cpus to make sure they are
TDX-compatible before it enables TDX in hardware.  MCHECK cannot run on a
hot-added CPU, so TDX cannot support physical CPU hotplug.

We tried to get it clarified in the specification, and below is what the
TDX/module arch guys agreed to put into the TDX module spec (I just checked:
it's not in the latest public spec yet, but they said it will be in the next
release):

"
4.1.3.2.  CPU Configuration

During platform boot, MCHECK verifies all logical CPUs to ensure they meet TDX’s
security and certain functionality requirements, and MCHECK passes the following
CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module:

·         Total number of logical processors in the platform.
·         Total number of installed packages in the platform.
·         A table of per-package CPU family, model and stepping etc.
identification, as enumerated by CPUID(1).EAX.
The above information is static and does not change after platform boot and
MCHECK run.

Note:     TDX doesn’t support adding or removing CPUs from the TDX security
perimeter, as checked by MCHECK.  BIOS should prevent CPUs from being hot-added
or hot-removed after the platform boots.

The TDX module performs additional checks of the CPU’s configuration and
supported features, by reading MSRs and CPUID information as described in the
following sections.
"

> 
> TDX/Seam is not that special.
> 
> But what's absolutely annoying is that the documentation lacks any
> information about the choice of enforcement which has been hardcoded
> into the Seam module for whatever reasons.
> 
> Maybe I overlooked it, but then it's definitely well hidden.

It depends on the TDX module implementation, which the TDX arch guys consider
"architectural", I think.
  
Kai Huang Nov. 23, 2022, 1:12 a.m. UTC | #8
On Wed, 2022-11-23 at 00:30 +0000, Huang, Kai wrote:
> > 
> > Clearly that's nowhere spelled out in the documentation, but I don't
> > buy the 'architecturally required' argument at all. It's an
> > implementation detail of the TDX module.
> 
> Hi Thomas,
> 
> Thanks for review!
> 
> I agree that at the hardware level there shouldn't be such a requirement (not
> 100% sure though), but I guess from the kernel's perspective, "the
> implementation detail of the TDX module" is sort of an "architectural
> requirement" -- at least the Intel arch guys think so, I guess.

Let me double check with the TDX module folks and figure out the root of the
requirement.

Thanks.
  
Kai Huang Nov. 23, 2022, 10:18 a.m. UTC | #9
On Tue, 2022-11-22 at 10:05 -0800, Dave Hansen wrote:
> On 11/20/22 16:26, Kai Huang wrote:
> > 2) It is more flexible to support TDX module runtime updating in the
> > future (after updating the TDX module, it needs to be initialized
> > again).
> 
> I hate this generic blabber about "more flexible".  There's a *REASON*
> it's more flexible, so let's talk about the reasons, please.
> 
> It's really something like this, right?
> 
> 	The TDX module design allows it to be updated while the system
> 	is running.  The update procedure shares quite a few steps with
> 	this "on demand" loading mechanism.  The hope is that much of
> 	this "on demand" mechanism can be shared with a future "update"
> 	mechanism.  A boot-time TDX module implementation would not be
> 	able to share much code with the update mechanism.

Yes.  Thanks.

> 
> 
> > 3) It avoids having to do a "temporary" solution to handle VMXON in the
> > core (non-KVM) kernel for now.  This is because SEAMCALL requires CPU
> > being in VMX operation (VMXON is done), but currently only KVM handles
> > VMXON.  Adding VMXON support to the core kernel isn't trivial.  More
> > importantly, from long-term a reference-based approach is likely needed
> > in the core kernel as more kernel components are likely needed to
> > support TDX as well.  Allow KVM to initialize the TDX module avoids
> > having to handle VMXON during kernel boot for now.
> 
> There are a lot of words in there.
> 
> 3) Loading the TDX module requires VMX to be enabled.  Currently, only
>    the kernel KVM code mucks with VMX enabling.  If the TDX module were
>    to be initialized separately from KVM (like at boot), the boot code
>    would need to be taught how to muck with VMX enabling and KVM would
>    need to be taught how to cope with that.  Making KVM itself
>    responsible for TDX initialization lets the rest of the kernel stay
>    blissfully unaware of VMX.

Thanks.

> 
> > Add a placeholder tdx_enable() to detect and initialize the TDX module
> > on demand, with a state machine protected by mutex to support concurrent
> > calls from multiple callers.
> 
> As opposed to concurrent calls from one caller? ;)

How about below?

"
Add a placeholder tdx_enable() to initialize the TDX module on demand.  So far
KVM will be the only caller, but other kernel components will need to use it too
in the future.  Just use a mutex protected state machine to make sure the module
initialization can only be done once.
"

> 
> > The TDX module will be initialized in multi-steps defined by the TDX
> > module:
> > 
> >   1) Global initialization;
> >   2) Logical-CPU scope initialization;
> >   3) Enumerate the TDX module capabilities and platform configuration;
> >   4) Configure the TDX module about TDX usable memory ranges and global
> >      KeyID information;
> >   5) Package-scope configuration for the global KeyID;
> >   6) Initialize usable memory ranges based on 4).
> 
> This would actually be a nice place to call out the SEAMCALL names and
> mention that each of these steps involves a set of SEAMCALLs.

How about below?

"
The TDX module will be initialized in multi-steps defined by the TDX module and
each of those steps involves a specific SEAMCALL:
  1) Global initialization using TDH.SYS.INIT.
  2) Logical-CPU scope initialization using TDH.SYS.LP.INIT.
  3) Enumerate the TDX module capabilities and TDX-capable memory information 
     using TDH.SYS.INFO.
  4) Configure the TDX module with TDX-usable memory regions and the global
     KeyID information using TDH.SYS.CONFIG.
  5) Package-scope configuration for the global KeyID using TDH.SYS.KEY.CONFIG.
  6) Initialize TDX-usable memory regions using TDH.SYS.TDMR.INIT.

Before step 4), the kernel needs to build a set of TDX-usable memory regions,
and construct data structures to cover those regions.
"

> 
> > The TDX module can also be shut down at any time during its lifetime.
> > In case of any error during the initialization process, shut down the
> > module.  It's pointless to leave the module in any intermediate state
> > during the initialization.
> > 
> > Both logical CPU scope initialization and shutting down the TDX module
> > require calling SEAMCALL on all boot-time present CPUs.  For simplicity
> > just temporarily disable CPU hotplug during the module initialization.
> 
> You might want to more precisely define "boot-time present CPUs".  The
> boot of *what*?

How about using "BIOS-enabled CPUs" instead?  If that's OK I'll use it
consistently across this series.

> 
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 8d943bdc8335..28c187b8726f 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -10,15 +10,34 @@
> >  #include <linux/types.h>
> >  #include <linux/init.h>
> >  #include <linux/printk.h>
> > +#include <linux/mutex.h>
> > +#include <linux/cpu.h>
> > +#include <linux/cpumask.h>
> >  #include <asm/msr-index.h>
> >  #include <asm/msr.h>
> >  #include <asm/apic.h>
> >  #include <asm/tdx.h>
> >  #include "tdx.h"
> >  
> > +/* TDX module status during initialization */
> > +enum tdx_module_status_t {
> > +	/* TDX module hasn't been detected and initialized */
> > +	TDX_MODULE_UNKNOWN,
> > +	/* TDX module is not loaded */
> > +	TDX_MODULE_NONE,
> > +	/* TDX module is initialized */
> > +	TDX_MODULE_INITIALIZED,
> > +	/* TDX module is shut down due to initialization error */
> > +	TDX_MODULE_SHUTDOWN,
> > +};
> 
> Are these part of the ABI or just a purely OS-side construct?

Purely OS-side construct.  I'll explicitly call that out in the comment.

> 
> >  static u32 tdx_keyid_start __ro_after_init;
> >  static u32 tdx_keyid_num __ro_after_init;
> >  
> > +static enum tdx_module_status_t tdx_module_status;
> > +/* Prevent concurrent attempts on TDX detection and initialization */
> > +static DEFINE_MUTEX(tdx_module_lock);
> > +
> >  /*
> >   * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> >   * BIOS.  Both initializing the TDX module and running TDX guest require
> > @@ -104,3 +123,134 @@ bool platform_tdx_enabled(void)
> >  {
> >  	return !!tdx_keyid_num;
> >  }
> > +
> > +/*
> > + * Detect and initialize the TDX module.
> > + *
> > + * Return -ENODEV when the TDX module is not loaded, 0 when it
> > + * is successfully initialized, or other error when it fails to
> > + * initialize.
> > + */
> > +static int init_tdx_module(void)
> > +{
> > +	/* The TDX module hasn't been detected */
> > +	return -ENODEV;
> > +}
> > +
> > +static void shutdown_tdx_module(void)
> > +{
> > +	/* TODO: Shut down the TDX module */
> > +}
> > +
> > +static int __tdx_enable(void)
> > +{
> > +	int ret;
> > +
> > +	/*
> > +	 * Initializing the TDX module requires doing SEAMCALL on all
> > +	 * boot-time present CPUs.  For simplicity temporarily disable
> > +	 * CPU hotplug to prevent any CPU from going offline during
> > +	 * the initialization.
> > +	 */
> > +	cpus_read_lock();
> > +
> > +	/*
> > +	 * Check whether all boot-time present CPUs are online and
> > +	 * return early with a message so the user can be aware.
> > +	 *
> > +	 * Note a non-buggy BIOS should never support physical (ACPI)
> > +	 * CPU hotplug when TDX is enabled, and all boot-time present
> > +	 * CPU should be enabled in MADT, so there should be no
> > +	 * disabled_cpus and num_processors won't change at runtime
> > +	 * either.
> > +	 */
> 
> Again, there are a lot of words in that comment, but I'm not sure why
> it's here.  Despite all the whinging about ACPI, doesn't it boil down to:
> 
> 	The TDX module itself establishes its own concept of how many
> 	logical CPUs there are in the system when it is loaded.  
> 

This isn't accurate.  TDX MCHECK records the total number of logical CPUs when
the BIOS enables TDX.  This happens before the TDX module is loaded.  In fact
the TDX module only gets this information from a secret location.

> The
> 	module will reject initialization attempts unless the kernel
> 	runs TDX initialization code on every last CPU.
> 
> 	Ensure that the kernel is able to run code on all known logical
> 	CPUs.

How about:

	TDX itself establishes its own concept of how many logical CPUs there
	are in the system when it gets enabled by the BIOS.  The module will
	reject initialization attempts unless the kernel runs TDX
	initialization code on every last CPU.

	Ensure that the kernel is able to run code on all known logical CPUs.

> 
> and these checks are just to see if the kernel has shot itself in the
> foot and is *KNOWS* that it is currently unable to run code on some
> logical CPU?

Yes.

> 
> > +	if (disabled_cpus || num_online_cpus() != num_processors) {
> > +		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	ret = init_tdx_module();
> > +	if (ret == -ENODEV) {
> 
> Why check for -ENODEV exclusively?  Is there some other error nonzero
> code that indicates success?

The idea is to print out "TDX module not loaded" to separate it from other
errors, so that the user can get a better idea when something goes wrong.

> 
> > +		pr_info("TDX module is not loaded.\n");
> > +		tdx_module_status = TDX_MODULE_NONE;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Shut down the TDX module in case of any error during the
> > +	 * initialization process.  It's meaningless to leave the TDX
> > +	 * module in any middle state of the initialization process.
> > +	 *
> > +	 * Shutting down the module also requires doing SEAMCALL on all
> > +	 * MADT-enabled CPUs.  Do it while CPU hotplug is disabled.
> > +	 *
> > +	 * Return all errors during the initialization as -EFAULT as the
> > +	 * module is always shut down.
> > +	 */
> > +	if (ret) {
> > +		pr_info("Failed to initialize TDX module. Shut it down.\n");
> 
> "Shut it down" seems wrong here.  That could be interpreted as "I have
> already shut it down".  "Shutting down" seems better.

Will change to "Shutting down" if we still want to keep the shutdown patch
(please see my other reply to Sean).

> 
> > +		shutdown_tdx_module();
> > +		tdx_module_status = TDX_MODULE_SHUTDOWN;
> > +		ret = -EFAULT;
> > +		goto out;
> > +	}
> > +
> > +	pr_info("TDX module initialized.\n");
> > +	tdx_module_status = TDX_MODULE_INITIALIZED;
> > +out:
> > +	cpus_read_unlock();
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * tdx_enable - Enable TDX by initializing the TDX module
> > + *
> > + * Caller to make sure all CPUs are online and in VMX operation before
> > + * calling this function.  CPU hotplug is temporarily disabled internally
> > + * to prevent any cpu from going offline.
> 
> "cpu" or "CPU"?
> 
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * 0:		The TDX module has been successfully initialized.
> > + * * -ENODEV:	The TDX module is not loaded, or TDX is not supported.
> > + * * -EINVAL:	The TDX module cannot be initialized because certain
> > + *		conditions are not met (i.e. when not all MADT-enabled
> > + *		CPUs are online).
> > + * * -EFAULT:	Other internal fatal errors, or the TDX module is in
> > + *		shutdown mode due to it failed to initialize in previous
> > + *		attempts.
> > + */
> 
> I honestly don't think all these error codes mean anything.  They're
> plumbed nowhere and the use of -EFAULT is just plain wrong.
> 
> Nobody can *DO* anything with these anyway.
> 
> Just give one error code and make sure that you have pr_info()'s around
> to make it clear what went wrong.  Then just do -EINVAL universally.
> Remove all the nonsense comments.

OK. 

> > +int tdx_enable(void)
> > +{
> > +	int ret;
> > +
> > +	if (!platform_tdx_enabled())
> > +		return -ENODEV;
> > +
> > +	mutex_lock(&tdx_module_lock);
> > +
> > +	switch (tdx_module_status) {
> > +	case TDX_MODULE_UNKNOWN:
> > +		ret = __tdx_enable();
> > +		break;
> > +	case TDX_MODULE_NONE:
> > +		ret = -ENODEV;
> > +		break;
> 
> TDX_MODULE_NONE should probably be called TDX_MODULE_NOT_LOADED.  A
> comment would also be nice:
> 
> 	/* The BIOS did not load the module.  No way to fix that. */
> 
> > +	case TDX_MODULE_INITIALIZED:
> 
> 		/* Already initialized, great, tell the caller: */

Will do.

> 
> > +		ret = 0;
> > +		break;
> > +	default:
> > +		WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
> > +		ret = -EFAULT;
> > +		break;
> > +	}
> 
> I don't get what that default: is for or what it has to do with
> TDX_MODULE_SHUTDOWN.

I meant there are only 4 possible states, and the default case must be the
TDX_MODULE_SHUTDOWN state.

I think I can just remove that WARN()?

> 
> 
> > +	mutex_unlock(&tdx_module_lock);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_enable);
>
  
Thomas Gleixner Nov. 23, 2022, 11:05 a.m. UTC | #10
Kai!

On Wed, Nov 23 2022 at 00:30, Kai Huang wrote:
> On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote:
>> Clearly that's nowhere spelled out in the documentation, but I don't
>> buy the 'architecturaly required' argument not at all. It's an
>> implementation detail of the TDX module.
>
> I agree on hardware level there shouldn't be such requirement (not 100% sure
> though), but I guess from kernel's perspective, "the implementation detail of
> the TDX module" is sort of "architectural requirement"

Sure, but then it needs to be clearly documented so.

> -- at least Intel arch guys think so I guess.

Intel "arch" guys? You might look up the meanings of "arch" in a
dictionary. LKML is not twatter.

>> Technically there is IMO ZERO requirement to do so.
>> 
>>  1) The TDX module is global
>> 
>>  2) Seam-root and Seam-non-root operation are strictly a LP property.
>> 
>>     The only architectural prerequisite for using Seam on a LP is that
>>     obviously the encryption/decryption mechanics have been initialized
>>     on the package to which the LP belongs.
>> 
>> I can see why it might be complicated to add/remove an LP after
>> initialization fact, but technically it should be possible.
>
> "kernel soft offline" actually isn't an issue.  We can bring down a logical cpu
> after it gets initialized and then bring it up again.

That's the whole point where this discussion started: _AFTER_ it gets
initialized.

Which means that, e.g. adding "nosmt" to the kernel command line will
make TDX fail hard because at the point where TDX is initialized the
hyperthreads are not 'soft' online and cannot respond to anything, but
the BIOS already accounted them.

This is just wrong as we all know that "nosmt" is sadly one of the
obvious counter measures for the never ending flood of speculation
issues.

> Only add/removal of physical cpu will cause problem: 

You wish. 

> TDX MCHECK verifies all boot-time present cpus to make sure they are TDX-
> compatible before it enables TDX in hardware.  MCHECK cannot run on hot-added
> CPU, so TDX cannot support physical CPU hotplug.

TDX can rightfully impose the limitation that it only executes on CPUs,
which are known at boot/initialization time, and only utilizes "known"
memory. That's it, but that does not enforce to prevent physical hotplug
in general.

> We tried to get it clarified in the specification, and below is what TDX/module
> arch guys agreed to put to the TDX module spec (just checked it's not in latest
> public spec yet, but they said it will be in next release):
>
> "
> 4.1.3.2.  CPU Configuration
>
> During platform boot, MCHECK verifies all logical CPUs to ensure they
> meet TDX’s

That MCHECK falls into the category of security voodoo.

It needs to verify _ALL_ logical CPUs to ensure that Intel did not put
different models and steppings into a package or what?

> security and certain functionality requirements, and MCHECK passes the following
> CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module:
>
> ·         Total number of logical processors in the platform.

You surely need MCHECK for this

> ·         Total number of installed packages in the platform.

and for this...

> ·         A table of per-package CPU family, model and stepping etc.
> identification, as enumerated by CPUID(1).EAX.
> The above information is static and does not change after platform boot and
> MCHECK run.
>
> Note:     TDX doesn’t support adding or removing CPUs from TDX security
> perimeter, as checked by MCHECK.

More security voodoo. The TDX security perimeter has nothing to do with
adding or removing CPUs on a system. If that'd be true then TDX is a
complete fail.

> BIOS should prevent CPUs from being hot-added or hot-removed after
> platform boots.

If the BIOS does not prevent it, then TDX and the Seam module will not
even notice unless the OS tries to invoke seamcall() on a newly plugged
CPU.

A newly added CPU and newly added memory should not have any impact on
the TDX integrity of the already running system. If they have then
again, TDX is broken by design.

> The TDX module performs additional checks of the CPU’s configuration and
> supported features, by reading MSRs and CPUID information as described in the
> following sections.

to ensure that the MCHECK generated table is still valid at the point
where TDX is initialized? 

That said, this documentation is at least better than the existing void,
but that does not make it technically correct.

Thanks,

        tglx
  
Kai Huang Nov. 23, 2022, 12:22 p.m. UTC | #11
On Wed, 2022-11-23 at 12:05 +0100, Thomas Gleixner wrote:
> Kai!
> 
> On Wed, Nov 23 2022 at 00:30, Kai Huang wrote:
> > On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote:
> > > Clearly that's nowhere spelled out in the documentation, but I don't
> > > buy the 'architecturaly required' argument not at all. It's an
> > > implementation detail of the TDX module.
> > 
> > I agree on hardware level there shouldn't be such requirement (not 100% sure
> > though), but I guess from kernel's perspective, "the implementation detail of
> > the TDX module" is sort of "architectural requirement"
> 
> Sure, but then it needs to be clearly documented so.
> 
> > -- at least Intel arch guys think so I guess.
> 
> Intel "arch" guys? You might look up the meanings of "arch" in a
> dictionary. LKML is not twatter.
> 
> > > Technically there is IMO ZERO requirement to do so.
> > > 
> > >  1) The TDX module is global
> > > 
> > >  2) Seam-root and Seam-non-root operation are strictly a LP property.
> > > 
> > >     The only architectural prerequisite for using Seam on a LP is that
> > >     obviously the encryption/decryption mechanics have been initialized
> > >     on the package to which the LP belongs.
> > > 
> > > I can see why it might be complicated to add/remove an LP after
> > > initialization fact, but technically it should be possible.
> > 
> > "kernel soft offline" actually isn't an issue.  We can bring down a logical cpu
> > after it gets initialized and then bring it up again.
> 
> That's the whole point where this discussion started: _AFTER_ it gets
> initialized.
> 
> Which means that, e.g. adding "nosmt" to the kernel command line will
> make TDX fail hard because at the point where TDX is initialized the
> hyperthreads are not 'soft' online and cannot respond to anything, but
> the BIOS already accounted them.
> 
> This is just wrong as we all know that "nosmt" is sadly one of the
> obvious counter measures for the never ending flood of speculation
> issues.

Agree.  As said in my other replies, I'll follow up with TDX module guys on
this.

> 
> > Only add/removal of physical cpu will cause problem: 
> 
> You wish. 
> 
> > TDX MCHECK verifies all boot-time present cpus to make sure they are TDX-
> > compatible before it enables TDX in hardware.  MCHECK cannot run on hot-added
> > CPU, so TDX cannot support physical CPU hotplug.
> 
> TDX can rightfully impose the limitation that it only executes on CPUs,
> which are known at boot/initialization time, and only utilizes "known"
> memory. That's it, but that does not enforce to prevent physical hotplug
> in general.

Adding physical CPUs should be no problem I guess; they just cannot run TDX.
The TDX architecture doesn't expect them to run TDX code anyway.

But would physical removal of a boot-time present CPU cause a problem?  TDX
MCHECK checks/records boot-time present CPUs.  If a CPU is removed and then a
new one is added, then TDX still treats it as TDX-compatible, but it may
actually not be.

So if this happens, it can break from a functionality point of view.  I think
TDX still wants to guarantee that TDX code works correctly on "TDX recorded"
CPUs.

Also, I am not sure whether there's any security issue if a malicious kernel
tries to run TDX code on such a removed-then-added CPU.

This seems to be a TDX architecture problem rather than a kernel policy issue.

> 
> > We tried to get it clarified in the specification, and below is what TDX/module
> > arch guys agreed to put to the TDX module spec (just checked it's not in latest
> > public spec yet, but they said it will be in next release):
> > 
> > "
> > 4.1.3.2.  CPU Configuration
> > 
> > During platform boot, MCHECK verifies all logical CPUs to ensure they
> > meet TDX’s
> 
> That MCHECK falls into the category of security voodoo.
> 
> It needs to verify _ALL_ logical CPUs to ensure that Intel did not put
> different models and steppings into a package or what?

I am guessing so.

> 
> > security and certain functionality requirements, and MCHECK passes the following
> > CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module:
> > 
> > ·         Total number of logical processors in the platform.
> 
> You surely need MCHECK for this
> 
> > ·         Total number of installed packages in the platform.
> 
> and for this...
> 
> > ·         A table of per-package CPU family, model and stepping etc.
> > identification, as enumerated by CPUID(1).EAX.
> > The above information is static and does not change after platform boot and
> > MCHECK run.
> > 
> > Note:     TDX doesn’t support adding or removing CPUs from TDX security
> > perimeter, as checked by MCHECK.
> 
> More security voodoo. The TDX security perimeter has nothing to do with
> adding or removing CPUs on a system. If that'd be true then TDX is a
> complete fail.

No argument here.

> > BIOS should prevent CPUs from being hot-added or hot-removed after
> > platform boots.
> 
> If the BIOS does not prevent it, then TDX and the Seam module will not
> even notice unless the OS tries to invoke seamcall() on a newly plugged
> CPU.
> 
> A newly added CPU and newly added memory should not have any impact on
> the TDX integrity of the already running system. If they have then
> again, TDX is broken by design.

No argument here either.

> 
> > The TDX module performs additional checks of the CPU’s configuration and
> > supported features, by reading MSRs and CPUID information as described in the
> > following sections.
> 
> to ensure that the MCHECK generated table is still valid at the point
> where TDX is initialized? 

I think it is trying to say:

MCHECK doesn't do all the verifications.  Some verifications are deferred to
the TDX module, which checks them when it gets initialized.

> 
> That said, this documentation is at least better than the existing void,
> but that does not make it technically correct.
> 
> Thanks,
> 
>         tglx
  
Dave Hansen Nov. 23, 2022, 4:58 p.m. UTC | #12
On 11/23/22 02:18, Huang, Kai wrote:
>> Again, there are a lot of words in that comment, but I'm not sure why
>> it's here.  Despite all the whinging about ACPI, doesn't it boil down to:
>>
>>       The TDX module itself establishes its own concept of how many
>>       logical CPUs there are in the system when it is loaded.
>>
> This isn't accurate.  TDX MCHECK records the total number of logical CPUs when
> the BIOS enables TDX.  This happens before the TDX module is loaded.  In fact
> the TDX module only gets this information from a secret location.

Kai, this is the point where I lose patience with the conversation
around this series.  I'll paste you the line of code where the TDX
module literally "establishes its own concept of how many logical CPUs
there are in the system":

>     //NUM_LPS
>     tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps;

Yes, this is derived directly from MCHECK.  But, this concept is
separate from MCHECK.  We have seen zero actual facts from you or other
folks at Intel that this is anything other than an arbitrary choice made
for the convenience of the TDX module.  It _might_ not be this way.  I'm
open to hearing those facts and changing my position on this.

But, please bring facts, not random references to "secret locations"
that aren't that secret.
  
Kai Huang Nov. 23, 2022, 9:58 p.m. UTC | #13
On Wed, 2022-11-23 at 08:58 -0800, Dave Hansen wrote:
> On 11/23/22 02:18, Huang, Kai wrote:
> > > Again, there are a lot of words in that comment, but I'm not sure why
> > > it's here.  Despite all the whinging about ACPI, doesn't it boil down to:
> > > 
> > >       The TDX module itself establishes its own concept of how many
> > >       logical CPUs there are in the system when it is loaded.
> > > 
> > This isn't accurate.  TDX MCHECK records the total number of logical CPUs when
> > the BIOS enables TDX.  This happens before the TDX module is loaded.  In fact
> > the TDX module only gets this information from a secret location.
> 
> Kai, this is the point where I lose patience with the conversation
> around this series.  I'll paste you the line of code where the TDX
> module literally "establishes its own concept of how many logical CPUs
> there are in the system":
> 
> >     //NUM_LPS
> >     tdx_global_data_ptr->num_of_lps = sysinfo_table_ptr->mcheck_fields.tot_num_lps;
> 
> Yes, this is derived directly from MCHECK.  But, this concept is
> separate from MCHECK.  We have seen zero actual facts from you or other
> folks at Intel that this is anything other than an arbitrary choice made
> for the convenience of the TDX module.  It _might_ not be this way.  I'm
> open to hearing those facts and changing my position on this.
> 
> But, please bring facts, not random references to "secret locations"
> that aren't that secret.

Agreed.  Thanks for providing the comment and will use it.
  

Patch

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 51c4222a13ae..05fc89d9742a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -101,8 +101,10 @@  static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
+int tdx_enable(void);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
+static inline int tdx_enable(void)  { return -ENODEV; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 8d943bdc8335..28c187b8726f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -10,15 +10,34 @@ 
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/printk.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
 #include <asm/tdx.h>
 #include "tdx.h"
 
+/* TDX module status during initialization */
+enum tdx_module_status_t {
+	/* TDX module hasn't been detected and initialized */
+	TDX_MODULE_UNKNOWN,
+	/* TDX module is not loaded */
+	TDX_MODULE_NONE,
+	/* TDX module is initialized */
+	TDX_MODULE_INITIALIZED,
+	/* TDX module is shut down due to initialization error */
+	TDX_MODULE_SHUTDOWN,
+};
+
 static u32 tdx_keyid_start __ro_after_init;
 static u32 tdx_keyid_num __ro_after_init;
 
+static enum tdx_module_status_t tdx_module_status;
+/* Prevent concurrent attempts on TDX detection and initialization */
+static DEFINE_MUTEX(tdx_module_lock);
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS.  Both initializing the TDX module and running TDX guest require
@@ -104,3 +123,134 @@  bool platform_tdx_enabled(void)
 {
 	return !!tdx_keyid_num;
 }
+
+/*
+ * Detect and initialize the TDX module.
+ *
+ * Return -ENODEV when the TDX module is not loaded, 0 when it
+ * is successfully initialized, or other error when it fails to
+ * initialize.
+ */
+static int init_tdx_module(void)
+{
+	/* The TDX module hasn't been detected */
+	return -ENODEV;
+}
+
+static void shutdown_tdx_module(void)
+{
+	/* TODO: Shut down the TDX module */
+}
+
+static int __tdx_enable(void)
+{
+	int ret;
+
+	/*
+	 * Initializing the TDX module requires doing SEAMCALL on all
+	 * boot-time present CPUs.  For simplicity temporarily disable
+	 * CPU hotplug to prevent any CPU from going offline during
+	 * the initialization.
+	 */
+	cpus_read_lock();
+
+	/*
+	 * Check whether all boot-time present CPUs are online and
+	 * return early with a message so the user can be aware.
+	 *
+	 * Note a non-buggy BIOS should never support physical (ACPI)
+	 * CPU hotplug when TDX is enabled, and all boot-time present
+	 * CPU should be enabled in MADT, so there should be no
+	 * disabled_cpus and num_processors won't change at runtime
+	 * either.
+	 */
+	if (disabled_cpus || num_online_cpus() != num_processors) {
+		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = init_tdx_module();
+	if (ret == -ENODEV) {
+		pr_info("TDX module is not loaded.\n");
+		tdx_module_status = TDX_MODULE_NONE;
+		goto out;
+	}
+
+	/*
+	 * Shut down the TDX module in case of any error during the
+	 * initialization process.  It's meaningless to leave the TDX
+	 * module in any middle state of the initialization process.
+	 *
+	 * Shutting down the module also requires doing SEAMCALL on all
+	 * MADT-enabled CPUs.  Do it while CPU hotplug is disabled.
+	 *
+	 * Return all errors during the initialization as -EFAULT as the
+	 * module is always shut down.
+	 */
+	if (ret) {
+		pr_info("Failed to initialize TDX module. Shut it down.\n");
+		shutdown_tdx_module();
+		tdx_module_status = TDX_MODULE_SHUTDOWN;
+		ret = -EFAULT;
+		goto out;
+	}
+
+	pr_info("TDX module initialized.\n");
+	tdx_module_status = TDX_MODULE_INITIALIZED;
+out:
+	cpus_read_unlock();
+
+	return ret;
+}
+
+/**
+ * tdx_enable - Enable TDX by initializing the TDX module
+ *
+ * Caller to make sure all CPUs are online and in VMX operation before
+ * calling this function.  CPU hotplug is temporarily disabled internally
+ * to prevent any cpu from going offline.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * 0:		The TDX module has been successfully initialized.
+ * * -ENODEV:	The TDX module is not loaded, or TDX is not supported.
+ * * -EINVAL:	The TDX module cannot be initialized because certain
+ *		conditions are not met (i.e. when not all MADT-enabled
+ *		CPUs are online).
+ * * -EFAULT:	Other internal fatal errors, or the TDX module is in
+ *		shutdown mode due to it failed to initialize in previous
+ *		attempts.
+ */
+int tdx_enable(void)
+{
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -ENODEV;
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_enable();
+		break;
+	case TDX_MODULE_NONE:
+		ret = -ENODEV;
+		break;
+	case TDX_MODULE_INITIALIZED:
+		ret = 0;
+		break;
+	default:
+		WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
+		ret = -EFAULT;
+		break;
+	}
+
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_enable);