[v7,17/20] x86/virt/tdx: Configure global KeyID on all packages

Message ID 8d8285cc5efa6302cf42a3fe2c9153d1a9dbcdac.1668988357.git.kai.huang@intel.com
State New
Headers
Series TDX host kernel support |

Commit Message

Kai Huang Nov. 21, 2022, 12:26 a.m. UTC
  After the array of TDMRs and the global KeyID are configured to the TDX
module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
on all packages.

TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
it cannot run concurrently on different CPUs.  Implement a helper to
run SEAMCALL on one cpu for each package one by one, and use it to
configure the global KeyID on all packages.

Intel hardware doesn't guarantee cache coherency across different
KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
with KeyID 0) before the TDX module uses the global KeyID to access the
PAMT.  Following the TDX module specification, flush cache before
configuring the global KeyID on all packages.

Given the PAMT size can be large (~1/256th of system RAM), just use
WBINVD on all CPUs to flush.

Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
to flush cache before freeing the PAMTs back to the kernel.  Note using
MOVDIR64B (which changes the page's associated KeyID from the old TDX
private KeyID back to KeyID 0, which is used by the kernel) to clear
PMATs isn't needed, as the KeyID 0 doesn't support integrity check.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Improved changelong and comment to explain why MOVDIR64B isn't used
   when returning PAMTs back to the kernel.

---
 arch/x86/virt/vmx/tdx/tdx.c | 89 ++++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 88 insertions(+), 2 deletions(-)
  

Comments

Dave Hansen Nov. 24, 2022, 12:28 a.m. UTC | #1
On 11/20/22 16:26, Kai Huang wrote:
> After the array of TDMRs and the global KeyID are configured to the TDX
> module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
> on all packages.
> 
> TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
> it cannot run concurrently on different CPUs.  Implement a helper to
> run SEAMCALL on one cpu for each package one by one, and use it to
> configure the global KeyID on all packages.

This has the same problems as SYS.LP.INIT.  It's basically snake oil in
some TDX configurations.

This really only needs to be done when the TDX module has memory
mappings on a socket for which it needs to use the "global KeyID".  If
there's no PAMT on a socket, there are probably no allocations there to
speak of and no *technical* reason to call TDH.SYS.KEY.CONFIG on that
socket.  At least none I can see.

So, let's check up on this requirement as well.  This could also turn
out to be a real pain if all the CPUs on a socket are offline.

> Intel hardware doesn't guarantee cache coherency across different
> KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
> with KeyID 0) before the TDX module uses the global KeyID to access the
> PAMT.  Following the TDX module specification, flush cache before
> configuring the global KeyID on all packages.

I think it's probably worth an aside here about why TDX security isn't
dependent on this step.  I *think* it boils down to the memory integrity
protections.  If the caches aren't flushed, a dirty KeyID-0 cacheline
could be written back to RAM.  The TDX module would come along later and
read the cacheline using KeyID-whatever, get an integrity mismatch,
machine check, and then everyone would be sad.

Everyone is sad, but TDX security remains intact because memory
integrity saved us.

Is it memory integrity or the TD bit, actually?

> Given the PAMT size can be large (~1/256th of system RAM), just use
> WBINVD on all CPUs to flush.

<sigh>

> Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
> used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
> to flush cache before freeing the PAMTs back to the kernel.  Note using
> MOVDIR64B (which changes the page's associated KeyID from the old TDX
> private KeyID back to KeyID 0, which is used by the kernel) to clear
> PMATs isn't needed, as the KeyID 0 doesn't support integrity check.

I hope this is covered in the code well.

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 3a032930e58a..99d1be5941a7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -224,6 +224,46 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
>  	on_each_cpu(seamcall_smp_call_function, sc, true);
>  }
>  
> +/*
> + * Call one SEAMCALL on one (any) cpu for each physical package in
> + * serialized way.  Return immediately in case of any error if
> + * SEAMCALL fails on any cpu.
> + *
> + * Note for serialized calls 'struct seamcall_ctx::err' doesn't have
> + * to be atomic, but for simplicity just reuse it instead of adding
> + * a new one.
> + */
> +static int seamcall_on_each_package_serialized(struct seamcall_ctx *sc)
> +{
> +	cpumask_var_t packages;
> +	int cpu, ret = 0;
> +
> +	if (!zalloc_cpumask_var(&packages, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	for_each_online_cpu(cpu) {
> +		if (cpumask_test_and_set_cpu(topology_physical_package_id(cpu),
> +					packages))
> +			continue;
> +
> +		ret = smp_call_function_single(cpu, seamcall_smp_call_function,
> +				sc, true);
> +		if (ret)
> +			break;
> +
> +		/*
> +		 * Doesn't have to use atomic_read(), but it doesn't
> +		 * hurt either.
> +		 */

I don't think you need to cover this twice.  Just do it in one comment.

> +		ret = atomic_read(&sc->err);
> +		if (ret)
> +			break;
> +	}
> +
> +	free_cpumask_var(packages);
> +	return ret;
> +}
> +
>  static int tdx_module_init_cpus(void)
>  {
>  	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
> @@ -1010,6 +1050,22 @@ static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num,
>  	return ret;
>  }
>  
> +static int config_global_keyid(void)
> +{
> +	struct seamcall_ctx sc = { .fn = TDH_SYS_KEY_CONFIG };
> +
> +	/*
> +	 * Configure the key of the global KeyID on all packages by
> +	 * calling TDH.SYS.KEY.CONFIG on all packages in a serialized
> +	 * way as it cannot run concurrently on different CPUs.
> +	 *
> +	 * TDH.SYS.KEY.CONFIG may fail with entropy error (which is
> +	 * a recoverable error).  Assume this is exceedingly rare and
> +	 * just return error if encountered instead of retrying.
> +	 */
> +	return seamcall_on_each_package_serialized(&sc);
> +}
> +
>  /*
>   * Detect and initialize the TDX module.
>   *
> @@ -1098,15 +1154,44 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out_free_pamts;
>  
> +	/*
> +	 * Hardware doesn't guarantee cache coherency across different
> +	 * KeyIDs.  The kernel needs to flush PAMT's dirty cachelines
> +	 * (associated with KeyID 0) before the TDX module can use the
> +	 * global KeyID to access the PAMT.  Given PAMTs are potentially
> +	 * large (~1/256th of system RAM), just use WBINVD on all cpus
> +	 * to flush the cache.
> +	 *
> +	 * Follow the TDX spec to flush cache before configuring the
> +	 * global KeyID on all packages.
> +	 */
> +	wbinvd_on_all_cpus();
> +
> +	/* Config the key of global KeyID on all packages */
> +	ret = config_global_keyid();
> +	if (ret)
> +		goto out_free_pamts;
> +
>  	/*
>  	 * Return -EINVAL until all steps of TDX module initialization
>  	 * process are done.
>  	 */
>  	ret = -EINVAL;
>  out_free_pamts:
> -	if (ret)
> +	if (ret) {
> +		/*
> +		 * Part of PAMT may already have been initialized by

						s/initialized/written/

> +		 * TDX module.  Flush cache before returning PAMT back
> +		 * to the kernel.
> +		 *
> +		 * Note there's no need to do MOVDIR64B (which changes
> +		 * the page's associated KeyID from the old TDX private
> +		 * KeyID back to KeyID 0, which is used by the kernel),
> +		 * as KeyID 0 doesn't support integrity check.
> +		 */

MOVDIR64B is the tiniest of implementation details and also not the only
way to initialize memory integrity metadata.

Just keep this high level:

		* No need to worry about memory integrity checks here.
		* KeyID 0 has integrity checking disabled.

> +		wbinvd_on_all_cpus();
>  		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
> -	else
> +	} else
>  		pr_info("%lu pages allocated for PAMT.\n",
>  				tdmrs_count_pamt_pages(tdmr_array, tdmr_num));
>  out_free_tdmrs:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index c26bab2555ca..768d097412ab 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -15,6 +15,7 @@
>  /*
>   * TDX module SEAMCALL leaf functions
>   */
> +#define TDH_SYS_KEY_CONFIG	31
>  #define TDH_SYS_INFO		32
>  #define TDH_SYS_INIT		33
>  #define TDH_SYS_LP_INIT		35
  
Kai Huang Nov. 24, 2022, 10:28 p.m. UTC | #2
On Wed, 2022-11-23 at 16:28 -0800, Dave Hansen wrote:
> > On 11/20/22 16:26, Kai Huang wrote:
> > > > After the array of TDMRs and the global KeyID are configured to the TDX
> > > > module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
> > > > on all packages.
> > > > 
> > > > TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
> > > > it cannot run concurrently on different CPUs.  Implement a helper to
> > > > run SEAMCALL on one cpu for each package one by one, and use it to
> > > > configure the global KeyID on all packages.
> > 
> > This has the same problems as SYS.LP.INIT.  It's basically snake oil in
> > some TDX configurations.
> > 
> > This really only needs to be done when the TDX module has memory
> > mappings on a socket for which it needs to use the "global KeyID".  
> > 

I think so.

> > If
> > there's no PAMT on a socket, there are probably no allocations there to
> > speak of and no *technical* reason to call TDH.SYS.KEY.CONFIG on that
> > socket.  At least none I can see.

Right PAMT is the main user. Besides PAMT, each TDX guest has a "Trust Domain
Root" (TDR) page, and this TDR page is also encrypted using the global KeyID.

> > 
> > So, let's check up on this requirement as well.  This could also turn
> > out to be a real pain if all the CPUs on a socket are offline.

Yes I'll also check.

> > 
> > > > Intel hardware doesn't guarantee cache coherency across different
> > > > KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
> > > > with KeyID 0) before the TDX module uses the global KeyID to access the
> > > > PAMT.  Following the TDX module specification, flush cache before
> > > > configuring the global KeyID on all packages.
> > 
> > I think it's probably worth an aside here about why TDX security isn't
> > dependent on this step.  I *think* it boils down to the memory integrity
> > protections.  If the caches aren't flushed, a dirty KeyID-0 cacheline
> > could be written back to RAM.  The TDX module would come along later and
> > read the cacheline using KeyID-whatever, get an integrity mismatch,
> > machine check, and then everyone would be sad.
> > 
> > Everyone is sad, but TDX security remains intact because memory
> > integrity saved us.
> > 
> > Is it memory integrity or the TD bit, actually?

For this case, I think memory integrity.

The TD bit is mainly used to prevent host kernel from being able to read the
integrity checksum (MAC) of TD memory, which could result in potential blute-
force attack on the MAC.

Specifically, there's such attack that: the host kernel can establish a shared
(non-TDX private) KeyID mapping, and repeatedly use different keys via that
mapping to speculatively read TDX guest memory. W/o the TD bit, the hardware
always performs the integrity check and thus there's possibility that the host
eventually can find a key which matches the integrity check.

To prevent such case, the TD bit is added. Hardware will check the TD bit match
first, and only perform integrity check _after_ TD bit match. So in above case,
host kernel speculatively read TDX memory via shared KeyID mapping will always
get a TD bit mismatch, thus won't be able to achieve above attack.

> > > > Given the PAMT size can be large (~1/256th of system RAM), just use
> > > > WBINVD on all CPUs to flush.
> > 
> > <sigh>
> > 
> > > > Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
> > > > used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
> > > > to flush cache before freeing the PAMTs back to the kernel.  Note using
> > > > MOVDIR64B (which changes the page's associated KeyID from the old TDX
> > > > private KeyID back to KeyID 0, which is used by the kernel) to clear
> > > > PMATs isn't needed, as the KeyID 0 doesn't support integrity check.
> > 
> > I hope this is covered in the code well.
> > 
> > > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > > index 3a032930e58a..99d1be5941a7 100644
> > > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > > @@ -224,6 +224,46 @@ static void seamcall_on_each_cpu(struct
> > > > seamcall_ctx *sc)
> > > >  	on_each_cpu(seamcall_smp_call_function, sc, true);
> > > >  }
> > > >  
> > > > +/*
> > > > + * Call one SEAMCALL on one (any) cpu for each physical package in
> > > > + * serialized way.  Return immediately in case of any error if
> > > > + * SEAMCALL fails on any cpu.
> > > > + *
> > > > + * Note for serialized calls 'struct seamcall_ctx::err' doesn't have
> > > > + * to be atomic, but for simplicity just reuse it instead of adding
> > > > + * a new one.
> > > > + */
> > > > +static int seamcall_on_each_package_serialized(struct seamcall_ctx *sc)
> > > > +{
> > > > +	cpumask_var_t packages;
> > > > +	int cpu, ret = 0;
> > > > +
> > > > +	if (!zalloc_cpumask_var(&packages, GFP_KERNEL))
> > > > +		return -ENOMEM;
> > > > +
> > > > +	for_each_online_cpu(cpu) {
> > > > +		if
> > > > (cpumask_test_and_set_cpu(topology_physical_package_id(cpu),
> > > > +					packages))
> > > > +			continue;
> > > > +
> > > > +		ret = smp_call_function_single(cpu,
> > > > seamcall_smp_call_function,
> > > > +				sc, true);
> > > > +		if (ret)
> > > > +			break;
> > > > +
> > > > +		/*
> > > > +		 * Doesn't have to use atomic_read(), but it doesn't
> > > > +		 * hurt either.
> > > > +		 */
> > 
> > I don't think you need to cover this twice.  Just do it in one comment.

Right. I'll remove. I'll be more careful next time to avoid such pattern. 

> > 
> > > > +		ret = atomic_read(&sc->err);
> > > > +		if (ret)
> > > > +			break;
> > > > +	}
> > > > +
> > > > +	free_cpumask_var(packages);
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  static int tdx_module_init_cpus(void)
> > > >  {
> > > >  	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
> > > > @@ -1010,6 +1050,22 @@ static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num,
> > > >  	return ret;
> > > >  }
> > > >  
> > > > +static int config_global_keyid(void)
> > > > +{
> > > > +	struct seamcall_ctx sc = { .fn = TDH_SYS_KEY_CONFIG };
> > > > +
> > > > +	/*
> > > > +	 * Configure the key of the global KeyID on all packages by
> > > > +	 * calling TDH.SYS.KEY.CONFIG on all packages in a serialized
> > > > +	 * way as it cannot run concurrently on different CPUs.
> > > > +	 *
> > > > +	 * TDH.SYS.KEY.CONFIG may fail with entropy error (which is
> > > > +	 * a recoverable error).  Assume this is exceedingly rare and
> > > > +	 * just return error if encountered instead of retrying.
> > > > +	 */
> > > > +	return seamcall_on_each_package_serialized(&sc);
> > > > +}
> > > > +
> > > >  /*
> > > >   * Detect and initialize the TDX module.
> > > >   *
> > > > @@ -1098,15 +1154,44 @@ static int init_tdx_module(void)
> > > >  	if (ret)
> > > >  		goto out_free_pamts;
> > > >  
> > > > +	/*
> > > > +	 * Hardware doesn't guarantee cache coherency across different
> > > > +	 * KeyIDs.  The kernel needs to flush PAMT's dirty cachelines
> > > > +	 * (associated with KeyID 0) before the TDX module can use the
> > > > +	 * global KeyID to access the PAMT.  Given PAMTs are potentially
> > > > +	 * large (~1/256th of system RAM), just use WBINVD on all cpus
> > > > +	 * to flush the cache.
> > > > +	 *
> > > > +	 * Follow the TDX spec to flush cache before configuring the
> > > > +	 * global KeyID on all packages.
> > > > +	 */
> > > > +	wbinvd_on_all_cpus();
> > > > +
> > > > +	/* Config the key of global KeyID on all packages */
> > > > +	ret = config_global_keyid();
> > > > +	if (ret)
> > > > +		goto out_free_pamts;
> > > > +
> > > >  	/*
> > > >  	 * Return -EINVAL until all steps of TDX module initialization
> > > >  	 * process are done.
> > > >  	 */
> > > >  	ret = -EINVAL;
> > > >  out_free_pamts:
> > > > -	if (ret)
> > > > +	if (ret) {
> > > > +		/*
> > > > +		 * Part of PAMT may already have been initialized by
> > 
> > 						s/initialized/written/

Will do.

> > 
> > > > +		 * TDX module.  Flush cache before returning PAMT back
> > > > +		 * to the kernel.
> > > > +		 *
> > > > +		 * Note there's no need to do MOVDIR64B (which changes
> > > > +		 * the page's associated KeyID from the old TDX private
> > > > +		 * KeyID back to KeyID 0, which is used by the kernel),
> > > > +		 * as KeyID 0 doesn't support integrity check.
> > > > +		 */
> > 
> > MOVDIR64B is the tiniest of implementation details and also not the only
> > way to initialize memory integrity metadata.
> > 
> > Just keep this high level:
> > 
> > 		* No need to worry about memory integrity checks here.
> > 		* KeyID 0 has integrity checking disabled.

Will do. Thanks.
  
Kai Huang Nov. 25, 2022, 12:08 a.m. UTC | #3
On Thu, 2022-11-24 at 22:28 +0000, Huang, Kai wrote:
> > > > > Intel hardware doesn't guarantee cache coherency across different
> > > > > KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
> > > > > with KeyID 0) before the TDX module uses the global KeyID to access
> > > > > the
> > > > > PAMT.  Following the TDX module specification, flush cache before
> > > > > configuring the global KeyID on all packages.
> > > 
> > > I think it's probably worth an aside here about why TDX security isn't
> > > dependent on this step.  I *think* it boils down to the memory integrity
> > > protections.  If the caches aren't flushed, a dirty KeyID-0 cacheline
> > > could be written back to RAM.  The TDX module would come along later and
> > > read the cacheline using KeyID-whatever, get an integrity mismatch,
> > > machine check, and then everyone would be sad.
> > > 
> > > Everyone is sad, but TDX security remains intact because memory
> > > integrity saved us.
> > > 
> > > Is it memory integrity or the TD bit, actually?
> 
> For this case, I think memory integrity.

Sorry thinking again, I was wrong.  It should be the TD bit, since TD bit check
happens before integrity check.

So the follow is:

1) Dirty KeyID-0 cacheline written back to RAM, which clears the TD bit.
2) TDX module reads PAMT using TDX private KeyId causes TD mismatch, which in
this case, results in poison.
3) Further consume of poisoned data results in #MC.

> 
> The TD bit is mainly used to prevent host kernel from being able to read the
> integrity checksum (MAC) of TD memory, which could result in potential blute-
> force attack on the MAC.
> 
> Specifically, there's such attack that: the host kernel can establish a shared
> (non-TDX private) KeyID mapping, and repeatedly use different keys via that
> mapping to speculatively read TDX guest memory. W/o the TD bit, the hardware
> always performs the integrity check and thus there's possibility that the host
> eventually can find a key which matches the integrity check.
> 
> To prevent such case, the TD bit is added. Hardware will check the TD bit
> match
> first, and only perform integrity check _after_ TD bit match. So in above
> case,
> host kernel speculatively read TDX memory via shared KeyID mapping will always
> get a TD bit mismatch, thus won't be able to achieve above attack.
  
Binbin Wu Nov. 30, 2022, 3:35 a.m. UTC | #4
On 11/21/2022 8:26 AM, Kai Huang wrote:
> After the array of TDMRs and the global KeyID are configured to the TDX
> module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
> on all packages.
>
> TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
> it cannot run concurrently on different CPUs.  Implement a helper to
> run SEAMCALL on one cpu for each package one by one, and use it to
> configure the global KeyID on all packages.
>
> Intel hardware doesn't guarantee cache coherency across different
> KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
> with KeyID 0) before the TDX module uses the global KeyID to access the
> PAMT.  Following the TDX module specification, flush cache before
> configuring the global KeyID on all packages.
>
> Given the PAMT size can be large (~1/256th of system RAM), just use
> WBINVD on all CPUs to flush.
>
> Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
> used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
> to flush cache before freeing the PAMTs back to the kernel.  Note using
> MOVDIR64B (which changes the page's associated KeyID from the old TDX
> private KeyID back to KeyID 0, which is used by the kernel)

It seems not accurate to say MOVDIR64B changes the page's associated KeyID.
It just uses the current KeyID for memory operations.


> to clear
> PMATs isn't needed, as the KeyID 0 doesn't support integrity check.

For integrity check, is KeyID 0 special or it just has the same behavior 
as non-zero shared KeyID (if any)?

By saying "KeyID 0 doesn't support integrity check", is it because of  
the implementation of this patch set or hardware behavior?

According to Architecure Specification 1.0 of TDX Module (344425-004US), 
shared KeyID could also enable integrity check.


>
> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>
> v6 -> v7:
>   - Improved changelong and comment to explain why MOVDIR64B isn't used
>     when returning PAMTs back to the kernel.
>
> ---
>   arch/x86/virt/vmx/tdx/tdx.c | 89 ++++++++++++++++++++++++++++++++++++-
>   arch/x86/virt/vmx/tdx/tdx.h |  1 +
>   2 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 3a032930e58a..99d1be5941a7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -224,6 +224,46 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
>   	on_each_cpu(seamcall_smp_call_function, sc, true);
>   }
>   
> +/*
> + * Call one SEAMCALL on one (any) cpu for each physical package in
> + * serialized way.  Return immediately in case of any error if
> + * SEAMCALL fails on any cpu.
> + *
> + * Note for serialized calls 'struct seamcall_ctx::err' doesn't have
> + * to be atomic, but for simplicity just reuse it instead of adding
> + * a new one.
> + */
> +static int seamcall_on_each_package_serialized(struct seamcall_ctx *sc)
> +{
> +	cpumask_var_t packages;
> +	int cpu, ret = 0;
> +
> +	if (!zalloc_cpumask_var(&packages, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	for_each_online_cpu(cpu) {
> +		if (cpumask_test_and_set_cpu(topology_physical_package_id(cpu),
> +					packages))
> +			continue;
> +
> +		ret = smp_call_function_single(cpu, seamcall_smp_call_function,
> +				sc, true);
> +		if (ret)
> +			break;
> +
> +		/*
> +		 * Doesn't have to use atomic_read(), but it doesn't
> +		 * hurt either.
> +		 */
> +		ret = atomic_read(&sc->err);
> +		if (ret)
> +			break;
> +	}
> +
> +	free_cpumask_var(packages);
> +	return ret;
> +}
> +
>   static int tdx_module_init_cpus(void)
>   {
>   	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
> @@ -1010,6 +1050,22 @@ static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num,
>   	return ret;
>   }
>   
> +static int config_global_keyid(void)
> +{
> +	struct seamcall_ctx sc = { .fn = TDH_SYS_KEY_CONFIG };
> +
> +	/*
> +	 * Configure the key of the global KeyID on all packages by
> +	 * calling TDH.SYS.KEY.CONFIG on all packages in a serialized
> +	 * way as it cannot run concurrently on different CPUs.
> +	 *
> +	 * TDH.SYS.KEY.CONFIG may fail with entropy error (which is
> +	 * a recoverable error).  Assume this is exceedingly rare and
> +	 * just return error if encountered instead of retrying.
> +	 */
> +	return seamcall_on_each_package_serialized(&sc);
> +}
> +
>   /*
>    * Detect and initialize the TDX module.
>    *
> @@ -1098,15 +1154,44 @@ static int init_tdx_module(void)
>   	if (ret)
>   		goto out_free_pamts;
>   
> +	/*
> +	 * Hardware doesn't guarantee cache coherency across different
> +	 * KeyIDs.  The kernel needs to flush PAMT's dirty cachelines
> +	 * (associated with KeyID 0) before the TDX module can use the
> +	 * global KeyID to access the PAMT.  Given PAMTs are potentially
> +	 * large (~1/256th of system RAM), just use WBINVD on all cpus
> +	 * to flush the cache.
> +	 *
> +	 * Follow the TDX spec to flush cache before configuring the
> +	 * global KeyID on all packages.
> +	 */
> +	wbinvd_on_all_cpus();
> +
> +	/* Config the key of global KeyID on all packages */
> +	ret = config_global_keyid();
> +	if (ret)
> +		goto out_free_pamts;
> +
>   	/*
>   	 * Return -EINVAL until all steps of TDX module initialization
>   	 * process are done.
>   	 */
>   	ret = -EINVAL;
>   out_free_pamts:
> -	if (ret)
> +	if (ret) {
> +		/*
> +		 * Part of PAMT may already have been initialized by
> +		 * TDX module.  Flush cache before returning PAMT back
> +		 * to the kernel.
> +		 *
> +		 * Note there's no need to do MOVDIR64B (which changes
> +		 * the page's associated KeyID from the old TDX private
> +		 * KeyID back to KeyID 0, which is used by the kernel),
> +		 * as KeyID 0 doesn't support integrity check.
> +		 */
> +		wbinvd_on_all_cpus();
>   		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
> -	else
> +	} else
>   		pr_info("%lu pages allocated for PAMT.\n",
>   				tdmrs_count_pamt_pages(tdmr_array, tdmr_num));
>   out_free_tdmrs:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index c26bab2555ca..768d097412ab 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -15,6 +15,7 @@
>   /*
>    * TDX module SEAMCALL leaf functions
>    */
> +#define TDH_SYS_KEY_CONFIG	31
>   #define TDH_SYS_INFO		32
>   #define TDH_SYS_INIT		33
>   #define TDH_SYS_LP_INIT		35
  
Kai Huang Nov. 30, 2022, 8:34 a.m. UTC | #5
On Wed, 2022-11-30 at 11:35 +0800, Binbin Wu wrote:
> On 11/21/2022 8:26 AM, Kai Huang wrote:
> > After the array of TDMRs and the global KeyID are configured to the TDX
> > module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
> > on all packages.
> > 
> > TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
> > it cannot run concurrently on different CPUs.  Implement a helper to
> > run SEAMCALL on one cpu for each package one by one, and use it to
> > configure the global KeyID on all packages.
> > 
> > Intel hardware doesn't guarantee cache coherency across different
> > KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
> > with KeyID 0) before the TDX module uses the global KeyID to access the
> > PAMT.  Following the TDX module specification, flush cache before
> > configuring the global KeyID on all packages.
> > 
> > Given the PAMT size can be large (~1/256th of system RAM), just use
> > WBINVD on all CPUs to flush.
> > 
> > Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
> > used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
> > to flush cache before freeing the PAMTs back to the kernel.  Note using
> > MOVDIR64B (which changes the page's associated KeyID from the old TDX
> > private KeyID back to KeyID 0, which is used by the kernel)
> 
> It seems not accurate to say MOVDIR64B changes the page's associated KeyID.
> It just uses the current KeyID for memory operations.

The "write" to the memory changes the page's associated KeyID to the KeyID that
does the "write".  A more accurate expression perhaps should be MOVDIR64B +
MFENSE, but I think it doesn't matter in changelog.

> 
> 
> > to clear
> > PMATs isn't needed, as the KeyID 0 doesn't support integrity check.
> 
> For integrity check, is KeyID 0 special or it just has the same behavior 
> as non-zero shared KeyID (if any)?

KeyID 0 is TME KeyID.  It is special. Hardware treats the KeyID 0 differently.

> 
> By saying "KeyID 0 doesn't support integrity check", is it because of  
> the implementation of this patch set or hardware behavior?

It's hardware behaviour.

> 
> According to Architecure Specification 1.0 of TDX Module (344425-004US), 
> shared KeyID could also enable integrity check.
> 

KeyID 0 is different from non-0 MKTME shared KeyID.  For example, you cannot do
PCONFIG on KeyID 0.
  
Kirill A. Shutemov Nov. 30, 2022, 2:04 p.m. UTC | #6
On Wed, Nov 30, 2022 at 08:34:46AM +0000, Huang, Kai wrote:
> On Wed, 2022-11-30 at 11:35 +0800, Binbin Wu wrote:
> > On 11/21/2022 8:26 AM, Kai Huang wrote:
> > > After the array of TDMRs and the global KeyID are configured to the TDX
> > > module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
> > > on all packages.
> > > 
> > > TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
> > > it cannot run concurrently on different CPUs.  Implement a helper to
> > > run SEAMCALL on one cpu for each package one by one, and use it to
> > > configure the global KeyID on all packages.
> > > 
> > > Intel hardware doesn't guarantee cache coherency across different
> > > KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
> > > with KeyID 0) before the TDX module uses the global KeyID to access the
> > > PAMT.  Following the TDX module specification, flush cache before
> > > configuring the global KeyID on all packages.
> > > 
> > > Given the PAMT size can be large (~1/256th of system RAM), just use
> > > WBINVD on all CPUs to flush.
> > > 
> > > Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
> > > used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
> > > to flush cache before freeing the PAMTs back to the kernel.  Note using
> > > MOVDIR64B (which changes the page's associated KeyID from the old TDX
> > > private KeyID back to KeyID 0, which is used by the kernel)
> > 
> > It seems not accurate to say MOVDIR64B changes the page's associated KeyID.
> > It just uses the current KeyID for memory operations.
> 
> The "write" to the memory changes the page's associated KeyID to the KeyID that
> does the "write".  A more accurate expression perhaps should be MOVDIR64B +
> MFENSE, but I think it doesn't matter in changelog.

MOVDIR64B KeyID for the cache line, not the page. Integrity tracked on
per-cacheline basis.
  
Dave Hansen Nov. 30, 2022, 3:13 p.m. UTC | #7
On 11/30/22 00:34, Huang, Kai wrote:
> On Wed, 2022-11-30 at 11:35 +0800, Binbin Wu wrote:
>> On 11/21/2022 8:26 AM, Kai Huang wrote:
>>> After the array of TDMRs and the global KeyID are configured to the TDX
>>> module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
>>> on all packages.
>>>
>>> TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.  And
>>> it cannot run concurrently on different CPUs.  Implement a helper to
>>> run SEAMCALL on one cpu for each package one by one, and use it to
>>> configure the global KeyID on all packages.
>>>
>>> Intel hardware doesn't guarantee cache coherency across different
>>> KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
>>> with KeyID 0) before the TDX module uses the global KeyID to access the
>>> PAMT.  Following the TDX module specification, flush cache before
>>> configuring the global KeyID on all packages.
>>>
>>> Given the PAMT size can be large (~1/256th of system RAM), just use
>>> WBINVD on all CPUs to flush.
>>>
>>> Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
>>> used the global KeyID to write any PAMT.  Therefore, need to use WBINVD
>>> to flush cache before freeing the PAMTs back to the kernel.  Note using
>>> MOVDIR64B (which changes the page's associated KeyID from the old TDX
>>> private KeyID back to KeyID 0, which is used by the kernel)
>>
>> It seems not accurate to say MOVDIR64B changes the page's associated KeyID.
>> It just uses the current KeyID for memory operations.
> 
> The "write" to the memory changes the page's associated KeyID to the KeyID that
> does the "write".  A more accurate expression perhaps should be MOVDIR64B +
> MFENSE, but I think it doesn't matter in changelog.

Just delete it from the changelog.  It's a distraction.  I'm not even
sure it's *necessary* to do any memory content conversion after the TDX
module has written gunk.

There won't be any integrity issues because integrity errors don't do
anything for KeyID-0 (no #MC).

I _think_ the reads of the page using KeyID-0 will see abort page
semantics.  That's *FINE*.
  
Dave Hansen Nov. 30, 2022, 5:37 p.m. UTC | #8
On 11/20/22 16:26, Kai Huang wrote:
> After the array of TDMRs and the global KeyID are configured to the TDX
> module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
> on all packages.

I want to circle back to this because it potentially has the same class
of issue that TDH.SYS.LP.INIT had.  So, here's some more background
followed by the key question: is TDH.SYS.KEY.CONFIG too strict?  Should
we explore relaxing it?

Here's the very long-winded way of asking the same thing:

This key is used to protect TDX module memory which is too large to fit
into the limited range-register-protected (SMRR) areas that most of the
module uses.  Right now, that metadata includes the Physical Address
Metadata Tables (PAMT) and "TD Root" (TDR) pages.  Using this "global
KeyID" provides stronger isolation and integrity protection for these
structures than is provided by KeyID-0.

The "global KeyID" only strictly needs to be programmed into a memory
controllers if a PAMT or TDR page is allocated in memory attached to
that controller.  However, the TDX module currently requires that
TDH.SYS.KEY.CONFIG be executed on one processor in each package.  This
is true even if there is no TDX Memory Region (TDMR) attached to that
package.

This was likely done for simplicity in the TDX module.  It currently has
no NUMA awareness (or even trusted NUMA metadata) and no ability to
correlate processor packages with the memory attached to their memory
controllers.

The TDH.SYS.KEY.CONFIG design is actually pretty similar to Kirill's
MKTME implementation[1].  Basically blast the KeyID configuration out to
one processor in each package, regardless of whether the KeyID will ever
get used on that package.

While this requirement from the TDX module is _slightly_ too strict, I'm
not quite as worried about it as I was about the *super* strict
TDH.SYS.LP.INIT requirements.  It's a lot harder and more rare to have
an entire package of CPUs unavailable versus a single logical CPU.
There is, for instance, no side-channel mitigation that disables an
entire package worth of CPUs.  I'm not even sure if we allow an entire
package worth of NOHZ_FULL-indisposed processors.

I'm happy to go run the same drill for TDH.SYS.KEY.CONFIG that we did
for TDH.SYS.LP.INIT.  Basically, can we relax the too-strict
restrictions?  But, I'm not sure anyone will ever reap a practical
benefit from it.  I'm tempted to just leave it as-is.

Does anyone feel differently?

1.
https://lore.kernel.org/lkml/20190508144422.13171-1-kirill.shutemov@linux.intel.com/T/#m936f260a345284687f8e929675f68f3d514725f5
  
Kai Huang Nov. 30, 2022, 8:17 p.m. UTC | #9
> On 11/30/22 00:34, Huang, Kai wrote:
> > On Wed, 2022-11-30 at 11:35 +0800, Binbin Wu wrote:
> >> On 11/21/2022 8:26 AM, Kai Huang wrote:
> >>> After the array of TDMRs and the global KeyID are configured to the
> >>> TDX module, use TDH.SYS.KEY.CONFIG to configure the key of the
> >>> global KeyID on all packages.
> >>>
> >>> TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package.
> >>> And it cannot run concurrently on different CPUs.  Implement a
> >>> helper to run SEAMCALL on one cpu for each package one by one, and
> >>> use it to configure the global KeyID on all packages.
> >>>
> >>> Intel hardware doesn't guarantee cache coherency across different
> >>> KeyIDs.  The kernel needs to flush PAMT's dirty cachelines
> >>> (associated with KeyID 0) before the TDX module uses the global
> >>> KeyID to access the PAMT.  Following the TDX module specification,
> >>> flush cache before configuring the global KeyID on all packages.
> >>>
> >>> Given the PAMT size can be large (~1/256th of system RAM), just use
> >>> WBINVD on all CPUs to flush.
> >>>
> >>> Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already
> >>> have used the global KeyID to write any PAMT.  Therefore, need to
> >>> use WBINVD to flush cache before freeing the PAMTs back to the
> >>> kernel.  Note using MOVDIR64B (which changes the page's associated
> >>> KeyID from the old TDX private KeyID back to KeyID 0, which is used
> >>> by the kernel)
> >>
> >> It seems not accurate to say MOVDIR64B changes the page's associated
> KeyID.
> >> It just uses the current KeyID for memory operations.
> >
> > The "write" to the memory changes the page's associated KeyID to the
> > KeyID that does the "write".  A more accurate expression perhaps
> > should be MOVDIR64B + MFENSE, but I think it doesn't matter in changelog.
> 
> Just delete it from the changelog.  It's a distraction.  I'm not even sure it's
> *necessary* to do any memory content conversion after the TDX module has
> written gunk.
> 
> There won't be any integrity issues because integrity errors don't do anything for
> KeyID-0 (no #MC).
> 
> I _think_ the reads of the page using KeyID-0 will see abort page semantics.
> That's *FINE*.

I'll remove this from the changelog.  Thanks!
  

Patch

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3a032930e58a..99d1be5941a7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -224,6 +224,46 @@  static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
 	on_each_cpu(seamcall_smp_call_function, sc, true);
 }
 
+/*
+ * Call one SEAMCALL on one (any) cpu for each physical package in
+ * serialized way.  Return immediately in case of any error if
+ * SEAMCALL fails on any cpu.
+ *
+ * Note for serialized calls 'struct seamcall_ctx::err' doesn't have
+ * to be atomic, but for simplicity just reuse it instead of adding
+ * a new one.
+ */
+static int seamcall_on_each_package_serialized(struct seamcall_ctx *sc)
+{
+	cpumask_var_t packages;
+	int cpu, ret = 0;
+
+	if (!zalloc_cpumask_var(&packages, GFP_KERNEL))
+		return -ENOMEM;
+
+	for_each_online_cpu(cpu) {
+		if (cpumask_test_and_set_cpu(topology_physical_package_id(cpu),
+					packages))
+			continue;
+
+		ret = smp_call_function_single(cpu, seamcall_smp_call_function,
+				sc, true);
+		if (ret)
+			break;
+
+		/*
+		 * Doesn't have to use atomic_read(), but it doesn't
+		 * hurt either.
+		 */
+		ret = atomic_read(&sc->err);
+		if (ret)
+			break;
+	}
+
+	free_cpumask_var(packages);
+	return ret;
+}
+
 static int tdx_module_init_cpus(void)
 {
 	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
@@ -1010,6 +1050,22 @@  static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num,
 	return ret;
 }
 
+static int config_global_keyid(void)
+{
+	struct seamcall_ctx sc = { .fn = TDH_SYS_KEY_CONFIG };
+
+	/*
+	 * Configure the key of the global KeyID on all packages by
+	 * calling TDH.SYS.KEY.CONFIG on all packages in a serialized
+	 * way as it cannot run concurrently on different CPUs.
+	 *
+	 * TDH.SYS.KEY.CONFIG may fail with entropy error (which is
+	 * a recoverable error).  Assume this is exceedingly rare and
+	 * just return error if encountered instead of retrying.
+	 */
+	return seamcall_on_each_package_serialized(&sc);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -1098,15 +1154,44 @@  static int init_tdx_module(void)
 	if (ret)
 		goto out_free_pamts;
 
+	/*
+	 * Hardware doesn't guarantee cache coherency across different
+	 * KeyIDs.  The kernel needs to flush PAMT's dirty cachelines
+	 * (associated with KeyID 0) before the TDX module can use the
+	 * global KeyID to access the PAMT.  Given PAMTs are potentially
+	 * large (~1/256th of system RAM), just use WBINVD on all cpus
+	 * to flush the cache.
+	 *
+	 * Follow the TDX spec to flush cache before configuring the
+	 * global KeyID on all packages.
+	 */
+	wbinvd_on_all_cpus();
+
+	/* Config the key of global KeyID on all packages */
+	ret = config_global_keyid();
+	if (ret)
+		goto out_free_pamts;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
 out_free_pamts:
-	if (ret)
+	if (ret) {
+		/*
+		 * Part of PAMT may already have been initialized by
+		 * TDX module.  Flush cache before returning PAMT back
+		 * to the kernel.
+		 *
+		 * Note there's no need to do MOVDIR64B (which changes
+		 * the page's associated KeyID from the old TDX private
+		 * KeyID back to KeyID 0, which is used by the kernel),
+		 * as KeyID 0 doesn't support integrity check.
+		 */
+		wbinvd_on_all_cpus();
 		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
-	else
+	} else
 		pr_info("%lu pages allocated for PAMT.\n",
 				tdmrs_count_pamt_pages(tdmr_array, tdmr_num));
 out_free_tdmrs:
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index c26bab2555ca..768d097412ab 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,6 +15,7 @@ 
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_KEY_CONFIG	31
 #define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35