[v7,09/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

Message ID cd23a9583edcfa85e11612d94ecfd2d5e862c1d5.1668988357.git.kai.huang@intel.com
State New
Series TDX host kernel support

Commit Message

Kai Huang Nov. 21, 2022, 12:26 a.m. UTC
  TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR).  During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.  The list of these
ranges, along with TDX module information, is available to the kernel by
querying the TDX module via TDH.SYS.INFO SEAMCALL.

The host kernel can choose whether or not to use all convertible memory
regions as TDX-usable memory.  Before the TDX module is ready to create
any TDX guests, the kernel needs to configure the TDX-usable memory
regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
module.  Constructing the TDMR array requires information of both the
TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions.  Call
TDH.SYS.INFO to get this information as a preparation.

Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
having to pass them as function arguments when constructing the TDMR
array.  And they are too big to be put to the stack anyway.  Also, KVM
needs to use the TDSYSINFO_STRUCT to create TDX guests.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Simplified the check of CMRs due to the fact that TDX actually
   verifies CMRs (that are passed by the BIOS) before enabling TDX.
 - Changed the function name from check_cmrs() -> trim_empty_cmrs().
 - Added CMR page aligned check so that later patch can just get the PFN
   using ">> PAGE_SHIFT".

v5 -> v6:
 - Added to also print TDX module's attribute (Isaku).
 - Removed all arguments in tdx_get_sysinfo() to use static variables
   of 'tdx_sysinfo' and 'tdx_cmr_array' directly as they are all used
   directly in other functions in later patches.
 - Added Isaku's Reviewed-by.

v3 -> v5 (no feedback on v4):
 - Renamed sanitize_cmrs() to check_cmrs().
 - Removed unnecessary sanity check against tdx_sysinfo and tdx_cmr_array
   actual size returned by TDH.SYS.INFO.
 - Changed -EFAULT to -EINVAL in a couple of places.
 - Added comments around tdx_sysinfo and tdx_cmr_array saying they are
   used by TDH.SYS.INFO ABI.
 - Changed to pass 'tdx_sysinfo' and 'tdx_cmr_array' as function
   arguments in tdx_get_sysinfo().
 - Changed to only print BIOS-CMR when check_cmrs() fails.

---
 arch/x86/virt/vmx/tdx/tdx.c | 125 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  61 ++++++++++++++++++
 2 files changed, 186 insertions(+)
  

Comments

Dave Hansen Nov. 22, 2022, 11:39 p.m. UTC | #1
On 11/20/22 16:26, Kai Huang wrote:
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums.  Not all memory
> satisfies these requirements.
> 
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR).  During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.  The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module via TDH.SYS.INFO SEAMCALL.

I think the last sentence goes too far.  What does it matter what the
name of the SEAMCALL is?  Who cares at this point?  It's in the patch.
Scroll down two pages if you really care.

> The host kernel can choose whether or not to use all convertible memory
> regions as TDX-usable memory.  Before the TDX module is ready to create
> any TDX guests, the kernel needs to configure the TDX-usable memory
> regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> module.  Constructing the TDMR array requires information of both the
> TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions.  Call
> TDH.SYS.INFO to get this information as a preparation.

That last sentence is kinda goofy.  I think there's a way to distill this
whole thing down more effectively.

	CMRs tell the kernel which memory is TDX compatible.  The kernel
	takes CMRs and constructs "TD Memory Regions" (TDMRs).  TDMRs
	let the kernel grant TDX protections to some or all of the CMR
	areas.
	
> Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid

I find it very useful to be precise when referring to code.  Your code
says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'.  Why the
difference?

> having to pass them as function arguments when constructing the TDMR
> array.  And they are too big to be put to the stack anyway.  Also, KVM
> needs to use the TDSYSINFO_STRUCT to create TDX guests.

This is also a great place to mention that the tdsysinfo_struct contains
a *lot* of gunk which will not be used for a bit or that may never get
used.

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2cf7090667aa..43227af25e44 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -15,6 +15,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/align.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/apic.h>
> @@ -40,6 +41,11 @@ static enum tdx_module_status_t tdx_module_status;
>  /* Prevent concurrent attempts on TDX detection and initialization */
>  static DEFINE_MUTEX(tdx_module_lock);
>  
> +/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
> +static struct tdsysinfo_struct tdx_sysinfo;
> +static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
> +static int tdx_cmr_num;
> +
>  /*
>   * Detect TDX private KeyIDs to see whether TDX has been enabled by the
>   * BIOS.  Both initializing the TDX module and running TDX guest require
> @@ -208,6 +214,121 @@ static int tdx_module_init_cpus(void)
>  	return atomic_read(&sc.err);
>  }
>  
> +static inline bool is_cmr_empty(struct cmr_info *cmr)
> +{
> +	return !cmr->size;
> +}
> +
> +static inline bool is_cmr_ok(struct cmr_info *cmr)
> +{
> +	/* CMR must be page aligned */
> +	return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
> +		IS_ALIGNED(cmr->size, PAGE_SIZE);
> +}
> +
> +static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
> +		       const char *name)
> +{
> +	int i;
> +
> +	for (i = 0; i < cmr_num; i++) {
> +		struct cmr_info *cmr = &cmr_array[i];
> +
> +		pr_info("%s : [0x%llx, 0x%llx)\n", name,
> +				cmr->base, cmr->base + cmr->size);
> +	}
> +}
> +
> +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> +{
> +	struct cmr_info *cmr;
> +	int i, cmr_num;
> +
> +	/*
> +	 * Intel TDX module spec, 20.7.3 CMR_INFO:
> +	 *
> +	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> +	 *   array of CMR_INFO entries. The CMRs are sorted from the
> +	 *   lowest base address to the highest base address, and they
> +	 *   are non-overlapping.
> +	 *
> +	 * This implies that BIOS may generate invalid empty entries
> +	 * if total CMRs are less than 32.  Need to skip them manually.
> +	 *
> +	 * CMR also must be 4K aligned.  TDX doesn't trust BIOS.  TDX
> +	 * actually verifies CMRs before it gets enabled, so anything
> +	 * doesn't meet above means kernel bug (or TDX is broken).
> +	 */

I dislike comments like this that describe all the code below.  Can't
you simply put the comment near the code that implements it?

> +	cmr = &cmr_array[0];
> +	/* There must be at least one valid CMR */
> +	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> +		goto err;
> +
> +	cmr_num = *actual_cmr_num;
> +	for (i = 1; i < cmr_num; i++) {
> +		struct cmr_info *cmr = &cmr_array[i];
> +		struct cmr_info *prev_cmr = NULL;
> +
> +		/* Skip further empty CMRs */
> +		if (is_cmr_empty(cmr))
> +			break;
> +
> +		/*
> +		 * Do sanity check anyway to make sure CMRs:
> +		 *  - are 4K aligned
> +		 *  - don't overlap
> +		 *  - are in address ascending order.
> +		 */
> +		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> +			goto err;

Why does cmr_array[0] get a pass on the empty and sanity checks?

> +		prev_cmr = &cmr_array[i - 1];
> +		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> +					cmr->base))
> +			goto err;
> +	}
> +
> +	/* Update the actual number of CMRs */
> +	*actual_cmr_num = i;

That comment is not helpful.  Yes, this is literally updating the number
of CMRs.  Literally.  That's the "what".  But, the "why" is important.
Why is it doing this?

> +	/* Print kernel checked CMRs */
> +	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");

This is the point where I start to lose patience with these comments.
These are just a waste of space.

Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok().  Now,
it'll print an 'actual_cmr_num=1' number of CMRs as being
"kernel-checked".  Why?  That makes zero sense.

> +	return 0;
> +err:
> +	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> +	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> +	return -EINVAL;
> +}
> +
> +static int tdx_get_sysinfo(void)
> +{
> +	struct tdx_module_output out;
> +	int ret;
> +
> +	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> +
> +	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> +			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> +	if (ret)
> +		return ret;
> +
> +	/* R9 contains the actual entries written the CMR array. */
> +	tdx_cmr_num = out.r9;
> +
> +	pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> +		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date, tdx_sysinfo.build_num);

This is a case where a little bit of vertical alignment will go a long way:

> +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);

> +
> +	/*
> +	 * trim_empty_cmrs() updates the actual number of CMRs by
> +	 * dropping all tail empty CMRs.
> +	 */
> +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> +}

Why does this both need to respect the "tdx_cmr_num = out.r9" value
*and* trim the empty ones?  Couldn't it just ignore the "tdx_cmr_num =
out.r9" value and just trim the empty ones either way?  It's not like
there is a billion of them.  It would simplify the code for sure.

>  /*
>   * Detect and initialize the TDX module.
>   *
> @@ -232,6 +353,10 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>  
> +	ret = tdx_get_sysinfo();
> +	if (ret)
> +		goto out;
> +
>  	/*
>  	 * Return -EINVAL until all steps of TDX module initialization
>  	 * process are done.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 9ba11808bd45..8e273756098c 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -15,10 +15,71 @@
>  /*
>   * TDX module SEAMCALL leaf functions
>   */
> +#define TDH_SYS_INFO		32
>  #define TDH_SYS_INIT		33
>  #define TDH_SYS_LP_INIT		35
>  #define TDH_SYS_LP_SHUTDOWN	44
>  
> +struct cmr_info {
> +	u64	base;
> +	u64	size;
> +} __packed;
> +
> +#define MAX_CMRS			32
> +#define CMR_INFO_ARRAY_ALIGNMENT	512
> +
> +struct cpuid_config {
> +	u32	leaf;
> +	u32	sub_leaf;
> +	u32	eax;
> +	u32	ebx;
> +	u32	ecx;
> +	u32	edx;
> +} __packed;
> +
> +#define TDSYSINFO_STRUCT_SIZE		1024
> +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> +
> +struct tdsysinfo_struct {
> +	/* TDX-SEAM Module Info */
> +	u32	attributes;
> +	u32	vendor_id;
> +	u32	build_date;
> +	u16	build_num;
> +	u16	minor_version;
> +	u16	major_version;
> +	u8	reserved0[14];
> +	/* Memory Info */
> +	u16	max_tdmrs;
> +	u16	max_reserved_per_tdmr;
> +	u16	pamt_entry_size;
> +	u8	reserved1[10];
> +	/* Control Struct Info */
> +	u16	tdcs_base_size;
> +	u8	reserved2[2];
> +	u16	tdvps_base_size;
> +	u8	tdvps_xfam_dependent_size;
> +	u8	reserved3[9];
> +	/* TD Capabilities */
> +	u64	attributes_fixed0;
> +	u64	attributes_fixed1;
> +	u64	xfam_fixed0;
> +	u64	xfam_fixed1;
> +	u8	reserved4[32];
> +	u32	num_cpuid_config;
> +	/*
> +	 * The actual number of CPUID_CONFIG depends on above
> +	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
> +	 * is 1024B defined by TDX architecture.  Use a union with
> +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> +	 * equal to 1024.
> +	 */
> +	union {
> +		struct cpuid_config	cpuid_configs[0];
> +		u8			reserved5[892];
> +	};

Can you double check what the "right" way to do variable arrays is these
days?  I thought the [0] method was discouraged.

Also, it isn't *really* 892 bytes of reserved space, right?  Anything
that's not cpuid_configs[] is reserved, I presume.  Could you try to be
more precise there?
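
To make the flexible-array question concrete, here is a minimal userspace sketch, not the patch's code.  It assumes a C99 flexible array member as the trailing field (the kernel's deprecated-interfaces documentation prefers this over the [0] form) and a separate union that reserves the full 1024 bytes.  The names sysinfo_sketch, sysinfo_buf and max_cpuid_configs() are hypothetical, the 128 bytes of fields that precede num_cpuid_config in the patch are collapsed into one placeholder array, and the __packed and __aligned attributes are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSINFO_SIZE	1024	/* mirrors TDSYSINFO_STRUCT_SIZE in the patch */

struct cpuid_config {
	uint32_t leaf;
	uint32_t sub_leaf;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
};

/* Hypothetical stand-in for tdsysinfo_struct */
struct sysinfo_sketch {
	uint8_t  fixed_fields[128];	/* placeholder for all fields before the count */
	uint32_t num_cpuid_config;
	struct cpuid_config cpuid_configs[];	/* C99 flexible array member */
};

/* The union, not a magic padding array, pins the buffer size to 1024 bytes */
union sysinfo_buf {
	struct sysinfo_sketch info;
	uint8_t raw[SYSINFO_SIZE];
};

static_assert(sizeof(union sysinfo_buf) == SYSINFO_SIZE,
	      "buffer must be exactly SYSINFO_SIZE bytes");

/* How many cpuid_config entries fit in the space after the fixed fields */
static size_t max_cpuid_configs(void)
{
	return (SYSINFO_SIZE - offsetof(struct sysinfo_sketch, cpuid_configs)) /
	       sizeof(struct cpuid_config);
}
```

With this layout the space after num_cpuid_config is described as room for a bounded number of cpuid_config entries rather than as 892 reserved bytes.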

> +} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> +
>  /*
>   * Do not put any hardware-defined TDX structure representations below
>   * this comment!
  
Kai Huang Nov. 23, 2022, 11:40 a.m. UTC | #2
On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> On 11/20/22 16:26, Kai Huang wrote:
> > TDX provides increased levels of memory confidentiality and integrity.
> > This requires special hardware support for features like memory
> > encryption and storage of memory integrity checksums.  Not all memory
> > satisfies these requirements.
> > 
> > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > (CMR).  During boot, the firmware builds a list of all of the memory
> > ranges which can provide the TDX security guarantees.  The list of these
> > ranges, along with TDX module information, is available to the kernel by
> > querying the TDX module via TDH.SYS.INFO SEAMCALL.
> 
> I think the last sentence goes too far.  What does it matter what the
> name of the SEAMCALL is?  Who cares at this point?  It's in the patch.
> Scroll down two pages if you really care.

I'll remove "via TDH.SYS.INFO SEAMCALL".

> 
> > The host kernel can choose whether or not to use all convertible memory
> > regions as TDX-usable memory.  Before the TDX module is ready to create
> > any TDX guests, the kernel needs to configure the TDX-usable memory
> > regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> > module.  Constructing the TDMR array requires information of both the
> > TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions.  Call
> > TDH.SYS.INFO to get this information as a preparation.
> 
> That last sentence is kinda goofy.  I think there's a way to distill this
> whole thing down more effectively.
> 
> 	CMRs tell the kernel which memory is TDX compatible.  The kernel
> 	takes CMRs and constructs  "TD Memory Regions" (TDMRs).  TDMRs
> 	let the kernel grant TDX protections to some or all of the CMR
> 	areas.

Will do.

But it seems we should still mention "Constructing TDMRs requires information of
both the TDX module (TDSYSINFO_STRUCT) and the CMRs"?  The reason is to justify
"use static to avoid having to pass them as function arguments when constructing
TDMRs" below.

> 	
> > Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
> 
> I find it very useful to be precise when referring to code.  Your code
> says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'.  Why the
> difference?

Here I actually didn't intend to refer to any code.  In the above paragraph
(that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to
explain what "information of the TDX module" actually refers to, since
TDSYSINFO_STRUCT is used in the spec. 

What's your preference?

> 
> > having to pass them as function arguments when constructing the TDMR
> > array.  And they are too big to be put to the stack anyway.  Also, KVM
> > needs to use the TDSYSINFO_STRUCT to create TDX guests.
> 
> This is also a great place to mention that the tdsysinfo_struct contains
> a *lot* of gunk which will not be used for a bit or that may never get
> used.

Perhaps below?

"Note that many members of 'tdsysinfo_struct' are not used by the kernel".

Btw, may I ask why it matters?

[...]


> > +
> > +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> > +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> > +{
> > +	struct cmr_info *cmr;
> > +	int i, cmr_num;
> > +
> > +	/*
> > +	 * Intel TDX module spec, 20.7.3 CMR_INFO:
> > +	 *
> > +	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> > +	 *   array of CMR_INFO entries. The CMRs are sorted from the
> > +	 *   lowest base address to the highest base address, and they
> > +	 *   are non-overlapping.
> > +	 *
> > +	 * This implies that BIOS may generate invalid empty entries
> > +	 * if total CMRs are less than 32.  Need to skip them manually.
> > +	 *
> > +	 * CMR also must be 4K aligned.  TDX doesn't trust BIOS.  TDX
> > +	 * actually verifies CMRs before it gets enabled, so anything
> > +	 * doesn't meet above means kernel bug (or TDX is broken).
> > +	 */
> 
> I dislike comments like this that describe all the code below.  Can't
> you simply put the comment near the code that implements it?

Will do.

> 
> > +	cmr = &cmr_array[0];
> > +	/* There must be at least one valid CMR */
> > +	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> > +		goto err;
> > +
> > +	cmr_num = *actual_cmr_num;
> > +	for (i = 1; i < cmr_num; i++) {
> > +		struct cmr_info *cmr = &cmr_array[i];
> > +		struct cmr_info *prev_cmr = NULL;
> > +
> > +		/* Skip further empty CMRs */
> > +		if (is_cmr_empty(cmr))
> > +			break;
> > +
> > +		/*
> > +		 * Do sanity check anyway to make sure CMRs:
> > +		 *  - are 4K aligned
> > +		 *  - don't overlap
> > +		 *  - are in address ascending order.
> > +		 */
> > +		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> > +			goto err;
> 
> Why does cmr_array[0] get a pass on the empty and sanity checks?

TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one
valid CMR.

And cmr_array[0] is checked before this loop.

> 
> > +		prev_cmr = &cmr_array[i - 1];
> > +		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> > +					cmr->base))
> > +			goto err;
> > +	}
> > +
> > +	/* Update the actual number of CMRs */
> > +	*actual_cmr_num = i;
> 
> That comment is not helpful.  Yes, this is literally updating the number
> of CMRs.  Literally.  That's the "what".  But, the "why" is important.
> Why is it doing this?

When building the list of "TDX-usable" memory regions, the kernel verifies those
regions against CMRs to see whether they are truly convertible memory.

How about adding a comment like below:

	/*
	 * When the kernel builds the TDX-usable memory regions, it verifies
	 * they are truly convertible memory by checking them against CMRs.
	 * Update the actual number of CMRs to skip those empty CMRs.
	 */

Also, I think printing CMRs in the dmesg is helpful.  Printing empty (zero) CMRs
would only add meaningless lines to the dmesg.
 
> 
> > +	/* Print kernel checked CMRs */
> > +	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
> 
> This is the point where I start to lose patience with these comments.
> These are just a waste of space.

Sorry will remove.

> 
> Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok().  Now,
> it'll print an 'actual_cmr_num=1' number of CMRs as being
> "kernel-checked".  Why?  That makes zero sense.

The loop quits when it sees an empty CMR.  I think there's no need to check
further CMRs as they must be empty (TDX MCHECK verifies CMRs).

> 
> > +	return 0;
> > +err:
> > +	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> > +	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> > +	return -EINVAL;
> > +}
> > +
> > +static int tdx_get_sysinfo(void)
> > +{
> > +	struct tdx_module_output out;
> > +	int ret;
> > +
> > +	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> > +
> > +	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> > +			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* R9 contains the actual entries written the CMR array. */
> > +	tdx_cmr_num = out.r9;
> > +
> > +	pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> > +		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> > +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> > +		tdx_sysinfo.build_date, tdx_sysinfo.build_num);
> 
> This is a case where a little bit of vertical alignment will go a long way:
> 
> > +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> > +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> > +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);

Thanks will do.

> 
> > +
> > +	/*
> > +	 * trim_empty_cmrs() updates the actual number of CMRs by
> > +	 * dropping all tail empty CMRs.
> > +	 */
> > +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > +}
> 
> Why does this both need to respect the "tdx_cmr_num = out.r9" value
> *and* trim the empty ones?  Couldn't it just ignore the "tdx_cmr_num =
> out.r9" value and just trim the empty ones either way?  It's not like
> there is a billion of them.  It would simplify the code for sure.

OK.  Since the spec says MAX_CMRS is 32, I can use 32 instead of reading the
count from R9.

[...]

> > +struct cpuid_config {
> > +	u32	leaf;
> > +	u32	sub_leaf;
> > +	u32	eax;
> > +	u32	ebx;
> > +	u32	ecx;
> > +	u32	edx;
> > +} __packed;
> > +
> > +#define TDSYSINFO_STRUCT_SIZE		1024
> > +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> > +
> > +struct tdsysinfo_struct {
> > +	/* TDX-SEAM Module Info */
> > +	u32	attributes;
> > +	u32	vendor_id;
> > +	u32	build_date;
> > +	u16	build_num;
> > +	u16	minor_version;
> > +	u16	major_version;
> > +	u8	reserved0[14];
> > +	/* Memory Info */
> > +	u16	max_tdmrs;
> > +	u16	max_reserved_per_tdmr;
> > +	u16	pamt_entry_size;
> > +	u8	reserved1[10];
> > +	/* Control Struct Info */
> > +	u16	tdcs_base_size;
> > +	u8	reserved2[2];
> > +	u16	tdvps_base_size;
> > +	u8	tdvps_xfam_dependent_size;
> > +	u8	reserved3[9];
> > +	/* TD Capabilities */
> > +	u64	attributes_fixed0;
> > +	u64	attributes_fixed1;
> > +	u64	xfam_fixed0;
> > +	u64	xfam_fixed1;
> > +	u8	reserved4[32];
> > +	u32	num_cpuid_config;
> > +	/*
> > +	 * The actual number of CPUID_CONFIG depends on above
> > +	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
> > +	 * is 1024B defined by TDX architecture.  Use a union with
> > +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> > +	 * equal to 1024.
> > +	 */
> > +	union {
> > +		struct cpuid_config	cpuid_configs[0];
> > +		u8			reserved5[892];
> > +	};
> 
> Can you double check what the "right" way to do variable arrays is these
> days?  I thought the [0] method was discouraged.
> 
> Also, it isn't *really* 892 bytes of reserved space, right?  Anything
> that's not cpuid_configs[] is reserved, I presume.  Could you try to be
> more precise there?

I'll do some study first here and get back to you.  Thanks.

The intention is to make sure the structure size is 1024B, so that the static
variable will have enough space for the TDX module to write.
  
Dave Hansen Nov. 23, 2022, 4:44 p.m. UTC | #3
On 11/23/22 03:40, Huang, Kai wrote:
> On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
>> That last sentence is kinda goofy.  I think there's a way to distill this
>> whole thing down more effectively.
>>
>>       CMRs tell the kernel which memory is TDX compatible.  The kernel
>>       takes CMRs and constructs  "TD Memory Regions" (TDMRs).  TDMRs
>>       let the kernel grant TDX protections to some or all of the CMR
>>       areas.
> 
> Will do.
> 
> But it seems we should still mention "Constructing TDMRs requires information of
> both the TDX module (TDSYSINFO_STRUCT) and the CMRs"?  The reason is to justify
> "use static to avoid having to pass them as function arguments when constructing
> TDMRs" below.

In a changelog, no.  You do *NOT* use super technical language in
changelogs if not super necessary.  Mentioning "TDSYSINFO_STRUCT" here
is useless.  The *MOST* you would do for a good changelog is:

	The kernel takes CMRs (plus a little more metadata) and
	constructs "TD Memory Regions" (TDMRs).

You just need to talk about things at a high level in mostly
non-technical language so that folks know the structure of the code
below.  It's not a replacement for the code, the comments, *OR* the TDX
module specification.

I'm also not quite sure that this justifies the static variables anyway.
 They could be dynamically allocated and passed around, for instance.

>>> Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
>>
>> I find it very useful to be precise when referring to code.  Your code
>> says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'.  Why the
>> difference?
> 
> Here I actually didn't intend to refer to any code.  In the above paragraph
> (that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to
> explain what does "information of the TDX module" actually refer to, since
> TDSYSINFO_STRUCT is used in the spec.
> 
> What's your preference?

Kill all mentions of TDSYSINFO_STRUCT whatsoever in the changelog.
Write comprehensible English.

>>> having to pass them as function arguments when constructing the TDMR
>>> array.  And they are too big to be put to the stack anyway.  Also, KVM
>>> needs to use the TDSYSINFO_STRUCT to create TDX guests.
>>
>> This is also a great place to mention that the tdsysinfo_struct contains
>> a *lot* of gunk which will not be used for a bit or that may never get
>> used.
> 
> Perhaps below?
> 
> "Note many members in tdsysinfo_struct' are not used by the kernel".
> 
> Btw, may I ask why does it matter?

Because you're adding a massive structure with all kinds of fields.
Those fields mostly aren't used.  That could be from an error in this
series, or because they will be used later or because they will *never*
be used.

>>> +   cmr = &cmr_array[0];
>>> +   /* There must be at least one valid CMR */
>>> +   if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
>>> +           goto err;
>>> +
>>> +   cmr_num = *actual_cmr_num;
>>> +   for (i = 1; i < cmr_num; i++) {
>>> +           struct cmr_info *cmr = &cmr_array[i];
>>> +           struct cmr_info *prev_cmr = NULL;
>>> +
>>> +           /* Skip further empty CMRs */
>>> +           if (is_cmr_empty(cmr))
>>> +                   break;
>>> +
>>> +           /*
>>> +            * Do sanity check anyway to make sure CMRs:
>>> +            *  - are 4K aligned
>>> +            *  - don't overlap
>>> +            *  - are in address ascending order.
>>> +            */
>>> +           if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
>>> +                   goto err;
>>
>> Why does cmr_array[0] get a pass on the empty and sanity checks?
> 
> TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one
> valid CMR.
> 
> And cmr_array[0] is checked before this loop.

I think you're confusing two separate things.  MCHECK ensures that there
is convertible memory.  The CMRs that this code looks at are software
(TD module) defined and created structures that the OS and the module share.

This cmr_array[] structure is not created by MCHECK.

Go look at your code.  Consider what will happen if cmr_array[0] is
empty or !is_cmr_ok().  Then consider what will happen if cmr_array[1]
has the same happen.

Does that end result really justify having separate code for
cmr_array[0] and cmr_array[>0]?
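
One way to read the question above: a single loop starting at index 0 can apply the same checks to every entry, and "at least one valid CMR" falls out of the final count.  A minimal userspace sketch under those assumptions follows.  The names count_valid_cmrs(), cmr_aligned() and SKETCH_PAGE_SIZE are hypothetical, and a plain -1 return stands in for the WARN_ON_ONCE() and -EINVAL handling in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL	/* assumed 4K pages */

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

static bool cmr_aligned(const struct cmr_info *cmr)
{
	return !(cmr->base % SKETCH_PAGE_SIZE) && !(cmr->size % SKETCH_PAGE_SIZE);
}

/*
 * Count the leading non-empty CMRs.  Every entry, including entry 0,
 * goes through the same alignment and ordering checks.
 * Returns the count, or -1 if no valid CMR exists or a check fails.
 */
static int count_valid_cmrs(const struct cmr_info *cmrs, int nr)
{
	uint64_t prev_end = 0;
	int i;

	assert(cmrs != NULL && nr >= 0);

	for (i = 0; i < nr; i++) {
		/* The first empty entry ends the list */
		if (!cmrs[i].size)
			break;

		/* Must be page aligned and start at or after the previous end */
		if (!cmr_aligned(&cmrs[i]) || cmrs[i].base < prev_end)
			return -1;

		prev_end = cmrs[i].base + cmrs[i].size;
	}

	/* An empty first entry leaves i == 0, which is the "no CMR" error */
	return i ? i : -1;
}
```

Tracking prev_end instead of indexing cmr_array[i - 1] is what lets the loop start at 0 without a special case.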

>>> +           prev_cmr = &cmr_array[i - 1];
>>> +           if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
>>> +                                   cmr->base))
>>> +                   goto err;
>>> +   }
>>> +
>>> +   /* Update the actual number of CMRs */
>>> +   *actual_cmr_num = i;
>>
>> That comment is not helpful.  Yes, this is literally updating the number
>> of CMRs.  Literally.  That's the "what".  But, the "why" is important.
>> Why is it doing this?
> 
> When building the list of "TDX-usable" memory regions, the kernel verifies those
> regions against CMRs to see whether they are truly convertible memory.
> 
> How about adding a comment like below:
> 
>         /*
>          * When the kernel builds the TDX-usable memory regions, it verifies
>          * they are truly convertible memory by checking them against CMRs.
>          * Update the actual number of CMRs to skip those empty CMRs.
>          */
> 
> Also, I think printing CMRs in the dmesg is helpful.  Printing empty (zero) CMRs
> will put meaningless log to the dmesg.

So it's just about printing them?

Then put a dang switch to the print function that says "print them all"
or not.
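
A rough userspace sketch of such a switch, assuming a hypothetical print_empty flag and printf() in place of pr_info().  Returning the number of lines printed is an addition for illustration only, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Print CMRs as half-open ranges.  Empty (zero-size) entries are only
 * printed when print_empty is set.  Returns the number of lines printed.
 */
static int print_cmrs(const struct cmr_info *cmrs, int nr, bool print_empty)
{
	int i, printed = 0;

	assert(cmrs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (!cmrs[i].size && !print_empty)
			continue;

		printf("CMR: [0x%llx, 0x%llx)\n",
		       (unsigned long long)cmrs[i].base,
		       (unsigned long long)(cmrs[i].base + cmrs[i].size));
		printed++;
	}

	return printed;
}
```

With the flag in the print path, the caller no longer needs to rewrite the CMR count just to keep empty entries out of the log.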

...
>> Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok().  Now,
>> it'll print an 'actual_cmr_num=1' number of CMRs as being
>> "kernel-checked".  Why?  That makes zero sense.
> 
> The loop quits when it sees an empty CMR.  I think there's no need to check
> further CMRs as they must be empty (TDX MCHECK verifies CMRs).

OK, so you're going to get some more homework here.  Please explain to
me how MCHECK and the CMR array that comes out of the TDX module are
related.  How does the output from MCHECK get turned into the in-memory
cmr_array[], step by step?

At this point, I fear that you're offering up MCHECK like it's a bag of
magic beans rather than really truly thinking about the cmr_array[] data
structure.  How is it generated?  How might it be broken?  Who might
break it?  If so, what should the kernel do about it?


>>> +
>>> +   /*
>>> +    * trim_empty_cmrs() updates the actual number of CMRs by
>>> +    * dropping all tail empty CMRs.
>>> +    */
>>> +   return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
>>> +}
>>
>> Why does this both need to respect the "tdx_cmr_num = out.r9" value
>> *and* trim the empty ones?  Couldn't it just ignore the "tdx_cmr_num =
>> out.r9" value and just trim the empty ones either way?  It's not like
>> there is a billion of them.  It would simplify the code for sure.
> 
> OK.  Since spec says MAX_CMRs is 32, so I can use 32 instead of reading out from
> R9.

But then you still have the "trimming" code.  Why not just trust "r9"
and then axe all the trimming code?  Heck, and most of the sanity checks.

This code could be a *lot* smaller.
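
As a sketch of why trimming may be unnecessary: if the later consumer of the array is a containment check, an empty entry can never match, so walking all entries reported in R9 gives the same answer as walking a trimmed list.  The helper name range_is_convertible() is hypothetical, and the check is simplified so that a range spanning two adjacent CMRs is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CMRS	32

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, end) lies entirely inside one CMR.
 * 'r9' is the entry count as reported by the module, used untrimmed.
 */
static bool range_is_convertible(const struct cmr_info *cmrs, uint64_t r9,
				 uint64_t start, uint64_t end)
{
	uint64_t i;

	assert(cmrs != NULL && r9 <= MAX_CMRS && start < end);

	for (i = 0; i < r9; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		/* An empty CMR spans [base, base) and cannot contain any range */
		if (start >= cmrs[i].base && end <= cmr_end)
			return true;
	}

	return false;
}
```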
  
Kai Huang Nov. 23, 2022, 10:53 p.m. UTC | #4
On Wed, 2022-11-23 at 08:44 -0800, Dave Hansen wrote:
> > On 11/23/22 03:40, Huang, Kai wrote:
> > > > On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> > > > > > That last sentence is kinda goofy.  I think there's a way to distill this
> > > > > > whole thing down more effectively.
> > > > > > 
> > > > > >       CMRs tell the kernel which memory is TDX compatible.  The kernel
> > > > > >       takes CMRs and constructs  "TD Memory Regions" (TDMRs).  TDMRs
> > > > > >       let the kernel grant TDX protections to some or all of the CMR
> > > > > >       areas.
> > > > 
> > > > Will do.
> > > > 
> > > > But it seems we should still mention "Constructing TDMRs requires information of
> > > > both the TDX module (TDSYSINFO_STRUCT) and the CMRs"?  The reason is to justify
> > > > "use static to avoid having to pass them as function arguments when constructing
> > > > TDMRs" below.
> > 
> > In a changelog, no.  You do *NOT* use super technical language in
> > changelogs if not super necessary.  Mentioning "TDSYSINFO_STRUCT" here
> > is useless.  The *MOST* you would do for a good changelog is:
> > 
> > 	The kernel takes CMRs (plus a little more metadata) and
> > 	constructs "TD Memory Regions" (TDMRs).
> > 
> > You just need to talk about things at a high level in mostly
> > non-technical language so that folks know the structure of the code
> > below.  It's not a replacement for the code, the comments, *OR* the TDX
> > module specification.
> > 
> > I'm also not quite sure that this justifies the static variables anyway.
> >  They could be dynamically allocated and passed around, for instance.

I see. Thanks for explaining.

> > 
> > > > > > > > Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
> > > > > > 
> > > > > > I find it very useful to be precise when referring to code.  Your code
> > > > > > says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'.  Why the
> > > > > > difference?
> > > > 
> > > > Here I actually didn't intend to refer to any code.  In the above paragraph
> > > > (that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to
> > > > explain what "information of the TDX module" actually refers to, since
> > > > TDSYSINFO_STRUCT is used in the spec.
> > > > 
> > > > What's your preference?
> > 
> > Kill all mentions to TDSYSINFO_STRUCT whatsoever in the changelog.
> > Write comprehensible English.

OK.

> > 
> > > > > > > > having to pass them as function arguments when constructing the TDMR
> > > > > > > > array.  And they are too big to be put to the stack anyway.  Also, KVM
> > > > > > > > needs to use the TDSYSINFO_STRUCT to create TDX guests.
> > > > > > 
> > > > > > This is also a great place to mention that the tdsysinfo_struct contains
> > > > > > a *lot* of gunk which will not be used for a bit or that may never get
> > > > > > used.
> > > > 
> > > > Perhaps below?
> > > > 
> > > > "Note many members in 'tdsysinfo_struct' are not used by the kernel".
> > > > 
> > > > Btw, may I ask why it matters?
> > 
> > Because you're adding a massive structure with all kinds of fields.
> > Those fields mostly aren't used.  That could be from an error in this
> > series, or because they will be used later or because they will *never*
> > be used.

OK.

> > 
> > > > > > > > +   cmr = &cmr_array[0];
> > > > > > > > +   /* There must be at least one valid CMR */
> > > > > > > > +   if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> > > > > > > > +           goto err;
> > > > > > > > +
> > > > > > > > +   cmr_num = *actual_cmr_num;
> > > > > > > > +   for (i = 1; i < cmr_num; i++) {
> > > > > > > > +           struct cmr_info *cmr = &cmr_array[i];
> > > > > > > > +           struct cmr_info *prev_cmr = NULL;
> > > > > > > > +
> > > > > > > > +           /* Skip further empty CMRs */
> > > > > > > > +           if (is_cmr_empty(cmr))
> > > > > > > > +                   break;
> > > > > > > > +
> > > > > > > > +           /*
> > > > > > > > +            * Do sanity check anyway to make sure CMRs:
> > > > > > > > +            *  - are 4K aligned
> > > > > > > > +            *  - don't overlap
> > > > > > > > +            *  - are in address ascending order.
> > > > > > > > +            */
> > > > > > > > +           if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> > > > > > > > +                   goto err;
> > > > > > 
> > > > > > Why does cmr_array[0] get a pass on the empty and sanity checks?
> > > > 
> > > > TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one
> > > > valid CMR.
> > > > 
> > > > And cmr_array[0] is checked before this loop.
> > 
> > I think you're confusing two separate things.  MCHECK ensures that there
> > is convertible memory.  The CMRs that this code looks at are software
> > (TDX module) defined and created structures that the OS and the module share.

I'm not sure I fully understood you, but the CMRs are generated by the BIOS,
then verified and stored by MCHECK. So the CMR structure is also meaningful to
the BIOS and MCHECK; it is not defined and created by the TDX module.

There are a couple of places in the TDX module spec which say this. Two examples
are "Table 3.1: Typical Intel TDX Module Platform-Scope Initialization Sequence"
and "13.1.1. Initialization and Configuration Flow". They both mention:

"BIOS configures Convertible Memory Regions (CMRs); MCHECK checks them and
securely stores the information."

Also, "20.8.3 CMR_INFO":

"CMR_INFO is designed to provide information about a Convertible Memory Range
(CMR), as configured by BIOS and checked and stored securely by MCHECK."

> > 
> > This cmr_array[] structure is not created by MCHECK.

Right.

But all TDH.SYS.INFO does is "Retrieve Intel TDX module information and
convertible memory (CMR) information", by writing the CMRs to the buffer
provided by the kernel (cmr_array[]).

So my understanding is that the entries in cmr_array[] are just the same CMRs
that were verified by MCHECK.

> > 
> > Go look at your code.  Consider what will happen if cmr_array[0] is
> > empty or !is_cmr_ok().  Then consider what will happen if cmr_array[1]
> > has the same happen.
> > 
> > Does that end result really justify having separate code for
> > cmr_array[0] and cmr_array[>0]?

One slight difference is that cmr_array[0] must be valid, but cmr_array[>0] can
be empty. And for cmr_array[>0] we also have an additional check against the
previous one.

> > 
> > > > > > > > +           prev_cmr = &cmr_array[i - 1];
> > > > > > > > +           if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> > > > > > > > +                                   cmr->base))
> > > > > > > > +                   goto err;
> > > > > > > > +   }
> > > > > > > > +
> > > > > > > > +   /* Update the actual number of CMRs */
> > > > > > > > +   *actual_cmr_num = i;
> > > > > > 
> > > > > > That comment is not helpful.  Yes, this is literally updating the number
> > > > > > of CMRs.  Literally.  That's the "what".  But, the "why" is important.
> > > > > > Why is it doing this?
> > > > 
> > > > When building the list of "TDX-usable" memory regions, the kernel verifies those
> > > > regions against CMRs to see whether they are truly convertible memory.
> > > > 
> > > > How about adding a comment like below:
> > > > 
> > > >         /*
> > > >          * When the kernel builds the TDX-usable memory regions, it verifies
> > > >          * they are truly convertible memory by checking them against CMRs.
> > > >          * Update the actual number of CMRs to skip those empty CMRs.
> > > >          */
> > > > 
> > > > Also, I think printing CMRs in the dmesg is helpful.  Printing empty (zero) CMRs
> > > > will put meaningless log to the dmesg.
> > 
> > So it's just about printing them?
> > 
> > Then put a dang switch to the print function that says "print them all"
> > or not.

Yes can do. Currently "print them all" is only done when the CMR sanity check
fails. We can unconditionally "print valid CMRs" if we don't need that check.
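
For illustration only, a minimal userspace sketch of such a switch. The 'print_all' parameter, the return value and the printf() stand-in for pr_info() are assumptions made for this example, not part of the posted patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CMRS 32

/* Userspace stand-in for the kernel's struct cmr_info. */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Print CMRs.  When 'print_all' is false, empty (zero-sized) entries
 * are skipped.  Returns the number of entries printed.
 * Both 'print_all' and the return value are hypothetical.
 */
static int print_cmrs(const struct cmr_info *cmr_array, int cmr_num,
		      bool print_all)
{
	int i, printed = 0;

	assert(cmr_num >= 0 && cmr_num <= MAX_CMRS);

	for (i = 0; i < cmr_num; i++) {
		const struct cmr_info *cmr = &cmr_array[i];

		if (!print_all && !cmr->size)
			continue;

		printf("CMR: [0x%llx, 0x%llx)\n",
		       (unsigned long long)cmr->base,
		       (unsigned long long)(cmr->base + cmr->size));
		printed++;
	}

	return printed;
}
```

With a switch like this the caller can always pass the full MAX_CMRS array, so no separate trimmed count has to be kept just for logging.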

> > 
> > ...
> > > > > > Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok().  Now,
> > > > > > it'll print an 'actual_cmr_num=1' number of CMRs as being
> > > > > > "kernel-checked".  Why?  That makes zero sense.
> > > > 
> > > > The loop quits when it sees an empty CMR.  I think there's no need to check
> > > > further CMRs as they must be empty (TDX MCHECK verifies CMRs).
> > 
> > OK, so you're going to get some more homework here.  Please explain to
> > me how MCHECK and the CMR array that comes out of the TDX module are
> > related.  How does the output from MCHECK get turned into the in-memory
> > cmr_array[], step by step?
> > 

(Please also see my above reply)

1. BIOS generates the CMRs and passes them to MCHECK.
2. MCHECK verifies the CMRs and stores the "CMR table in a pre-defined location in
SEAMRR’s SEAMCFG region so it can be read later and trusted by the Intel TDX
module" (13.1.4.1 Intel TDX ISA Background: Convertible Memory Ranges (CMRs)).
3. TDH.SYS.INFO copies the CMRs to the buffer provided by the kernel
(cmr_array[]).

> > At this point, I fear that you're offering up MCHECK like it's a bag of
> > magic beans rather than really truly thinking about the cmr_array[] data
> > structure.  How is it generated?  How might it be broken? Who might
> > break it?   If so, what should the kernel do about it?

Only a kernel bug can break the cmr_array[], I think. As described in "13.1.4.1
Intel TDX ISA Background: Convertible Memory Ranges (CMRs)", MCHECK should have
guaranteed that:
	- there is at least one CMR
	- each CMR is page aligned
	- CMRs don't overlap and are in ascending address order

The only legal oddity is that there might be empty CMRs at the tail of the
cmr_array[], following one or more valid CMRs.
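
The guarantees listed above can be sketched as a single loop that treats cmr_array[0] like every other entry. This is a userspace illustration under stated assumptions: count_valid_cmrs() and the two small helpers are hypothetical names, and PAGE_SIZE is fixed at 4K here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CMRS	32
#define PAGE_SIZE	4096ULL

/* Userspace stand-in for the kernel's struct cmr_info. */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

static bool cmr_is_empty(const struct cmr_info *cmr)
{
	return cmr->size == 0;
}

static bool cmr_is_page_aligned(const struct cmr_info *cmr)
{
	return !(cmr->base % PAGE_SIZE) && !(cmr->size % PAGE_SIZE);
}

/*
 * Return the number of leading valid CMRs, or -1 if the array
 * violates any of the guarantees MCHECK is expected to provide.
 */
static int count_valid_cmrs(const struct cmr_info *cmrs, int nr)
{
	int i, valid = 0;

	for (i = 0; i < nr; i++) {
		const struct cmr_info *cmr = &cmrs[i];

		if (cmr_is_empty(cmr))
			break;
		if (!cmr_is_page_aligned(cmr))
			return -1;
		/* Ascending and non-overlapping relative to the previous CMR */
		if (i > 0 && cmrs[i - 1].base + cmrs[i - 1].size > cmr->base)
			return -1;
		valid++;
	}

	/* Only empty entries may follow the valid ones */
	for (; i < nr; i++)
		if (!cmr_is_empty(&cmrs[i]))
			return -1;

	/* There must be at least one valid CMR */
	return valid ? valid : -1;
}
```

An empty cmr_array[0] simply falls out as "zero valid CMRs", so the first entry needs no separate code path.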

> > 
> > 
> > > > > > > > +
> > > > > > > > +   /*
> > > > > > > > +    * trim_empty_cmrs() updates the actual number of CMRs by
> > > > > > > > +    * dropping all tail empty CMRs.
> > > > > > > > +    */
> > > > > > > > +   return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > > > > > > > +}
> > > > > > 
> > > > > > Why does this both need to respect the "tdx_cmr_num = out.r9" value
> > > > > > *and* trim the empty ones?  Couldn't it just ignore the "tdx_cmr_num =
> > > > > > out.r9" value and just trim the empty ones either way?  It's not like
> > > > > > there is a billion of them.  It would simplify the code for sure.
> > > > 
> > > > OK.  Since spec says MAX_CMRs is 32, so I can use 32 instead of reading out from
> > > > R9.
> > 
> > But then you still have the "trimming" code.  Why not just trust "r9"
> > and then axe all the trimming code?  Heck, and most of the sanity checks.
> > 
> > This code could be a *lot* smaller.

As I said, the only problem is that there might be empty CMRs at the tail of
the cmr_array[] following one or more valid CMRs.

But we could also do nothing here, and just skip empty CMRs when comparing a
memory region against them (in the next patch).

Or, we don't even need to explicitly check memory regions against CMRs. If the
memory regions that we provide in the TDMRs don't fall into a CMR, then
TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
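
For reference, the "skip empty CMRs when comparing" option might look roughly like the userspace sketch below. range_in_one_cmr() is a hypothetical helper, and it makes the simplifying assumption that a region must fit entirely inside a single CMR, so a region spanning two adjacent CMRs is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CMRS 32

/* Userspace stand-in for the kernel's struct cmr_info. */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, end) lies entirely inside one non-empty CMR.
 * Hypothetical helper: a range spanning two adjacent CMRs is rejected.
 */
static bool range_in_one_cmr(const struct cmr_info *cmrs, int nr,
			     uint64_t start, uint64_t end)
{
	int i;

	assert(start < end);

	for (i = 0; i < nr; i++) {
		const struct cmr_info *cmr = &cmrs[i];

		/* Empty tail entries carry no memory; skip them */
		if (!cmr->size)
			continue;

		if (start >= cmr->base && end <= cmr->base + cmr->size)
			return true;
	}

	return false;
}
```

Because empty entries are skipped inside the loop, the caller can pass all MAX_CMRS entries and no trimming step is needed.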
  
Kai Huang Dec. 2, 2022, 11:11 a.m. UTC | #5
On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> > +#define TDSYSINFO_STRUCT_SIZE		1024
> > +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> > +
> > +struct tdsysinfo_struct {
> > +	/* TDX-SEAM Module Info */
> > +	u32	attributes;
> > +	u32	vendor_id;
> > +	u32	build_date;
> > +	u16	build_num;
> > +	u16	minor_version;
> > +	u16	major_version;
> > +	u8	reserved0[14];
> > +	/* Memory Info */
> > +	u16	max_tdmrs;
> > +	u16	max_reserved_per_tdmr;
> > +	u16	pamt_entry_size;
> > +	u8	reserved1[10];
> > +	/* Control Struct Info */
> > +	u16	tdcs_base_size;
> > +	u8	reserved2[2];
> > +	u16	tdvps_base_size;
> > +	u8	tdvps_xfam_dependent_size;
> > +	u8	reserved3[9];
> > +	/* TD Capabilities */
> > +	u64	attributes_fixed0;
> > +	u64	attributes_fixed1;
> > +	u64	xfam_fixed0;
> > +	u64	xfam_fixed1;
> > +	u8	reserved4[32];
> > +	u32	num_cpuid_config;
> > +	/*
> > +	 * The actual number of CPUID_CONFIG depends on above
> > +	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
> > +	 * is 1024B defined by TDX architecture.  Use a union with
> > +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> > +	 * equal to 1024.
> > +	 */
> > +	union {
> > +		struct cpuid_config	cpuid_configs[0];
> > +		u8			reserved5[892];
> > +	};
> 
> Can you double check what the "right" way to do variable arrays is these
> days?  I thought the [0] method was discouraged.
> 
> Also, it isn't *really* 892 bytes of reserved space, right?  Anything
> that's not cpuid_configs[] is reserved, I presume.  Could you try to be
> more precise there?

Hi Dave,

I did some searching, and I think we should use the DECLARE_FLEX_ARRAY() macro?

And also to address your concern that not all 892 bytes are reserved, how about
below:

        union {
-               struct cpuid_config     cpuid_configs[0];
-               u8                      reserved5[892];
+               DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
+               u8 padding[892];
        };
 } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);

The goal is to make the size of 'struct tdsysinfo_struct' exactly 1024B so we
can use a static variable for it, and at the same time it still has 1024B
(enough space) for the TDH.SYS.INFO to write to.
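
As a sanity check of where 892 comes from: the fixed fields in the patch add up to 32 + 16 + 16 + 64 + 4 = 132 bytes, and 1024 - 132 = 892. The userspace sketch below lets the compiler derive that number. Grouping the fixed fields into 'struct tdsysinfo_fixed' is a hypothetical rearrangement for this example, and the GNU zero-length array stands in for the kernel's DECLARE_FLEX_ARRAY():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TDSYSINFO_STRUCT_SIZE		1024
#define TDSYSINFO_STRUCT_ALIGNMENT	1024

struct cpuid_config {
	uint32_t leaf, sub_leaf, eax, ebx, ecx, edx;
} __attribute__((packed));

/* Hypothetical grouping of the fixed-layout fields from the patch. */
struct tdsysinfo_fixed {
	/* TDX-SEAM Module Info: 32 bytes */
	uint32_t attributes, vendor_id, build_date;
	uint16_t build_num, minor_version, major_version;
	uint8_t  reserved0[14];
	/* Memory Info: 16 bytes */
	uint16_t max_tdmrs, max_reserved_per_tdmr, pamt_entry_size;
	uint8_t  reserved1[10];
	/* Control Struct Info: 16 bytes */
	uint16_t tdcs_base_size;
	uint8_t  reserved2[2];
	uint16_t tdvps_base_size;
	uint8_t  tdvps_xfam_dependent_size;
	uint8_t  reserved3[9];
	/* TD Capabilities: 64 bytes, plus the 4-byte count */
	uint64_t attributes_fixed0, attributes_fixed1;
	uint64_t xfam_fixed0, xfam_fixed1;
	uint8_t  reserved4[32];
	uint32_t num_cpuid_config;
} __attribute__((packed));

struct tdsysinfo_struct {
	struct tdsysinfo_fixed fixed;
	union {
		/* GNU zero-length array, standing in for DECLARE_FLEX_ARRAY() */
		struct cpuid_config cpuid_configs[0];
		/* Padding size is derived, not open-coded */
		uint8_t padding[TDSYSINFO_STRUCT_SIZE -
				sizeof(struct tdsysinfo_fixed)];
	};
} __attribute__((packed, aligned(TDSYSINFO_STRUCT_ALIGNMENT)));

_Static_assert(sizeof(struct tdsysinfo_fixed) == 132,
	       "fixed fields occupy 132 bytes");
_Static_assert(sizeof(struct tdsysinfo_struct) == TDSYSINFO_STRUCT_SIZE,
	       "structure must be exactly 1024 bytes");
```

The trade-off is an extra '.fixed' in every field access; the benefit is that adding or resizing a fixed field can no longer silently invalidate the padding.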
  
Kai Huang Dec. 2, 2022, 11:19 a.m. UTC | #6
On Wed, 2022-11-23 at 22:53 +0000, Huang, Kai wrote:
> > > > > > > > > +
> > > > > > > > > +   /*
> > > > > > > > > +    * trim_empty_cmrs() updates the actual number of CMRs by
> > > > > > > > > +    * dropping all tail empty CMRs.
> > > > > > > > > +    */
> > > > > > > > > +   return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > > > > > > > > +}
> > > > > > > 
> > > > > > > Why does this both need to respect the "tdx_cmr_num = out.r9"
> > > > > > > value
> > > > > > > *and* trim the empty ones?  Couldn't it just ignore the
> > > > > > > "tdx_cmr_num =
> > > > > > > out.r9" value and just trim the empty ones either way?  It's not
> > > > > > > like
> > > > > > > there is a billion of them.  It would simplify the code for sure.
> > > > > 
> > > > > OK.  Since spec says MAX_CMRs is 32, so I can use 32 instead of
> > > > > reading out from
> > > > > R9.
> > > 
> > > But then you still have the "trimming" code.  Why not just trust "r9"
> > > and then axe all the trimming code?  Heck, and most of the sanity checks.
> > > 
> > > This code could be a *lot* smaller.
> 
> As I said, the only problem is that there might be empty CMRs at the tail of
> the cmr_array[] following one or more valid CMRs.

Hi Dave,

I probably forgot to mention that "r9" in practice always returns 32, so there
will be empty CMRs at the tail of the cmr_array[].

> 
> But we could also do nothing here, and just skip empty CMRs when comparing a
> memory region against them (in the next patch).
> 
> Or, we don't even need to explicitly check memory regions against CMRs. If the
> memory regions that we provide in the TDMRs don't fall into a CMR, then
> TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.

Sorry to ping, but do you have any comments here?

How about we just don't do any check of TDX memory regions against CMRs, and
just let the TDH.SYS.CONFIG SEAMCALL decide?
  
Dave Hansen Dec. 2, 2022, 5:06 p.m. UTC | #7
On 12/2/22 03:11, Huang, Kai wrote:
> And also to address your concern that not all 892 bytes are reserved, how about
> below:
> 
>         union {
> -               struct cpuid_config     cpuid_configs[0];
> -               u8                      reserved5[892];
> +               DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
> +               u8 padding[892];
>         };
>  } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> 
> The goal is to make the size of 'struct tdsysinfo_struct' exactly 1024B so we
> can use a static variable for it, and at the same time it still has 1024B
> (enough space) for the TDH.SYS.INFO to write to.

I just don't like the open-coded sizes.

For instance, wouldn't it be great if you didn't have to know the size
of *ANYTHING* else to properly size the '892'?

Maybe we just need some helpers to hide the gunk:

#define DECLARE_PADDED_STRUCT(type, name, alignment) 	\
struct type##_padded {					\
	union {						\
		struct type name;			\
		u8 padding[alignment];			\
	};						\
} name##_padded

#define PADDED_STRUCT(name)	(name##_padded.name)

That can get used like this:

DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
		      TDSYSINFO_STRUCT_ALIGNMENT);


	struct tdsysinfo_struct sysinfo = PADDED_STRUCT(tdsysinfo);
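
A compilable userspace variant of this idea, as a sketch only. 'struct demo_info', DEMO_INFO_SIZE and demo_get() are hypothetical stand-ins for the TDX names. The aligned attribute is an addition made for this example, since the padding alone fixes the size but not the alignment. Note that the macro takes the bare struct tag (without 'struct') so that the token pasting works:

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/*
 * Pad 'struct type' out to 'alignment' bytes and align the wrapper
 * to the same value.  'type' is the bare struct tag.
 */
#define DECLARE_PADDED_STRUCT(type, name, alignment)	\
struct type##_padded {					\
	union {						\
		struct type name;			\
		u8 padding[alignment];			\
	};						\
} __attribute__((aligned(alignment))) name##_padded

#define PADDED_STRUCT(name)	(name##_padded.name)

/* Hypothetical small payload standing in for tdsysinfo_struct. */
struct demo_info {
	uint32_t attributes;
	uint32_t vendor_id;
};

#define DEMO_INFO_SIZE	1024

static DECLARE_PADDED_STRUCT(demo_info, demo, DEMO_INFO_SIZE);

/* Accessor returning a pointer, so the 1024-byte object is not copied. */
static struct demo_info *demo_get(void)
{
	return &PADDED_STRUCT(demo);
}
```

Callers work with 'struct demo_info' as usual, while the object that would be handed to the SEAMCALL is the full-sized, aligned wrapper.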
  
Dave Hansen Dec. 2, 2022, 5:25 p.m. UTC | #8
On 12/2/22 03:19, Huang, Kai wrote:
> I probably forgot to mention that "r9" in practice always returns 32, so there
> will be empty CMRs at the tail of the cmr_array[].

Right, so the r9 value is basically useless.  I bet the code gets
simpler if you just ignore it.

>> But we could also do nothing here, and just skip empty CMRs when comparing a
>> memory region against them (in the next patch).
>>
>> Or, we don't even need to explicitly check memory regions against CMRs. If the
>> memory regions that we provide in the TDMRs don't fall into a CMR, then
>> TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
> 
> Sorry to ping, but do you have any comments here?
> 
> How about we just don't do any check of TDX memory regions against CMRs, and
> just let the TDH.SYS.CONFIG SEAMCALL decide?

Right, if we screw it up TDH.SYS.CONFIG SEAMCALL will fail.  We don't
need to add more code to detect that failure ourselves.  TDX is screwed
either way.
  
Kai Huang Dec. 2, 2022, 9:56 p.m. UTC | #9
On Fri, 2022-12-02 at 09:06 -0800, Dave Hansen wrote:
> On 12/2/22 03:11, Huang, Kai wrote:
> > And also to address your concern that not all 892 bytes are reserved, how about
> > below:
> > 
> >         union {
> > -               struct cpuid_config     cpuid_configs[0];
> > -               u8                      reserved5[892];
> > +               DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
> > +               u8 padding[892];
> >         };
> >  } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> > 
> > The goal is to make the size of 'struct tdsysinfo_struct' exactly 1024B so we
> > can use a static variable for it, and at the same time it still has 1024B
> > (enough space) for the TDH.SYS.INFO to write to.
> 
> I just don't like the open-coded sizes.
> 
> For instance, wouldn't it be great if you didn't have to know the size
> of *ANYTHING* else to properly size the '892'?
> 
> Maybe we just need some helpers to hide the gunk:
> 
> #define DECLARE_PADDED_STRUCT(type, name, alignment) 	\
> struct type##_padded {					\
> 	union {						\
> 		struct type name;			\
> 		u8 padding[alignment];			\
> 	};						\
> } name##_padded
> 
> #define PADDED_STRUCT(name)	(name##_padded.name)
> 
> That can get used like this:
> 
> DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
> 		      TDSYSINFO_STRUCT_ALIGNMENT);
> 
> 
> 	struct tdsysinfo_struct sysinfo = PADDED_STRUCT(tdsysinfo);

Thanks.  Will try out this way.
  
Kai Huang Dec. 2, 2022, 9:57 p.m. UTC | #10
On Fri, 2022-12-02 at 09:25 -0800, Dave Hansen wrote:
> On 12/2/22 03:19, Huang, Kai wrote:
> > I probably forgot to mention that "r9" in practice always returns 32, so there
> > will be empty CMRs at the tail of the cmr_array[].
> 
> Right, so the r9 value is basically useless.  I bet the code gets
> simpler if you just ignore it.
> 
> > > But we could also do nothing here, and just skip empty CMRs when comparing a
> > > memory region against them (in the next patch).
> > > 
> > > Or, we don't even need to explicitly check memory regions against CMRs. If the
> > > memory regions that we provide in the TDMRs don't fall into a CMR, then
> > > TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
> > 
> > Sorry to ping, but do you have any comments here?
> > 
> > How about we just don't do any check of TDX memory regions against CMRs, and
> > just let the TDH.SYS.CONFIG SEAMCALL decide?
> 
> Right, if we screw it up TDH.SYS.CONFIG SEAMCALL will fail.  We don't
> need to add more code to detect that failure ourselves.  TDX is screwed
> either way.

Will do.  Thanks.
  

Patch

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 2cf7090667aa..43227af25e44 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -15,6 +15,7 @@ 
 #include <linux/cpumask.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
+#include <linux/align.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -40,6 +41,11 @@  static enum tdx_module_status_t tdx_module_status;
 /* Prevent concurrent attempts on TDX detection and initialization */
 static DEFINE_MUTEX(tdx_module_lock);
 
+/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
+static struct tdsysinfo_struct tdx_sysinfo;
+static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
+static int tdx_cmr_num;
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS.  Both initializing the TDX module and running TDX guest require
@@ -208,6 +214,121 @@  static int tdx_module_init_cpus(void)
 	return atomic_read(&sc.err);
 }
 
+static inline bool is_cmr_empty(struct cmr_info *cmr)
+{
+	return !cmr->size;
+}
+
+static inline bool is_cmr_ok(struct cmr_info *cmr)
+{
+	/* CMR must be page aligned */
+	return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
+		IS_ALIGNED(cmr->size, PAGE_SIZE);
+}
+
+static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
+		       const char *name)
+{
+	int i;
+
+	for (i = 0; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+
+		pr_info("%s : [0x%llx, 0x%llx)\n", name,
+				cmr->base, cmr->base + cmr->size);
+	}
+}
+
+/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
+static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
+{
+	struct cmr_info *cmr;
+	int i, cmr_num;
+
+	/*
+	 * Intel TDX module spec, 20.7.3 CMR_INFO:
+	 *
+	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
+	 *   array of CMR_INFO entries. The CMRs are sorted from the
+	 *   lowest base address to the highest base address, and they
+	 *   are non-overlapping.
+	 *
+	 * This implies that the BIOS may generate empty entries if
+	 * there are fewer than 32 CMRs.  They must be skipped manually.
+	 *
+	 * CMRs must also be 4K aligned.  TDX doesn't trust the BIOS: it
+	 * verifies CMRs before it gets enabled, so anything that doesn't
+	 * meet the above means a kernel bug (or TDX is broken).
+	 */
+	/* Read the count first: the 'err' path prints 'cmr_num' entries */
+	cmr_num = *actual_cmr_num;
+	cmr = &cmr_array[0];
+	/* There must be at least one valid CMR */
+	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
+		goto err;
+	for (i = 1; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+		struct cmr_info *prev_cmr = NULL;
+
+		/* Skip further empty CMRs */
+		if (is_cmr_empty(cmr))
+			break;
+
+		/*
+		 * Do sanity check anyway to make sure CMRs:
+		 *  - are 4K aligned
+		 *  - don't overlap
+		 *  - are in address ascending order.
+		 */
+		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
+			goto err;
+
+		prev_cmr = &cmr_array[i - 1];
+		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
+					cmr->base))
+			goto err;
+	}
+
+	/* Update the actual number of CMRs */
+	*actual_cmr_num = i;
+
+	/* Print kernel checked CMRs */
+	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
+
+	return 0;
+err:
+	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
+	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
+	return -EINVAL;
+}
+
+static int tdx_get_sysinfo(void)
+{
+	struct tdx_module_output out;
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
+
+	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
+			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
+	if (ret)
+		return ret;
+
+	/* R9 contains the actual number of entries written to the CMR array. */
+	tdx_cmr_num = out.r9;
+
+	pr_info("TDX module: attributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u\n",
+		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
+		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
+		tdx_sysinfo.build_date, tdx_sysinfo.build_num);
+
+	/*
+	 * trim_empty_cmrs() updates the actual number of CMRs by
+	 * dropping all tail empty CMRs.
+	 */
+	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -232,6 +353,10 @@  static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	ret = tdx_get_sysinfo();
+	if (ret)
+		goto out;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 9ba11808bd45..8e273756098c 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,10 +15,71 @@ 
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
 
+struct cmr_info {
+	u64	base;
+	u64	size;
+} __packed;
+
+#define MAX_CMRS			32
+#define CMR_INFO_ARRAY_ALIGNMENT	512
+
+struct cpuid_config {
+	u32	leaf;
+	u32	sub_leaf;
+	u32	eax;
+	u32	ebx;
+	u32	ecx;
+	u32	edx;
+} __packed;
+
+#define TDSYSINFO_STRUCT_SIZE		1024
+#define TDSYSINFO_STRUCT_ALIGNMENT	1024
+
+struct tdsysinfo_struct {
+	/* TDX-SEAM Module Info */
+	u32	attributes;
+	u32	vendor_id;
+	u32	build_date;
+	u16	build_num;
+	u16	minor_version;
+	u16	major_version;
+	u8	reserved0[14];
+	/* Memory Info */
+	u16	max_tdmrs;
+	u16	max_reserved_per_tdmr;
+	u16	pamt_entry_size;
+	u8	reserved1[10];
+	/* Control Struct Info */
+	u16	tdcs_base_size;
+	u8	reserved2[2];
+	u16	tdvps_base_size;
+	u8	tdvps_xfam_dependent_size;
+	u8	reserved3[9];
+	/* TD Capabilities */
+	u64	attributes_fixed0;
+	u64	attributes_fixed1;
+	u64	xfam_fixed0;
+	u64	xfam_fixed1;
+	u8	reserved4[32];
+	u32	num_cpuid_config;
+	/*
+	 * The actual number of CPUID_CONFIG depends on above
+	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
+	 * is 1024B defined by TDX architecture.  Use a union with
+	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
+	 * equal to 1024.
+	 */
+	union {
+		struct cpuid_config	cpuid_configs[0];
+		u8			reserved5[892];
+	};
+} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!