Message ID | cd23a9583edcfa85e11612d94ecfd2d5e862c1d5.1668988357.git.kai.huang@intel.com |
---|---|
State | New |
Headers |
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory
Date: Mon, 21 Nov 2022 13:26:31 +1300
Message-Id: <cd23a9583edcfa85e11612d94ecfd2d5e862c1d5.1668988357.git.kai.huang@intel.com>
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com> |
Series |
TDX host kernel support
|
|
Commit Message
Kai Huang
Nov. 21, 2022, 12:26 a.m. UTC
TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees. The list of these
ranges, along with TDX module information, is available to the kernel by
querying the TDX module via TDH.SYS.INFO SEAMCALL.

The host kernel can choose whether or not to use all convertible memory
regions as TDX-usable memory. Before the TDX module is ready to create
any TDX guests, the kernel needs to configure the TDX-usable memory
regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
module. Constructing the TDMR array requires information of both the
TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions. Call
TDH.SYS.INFO to get this information as a preparation.

Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
having to pass them as function arguments when constructing the TDMR
array. And they are too big to be put to the stack anyway. Also, KVM
needs to use the TDSYSINFO_STRUCT to create TDX guests.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v6 -> v7:
 - Simplified the check of CMRs due to the fact that TDX actually
   verifies CMRs (that are passed by the BIOS) before enabling TDX.
 - Changed the function name from check_cmrs() -> trim_empty_cmrs().
 - Added CMR page aligned check so that later patch can just get the PFN
   using ">> PAGE_SHIFT".

v5 -> v6:
 - Added to also print TDX module's attribute (Isaku).
 - Removed all arguments in tdx_get_sysinfo() to use static variables
   of 'tdx_sysinfo' and 'tdx_cmr_array' directly as they are all used
   directly in other functions in later patches.
 - Added Isaku's Reviewed-by.

- v3 -> v5 (no feedback on v4):
 - Renamed sanitize_cmrs() to check_cmrs().
 - Removed unnecessary sanity check against tdx_sysinfo and
   tdx_cmr_array actual size returned by TDH.SYS.INFO.
 - Changed -EFAULT to -EINVAL in couple places.
 - Added comments around tdx_sysinfo and tdx_cmr_array saying they are
   used by TDH.SYS.INFO ABI.
 - Changed to pass 'tdx_sysinfo' and 'tdx_cmr_array' as function
   arguments in tdx_get_sysinfo().
 - Changed to only print BIOS-CMR when check_cmrs() fails.

---
 arch/x86/virt/vmx/tdx/tdx.c | 125 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  61 ++++++++++++++++++
 2 files changed, 186 insertions(+)
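The v7 changelog entry about page alignment can be illustrated with a small userspace sketch. It is not code from the patch: `cmr_page_aligned()` and `cmr_to_pfn_range()` are hypothetical names, `PAGE_SHIFT` is assumed to be 12 (4K pages), and `stdint.h` types stand in for the kernel's `u64`. It only shows why a page-aligned CMR lets a later step derive page frame numbers with a plain shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* assumption: 4K pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Userspace mirror of the patch's struct cmr_info (base and size in bytes). */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/* True when both base and size are multiples of the page size. */
static bool cmr_page_aligned(const struct cmr_info *cmr)
{
	return !((cmr->base | cmr->size) & (PAGE_SIZE - 1));
}

/*
 * Hypothetical helper: convert an already page-aligned CMR into the
 * half-open PFN range [*start_pfn, *end_pfn). No rounding is needed
 * because alignment was checked up front.
 */
static void cmr_to_pfn_range(const struct cmr_info *cmr,
			     uint64_t *start_pfn, uint64_t *end_pfn)
{
	*start_pfn = cmr->base >> PAGE_SHIFT;
	*end_pfn = (cmr->base + cmr->size) >> PAGE_SHIFT;
}
```

If the alignment check were skipped, the shift would silently truncate a partial page at either end of the region, which is presumably what the changelog entry is guarding against.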
Comments
On 11/20/22 16:26, Kai Huang wrote:
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees. The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module via TDH.SYS.INFO SEAMCALL.

I think the last sentence goes too far. What does it matter what the
name of the SEAMCALL is? Who cares at this point? It's in the patch.
Scroll down two pages if you really care.

> The host kernel can choose whether or not to use all convertible memory
> regions as TDX-usable memory. Before the TDX module is ready to create
> any TDX guests, the kernel needs to configure the TDX-usable memory
> regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> module. Constructing the TDMR array requires information of both the
> TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions. Call
> TDH.SYS.INFO to get this information as a preparation.

That last sentence is kinda goofy. I think there's a way to distill this
whole thing down more effectively.

	CMRs tell the kernel which memory is TDX compatible. The kernel
	takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs
	let the kernel grant TDX protections to some or all of the CMR
	areas.

> Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid

I find it very useful to be precise when referring to code. Your code
says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the
difference?

> having to pass them as function arguments when constructing the TDMR
> array. And they are too big to be put to the stack anyway. Also, KVM
> needs to use the TDSYSINFO_STRUCT to create TDX guests.

This is also a great place to mention that the tdsysinfo_struct contains
a *lot* of gunk which will not be used for a bit or that may never get
used.

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2cf7090667aa..43227af25e44 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -15,6 +15,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/align.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/apic.h>
> @@ -40,6 +41,11 @@ static enum tdx_module_status_t tdx_module_status;
>  /* Prevent concurrent attempts on TDX detection and initialization */
>  static DEFINE_MUTEX(tdx_module_lock);
>
> +/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
> +static struct tdsysinfo_struct tdx_sysinfo;
> +static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
> +static int tdx_cmr_num;
> +
>  /*
>   * Detect TDX private KeyIDs to see whether TDX has been enabled by the
>   * BIOS. Both initializing the TDX module and running TDX guest require
> @@ -208,6 +214,121 @@ static int tdx_module_init_cpus(void)
>  	return atomic_read(&sc.err);
>  }
>
> +static inline bool is_cmr_empty(struct cmr_info *cmr)
> +{
> +	return !cmr->size;
> +}
> +
> +static inline bool is_cmr_ok(struct cmr_info *cmr)
> +{
> +	/* CMR must be page aligned */
> +	return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
> +		IS_ALIGNED(cmr->size, PAGE_SIZE);
> +}
> +
> +static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
> +		       const char *name)
> +{
> +	int i;
> +
> +	for (i = 0; i < cmr_num; i++) {
> +		struct cmr_info *cmr = &cmr_array[i];
> +
> +		pr_info("%s : [0x%llx, 0x%llx)\n", name,
> +				cmr->base, cmr->base + cmr->size);
> +	}
> +}
> +
> +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> +{
> +	struct cmr_info *cmr;
> +	int i, cmr_num;
> +
> +	/*
> +	 * Intel TDX module spec, 20.7.3 CMR_INFO:
> +	 *
> +	 * TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> +	 * array of CMR_INFO entries. The CMRs are sorted from the
> +	 * lowest base address to the highest base address, and they
> +	 * are non-overlapping.
> +	 *
> +	 * This implies that BIOS may generate invalid empty entries
> +	 * if total CMRs are less than 32. Need to skip them manually.
> +	 *
> +	 * CMR also must be 4K aligned. TDX doesn't trust BIOS. TDX
> +	 * actually verifies CMRs before it gets enabled, so anything
> +	 * doesn't meet above means kernel bug (or TDX is broken).
> +	 */

I dislike comments like this that describe all the code below. Can't
you simply put the comment near the code that implements it?

> +	cmr = &cmr_array[0];
> +	/* There must be at least one valid CMR */
> +	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> +		goto err;
> +
> +	cmr_num = *actual_cmr_num;
> +	for (i = 1; i < cmr_num; i++) {
> +		struct cmr_info *cmr = &cmr_array[i];
> +		struct cmr_info *prev_cmr = NULL;
> +
> +		/* Skip further empty CMRs */
> +		if (is_cmr_empty(cmr))
> +			break;
> +
> +		/*
> +		 * Do sanity check anyway to make sure CMRs:
> +		 * - are 4K aligned
> +		 * - don't overlap
> +		 * - are in address ascending order.
> +		 */
> +		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> +			goto err;

Why does cmr_array[0] get a pass on the empty and sanity checks?

> +		prev_cmr = &cmr_array[i - 1];
> +		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> +					cmr->base))
> +			goto err;
> +	}
> +
> +	/* Update the actual number of CMRs */
> +	*actual_cmr_num = i;

That comment is not helpful. Yes, this is literally updating the number
of CMRs. Literally. That's the "what". But, the "why" is important.
Why is it doing this?

> +	/* Print kernel checked CMRs */
> +	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");

This is the point where I start to lose patience with these comments.
These are just a waste of space.

Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now,
it'll print an 'actual_cmr_num=1' number of CMRs as being
"kernel-checked". Why? That makes zero sense.

> +	return 0;
> +err:
> +	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> +	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> +	return -EINVAL;
> +}
> +
> +static int tdx_get_sysinfo(void)
> +{
> +	struct tdx_module_output out;
> +	int ret;
> +
> +	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> +
> +	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> +			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> +	if (ret)
> +		return ret;
> +
> +	/* R9 contains the actual entries written the CMR array. */
> +	tdx_cmr_num = out.r9;
> +
> +	pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> +		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date, tdx_sysinfo.build_num);

This is a case where a little bit of vertical alignment will go a long way:

> +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);

> +
> +	/*
> +	 * trim_empty_cmrs() updates the actual number of CMRs by
> +	 * dropping all tail empty CMRs.
> +	 */
> +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> +}

Why does this both need to respect the "tdx_cmr_num = out.r9" value
*and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num =
out.r9" value and just trim the empty ones either way? It's not like
there is a billion of them. It would simplify the code for sure.

>  /*
>   * Detect and initialize the TDX module.
>   *
> @@ -232,6 +353,10 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>
> +	ret = tdx_get_sysinfo();
> +	if (ret)
> +		goto out;
> +
>  	/*
>  	 * Return -EINVAL until all steps of TDX module initialization
>  	 * process are done.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 9ba11808bd45..8e273756098c 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -15,10 +15,71 @@
>  /*
>   * TDX module SEAMCALL leaf functions
>   */
> +#define TDH_SYS_INFO		32
>  #define TDH_SYS_INIT		33
>  #define TDH_SYS_LP_INIT	35
>  #define TDH_SYS_LP_SHUTDOWN	44
>
> +struct cmr_info {
> +	u64	base;
> +	u64	size;
> +} __packed;
> +
> +#define MAX_CMRS			32
> +#define CMR_INFO_ARRAY_ALIGNMENT	512
> +
> +struct cpuid_config {
> +	u32	leaf;
> +	u32	sub_leaf;
> +	u32	eax;
> +	u32	ebx;
> +	u32	ecx;
> +	u32	edx;
> +} __packed;
> +
> +#define TDSYSINFO_STRUCT_SIZE		1024
> +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> +
> +struct tdsysinfo_struct {
> +	/* TDX-SEAM Module Info */
> +	u32	attributes;
> +	u32	vendor_id;
> +	u32	build_date;
> +	u16	build_num;
> +	u16	minor_version;
> +	u16	major_version;
> +	u8	reserved0[14];
> +	/* Memory Info */
> +	u16	max_tdmrs;
> +	u16	max_reserved_per_tdmr;
> +	u16	pamt_entry_size;
> +	u8	reserved1[10];
> +	/* Control Struct Info */
> +	u16	tdcs_base_size;
> +	u8	reserved2[2];
> +	u16	tdvps_base_size;
> +	u8	tdvps_xfam_dependent_size;
> +	u8	reserved3[9];
> +	/* TD Capabilities */
> +	u64	attributes_fixed0;
> +	u64	attributes_fixed1;
> +	u64	xfam_fixed0;
> +	u64	xfam_fixed1;
> +	u8	reserved4[32];
> +	u32	num_cpuid_config;
> +	/*
> +	 * The actual number of CPUID_CONFIG depends on above
> +	 * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct'
> +	 * is 1024B defined by TDX architecture. Use a union with
> +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> +	 * equal to 1024.
> +	 */
> +	union {
> +		struct cpuid_config	cpuid_configs[0];
> +		u8			reserved5[892];
> +	};

Can you double check what the "right" way to do variable arrays is these
days? I thought the [0] method was discouraged.

Also, it isn't *really* 892 bytes of reserved space, right? Anything
that's not cpuid_configs[] is reserved, I presume. Could you try to be
more precise there?

> +} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> +
>  /*
>   * Do not put any hardware-defined TDX structure representations below
>   * this comment!
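For the question about the `[0]` array, one possible direction is a C99 flexible array member combined with the structure's alignment attribute, sketched below in userspace C. This is only an illustration under stated assumptions, not the form the series adopted: `struct tdsysinfo_sketch`, its `fixed_fields` stand-in (128 bytes, the sum of the members before `num_cpuid_config` in the patch), and `max_cpuid_configs()` are hypothetical, and the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TDSYSINFO_STRUCT_SIZE      1024
#define TDSYSINFO_STRUCT_ALIGNMENT 1024

struct cpuid_config {
	uint32_t leaf;
	uint32_t sub_leaf;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
} __attribute__((packed));

/* Hypothetical stand-in for struct tdsysinfo_struct from the patch. */
struct tdsysinfo_sketch {
	uint8_t  fixed_fields[128];     /* all members before num_cpuid_config */
	uint32_t num_cpuid_config;
	/* Flexible array member instead of a [0] array inside a union. */
	struct cpuid_config cpuid_configs[];
} __attribute__((packed, aligned(TDSYSINFO_STRUCT_ALIGNMENT)));

/*
 * sizeof() is rounded up to the 1024-byte alignment, so an object of
 * this type still reserves the full buffer without an explicit
 * 892-byte "reserved" member.
 */
static_assert(sizeof(struct tdsysinfo_sketch) == TDSYSINFO_STRUCT_SIZE,
	      "tdsysinfo buffer must be exactly 1024 bytes");

/* Upper bound on CPUID_CONFIG entries that fit in the 1024-byte buffer. */
static size_t max_cpuid_configs(void)
{
	return (TDSYSINFO_STRUCT_SIZE -
		offsetof(struct tdsysinfo_sketch, cpuid_configs)) /
	       sizeof(struct cpuid_config);
}
```

The sketch also makes the "not really 892 reserved bytes" point concrete: the space after `num_cpuid_config` is shared between however many `cpuid_configs[]` entries are reported and whatever is left over, so a bound derived from the buffer size may describe it better than a fixed reserved array.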
On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> On 11/20/22 16:26, Kai Huang wrote:
> > TDX provides increased levels of memory confidentiality and integrity.
> > This requires special hardware support for features like memory
> > encryption and storage of memory integrity checksums. Not all memory
> > satisfies these requirements.
> >
> > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > (CMR). During boot, the firmware builds a list of all of the memory
> > ranges which can provide the TDX security guarantees. The list of these
> > ranges, along with TDX module information, is available to the kernel by
> > querying the TDX module via TDH.SYS.INFO SEAMCALL.
>
> I think the last sentence goes too far. What does it matter what the
> name of the SEAMCALL is? Who cares at this point? It's in the patch.
> Scroll down two pages if you really care.

I'll remove "via TDH.SYS.INFO SEAMCALL".

>
> > The host kernel can choose whether or not to use all convertible memory
> > regions as TDX-usable memory. Before the TDX module is ready to create
> > any TDX guests, the kernel needs to configure the TDX-usable memory
> > regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> > module. Constructing the TDMR array requires information of both the
> > TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions. Call
> > TDH.SYS.INFO to get this information as a preparation.
>
> That last sentence is kinda goofy. I think there's a way to distill this
> whole thing down more effectively.
>
> 	CMRs tell the kernel which memory is TDX compatible. The kernel
> 	takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs
> 	let the kernel grant TDX protections to some or all of the CMR
> 	areas.

Will do.

But it seems we should still mention "Constructing TDMRs requires information of
both the TDX module (TDSYSINFO_STRUCT) and the CMRs"? The reason is to justify
"use static to avoid having to pass them as function arguments when constructing
TDMRs" below.

>
> > Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
>
> I find it very useful to be precise when referring to code. Your code
> says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the
> difference?

Here I actually didn't intend to refer to any code. In the above paragraph
(that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to
explain what does "information of the TDX module" actually refer to, since
TDSYSINFO_STRUCT is used in the spec.

What's your preference?

>
> > having to pass them as function arguments when constructing the TDMR
> > array. And they are too big to be put to the stack anyway. Also, KVM
> > needs to use the TDSYSINFO_STRUCT to create TDX guests.
>
> This is also a great place to mention that the tdsysinfo_struct contains
> a *lot* of gunk which will not be used for a bit or that may never get
> used.

Perhaps below?

"Note many members in 'tdsysinfo_struct' are not used by the kernel".

Btw, may I ask why does it matter?

[...]

> > +
> > +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> > +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> > +{
> > +	struct cmr_info *cmr;
> > +	int i, cmr_num;
> > +
> > +	/*
> > +	 * Intel TDX module spec, 20.7.3 CMR_INFO:
> > +	 *
> > +	 * TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> > +	 * array of CMR_INFO entries. The CMRs are sorted from the
> > +	 * lowest base address to the highest base address, and they
> > +	 * are non-overlapping.
> > +	 *
> > +	 * This implies that BIOS may generate invalid empty entries
> > +	 * if total CMRs are less than 32. Need to skip them manually.
> > +	 *
> > +	 * CMR also must be 4K aligned. TDX doesn't trust BIOS. TDX
> > +	 * actually verifies CMRs before it gets enabled, so anything
> > +	 * doesn't meet above means kernel bug (or TDX is broken).
> > +	 */
>
> I dislike comments like this that describe all the code below. Can't
> you simply put the comment near the code that implements it?

Will do.

>
> > +	cmr = &cmr_array[0];
> > +	/* There must be at least one valid CMR */
> > +	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> > +		goto err;
> > +
> > +	cmr_num = *actual_cmr_num;
> > +	for (i = 1; i < cmr_num; i++) {
> > +		struct cmr_info *cmr = &cmr_array[i];
> > +		struct cmr_info *prev_cmr = NULL;
> > +
> > +		/* Skip further empty CMRs */
> > +		if (is_cmr_empty(cmr))
> > +			break;
> > +
> > +		/*
> > +		 * Do sanity check anyway to make sure CMRs:
> > +		 * - are 4K aligned
> > +		 * - don't overlap
> > +		 * - are in address ascending order.
> > +		 */
> > +		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> > +			goto err;
>
> Why does cmr_array[0] get a pass on the empty and sanity checks?

TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one
valid CMR.

And cmr_array[0] is checked before this loop.

>
> > +		prev_cmr = &cmr_array[i - 1];
> > +		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> > +					cmr->base))
> > +			goto err;
> > +	}
> > +
> > +	/* Update the actual number of CMRs */
> > +	*actual_cmr_num = i;
>
> That comment is not helpful. Yes, this is literally updating the number
> of CMRs. Literally. That's the "what". But, the "why" is important.
> Why is it doing this?

When building the list of "TDX-usable" memory regions, the kernel verifies those
regions against CMRs to see whether they are truly convertible memory.

How about adding a comment like below:

	/*
	 * When the kernel builds the TDX-usable memory regions, it verifies
	 * they are truly convertible memory by checking them against CMRs.
	 * Update the actual number of CMRs to skip those empty CMRs.
	 */

Also, I think printing CMRs in the dmesg is helpful. Printing empty (zero) CMRs
will put meaningless log to the dmesg.

>
> > +	/* Print kernel checked CMRs */
> > +	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
>
> This is the point where I start to lose patience with these comments.
> These are just a waste of space.

Sorry will remove.

>
> Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now,
> it'll print an 'actual_cmr_num=1' number of CMRs as being
> "kernel-checked". Why? That makes zero sense.

The loop quits when it sees an empty CMR. I think there's no need to check
further CMRs as they must be empty (TDX MCHECK verifies CMRs).

>
> > +	return 0;
> > +err:
> > +	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> > +	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> > +	return -EINVAL;
> > +}
> > +
> > +static int tdx_get_sysinfo(void)
> > +{
> > +	struct tdx_module_output out;
> > +	int ret;
> > +
> > +	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> > +
> > +	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> > +			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* R9 contains the actual entries written the CMR array. */
> > +	tdx_cmr_num = out.r9;
> > +
> > +	pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> > +		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> > +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> > +		tdx_sysinfo.build_date, tdx_sysinfo.build_num);
>
> This is a case where a little bit of vertical alignment will go a long way:
>
> > +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> > +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> > +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);

Thanks will do.

>
> > +
> > +	/*
> > +	 * trim_empty_cmrs() updates the actual number of CMRs by
> > +	 * dropping all tail empty CMRs.
> > +	 */
> > +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > +}
>
> Why does this both need to respect the "tdx_cmr_num = out.r9" value
> *and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num =
> out.r9" value and just trim the empty ones either way? It's not like
> there is a billion of them. It would simplify the code for sure.

OK. Since spec says MAX_CMRs is 32, so I can use 32 instead of reading out from
R9.

[...]

> > +struct cpuid_config {
> > +	u32	leaf;
> > +	u32	sub_leaf;
> > +	u32	eax;
> > +	u32	ebx;
> > +	u32	ecx;
> > +	u32	edx;
> > +} __packed;
> > +
> > +#define TDSYSINFO_STRUCT_SIZE		1024
> > +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> > +
> > +struct tdsysinfo_struct {
> > +	/* TDX-SEAM Module Info */
> > +	u32	attributes;
> > +	u32	vendor_id;
> > +	u32	build_date;
> > +	u16	build_num;
> > +	u16	minor_version;
> > +	u16	major_version;
> > +	u8	reserved0[14];
> > +	/* Memory Info */
> > +	u16	max_tdmrs;
> > +	u16	max_reserved_per_tdmr;
> > +	u16	pamt_entry_size;
> > +	u8	reserved1[10];
> > +	/* Control Struct Info */
> > +	u16	tdcs_base_size;
> > +	u8	reserved2[2];
> > +	u16	tdvps_base_size;
> > +	u8	tdvps_xfam_dependent_size;
> > +	u8	reserved3[9];
> > +	/* TD Capabilities */
> > +	u64	attributes_fixed0;
> > +	u64	attributes_fixed1;
> > +	u64	xfam_fixed0;
> > +	u64	xfam_fixed1;
> > +	u8	reserved4[32];
> > +	u32	num_cpuid_config;
> > +	/*
> > +	 * The actual number of CPUID_CONFIG depends on above
> > +	 * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct'
> > +	 * is 1024B defined by TDX architecture. Use a union with
> > +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> > +	 * equal to 1024.
> > +	 */
> > +	union {
> > +		struct cpuid_config	cpuid_configs[0];
> > +		u8			reserved5[892];
> > +	};
>
> Can you double check what the "right" way to do variable arrays is these
> days? I thought the [0] method was discouraged.
>
> Also, it isn't *really* 892 bytes of reserved space, right? Anything
> that's not cpuid_configs[] is reserved, I presume. Could you try to be
> more precise there?

I'll do some study first here and get back to you. Thanks.

The intention is to make sure the structure size is 1024B, so that the static
variable will have enough space for the TDX module to write.
On 11/23/22 03:40, Huang, Kai wrote: > On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote: >> That last sentece is kinda goofy. I think there's a way to distill this >> whole thing down more effecively. >> >> CMRs tell the kernel which memory is TDX compatible. The kernel >> takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs >> let the kernel grant TDX protections to some or all of the CMR >> areas. > > Will do. > > But it seems we should still mention "Constructing TDMRs requires information of > both the TDX module (TDSYSINFO_STRUCT) and the CMRs"? The reason is to justify > "use static to avoid having to pass them as function arguments when constructing > TDMRs" below. In a changelog, no. You do *NOT* use super technical language in changelogs if not super necessary. Mentioning "TDSYSINFO_STRUCT" here is useless. The *MOST* you would do for a good changelog is: The kernel takes CMRs (plus a little more metadata) and constructs "TD Memory Regions" (TDMRs). You just need to talk about things at a high level in mostly non-technical language so that folks know the structure of the code below. It's not a replacement for the code, the comments, *OR* the TDX module specification. I'm also not quite sure that this justifies the static variables anyway. They could be dynamically allocated and passed around, for instance. >>> Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid >> >> I find it very useful to be precise when referring to code. Your code >> says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the >> difference? > > Here I actually didn't intend to refer to any code. In the above paragraph > (that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to > explain what does "information of the TDX module" actually refer to, since > TDSYSINFO_STRUCT is used in the spec. > > What's your preference? Kill all mentions to TDSYSINFO_STRUCT whatsoever in the changelog. Write comprehensible English. 
>>> having to pass them as function arguments when constructing the TDMR >>> array. And they are too big to be put to the stack anyway. Also, KVM >>> needs to use the TDSYSINFO_STRUCT to create TDX guests. >> >> This is also a great place to mention that the tdsysinfo_struct contains >> a *lot* of gunk which will not be used for a bit or that may never get >> used. > > Perhaps below? > > "Note many members in tdsysinfo_struct' are not used by the kernel". > > Btw, may I ask why does it matter? Because you're adding a massive structure with all kinds of fields. Those fields mostly aren't used. That could be from an error in this series, or because they will be used later or because they will *never* be used. >>> + cmr = &cmr_array[0]; >>> + /* There must be at least one valid CMR */ >>> + if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr))) >>> + goto err; >>> + >>> + cmr_num = *actual_cmr_num; >>> + for (i = 1; i < cmr_num; i++) { >>> + struct cmr_info *cmr = &cmr_array[i]; >>> + struct cmr_info *prev_cmr = NULL; >>> + >>> + /* Skip further empty CMRs */ >>> + if (is_cmr_empty(cmr)) >>> + break; >>> + >>> + /* >>> + * Do sanity check anyway to make sure CMRs: >>> + * - are 4K aligned >>> + * - don't overlap >>> + * - are in address ascending order. >>> + */ >>> + if (WARN_ON_ONCE(!is_cmr_ok(cmr))) >>> + goto err; >> >> Why does cmr_array[0] get a pass on the empty and sanity checks? > > TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one > valid CMR. > > And cmr_array[0] is checked before this loop. I think you're confusing two separate things. MCHECK ensures that there is convertible memory. The CMRs that this code looks at are software (TD module) defined and created structures that the OS and the module share. This cmr_array[] structure is not created by MCHECK. Go look at your code. Consider what will happen if cmr_array[0] is empty or !is_cmr_ok(). Then consider what will happen if cmr_array[1] has the same happen. 
Does that end result really justify having separate code for cmr_array[0] and cmr_array[>0]? >>> + prev_cmr = &cmr_array[i - 1]; >>> + if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) > >>> + cmr->base)) >>> + goto err; >>> + } >>> + >>> + /* Update the actual number of CMRs */ >>> + *actual_cmr_num = i; >> >> That comment is not helpful. Yes, this is literally updating the number >> of CMRs. Literally. That's the "what". But, the "why" is important. >> Why is it doing this? > > When building the list of "TDX-usable" memory regions, the kernel verifies those > regions against CMRs to see whether they are truly convertible memory. > > How about adding a comment like below: > > /* > * When the kernel builds the TDX-usable memory regions, it verifies > * they are truly convertible memory by checking them against CMRs. > * Update the actual number of CMRs to skip those empty CMRs. > */ > > Also, I think printing CMRs in the dmesg is helpful. Printing empty (zero) CMRs > will put meaningless log to the dmesg. So it's just about printing them? Then put a dang switch to the print function that says "print them all" or not. ... >> Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now, >> it'll print an 'actual_cmr_num=1' number of CMRs as being >> "kernel-checked". Why? That makes zero sense. > > The loop quits when it sees an empty CMR. I think there's no need to check > further CMRs as they must be empty (TDX MCHECK verifies CMRs). OK, so you're going to get some more homework here. Please explain to me how MCHECK and the CMR array that comes out of the TDX module are related. How does the output from MCHECK get turned into the in-memory cmr_array[], step by step? At this point, I fear that you're offering up MCHECK like it's a bag of magic beans rather than really truly thinking about the cmr_array[] data structure. How it is generated? How might it be broken? Who might break it? If so, what the kernel should do about it? 
>>> + >>> + /* >>> + * trim_empty_cmrs() updates the actual number of CMRs by >>> + * dropping all tail empty CMRs. >>> + */ >>> + return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num); >>> +} >> >> Why does this both need to respect the "tdx_cmr_num = out.r9" value >> *and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num = >> out.r9" value and just trim the empty ones either way? It's not like >> there is a billion of them. It would simplify the code for sure. > > OK. Since spec says MAX_CMRs is 32, so I can use 32 instead of reading out from > R9. But then you still have the "trimming" code. Why not just trust "r9" and then axe all the trimming code? Heck, and most of the sanity checks. This code could be a *lot* smaller.
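The question above about cmr_array[0] versus cmr_array[>0] can be illustrated with a single loop that sends every entry down the same path. This is a minimal user-space sketch, not the code from the patch: count_valid_cmrs() is a hypothetical name, PAGE_SIZE is assumed to be 4K, and the kernel's WARN_ON_ONCE() and printing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL	/* assumption: 4K pages, as in the patch comment */

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

static bool is_cmr_empty(const struct cmr_info *cmr)
{
	return cmr->size == 0;
}

static bool is_cmr_ok(const struct cmr_info *cmr)
{
	/* Both base and size must be page aligned */
	return (cmr->base % PAGE_SIZE) == 0 && (cmr->size % PAGE_SIZE) == 0;
}

/*
 * Hypothetical helper: returns the number of leading valid CMRs, or -1
 * if the array is malformed. Entry 0 is handled like any other entry.
 */
static int count_valid_cmrs(const struct cmr_info *cmr_array, int cmr_num)
{
	int i;

	for (i = 0; i < cmr_num; i++) {
		const struct cmr_info *cmr = &cmr_array[i];

		/* The first empty entry ends the list */
		if (is_cmr_empty(cmr))
			break;

		if (!is_cmr_ok(cmr))
			return -1;

		/* Ascending and non-overlapping relative to the previous entry */
		if (i > 0 &&
		    cmr_array[i - 1].base + cmr_array[i - 1].size > cmr->base)
			return -1;
	}

	/* No valid entry at all is treated as malformed */
	return i > 0 ? i : -1;
}
```

With this shape, an empty or misaligned first entry produces the same error result as a misaligned later entry, so no separate check before the loop is needed.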
On Wed, 2022-11-23 at 08:44 -0800, Dave Hansen wrote: > > On 11/23/22 03:40, Huang, Kai wrote: > > > > On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote: > > > > > > That last sentece is kinda goofy. I think there's a way to distill this > > > > > > whole thing down more effecively. > > > > > > > > > > > > CMRs tell the kernel which memory is TDX compatible. The kernel > > > > > > takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs > > > > > > let the kernel grant TDX protections to some or all of the CMR > > > > > > areas. > > > > > > > > Will do. > > > > > > > > But it seems we should still mention "Constructing TDMRs requires information of > > > > both the TDX module (TDSYSINFO_STRUCT) and the CMRs"? The reason is to justify > > > > "use static to avoid having to pass them as function arguments when constructing > > > > TDMRs" below. > > > > In a changelog, no. You do *NOT* use super technical language in > > changelogs if not super necessary. Mentioning "TDSYSINFO_STRUCT" here > > is useless. The *MOST* you would do for a good changelog is: > > > > The kernel takes CMRs (plus a little more metadata) and > > constructs "TD Memory Regions" (TDMRs). > > > > You just need to talk about things at a high level in mostly > > non-technical language so that folks know the structure of the code > > below. It's not a replacement for the code, the comments, *OR* the TDX > > module specification. > > > > I'm also not quite sure that this justifies the static variables anyway. > > They could be dynamically allocated and passed around, for instance. I see. Thanks for explaining. > > > > > > > > > > Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid > > > > > > > > > > > > I find it very useful to be precise when referring to code. Your code > > > > > > says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the > > > > > > difference? > > > > > > > > Here I actually didn't intend to refer to any code. 
In the above paragraph > > > > (that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to > > > > explain what does "information of the TDX module" actually refer to, since > > > > TDSYSINFO_STRUCT is used in the spec. > > > > > > > > What's your preference? > > > > Kill all mentions to TDSYSINFO_STRUCT whatsoever in the changelog. > > Write comprehensible English. OK. > > > > > > > > > > having to pass them as function arguments when constructing the TDMR > > > > > > > > array. And they are too big to be put to the stack anyway. Also, KVM > > > > > > > > needs to use the TDSYSINFO_STRUCT to create TDX guests. > > > > > > > > > > > > This is also a great place to mention that the tdsysinfo_struct contains > > > > > > a *lot* of gunk which will not be used for a bit or that may never get > > > > > > used. > > > > > > > > Perhaps below? > > > > > > > > "Note many members in tdsysinfo_struct' are not used by the kernel". > > > > > > > > Btw, may I ask why does it matter? > > > > Because you're adding a massive structure with all kinds of fields. > > Those fields mostly aren't used. That could be from an error in this > > series, or because they will be used later or because they will *never* > > be used. OK. 
> > > > > > > > > > + cmr = &cmr_array[0]; > > > > > > > > + /* There must be at least one valid CMR */ > > > > > > > > + if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr))) > > > > > > > > + goto err; > > > > > > > > + > > > > > > > > + cmr_num = *actual_cmr_num; > > > > > > > > + for (i = 1; i < cmr_num; i++) { > > > > > > > > + struct cmr_info *cmr = &cmr_array[i]; > > > > > > > > + struct cmr_info *prev_cmr = NULL; > > > > > > > > + > > > > > > > > + /* Skip further empty CMRs */ > > > > > > > > + if (is_cmr_empty(cmr)) > > > > > > > > + break; > > > > > > > > + > > > > > > > > + /* > > > > > > > > + * Do sanity check anyway to make sure CMRs: > > > > > > > > + * - are 4K aligned > > > > > > > > + * - don't overlap > > > > > > > > + * - are in address ascending order. > > > > > > > > + */ > > > > > > > > + if (WARN_ON_ONCE(!is_cmr_ok(cmr))) > > > > > > > > + goto err; > > > > > > > > > > > > Why does cmr_array[0] get a pass on the empty and sanity checks? > > > > > > > > TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one > > > > valid CMR. > > > > > > > > And cmr_array[0] is checked before this loop. > > > > I think you're confusing two separate things. MCHECK ensures that there > > is convertible memory. The CMRs that this code looks at are software > > (TD module) defined and created structures that the OS and the module share. Not sure whether I completely got your words, but the CMRs are generated by the BIOS, verified and stored by the MCHECK. Thus the CMR structure is also meaningful to the BIOS and the MCHECK, but not TDX module defined and created. There are couple of places in the TDX module spec which says this. One example is "Table 3.1: Typical Intel TDX Module Platform-Scope Initialization Sequence" and "13.1.1. Initialization and Configuration Flow". They both mention: "BIOS configures Convertible Memory Regions (CMRs); MCHECK checks them and securely stores the information." 
Also, "20.8.3 CMR_INFO": "CMR_INFO is designed to provide information about a Convertible Memory Range (CMR), as configured by BIOS and checked and stored securely by MCHECK." > > > > This cmr_array[] structure is not created by MCHECK. Right. But TDH.SYS.INFO only "Retrieve Intel TDX module information and convertible memory (CMR) information." by writing CMRs to the buffer provided by the kernel (cmr_array[]). So my understanding is the entries in the cmr_array[] are just the same CMRs that are verified by the MCHECK. > > > > Go look at your code. Consider what will happen if cmr_array[0] is > > empty or !is_cmr_ok(). Then consider what will happen if cmr_array[1] > > has the same happen. > > > > Does that end result really justify having separate code for > > cmr_array[0] and cmr_array[>0]? One slight difference is cmr_array[0] must be valid, but cmr_array[>1] can be empty. And for cmr_array[>0] we also have additional check against the previous one. > > > > > > > > > > + prev_cmr = &cmr_array[i - 1]; > > > > > > > > + if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) > > > > > > > > > + cmr->base)) > > > > > > > > + goto err; > > > > > > > > + } > > > > > > > > + > > > > > > > > + /* Update the actual number of CMRs */ > > > > > > > > + *actual_cmr_num = i; > > > > > > > > > > > > That comment is not helpful. Yes, this is literally updating the number > > > > > > of CMRs. Literally. That's the "what". But, the "why" is important. > > > > > > Why is it doing this? > > > > > > > > When building the list of "TDX-usable" memory regions, the kernel verifies those > > > > regions against CMRs to see whether they are truly convertible memory. > > > > > > > > How about adding a comment like below: > > > > > > > > /* > > > > * When the kernel builds the TDX-usable memory regions, it verifies > > > > * they are truly convertible memory by checking them against CMRs. > > > > * Update the actual number of CMRs to skip those empty CMRs. 
> > > > */ > > > > > > > > Also, I think printing CMRs in the dmesg is helpful. Printing empty (zero) CMRs > > > > will put meaningless log to the dmesg. > > > > So it's just about printing them? > > > > Then put a dang switch to the print function that says "print them all" > > or not. Yes can do. Currently "print them all" is only done when the CMR sanity check fails. We can unconditionally "print valid CMRs" if we don't need that check. > > > > ... > > > > > > Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now, > > > > > > it'll print an 'actual_cmr_num=1' number of CMRs as being > > > > > > "kernel-checked". Why? That makes zero sense. > > > > > > > > The loop quits when it sees an empty CMR. I think there's no need to check > > > > further CMRs as they must be empty (TDX MCHECK verifies CMRs). > > > > OK, so you're going to get some more homework here. Please explain to > > me how MCHECK and the CMR array that comes out of the TDX module are > > related. How does the output from MCHECK get turned into the in-memory > > cmr_array[], step by step? > > (Please also see my above reply) 1. BIOS generates the CMRs and pass to the MCHECK 2. MCHECK verifies CMRs and stores the "CMR table in a pre-defined location in SEAMRR’s SEAMCFG region so it can be read later and trusted by the Intel TDX module" (13.1.4.1 Intel TDX ISA Background: Convertible Memory Ranges (CMRs)). 3. TDH.SYS.INFO copies the CMRs to the buffer provided by the kernel (cmr_array[]). > > At this point, I fear that you're offering up MCHECK like it's a bag of > > magic beans rather than really truly thinking about the cmr_array[] data > > structure. How it is generated? How might it be broken? Who might > > break it? If so, what the kernel should do about it? Only kernel bug can break the cmr_array[] I think. 
As described in "13.1.4.1 Intel TDX ISA Background: Convertible Memory Ranges (CMRs)", MCHECK should have guaranteed that: - there must be one CMR - CMR is page aligned - CMRs don't overlap and in address ascending order The only legal thing is there might be empty CMRs at the tail of the cmr_array[] following one or more valid CMRs. > > > > > > > > > > > > + > > > > > > > > + /* > > > > > > > > + * trim_empty_cmrs() updates the actual number of CMRs by > > > > > > > > + * dropping all tail empty CMRs. > > > > > > > > + */ > > > > > > > > + return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num); > > > > > > > > +} > > > > > > > > > > > > Why does this both need to respect the "tdx_cmr_num = out.r9" value > > > > > > *and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num = > > > > > > out.r9" value and just trim the empty ones either way? It's not like > > > > > > there is a billion of them. It would simplify the code for sure. > > > > > > > > OK. Since spec says MAX_CMRs is 32, so I can use 32 instead of reading out from > > > > R9. > > > > But then you still have the "trimming" code. Why not just trust "r9" > > and then axe all the trimming code? Heck, and most of the sanity checks. > > > > This code could be a *lot* smaller. As I said the only problem is there might be empty CMRs at the tail of the cmr_array[] following one or more valid CMRs. But we can also do nothing here, but just skip empty CMRs when comparing the memory region to it (in next patch). Or, we don't even need to explicitly check memory region against CMRs. If the memory regions that we provided in the TDMR doesn't fall into CMR, then TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
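The option of leaving the array untrimmed and skipping empty CMRs at comparison time could look roughly like the sketch below. It is a user-space illustration only: region_in_cmr() is a hypothetical name, not a function from this series, and a region is assumed to be a half-open range [start, end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical helper: true if [start, end) lies entirely inside one
 * non-empty CMR. Empty (zero-size) entries are skipped, not trimmed.
 */
static bool region_in_cmr(const struct cmr_info *cmr_array, int cmr_num,
			  uint64_t start, uint64_t end)
{
	int i;

	for (i = 0; i < cmr_num; i++) {
		const struct cmr_info *cmr = &cmr_array[i];

		/* Skip empty tail entries */
		if (cmr->size == 0)
			continue;

		if (start >= cmr->base && end <= cmr->base + cmr->size)
			return true;
	}

	return false;
}
```

The explicit skip matters: without it, an all-zero entry would "contain" a degenerate region starting and ending at address 0.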
On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> > +#define TDSYSINFO_STRUCT_SIZE		1024
> > +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> > +
> > +struct tdsysinfo_struct {
> > +	/* TDX-SEAM Module Info */
> > +	u32	attributes;
> > +	u32	vendor_id;
> > +	u32	build_date;
> > +	u16	build_num;
> > +	u16	minor_version;
> > +	u16	major_version;
> > +	u8	reserved0[14];
> > +	/* Memory Info */
> > +	u16	max_tdmrs;
> > +	u16	max_reserved_per_tdmr;
> > +	u16	pamt_entry_size;
> > +	u8	reserved1[10];
> > +	/* Control Struct Info */
> > +	u16	tdcs_base_size;
> > +	u8	reserved2[2];
> > +	u16	tdvps_base_size;
> > +	u8	tdvps_xfam_dependent_size;
> > +	u8	reserved3[9];
> > +	/* TD Capabilities */
> > +	u64	attributes_fixed0;
> > +	u64	attributes_fixed1;
> > +	u64	xfam_fixed0;
> > +	u64	xfam_fixed1;
> > +	u8	reserved4[32];
> > +	u32	num_cpuid_config;
> > +	/*
> > +	 * The actual number of CPUID_CONFIG depends on above
> > +	 * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct'
> > +	 * is 1024B defined by TDX architecture. Use a union with
> > +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> > +	 * equal to 1024.
> > +	 */
> > +	union {
> > +		struct cpuid_config	cpuid_configs[0];
> > +		u8			reserved5[892];
> > +	};
>
> Can you double check what the "right" way to do variable arrays is these
> days? I thought the [0] method was discouraged.
>
> Also, it isn't *really* 892 bytes of reserved space, right? Anything
> that's not cpuid_configs[] is reserved, I presume. Could you try to be
> more precise there?

Hi Dave,

I did some search, and I think we should use DECLARE_FLEX_ARRAY() macro?

And also to address your concern that not all 892 bytes are reserved, how about
below:

 	union {
-		struct cpuid_config	cpuid_configs[0];
-		u8			reserved5[892];
+		DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
+		u8			padding[892];
 	};
 } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);

The goal is to make the size of 'struct tdsysinfo_struct' to be 1024B so we can
use a static variable for it, and in the meantime, it can still have 1024B
(enough space) for the TDH.SYS.INFO to write to.
On Wed, 2022-11-23 at 22:53 +0000, Huang, Kai wrote:
> > > > > > > > > +
> > > > > > > > > +	/*
> > > > > > > > > +	 * trim_empty_cmrs() updates the actual number of CMRs by
> > > > > > > > > +	 * dropping all tail empty CMRs.
> > > > > > > > > +	 */
> > > > > > > > > +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > > > > > > > > +}
> > > > > > >
> > > > > > > Why does this both need to respect the "tdx_cmr_num = out.r9" value
> > > > > > > *and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num =
> > > > > > > out.r9" value and just trim the empty ones either way? It's not like
> > > > > > > there is a billion of them. It would simplify the code for sure.
> > > > >
> > > > > OK. Since spec says MAX_CMRs is 32, so I can use 32 instead of reading out from
> > > > > R9.
> > >
> > > But then you still have the "trimming" code. Why not just trust "r9"
> > > and then axe all the trimming code? Heck, and most of the sanity checks.
> > >
> > > This code could be a *lot* smaller.
>
> As I said the only problem is there might be empty CMRs at the tail of the
> cmr_array[] following one or more valid CMRs.

Hi Dave,

Probably I forgot to mention the "r9" in practice always returns 32, so there
will be empty CMRs at the tail of the cmr_array[].

>
> But we can also do nothing here, but just skip empty CMRs when comparing the
> memory region to it (in next patch).
>
> Or, we don't even need to explicitly check memory region against CMRs. If the
> memory regions that we provided in the TDMR doesn't fall into CMR, then
> TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.

Sorry to ping, but do you have any comments here?

How about we just don't do any check of TDX memory regions against CMRs, but
just let the TDH.SYS.CONFIG SEAMCALL to determine?
On 12/2/22 03:11, Huang, Kai wrote:
> And also to address your concern that not all 892 bytes are reserved, how about
> below:
>
> 	union {
> -		struct cpuid_config	cpuid_configs[0];
> -		u8			reserved5[892];
> +		DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
> +		u8			padding[892];
> 	};
> } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
>
> The goal is to make the size of 'struct tdsysinfo_struct' to be 1024B so we can
> use a static variable for it, and in the meantime, it can still have 1024B
> (enough space) for the TDH.SYS.INFO to write to.

I just don't like the open-coded sizes.

For instance, wouldn't it be great if you didn't have to know the size
of *ANYTHING* else to properly size the '892'?

Maybe we just need some helpers to hide the gunk:

#define DECLARE_PADDED_STRUCT(type, name, alignment)	\
	struct type##_padded {				\
		union {					\
			struct type name;		\
			u8 padding[alignment];		\
		}					\
	} name##_padded;

#define PADDED_STRUCT(name)	(name##_padded.name)

That can get used like this:

DECLARE_PADDED_STRUCT(struct tdsysinfo_struct, tdsysinfo,
		      TDSYSINFO_STRUCT_ALIGNMENT);


struct tdsysinfo_struct sysinfo = PADDED_STRUCT(tdsysinfo)
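The helper idea above is written as a quick sketch and does not compile as posted. A compilable user-space variant might look like the following. The adjustments are assumptions of this example and not something the thread settles on: the struct tag is passed without the 'struct' keyword so that type##_padded pastes into a valid identifier, the anonymous union gets its terminating semicolon, size and alignment are separate parameters, and the kernel's __aligned() is spelled as the GCC/Clang attribute. 'struct demo_sysinfo' and get_sysinfo() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

#define DEMO_STRUCT_SIZE	1024
#define DEMO_STRUCT_ALIGNMENT	1024

/* Hypothetical stand-in for tdsysinfo_struct: only a few leading fields */
struct demo_sysinfo {
	uint32_t attributes;
	uint32_t vendor_id;
	uint32_t build_date;
};

/*
 * Wrap 'struct type' in a union with a byte array of the required size,
 * so nobody has to hand-compute the remaining padding.
 */
#define DECLARE_PADDED_STRUCT(type, name, size, alignment)	\
	struct type##_padded {					\
		union {						\
			struct type name;			\
			u8 padding[size];			\
		};						\
	} name##_padded __attribute__((aligned(alignment)))

#define PADDED_STRUCT(name)	(name##_padded.name)

static DECLARE_PADDED_STRUCT(demo_sysinfo, sysinfo,
			     DEMO_STRUCT_SIZE, DEMO_STRUCT_ALIGNMENT);

/* The padding member, not a magic constant like 892, fixes the total size */
_Static_assert(sizeof(sysinfo_padded) == DEMO_STRUCT_SIZE,
	       "padded struct must be exactly DEMO_STRUCT_SIZE bytes");

/* Hypothetical accessor returning the inner structure */
static struct demo_sysinfo *get_sysinfo(void)
{
	return &PADDED_STRUCT(sysinfo);
}
```

The inner structure then only describes the fields the code actually uses, and the outer wrapper guarantees that a caller writing up to 1024 bytes into the buffer stays in bounds.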
On 12/2/22 03:19, Huang, Kai wrote:
> Probably I forgot to mention the "r9" in practice always returns 32, so there
> will be empty CMRs at the tail of the cmr_array[].

Right, so the r9 value is basically useless. I bet the code gets
simpler if you just ignore it.

>> But we can also do nothing here, but just skip empty CMRs when comparing the
>> memory region to it (in next patch).
>>
>> Or, we don't even need to explicitly check memory region against CMRs. If the
>> memory regions that we provided in the TDMR doesn't fall into CMR, then
>> TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
>
> Sorry to ping, but do you have any comments here?
>
> How about we just don't do any check of TDX memory regions against CMRs, but
> just let the TDH.SYS.CONFIG SEAMCALL to determine?

Right, if we screw it up TDH.SYS.CONFIG SEAMCALL will fail. We don't
need to add more code to detect that failure ourselves. TDX is screwed
either way.
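Ignoring the count returned in r9 and simply walking all MAX_CMRS entries could be sketched as below. This is a user-space illustration with printf() standing in for pr_info(). The print_all flag is a hypothetical parameter modelled on the "print them all" switch suggested earlier in the thread, and the return value exists only to make the behaviour easy to check.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CMRS 32

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical variant of print_cmrs(): always walks the full array and
 * skips empty entries unless print_all is set. Returns the number of
 * entries printed.
 */
static int print_cmrs(const struct cmr_info *cmr_array, bool print_all)
{
	int i, printed = 0;

	for (i = 0; i < MAX_CMRS; i++) {
		const struct cmr_info *cmr = &cmr_array[i];

		if (!print_all && cmr->size == 0)
			continue;

		printf("CMR: [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
		       cmr->base, cmr->base + cmr->size);
		printed++;
	}

	return printed;
}
```

No trimming pass and no separate count variable are needed in this form; the cost is iterating over at most 32 entries each time.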
On Fri, 2022-12-02 at 09:06 -0800, Dave Hansen wrote:
> On 12/2/22 03:11, Huang, Kai wrote:
> > And also to address your concern that not all 892 bytes are reserved, how about
> > below:
> >
> > 	union {
> > -		struct cpuid_config	cpuid_configs[0];
> > -		u8			reserved5[892];
> > +		DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
> > +		u8			padding[892];
> > 	};
> > } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> >
> > The goal is to make the size of 'struct tdsysinfo_struct' to be 1024B so we can
> > use a static variable for it, and in the meantime, it can still have 1024B
> > (enough space) for the TDH.SYS.INFO to write to.
>
> I just don't like the open-coded sizes.
>
> For instance, wouldn't it be great if you didn't have to know the size
> of *ANYTHING* else to properly size the '892'?
>
> Maybe we just need some helpers to hide the gunk:
>
> #define DECLARE_PADDED_STRUCT(type, name, alignment)	\
> 	struct type##_padded {				\
> 		union {					\
> 			struct type name;		\
> 			u8 padding[alignment];		\
> 		}					\
> 	} name##_padded;
>
> #define PADDED_STRUCT(name)	(name##_padded.name)
>
> That can get used like this:
>
> DECLARE_PADDED_STRUCT(struct tdsysinfo_struct, tdsysinfo,
> 		      TDSYSINFO_STRUCT_ALIGNMENT);
>
>
> struct tdsysinfo_struct sysinfo = PADDED_STRUCT(tdsysinfo)

Thanks. Will try out this way.
On Fri, 2022-12-02 at 09:25 -0800, Dave Hansen wrote:
> On 12/2/22 03:19, Huang, Kai wrote:
> > Probably I forgot to mention the "r9" in practice always returns 32, so there
> > will be empty CMRs at the tail of the cmr_array[].
>
> Right, so the r9 value is basically useless. I bet the code gets
> simpler if you just ignore it.
>
> > > But we can also do nothing here, but just skip empty CMRs when comparing the
> > > memory region to it (in next patch).
> > >
> > > Or, we don't even need to explicitly check memory region against CMRs. If the
> > > memory regions that we provided in the TDMR doesn't fall into CMR, then
> > > TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
> >
> > Sorry to ping, but do you have any comments here?
> >
> > How about we just don't do any check of TDX memory regions against CMRs, but
> > just let the TDH.SYS.CONFIG SEAMCALL to determine?
>
> Right, if we screw it up TDH.SYS.CONFIG SEAMCALL will fail. We don't
> need to add more code to detect that failure ourselves. TDX is screwed
> either way.

Will do. Thanks.
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 2cf7090667aa..43227af25e44 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
+#include <linux/align.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -40,6 +41,11 @@ static enum tdx_module_status_t tdx_module_status;
 /* Prevent concurrent attempts on TDX detection and initialization */
 static DEFINE_MUTEX(tdx_module_lock);
 
+/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
+static struct tdsysinfo_struct tdx_sysinfo;
+static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
+static int tdx_cmr_num;
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS. Both initializing the TDX module and running TDX guest require
@@ -208,6 +214,121 @@ static int tdx_module_init_cpus(void)
 	return atomic_read(&sc.err);
 }
 
+static inline bool is_cmr_empty(struct cmr_info *cmr)
+{
+	return !cmr->size;
+}
+
+static inline bool is_cmr_ok(struct cmr_info *cmr)
+{
+	/* CMR must be page aligned */
+	return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
+		IS_ALIGNED(cmr->size, PAGE_SIZE);
+}
+
+static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
+		       const char *name)
+{
+	int i;
+
+	for (i = 0; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+
+		pr_info("%s : [0x%llx, 0x%llx)\n", name,
+				cmr->base, cmr->base + cmr->size);
+	}
+}
+
+/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
+static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
+{
+	struct cmr_info *cmr;
+	int i, cmr_num;
+
+	/*
+	 * Intel TDX module spec, 20.7.3 CMR_INFO:
+	 *
+	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
+	 *   array of CMR_INFO entries. The CMRs are sorted from the
+	 *   lowest base address to the highest base address, and they
+	 *   are non-overlapping.
+	 *
+	 * This implies that BIOS may generate invalid empty entries
+	 * if total CMRs are less than 32. Need to skip them manually.
+	 *
+	 * CMR also must be 4K aligned. TDX doesn't trust BIOS. TDX
+	 * actually verifies CMRs before it gets enabled, so anything
+	 * doesn't meet above means kernel bug (or TDX is broken).
+	 */
+	cmr = &cmr_array[0];
+	/* There must be at least one valid CMR */
+	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
+		goto err;
+
+	cmr_num = *actual_cmr_num;
+	for (i = 1; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+		struct cmr_info *prev_cmr = NULL;
+
+		/* Skip further empty CMRs */
+		if (is_cmr_empty(cmr))
+			break;
+
+		/*
+		 * Do sanity check anyway to make sure CMRs:
+		 *  - are 4K aligned
+		 *  - don't overlap
+		 *  - are in address ascending order.
+		 */
+		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
+			goto err;
+
+		prev_cmr = &cmr_array[i - 1];
+		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
+					cmr->base))
+			goto err;
+	}
+
+	/* Update the actual number of CMRs */
+	*actual_cmr_num = i;
+
+	/* Print kernel checked CMRs */
+	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
+
+	return 0;
+err:
+	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
+	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
+	return -EINVAL;
+}
+
+static int tdx_get_sysinfo(void)
+{
+	struct tdx_module_output out;
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
+
+	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
+			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
+	if (ret)
+		return ret;
+
+	/* R9 contains the actual entries written to the CMR array. */
+	tdx_cmr_num = out.r9;
+
+	pr_info("TDX module: attributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
+		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
+		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
+		tdx_sysinfo.build_date, tdx_sysinfo.build_num);
+
+	/*
+	 * trim_empty_cmrs() updates the actual number of CMRs by
+	 * dropping all tail empty CMRs.
+	 */
+	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -232,6 +353,10 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	ret = tdx_get_sysinfo();
+	if (ret)
+		goto out;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 9ba11808bd45..8e273756098c 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,10 +15,71 @@
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
 
+struct cmr_info {
+	u64	base;
+	u64	size;
+} __packed;
+
+#define MAX_CMRS			32
+#define CMR_INFO_ARRAY_ALIGNMENT	512
+
+struct cpuid_config {
+	u32	leaf;
+	u32	sub_leaf;
+	u32	eax;
+	u32	ebx;
+	u32	ecx;
+	u32	edx;
+} __packed;
+
+#define TDSYSINFO_STRUCT_SIZE		1024
+#define TDSYSINFO_STRUCT_ALIGNMENT	1024
+
+struct tdsysinfo_struct {
+	/* TDX-SEAM Module Info */
+	u32	attributes;
+	u32	vendor_id;
+	u32	build_date;
+	u16	build_num;
+	u16	minor_version;
+	u16	major_version;
+	u8	reserved0[14];
+	/* Memory Info */
+	u16	max_tdmrs;
+	u16	max_reserved_per_tdmr;
+	u16	pamt_entry_size;
+	u8	reserved1[10];
+	/* Control Struct Info */
+	u16	tdcs_base_size;
+	u8	reserved2[2];
+	u16	tdvps_base_size;
+	u8	tdvps_xfam_dependent_size;
+	u8	reserved3[9];
+	/* TD Capabilities */
+	u64	attributes_fixed0;
+	u64	attributes_fixed1;
+	u64	xfam_fixed0;
+	u64	xfam_fixed1;
+	u8	reserved4[32];
+	u32	num_cpuid_config;
+	/*
+	 * The actual number of CPUID_CONFIG depends on above
+	 * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct'
+	 * is 1024B defined by TDX architecture. Use a union with
+	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
+	 * equal to 1024.
+	 */
+	union {
+		struct cpuid_config	cpuid_configs[0];
+		u8			reserved5[892];
+	};
+} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!