[v7,11/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions
Message ID | 32c1968fe34c8cf3cb834e3a9966cd2a201efc5b.1668988357.git.kai.huang@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1323032wrr; Sun, 20 Nov 2022 16:30:47 -0800 (PST) X-Google-Smtp-Source: AA0mqf4l3iK825Jkl9FOxpz/+UZYkIP9DhdEJ+pK1NFE1AnD7nSe8//s3JBef0CB4L7vBYOPAd5Z X-Received: by 2002:a17:90a:a405:b0:218:8416:7268 with SMTP id y5-20020a17090aa40500b0021884167268mr13818714pjp.194.1668990647068; Sun, 20 Nov 2022 16:30:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668990647; cv=none; d=google.com; s=arc-20160816; b=U8wu0RvfgGhTlmPqd6ONZCSss4xWMIP+7/NCgraIQKbBkgT3Stb42/t1+7I0V5CDFs 9afq4Lt+92djiLf4MB+gPmClAxiB+ALB5ETnkZuJCJC4xqIOrKlZVRv2Y0emxKrAPigk yMpU4GMJ6VDlKBPbvK1PBoCLTRfeMnvqXNzzVTyM+gJKtAJLxhEVcz93ogo0eD3P4PaQ t1sknSnQcDWsrFRaQhL5w3LCVSmPqRBh/vO4ZrkAmppB6UjDZn9eyXFL82iCraMzIy5i R/acC0d9s5GKQOsIOu98rWsySZQOwEx9ci23eLXsTBwDQct9PP0nK4Kjqet6/I/TEApz TKRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6ly/tW/l8sxVF+1POXPZy0u43a5lyFM1xw+ryHowF2c=; b=b3qBTibk2Vk6ciZSTq/9UKQ84N+Wogq7IbRCR721e8489qSUBHySor64w/vAhUfEds DLhOPTEiNgTvKpv2fN5CDItvMjdml8dsldOhtGzshP/a8d+Pn7SBqAo9x0aWAwqTC0dE DKPrmWTzf/y+v/c/RC5wvtpm1F6eUF8U+JsLm7IaII+NCO6AFVP0wAwQZbziMzPUwrej MtJMmSkOtsMa1Ig8awjSH/boDri4jkRgwVF2jnVmGbTQ3UWDUn9Ar5ywjKOHyTKwVe8M BIZ2mdiLT30s2VWqVUmih3i3LLVME+9/Ti0RgQU5syxKA1Za19vEhfCE4NQWO4RS3K5B gJoA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lIG1YbKV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lb7-20020a17090b4a4700b00205da45d8a5si13422266pjb.124.2022.11.20.16.30.34; Sun, 20 Nov 2022 16:30:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lIG1YbKV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230040AbiKUA2z (ORCPT <rfc822;leviz.kernel.dev@gmail.com> + 99 others); Sun, 20 Nov 2022 19:28:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229939AbiKUA1p (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 20 Nov 2022 19:27:45 -0500 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 329ED1DDC4; Sun, 20 Nov 2022 16:27:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668990464; x=1700526464; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=znmF9BwMgILl7Jshlgv9LDu7pEGe6gH0wYN4oKNnxSU=; b=lIG1YbKVek7SRmRs2ulPJr/54zdTj04/YEKb14bLim0gDl7HAFBqoe7J ByBLoEQ+Sl7wMu4A3c+PN9058Uq/p3aWQoqBPVUBQ0mat6pxuysS+2UiU ON33luCIE6KYOZHRuvxCjFz+HjYXbPLtP2HtkR3/4OhduRbju2gqyCKWC IYqaxiSllS/rtDrLSv2nQHjsxOe80y1LDwE65pS1aJ3K4B3/Z6tAIxR5H fmxdhMnl0j05De0/f0SEZMaZoGlnqkDeBz2ZzWk0kOY2cqXNoq1jlhVkh CuGeUiD6kRWsafWUZGq6t+M2rWIJwSnPU2isakMZkBsuXZP6dClrOQSXs w==; X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705706" X-IronPort-AV: E=Sophos;i="5.96,180,1665471600"; d="scan'208";a="377705706" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2022 16:27:42 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825390" X-IronPort-AV: E=Sophos;i="5.96,180,1665471600"; d="scan'208";a="729825390" Received: from tomnavar-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.176.15]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2022 16:27:38 -0800 From: Kai Huang <kai.huang@intel.com> To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com, kai.huang@intel.com Subject: [PATCH v7 11/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Date: Mon, 21 Nov 2022 13:26:33 +1300 Message-Id: <32c1968fe34c8cf3cb834e3a9966cd2a201efc5b.1668988357.git.kai.huang@intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <cover.1668988357.git.kai.huang@intel.com> References: <cover.1668988357.git.kai.huang@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750063536793000198?= X-GMAIL-MSGID: =?utf-8?q?1750063536793000198?= |
Series |
TDX host kernel support
|
|
Commit Message
Kai Huang
Nov. 21, 2022, 12:26 a.m. UTC
TDX provides increased levels of memory confidentiality and integrity. This requires special hardware support for features like memory encryption and storage of memory integrity checksums. Not all memory satisfies these requirements. As a result, the TDX introduced the concept of a "Convertible Memory Region" (CMR). During boot, the firmware builds a list of all of the memory ranges which can provide the TDX security guarantees. The list of these ranges is available to the kernel by querying the TDX module. The TDX architecture needs additional metadata to record things like which TD guest "owns" a given page of memory. This metadata essentially serves as the 'struct page' for the TDX module. The space for this metadata is not reserved by the hardware up front and must be allocated by the kernel and given to the TDX module. Since this metadata consumes space, the VMM can choose whether or not to allocate it for a given area of convertible memory. If it chooses not to, the memory cannot receive TDX protections and can not be used by TDX guests as private memory. For every memory region that the VMM wants to use as TDX memory, it sets up a "TD Memory Region" (TDMR). Each TDMR represents a physically contiguous convertible range and must also have its own physically contiguous metadata table, referred to as a Physical Address Metadata Table (PAMT), to track status for each page in the TDMR range. Unlike a CMR, each TDMR requires 1G granularity and alignment. To support physical RAM areas that don't meet those strict requirements, each TDMR permits a number of internal "reserved areas" which can be placed over memory holes. If PAMT metadata is placed within a TDMR it must be covered by one of these reserved areas. Let's summarize the concepts: CMR - Firmware-enumerated physical ranges that support TDX. CMRs are 4K aligned. TDMR - Physical address range which is chosen by the kernel to support TDX. 1G granularity and alignment required. Each TDMR has reserved areas where TDX memory holes and overlapping PAMTs can be put into. PAMT - Physically contiguous TDX metadata. One table for each page size per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G PAMT. As one step of initializing the TDX module, the kernel configures TDX-usable memory regions by passing an array of TDMRs to the TDX module. Constructing the array of TDMRs consists below steps: 1) Create TDMRs to cover all memory regions that the TDX module can use; 2) Allocate and set up PAMT for each TDMR; 3) Set up reserved areas for each TDMR. Add a placeholder to construct TDMRs to do the above steps after all TDX memory regions are verified to be truly convertible. Always free TDMRs at the end of the initialization (no matter successful or not) as TDMRs are only used during the initialization. Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> --- v6 -> v7: - Improved commit message to explain 'int' overflow cannot happen in cal_tdmr_size() and alloc_tdmr_array(). -- Andy/Dave. v5 -> v6: - construct_tdmrs_memblock() -> construct_tdmrs() as 'tdx_memblock' is used instead of memblock. - Added Isaku's Reviewed-by. - v3 -> v5 (no feedback on v4): - Moved calculating TDMR size to this patch. - Changed to use alloc_pages_exact() to allocate buffer for all TDMRs once, instead of allocating each TDMR individually. - Removed "crypto protection" in the changelog. - -EFAULT -> -EINVAL in couple of places. --- arch/x86/virt/vmx/tdx/tdx.c | 83 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++ 2 files changed, 106 insertions(+)
Comments
On 11/20/22 16:26, Kai Huang wrote: > TDX provides increased levels of memory confidentiality and integrity. > This requires special hardware support for features like memory > encryption and storage of memory integrity checksums. Not all memory > satisfies these requirements. > > As a result, the TDX introduced the concept of a "Convertible Memory s/the TDX introduced/TDX introduces/ > Region" (CMR). During boot, the firmware builds a list of all of the > memory ranges which can provide the TDX security guarantees. The list > of these ranges is available to the kernel by querying the TDX module. > > The TDX architecture needs additional metadata to record things like > which TD guest "owns" a given page of memory. This metadata essentially > serves as the 'struct page' for the TDX module. The space for this > metadata is not reserved by the hardware up front and must be allocated > by the kernel and given to the TDX module. > > Since this metadata consumes space, the VMM can choose whether or not to > allocate it for a given area of convertible memory. If it chooses not > to, the memory cannot receive TDX protections and can not be used by TDX > guests as private memory. > > For every memory region that the VMM wants to use as TDX memory, it sets > up a "TD Memory Region" (TDMR). Each TDMR represents a physically > contiguous convertible range and must also have its own physically > contiguous metadata table, referred to as a Physical Address Metadata > Table (PAMT), to track status for each page in the TDMR range. > > Unlike a CMR, each TDMR requires 1G granularity and alignment. To > support physical RAM areas that don't meet those strict requirements, > each TDMR permits a number of internal "reserved areas" which can be > placed over memory holes. If PAMT metadata is placed within a TDMR it > must be covered by one of these reserved areas. > > Let's summarize the concepts: > > CMR - Firmware-enumerated physical ranges that support TDX. CMRs are > 4K aligned. > TDMR - Physical address range which is chosen by the kernel to support > TDX. 1G granularity and alignment required. Each TDMR has > reserved areas where TDX memory holes and overlapping PAMTs can > be put into. s/put into/represented/ > PAMT - Physically contiguous TDX metadata. One table for each page size > per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G > PAMT. > > As one step of initializing the TDX module, the kernel configures > TDX-usable memory regions by passing an array of TDMRs to the TDX module. > > Constructing the array of TDMRs consists below steps: > > 1) Create TDMRs to cover all memory regions that the TDX module can use; Slight tweak: 1) Create TDMRs to cover all memory regions that the TDX module will use for TD memory The TDX module "uses" more memory than strictly the TMDR's. > 2) Allocate and set up PAMT for each TDMR; > 3) Set up reserved areas for each TDMR. s/Set up/Designate/ > Add a placeholder to construct TDMRs to do the above steps after all > TDX memory regions are verified to be truly convertible. Always free > TDMRs at the end of the initialization (no matter successful or not) > as TDMRs are only used during the initialization. The changelog here actually looks really good to me so far. > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > index 32af86e31c47..26048c6b0170 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -445,6 +445,63 @@ static int build_tdx_memory(void) > return ret; > } > > +/* Calculate the actual TDMR_INFO size */ > +static inline int cal_tdmr_size(void) I think we can spare the bytes to add "culate" in the function name so we don't think these are California TDMRs. > +{ > + int tdmr_sz; > + > + /* > + * The actual size of TDMR_INFO depends on the maximum number > + * of reserved areas. > + * > + * Note: for TDX1.0 the max_reserved_per_tdmr is 16, and > + * TDMR_INFO size is aligned up to 512-byte. Even it is > + * extended in the future, it would be insane if TDMR_INFO > + * becomes larger than 4K. The tdmr_sz here should never > + * overflow. > + */ > + tdmr_sz = sizeof(struct tdmr_info); > + tdmr_sz += sizeof(struct tdmr_reserved_area) * > + tdx_sysinfo.max_reserved_per_tdmr; First, I think 'tdx_sysinfo' should probably be a local variable in init_tdx_module() and have its address passed in here. Having global variables always makes it more opaque about who is initializing it. Second, if this code is making assumptions about 'max_reserved_per_tdmr', then let's actually add assertions or sanity checks. For instance: if (tdx_sysinfo.max_reserved_per_tdmr > MAX_TDMRS) return -1; or even: if (tdmr_sz > PAGE_SIZE) return -1; It does almost no good to just assert what the limits are in a comment. > + /* > + * TDX requires each TDMR_INFO to be 512-byte aligned. Always > + * round up TDMR_INFO size to the 512-byte boundary. > + */ <sigh> More silly comments. The place to document this is TDMR_INFO_ALIGNMENT. If anyone wants to know what the alignment is, exactly, they can look at the definition. They don't need to be told *TWICE* what TDMR_INFO_ALIGNMENT #defines to in one comment. > + return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT); > +} > + > +static struct tdmr_info *alloc_tdmr_array(int *array_sz) > +{ > + /* > + * TDX requires each TDMR_INFO to be 512-byte aligned. > + * Use alloc_pages_exact() to allocate all TDMRs at once. > + * Each TDMR_INFO will still be 512-byte aligned since > + * cal_tdmr_size() always returns 512-byte aligned size. > + */ OK, I think you're just trolling me now. Two *MORE* mentions of the 512-byte alignment? > + *array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs; > + > + /* > + * Zero the buffer so 'struct tdmr_info::size' can be > + * used to determine whether a TDMR is valid. > + * > + * Note: for TDX1.0 the max_tdmrs is 64 and TDMR_INFO size > + * is 512-byte. Even they are extended in the future, it > + * would be insane if the total size exceeds 4MB. > + */ > + return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO); > +} This looks massively over complicated. Get rid of this function entirely. Then create: static int tdmr_array_size(void) { return tdmr_size_single() * tdx_sysinfo.max_tdmrs; } The *caller* can do: tdmr_array = alloc_pages_exact(tdmr_array_size(), GFP_KERNEL | __GFP_ZERO); if (!tdmr_array) { ... Then the error path is: free_pages_exact(tdmr_array, tdmr_array_size()); Then, there are no size pointers going back and forth. Easy peasy. I'm OK with a little arithmetic being repeated. > +/* > + * Construct an array of TDMRs to cover all TDX memory ranges. > + * The actual number of TDMRs is kept to @tdmr_num. > + */ > +static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num) > +{ > + /* Return -EINVAL until constructing TDMRs is done */ > + return -EINVAL; > +} > + > /* > * Detect and initialize the TDX module. > * > @@ -454,6 +511,9 @@ static int build_tdx_memory(void) > */ > static int init_tdx_module(void) > { > + struct tdmr_info *tdmr_array; > + int tdmr_array_sz; > + int tdmr_num; I tend to write these like: "tdmr_num" is the number of *a* TDMR. "nr_tdmrs" is the number of TDMRs. > int ret; > > /* > @@ -506,11 +566,34 @@ static int init_tdx_module(void) > ret = build_tdx_memory(); > if (ret) > goto out; > + > + /* Prepare enough space to construct TDMRs */ > + tdmr_array = alloc_tdmr_array(&tdmr_array_sz); > + if (!tdmr_array) { > + ret = -ENOMEM; > + goto out_free_tdx_mem; > + } > + > + /* Construct TDMRs to cover all TDX memory ranges */ > + ret = construct_tdmrs(tdmr_array, &tdmr_num); > + if (ret) > + goto out_free_tdmrs; > + > /* > * Return -EINVAL until all steps of TDX module initialization > * process are done. > */ > ret = -EINVAL; > +out_free_tdmrs: > + /* > + * The array of TDMRs is freed no matter the initialization is > + * successful or not. They are not needed anymore after the > + * module initialization. > + */ > + free_pages_exact(tdmr_array, tdmr_array_sz); > +out_free_tdx_mem: > + if (ret) > + free_tdx_memory(); > out: > /* > * Memory hotplug checks the hot-added memory region against the > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h > index 8e273756098c..a737f2b51474 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.h > +++ b/arch/x86/virt/vmx/tdx/tdx.h > @@ -80,6 +80,29 @@ struct tdsysinfo_struct { > }; > } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); > > +struct tdmr_reserved_area { > + u64 offset; > + u64 size; > +} __packed; > + > +#define TDMR_INFO_ALIGNMENT 512 > + > +struct tdmr_info { > + u64 base; > + u64 size; > + u64 pamt_1g_base; > + u64 pamt_1g_size; > + u64 pamt_2m_base; > + u64 pamt_2m_size; > + u64 pamt_4k_base; > + u64 pamt_4k_size; > + /* > + * Actual number of reserved areas depends on > + * 'struct tdsysinfo_struct'::max_reserved_per_tdmr. > + */ > + struct tdmr_reserved_area reserved_areas[0]; > +} __packed __aligned(TDMR_INFO_ALIGNMENT); > + > /* > * Do not put any hardware-defined TDX structure representations below > * this comment!
On Wed, 2022-11-23 at 14:17 -0800, Dave Hansen wrote: > On 11/20/22 16:26, Kai Huang wrote: > > TDX provides increased levels of memory confidentiality and integrity. > > This requires special hardware support for features like memory > > encryption and storage of memory integrity checksums. Not all memory > > satisfies these requirements. > > > > As a result, the TDX introduced the concept of a "Convertible Memory > > s/the TDX introduced/TDX introduces/ > > > Region" (CMR). During boot, the firmware builds a list of all of the > > memory ranges which can provide the TDX security guarantees. The list > > of these ranges is available to the kernel by querying the TDX module. > > > > The TDX architecture needs additional metadata to record things like > > which TD guest "owns" a given page of memory. This metadata essentially > > serves as the 'struct page' for the TDX module. The space for this > > metadata is not reserved by the hardware up front and must be allocated > > by the kernel and given to the TDX module. > > > > Since this metadata consumes space, the VMM can choose whether or not to > > allocate it for a given area of convertible memory. If it chooses not > > to, the memory cannot receive TDX protections and can not be used by TDX > > guests as private memory. > > > > For every memory region that the VMM wants to use as TDX memory, it sets > > up a "TD Memory Region" (TDMR). Each TDMR represents a physically > > contiguous convertible range and must also have its own physically > > contiguous metadata table, referred to as a Physical Address Metadata > > Table (PAMT), to track status for each page in the TDMR range. > > > > Unlike a CMR, each TDMR requires 1G granularity and alignment. To > > support physical RAM areas that don't meet those strict requirements, > > each TDMR permits a number of internal "reserved areas" which can be > > placed over memory holes. If PAMT metadata is placed within a TDMR it > > must be covered by one of these reserved areas. > > > > Let's summarize the concepts: > > > > CMR - Firmware-enumerated physical ranges that support TDX. CMRs are > > 4K aligned. > > TDMR - Physical address range which is chosen by the kernel to support > > TDX. 1G granularity and alignment required. Each TDMR has > > reserved areas where TDX memory holes and overlapping PAMTs can > > be put into. > > s/put into/represented/ > > > PAMT - Physically contiguous TDX metadata. One table for each page size > > per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G > > PAMT. > > > > As one step of initializing the TDX module, the kernel configures > > TDX-usable memory regions by passing an array of TDMRs to the TDX module. > > > > Constructing the array of TDMRs consists below steps: > > > > 1) Create TDMRs to cover all memory regions that the TDX module can use; > > Slight tweak: > > 1) Create TDMRs to cover all memory regions that the TDX module will use > for TD memory > > The TDX module "uses" more memory than strictly the TMDR's. > > > 2) Allocate and set up PAMT for each TDMR; > > 3) Set up reserved areas for each TDMR. > > s/Set up/Designate/ Thanks. All above will be addressed. > > > Add a placeholder to construct TDMRs to do the above steps after all > > TDX memory regions are verified to be truly convertible. Always free > > TDMRs at the end of the initialization (no matter successful or not) > > as TDMRs are only used during the initialization. > > The changelog here actually looks really good to me so far. > > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > > index 32af86e31c47..26048c6b0170 100644 > > --- a/arch/x86/virt/vmx/tdx/tdx.c > > +++ b/arch/x86/virt/vmx/tdx/tdx.c > > @@ -445,6 +445,63 @@ static int build_tdx_memory(void) > > return ret; > > } > > > > +/* Calculate the actual TDMR_INFO size */ > > +static inline int cal_tdmr_size(void) > > I think we can spare the bytes to add "culate" in the function name so > we don't think these are California TDMRs. Sure will do. > > > +{ > > + int tdmr_sz; > > + > > + /* > > + * The actual size of TDMR_INFO depends on the maximum number > > + * of reserved areas. > > + * > > + * Note: for TDX1.0 the max_reserved_per_tdmr is 16, and > > + * TDMR_INFO size is aligned up to 512-byte. Even it is > > + * extended in the future, it would be insane if TDMR_INFO > > + * becomes larger than 4K. The tdmr_sz here should never > > + * overflow. > > + */ > > + tdmr_sz = sizeof(struct tdmr_info); > > + tdmr_sz += sizeof(struct tdmr_reserved_area) * > > + tdx_sysinfo.max_reserved_per_tdmr; > > First, I think 'tdx_sysinfo' should probably be a local variable in > init_tdx_module() and have its address passed in here. Having global > variables always makes it more opaque about who is initializing it. > > Second, if this code is making assumptions about > 'max_reserved_per_tdmr', then let's actually add assertions or sanity > checks. For instance: > > if (tdx_sysinfo.max_reserved_per_tdmr > MAX_TDMRS) > return -1; > > or even: > > if (tdmr_sz > PAGE_SIZE) > return -1; I can add this. > > It does almost no good to just assert what the limits are in a comment. > > > + /* > > + * TDX requires each TDMR_INFO to be 512-byte aligned. Always > > + * round up TDMR_INFO size to the 512-byte boundary. > > + */ > > <sigh> More silly comments. > > The place to document this is TDMR_INFO_ALIGNMENT. If anyone wants to > know what the alignment is, exactly, they can look at the definition. > They don't need to be told *TWICE* what TDMR_INFO_ALIGNMENT #defines to > in one comment. I see. Then I think we don't even need this comment since the name of TDMR_INFO_ALIGNMENT already implies? > > > + return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT); > > +} > > + > > +static struct tdmr_info *alloc_tdmr_array(int *array_sz) > > +{ > > + /* > > + * TDX requires each TDMR_INFO to be 512-byte aligned. > > + * Use alloc_pages_exact() to allocate all TDMRs at once. > > + * Each TDMR_INFO will still be 512-byte aligned since > > + * cal_tdmr_size() always returns 512-byte aligned size. > > + */ > > OK, I think you're just trolling me now. Two *MORE* mentions of the > 512-byte alignment? I'll remove. > > > + *array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs; > > + > > + /* > > + * Zero the buffer so 'struct tdmr_info::size' can be > > + * used to determine whether a TDMR is valid. > > + * > > + * Note: for TDX1.0 the max_tdmrs is 64 and TDMR_INFO size > > + * is 512-byte. Even they are extended in the future, it > > + * would be insane if the total size exceeds 4MB. > > + */ > > + return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO); > > +} > > This looks massively over complicated. > > Get rid of this function entirely. Then create: > > static int tdmr_array_size(void) > { > return tdmr_size_single() * tdx_sysinfo.max_tdmrs; > } > > The *caller* can do: > > tdmr_array = alloc_pages_exact(tdmr_array_size(), > GFP_KERNEL | __GFP_ZERO); > if (!tdmr_array) { > ... > > Then the error path is: > > free_pages_exact(tdmr_array, tdmr_array_size()); > > Then, there are no size pointers going back and forth. Easy peasy. I'm > OK with a little arithmetic being repeated. Yes. Will do. > > > +/* > > + * Construct an array of TDMRs to cover all TDX memory ranges. > > + * The actual number of TDMRs is kept to @tdmr_num. > > + */ > > +static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num) > > +{ > > + /* Return -EINVAL until constructing TDMRs is done */ > > + return -EINVAL; > > +} > > + > > /* > > * Detect and initialize the TDX module. > > * > > @@ -454,6 +511,9 @@ static int build_tdx_memory(void) > > */ > > static int init_tdx_module(void) > > { > > + struct tdmr_info *tdmr_array; > > + int tdmr_array_sz; > > + int tdmr_num; > > I tend to write these like: > > "tdmr_num" is the number of *a* TDMR. > > "nr_tdmrs" is the number of TDMRs. Indeed. Will do.
On Wed, 2022-11-23 at 14:17 -0800, Dave Hansen wrote: > First, I think 'tdx_sysinfo' should probably be a local variable in > init_tdx_module() and have its address passed in here. Having global > variables always makes it more opaque about who is initializing it. Sorry I missed to respond this. Using local variable for 'tdx_sysinfo' will cause a build warning: https://lore.kernel.org/lkml/a6694c81b4e96a22557fd0af70a81bd2c2e4e3e7.camel@intel.com/ So instead we can have a local variable for the pointer of 'tdx_sysinfo', and dynamically allocate memory for it. KVM will need to use it, though. So I think eventually we will need to have a global variable (either tdx_sysinfo itself, or the pointer of it). But this can be done in a separate patch. CMRs can be done in the same way (KVM doesn't need to use CMRs, but perhaps some day we may want to expose them to /sysfs, etc). What's your preference?
On 11/24/22 04:02, Huang, Kai wrote: > On Wed, 2022-11-23 at 14:17 -0800, Dave Hansen wrote: >> First, I think 'tdx_sysinfo' should probably be a local variable in >> init_tdx_module() and have its address passed in here. Having global >> variables always makes it more opaque about who is initializing it. > Sorry I missed to respond this. > > Using local variable for 'tdx_sysinfo' will cause a build warning: > > https://lore.kernel.org/lkml/a6694c81b4e96a22557fd0af70a81bd2c2e4e3e7.camel@intel.com/ Having it be local scope is a lot more important than having it be on stack. Just declare it local to the function but keep it off the stack. No need to dynamically allocate it, even.
On Mon, 2022-11-28 at 07:59 -0800, Dave Hansen wrote: > On 11/24/22 04:02, Huang, Kai wrote: > > On Wed, 2022-11-23 at 14:17 -0800, Dave Hansen wrote: > > > First, I think 'tdx_sysinfo' should probably be a local variable in > > > init_tdx_module() and have its address passed in here. Having global > > > variables always makes it more opaque about who is initializing it. > > Sorry I missed to respond this. > > > > Using local variable for 'tdx_sysinfo' will cause a build warning: > > > > https://lore.kernel.org/lkml/a6694c81b4e96a22557fd0af70a81bd2c2e4e3e7.camel@intel.com/ > > Having it be local scope is a lot more important than having it be on > stack. Just declare it local to the function but keep it off the stack. > No need to dynamically allocate it, even. Apologize I am not entirely sure whether I fully got your point. Do you mean something like below? static struct tdsysinfo_struct tdx_sysinfo; static int tdmr_size_single(int max_reserved_per_tdmr) { ... } static int tdmr_array_size(struct tdsysinfo_struct *sysinfo) { return tdmr_size_single(sysinfo->max_reserved_per_tdmr) * sysinfo->max_tdmrs; } static int init_tdx_module(void) { ... tdx_get_sysinfo(&tdx_sysinfo, ...); ... tdmr_array = alloc_pages_exact(tdmr_array_size(&tdx_sysinfo), GFP_KERNEL | __GFP_ZERO); ... construct_tdmrs(tdmr_array, &nr_tdmrs, &tdx_sysinfo); ... }
On 11/28/22 14:13, Huang, Kai wrote: > Apologize I am not entirely sure whether I fully got your point. Do you mean > something like below? ... No, something like this: static int init_tdx_module(void) { static struct tdsysinfo_struct tdx_sysinfo; /* too rotund for the stack */ ... tdx_get_sysinfo(&tdx_sysinfo, ...); ... But, also, seriously, 3k on the stack is *fine* if you can shut up the warnings. This isn't going to be a deep call stack to begin with.
On Mon, 2022-11-28 at 14:19 -0800, Dave Hansen wrote: > On 11/28/22 14:13, Huang, Kai wrote: > > Apologize I am not entirely sure whether I fully got your point. Do you mean > > something like below? > ... > > No, something like this: > > static int init_tdx_module(void) > { > static struct tdsysinfo_struct tdx_sysinfo; /* too rotund for the stack */ > ... > tdx_get_sysinfo(&tdx_sysinfo, ...); > ... > > But, also, seriously, 3k on the stack is *fine* if you can shut up the > warnings. This isn't going to be a deep call stack to begin with. > Let me try to find out whether it is possible to silent the warning. If I cannot, then I'll use your above way. Thanks!
On Mon, 2022-11-28 at 22:50 +0000, Huang, Kai wrote: > On Mon, 2022-11-28 at 14:19 -0800, Dave Hansen wrote: > > On 11/28/22 14:13, Huang, Kai wrote: > > > Apologize I am not entirely sure whether I fully got your point. Do you mean > > > something like below? > > ... > > > > No, something like this: > > > > static int init_tdx_module(void) > > { > > static struct tdsysinfo_struct tdx_sysinfo; /* too rotund for the stack */ > > ... > > tdx_get_sysinfo(&tdx_sysinfo, ...); > > ... > > > > But, also, seriously, 3k on the stack is *fine* if you can shut up the > > warnings. This isn't going to be a deep call stack to begin with. > > > > Let me try to find out whether it is possible to silent the warning. If I > cannot, then I'll use your above way. Thanks! Hi Dave, Sorry to double asking. Adding below build flag to Makefile can silent the warning: index 38d534f2c113..f8a40d15fdfc 100644 --- a/arch/x86/virt/vmx/tdx/Makefile +++ b/arch/x86/virt/vmx/tdx/Makefile @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only +CFLAGS_tdx.o += -Wframe-larger-than=4096 So to confirm you want to add this flag to Makefile and just make tdx_sysinfo and tdx_cmr_array as local variables? Another reason I am double asking is, 'tdx_global_keyid' in this series can also be a local variable in init_tdx_module() but currently it is a static (as KVM will need it too). If I change to use local variable in the patch "x86/virt/tdx: Reserve TDX module global KeyID" like below: --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -50,9 +50,6 @@ static DEFINE_MUTEX(tdx_module_lock); /* All TDX-usable memory regions */ static LIST_HEAD(tdx_memlist); -/* TDX module global KeyID. Used in TDH.SYS.CONFIG ABI. */ -static u32 tdx_global_keyid; - /* * tdx_keyid_start and nr_tdx_keyids indicate that TDX is uninitialized. * This is used in TDX initialization error paths to take it from @@ -928,6 +925,7 @@ static int init_tdx_module(void) __aligned(CMR_INFO_ARRAY_ALIGNMENT); struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo); struct tdmr_info_list tdmr_list; + u32 global_keyid; int ret; ret = tdx_get_sysinfo(sysinfo, cmr_array); @@ -964,7 +962,7 @@ static int init_tdx_module(void) * Pick the first TDX KeyID as global KeyID to protect * TDX module metadata. */ - tdx_global_keyid = tdx_keyid_start; + global_keyid = tdx_keyid_start; I got a warning for this particular patch: arch/x86/virt/vmx/tdx/tdx.c: In function ‘init_tdx_module’: arch/x86/virt/vmx/tdx/tdx.c:928:13: warning: variable ‘global_keyid’ set but not used [-Wunused-but-set-variable] 928 | u32 global_keyid; | ^~~~~~~~~~~~ To get rid of this warning, we need to merge this patch to the later patch (which configures the TDMRs and global keyid to the TDX module). Should I make the tdx_global_keyid as local variable too and merge patch "x86/virt/tdx: Reserve TDX module global KeyID" to the later patch?
On Wed, 2022-12-07 at 11:47 +0000, Huang, Kai wrote: > On Mon, 2022-11-28 at 22:50 +0000, Huang, Kai wrote: > > On Mon, 2022-11-28 at 14:19 -0800, Dave Hansen wrote: > > > On 11/28/22 14:13, Huang, Kai wrote: > > > > Apologize I am not entirely sure whether I fully got your point. Do you mean > > > > something like below? > > > ... > > > > > > No, something like this: > > > > > > static int init_tdx_module(void) > > > { > > > static struct tdsysinfo_struct tdx_sysinfo; /* too rotund for the stack */ > > > ... > > > tdx_get_sysinfo(&tdx_sysinfo, ...); > > > ... > > > > > > But, also, seriously, 3k on the stack is *fine* if you can shut up the > > > warnings. This isn't going to be a deep call stack to begin with. > > > > > > > Let me try to find out whether it is possible to silent the warning. If I > > cannot, then I'll use your above way. Thanks! > > Hi Dave, > > Sorry to double asking. > > Adding below build flag to Makefile can silent the warning: > > index 38d534f2c113..f8a40d15fdfc 100644 > --- a/arch/x86/virt/vmx/tdx/Makefile > +++ b/arch/x86/virt/vmx/tdx/Makefile > @@ -1,2 +1,3 @@ > # SPDX-License-Identifier: GPL-2.0-only > +CFLAGS_tdx.o += -Wframe-larger-than=4096 > > So to confirm you want to add this flag to Makefile and just make tdx_sysinfo > and tdx_cmr_array as local variables? Hi Dave, I found if I declare TDSYSINFO_STRUCT and CMR_ARRAY as local variable (on the stack), the TDH.SYS.INFO failed in my testing due to 'invalid operand' of the address of TDSYSINFO_STRUCT. If I declare them as static, the SEAMCALL works. I haven't looked into the reason yet but I suspect the address isn't aligned (I used __pa() to get the physical address). I'll take a look and report back. In the meantime, do you have any comments? Should I still pursue to keep them as local variable on the stack? Thanks. > > Another reason I am double asking is, 'tdx_global_keyid' in this series can also > be a local variable in init_tdx_module() but currently it is a static (as KVM > will need it too). If I change to use local variable in the patch > "x86/virt/tdx: Reserve TDX module global KeyID" like below: > > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -50,9 +50,6 @@ static DEFINE_MUTEX(tdx_module_lock); > /* All TDX-usable memory regions */ > static LIST_HEAD(tdx_memlist); > > -/* TDX module global KeyID. Used in TDH.SYS.CONFIG ABI. */ > -static u32 tdx_global_keyid; > - > /* > * tdx_keyid_start and nr_tdx_keyids indicate that TDX is uninitialized. > * This is used in TDX initialization error paths to take it from > @@ -928,6 +925,7 @@ static int init_tdx_module(void) > __aligned(CMR_INFO_ARRAY_ALIGNMENT); > struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo); > struct tdmr_info_list tdmr_list; > + u32 global_keyid; > int ret; > > ret = tdx_get_sysinfo(sysinfo, cmr_array); > @@ -964,7 +962,7 @@ static int init_tdx_module(void) > * Pick the first TDX KeyID as global KeyID to protect > * TDX module metadata. > */ > - tdx_global_keyid = tdx_keyid_start; > + global_keyid = tdx_keyid_start; > > I got a warning for this particular patch: > > arch/x86/virt/vmx/tdx/tdx.c: In function ‘init_tdx_module’: > arch/x86/virt/vmx/tdx/tdx.c:928:13: warning: variable ‘global_keyid’ set but not > used [-Wunused-but-set-variable] > 928 | u32 global_keyid; > | ^~~~~~~~~~~~ > > To get rid of this warning, we need to merge this patch to the later patch > (which configures the TDMRs and global keyid to the TDX module). > > Should I make the tdx_global_keyid as local variable too and merge patch > "x86/virt/tdx: Reserve TDX module global KeyID" to the later patch? And for this one, if we merge the two patches then in fact we can just remove 'tdx_global_keyid' but use 'tdx_start_keyid' directly. I have already done in this way. Any comments please let me know. Thanks for your time.
On 12/8/22 04:56, Huang, Kai wrote: > I haven't looked into the reason yet but I suspect the address isn't aligned (I > used __pa() to get the physical address). I'll take a look and report back. > > In the meantime, do you have any comments? Should I still pursue to keep them > as local variable on the stack? Yes, you should investigate the reason for the failure and try to understand both the success and the failure cases.
On Thu, 2022-12-08 at 06:58 -0800, Dave Hansen wrote: > On 12/8/22 04:56, Huang, Kai wrote: > > I haven't looked into the reason yet but I suspect the address isn't aligned (I > > used __pa() to get the physical address). I'll take a look and report back. > > > > In the meantime, do you have any comments? Should I still pursue to keep them > > as local variable on the stack? > > Yes, you should investigate the reason for the failure and try to > understand both the success and the failure cases. Hi Dave, Learned something new from Kirill today. The reason is not the alignment, but it's wrong to use __pa() to get the physical address of function local variable on the stack. It is because Kirill told me kernel stack can now be allocated via vmalloc(), so use __pa() won't work. I changed to use slow_virt_to_phys() and tried in my testing and it now works. So I'll change to use slow_virt_to_phys() for the next version. We can take a look at the new version to see if that is what you wanted. Thanks for your time.
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 32af86e31c47..26048c6b0170 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -445,6 +445,63 @@ static int build_tdx_memory(void) return ret; } +/* Calculate the actual TDMR_INFO size */ +static inline int cal_tdmr_size(void) +{ + int tdmr_sz; + + /* + * The actual size of TDMR_INFO depends on the maximum number + * of reserved areas. + * + * Note: for TDX1.0 the max_reserved_per_tdmr is 16, and + * TDMR_INFO size is aligned up to 512-byte. Even it is + * extended in the future, it would be insane if TDMR_INFO + * becomes larger than 4K. The tdmr_sz here should never + * overflow. + */ + tdmr_sz = sizeof(struct tdmr_info); + tdmr_sz += sizeof(struct tdmr_reserved_area) * + tdx_sysinfo.max_reserved_per_tdmr; + + /* + * TDX requires each TDMR_INFO to be 512-byte aligned. Always + * round up TDMR_INFO size to the 512-byte boundary. + */ + return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT); +} + +static struct tdmr_info *alloc_tdmr_array(int *array_sz) +{ + /* + * TDX requires each TDMR_INFO to be 512-byte aligned. + * Use alloc_pages_exact() to allocate all TDMRs at once. + * Each TDMR_INFO will still be 512-byte aligned since + * cal_tdmr_size() always returns 512-byte aligned size. + */ + *array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs; + + /* + * Zero the buffer so 'struct tdmr_info::size' can be + * used to determine whether a TDMR is valid. + * + * Note: for TDX1.0 the max_tdmrs is 64 and TDMR_INFO size + * is 512-byte. Even they are extended in the future, it + * would be insane if the total size exceeds 4MB. + */ + return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO); +} + +/* + * Construct an array of TDMRs to cover all TDX memory ranges. + * The actual number of TDMRs is kept to @tdmr_num. + */ +static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num) +{ + /* Return -EINVAL until constructing TDMRs is done */ + return -EINVAL; +} + /* * Detect and initialize the TDX module. * @@ -454,6 +511,9 @@ static int build_tdx_memory(void) */ static int init_tdx_module(void) { + struct tdmr_info *tdmr_array; + int tdmr_array_sz; + int tdmr_num; int ret; /* @@ -506,11 +566,34 @@ static int init_tdx_module(void) ret = build_tdx_memory(); if (ret) goto out; + + /* Prepare enough space to construct TDMRs */ + tdmr_array = alloc_tdmr_array(&tdmr_array_sz); + if (!tdmr_array) { + ret = -ENOMEM; + goto out_free_tdx_mem; + } + + /* Construct TDMRs to cover all TDX memory ranges */ + ret = construct_tdmrs(tdmr_array, &tdmr_num); + if (ret) + goto out_free_tdmrs; + /* * Return -EINVAL until all steps of TDX module initialization * process are done. */ ret = -EINVAL; +out_free_tdmrs: + /* + * The array of TDMRs is freed no matter the initialization is + * successful or not. They are not needed anymore after the + * module initialization. + */ + free_pages_exact(tdmr_array, tdmr_array_sz); +out_free_tdx_mem: + if (ret) + free_tdx_memory(); out: /* * Memory hotplug checks the hot-added memory region against the diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 8e273756098c..a737f2b51474 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -80,6 +80,29 @@ struct tdsysinfo_struct { }; } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); +struct tdmr_reserved_area { + u64 offset; + u64 size; +} __packed; + +#define TDMR_INFO_ALIGNMENT 512 + +struct tdmr_info { + u64 base; + u64 size; + u64 pamt_1g_base; + u64 pamt_1g_size; + u64 pamt_2m_base; + u64 pamt_2m_size; + u64 pamt_4k_base; + u64 pamt_4k_size; + /* + * Actual number of reserved areas depends on + * 'struct tdsysinfo_struct'::max_reserved_per_tdmr. + */ + struct tdmr_reserved_area reserved_areas[0]; +} __packed __aligned(TDMR_INFO_ALIGNMENT); + /* * Do not put any hardware-defined TDX structure representations below * this comment!