Message ID | 4db59b4a87f0309c29e61a79892b9fa6645754a8.1668988357.git.kai.huang@intel.com |
---|---|
State | New |
Headers |
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 12/20] x86/virt/tdx: Create TDMRs to cover all TDX memory regions
Date: Mon, 21 Nov 2022 13:26:34 +1300
Message-Id: <4db59b4a87f0309c29e61a79892b9fa6645754a8.1668988357.git.kai.huang@intel.com>
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com> |
Series |
TDX host kernel support
|
|
Commit Message
Kai Huang
Nov. 21, 2022, 12:26 a.m. UTC
The kernel configures TDX-usable memory regions by passing an array of
"TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains the
information of the base/size of a memory region, the base/size of the
associated Physical Address Metadata Table (PAMT) and a list of reserved
areas in the region.
Create a number of TDMRs to cover all TDX memory regions. To keep it
simple, always try to create one TDMR for each memory region. As the
first step only set up the base/size for each TDMR.
Each TDMR must be 1G aligned and the size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions. If a
memory region spans the 1GB boundary and the former part is already
covered by the previous TDMR, just create a new TDMR for the remaining
part.
TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
are consumed but there are still memory regions left to cover.
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v6 -> v7:
- No change.
v5 -> v6:
- Rebase due to using 'tdx_memblock' instead of memblock.
v3 -> v5 (no feedback on v4):
- Removed allocating TDMR individually.
- Improved changelog by using Dave's words.
- Made TDMR_START() and TDMR_END() as static inline function.
---
arch/x86/virt/vmx/tdx/tdx.c | 104 +++++++++++++++++++++++++++++++++++-
1 file changed, 103 insertions(+), 1 deletion(-)
Comments
On 11/20/22 16:26, Kai Huang wrote:
> The kernel configures TDX-usable memory regions by passing an array of
> "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains the
> information of the base/size of a memory region, the base/size of the
> associated Physical Address Metadata Table (PAMT) and a list of reserved
> areas in the region.
>
> Create a number of TDMRs to cover all TDX memory regions. To keep it
> simple, always try to create one TDMR for each memory region. As the
> first step only set up the base/size for each TDMR.
>
> Each TDMR must be 1G aligned and the size must be in 1G granularity.
> This implies that one TDMR could cover multiple memory regions. If a
> memory region spans the 1GB boundary and the former part is already
> covered by the previous TDMR, just create a new TDMR for the remaining
> part.
>
> TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
> are consumed but there is more memory region to cover.

Good changelog. This patch is doing *one* thing.

>  arch/x86/virt/vmx/tdx/tdx.c | 104 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 103 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 26048c6b0170..57b448de59a0 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -445,6 +445,24 @@ static int build_tdx_memory(void)
>  	return ret;
>  }
>
> +/* TDMR must be 1gb aligned */
> +#define TDMR_ALIGNMENT		BIT_ULL(30)
> +#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
> +
> +/* Align up and down the address to TDMR boundary */
> +#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> +#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
> +
> +static inline u64 tdmr_start(struct tdmr_info *tdmr)
> +{
> +	return tdmr->base;
> +}

I'm always skeptical that it's a good idea to take this in code:

	tdmr->base

and make it this:

	tdmr_start(tdmr)

because the helper is *LESS* compact than the open-coded form!  I hope
I'm proven wrong.

> +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> +{
> +	return tdmr->base + tdmr->size;
> +}
> +
>  /* Calculate the actual TDMR_INFO size */
>  static inline int cal_tdmr_size(void)
>  {
> @@ -492,14 +510,98 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
>  	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
>  }
>
> +static struct tdmr_info *tdmr_array_entry(struct tdmr_info *tdmr_array,
> +					  int idx)
> +{
> +	return (struct tdmr_info *)((unsigned long)tdmr_array +
> +			cal_tdmr_size() * idx);
> +}

FWIW, I think it's probably a bad idea to have 'struct tdmr_info *'
types floating around since:

	tdmr_info_array[0]

works, but:

	tdmr_info_array[1]

will blow up in your face.  It would almost make sense to have

	struct tdmr_info_list {
		struct tdmr_info *first_tdmr;
	}

and then pass around pointers to the 'struct tdmr_info_list'.  Maybe
that's overkill, but it is kinda silly to call something an array if []
doesn't work on it.

> +/*
> + * Create TDMRs to cover all TDX memory regions.  The actual number
> + * of TDMRs is set to @tdmr_num.
> + */
> +static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
> +{
> +	struct tdx_memblock *tmb;
> +	int tdmr_idx = 0;
> +
> +	/*
> +	 * Loop over TDX memory regions and create TDMRs to cover them.
> +	 * To keep it simple, always try to use one TDMR to cover
> +	 * one memory region.
> +	 */

This seems like it might tend to under-utilize TDMRs.  I'm sure this is
done for simplicity, but is it OK?  Why is it OK?  How are you sure this
won't bite us later?

> +	list_for_each_entry(tmb, &tdx_memlist, list) {
> +		struct tdmr_info *tdmr;
> +		u64 start, end;
> +
> +		tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
> +		start = TDMR_ALIGN_DOWN(tmb->start_pfn << PAGE_SHIFT);
> +		end = TDMR_ALIGN_UP(tmb->end_pfn << PAGE_SHIFT);

Nit: a little vertical alignment can make this much more readable:

		start = TDMR_ALIGN_DOWN(tmb->start_pfn << PAGE_SHIFT);
		end   = TDMR_ALIGN_UP  (tmb->end_pfn   << PAGE_SHIFT);

> +
> +		/*
> +		 * If the current TDMR's size hasn't been initialized,
> +		 * it is a new TDMR to cover the new memory region.
> +		 * Otherwise, the current TDMR has already covered the
> +		 * previous memory region.  In the latter case, check
> +		 * whether the current memory region has been fully or
> +		 * partially covered by the current TDMR, since TDMR is
> +		 * 1G aligned.
> +		 */

Again, we have a comment over an if() block that describes what the
individual steps in the block do.  *Plus* each individual step is
*ALREADY* commented.  What purpose does this comment serve?

> +		if (tdmr->size) {
> +			/*
> +			 * Loop to the next memory region if the current
> +			 * block has already been fully covered by the
> +			 * current TDMR.
> +			 */
> +			if (end <= tdmr_end(tdmr))
> +				continue;
> +
> +			/*
> +			 * If part of the current memory region has
> +			 * already been covered by the current TDMR,
> +			 * skip the already covered part.
> +			 */
> +			if (start < tdmr_end(tdmr))
> +				start = tdmr_end(tdmr);
> +
> +			/*
> +			 * Create a new TDMR to cover the current memory
> +			 * region, or the remaining part of it.
> +			 */
> +			tdmr_idx++;
> +			if (tdmr_idx >= tdx_sysinfo.max_tdmrs)
> +				return -E2BIG;
> +
> +			tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
> +		}
> +
> +		tdmr->base = start;
> +		tdmr->size = end - start;
> +	}
> +
> +	/* @tdmr_idx is always the index of last valid TDMR. */
> +	*tdmr_num = tdmr_idx + 1;
> +
> +	return 0;
> +}

Seems like a positive return value could be the number of populated
TDMRs.  That would get rid of the int* argument.

>  /*
>   * Construct an array of TDMRs to cover all TDX memory ranges.
>   * The actual number of TDMRs is kept to @tdmr_num.
>   */

OK, so something else allocated the 'tdmr_array' and it's being passed
in here to fill it out.  "construct" and "create" are both near synonyms
for "allocate", which isn't even being done here.

We want something here that will make it clear that this function is
taking an already populated list of TDMRs and filling it out.
"fill_tdmrs()" seems like it might be a better choice.

This is also a place where better words can help.  If the function is
called "construct", then there's *ZERO* value in using the same word in
the comment.  Using a word that is a close synonym but that can contrast
it with something different would be really nice, say:

This is also a place where the calling convention can be used to add
clarity.  If you implicitly use a global variable, you have to explain
that.  But, if you pass *in* a variable, it's a lot more clear.

Take this, for instance:

	/*
	 * Take the memory referenced in @tdx_memlist and populate the
	 * preallocated @tdmr_array, following all the special alignment
	 * and size rules for TDMR.
	 */
	static int fill_out_tdmrs(struct list_head *tdx_memlist,
				  struct tdmr_info *tdmr_array)
	{
	...

That's 100% crystal clear about what's going on.  You know what the
inputs are and the outputs.  You also know why this is even necessary.
It's implied a bit, but it's because TDMRs have special rules about
size/alignment and tdx_memlists do not.

> static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
> {
> +	int ret;
> +
> +	ret = create_tdmrs(tdmr_array, tdmr_num);
> +	if (ret)
> +		goto err;
> +
>  	/* Return -EINVAL until constructing TDMRs is done */
> -	return -EINVAL;
> +	ret = -EINVAL;
> +err:
> +	return ret;
> }
>
>  /*
> > +static inline u64 tdmr_start(struct tdmr_info *tdmr)
> > +{
> > +	return tdmr->base;
> > +}
>
> I'm always skeptical that it's a good idea to take this in code:
>
> 	tdmr->base
>
> and make it this:
>
> 	tdmr_start(tdmr)
>
> because the helper is *LESS* compact than the open-coded form!  I hope
> I'm proven wrong.

IIUC you prefer using tdmr->base directly.  Will do.

> > +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> > +{
> > +	return tdmr->base + tdmr->size;
> > +}
> > +
> >  /* Calculate the actual TDMR_INFO size */
> >  static inline int cal_tdmr_size(void)
> >  {
> > @@ -492,14 +510,98 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
> >  	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
> >  }
> >
> > +static struct tdmr_info *tdmr_array_entry(struct tdmr_info *tdmr_array,
> > +					  int idx)
> > +{
> > +	return (struct tdmr_info *)((unsigned long)tdmr_array +
> > +			cal_tdmr_size() * idx);
> > +}
>
> FWIW, I think it's probably a bad idea to have 'struct tdmr_info *'
> types floating around since:
>
> 	tdmr_info_array[0]
>
> works, but:
>
> 	tdmr_info_array[1]
>
> will blow up in your face.  It would almost make sense to have
>
> 	struct tdmr_info_list {
> 		struct tdmr_info *first_tdmr;
> 	}
>
> and then pass around pointers to the 'struct tdmr_info_list'.  Maybe
> that's overkill, but it is kinda silly to call something an array if []
> doesn't work on it.

Then should I introduce 'struct tdmr_info_list' in the previous patch
(which allocates enough space for the tdmr_array), and add functions to
allocate/free this tdmr_info_list?

> > +/*
> > + * Create TDMRs to cover all TDX memory regions.  The actual number
> > + * of TDMRs is set to @tdmr_num.
> > + */
> > +static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
> > +{
> > +	struct tdx_memblock *tmb;
> > +	int tdmr_idx = 0;
> > +
> > +	/*
> > +	 * Loop over TDX memory regions and create TDMRs to cover them.
> > +	 * To keep it simple, always try to use one TDMR to cover
> > +	 * one memory region.
> > +	 */
>
> This seems like it might tend to under-utilize TDMRs.  I'm sure this is
> done for simplicity, but is it OK?  Why is it OK?  How are you sure this
> won't bite us later?

In practice the maximum number of TDMRs is 64.  In reality we have never
met a machine that could result in so many memory regions, and typically
20 TDMRs is big enough to cover them.

But if the user uses 'memmap' to deliberately create a bunch of discrete
memory regions, then we can run out of TDMRs.  But I think we can blame
the user in this case.  How about adding a comment?

	/*
	 * In practice TDX1.0 supports 64 TDMRs, which should be big enough
	 * to cover all memory regions in reality if the admin doesn't use
	 * 'memmap' to create a bunch of discrete memory regions.
	 */

> > +	list_for_each_entry(tmb, &tdx_memlist, list) {
> > +		struct tdmr_info *tdmr;
> > +		u64 start, end;
> > +
> > +		tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
> > +		start = TDMR_ALIGN_DOWN(tmb->start_pfn << PAGE_SHIFT);
> > +		end = TDMR_ALIGN_UP(tmb->end_pfn << PAGE_SHIFT);
>
> Nit: a little vertical alignment can make this much more readable:
>
> 		start = TDMR_ALIGN_DOWN(tmb->start_pfn << PAGE_SHIFT);
> 		end   = TDMR_ALIGN_UP  (tmb->end_pfn   << PAGE_SHIFT);

Sure.  Btw, Ying suggested we can use PHYS_PFN() for <phys> >> PAGE_SHIFT
and PFN_PHYS() for <pfn> << PAGE_SHIFT.  Should I apply them to this
entire series?

> > +
> > +		/*
> > +		 * If the current TDMR's size hasn't been initialized,
> > +		 * it is a new TDMR to cover the new memory region.
> > +		 * Otherwise, the current TDMR has already covered the
> > +		 * previous memory region.  In the latter case, check
> > +		 * whether the current memory region has been fully or
> > +		 * partially covered by the current TDMR, since TDMR is
> > +		 * 1G aligned.
> > +		 */
>
> Again, we have a comment over an if() block that describes what the
> individual steps in the block do.  *Plus* each individual step is
> *ALREADY* commented.  What purpose does this comment serve?

I think the check of 'if (tdmr->size)' is still worth commenting.  The
last sentence can be removed -- as you said, it is kinda duplicated with
the individual comments within the if().

> > +		if (tdmr->size) {
> > +			/*
> > +			 * Loop to the next memory region if the current
> > +			 * block has already been fully covered by the
> > +			 * current TDMR.
> > +			 */
> > +			if (end <= tdmr_end(tdmr))
> > +				continue;
> > +
> > +			/*
> > +			 * If part of the current memory region has
> > +			 * already been covered by the current TDMR,
> > +			 * skip the already covered part.
> > +			 */
> > +			if (start < tdmr_end(tdmr))
> > +				start = tdmr_end(tdmr);
> > +
> > +			/*
> > +			 * Create a new TDMR to cover the current memory
> > +			 * region, or the remaining part of it.
> > +			 */
> > +			tdmr_idx++;
> > +			if (tdmr_idx >= tdx_sysinfo.max_tdmrs)
> > +				return -E2BIG;
> > +
> > +			tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
> > +		}
> > +
> > +		tdmr->base = start;
> > +		tdmr->size = end - start;
> > +	}
> > +
> > +	/* @tdmr_idx is always the index of last valid TDMR. */
> > +	*tdmr_num = tdmr_idx + 1;
> > +
> > +	return 0;
> > +}
>
> Seems like a positive return value could be the number of populated
> TDMRs.  That would get rid of the int* argument.

Yes we can.  I'll make the function return -E2BIG, or the actual number
of TDMRs.  Btw, I think it's better to print out some error message in
case of -E2BIG so the user can easily tell the reason for the failure?
Something like this:

	if (tdmr_idx >= tdx_sysinfo.max_tdmrs) {
		pr_info("not enough TDMRs to cover all TDX memory regions\n");
		return -E2BIG;
	}

> >  /*
> >   * Construct an array of TDMRs to cover all TDX memory ranges.
> >   * The actual number of TDMRs is kept to @tdmr_num.
> >   */
>
> OK, so something else allocated the 'tdmr_array' and it's being passed
> in here to fill it out.  "construct" and "create" are both near synonyms
> for "allocate", which isn't even being done here.
>
> We want something here that will make it clear that this function is
> taking an already populated list of TDMRs and filling it out.
> "fill_tdmrs()" seems like it might be a better choice.
>
> This is also a place where better words can help.  If the function is
> called "construct", then there's *ZERO* value in using the same word in
> the comment.  Using a word that is a close synonym but that can contrast
> it with something different would be really nice, say:

Thanks for the tip!

> This is also a place where the calling convention can be used to add
> clarity.  If you implicitly use a global variable, you have to explain
> that.  But, if you pass *in* a variable, it's a lot more clear.
>
> Take this, for instance:
>
> 	/*
> 	 * Take the memory referenced in @tdx_memlist and populate the
> 	 * preallocated @tdmr_array, following all the special alignment
> 	 * and size rules for TDMR.
> 	 */
> 	static int fill_out_tdmrs(struct list_head *tdx_memlist,
> 				  struct tdmr_info *tdmr_array)
> 	{
> 	...
>
> That's 100% crystal clear about what's going on.  You know what the
> inputs are and the outputs.  You also know why this is even necessary.
> It's implied a bit, but it's because TDMRs have special rules about
> size/alignment and tdx_memlists do not.

Agreed.  Let me try this out.
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 26048c6b0170..57b448de59a0 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -445,6 +445,24 @@ static int build_tdx_memory(void)
 	return ret;
 }
 
+/* TDMR must be 1gb aligned */
+#define TDMR_ALIGNMENT		BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
+
+/* Align up and down the address to TDMR boundary */
+#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_start(struct tdmr_info *tdmr)
+{
+	return tdmr->base;
+}
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+	return tdmr->base + tdmr->size;
+}
+
 /* Calculate the actual TDMR_INFO size */
 static inline int cal_tdmr_size(void)
 {
@@ -492,14 +510,98 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
 	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
 }
 
+static struct tdmr_info *tdmr_array_entry(struct tdmr_info *tdmr_array,
+					  int idx)
+{
+	return (struct tdmr_info *)((unsigned long)tdmr_array +
+			cal_tdmr_size() * idx);
+}
+
+/*
+ * Create TDMRs to cover all TDX memory regions. The actual number
+ * of TDMRs is set to @tdmr_num.
+ */
+static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
+{
+	struct tdx_memblock *tmb;
+	int tdmr_idx = 0;
+
+	/*
+	 * Loop over TDX memory regions and create TDMRs to cover them.
+	 * To keep it simple, always try to use one TDMR to cover
+	 * one memory region.
+	 */
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		struct tdmr_info *tdmr;
+		u64 start, end;
+
+		tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+		start = TDMR_ALIGN_DOWN(tmb->start_pfn << PAGE_SHIFT);
+		end = TDMR_ALIGN_UP(tmb->end_pfn << PAGE_SHIFT);
+
+		/*
+		 * If the current TDMR's size hasn't been initialized,
+		 * it is a new TDMR to cover the new memory region.
+		 * Otherwise, the current TDMR has already covered the
+		 * previous memory region.  In the latter case, check
+		 * whether the current memory region has been fully or
+		 * partially covered by the current TDMR, since TDMR is
+		 * 1G aligned.
+		 */
+		if (tdmr->size) {
+			/*
+			 * Loop to the next memory region if the current
+			 * block has already been fully covered by the
+			 * current TDMR.
+			 */
+			if (end <= tdmr_end(tdmr))
+				continue;
+
+			/*
+			 * If part of the current memory region has
+			 * already been covered by the current TDMR,
+			 * skip the already covered part.
+			 */
+			if (start < tdmr_end(tdmr))
+				start = tdmr_end(tdmr);
+
+			/*
+			 * Create a new TDMR to cover the current memory
+			 * region, or the remaining part of it.
+			 */
+			tdmr_idx++;
+			if (tdmr_idx >= tdx_sysinfo.max_tdmrs)
+				return -E2BIG;
+
+			tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+		}
+
+		tdmr->base = start;
+		tdmr->size = end - start;
+	}
+
+	/* @tdmr_idx is always the index of last valid TDMR. */
+	*tdmr_num = tdmr_idx + 1;
+
+	return 0;
+}
+
 /*
  * Construct an array of TDMRs to cover all TDX memory ranges.
  * The actual number of TDMRs is kept to @tdmr_num.
  */
 static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 {
+	int ret;
+
+	ret = create_tdmrs(tdmr_array, tdmr_num);
+	if (ret)
+		goto err;
+
 	/* Return -EINVAL until constructing TDMRs is done */
-	return -EINVAL;
+	ret = -EINVAL;
+err:
+	return ret;
 }
 
 /*