From patchwork Fri Dec 8 17:17:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner
X-Patchwork-Id: 175969
Date: Fri, 08 Dec 2023 17:17:26 -0000
From: "tip-bot2 for Kai Huang"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/tdx] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
Cc: Kai Huang , Dave Hansen , "Huang, Ying" , Isaku Yamahata , "Kirill A.
 Shutemov" , x86@kernel.org, linux-kernel@vger.kernel.org
Message-ID: <170205584628.398.13659275218002139722.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the x86/tdx branch of tip:

Commit-ID:     abe8dbab8f9f8370c26e7b79b49ed795c1b6b70f
Gitweb:        https://git.kernel.org/tip/abe8dbab8f9f8370c26e7b79b49ed795c1b6b70f
Author:        Kai Huang
AuthorDate:    Fri, 08 Dec 2023 09:07:27 -08:00
Committer:     Dave Hansen
CommitterDate: Fri, 08 Dec 2023 09:12:16 -08:00

x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory

Start to transit out the "multi-steps" to initialize the TDX module.

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees. The list of these
ranges is available to the kernel by querying the TDX module.

CMRs tell the kernel which memory is TDX compatible. The kernel needs
to build a list of memory regions (out of CMRs) as "TDX-usable" memory
and pass them to the TDX module. Once this is done, those "TDX-usable"
memory regions are fixed during module's lifetime.

To keep things simple, assume that all TDX-protected memory will come
from the page allocator. Make sure all pages in the page allocator
*are* TDX-usable memory.

As TDX-usable memory is a fixed configuration, take a snapshot of the
memory configuration from memblocks at the time of module initialization
(memblocks are modified on memory hotplug). This snapshot is used to
enable TDX support for *this* memory configuration only. Use a memory
hotplug notifier to ensure that no other RAM can be added outside of
this configuration.

This approach requires all memblock memory regions at the time of module
initialization to be TDX convertible memory to work, otherwise module
initialization will fail in a later SEAMCALL when passing those regions
to the module. This approach works when all boot-time "system RAM" is
TDX convertible memory and no non-TDX-convertible memory is hot-added
to the core-mm before module initialization.

For instance, on the first generation of TDX machines, both CXL memory
and NVDIMM are not TDX convertible memory. Using kmem driver to hot-add
any CXL memory or NVDIMM to the core-mm before module initialization
will result in failure to initialize the module. The SEAMCALL error
code will be available in the dmesg to help user to understand the
failure.

Signed-off-by: Kai Huang
Signed-off-by: Dave Hansen
Reviewed-by: "Huang, Ying"
Reviewed-by: Isaku Yamahata
Reviewed-by: Dave Hansen
Reviewed-by: Kirill A. Shutemov
Link: https://lore.kernel.org/all/20231208170740.53979-7-dave.hansen%40intel.com
---
 arch/x86/Kconfig            |   1 +-
 arch/x86/kernel/setup.c     |   2 +-
 arch/x86/virt/vmx/tdx/tdx.c | 167 ++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |   6 +-
 4 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb6e639..2c69ef8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1971,6 +1971,7 @@ config INTEL_TDX_HOST
 	depends on X86_64
 	depends on KVM_INTEL
 	depends on X86_X2APIC
+	select ARCH_KEEP_MEMBLOCK
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks. This option enables necessary TDX
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1526747..9597c00 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1033,6 +1033,8 @@ void __init setup_arch(char **cmdline_p)
 	 *
 	 * Moreover, on machines with SandyBridge graphics or in setups that use
 	 * crashkernel the entire 1M is reserved anyway.
+	 *
+	 * Note the host kernel TDX also requires the first 1MB being reserved.
 	 */
 	x86_platform.realmode_reserve();
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ecb0df8..6a3585b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -16,6 +16,12 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -31,6 +37,9 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized);
 static enum tdx_module_status_t tdx_module_status;
 static DEFINE_MUTEX(tdx_module_lock);
 
+/* All TDX-usable memory regions. Protected by mem_hotplug_lock. */
+static LIST_HEAD(tdx_memlist);
+
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
 static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
@@ -155,12 +164,102 @@ int tdx_cpu_enable(void)
 }
 EXPORT_SYMBOL_GPL(tdx_cpu_enable);
 
+/*
+ * Add a memory region as a TDX memory block. The caller must make sure
+ * all memory regions are added in address ascending order and don't
+ * overlap.
+ */
+static int add_tdx_memblock(struct list_head *tmb_list, unsigned long start_pfn,
+			    unsigned long end_pfn)
+{
+	struct tdx_memblock *tmb;
+
+	tmb = kmalloc(sizeof(*tmb), GFP_KERNEL);
+	if (!tmb)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&tmb->list);
+	tmb->start_pfn = start_pfn;
+	tmb->end_pfn = end_pfn;
+
+	/* @tmb_list is protected by mem_hotplug_lock */
+	list_add_tail(&tmb->list, tmb_list);
+	return 0;
+}
+
+static void free_tdx_memlist(struct list_head *tmb_list)
+{
+	/* @tmb_list is protected by mem_hotplug_lock */
+	while (!list_empty(tmb_list)) {
+		struct tdx_memblock *tmb = list_first_entry(tmb_list,
+				struct tdx_memblock, list);
+
+		list_del(&tmb->list);
+		kfree(tmb);
+	}
+}
+
+/*
+ * Ensure that all memblock memory regions are convertible to TDX
+ * memory. Once this has been established, stash the memblock
+ * ranges off in a secondary structure because memblock is modified
+ * in memory hotplug while TDX memory regions are fixed.
+ */
+static int build_tdx_memlist(struct list_head *tmb_list)
+{
+	unsigned long start_pfn, end_pfn;
+	int i, ret;
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
+		/*
+		 * The first 1MB is not reported as TDX convertible memory.
+		 * Although the first 1MB is always reserved and won't end up
+		 * to the page allocator, it is still in memblock's memory
+		 * regions. Skip them manually to exclude them as TDX memory.
+		 */
+		start_pfn = max(start_pfn, PHYS_PFN(SZ_1M));
+		if (start_pfn >= end_pfn)
+			continue;
+
+		/*
+		 * Add the memory regions as TDX memory. The regions in
+		 * memblock has already guaranteed they are in address
+		 * ascending order and don't overlap.
+		 */
+		ret = add_tdx_memblock(tmb_list, start_pfn, end_pfn);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+err:
+	free_tdx_memlist(tmb_list);
+	return ret;
+}
+
 static int init_tdx_module(void)
 {
+	int ret;
+
+	/*
+	 * To keep things simple, assume that all TDX-protected memory
+	 * will come from the page allocator. Make sure all pages in the
+	 * page allocator are TDX-usable memory.
+	 *
+	 * Build the list of "TDX-usable" memory regions which cover all
+	 * pages in the page allocator to guarantee that. Do it while
+	 * holding mem_hotplug_lock read-lock as the memory hotplug code
+	 * path reads the @tdx_memlist to reject any new memory.
+	 */
+	get_online_mems();
+
+	ret = build_tdx_memlist(&tdx_memlist);
+	if (ret)
+		goto out_put_tdxmem;
+
 	/*
 	 * TODO:
 	 *
-	 *  - Build the list of TDX-usable memory regions.
 	 *  - Get TDX module "TD Memory Region" (TDMR) global metadata.
 	 *  - Construct a list of TDMRs to cover all TDX-usable memory
 	 *    regions.
@@ -170,7 +269,14 @@ static int init_tdx_module(void)
 	 *
 	 * Return error before all steps are done.
 	 */
-	return -EINVAL;
+	ret = -EINVAL;
+out_put_tdxmem:
+	/*
+	 * @tdx_memlist is written here and read at memory hotplug time.
+	 * Lock out memory hotplug code while building it.
+	 */
+	put_online_mems();
+	return ret;
 }
 
 static int __tdx_enable(void)
@@ -257,6 +363,56 @@ static __init int record_keyid_partitioning(u32 *tdx_keyid_start,
 	return 0;
 }
 
+static bool is_tdx_memory(unsigned long start_pfn, unsigned long end_pfn)
+{
+	struct tdx_memblock *tmb;
+
+	/*
+	 * This check assumes that the start_pfn<->end_pfn range does not
+	 * cross multiple @tdx_memlist entries. A single memory online
+	 * event across multiple memblocks (from which @tdx_memlist
+	 * entries are derived at the time of module initialization) is
+	 * not possible. This is because memory offline/online is done
+	 * on granularity of 'struct memory_block', and the hotpluggable
+	 * memory region (one memblock) must be multiple of memory_block.
+	 */
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		if (start_pfn >= tmb->start_pfn && end_pfn <= tmb->end_pfn)
+			return true;
+	}
+	return false;
+}
+
+static int tdx_memory_notifier(struct notifier_block *nb, unsigned long action,
+			       void *v)
+{
+	struct memory_notify *mn = v;
+
+	if (action != MEM_GOING_ONLINE)
+		return NOTIFY_OK;
+
+	/*
+	 * Empty list means TDX isn't enabled. Allow any memory
+	 * to go online.
+	 */
+	if (list_empty(&tdx_memlist))
+		return NOTIFY_OK;
+
+	/*
+	 * The TDX memory configuration is static and can not be
+	 * changed. Reject onlining any memory which is outside of
+	 * the static configuration whether it supports TDX or not.
+	 */
+	if (is_tdx_memory(mn->start_pfn, mn->start_pfn + mn->nr_pages))
+		return NOTIFY_OK;
+
+	return NOTIFY_BAD;
+}
+
+static struct notifier_block tdx_memory_nb = {
+	.notifier_call = tdx_memory_notifier,
+};
+
 void __init tdx_init(void)
 {
 	u32 tdx_keyid_start, nr_tdx_keyids;
@@ -280,6 +436,13 @@ void __init tdx_init(void)
 		return;
 	}
 
+	err = register_memory_notifier(&tdx_memory_nb);
+	if (err) {
+		pr_err("initialization failed: register_memory_notifier() failed (%d)\n",
+				err);
+		return;
+	}
+
 	/*
 	 * Just use the first TDX KeyID as the 'global KeyID' and
 	 * leave the rest for TDX guests.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index a3c5227..c11e0a7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -27,4 +27,10 @@ enum tdx_module_status_t {
 	TDX_MODULE_ERROR
 };
 
+struct tdx_memblock {
+	struct list_head list;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+};
+
 #endif
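
Not part of the patch: for readers who want to experiment with the hotplug policy outside the kernel, the following is a minimal userspace sketch in C of the two decisions the patch makes. First, the snapshot only keeps memory at or above 1MB. Second, a memory-online request is accepted only if it lies entirely inside one snapshot range, or if the snapshot is empty (TDX not enabled). All names here (struct pfn_range, struct tdx_snapshot, snapshot_build(), snapshot_contains(), may_online(), MAX_RANGES, PFN_1M) are hypothetical, 4KB pages are assumed, and a fixed-size array stands in for the kmalloc'ed list used by the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12                       /* assumption: 4KB pages */
#define PFN_1M     (0x100000UL >> PAGE_SHIFT) /* first PFN at or above 1MB */
#define MAX_RANGES 16                       /* hypothetical fixed capacity */

/* A PFN range: start_pfn is inclusive, end_pfn is exclusive. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Userspace stand-in for the tdx_memlist snapshot. */
struct tdx_snapshot {
	struct pfn_range r[MAX_RANGES];
	size_t nr;
};

/*
 * Build the snapshot from memory ranges given in ascending,
 * non-overlapping order. Memory below 1MB is clamped away.
 * Returns 0 on success, -1 (with an empty snapshot) if it does not fit.
 */
int snapshot_build(struct tdx_snapshot *s, const struct pfn_range *mem,
		   size_t n)
{
	s->nr = 0;
	for (size_t i = 0; i < n; i++) {
		unsigned long start = mem[i].start_pfn;
		unsigned long end = mem[i].end_pfn;

		/* Skip the part of the range that lies below 1MB. */
		if (start < PFN_1M)
			start = PFN_1M;
		if (start >= end)
			continue;

		if (s->nr == MAX_RANGES) {
			s->nr = 0;	/* drop the partial snapshot */
			return -1;
		}

		/* The caller promised ascending, non-overlapping input. */
		assert(s->nr == 0 || s->r[s->nr - 1].end_pfn <= start);

		s->r[s->nr].start_pfn = start;
		s->r[s->nr].end_pfn = end;
		s->nr++;
	}
	return 0;
}

/* True if [start_pfn, end_pfn) lies entirely inside one snapshot range. */
bool snapshot_contains(const struct tdx_snapshot *s,
		       unsigned long start_pfn, unsigned long end_pfn)
{
	for (size_t i = 0; i < s->nr; i++) {
		if (start_pfn >= s->r[i].start_pfn &&
		    end_pfn <= s->r[i].end_pfn)
			return true;
	}
	return false;
}

/* Models the notifier decision for a "going online" request. */
bool may_online(const struct tdx_snapshot *s, unsigned long start_pfn,
		unsigned long nr_pages)
{
	/* Empty snapshot: TDX is not enabled, allow everything. */
	if (s->nr == 0)
		return true;
	return snapshot_contains(s, start_pfn, start_pfn + nr_pages);
}
```

With a snapshot built from the ranges {0, 0x9f} and {0x100, 0x80000}, the first range is dropped because it lies entirely below 1MB. A request to online 0x8000 pages at PFN 0x8000 is then accepted, while the same request at PFN 0x80000 is rejected because it falls outside the snapshot.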