From patchwork Wed Jan 31 11:31:53 2024
From: "Huang, Kai"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, dave.hansen@intel.com, kirill.shutemov@linux.intel.com, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com, hpa@zytor.com, luto@kernel.org, peterz@infradead.org, thomas.lendacky@amd.com, chao.gao@intel.com, bhe@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com
Subject: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
Date: Wed, 31 Jan 2024 11:31:53 +0000

From: Kai Huang

Currently, on AMD SME platforms, caches are flushed manually during kexec() before jumping to the new kernel because of memory encryption. Intel TDX also needs to flush the cachelines of TDX private memory before jumping to the second kernel; otherwise, they may silently corrupt the new kernel.
Instead of sprinkling AMD- and Intel-specific checks around, introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify Intel and AMD and simplify the logic:

	Could the old kernel leave incoherent caches around?
	If so, do WBINVD.

Convert AMD SME to use this new CC attribute. A later patch will use this new attribute for Intel TDX too.

Specifically, AMD SME flushes caches in two places: 1) stop_this_cpu(); 2) relocate_kernel(). stop_this_cpu() checks CPUID directly to decide whether to do WBINVD, because the current kernel's SME enabling status may not match the new kernel's choice. However, relocate_kernel() only does WBINVD when the current kernel has enabled SME, because the new kernel is always placed in an "unencrypted" area.

To simplify the logic, change AMD SME to always use the approach taken in stop_this_cpu(). This causes an additional WBINVD in relocate_kernel() when the current kernel hasn't enabled SME (e.g., SME was disabled on the kernel command line), but this is acceptable for the sake of less complicated code (see [1] for the relevant discussion).

Note that currently the kernel only advertises the CC vendor for AMD SME when SME is actually enabled by the kernel. To always advertise the new CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling status, set the CC vendor as long as the hardware has enabled SME.

Note that "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has enabled SME" is still different from "checking the CPUID" (the approach taken in stop_this_cpu()), but technically the former also serves the purpose and is actually more accurate.

This change allows sme_me_mask to be 0 while the CC vendor reports as AMD. However, this doesn't impact other CC attributes on AMD platforms, nor does it impact cc_mkdec()/cc_mkenc().
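To make the trade-off above concrete, here is a minimal user-space C model (not kernel code) of when each kexec() path would execute WBINVD on an AMD host before and after this change. The struct amd_host_state, its fields and all function names are hypothetical: the fields stand in for the CPUID 0x8000001f EAX bit 0 test, the SYSCFG memory-encryption bit checked in sme_enable(), sme_me_mask and the SEV enable bits in sev_status. The model also ignores the early-return paths in sme_enable().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical snapshot of the AMD host state relevant to kexec().
 * User-space model only; none of these names exist in the kernel.
 */
struct amd_host_state {
	bool cpuid_sme;   /* CPUID 0x8000001f EAX bit 0: CPU supports SME */
	bool syscfg_sme;  /* SYSCFG memory-encryption bit set by firmware */
	bool kernel_sme;  /* this kernel activated SME (sme_me_mask != 0) */
	bool sev_guest;   /* an SEV enable bit is set in sev_status */
};

/* Before: stop_this_cpu() tested the CPUID bit directly. */
static bool old_stop_this_cpu_flush(const struct amd_host_state *s)
{
	return s->cpuid_sme;
}

/* Before: relocate_kernel() was told to flush via CC_ATTR_HOST_MEM_ENCRYPT. */
static bool old_relocate_kernel_flush(const struct amd_host_state *s)
{
	return s->kernel_sme && !s->sev_guest;
}

/*
 * After: both paths ask CC_ATTR_HOST_MEM_INCOHERENT, which is modelled
 * as true whenever firmware enabled SME and this is not an SEV guest,
 * whether or not this kernel activated SME.
 */
static bool new_host_mem_incoherent(const struct amd_host_state *s)
{
	return s->syscfg_sme && !s->sev_guest;
}
```

Under these assumptions, a host whose firmware enabled SME but which booted with SME deactivated gets the extra WBINVD in relocate_kernel() that the text accepts, while a CPU that supports SME without firmware enabling it no longer flushes in stop_this_cpu(). That narrower condition is the sense in which the new check is described as more accurate.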
[1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/

Suggested-by: Dave Hansen
Signed-off-by: Kai Huang
---
 arch/x86/coco/core.c               | 13 +++++++++++++
 arch/x86/kernel/machine_kexec_64.c |  2 +-
 arch/x86/kernel/process.c          | 14 +++-----------
 arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
 include/linux/cc_platform.h        | 15 +++++++++++++++
 5 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..8d6d727e6e18 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 	case CC_ATTR_HOST_MEM_ENCRYPT:
 		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
 
+	case CC_ATTR_HOST_MEM_INCOHERENT:
+		/*
+		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+		 * enabled on the platform regardless whether the kernel
+		 * has actually enabled the SME.
+		 */
+		return !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+	/*
+	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
+	 * as it must be true when there's any SEV enable bit set in
+	 * sev_status.
+	 */
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 		return sev_status & MSR_AMD64_SEV_ENABLED;
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index bc0a5348b4a6..c9c6974e2e9c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
 				       (unsigned long)page_list,
 				       image->start,
 				       image->preserve_context,
-				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
+				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ab49ade31b0d..2c7e8d9889c0 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
 	mcheck_cpu_clear(c);
 
 	/*
-	 * Use wbinvd on processors that support SME. This provides support
-	 * for performing a successful kexec when going from SME inactive
-	 * to SME active (or vice-versa). The cache must be cleared so that
-	 * if there are entries with the same physical address, both with and
-	 * without the encryption bit, they don't race each other when flushed
-	 * and potentially end up with the wrong entry being committed to
-	 * memory.
-	 *
-	 * Test the CPUID bit directly because the machine might've cleared
-	 * X86_FEATURE_SME due to cmdline options.
+	 * Use wbinvd on processors that the first kernel *could*
+	 * potentially leave incoherent cachelines.
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
 		native_wbinvd();
 
 	/*
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7f72472a34d6..87e4fddab770 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
 		msr = __rdmsr(MSR_AMD64_SYSCFG);
 		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
 			return;
+
+		/*
+		 * Always set CC vendor when the platform has SME enabled
+		 * regardless whether the kernel will actually activates the
+		 * SME or not. This reports the CC_ATTR_HOST_MEM_INCOHERENT
+		 * being true as long as the platform has SME enabled so that
+		 * stop_this_cpu() can do necessary WBINVD during kexec().
+		 */
+		cc_vendor = CC_VENDOR_AMD;
 	} else {
 		/* SEV state cannot be controlled by a command line option */
 		sme_me_mask = me_mask;
+		cc_vendor = CC_VENDOR_AMD;
 		goto out;
 	}
 
@@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
 out:
 	if (sme_me_mask) {
 		physical_mask &= ~sme_me_mask;
-		cc_vendor = CC_VENDOR_AMD;
 		cc_set_mask(sme_me_mask);
 	}
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..2f7273596102 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -42,6 +42,21 @@ enum cc_attr {
 	 */
 	CC_ATTR_HOST_MEM_ENCRYPT,
 
+	/**
+	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
+	 * incoherent
+	 *
+	 * The platform/OS is running as a bare-metal system or a hypervisor.
+	 * The memory encryption engine might have left non-cache-coherent
+	 * data in the caches that needs to be flushed.
+	 *
+	 * Use this in places where the cache coherency of the memory matters
+	 * but the encryption status does not.
+	 *
+	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
+	 */
+	CC_ATTR_HOST_MEM_INCOHERENT,
+
 	/**
 	 * @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active
 	 *

From patchwork Wed Jan 31 11:31:54 2024
From: "Huang, Kai"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, dave.hansen@intel.com, kirill.shutemov@linux.intel.com, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com, hpa@zytor.com, luto@kernel.org, peterz@infradead.org, thomas.lendacky@amd.com, chao.gao@intel.com, bhe@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com
Subject: [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host
Date: Wed, 31 Jan 2024 11:31:54 +0000

From: Kai Huang

On TDX-capable platforms, during kexec() the old kernel needs to flush the dirty cachelines of all TDX private memory; otherwise, they may silently corrupt the new kernel's memory.

Advertise the newly introduced CC_ATTR_HOST_MEM_INCOHERENT attribute for the TDX host platform so that the cache will be flushed during kexec().
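The resulting vendor dispatch can be sketched as a small user-space C model of the cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT) query and the WBINVD decision that consumes it. Only the cc_vendor enumerator names come from the kernel; struct cc_model, model_host_mem_incoherent(), model_stop_this_cpu() and the flush counter are hypothetical. The Intel branch assumes that only a TDX host (X86_FEATURE_TDX_HOST_PLATFORM set) reports the attribute, as the intel_cc_platform_has() change in this patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel's vendor values; the rest of this file is a model. */
enum cc_vendor { CC_VENDOR_NONE, CC_VENDOR_AMD, CC_VENDOR_INTEL };

/* Hypothetical platform description, not a kernel structure. */
struct cc_model {
	enum cc_vendor vendor;
	bool amd_sev_guest;  /* AMD: an SEV enable bit is set in sev_status */
	bool tdx_host;       /* Intel: X86_FEATURE_TDX_HOST_PLATFORM is set */
};

/* Models cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT). */
static bool model_host_mem_incoherent(const struct cc_model *m)
{
	switch (m->vendor) {
	case CC_VENDOR_AMD:
		return !m->amd_sev_guest;
	case CC_VENDOR_INTEL:
		/* TDX guests take the guest path, which never reports it. */
		return m->tdx_host;
	default:
		return false;
	}
}

static int wbinvd_count; /* counts simulated cache flushes */

/* Stand-in for the vendor-neutral WBINVD check in stop_this_cpu(). */
static void model_stop_this_cpu(const struct cc_model *m)
{
	if (model_host_mem_incoherent(m))
		wbinvd_count++;
}
```

The caller never mentions a vendor, which is the point of routing both AMD SME and TDX through one attribute.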
Note that, theoretically, a cache flush is only needed once the TDX module has been initialized. However, module initialization is done at runtime, so just advertise the CC attribute whenever the platform has TDX enabled.

Signed-off-by: Kai Huang
Reviewed-by: Kirill A. Shutemov
Reviewed-by: Chao Gao
---
 arch/x86/Kconfig            |  1 +
 arch/x86/coco/core.c        | 21 ++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.c |  3 +++
 include/linux/cc_platform.h |  3 ++-
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 502986237cb6..ac3b32149a77 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1975,6 +1975,7 @@ config INTEL_TDX_HOST
 	depends on CONTIG_ALLOC
 	depends on !KEXEC_CORE
 	depends on X86_MCE
+	select ARCH_HAS_CC_PLATFORM
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks. This option enables necessary TDX
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 8d6d727e6e18..ecb15852b69d 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -12,11 +12,12 @@
 
 #include
 #include
+#include
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 static u64 cc_mask __ro_after_init;
 
-static bool noinstr intel_cc_platform_has(enum cc_attr attr)
+static bool noinstr intel_cc_platform_guest_has(enum cc_attr attr)
 {
 	switch (attr) {
 	case CC_ATTR_GUEST_UNROLL_STRING_IO:
@@ -29,6 +30,24 @@ static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 	}
 }
 
+static bool noinstr intel_cc_platform_host_has(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_HOST_MEM_INCOHERENT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool noinstr intel_cc_platform_has(enum cc_attr attr)
+{
+	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
+		return intel_cc_platform_host_has(attr);
+
+	return intel_cc_platform_guest_has(attr);
+}
+
 /*
  * Handle the SEV-SNP vTOM case where sme_me_mask is zero, and
  * the other levels of SME/SEV functionality, including C-bit
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4d6826a76f78..9f1fed458a32 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include "tdx.h"
 
 static u32 tdx_global_keyid __ro_after_init;
@@ -1488,5 +1489,7 @@ void __init tdx_init(void)
 
 	setup_force_cpu_cap(X86_FEATURE_TDX_HOST_PLATFORM);
 
+	cc_vendor = CC_VENDOR_INTEL;
+
 	check_tdx_erratum();
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 2f7273596102..654777d64dc0 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -53,7 +53,8 @@ enum cc_attr {
 	 * Use this in places where the cache coherency of the memory matters
 	 * but the encryption status does not.
 	 *
-	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
+	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT, but
+	 * additionally adds TDX hosts.
 	 */
 	CC_ATTR_HOST_MEM_INCOHERENT,
 

From patchwork Wed Jan 31 11:31:55 2024
From: "Huang, Kai"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, dave.hansen@intel.com, kirill.shutemov@linux.intel.com, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com, hpa@zytor.com, luto@kernel.org, peterz@infradead.org, thomas.lendacky@amd.com, chao.gao@intel.com, bhe@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com
Subject: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
Date: Wed, 31 Jan 2024 11:31:55 +0000

From: Kai Huang

The first few generations of TDX hardware have an erratum. A partial write to a TDX private memory cacheline will silently "poison" the line. Subsequent reads will consume the poison and generate a machine check. According to the TDX hardware spec, neither of these things should have happened.

== Background ==

Virtually all kernel memory access operations happen in full cachelines. In practice, writing a "byte" of memory usually reads a 64-byte cacheline of memory, modifies it, then writes the whole line back. Those operations do not trigger this problem.

This problem is triggered by "partial" writes, where a write transaction of less than a cacheline lands at the memory controller. The CPU does these via non-temporal write instructions (like MOVNTI) or through UC/WC memory mappings. The issue can also be triggered away from the CPU by devices doing partial writes via DMA.
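The distinction above can be sketched as a small user-space C model of what the memory controller receives for a single write. The enum write_path, the function names and the fixed 64-byte line size are assumptions made for illustration. The model only encodes the rule stated in the text: a store through a normal write-back cached mapping is eventually written back as a whole line, while non-temporal stores, UC/WC mappings and device DMA can deliver fewer bytes than a full line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_SIZE 64u  /* assumed cacheline size in bytes */

/* Hypothetical classification of how a store reaches memory. */
enum write_path {
	WRITE_WB_CACHED,     /* ordinary store through a write-back mapping */
	WRITE_NON_TEMPORAL,  /* e.g. MOVNTI */
	WRITE_UC_WC,         /* store through a UC or WC mapping */
	WRITE_DMA,           /* write performed by a device */
};

/*
 * Bytes the memory controller receives for one write of @len bytes that
 * stays within a single cacheline.
 */
static size_t bytes_at_controller(enum write_path path, size_t len)
{
	assert(len > 0 && len <= CACHELINE_SIZE);
	if (path == WRITE_WB_CACHED)
		return CACHELINE_SIZE;  /* read-modify-write, full line written back */
	return len;                     /* transaction carries only @len bytes */
}

/* True if such a write is "partial" and could poison a TDX private line. */
static bool is_partial_write(enum write_path path, size_t len)
{
	return bytes_at_controller(path, len) < CACHELINE_SIZE;
}
```

In this model, a 1-byte store through a cached mapping is harmless, while an 8-byte MOVNTI-style store is partial. A full, aligned 64-byte non-cached write is not partial.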
== Problem ==

A fast warm reset doesn't reset TDX private memory, and kexec() can
also boot directly into the new kernel without any reset. Thus, if the
old kernel has enabled TDX on a platform with this erratum, the new
kernel may get an unexpected machine check.

Note that without this erratum, a kernel read or write to TDX private
memory should never cause a machine check, so it is fine for the old
kernel to leave TDX private pages as they are.

== Solution ==

In short, with this erratum, the kernel needs to explicitly convert all
TDX private pages back to normal to give the new kernel a clean slate
after kexec(). The BIOS is also expected to disable fast warm reset as a
workaround for this erratum, so this implementation doesn't try to reset
TDX private memory in the kernel for the reboot case, but instead relies
on the BIOS to enable the workaround.

Convert TDX private pages back to normal after all remote CPUs have been
stopped and caches have been flushed on all CPUs, when no further TDX
activity can happen. Do it in machine_kexec() to avoid adding overhead
to normal reboot/shutdown, since the kernel relies on the BIOS to
disable fast warm reset for the reboot case.

For now, TDX private memory can only be PAMT pages. It would be ideal
to cover all types of TDX private memory here, but there are practical
problems in doing so:

1) There's no existing infrastructure to track TDX private pages;
2) It's not feasible to query the TDX module about page type, because
   VMX has already been stopped when KVM receives the reboot notifier.
   Moreover, the result from the TDX module may not be accurate (e.g.,
   the remote CPU could be stopped right before MOVDIR64B).

One temporary solution is to blindly convert all memory pages, but that
is problematic too, because not all pages are mapped as writable in the
direct mapping. It could be done by switching to the identity mapping
created for kexec() or to a new page table, but the complexity seems
overkill.

Therefore, rather than doing something dramatic, only reset PAMT pages
here. Other kernel components which use TDX need to do the conversion
on their own by intercepting the reboot/shutdown notifier (KVM already
does that).

Note that kexec() can happen at any time, including while the TDX
module is being initialized. Register a TDX reboot notifier callback to
stop further TDX module initialization; if module initialization is in
progress, the callback waits until it finishes. This ensures the TDX
module status is stable after the reboot notifier callback, so the later
kexec() code can read the module status to decide whether PAMTs are
stable and available.

Also stop further TDX module initialization on machine shutdown and
halt, not just kexec(), as there's no reason to initialize the module
in those cases either.

Signed-off-by: Kai Huang
Reviewed-by: Kirill A. Shutemov
---
 arch/x86/include/asm/tdx.h         |  2 +
 arch/x86/kernel/machine_kexec_64.c | 16 +++++
 arch/x86/virt/vmx/tdx/tdx.c        | 98 ++++++++++++++++++++++++++++++
 3 files changed, 116 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index eba178996d84..ed3ac9a8a079 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -116,11 +116,13 @@ static inline u64 sc_retry(sc_func_t func, u64 fn,
 int tdx_cpu_enable(void);
 int tdx_enable(void);
 const char *tdx_dump_mce_info(struct mce *m);
+void tdx_reset_memory(void);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }
 static inline int tdx_enable(void)  { return -ENODEV; }
 static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
+static inline void tdx_reset_memory(void) { }
 #endif /* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index c9c6974e2e9c..b2279a3f6976 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_ACPI
 /*
@@ -298,9 +299,24 @@ void machine_kexec(struct kimage *image)
 	void *control_page;
 	int save_ftrace_enabled;
 
+	/*
+	 * For platforms with TDX "partial write machine check" erratum,
+	 * all TDX private pages need to be converted back to normal
+	 * before booting to the new kernel, otherwise the new kernel
+	 * may get unexpected machine check.
+	 *
+	 * But skip this when preserve_context is on. The second kernel
+	 * shouldn't write to the first kernel's memory anyway. Skipping
+	 * this also avoids killing TDX in the first kernel, which would
+	 * require more complicated handling.
+	 */
 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
 		save_processor_state();
+	else
+		tdx_reset_memory();
+#else
+	tdx_reset_memory();
 #endif
 
 	save_ftrace_enabled = __ftrace_enabled_save();
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9f1fed458a32..0537b1b76c2b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -54,6 +55,8 @@ static DEFINE_MUTEX(tdx_module_lock);
 /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */
 static LIST_HEAD(tdx_memlist);
 
+static bool tdx_rebooting;
+
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
 static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
@@ -1187,6 +1190,9 @@ static int __tdx_enable(void)
 {
 	int ret;
 
+	if (tdx_rebooting)
+		return -EAGAIN;
+
 	ret = init_tdx_module();
 	if (ret) {
 		pr_err("module initialization failed (%d)\n", ret);
@@ -1420,6 +1426,90 @@ static struct notifier_block tdx_memory_nb = {
 	.notifier_call = tdx_memory_notifier,
 };
 
+/*
+ * Convert TDX private pages back to normal on platforms with
+ * "partial write machine check" erratum.
+ *
+ * Called from machine_kexec() before booting to the new kernel.
+ */
+void tdx_reset_memory(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
+		return;
+
+	/*
+	 * Kernel read/write to TDX private memory doesn't
+	 * cause machine check on hardware w/o this erratum.
+	 */
+	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
+		return;
+
+	/* Called from kexec() when only rebooting cpu is alive */
+	WARN_ON_ONCE(num_online_cpus() != 1);
+
+	/*
+	 * tdx_reboot_notifier() waits until ongoing TDX module
+	 * initialization to finish, and module initialization is
+	 * rejected after that. Therefore @tdx_module_status is
+	 * stable here and can be read w/o holding lock.
+	 */
+	if (tdx_module_status != TDX_MODULE_INITIALIZED)
+		return;
+
+	/*
+	 * Flush cache of all TDX private memory _before_ converting
+	 * them back to avoid silent memory corruption.
+	 */
+	native_wbinvd();
+
+	/*
+	 * Convert PAMTs back to normal. All other cpus are already
+	 * dead and TDMRs/PAMTs are stable.
+	 *
+	 * Ideally it's better to cover all types of TDX private pages
+	 * here, but it's impractical:
+	 *
+	 *  - There's no existing infrastructure to tell whether a page
+	 *    is TDX private memory or not.
+	 *
+	 *  - Using SEAMCALL to query TDX module isn't feasible either:
+	 *    - VMX has been turned off by reaching here so SEAMCALL
+	 *      cannot be made;
+	 *    - Even SEAMCALL can be made the result from TDX module may
+	 *      not be accurate (e.g., remote CPU can be stopped while
+	 *      the kernel is in the middle of reclaiming TDX private
+	 *      page and doing MOVDIR64B).
+	 *
+	 * One temporary solution could be just converting all memory
+	 * pages, but it's problematic too, because not all pages are
+	 * mapped as writable in direct mapping. It can be done by
+	 * switching to the identical mapping for kexec() or a new page
+	 * table which maps all pages as writable, but the complexity is
+	 * overkill.
+	 *
+	 * Thus instead of doing something dramatic to convert all pages,
+	 * only convert PAMTs here. Other kernel components which use
+	 * TDX need to do the conversion on their own by intercepting the
+	 * rebooting/shutdown notifier (KVM already does that).
+	 */
+	tdmrs_reset_pamt_all(&tdx_tdmr_list);
+}
+
+static int tdx_reboot_notifier(struct notifier_block *nb, unsigned long mode,
+			       void *unused)
+{
+	/* Wait ongoing TDX initialization to finish */
+	mutex_lock(&tdx_module_lock);
+	tdx_rebooting = true;
+	mutex_unlock(&tdx_module_lock);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block tdx_reboot_nb = {
+	.notifier_call = tdx_reboot_notifier,
+};
+
 static void __init check_tdx_erratum(void)
 {
 	/*
@@ -1474,6 +1564,14 @@ void __init tdx_init(void)
 		return;
 	}
 
+	err = register_reboot_notifier(&tdx_reboot_nb);
+	if (err) {
+		pr_err("initialization failed: register_reboot_notifier() failed (%d)\n",
+				err);
+		unregister_memory_notifier(&tdx_memory_nb);
+		return;
+	}
+
 #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
 	pr_info("Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.\n");
 	acpi_suspend_lowlevel = NULL;

From patchwork Wed Jan 31 11:31:56 2024
From: "Huang, Kai"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, dave.hansen@intel.com, kirill.shutemov@linux.intel.com, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com, hpa@zytor.com, luto@kernel.org, peterz@infradead.org, thomas.lendacky@amd.com, chao.gao@intel.com, bhe@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com
Subject: [PATCH 4/4] x86/virt/tdx: Remove the !KEXEC_CORE dependency
Date: Wed, 31 Jan 2024 11:31:56 +0000

From: Kai Huang

Now that the TDX host can work with kexec(), remove the !KEXEC_CORE
dependency.

Signed-off-by: Kai Huang
---
 arch/x86/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ac3b32149a77..5225f8f3eade 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1973,7 +1973,6 @@ config INTEL_TDX_HOST
 	depends on X86_X2APIC
 	select ARCH_KEEP_MEMBLOCK
 	depends on CONTIG_ALLOC
-	depends on !KEXEC_CORE
 	depends on X86_MCE
 	select ARCH_HAS_CC_PLATFORM
 	help