Message ID | cd88e97e53c502f0a457d6a82a31d9e8e0f9fca7.1706698706.git.kai.huang@intel.com |
---|---|
State | New |
Series | TDX host: kexec() support |
Commit Message
Kai Huang
Jan. 31, 2024, 11:31 a.m. UTC
From: Kai Huang <kai.huang@intel.com>

Currently on AMD SME platforms, during kexec() caches are flushed
manually before jumping to the new kernel due to memory encryption.
Intel TDX needs to flush cachelines of TDX private memory before
jumping to the second kernel too, otherwise they may silently corrupt
the new kernel.

Instead of sprinkling both AMD and Intel's specific checks around,
introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
Intel and AMD, and simplify the logic:

  Could the old kernel leave incoherent caches around?  If so, do
  WBINVD.

Convert the AMD SME to use this new CC attribute.  A later patch will
utilize this new attribute for Intel TDX too.

Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
WBINVD for the reason that the current kernel's SME enabling status may
not match the new kernel's choice.  However the relocate_kernel() only
does the WBINVD when the current kernel has enabled SME for the reason
that the new kernel is always placed in an "unencrypted" area.

To simplify the logic, for AMD SME change to always use the way that is
done in stop_this_cpu().  This will cause an additional WBINVD in
relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
disabled by kernel command line), but this is acceptable for the sake of
having less complicated code (see [1] for the relevant discussion).

Note currently the kernel only advertises CC vendor for AMD SME when SME
is actually enabled by the kernel.  To always advertise the new
CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
status, change to set CC vendor as long as the hardware has enabled SME.

Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
enabled SME" is still different from "checking the CPUID" (the way that
is done in stop_this_cpu()), but technically the former also serves the
purpose and is actually more accurate.

Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
But this doesn't impact other CC attributes on AMD platforms, nor does
it impact the cc_mkdec()/cc_mkenc().

[1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/coco/core.c               | 13 +++++++++++++
 arch/x86/kernel/machine_kexec_64.c |  2 +-
 arch/x86/kernel/process.c          | 14 +++-----------
 arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
 include/linux/cc_platform.h        | 15 +++++++++++++++
 5 files changed, 42 insertions(+), 13 deletions(-)
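In essence, the patch replaces a vendor-specific CPUID test in
stop_this_cpu() with one vendor-neutral attribute query. A condensed
before/after sketch, lifted from the diff in this patch (the surrounding
function body is elided, so this is illustrative rather than
independently buildable):

	/* Before: test the SME CPUID bit (leaf 0x8000001f, EAX bit 0)
	 * directly, because the machine might've cleared X86_FEATURE_SME
	 * due to cmdline options. */
	if (c->extended_cpuid_level >= 0x8000001f &&
	    (cpuid_eax(0x8000001f) & BIT(0)))
		native_wbinvd();

	/* After: ask one vendor-neutral question - could the old kernel
	 * have left incoherent cachelines behind? AMD SME answers it in
	 * this patch; a follow-up patch makes Intel TDX answer it too. */
	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
		native_wbinvd();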
Comments
On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
> From: Kai Huang <kai.huang@intel.com>
>
> Currently on AMD SME platforms, during kexec() caches are flushed
> manually before jumping to the new kernel due to memory encryption.
> Intel TDX needs to flush cachelines of TDX private memory before
> jumping to the second kernel too, otherwise they may silently corrupt
> the new kernel.
>
> Instead of sprinkling both AMD and Intel's specific checks around,
> introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
> Intel and AMD, and simplify the logic:
>
>   Could the old kernel leave incoherent caches around?

"Is it possible that the kernel could leave caches in incoherent state?"

>   If so, do WBINVD.
>
> Convert the AMD SME to use this new CC attribute.

> A later patch will
> utilize this new attribute for Intel TDX too.

Yeah, once those are all in git, the concept of "subsequent patch"
becomes ambiguous depending on how you're sorting them. So try to read
that commit message out of the context of all those "subsequent patches"
and see if it still makes sense.

IOW, you should strive for your commit messages to make sense on their
own, without referencing other patches.

In this particular case, that "later patch" can go.

> Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
> 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
> WBINVD for the reason that the current kernel's SME enabling status may
> not match the new kernel's choice.  However the relocate_kernel() only
> does the WBINVD when the current kernel has enabled SME for the reason
> that the new kernel is always placed in an "unencrypted" area.
>
> To simplify the logic, for AMD SME change to always use the way that is
> done in stop_this_cpu().  This will cause an additional WBINVD in
> relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
> disabled by kernel command line), but this is acceptable for the sake of
> having less complicated code (see [1] for the relevant discussion).
>
> Note currently the kernel only advertises CC vendor for AMD SME when SME
> is actually enabled by the kernel.  To always advertise the new
> CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
> status, change to set CC vendor as long as the hardware has enabled SME.
>
> Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
> enabled SME" is still different from "checking the CPUID" (the way that
> is done in stop_this_cpu()), but technically the former also serves the
> purpose and is actually more accurate.
>
> Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
> But this doesn't impact other CC attributes on AMD platforms, nor does
> it impact the cc_mkdec()/cc_mkenc().
>
> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/coco/core.c               | 13 +++++++++++++
>  arch/x86/kernel/machine_kexec_64.c |  2 +-
>  arch/x86/kernel/process.c          | 14 +++-----------
>  arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
>  include/linux/cc_platform.h        | 15 +++++++++++++++
>  5 files changed, 42 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..8d6d727e6e18 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>  	case CC_ATTR_HOST_MEM_ENCRYPT:
>  		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
>
> +	case CC_ATTR_HOST_MEM_INCOHERENT:
> +		/*
> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +		 * enabled on the platform regardless whether the kernel
> +		 * has actually enabled the SME.
> +

"represents whether SME has [been] enabled ... regardless whether the
kernel has enabled SME"?!?

I think this needs to be:

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index d07be9d05cd0..e3653465e532 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 
 	switch (attr) {
 	case CC_ATTR_MEM_ENCRYPT:
+
+		/*
+		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+		 * enabled on the platform regardless whether the kernel
+		 * has actually enabled the SME.
+		 */
+	case CC_ATTR_HOST_MEM_INCOHERENT:
 		return sme_me_mask;
 
 	case CC_ATTR_HOST_MEM_ENCRYPT:

> +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
> +
> +	/*
> +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
> +	 * as it must be true when there's any SEV enable bit set in
> +	 * sev_status.
> +	 */

Superfluous comment.

>  	case CC_ATTR_GUEST_MEM_ENCRYPT:
>  		return sev_status & MSR_AMD64_SEV_ENABLED;
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index bc0a5348b4a6..c9c6974e2e9c 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
>  				       (unsigned long)page_list,
>  				       image->start,
>  				       image->preserve_context,
> -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
> +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
>
>  #ifdef CONFIG_KEXEC_JUMP
>  	if (image->preserve_context)
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index ab49ade31b0d..2c7e8d9889c0 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
>  	mcheck_cpu_clear(c);
>
>  	/*
> -	 * Use wbinvd on processors that support SME. This provides support
> -	 * for performing a successful kexec when going from SME inactive
> -	 * to SME active (or vice-versa). The cache must be cleared so that
> -	 * if there are entries with the same physical address, both with and
> -	 * without the encryption bit, they don't race each other when flushed
> -	 * and potentially end up with the wrong entry being committed to
> -	 * memory.
> -	 *
> -	 * Test the CPUID bit directly because the machine might've cleared
> -	 * X86_FEATURE_SME due to cmdline options.
> +	 * Use wbinvd on processors that the first kernel *could*
> +	 * potentially leave incoherent cachelines.

No need for that comment anymore - people can grep for
CC_ATTR_HOST_MEM_INCOHERENT's definition simply.

>  	 */
> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
>  		native_wbinvd();
>
>  	/*
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 7f72472a34d6..87e4fddab770 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
>  		msr = __rdmsr(MSR_AMD64_SYSCFG);
>  		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
>  			return;
> +
> +		/*
> +		 * Always set CC vendor when the platform has SME enabled
> +		 * regardless whether the kernel will actually activates the
> +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
> +		 * being true as long as the platform has SME enabled so that
> +		 * stop_this_cpu() can do necessary WBINVD during kexec().
> +		 */
> +		cc_vendor = CC_VENDOR_AMD;
>  	} else {
>  		/* SEV state cannot be controlled by a command line option */
>  		sme_me_mask = me_mask;
> +		cc_vendor = CC_VENDOR_AMD;
>  		goto out;
>  	}
>

So you can't put it before the if - just slap it in both branches. Geez!

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 0166ab1780cc..1e4566cc233f 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
 	if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
 		snp_abort();
 
+	cc_vendor = CC_VENDOR_AMD;
+
 	/* Check if memory encryption is enabled */
 	if (feature_mask == AMD_SME_BIT) {
 		/*
@@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
 out:
 	RIP_REL_REF(sme_me_mask) = me_mask;
 	physical_mask &= ~me_mask;
-	cc_vendor = CC_VENDOR_AMD;
 	cc_set_mask(me_mask);
 }

Btw, pls do your patches ontop of tip/master as other patches in there
are touching this file.

> @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
>  out:
>  	if (sme_me_mask) {
>  		physical_mask &= ~sme_me_mask;
> -		cc_vendor = CC_VENDOR_AMD;
>  		cc_set_mask(sme_me_mask);
>  	}
>  }
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index cb0d6cd1c12f..2f7273596102 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -42,6 +42,21 @@ enum cc_attr {
>  	 */
>  	CC_ATTR_HOST_MEM_ENCRYPT,
>

This goes to the end of the enum.

> +	/**
> +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
> +	 * incoherent

"...can leave caches in an incoherent state."

> +	 *
> +	 * The platform/OS is running as a bare-metal system or a hypervisor.
> +	 * The memory encryption engine might have left non-cache-coherent
> +	 * data in the caches that needs to be flushed.
> +	 *
> +	 * Use this in places where the cache coherency of the memory matters
> +	 * but the encryption status does not.
> +	 *
> +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.

If that is the case, why do you even need a new CC_ATTR define?

Might as well do:

	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
		native_wbinvd();

?

> +	 */
> +	CC_ATTR_HOST_MEM_INCOHERENT,
> +
On 2/19/24 10:16, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
>> From: Kai Huang <kai.huang@intel.com>
>>
>> Currently on AMD SME platforms, during kexec() caches are flushed
>> manually before jumping to the new kernel due to memory encryption.
>> Intel TDX needs to flush cachelines of TDX private memory before
>> jumping to the second kernel too, otherwise they may silently corrupt
>> the new kernel.
>>
>> Instead of sprinkling both AMD and Intel's specific checks around,
>> introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
>> Intel and AMD, and simplify the logic:
>>
>> Could the old kernel leave incoherent caches around?
>
> "Is it possible that the kernel could leave caches in incoherent state?"
>
>> If so, do WBINVD.
>>
>> Convert the AMD SME to use this new CC attribute.
>
>> A later patch will
>> utilize this new attribute for Intel TDX too.
>
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.
>
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
>
> In this particular case, that "later patch" can go.
>
>> Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
>> 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
>> WBINVD for the reason that the current kernel's SME enabling status may
>> not match the new kernel's choice.  However the relocate_kernel() only
>> does the WBINVD when the current kernel has enabled SME for the reason
>> that the new kernel is always placed in an "unencrypted" area.
>>
>> To simplify the logic, for AMD SME change to always use the way that is
>> done in stop_this_cpu().  This will cause an additional WBINVD in
>> relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
>> disabled by kernel command line), but this is acceptable for the sake of
>> having less complicated code (see [1] for the relevant discussion).
>>
>> Note currently the kernel only advertises CC vendor for AMD SME when SME
>> is actually enabled by the kernel.  To always advertise the new
>> CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
>> status, change to set CC vendor as long as the hardware has enabled SME.
>>
>> Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
>> enabled SME" is still different from "checking the CPUID" (the way that
>> is done in stop_this_cpu()), but technically the former also serves the
>> purpose and is actually more accurate.
>>
>> Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
>> But this doesn't impact other CC attributes on AMD platforms, nor does
>> it impact the cc_mkdec()/cc_mkenc().
>>
>> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
>>
>> Suggested-by: Dave Hansen <dave.hansen@intel.com>
>> Signed-off-by: Kai Huang <kai.huang@intel.com>
>> ---
>>   arch/x86/coco/core.c               | 13 +++++++++++++
>>   arch/x86/kernel/machine_kexec_64.c |  2 +-
>>   arch/x86/kernel/process.c          | 14 +++-----------
>>   arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
>>   include/linux/cc_platform.h        | 15 +++++++++++++++
>>   5 files changed, 42 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
>> index eeec9986570e..8d6d727e6e18 100644
>> --- a/arch/x86/coco/core.c
>> +++ b/arch/x86/coco/core.c
>> @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>>   	case CC_ATTR_HOST_MEM_ENCRYPT:
>>   		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
>>
>> +	case CC_ATTR_HOST_MEM_INCOHERENT:
>> +		/*
>> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
>> +		 * enabled on the platform regardless whether the kernel
>> +		 * has actually enabled the SME.
>> +
>
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
>
> I think this needs to be:
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>
>   	switch (attr) {
>   	case CC_ATTR_MEM_ENCRYPT:
> +
> +		/*
> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +		 * enabled on the platform regardless whether the kernel
> +		 * has actually enabled the SME.
> +		 */
> +	case CC_ATTR_HOST_MEM_INCOHERENT:

This change won't return the correct answer. The check needs to remain
against the sev_status value.

>   		return sme_me_mask;
>
>   	case CC_ATTR_HOST_MEM_ENCRYPT:
>
>
>> +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
>> +
>> +	/*
>> +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
>> +	 * as it must be true when there's any SEV enable bit set in
>> +	 * sev_status.
>> +	 */
>
> Superfluous comment.
>
>>   	case CC_ATTR_GUEST_MEM_ENCRYPT:
>>   		return sev_status & MSR_AMD64_SEV_ENABLED;
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index bc0a5348b4a6..c9c6974e2e9c 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
>>   				       (unsigned long)page_list,
>>   				       image->start,
>>   				       image->preserve_context,
>> -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
>> +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
>>
>>   #ifdef CONFIG_KEXEC_JUMP
>>   	if (image->preserve_context)
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index ab49ade31b0d..2c7e8d9889c0 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
>>   	mcheck_cpu_clear(c);
>>
>>   	/*
>> -	 * Use wbinvd on processors that support SME. This provides support
>> -	 * for performing a successful kexec when going from SME inactive
>> -	 * to SME active (or vice-versa). The cache must be cleared so that
>> -	 * if there are entries with the same physical address, both with and
>> -	 * without the encryption bit, they don't race each other when flushed
>> -	 * and potentially end up with the wrong entry being committed to
>> -	 * memory.
>> -	 *
>> -	 * Test the CPUID bit directly because the machine might've cleared
>> -	 * X86_FEATURE_SME due to cmdline options.
>> +	 * Use wbinvd on processors that the first kernel *could*
>> +	 * potentially leave incoherent cachelines.
>
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.
>
>>   	 */
>> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
>> +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
>>   		native_wbinvd();
>>
>>   	/*
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 7f72472a34d6..87e4fddab770 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
>>   		msr = __rdmsr(MSR_AMD64_SYSCFG);
>>   		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
>>   			return;
>> +
>> +		/*
>> +		 * Always set CC vendor when the platform has SME enabled
>> +		 * regardless whether the kernel will actually activates the
>> +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
>> +		 * being true as long as the platform has SME enabled so that
>> +		 * stop_this_cpu() can do necessary WBINVD during kexec().
>> +		 */
>> +		cc_vendor = CC_VENDOR_AMD;
>>   	} else {
>>   		/* SEV state cannot be controlled by a command line option */
>>   		sme_me_mask = me_mask;
>> +		cc_vendor = CC_VENDOR_AMD;
>>   		goto out;
>>   	}
>>
>
> So you can't put it before the if - just slap it in both branches. Geez!

I think that will still work because sme_me_mask and sev_status will both
be 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't
evaluate to true. However, that will cause any platform that hasn't
enabled memory encryption (see SYS_CFG MSR), to also perform the WBINVD.

The idea looks like it was to keep existing behavior where cc_vendor
wasn't set if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' is false.

>
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
>   	if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
>   		snp_abort();
>
> +	cc_vendor = CC_VENDOR_AMD;
> +
>   	/* Check if memory encryption is enabled */
>   	if (feature_mask == AMD_SME_BIT) {
>   		/*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
>   out:
>   	RIP_REL_REF(sme_me_mask) = me_mask;
>   	physical_mask &= ~me_mask;
> -	cc_vendor = CC_VENDOR_AMD;
>   	cc_set_mask(me_mask);
>   }
>
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.
>
>> @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
>>   out:
>>   	if (sme_me_mask) {
>>   		physical_mask &= ~sme_me_mask;
>> -		cc_vendor = CC_VENDOR_AMD;
>>   		cc_set_mask(sme_me_mask);
>>   	}
>>   }
>> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
>> index cb0d6cd1c12f..2f7273596102 100644
>> --- a/include/linux/cc_platform.h
>> +++ b/include/linux/cc_platform.h
>> @@ -42,6 +42,21 @@ enum cc_attr {
>>   	 */
>>   	CC_ATTR_HOST_MEM_ENCRYPT,
>>
>
> This goes to the end of the enum.
>
>> +	/**
>> +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
>> +	 * incoherent
>
> "...can leave caches in an incoherent state."
>
>> +	 *
>> +	 * The platform/OS is running as a bare-metal system or a hypervisor.
>> +	 * The memory encryption engine might have left non-cache-coherent
>> +	 * data in the caches that needs to be flushed.
>> +	 *
>> +	 * Use this in places where the cache coherency of the memory matters
>> +	 * but the encryption status does not.
>> +	 *
>> +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
>
> If that is the case, why do you even need a new CC_ATTR define?
>
> Might as well do:
>
> 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> 		native_wbinvd();

That won't work, because the current system may not have SME active. The
cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.

For the first case, you need to determine if the system is SME capable,
because cc_platform_has(CC_ATTR_MEM_ENCRYPT) will return false if SME is
not active.

Thanks,
Tom

>
> ?
>
>> +	 */
>> +	CC_ATTR_HOST_MEM_INCOHERENT,
>> +
>
>
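Tom's objection boils down to how each candidate check behaves across the
two kexec transitions. A small illustration (assuming an SME-capable
bare-metal host, where sev_status is 0; names taken from the code quoted
above):

	/*
	 * mem_encrypt=off -> kexec -> mem_encrypt=on:
	 *   sme_me_mask == 0, so cc_platform_has(CC_ATTR_MEM_ENCRYPT)
	 *   is false and the WBINVD would be skipped, even though the
	 *   new kernel will run with encryption active.
	 *
	 * mem_encrypt=on -> kexec -> mem_encrypt=off:
	 *   sme_me_mask != 0, so either check flushes.
	 */
	if (!(sev_status & MSR_AMD64_SEV_ENABLED))	/* true on bare metal */
		native_wbinvd();			/* flushes in both directions */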
On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
> This change won't return the correct answer. The check needs to remain
> against the sev_status value.

Feel free to explain because this patch is confusing me.

> > So you can't put it before the if - just slap it in both branches. Geez!
>
> I think that will still work because sme_me_mask and sev_status will both be
> 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
> true. However, that will cause any platform that hasn't enabled memory
> encryption (see SYS_CFG MSR), to also perform the WBINVD.

If it keeps the code simpler I don't mind. That's so not a fast path.

> That won't work, because the current system may not have SME active. The
> cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
> mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.

And I'm saying, we should keep it simple and simply WBINVD on SME
capable machines, regardless of the encryption setting.

Any downsides to that which are actually real and noticeable?

Thx.
On 2/19/24 14:32, Borislav Petkov wrote:
> On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
>> This change won't return the correct answer. The check needs to remain
>> against the sev_status value.
>
> Feel free to explain because this patch is confusing me.

In your previous email, you want to put the CC_ATTR_HOST_MEM_INCOHERENT
case statement with the CC_ATTR_MEM_ENCRYPT case which is returning
sme_me_mask. That will be zero/false if SME is not active, skipping the
WBINVD. But, in reality you still need to perform the WBINVD in case the
kexec target is doing mem_encrypt=on.

That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
Basically, if you are bare-metal, it will return true. And it will only
return true for machines that support SME and have the
MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
the WBINVD called for any machine that supports SME, even if SME is not
possible because the proper bit in the SYS_CFG MSR hasn't been set.

I know what I'm trying to say, let me know if it is making sense...

>
>>> So you can't put it before the if - just slap it in both branches. Geez!
>>
>> I think that will still work because sme_me_mask and sev_status will both be
>> 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
>> true. However, that will cause any platform that hasn't enabled memory
>> encryption (see SYS_CFG MSR), to also perform the WBINVD.
>
> If it keeps the code simpler I don't mind. That's so not a fast path.
>
>> That won't work, because the current system may not have SME active. The
>> cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
>> mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
>
> And I'm saying, we should keep it simple and simply WBINVD on SME
> capable machines, regardless of the encryption setting.

In that case, CC_ATTR_HOST_MEM_INCOHERENT needs to be separate from
CC_ATTR_MEM_ENCRYPT as the original patch has it. The comment might make
more sense as:

  * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is possible
  * on the platform, regardless of whether mem_encrypt=on has been
  * used to make SME active.

Thanks,
Tom

>
> Any downsides to that which are actually real and noticeable?
>
> Thx.
>
On Mon, 2024-02-19 at 16:09 -0600, Tom Lendacky wrote:
> On 2/19/24 14:32, Borislav Petkov wrote:
> > On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
> > > This change won't return the correct answer. The check needs to remain
> > > against the sev_status value.
> > 
> > Feel free to explain because this patch is confusing me.
> 
> In your previous email, you want to put the CC_ATTR_HOST_MEM_INCOHERENT
> case statement with the CC_ATTR_MEM_ENCRYPT case which is returning
> sme_me_mask. That will be zero/false if SME is not active, skipping the
> WBINVD. But, in reality you still need to perform the WBINVD in case the
> kexec target is doing mem_encrypt=on.
> 
> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> Basically, if you are bare-metal, it will return true. And it will only
> return true for machines that support SME and have the
> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> 'cc_vendor = CC_VENDOR_AMD' assignment is.
> 

[...]

> However, if you move the
> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> the WBINVD called for any machine that supports SME, even if SME is not
> possible because the proper bit in the SYS_CFG MSR hasn't been set.

Hi Tom,

Thanks for clarifying. However it seems to me that this is the behaviour
in the current upstream code. The stop_this_cpu() checks CPUID directly
w/o checking the SYS_CFG MSR:

	if (c->extended_cpuid_level >= 0x8000001f &&
	    (cpuid_eax(0x8000001f) & BIT(0)))
		native_wbinvd();

I believe the BIT(0) in CPUID, which is "Secure Memory Encryption
Support", will always be true regardless of whether the
MSR_AMD64_SYSCFG_MEM_ENCRYPT is set in SYS_CFG MSR?

If so, IIUC moving the 'cc_vendor = CC_VENDOR_AMD' to the place right
before the if statement as suggested by Boris seems just follows the
current behaviour in the upstream code.

Of course we need to always return true for CC_ATTR_HOST_MEM_INCOHERENT
but not querying sme_me_mask.

> 
> I know what I'm trying to say, let me know if it is making sense...
> 
> > 
> > > > So you can't put it before the if - just slap it in both branches. Geez!
> > > 
> > > I think that will still work because sme_me_mask and sev_status will both be
> > > 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
> > > true. However, that will cause any platform that hasn't enabled memory
> > > encryption (see SYS_CFG MSR), to also perform the WBINVD.
> > 
> > If it keeps the code simpler I don't mind. That's so not a fast path.
> > 
> > > That won't work, because the current system may not have SME active. The
> > > cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
> > > mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
> > 
> > And I'm saying, we should keep it simple and simply WBINVD on SME
> > capable machines, regardless of the encryption setting.
> 
> In that case, CC_ATTR_HOST_MEM_INCOHERENT needs to be separate from
> CC_ATTR_MEM_ENCRYPT as the original patch has it. The comment might make
> more sense as:
> 
>   * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is possible
>   * on the platform, regardless of whether mem_encrypt=on has been
>   * used to make SME active.
> 
> Thanks,
> Tom

This looks good to me. Thanks!
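For clarity, the two different "has SME" tests being contrasted in this
sub-thread, side by side (both condensed from code quoted above):

	/* CPUID 0x8000001f EAX bit 0: the CPU *supports* SME.  This is
	 * what stop_this_cpu() tests today and, per Kai, it stays true
	 * even when the BIOS has not enabled memory encryption. */
	bool sme_supported = c->extended_cpuid_level >= 0x8000001f &&
			     (cpuid_eax(0x8000001f) & BIT(0));

	/* SYSCFG MSR bit: the BIOS/platform has *enabled* memory
	 * encryption.  sme_enable() returns early when this is clear,
	 * which is why the placement of the cc_vendor assignment
	 * matters for what the new attribute reports. */
	bool sme_enabled_by_bios = __rdmsr(MSR_AMD64_SYSCFG) &
				   MSR_AMD64_SYSCFG_MEM_ENCRYPT;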
On Mon, 2024-02-19 at 17:16 +0100, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
> > From: Kai Huang <kai.huang@intel.com>
> > 
> > Currently on AMD SME platforms, during kexec() caches are flushed
> > manually before jumping to the new kernel due to memory encryption.
> > Intel TDX needs to flush cachelines of TDX private memory before
> > jumping to the second kernel too, otherwise they may silently corrupt
> > the new kernel.
> > 
> > Instead of sprinkling both AMD and Intel's specific checks around,
> > introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
> > Intel and AMD, and simplify the logic:
> > 
> > Could the old kernel leave incoherent caches around?
> 
> "Is it possible that the kernel could leave caches in incoherent state?"

Will do. Thanks.

> 
> > If so, do WBINVD.
> > 
> > Convert the AMD SME to use this new CC attribute.
> 
> > A later patch will
> > utilize this new attribute for Intel TDX too.
> 
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.

Right. Makes sense.

> 
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
> 
> In this particular case, that "later patch" can go.

Will remove the "later patch" sentence. Thanks.

> 
> > Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
> > 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
> > WBINVD for the reason that the current kernel's SME enabling status may
> > not match the new kernel's choice.  However the relocate_kernel() only
> > does the WBINVD when the current kernel has enabled SME for the reason
> > that the new kernel is always placed in an "unencrypted" area.
> > 
> > To simplify the logic, for AMD SME change to always use the way that is
> > done in stop_this_cpu().  This will cause an additional WBINVD in
> > relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
> > disabled by kernel command line), but this is acceptable for the sake of
> > having less complicated code (see [1] for the relevant discussion).
> > 
> > Note currently the kernel only advertises CC vendor for AMD SME when SME
> > is actually enabled by the kernel.  To always advertise the new
> > CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
> > status, change to set CC vendor as long as the hardware has enabled SME.
> > 
> > Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
> > enabled SME" is still different from "checking the CPUID" (the way that
> > is done in stop_this_cpu()), but technically the former also serves the
> > purpose and is actually more accurate.
> > 
> > Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
> > But this doesn't impact other CC attributes on AMD platforms, nor does
> > it impact the cc_mkdec()/cc_mkenc().
> > 
> > [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
> > 
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/coco/core.c               | 13 +++++++++++++
> >  arch/x86/kernel/machine_kexec_64.c |  2 +-
> >  arch/x86/kernel/process.c          | 14 +++-----------
> >  arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
> >  include/linux/cc_platform.h        | 15 +++++++++++++++
> >  5 files changed, 42 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index eeec9986570e..8d6d727e6e18 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> >  	case CC_ATTR_HOST_MEM_ENCRYPT:
> >  		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> > 
> > +	case CC_ATTR_HOST_MEM_INCOHERENT:
> > +		/*
> > +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> > +		 * enabled on the platform regardless whether the kernel
> > +		 * has actually enabled the SME.
> > +
> 
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
> 
> I think this needs to be:
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>  
>  	switch (attr) {
>  	case CC_ATTR_MEM_ENCRYPT:
> +
> +		/*
> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +		 * enabled on the platform regardless whether the kernel
> +		 * has actually enabled the SME.
> +		 */
> +	case CC_ATTR_HOST_MEM_INCOHERENT:
>  		return sme_me_mask;

As Tom pointed out this will return false when kernel doesn't active SME
due to command line, and WBINVD won't be done.

>  
>  	case CC_ATTR_HOST_MEM_ENCRYPT:
> 
> 
> > +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
> > +
> > +	/*
> > +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
> > +	 * as it must be true when there's any SEV enable bit set in
> > +	 * sev_status.
> > +	 */
> 
> Superfluous comment.

Will remove.

> 
> > 	case CC_ATTR_GUEST_MEM_ENCRYPT:
> > 		return sev_status & MSR_AMD64_SEV_ENABLED;
> > 
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index bc0a5348b4a6..c9c6974e2e9c 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
> >  				       (unsigned long)page_list,
> >  				       image->start,
> >  				       image->preserve_context,
> > -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
> > +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
> > 
> >  #ifdef CONFIG_KEXEC_JUMP
> >  	if (image->preserve_context)
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..2c7e8d9889c0 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
> >  	mcheck_cpu_clear(c);
> > 
> >  	/*
> > -	 * Use wbinvd on processors that support SME. This provides support
> > -	 * for performing a successful kexec when going from SME inactive
> > -	 * to SME active (or vice-versa). The cache must be cleared so that
> > -	 * if there are entries with the same physical address, both with and
> > -	 * without the encryption bit, they don't race each other when flushed
> > -	 * and potentially end up with the wrong entry being committed to
> > -	 * memory.
> > -	 *
> > -	 * Test the CPUID bit directly because the machine might've cleared
> > -	 * X86_FEATURE_SME due to cmdline options.
> > +	 * Use wbinvd on processors that the first kernel *could*
> > +	 * potentially leave incoherent cachelines.
> 
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.

Will remove. Thanks.

> 
> > 	 */
> > -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
> >  		native_wbinvd();
> > 
> >  	/*
> > diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> > index 7f72472a34d6..87e4fddab770 100644
> > --- a/arch/x86/mm/mem_encrypt_identity.c
> > +++ b/arch/x86/mm/mem_encrypt_identity.c
> > @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
> >  		msr = __rdmsr(MSR_AMD64_SYSCFG);
> >  		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
> >  			return;
> > +
> > +		/*
> > +		 * Always set CC vendor when the platform has SME enabled
> > +		 * regardless whether the kernel will actually activates the
> > +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
> > +		 * being true as long as the platform has SME enabled so that
> > +		 * stop_this_cpu() can do necessary WBINVD during kexec().
> > +		 */
> > +		cc_vendor = CC_VENDOR_AMD;
> >  	} else {
> >  		/* SEV state cannot be controlled by a command line option */
> >  		sme_me_mask = me_mask;
> > +		cc_vendor = CC_VENDOR_AMD;
> >  		goto out;
> >  	}
> > 
> 
> So you can't put it before the if - just slap it in both branches. Geez!

I think we can. Please see my reply to Tom's email.

> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
>  	if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
>  		snp_abort();
>  
> +	cc_vendor = CC_VENDOR_AMD;
> +
>  	/* Check if memory encryption is enabled */
>  	if (feature_mask == AMD_SME_BIT) {
>  		/*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
>  out:
>  	RIP_REL_REF(sme_me_mask) = me_mask;
>  	physical_mask &= ~me_mask;
> -	cc_vendor = CC_VENDOR_AMD;
>  	cc_set_mask(me_mask);
>  }
> 
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.

Yeah will do. I believe this series was generated based on tip/master
but this was 3 weeks ago.

> 
> > @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
> >  out:
> >  	if (sme_me_mask) {
> >  		physical_mask &= ~sme_me_mask;
> > -		cc_vendor = CC_VENDOR_AMD;
> >  		cc_set_mask(sme_me_mask);
> >  	}
> >  }
> > diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> > index cb0d6cd1c12f..2f7273596102 100644
> > --- a/include/linux/cc_platform.h
> > +++ b/include/linux/cc_platform.h
> > @@ -42,6 +42,21 @@ enum cc_attr {
> >  	 */
> >  	CC_ATTR_HOST_MEM_ENCRYPT,
> > 
> 
> This goes to the end of the enum.
> 
> > +	/**
> > +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
> > +	 * incoherent
> 
> "...can leave caches in an incoherent state."

Will do. And I'll move this CC_ATTR_HOST_MEM_INCOHERENT to the end of
the enum if I understand you correctly.

> 
> > +	 *
> > +	 * The platform/OS is running as a bare-metal system or a hypervisor.
> > +	 * The memory encryption engine might have left non-cache-coherent
> > +	 * data in the caches that needs to be flushed.
> > +	 *
> > +	 * Use this in places where the cache coherency of the memory matters
> > +	 * but the encryption status does not.
> > +	 *
> > +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
> 
> If that is the case, why do you even need a new CC_ATTR define?
> 
> Might as well do:
> 
> 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> 		native_wbinvd();
> 
> ?

As Tom pointed out using the CC_ATTR_MEM_ENCRYPT will miss WBINVD when
kernel disables SME by commandline.

> 
> > +	 */
> > +	CC_ATTR_HOST_MEM_INCOHERENT,
> > +
> 
> 
On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.

I would've never figured that out just from staring at the test. :-\

> Basically, if you are bare-metal, it will return true. And it will only
> return true for machines that support SME and have the
> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> the WBINVD called for any machine that supports SME, even if SME is not
> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>
> I know what I'm trying to say, let me know if it is making sense...

Yah, thanks for taking the time to explain.

Here's an even more radical idea:

Why not do WBINVD *unconditionally* on the CPU down path?

- it is the opposite of a fast path, i.e., no one cares

- it'll take care of every possible configuration without ugly logic

- it wouldn't hurt to have the caches nice and coherent before going
  down

Hmmm.
On 2/19/24 20:57, Huang, Kai wrote:
> On Mon, 2024-02-19 at 16:09 -0600, Tom Lendacky wrote:
>> On 2/19/24 14:32, Borislav Petkov wrote:
>>> On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
>>>> This change won't return the correct answer. The check needs to remain
>>>> against the sev_status value.
>>>
>>> Feel free to explain because this patch is confusing me.
>>
>> In your previous email, you want to put the CC_ATTR_HOST_MEM_INCOHERENT
>> case statement with the CC_ATTR_MEM_ENCRYPT case which is returning
>> sme_me_mask. That will be zero/false if SME is not active, skipping the
>> WBINVD. But, in reality you still need to perform the WBINVD in case the
>> kexec target is doing mem_encrypt=on.
>>
>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
>> Basically, if you are bare-metal, it will return true. And it will only
>> return true for machines that support SME and have the
>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>> 'cc_vendor = CC_VENDOR_AMD' assignment is.
>
> [...]
>
>> However, if you move the
>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>> the WBINVD called for any machine that supports SME, even if SME is not
>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>
> Hi Tom,
>
> Thanks for clarifying. However it seems to me that this is the behaviour in the
> current upstream code. The stop_this_cpu() checks CPUID directly w/o checking
> the SYS_CFG MSR:

Correct, it will match the upstream behavior this way. It would have been
improved slightly with your original patch by avoiding the WBINVD if the
MSR_AMD64_SYSCFG_MEM_ENCRYPT wasn't set.

>
> 	if (c->extended_cpuid_level >= 0x8000001f &&
> 	    (cpuid_eax(0x8000001f) & BIT(0)))
> 		native_wbinvd();
>
> I believe the BIT(0) in CPUID, which is "Secure Memory Encryption Support", will
> always be true regardless of whether the MSR_AMD64_SYSCFG_MEM_ENCRYPT is set in
> SYS_CFG MSR?
>
> If so, IIUC moving the 'cc_vendor = CC_VENDOR_AMD' to the place right before the
> if statement as suggested by Boris seems just follows the current behaviour in
> the upstream code.

Yep, that's how I see it, too.

Thanks,
Tom

>
> Of course we need to always return true for CC_ATTR_HOST_MEM_INCOHERENT but not
> querying sme_me_mask.
>
>>
>> I know what I'm trying to say, let me know if it is making sense...
>>
>>>
>>>>> So you can't put it before the if - just slap it in both branches. Geez!
>>>>
>>>> I think that will still work because sme_me_mask and sev_status will both be
>>>> 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
>>>> true. However, that will cause any platform that hasn't enabled memory
>>>> encryption (see SYS_CFG MSR), to also perform the WBINVD.
>>>
>>> If it keeps the code simpler I don't mind. That's so not a fast path.
>>>
>>>> That won't work, because the current system may not have SME active. The
>>>> cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
>>>> mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
>>>
>>> And I'm saying, we should keep it simple and simply WBINVD on SME
>>> capable machines, regardless of the encryption setting.
>>
>> In that case, CC_ATTR_HOST_MEM_INCOHERENT needs to be separate from
>> CC_ATTR_MEM_ENCRYPT as the original patch has it. The comment might make
>> more sense as:
>>
>>    * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is possible
>>    * on the platform, regardless of whether mem_encrypt=on has been
>>    * used to make SME active.
>>
>> Thanks,
>> Tom
>
> This looks good to me. Thanks!
>
>
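Combining Tom's comment wording with the check from the original patch,
the amd_cc_platform_has() case would plausibly end up reading as follows
(a sketch of where the sub-thread converges, not posted code):

	case CC_ATTR_HOST_MEM_INCOHERENT:
		/*
		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is
		 * possible on the platform, regardless of whether
		 * mem_encrypt=on has been used to make SME active.
		 */
		return !(sev_status & MSR_AMD64_SEV_ENABLED);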
On 2/20/24 08:28, Borislav Petkov wrote:
> On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
>
> I would've never figured that out just from staring at the test. :-\
>
>> Basically, if you are bare-metal, it will return true. And it will only
>> return true for machines that support SME and have the
>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>> the WBINVD called for any machine that supports SME, even if SME is not
>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>>
>> I know what I'm trying to say, let me know if it is making sense...
>
> Yah, thanks for taking the time to explain.
>
> Here's an even more radical idea:
>
> Why not do WBINVD *unconditionally* on the CPU down path?
>
> - it is the opposite of a fast path, i.e., no one cares
>
> - it'll take care of every possible configuration without ugly logic
>
> - it wouldn't hurt to have the caches nice and coherent before going
>   down
>
> Hmmm.

That's what I initially did, but errors were reported, see commit:
  f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

But that may be because of the issue solved by commit:
  1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

So...

Thanks,
Tom

>
On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
> On 2/20/24 08:28, Borislav Petkov wrote:
> > On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> > > That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> > 
> > I would've never figured that out just from staring at the test. :-\
> > 
> > > Basically, if you are bare-metal, it will return true. And it will only
> > > return true for machines that support SME and have the
> > > MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> > > 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> > > 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> > > the WBINVD called for any machine that supports SME, even if SME is not
> > > possible because the proper bit in the SYS_CFG MSR hasn't been set.
> > > 
> > > I know what I'm trying to say, let me know if it is making sense...
> > 
> > Yah, thanks for taking the time to explain.
> > 
> > Here's an even more radical idea:
> > 
> > Why not do WBINVD *unconditionally* on the CPU down path?
> > 
> > - it is the opposite of a fast path, i.e., no one cares
> > 
> > - it'll take care of every possible configuration without ugly logic
> > 
> > - it wouldn't hurt to have the caches nice and coherent before going
> > down
> > 
> > Hmmm.
> 
> That's what I initially did, but errors were reported, see commit:
>   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

This changelog only mentions "Some issues". Do you know exactly what kind
issues did you see? Are these issues only appeared on SME enabled system
or other non-SME-capable systems too?

Because ...

> 
> But that may be because of the issue solved by commit:
>   1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

... the issue resolved in this commit seems to be hang.
On 2/20/24 14:07, Huang, Kai wrote:
> On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
>> On 2/20/24 08:28, Borislav Petkov wrote:
>>> On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
>>>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
>>>
>>> I would've never figured that out just from staring at the test. :-\
>>>
>>>> Basically, if you are bare-metal, it will return true. And it will only
>>>> return true for machines that support SME and have the
>>>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>>>> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
>>>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>>>> the WBINVD called for any machine that supports SME, even if SME is not
>>>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>>>>
>>>> I know what I'm trying to say, let me know if it is making sense...
>>>
>>> Yah, thanks for taking the time to explain.
>>>
>>> Here's an even more radical idea:
>>>
>>> Why not do WBINVD *unconditionally* on the CPU down path?
>>>
>>> - it is the opposite of a fast path, i.e., no one cares
>>>
>>> - it'll take care of every possible configuration without ugly logic
>>>
>>> - it wouldn't hurt to have the caches nice and coherent before going
>>> down
>>>
>>> Hmmm.
>>
>> That's what I initially did, but errors were reported, see commit:
>> f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
>
> This changelog only mentions "Some issues". Do you know exactly what kind
> issues did you see? Are these issues only appeared on SME enabled system or
> other non-SME-capable systems too?

I believe the issues were that different Intel systems would hang or reset
and it was bisected to that commit that added the WBINVD. It was a while
ago, but I remember that they were similar to what the 1f5e7eb7868e commit
ended up fixing, which was debugged because sometimes the WBINVD was still
occasionally issued resulting in the following patch

  9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")

It just means that if we go to an unconditional WBINVD, then we need to be
careful.

Thanks,
Tom

>
> Because ...
>
>>
>> But that may be because of the issue solved by commit:
>> 1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")
>
> ... the issue resolved in this commit seems to be hang.
>
On Tue, 2024-02-20 at 16:30 -0600, Tom Lendacky wrote:
> On 2/20/24 14:07, Huang, Kai wrote:
> > On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
> > > On 2/20/24 08:28, Borislav Petkov wrote:
> > > > On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> > > > > That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> > > >
> > > > I would've never figured that out just from staring at the test. :-\
> > > >
> > > > > Basically, if you are bare-metal, it will return true. And it will only
> > > > > return true for machines that support SME and have the
> > > > > MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> > > > > 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> > > > > 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> > > > > the WBINVD called for any machine that supports SME, even if SME is not
> > > > > possible because the proper bit in the SYS_CFG MSR hasn't been set.
> > > > >
> > > > > I know what I'm trying to say, let me know if it is making sense...
> > > >
> > > > Yah, thanks for taking the time to explain.
> > > >
> > > > Here's an even more radical idea:
> > > >
> > > > Why not do WBINVD *unconditionally* on the CPU down path?
> > > >
> > > > - it is the opposite of a fast path, i.e., no one cares
> > > >
> > > > - it'll take care of every possible configuration without ugly logic
> > > >
> > > > - it wouldn't hurt to have the caches nice and coherent before going
> > > >   down
> > > >
> > > > Hmmm.
> > >
> > > That's what I initially did, but errors were reported, see commit:
> > > f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> >
> > This changelog only mentions "Some issues". Do you know exactly what kind
> > of issues you saw? Did these issues appear only on SME-enabled systems, or
> > on non-SME-capable systems too?
>
> I believe the issues were that different Intel systems would hang or reset,
> and it was bisected to the commit that added the WBINVD. It was a while
> ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> ended up fixing, which was debugged because the WBINVD was still
> occasionally issued, resulting in the following patch:
>
> 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
>
> It just means that if we go to an unconditional WBINVD, then we need to be
> careful.
>

Thanks Tom for the info. That helps a lot.

Hi Boris, Dave,

I think I still prefer keeping the existing SME kexec behaviour, that is,
to have the new CC_ATTR_HOST_MEM_INCOHERENT attribute, because that way
there is no risk. However, based on the information above, I believe the
risk is small if we switch to an unconditional WBINVD, in which case we
don't need the new attribute and there is also no new code needed for TDX
to do the cache flush.

Btw, I want to point out that stop_this_cpu() is not the only place that
needs to do WBINVD for SME/TDX; the relocate_kernel() assembly also needs
to:

	image->start = relocate_kernel((unsigned long)image->head,
				       (unsigned long)page_list,
				       image->start,
				       image->preserve_context,
				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));

The last function argument, cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT), is
for SME. The relocate_kernel() assembly checks the last argument and does
WBINVD if it is true.

If we go with an unconditional WBINVD, I think we can just change the
assembly to do an unconditional WBINVD and remove the last function
parameter.

Please let me know your preference.
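[To make the proposal above concrete, a sketch of what the machine_kexec()
call site could look like if the assembly does an unconditional WBINVD and
the last parameter is dropped — illustrative only, not a posted patch; the
relocate_kernel() prototype and the assembly would need the matching change:

	image->start = relocate_kernel((unsigned long)image->head,
				       (unsigned long)page_list,
				       image->start,
				       image->preserve_context);
]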
On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> I believe the issues were that different Intel systems would hang or reset,
> and it was bisected to the commit that added the WBINVD. It was a while
> ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> ended up fixing, which was debugged because the WBINVD was still
> occasionally issued, resulting in the following patch:
>
> 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
>
> It just means that if we go to an unconditional WBINVD, then we need to be
> careful.

Let's try it.

Dave, do you remember what issues

f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

fixed?

If so, can you try the below diff on top of the latest tip/master to see if
those issues reappear?

Thx.

---

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ab49ade31b0d..ec4dcc9f70ca 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
-		native_wbinvd();
+	native_wbinvd();
 
 	/*
 	 * This brings a cache line back and dirties it, but
On Wed, 2024-02-21 at 10:28 +0100, Borislav Petkov wrote:
> On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > I believe the issues were that different Intel systems would hang or reset,
> > and it was bisected to the commit that added the WBINVD. It was a while
> > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > ended up fixing, which was debugged because the WBINVD was still
> > occasionally issued, resulting in the following patch:
> >
> > 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> >
> > It just means that if we go to an unconditional WBINVD, then we need to be
> > careful.
>
> Let's try it.
>
> Dave, do you remember what issues
>
> f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
>
> fixed?
>
> If so, can you try the below diff on top of the latest tip/master to see if
> those issues reappear?
>
> Thx.
>
> ---
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index ab49ade31b0d..ec4dcc9f70ca 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
>  	 * Test the CPUID bit directly because the machine might've cleared
>  	 * X86_FEATURE_SME due to cmdline options.
>  	 */
> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> -		native_wbinvd();
> +	native_wbinvd();
>
>  	/*
>  	 * This brings a cache line back and dirties it, but
>

I would really appreciate it if Dave could help here.

I will also reach out to see whether anyone at Intel has met this before.
Hi,

On Thu, 22 Feb 2024 at 19:50, Huang, Kai <kai.huang@intel.com> wrote:
>
> On Wed, 2024-02-21 at 10:28 +0100, Borislav Petkov wrote:
> > On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > > I believe the issues were that different Intel systems would hang or reset,
> > > and it was bisected to the commit that added the WBINVD. It was a while
> > > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > > ended up fixing, which was debugged because the WBINVD was still
> > > occasionally issued, resulting in the following patch:
> > >
> > > 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > >
> > > It just means that if we go to an unconditional WBINVD, then we need to be
> > > careful.
> >
> > Let's try it.
> >
> > Dave, do you remember what issues
> >
> > f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> >
> > fixed?
> >
> > If so, can you try the below diff on top of the latest tip/master to see if
> > those issues reappear?
> >
> > Thx.
> >
> > ---
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..ec4dcc9f70ca 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
> >  	 * Test the CPUID bit directly because the machine might've cleared
> >  	 * X86_FEATURE_SME due to cmdline options.
> >  	 */
> > -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > -		native_wbinvd();
> > +	native_wbinvd();
> >
> >  	/*
> >  	 * This brings a cache line back and dirties it, but
> >
>
> I would really appreciate it if Dave could help here.
>
> I will also reach out to see whether anyone at Intel has met this before.

I have forgotten the details now; let me recall them and try the patches
if possible.

Thanks
Dave
On Wed, 21 Feb 2024 at 17:33, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > I believe the issues were that different Intel systems would hang or reset,
> > and it was bisected to the commit that added the WBINVD. It was a while
> > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > ended up fixing, which was debugged because the WBINVD was still
> > occasionally issued, resulting in the following patch:
> >
> > 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> >
> > It just means that if we go to an unconditional WBINVD, then we need to be
> > careful.
>
> Let's try it.
>
> Dave, do you remember what issues
>
> f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
>
> fixed?

It should be a kexec reboot failure described in the thread below:
https://lore.kernel.org/lkml/20180117072123.GA1866@dhcp-128-65.nay.redhat.com/

>
> If so, can you try the below diff on top of the latest tip/master to see if
> those issues reappear?

It was reproduced on an old laptop (Thinkpad t440s or t480s, I cannot
remember), but I have since replaced it with a different one. I tried
the latest tip/master with the if condition commented out; kexec
reboot works fine.

Let me try to find an old laptop to see if I can do more tests; I will
get back later next week.

>
> Thx.
>
> ---
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index ab49ade31b0d..ec4dcc9f70ca 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
>  	 * Test the CPUID bit directly because the machine might've cleared
>  	 * X86_FEATURE_SME due to cmdline options.
>  	 */
> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> -		native_wbinvd();
> +	native_wbinvd();
>
>  	/*
>  	 * This brings a cache line back and dirties it, but
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
On Fri, 23 Feb 2024 at 18:41, Dave Young <dyoung@redhat.com> wrote:
>
> On Wed, 21 Feb 2024 at 17:33, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > > I believe the issues were that different Intel systems would hang or reset,
> > > and it was bisected to the commit that added the WBINVD. It was a while
> > > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > > ended up fixing, which was debugged because the WBINVD was still
> > > occasionally issued, resulting in the following patch:
> > >
> > > 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > >
> > > It just means that if we go to an unconditional WBINVD, then we need to be
> > > careful.
> >
> > Let's try it.
> >
> > Dave, do you remember what issues
> >
> > f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> >
> > fixed?
>
> It should be a kexec reboot failure described in the thread below:
> https://lore.kernel.org/lkml/20180117072123.GA1866@dhcp-128-65.nay.redhat.com/
>
> >
> > If so, can you try the below diff on top of the latest tip/master to see if
> > those issues reappear?
>
> It was reproduced on an old laptop (Thinkpad t440s or t480s, I cannot
> remember), but I have since replaced it with a different one. I tried
> the latest tip/master with the if condition commented out; kexec
> reboot works fine.
>
> Let me try to find an old laptop to see if I can do more tests; I will
> get back later next week.

Update: tested on an old laptop as well; I did not find any problems
with an unconditional native_wbinvd(), kexec and kdump work fine.

>
> >
> > Thx.
> >
> > ---
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..ec4dcc9f70ca 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
> >  	 * Test the CPUID bit directly because the machine might've cleared
> >  	 * X86_FEATURE_SME due to cmdline options.
> >  	 */
> > -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > -		native_wbinvd();
> > +	native_wbinvd();
> >
> >  	/*
> >  	 * This brings a cache line back and dirties it, but
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
> >
On Wed, 2024-02-28 at 10:54 +0800, Dave Young wrote:
> On Fri, 23 Feb 2024 at 18:41, Dave Young <dyoung@redhat.com> wrote:
> >
> > On Wed, 21 Feb 2024 at 17:33, Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > > > I believe the issues were that different Intel systems would hang or reset,
> > > > and it was bisected to the commit that added the WBINVD. It was a while
> > > > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > > > ended up fixing, which was debugged because the WBINVD was still
> > > > occasionally issued, resulting in the following patch:
> > > >
> > > > 9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > > >
> > > > It just means that if we go to an unconditional WBINVD, then we need to be
> > > > careful.
> > >
> > > Let's try it.
> > >
> > > Dave, do you remember what issues
> > >
> > > f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> > >
> > > fixed?
> >
> > It should be a kexec reboot failure described in the thread below:
> > https://lore.kernel.org/lkml/20180117072123.GA1866@dhcp-128-65.nay.redhat.com/
> >
> > >
> > > If so, can you try the below diff on top of the latest tip/master to see if
> > > those issues reappear?
> >
> > It was reproduced on an old laptop (Thinkpad t440s or t480s, I cannot
> > remember), but I have since replaced it with a different one. I tried
> > the latest tip/master with the if condition commented out; kexec
> > reboot works fine.
> >
> > Let me try to find an old laptop to see if I can do more tests; I will
> > get back later next week.
>
> Update: tested on an old laptop as well; I did not find any problems
> with an unconditional native_wbinvd(), kexec and kdump work fine.
>

Hi DaveY,

Thanks for getting back, and I appreciate your help.

Hi Boris/DaveH,

Based on this I'll change to do an unconditional WBINVD during kexec() in
stop_this_cpu() and relocate_kernel(). Please let me know if you have any
comments.

Thanks all!
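[Expressed in C for brevity — the real code is assembly in
relocate_kernel_64.S, and the flag name here is illustrative, not taken
from the actual file — the planned change on the relocate_kernel() side
amounts to:

	/* Before: flush only when the last argument says SME may be active. */
	if (host_mem_enc_active)
		wbinvd();

	/* After: flush unconditionally; the argument can then be dropped. */
	wbinvd();
]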
On Wed, Feb 28, 2024 at 10:54:22AM +0800, Dave Young wrote:
> Update: tested on an old laptop as well; I did not find any problems
> with an unconditional native_wbinvd(), kexec and kdump work fine.

Thanks a lot for testing it!
On Wed, Feb 28, 2024 at 09:21:00AM +0000, Huang, Kai wrote:
> Based on this I'll change to do an unconditional WBINVD during kexec() in
> stop_this_cpu() and relocate_kernel(). Please let me know if you have any
> comments.

Yes, pls make sure to summarize in the commit message what this thread
figured out. But basically, we want to try the simple approach and do
WBINVD unconditionally on the CPU stopping path so that the caches are
clean and there are no potential issues...

Thx.
On 29/02/2024 12:02 am, Borislav Petkov wrote:
> On Wed, Feb 28, 2024 at 09:21:00AM +0000, Huang, Kai wrote:
>> Based on this I'll change to do an unconditional WBINVD during kexec() in
>> stop_this_cpu() and relocate_kernel(). Please let me know if you have any
>> comments.
>
> Yes, pls make sure to summarize in the commit message what this thread
> figured out. But basically, we want to try the simple approach and do
> WBINVD unconditionally on the CPU stopping path so that the caches are
> clean and there are no potential issues...
>

Yes, will do. Thanks for the feedback.
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..8d6d727e6e18 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 	case CC_ATTR_HOST_MEM_ENCRYPT:
 		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
 
+	case CC_ATTR_HOST_MEM_INCOHERENT:
+		/*
+		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+		 * been enabled on the platform, regardless of whether the
+		 * kernel has actually enabled SME.
+		 */
+		return !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+	/*
+	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask,
+	 * as it must be non-zero when any SEV enable bit is set in
+	 * sev_status.
+	 */
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 		return sev_status & MSR_AMD64_SEV_ENABLED;
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index bc0a5348b4a6..c9c6974e2e9c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
 				       (unsigned long)page_list,
 				       image->start,
 				       image->preserve_context,
-				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
+				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ab49ade31b0d..2c7e8d9889c0 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
 	mcheck_cpu_clear(c);
 
 	/*
-	 * Use wbinvd on processors that support SME. This provides support
-	 * for performing a successful kexec when going from SME inactive
-	 * to SME active (or vice-versa). The cache must be cleared so that
-	 * if there are entries with the same physical address, both with and
-	 * without the encryption bit, they don't race each other when flushed
-	 * and potentially end up with the wrong entry being committed to
-	 * memory.
-	 *
-	 * Test the CPUID bit directly because the machine might've cleared
-	 * X86_FEATURE_SME due to cmdline options.
+	 * Use WBINVD on processors on which the first kernel could
+	 * potentially have left incoherent cachelines.
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
 		native_wbinvd();
 
 	/*
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7f72472a34d6..87e4fddab770 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
 		msr = __rdmsr(MSR_AMD64_SYSCFG);
 		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
 			return;
+
+		/*
+		 * Always set the CC vendor when the platform has SME
+		 * enabled, regardless of whether the kernel will actually
+		 * activate SME. This reports CC_ATTR_HOST_MEM_INCOHERENT
+		 * as true as long as the platform has SME enabled, so that
+		 * stop_this_cpu() can do the necessary WBINVD during kexec().
+		 */
+		cc_vendor = CC_VENDOR_AMD;
 	} else {
 		/* SEV state cannot be controlled by a command line option */
 		sme_me_mask = me_mask;
+		cc_vendor = CC_VENDOR_AMD;
 		goto out;
 	}
 
@@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
 out:
 	if (sme_me_mask) {
 		physical_mask &= ~sme_me_mask;
-		cc_vendor = CC_VENDOR_AMD;
 		cc_set_mask(sme_me_mask);
 	}
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..2f7273596102 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -42,6 +42,21 @@ enum cc_attr {
 	 */
 	CC_ATTR_HOST_MEM_ENCRYPT,
 
+	/**
+	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
+	 *				 incoherent
+	 *
+	 * The platform/OS is running as a bare-metal system or a hypervisor.
+	 * The memory encryption engine might have left non-cache-coherent
+	 * data in the caches that needs to be flushed.
+	 *
+	 * Use this in places where the cache coherency of the memory matters
+	 * but the encryption status does not.
+	 *
+	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
+	 */
+	CC_ATTR_HOST_MEM_INCOHERENT,
+
 	/**
 	 * @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active
 	 *
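[For completeness, a minimal usage sketch of the new attribute from a
generic caller, mirroring the process.c hunk above — kexec_flush_caches()
is an illustrative name, not part of the posted patch:

	#include <linux/cc_platform.h>

	static void kexec_flush_caches(void)
	{
		/*
		 * Flush caches whenever the platform may have left
		 * incoherent cachelines (e.g. SME-capable bare metal),
		 * regardless of whether memory encryption is actually
		 * active.
		 */
		if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
			native_wbinvd();
	}
]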