Message ID | 20221209132524.20200-3-kirill.shutemov@linux.intel.com |
---|---|
State | New |
Headers |
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Dave Hansen <dave.hansen@intel.com>, Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, Thomas Gleixner <tglx@linutronix.de>, Elena Reshetova <elena.reshetova@intel.com>, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH 2/4] x86/tdx: Use ReportFatalError to report missing SEPT_VE_DISABLE
Date: Fri, 9 Dec 2022 16:25:22 +0300
Message-Id: <20221209132524.20200-3-kirill.shutemov@linux.intel.com>
In-Reply-To: <20221209132524.20200-1-kirill.shutemov@linux.intel.com>
References: <20221209132524.20200-1-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.38.0
List-ID: <linux-kernel.vger.kernel.org> |
Series | x86/tdx: Changes for TDX guest initialization |
Commit Message
Kirill A. Shutemov
Dec. 9, 2022, 1:25 p.m. UTC
The check for SEPT_VE_DISABLE happens early in the kernel boot where
earlyprintk is not yet functional. The kernel successfully detects a broken
TD configuration and stops with panic(), but it cannot
communicate the reason to the user.

Use TDG.VP.VMCALL<ReportFatalError> to report the error. The hypercall
can encode a message of up to 64 bytes in eight registers.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/coco/tdx/tdx.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
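The commit message says the hypercall carries up to 64 bytes of text in eight registers. A minimal userspace sketch of that packing follows, assuming the register order used by the union in this patch (r14, r15, rbx, rdi, rsi, r8, r9, rdx). The names `fatal_msg_regs` and `pack_fatal_msg` are hypothetical and exist only for illustration. Unlike the patch, the sketch copies through a zeroed buffer with memcpy() instead of using a union, so unused bytes are always zero.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the eight message registers, laid out in
 * the same order as the union in the patch.
 */
struct fatal_msg_regs {
	uint64_t r14, r15, rbx, rdi, rsi, r8, r9, rdx;
};

_Static_assert(sizeof(struct fatal_msg_regs) == 64,
	       "eight 64-bit registers carry exactly 64 bytes");

/*
 * Copy at most 64 bytes of msg into the registers. Bytes 0..7 land in
 * r14, bytes 8..15 in r15, and so on up to rdx. Bytes past the end of
 * a short message are zero. A message of 64 bytes or more fills all
 * eight registers and is not NUL-terminated.
 */
static void pack_fatal_msg(struct fatal_msg_regs *regs, const char *msg)
{
	char buf[64] = { 0 };
	size_t len = strlen(msg);

	if (len > sizeof(buf))
		len = sizeof(buf);
	memcpy(buf, msg, len);
	memcpy(regs, buf, sizeof(buf));
}
```

On a little-endian machine such as x86, the first character of the message ends up in the low byte of r14.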
Comments
On 12/9/22 5:25 AM, Kirill A. Shutemov wrote:
> The check for SEPT_VE_DISABLE happens early in the kernel boot where
> earlyprintk is not yet functional. Kernel successfully detect broken
> TD configuration and stops the kernel with panic(), but it cannot
> communicate the reason to the user.
>
> Use TDG.VP.VMCALL<ReportFatalError> to report the error. The hypercall
> can encode message up to 64 bytes in eight registers.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/coco/tdx/tdx.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index cfd4c95b9f04..8ad04d101270 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -22,6 +22,7 @@
>
>  /* TDX hypercall Leaf IDs */
>  #define TDVMCALL_MAP_GPA		0x10001
> +#define TDVMCALL_REPORT_FATAL_ERROR	0x10003
>
>  /* MMIO direction */
>  #define EPT_READ	0
> @@ -140,6 +141,41 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
>  }
>  EXPORT_SYMBOL_GPL(tdx_mcall_get_report0);
>
> +static void __noreturn tdx_panic(const char *msg)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = TDVMCALL_REPORT_FATAL_ERROR,
> +		.r12 = 0, /* Error code: 0 is Panic */
> +	};
> +	union {
> +		/* Define register order according to the GHCI */
> +		struct { u64 r14, r15, rbx, rdi, rsi, r8, r9, rdx; };
> +
> +		char str[64];
> +	} message;
> +
> +	/* VMM assumes '\0' in byte 65, if the message took all 64 bytes */
> +	strncpy(message.str, msg, 64);
> +
> +	args.r8  = message.r8;
> +	args.r9  = message.r9;
> +	args.r14 = message.r14;
> +	args.r15 = message.r15;
> +	args.rdi = message.rdi;
> +	args.rsi = message.rsi;
> +	args.rbx = message.rbx;
> +	args.rdx = message.rdx;
> +
> +	/*
> +	 * Keep calling the hypercall in case VMM did not terminated
> +	 * the TD as it must.
> +	 */
> +	while (1) {
> +		__tdx_hypercall(&args, 0);
> +	}

Instead of an infinite loop, I'm wondering if the guest should panic after
retrying for few times.

> +}
> +
>  static void tdx_parse_tdinfo(u64 *cc_mask)
>  {
>  	struct tdx_module_output out;
> @@ -172,7 +208,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
>  	 */
>  	td_attr = out.rdx;
>  	if (!(td_attr & ATTR_SEPT_VE_DISABLE))
> -		panic("TD misconfiguration: SEPT_VE_DISABLE attibute must be set.\n");
> +		tdx_panic("TD misconfiguration: SEPT_VE_DISABLE attribute must be set.");
>  }
>
>  /*
On Fri, Dec 09, 2022 at 07:42:56AM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 12/9/22 5:25 AM, Kirill A. Shutemov wrote:
> > [...]
> > +	/*
> > +	 * Keep calling the hypercall in case VMM did not terminated
> > +	 * the TD as it must.
> > +	 */
> > +	while (1) {
> > +		__tdx_hypercall(&args, 0);
> > +	}
>
> Instead of an infinite loop, I'm wondering if the guest should panic after
> retrying for few times.

Hm. What difference would it make?
On 12/9/22 9:06 AM, Kirill A. Shutemov wrote:
> On Fri, Dec 09, 2022 at 07:42:56AM -0800, Sathyanarayanan Kuppuswamy wrote:
>> On 12/9/22 5:25 AM, Kirill A. Shutemov wrote:
>>> [...]
>>> +	/*
>>> +	 * Keep calling the hypercall in case VMM did not terminated
>>> +	 * the TD as it must.
>>> +	 */
>>> +	while (1) {
>>> +		__tdx_hypercall(&args, 0);
>>> +	}
>>
>> Instead of an infinite loop, I'm wondering if the guest should panic after
>> retrying for few times.
>
> Hm. What difference would it make?

IIUC, the goal of this patch is to report the fatal error to VMM and panic.
But, if VMM does not terminate the guest as we expect, rather than trying
continuously, isn't it better to panic ourselves? That way the behavior
will be similar to what we have currently.
On 12/9/22 12:51, Sathyanarayanan Kuppuswamy wrote:
>>>> +	while (1) {
>>>> +		__tdx_hypercall(&args, 0);
>>>> +	}
>>> Instead of an infinite loop, I'm wondering if the guest should panic after
>>> retrying for few times.
>> Hm. What difference would it make?
> IIUC, the goal of this patch is to report the fatal error to VMM and panic.
> But, if VMM does not terminate the guest as we expect, rather than trying
> continuously, isn't it better to panic ourselves? That way the behavior
> will be similar to what we have currently.

What does "panic ourselves" mean exactly? What is the current behavior
which that would match?
On 12/12/22 8:10 AM, Dave Hansen wrote:
> On 12/9/22 12:51, Sathyanarayanan Kuppuswamy wrote:
>>>>> +	while (1) {
>>>>> +		__tdx_hypercall(&args, 0);
>>>>> +	}
>>>> Instead of an infinite loop, I'm wondering if the guest should panic after
>>>> retrying for few times.
>>> Hm. What difference would it make?
>> IIUC, the goal of this patch is to report the fatal error to VMM and panic.
>> But, if VMM does not terminate the guest as we expect, rather than trying
>> continuously, isn't it better to panic ourselves? That way the behavior
>> will be similar to what we have currently.
>
> What does "panic ourselves" mean exactly? What is the current behavior
> which that would match?

I meant directly calling panic(). Before this patch, if the SEPT VE DISABLE
attribute was not set, we would call panic(). In this patch, we try to report
the error to VMM and wait for it to terminate the guest in the same case.
But after reporting the error, if VMM does not terminate the guest as expected,
I thought instead of retrying continuously, we can call panic() directly after
some retries.
On 12/12/22 08:37, Sathyanarayanan Kuppuswamy wrote:
> On 12/12/22 8:10 AM, Dave Hansen wrote:
>> On 12/9/22 12:51, Sathyanarayanan Kuppuswamy wrote:
>>>>>> +	while (1) {
>>>>>> +		__tdx_hypercall(&args, 0);
>>>>>> +	}
>>>>> Instead of an infinite loop, I'm wondering if the guest should panic after
>>>>> retrying for few times.
>>>> Hm. What difference would it make?
>>> IIUC, the goal of this patch is to report the fatal error to VMM and panic.
>>> But, if VMM does not terminate the guest as we expect, rather than trying
>>> continuously, isn't it better to panic ourselves? That way the behavior
>>> will be similar to what we have currently.
>> What does "panic ourselves" mean exactly? What is the current behavior
>> which that would match?
> I meant directly calling panic(). Before this patch, if the SEPT VE DISABLE
> attribute was not set, we would call panic(). In this patch, we try to report
> the error to VMM and wait for it to terminate the guest in the same case.
> But after reporting the error, if VMM does not terminate the guest as expected,
> I thought instead of retrying continuously, we can call panic() directly after
> some retries.

Could you explain how panic() is better than retrying?

You might also want to go look at the original changelog for this patch.
On 12/9/22 05:25, Kirill A. Shutemov wrote:
> The check for SEPT_VE_DISABLE happens early in the kernel boot where
> earlyprintk is not yet functional. Kernel successfully detect broken
> TD configuration and stops the kernel with panic(), but it cannot
> communicate the reason to the user.

Linux TDX guests require that the SEPT_VE_DISABLE "attribute" be set.
If it is not set, the kernel is theoretically required to handle
exceptions anywhere that kernel memory is accessed, including places
like NMI handlers and in the syscall entry gap.

Rather than even try to handle these exceptions, the kernel refuses to
run if SEPT_VE_DISABLE is unset.

However, the SEPT_VE_DISABLE detection and refusal code happens very
early in boot, even before earlyprintk runs. Calling panic() will
effectively just hang the system.

Instead, call a TDX-specific panic() function. This makes a very simple
TDVMCALL which gets a short error string out to the hypervisor without
any console infrastructure.

--

Is that better?

Also, are you sure we want to do this? Is there any way to do this
inside of panic() itself to get panic() itself to call tdx_panic() and
get a short error message out to the hypervisor?

Getting *all* users of panic this magic ability would be a lot better
than giving it to one call-site of panic().

I'm all for making the panic() path as short and simple as possible, but
it would be nice if this fancy hypercall would get used in more than one
spot.

> Use TDG.VP.VMCALL<ReportFatalError> to report the error. The hypercall
> can encode message up to 64 bytes in eight registers.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/coco/tdx/tdx.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index cfd4c95b9f04..8ad04d101270 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -22,6 +22,7 @@
>
>  /* TDX hypercall Leaf IDs */
>  #define TDVMCALL_MAP_GPA		0x10001
> +#define TDVMCALL_REPORT_FATAL_ERROR	0x10003
>
>  /* MMIO direction */
>  #define EPT_READ	0
> @@ -140,6 +141,41 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
>  }
>  EXPORT_SYMBOL_GPL(tdx_mcall_get_report0);
>
> +static void __noreturn tdx_panic(const char *msg)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = TDVMCALL_REPORT_FATAL_ERROR,
> +		.r12 = 0, /* Error code: 0 is Panic */
> +	};
> +	union {
> +		/* Define register order according to the GHCI */
> +		struct { u64 r14, r15, rbx, rdi, rsi, r8, r9, rdx; };
> +
> +		char str[64];
> +	} message;
> +
> +	/* VMM assumes '\0' in byte 65, if the message took all 64 bytes */
> +	strncpy(message.str, msg, 64);
> +
> +	args.r8  = message.r8;
> +	args.r9  = message.r9;
> +	args.r14 = message.r14;
> +	args.r15 = message.r15;
> +	args.rdi = message.rdi;
> +	args.rsi = message.rsi;
> +	args.rbx = message.rbx;
> +	args.rdx = message.rdx;

I dunno. Is that struct/union better, or would something like this be
more readable:

	args.r8 = *(u64 *)&message[48];
	args.r9 = *(u64 *)&message[56];

and just hard-code the offsets.

> +	/*
> +	 * Keep calling the hypercall in case VMM did not terminated
						          terminate^
> +	 * the TD as it must.
> +	 */
> +	while (1) {
> +		__tdx_hypercall(&args, 0);
> +	}
> +}
> +
>  static void tdx_parse_tdinfo(u64 *cc_mask)
>  {
>  	struct tdx_module_output out;
> @@ -172,7 +208,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
>  	 */
>  	td_attr = out.rdx;
>  	if (!(td_attr & ATTR_SEPT_VE_DISABLE))
> -		panic("TD misconfiguration: SEPT_VE_DISABLE attibute must be set.\n");
> +		tdx_panic("TD misconfiguration: SEPT_VE_DISABLE attribute must be set.");
>  }

Would it be worth making it more clear when the message is truncated?
Maybe something like:

	if (strlen(msg) > 64) {
		len = 64
		strncpy(&msg[61], "...", 3);
	}

I'm sure I have five off-by-one bugs in there, but you get the idea.
Can we stick a "..." at the end of things that get truncated?
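The truncation marker suggested above can be sketched in userspace C as follows. This is one possible reading of the idea, not code from the series: the helper name `copy_with_ellipsis` and the `FATAL_MSG_MAX` constant are hypothetical, and the sketch writes into a separate 64-byte output buffer rather than modifying `msg`, which is a `const char *` in the patch.

```c
#include <stddef.h>
#include <string.h>

#define FATAL_MSG_MAX 64	/* payload size stated in the patch description */

/*
 * Fill out[] with msg, padding with zero bytes. If msg is longer than
 * 64 bytes, keep its first 61 bytes and end the buffer with "..." so
 * that a reader of the report can tell the text was cut. When all 64
 * bytes are used, out[] is not NUL-terminated.
 */
static void copy_with_ellipsis(char out[FATAL_MSG_MAX], const char *msg)
{
	size_t len = strlen(msg);

	memset(out, 0, FATAL_MSG_MAX);
	if (len <= FATAL_MSG_MAX) {
		memcpy(out, msg, len);
		return;
	}
	memcpy(out, msg, FATAL_MSG_MAX - 3);
	memcpy(out + FATAL_MSG_MAX - 3, "...", 3);
}
```

A message of exactly 64 bytes is copied unchanged, so the marker appears only when text was actually dropped.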
On Tue, Dec 13, 2022 at 03:06:07PM -0800, Dave Hansen wrote:
> On 12/9/22 05:25, Kirill A. Shutemov wrote:
> > The check for SEPT_VE_DISABLE happens early in the kernel boot where
> > earlyprintk is not yet functional. Kernel successfully detect broken
> > TD configuration and stops the kernel with panic(), but it cannot
> > communicate the reason to the user.
>
> Linux TDX guests require that the SEPT_VE_DISABLE "attribute" be set.
> If it is not set, the kernel is theoretically required to handle
> exceptions anywhere that kernel memory is accessed, including places
> like NMI handlers and in the syscall entry gap.
>
> Rather than even try to handle these exceptions, the kernel refuses to
> run if SEPT_VE_DISABLE is unset.
>
> However, the SEPT_VE_DISABLE detection and refusal code happens very
> early in boot, even before earlyprintk runs. Calling panic() will
> effectively just hang the system.
>
> Instead, call a TDX-specific panic() function. This makes a very simple
> TDVMCALL which gets a short error string out to the hypervisor without
> any console infrastructure.
>
> --
>
> Is that better?

Yes, thank you.

> Also, are you sure we want to do this? Is there any way to do this
> inside of panic() itself to get panic() itself to call tdx_panic() and
> get a short error message out to the hypervisor?
>
> Getting *all* users of panic this magic ability would be a lot better
> than giving it to one call-site of panic().
>
> I'm all for making the panic() path as short and simple as possible, but
> it would be nice if this fancy hypercall would get used in more than one
> spot.

Well, I don't see an obvious way to integrate this into panic().

There is panic_notifier_list and it kinda/sorta works, see the patch
below.

But it breaks panic_notifier_list contract: the callback will never return
and no other callback will be able to do their stuff. panic_timeout is
also broken.

So ReportFatalError() is no good for the task. And I don't have anything
else :/

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 83ca9a7f0b75..81f9a964dc1f 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/panic_notifier.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
@@ -146,8 +147,10 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
 }
 EXPORT_SYMBOL_GPL(tdx_mcall_get_report0);
 
-static void __noreturn tdx_panic(const char *msg)
+static int tdx_panic(struct notifier_block *this,
+		     unsigned long event, void *ptr)
 {
+	const char *msg = ptr;
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
 		.r11 = TDVMCALL_REPORT_FATAL_ERROR,
@@ -219,7 +222,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
 		if (td_attr & ATTR_DEBUG)
 			pr_warn("%s\n", msg);
 		else
-			tdx_panic(msg);
+			panic(msg);
 	}
 }
 
@@ -851,6 +854,10 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 	return true;
 }
 
+static struct notifier_block panic_block = {
+	.notifier_call = tdx_panic,
+};
+
 void __init tdx_early_init(void)
 {
 	u64 cc_mask;
@@ -863,6 +870,7 @@ void __init tdx_early_init(void)
 
 	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
 
+	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
 	cc_set_vendor(CC_VENDOR_INTEL);
 	tdx_parse_tdinfo(&cc_mask);
 	cc_set_mask(cc_mask);
On 12/15/22 09:12, Kirill A. Shutemov wrote:
>> Getting *all* users of panic this magic ability would be a lot better
>> than giving it to one call-site of panic().
>>
>> I'm all for making the panic() path as short and simple as possible, but
>> it would be nice if this fancy hypercall would get used in more than one
>> spot.
> Well, I don't see an obvious way to integrate this into panic().
>
> There is panic_notifier_list and it kinda/sorta works, see the patch
> below.
>
> But it breaks panic_notifier_list contract: the callback will never return
> and no other callback will be able to do their stuff. panic_timeout is
> also broken.
>
> So ReportFatalError() is no good for the task. And I don't have anything
> else :/

Do we *really* have to do a hard stop when SEPT_VE_DISABLE is missing?

Wouldn't it be simpler to just defer the check until we can spit out a
sane error message about it?

Or is there too much security exposure by continuing?
On Thu, Dec 15, 2022 at 10:18:24AM -0800, Dave Hansen wrote:
> On 12/15/22 09:12, Kirill A. Shutemov wrote:
> >> Getting *all* users of panic this magic ability would be a lot better
> >> than giving it to one call-site of panic().
> >>
> >> I'm all for making the panic() path as short and simple as possible, but
> >> it would be nice if this fancy hypercall would get used in more than one
> >> spot.
> > Well, I don't see an obvious way to integrate this into panic().
> >
> > There is panic_notifier_list and it kinda/sorta works, see the patch
> > below.
> >
> > But it breaks panic_notifier_list contract: the callback will never return
> > and no other callback will be able to do their stuff. panic_timeout is
> > also broken.
> >
> > So ReportFatalError() is no good for the task. And I don't have anything
> > else :/
>
> Do we *really* have to do a hard stop when SEPT_VE_DISABLE is missing?
>
> Wouldn't it be simpler to just defer the check until we can spit out a
> sane error message about it?
>
> Or is there too much security exposure by continuing?

Well, I guess we can. We always have attestation as a backstop. No
sensitive user data has to be exposed to the TD before it passed
the attestation.

Do you prefer to have a separate initcall just to check SEPT_VE_DISABLE?
On 12/15/22 10:51, Kirill A. Shutemov wrote:
>>> So ReportFatalError() is no good for the task. And I don't have anything
>>> else :/
>> Do we *really* have to do a hard stop when SEPT_VE_DISABLE is missing?
>>
>> Wouldn't it be simpler to just defer the check until we can spit out a
>> sane error message about it?
>>
>> Or is there too much security exposure by continuing?
> Well, I guess we can. We always have attestation as a backstop. No
> sensitive user data has to be exposed to the TD before it passed
> the attestation.

OK, so let's just pretend that SEPT_VE_DISABLE=0 is a blatant root hole
that lets the VMM compromise the TDX guest (I know it's not, but let's
just pretend it is).

The guest starts up, the VMM compromises it after the attestation has
run. The now compromised guest send along its report. But, since the
report contains (or implies???) SEPT_VE_DISABLE=0, the guest will be
assumed to be compromised and won't get any secrets provisioned?

That assumes that the attestation service knows that SEPT_VE_DISABLE==0
plus Linux is bad. Is that a good assumption?

> Do you prefer to have a separate initcall just to check SEPT_VE_DISABLE?

I don't feel strongly about where the check should be as long as it can
get a message out to the console.
On Thu, Dec 15, 2022 at 01:09:10PM -0800, Dave Hansen wrote:
> On 12/15/22 10:51, Kirill A. Shutemov wrote:
> >>> So ReportFatalError() is no good for the task. And I don't have anything
> >>> else :/
> >> Do we *really* have to do a hard stop when SEPT_VE_DISABLE is missing?
> >>
> >> Wouldn't it be simpler to just defer the check until we can spit out a
> >> sane error message about it?
> >>
> >> Or is there too much security exposure by continuing?
> > Well, I guess we can. We always have attestation as a backstop. No
> > sensitive user data has to be exposed to the TD before it passed
> > the attestation.
>
> OK, so let's just pretend that SEPT_VE_DISABLE=0 is a blatant root hole
> that lets the VMM compromise the TDX guest (I know it's not, but let's
> just pretend it is).
>
> The guest starts up, the VMM compromises it after the attestation has
> run. The now compromised guest send along its report. But, since the
> report contains (or implies???) SEPT_VE_DISABLE=0, the guest will be
> assumed to be compromised and won't get any secrets provisioned?
>
> That assumes that the attestation service knows that SEPT_VE_DISABLE==0
> plus Linux is bad. Is that a good assumption?

I know that attestation quote includes all required information
(attributes and kernel hash) to make the decision and I assume that
attestation service is competent. So, yes, I think expectation Linux +
SEPT_VE_DISABLE==0 going to be rejected is reasonable.

Elena, is there anything you can elaborate on here?

> > Do you prefer to have a separate initcall just to check SEPT_VE_DISABLE?
>
> I don't feel strongly about where the check should be as long as it can
> get a message out to the console.

I would rather keep current approach with simple tdx_panic() for early use
if it works for you.
> On Thu, Dec 15, 2022 at 01:09:10PM -0800, Dave Hansen wrote:
> > On 12/15/22 10:51, Kirill A. Shutemov wrote:
> > >>> So ReportFatalError() is no good for the task. And I don't have anything
> > >>> else :/
> > >> Do we *really* have to do a hard stop when SEPT_VE_DISABLE is missing?
> > >>
> > >> Wouldn't it be simpler to just defer the check until we can spit out a
> > >> sane error message about it?
> > >>
> > >> Or is there too much security exposure by continuing?
> > > Well, I guess we can. We always have attestation as a backstop. No
> > > sensitive user data has to be exposed to the TD before it passed
> > > the attestation.
> >
> > OK, so let's just pretend that SEPT_VE_DISABLE=0 is a blatant root hole
> > that lets the VMM compromise the TDX guest (I know it's not, but let's
> > just pretend it is).
> >
> > The guest starts up, the VMM compromises it after the attestation has
> > run. The now compromised guest send along its report. But, since the
> > report contains (or implies???) SEPT_VE_DISABLE=0, the guest will be
> > assumed to be compromised and won't get any secrets provisioned?
> >
> > That assumes that the attestation service knows that SEPT_VE_DISABLE==0
> > plus Linux is bad. Is that a good assumption?
>
> I know that attestation quote includes all required information
> (attributes and kernel hash) to make the decision and I assume that
> attestation service is competent. So, yes, I think expectation Linux +
> SEPT_VE_DISABLE==0 going to be rejected is reasonable.
>
> Elena, is there anything you can elaborate on here?

Yes, attestation quote has the attribute included for SEPT_VE_DISABLE.
So the remote verifier can check this, *if* it understands that it is
important. However, it is a big *IF* imo. In TDX module spec and
attestation specs, SEPT_VE_DISABLE is marked as attribute that
"potentially impacts security" vs TUD attributes like DEBUG that are
classified as "your TD is not secure at all".

So, we will be relying on verifiers to understand that in Linux case
it is a critical thing vs "potentially impacting security thing".
We will document this specifically in our TDX guest kernel documentation,
but I have no guarantees on how careful people are reading it.
My preference is to do the right thing in code.

Best Regards,
Elena.
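The verifier-side check discussed above could look roughly like the sketch below. It is an illustration under stated assumptions, not part of any attestation service: the function name `linux_td_attributes_acceptable` is hypothetical, the TD attributes are assumed to have already been extracted from a verified quote as a 64-bit value, and the bit positions (DEBUG at bit 0, SEPT_VE_DISABLE at bit 28) are assumed to match the definitions in the Linux TDX guest code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring the Linux TDX guest definitions. */
#define ATTR_DEBUG		(1ULL << 0)
#define ATTR_SEPT_VE_DISABLE	(1ULL << 28)

/*
 * Hypothetical policy for a TD that runs Linux: a debug TD is never
 * trusted, and a TD without SEPT_VE_DISABLE is rejected as well,
 * because the Linux guest is not written to handle the resulting
 * exceptions.
 */
static bool linux_td_attributes_acceptable(uint64_t td_attr)
{
	if (td_attr & ATTR_DEBUG)
		return false;
	if (!(td_attr & ATTR_SEPT_VE_DISABLE))
		return false;
	return true;
}
```

The point raised in the thread is that nothing forces a verifier to apply the second check, which is why the guest-side refusal is preferred.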
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index cfd4c95b9f04..8ad04d101270 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -22,6 +22,7 @@
 
 /* TDX hypercall Leaf IDs */
 #define TDVMCALL_MAP_GPA		0x10001
+#define TDVMCALL_REPORT_FATAL_ERROR	0x10003
 
 /* MMIO direction */
 #define EPT_READ	0
@@ -140,6 +141,41 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
 }
 EXPORT_SYMBOL_GPL(tdx_mcall_get_report0);
 
+static void __noreturn tdx_panic(const char *msg)
+{
+	struct tdx_hypercall_args args = {
+		.r10 = TDX_HYPERCALL_STANDARD,
+		.r11 = TDVMCALL_REPORT_FATAL_ERROR,
+		.r12 = 0, /* Error code: 0 is Panic */
+	};
+	union {
+		/* Define register order according to the GHCI */
+		struct { u64 r14, r15, rbx, rdi, rsi, r8, r9, rdx; };
+
+		char str[64];
+	} message;
+
+	/* VMM assumes '\0' in byte 65, if the message took all 64 bytes */
+	strncpy(message.str, msg, 64);
+
+	args.r8  = message.r8;
+	args.r9  = message.r9;
+	args.r14 = message.r14;
+	args.r15 = message.r15;
+	args.rdi = message.rdi;
+	args.rsi = message.rsi;
+	args.rbx = message.rbx;
+	args.rdx = message.rdx;
+
+	/*
+	 * Keep calling the hypercall in case VMM did not terminated
+	 * the TD as it must.
+	 */
+	while (1) {
+		__tdx_hypercall(&args, 0);
+	}
+}
+
 static void tdx_parse_tdinfo(u64 *cc_mask)
 {
 	struct tdx_module_output out;
@@ -172,7 +208,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
 	 */
 	td_attr = out.rdx;
 	if (!(td_attr & ATTR_SEPT_VE_DISABLE))
-		panic("TD misconfiguration: SEPT_VE_DISABLE attibute must be set.\n");
+		tdx_panic("TD misconfiguration: SEPT_VE_DISABLE attribute must be set.");
 }
 
 /*