Message ID | 20231205105030.8698-25-xin3.li@intel.com |
---|---|
State | New |
Headers |
From: Xin Li <xin3.li@intel.com>
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, luto@kernel.org, pbonzini@redhat.com, seanjc@google.com, peterz@infradead.org, jgross@suse.com, ravi.v.shankar@intel.com, mhiramat@kernel.org, andrew.cooper3@citrix.com, jiangshanlai@gmail.com, nik.borisov@suse.com, shan.kang@intel.com
Subject: [PATCH v13 24/35] x86/fred: Add a NMI entry stub for FRED
Date: Tue, 5 Dec 2023 02:50:13 -0800
Message-ID: <20231205105030.8698-25-xin3.li@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231205105030.8698-1-xin3.li@intel.com>
References: <20231205105030.8698-1-xin3.li@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
x86: enable FRED for x86-64
|
|
Commit Message
Li, Xin3
Dec. 5, 2023, 10:50 a.m. UTC
From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/nmi.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
Comments
So we have recently discovered an overlooked interaction with VT-x.
Immediately before VMENTER and after VMEXIT, CR2 is live with the
*guest* CR2. Regardless of if the guest uses FRED or not, this is guest
state and SHOULD NOT be corrupted. Furthermore, host state MUST NOT leak
into the guest.

NMIs are blocked on VMEXIT if the cause was an NMI, but not for other
reasons, so a NMI coming in during this window that then #PFs could
corrupt the guest CR2.

Intel is exploring ways to close this hole, but for schedule reasons, it
will not be available at the same time as the first implementation of
FRED. Therefore, as a workaround, it turns out that the FRED NMI stub
*will*, unfortunately, have to save and restore CR2 after all when (at
least) Intel KVM is in use.

Note that this is airtight: it does add a performance penalty to the NMI
path (two read CR2 in the common case of no #PF), but there is no gap
during which a bad CR2 value could be introduced in the guest, no matter
in which sequence the events happen.

In theory the performance penalty could be further reduced by
conditionalizing this on the NMI happening in the critical region in the
KVM code, but it seems to be pretty far from necessary to me.

This obviously was an unfortunate oversight on our part, but the
workaround is simple and doesn't affect any non-NMI paths.

	-hpa

On 12/5/23 02:50, Xin Li wrote:
> +
> +	if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
> +		return;
> +

This is cut & paste from elsewhere in the NMI code, but I believe the
IS_ENABLED() is unnecessary (not to mention ugly): smp_processor_id()
should always return zero on UP, and arch_cpu_is_offline() reduces to
!(cpu == 0), so this is a statically true condition on UP.

	-hpa
> So we have recently discovered an overlooked interaction with VT-x.
> Immediately before VMENTER and after VMEXIT, CR2 is live with the
> *guest* CR2. Regardless of if the guest uses FRED or not, this is guest
> state and SHOULD NOT be corrupted. Furthermore, host state MUST NOT leak
> into the guest.
>
> NMIs are blocked on VMEXIT if the cause was an NMI, but not for other
> reasons, so a NMI coming in during this window that then #PFs could
> corrupt the guest CR2.

I added a comment to vmx_vcpu_enter_exit() for this in
https://lore.kernel.org/kvm/20231108183003.5981-1-xin3.li@intel.com/T/#m29616c02befc04305085b1cbac64df916364626a

>
> Intel is exploring ways to close this hole, but for schedule reasons, it
> will not be available at the same time as the first implementation of
> FRED. Therefore, as a workaround, it turns out that the FRED NMI stub
> *will*, unfortunately, have to save and restore CR2 after all when (at
> least) Intel KVM is in use.
>
> Note that this is airtight: it does add a performance penalty to the NMI
> path (two read CR2 in the common case of no #PF), but there is no gap
> during which a bad CR2 value could be introduced in the guest, no matter
> in which sequence the events happen.
>
> In theory the performance penalty could be further reduced by
> conditionalizing this on the NMI happening in the critical region in the
> KVM code, but it seems to be pretty far from necessary to me.

We should keep the following code in the FRED NMI handler, right?

	{
		...
		this_cpu_write(nmi_cr2, read_cr2());
		...
		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
			write_cr2(this_cpu_read(nmi_cr2));
		...
	}

> This obviously was an unfortunate oversight on our part, but the
> workaround is simple and doesn't affect any non-NMI paths.
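
For readers following the CR2 discussion, here is a minimal userspace sketch of the save/restore pattern quoted above. It is not kernel code: sim_cr2, sim_nmi_cr2 and every sim_* helper are hypothetical stand-ins for the CR2 register, the per-CPU nmi_cr2 slot, and read_cr2()/write_cr2(). It only models the ordering argument, namely that a #PF taken inside the NMI cannot change the CR2 value seen once the NMI returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: a plain variable models the CR2 register,
 * another models the per-CPU nmi_cr2 save slot. */
static uint64_t sim_cr2;
static uint64_t sim_nmi_cr2;

static uint64_t sim_read_cr2(void) { return sim_cr2; }
static void sim_write_cr2(uint64_t val) { sim_cr2 = val; }

/* Models the body of the NMI handler. If take_page_fault is set, the
 * handler "takes a #PF", which overwrites CR2 with the fault address. */
static void sim_nmi_body(bool take_page_fault, uint64_t fault_addr)
{
	if (take_page_fault)
		sim_write_cr2(fault_addr);
}

/* Save CR2 on entry; on exit, write it back only if it changed.
 * In the common no-#PF case this costs two CR2 reads and no write. */
static void sim_nmi_entry(bool take_page_fault, uint64_t fault_addr)
{
	sim_nmi_cr2 = sim_read_cr2();

	sim_nmi_body(take_page_fault, fault_addr);

	if (sim_nmi_cr2 != sim_read_cr2())
		sim_write_cr2(sim_nmi_cr2);
}

/* Returns the CR2 value that would be live after an NMI lands while
 * CR2 holds guest_cr2 (the window around VMENTER/VMEXIT). */
static uint64_t sim_cr2_after_nmi(uint64_t guest_cr2, bool take_page_fault,
				  uint64_t fault_addr)
{
	sim_write_cr2(guest_cr2);
	sim_nmi_entry(take_page_fault, fault_addr);
	return sim_read_cr2();
}
```

In this model, sim_cr2_after_nmi(guest, pf, addr) returns guest whether or not the simulated #PF happens, which is the property the workaround is meant to provide.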
> > +
> > +	if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
> > +		return;
> > +
>
> This is cut & paste from elsewhere in the NMI code, but I believe the
> IS_ENABLED() is unnecessary (not to mention ugly): smp_processor_id()
> should always return zero on UP, and arch_cpu_is_offline() reduces to
> !(cpu == 0), so this is a statically true condition on UP.

Ah, good point!
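
To illustrate the point about the IS_ENABLED() guard, here is a small userspace sketch. It assumes the uniprocessor definitions as described in the review above (smp_processor_id() is always 0, and the offline test reduces to !(cpu == 0)). SIM_CONFIG_SMP and all sim_* names are hypothetical stand-ins, not the kernel's macros. Under those assumptions the offline check has a fixed outcome on UP with or without the guard, and the early return is never taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a uniprocessor (UP) build: SMP is disabled. */
#define SIM_CONFIG_SMP 0

/* On UP there is only CPU 0. */
static inline int sim_smp_processor_id(void) { return 0; }

/* On UP, "online" reduces to cpu == 0 ... */
static inline bool sim_cpu_online(int cpu) { return cpu == 0; }

/* ... so "offline" reduces to !(cpu == 0). */
static inline bool sim_arch_cpu_is_offline(int cpu)
{
	return !sim_cpu_online(cpu);
}

/* The check as written in the patch, with the config guard. */
static bool sim_offline_check_guarded(void)
{
	return SIM_CONFIG_SMP &&
	       sim_arch_cpu_is_offline(sim_smp_processor_id());
}

/* The same check with the guard dropped, as suggested in review. */
static bool sim_offline_check_unguarded(void)
{
	return sim_arch_cpu_is_offline(sim_smp_processor_id());
}
```

Both helpers yield the same result in this model, so dropping the guard does not change behavior on UP; on SMP the guard is trivially true and the offline test decides.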
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955ab69fe..56350d839e44 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -35,6 +35,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/microcode.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -651,6 +652,33 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+	irqentry_state_t irq_state;
+
+	if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
+		return;
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;