Message ID | 20230307023946.14516-29-xin3.li@intel.com |
---|---|
State | New |
Headers | From: Xin Li <xin3.li@intel.com> To: linux-kernel@vger.kernel.org, x86@kernel.org, kvm@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, peterz@infradead.org, andrew.cooper3@citrix.com, seanjc@google.com, pbonzini@redhat.com, ravi.v.shankar@intel.com Subject: [PATCH v5 28/34] x86/fred: fixup fault on ERETU by jumping to fred_entrypoint_user Date: Mon, 6 Mar 2023 18:39:40 -0800 Message-Id: <20230307023946.14516-29-xin3.li@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230307023946.14516-1-xin3.li@intel.com> References: <20230307023946.14516-1-xin3.li@intel.com> |
Series | x86: enable FRED for x86-64 |
Commit Message
Li, Xin3
March 7, 2023, 2:39 a.m. UTC
If the stack frame contains an invalid user context (e.g. due to invalid SS,
a non-canonical RIP, etc.) the ERETU instruction will trap (#SS or #GP).

From a Linux point of view, this really should be considered a user space
failure, so use the standard fault fixup mechanism to intercept the fault,
fix up the exception frame, and redirect execution to fred_entrypoint_user.
The end result is that it appears just as if the hardware had taken the
exception immediately after completing the transition to user space.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/entry/entry_64_fred.S             |  8 +++++--
 arch/x86/include/asm/extable_fixup_types.h |  4 +++-
 arch/x86/mm/extable.c                      | 28 ++++++++++++++++++++++
 3 files changed, 37 insertions(+), 3 deletions(-)
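The "standard fault fixup mechanism" mentioned in the commit message can be pictured with a small user-space model. This is only a sketch under simplifying assumptions: the names `ex_entry`, `cpu_regs` and `fixup_fault` are hypothetical, the table is scanned linearly, and it stores absolute addresses, whereas the kernel's exception table is sorted, stores relative offsets, and dispatches to a per-type handler such as `ex_handler_eretu()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixup types; only the idea of a per-entry type matters here. */
enum ex_type { EX_NONE = 0, EX_DEFAULT = 1, EX_ERETU = 21 };

/* Hypothetical, simplified exception table entry. */
struct ex_entry {
	unsigned long insn;   /* address of the instruction that may fault */
	unsigned long fixup;  /* address at which execution should resume */
	enum ex_type type;    /* which handler should massage the frame */
};

/* Hypothetical stand-in for the saved register frame of the fault. */
struct cpu_regs {
	unsigned long ip;
};

/*
 * If the faulting ip has a table entry, redirect the saved ip to the
 * fixup address and report the entry type; otherwise leave the frame
 * untouched and return EX_NONE (the fault is not recoverable this way).
 */
enum ex_type fixup_fault(const struct ex_entry *tbl, size_t n,
			 struct cpu_regs *regs)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].insn == regs->ip) {
			regs->ip = tbl[i].fixup;
			return tbl[i].type;
		}
	}
	return EX_NONE;
}

/* Small self-check: a fault at the "ERETU" address resumes at the entry point. */
void ex_sketch_selfcheck(void)
{
	const struct ex_entry tbl[] = { { 0x1000, 0x2000, EX_ERETU } };
	struct cpu_regs regs = { .ip = 0x1000 };

	assert(fixup_fault(tbl, 1, &regs) == EX_ERETU);
	assert(regs.ip == 0x2000);
}
```

In the patch itself, the entry is created by `_ASM_EXTABLE_TYPE(1b, fred_entrypoint_user, EX_TYPE_ERETU)`, so the faulting instruction is the `ERETU` at label `1` and the resume address is `fred_entrypoint_user`.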
Comments
> +#ifdef CONFIG_X86_FRED
> +static bool ex_handler_eretu(const struct exception_table_entry *fixup,
> +			     struct pt_regs *regs, unsigned long error_code)
> +{
> +	struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
> +	unsigned short ss = uregs->ss;
> +	unsigned short cs = uregs->cs;
> +
> +	fred_info(uregs)->edata = fred_event_data(regs);
> +	uregs->ssx = regs->ssx;
> +	uregs->ss = ss;
> +	uregs->csx = regs->csx;
> +	uregs->current_stack_level = 0;
> +	uregs->cs = cs;

Hello

If the ERETU instruction had tried to return from NMI to ring3 and just faulted,
is NMI still blocked?

We know that IRET unconditionally enables NMI, but I can't find any clue in the
FRED's manual.

In the pseudocode of ERETU in the manual, it seems that NMI is only enabled when
ERETU succeeds with bit28 in csx set. If so, this code will fail to reenable
NMI if bit28 is not explicitly re-set in csx.

Thanks,
Lai

> +
> +	/* Copy error code to uregs and adjust stack pointer accordingly */
> +	uregs->orig_ax = error_code;
> +	regs->sp -= 8;
> +
> +	return ex_handler_default(fixup, regs);
> +}
On Fri, Mar 17, 2023 at 5:56 PM <andrew.cooper3@citrix.com> wrote:
>
> On 17/03/2023 9:39 am, Lai Jiangshan wrote:
> >> +#ifdef CONFIG_X86_FRED
> >> +static bool ex_handler_eretu(const struct exception_table_entry *fixup,
> >> +			     struct pt_regs *regs, unsigned long error_code)
> >> +{
> >> +	struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
> >> +	unsigned short ss = uregs->ss;
> >> +	unsigned short cs = uregs->cs;
> >> +
> >> +	fred_info(uregs)->edata = fred_event_data(regs);
> >> +	uregs->ssx = regs->ssx;
> >> +	uregs->ss = ss;
> >> +	uregs->csx = regs->csx;
> >> +	uregs->current_stack_level = 0;
> >> +	uregs->cs = cs;
> > Hello
> >
> > If the ERETU instruction had tried to return from NMI to ring3 and just faulted,
> > is NMI still blocked?
> >
> > We know that IRET unconditionally enables NMI, but I can't find any clue in the
> > FRED's manual.
> >
> > In the pseudocode of ERETU in the manual, it seems that NMI is only enabled when
> > ERETU succeeds with bit28 in csx set. If so, this code will fail to reenable
> > NMI if bit28 is not explicitly re-set in csx.
>
> IRET clearing NMI blocking is the source of an immense amount of grief,
> and ultimately the reason why Linux and others can't use supervisor
> shadow stacks at the moment.
>
> Changing this property, so NMIs only get unblocked on successful
> execution of an ERET{S,U}, was a key demand of the FRED spec.
>
> i.e. until you have successfully ERET*'d, you're still logically in the
> NMI handler and NMIs need to remain blocked even when handling the #GP
> from a bad ERET.
>

Handling of the #GP for a bad ERETU can be rescheduled. It is not
OK to reschedule with NMI blocked.

I think "regs->nmi = 1;" (not uregs->nmi) can fix the problem.
On March 17, 2023 2:55:44 AM PDT, andrew.cooper3@citrix.com wrote:
>On 17/03/2023 9:39 am, Lai Jiangshan wrote:
>>> +#ifdef CONFIG_X86_FRED
>>> +static bool ex_handler_eretu(const struct exception_table_entry *fixup,
>>> +			     struct pt_regs *regs, unsigned long error_code)
>>> +{
>>> +	struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
>>> +	unsigned short ss = uregs->ss;
>>> +	unsigned short cs = uregs->cs;
>>> +
>>> +	fred_info(uregs)->edata = fred_event_data(regs);
>>> +	uregs->ssx = regs->ssx;
>>> +	uregs->ss = ss;
>>> +	uregs->csx = regs->csx;
>>> +	uregs->current_stack_level = 0;
>>> +	uregs->cs = cs;
>> Hello
>>
>> If the ERETU instruction had tried to return from NMI to ring3 and just faulted,
>> is NMI still blocked?
>>
>> We know that IRET unconditionally enables NMI, but I can't find any clue in the
>> FRED's manual.
>>
>> In the pseudocode of ERETU in the manual, it seems that NMI is only enabled when
>> ERETU succeeds with bit28 in csx set. If so, this code will fail to reenable
>> NMI if bit28 is not explicitly re-set in csx.
>
>IRET clearing NMI blocking is the source of an immense amount of grief,
>and ultimately the reason why Linux and others can't use supervisor
>shadow stacks at the moment.
>
>Changing this property, so NMIs only get unblocked on successful
>execution of an ERET{S,U}, was a key demand of the FRED spec.
>
>i.e. until you have successfully ERET*'d, you're still logically in the
>NMI handler and NMIs need to remain blocked even when handling the #GP
>from a bad ERET.
>
>~Andrew

This is correct.
On March 17, 2023 6:02:52 AM PDT, Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>On Fri, Mar 17, 2023 at 5:56 PM <andrew.cooper3@citrix.com> wrote:
>>
>> On 17/03/2023 9:39 am, Lai Jiangshan wrote:
>> >> +#ifdef CONFIG_X86_FRED
>> >> +static bool ex_handler_eretu(const struct exception_table_entry *fixup,
>> >> +			     struct pt_regs *regs, unsigned long error_code)
>> >> +{
>> >> +	struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
>> >> +	unsigned short ss = uregs->ss;
>> >> +	unsigned short cs = uregs->cs;
>> >> +
>> >> +	fred_info(uregs)->edata = fred_event_data(regs);
>> >> +	uregs->ssx = regs->ssx;
>> >> +	uregs->ss = ss;
>> >> +	uregs->csx = regs->csx;
>> >> +	uregs->current_stack_level = 0;
>> >> +	uregs->cs = cs;
>> > Hello
>> >
>> > If the ERETU instruction had tried to return from NMI to ring3 and just faulted,
>> > is NMI still blocked?
>> >
>> > We know that IRET unconditionally enables NMI, but I can't find any clue in the
>> > FRED's manual.
>> >
>> > In the pseudocode of ERETU in the manual, it seems that NMI is only enabled when
>> > ERETU succeeds with bit28 in csx set. If so, this code will fail to reenable
>> > NMI if bit28 is not explicitly re-set in csx.
>>
>> IRET clearing NMI blocking is the source of an immense amount of grief,
>> and ultimately the reason why Linux and others can't use supervisor
>> shadow stacks at the moment.
>>
>> Changing this property, so NMIs only get unblocked on successful
>> execution of an ERET{S,U}, was a key demand of the FRED spec.
>>
>> i.e. until you have successfully ERET*'d, you're still logically in the
>> NMI handler and NMIs need to remain blocked even when handling the #GP
>> from a bad ERET.
>>
>
>Handling of the #GP for a bad ERETU can be rescheduled. It is not
>OK to reschedule with NMI blocked.
>
>I think "regs->nmi = 1;" (not uregs->nmi) can fix the problem.
>

You are quite correct, since what we want here is to emulate having taken the
fault in user space – which meant that NMI would have been re-enabled by the
never-executed return.

I think the "best" solution is:

	regs->nmi = uregs->nmi;
	uregs->nmi = 0;

... as enabling NMI is expected to have a performance penalty (being the less
common case, an implementation which has a performance difference at all would
want to optimize the non-NMI case), and I believe the compiler should be able
to at least mostly fold those operations into ones it is doing anyway.
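The two-line hand-off suggested above can be modelled in user space to make its effect explicit. This is a sketch, not the kernel's code: `struct frame` and `eretu_fixup_nmi` are hypothetical names, and the frame is reduced to the single `nmi` bit discussed in this thread. Here `regs` stands for the frame of the fault raised by the failed ERETU and `uregs` for the user frame that ERETU tried to return with.

```c
#include <assert.h>

/* Hypothetical reduced frame: only the "unblock NMI on return" bit. */
struct frame {
	unsigned int nmi : 1;
};

/*
 * Move the pending "unblock NMI" request from the user frame to the
 * fault frame. Returning from the fault frame then unblocks NMIs (if
 * they were blocked), and the later return to user space with the
 * user frame no longer asks for it a second time.
 */
void eretu_fixup_nmi(struct frame *regs, struct frame *uregs)
{
	regs->nmi = uregs->nmi;
	uregs->nmi = 0;
}

/* Self-check covering both the NMI and the non-NMI case. */
void nmi_sketch_selfcheck(void)
{
	struct frame regs = { .nmi = 0 }, uregs = { .nmi = 1 };

	eretu_fixup_nmi(&regs, &uregs);
	assert(regs.nmi == 1 && uregs.nmi == 0);

	regs.nmi = 0;
	uregs.nmi = 0;
	eretu_fixup_nmi(&regs, &uregs);
	assert(regs.nmi == 0 && uregs.nmi == 0);
}
```

Compared with the unconditional `regs->nmi = 1;` proposed earlier in the thread, copying the bit only requests the unblock when the failed ERETU was actually returning from an NMI, which matches the performance argument given above.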
> > +#ifdef CONFIG_X86_FRED
> > +static bool ex_handler_eretu(const struct exception_table_entry *fixup,
> > +			     struct pt_regs *regs, unsigned long error_code)
> > +{
> > +	struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
> > +	unsigned short ss = uregs->ss;
> > +	unsigned short cs = uregs->cs;
> > +
> > +	fred_info(uregs)->edata = fred_event_data(regs);
> > +	uregs->ssx = regs->ssx;
> > +	uregs->ss = ss;
> > +	uregs->csx = regs->csx;
> > +	uregs->current_stack_level = 0;
> > +	uregs->cs = cs;
>
> Hello
>
> If the ERETU instruction had tried to return from NMI to ring3 and just faulted, is
> NMI still blocked?

Do you mean the NMI FRED stack frame contains an invalid ring3 context?
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 1fb765fd3871..027ef8f1e600 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -5,8 +5,10 @@
  * The actual FRED entry points.
  */
 #include <linux/linkage.h>
-#include <asm/errno.h>
+#include <asm/asm.h>
 #include <asm/asm-offsets.h>
+#include <asm/errno.h>
+#include <asm/export.h>
 #include <asm/fred.h>
 
 #include "calling.h"
@@ -38,7 +40,9 @@ SYM_CODE_START_NOALIGN(fred_entrypoint_user)
 	call	fred_entry_from_user
 SYM_INNER_LABEL(fred_exit_user, SYM_L_GLOBAL)
 	FRED_EXIT
-	ERETU
+1:	ERETU
+
+	_ASM_EXTABLE_TYPE(1b, fred_entrypoint_user, EX_TYPE_ERETU)
 SYM_CODE_END(fred_entrypoint_user)
 
 /*
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 991e31cfde94..1585c798a02f 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -64,6 +64,8 @@
 #define	EX_TYPE_UCOPY_LEN4		(EX_TYPE_UCOPY_LEN | EX_DATA_IMM(4))
 #define	EX_TYPE_UCOPY_LEN8		(EX_TYPE_UCOPY_LEN | EX_DATA_IMM(8))
 
-#define EX_TYPE_ZEROPAD			20 /* longword load with zeropad on fault */
+#define	EX_TYPE_ZEROPAD			20 /* longword load with zeropad on fault */
+
+#define	EX_TYPE_ERETU			21
 
 #endif
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 60814e110a54..88a2c419ce8b 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -6,6 +6,7 @@
 #include <xen/xen.h>
 
 #include <asm/fpu/api.h>
+#include <asm/fred.h>
 #include <asm/sev.h>
 #include <asm/traps.h>
 #include <asm/kdebug.h>
@@ -195,6 +196,29 @@ static bool ex_handler_ucopy_len(const struct exception_table_entry *fixup,
 	return ex_handler_uaccess(fixup, regs, trapnr);
 }
 
+#ifdef CONFIG_X86_FRED
+static bool ex_handler_eretu(const struct exception_table_entry *fixup,
+			     struct pt_regs *regs, unsigned long error_code)
+{
+	struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
+	unsigned short ss = uregs->ss;
+	unsigned short cs = uregs->cs;
+
+	fred_info(uregs)->edata = fred_event_data(regs);
+	uregs->ssx = regs->ssx;
+	uregs->ss = ss;
+	uregs->csx = regs->csx;
+	uregs->current_stack_level = 0;
+	uregs->cs = cs;
+
+	/* Copy error code to uregs and adjust stack pointer accordingly */
+	uregs->orig_ax = error_code;
+	regs->sp -= 8;
+
+	return ex_handler_default(fixup, regs);
+}
+#endif
+
 int ex_get_fixup_type(unsigned long ip)
 {
 	const struct exception_table_entry *e = search_exception_tables(ip);
@@ -272,6 +296,10 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
 		return ex_handler_ucopy_len(e, regs, trapnr, reg, imm);
 	case EX_TYPE_ZEROPAD:
 		return ex_handler_zeropad(e, regs, fault_addr);
+#ifdef CONFIG_X86_FRED
+	case EX_TYPE_ERETU:
+		return ex_handler_eretu(e, regs, error_code);
+#endif
 	}
 	BUG();
 }