[v5,34/34] KVM: x86/vmx: execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled
Message ID | 20230307023946.14516-35-xin3.li@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp2211263wrd; Mon, 6 Mar 2023 19:12:07 -0800 (PST) X-Google-Smtp-Source: AK7set/X4GvcN8OfOBdkNTVlEC1JVOJ+09Ot+P3Zrz3o8i6EKgoTKHRGY0B2uDo1U4E9ibvV46sY X-Received: by 2002:a17:902:7006:b0:19c:fc41:2dfd with SMTP id y6-20020a170902700600b0019cfc412dfdmr13507777plk.29.1678158727593; Mon, 06 Mar 2023 19:12:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678158727; cv=none; d=google.com; s=arc-20160816; b=G1AHozi+x3UKdgfopJugGoMGXnZsisfF63Us+WYnrdXXgsEn2yAVjU/xEUEnVWSoIj pUiRC7q0qpIqSXk8awRu0WgJbddTgSJrgEn7FdF5Q9/yQvMbgTGi/0lMbJkAIwFfIETY r3dr36rMi/DcP09hQg5sI0hCsaUDyo8jL7UCobG26766T+J8diICytOBt/GWqOL8USpa SA2KNY00rWmsWGM04xt2HVws66lrCyve84ROupoLBR42lYc6TP8Zqis2kX1hbtFVHftD hW7h5mqhY3jjixiarx68xktd3uze25D2NsVeOjHu/FO8hTthX+tTb+T8RdjcqTm7AN8L l0Kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZR6LXewVSogLtn5kIAwP0YrFkbFL4N9sZ11A3Dm7Zw4=; b=lccIthAGVOeVN+HOJqdjxiSnO50PlrPfpKTUTiTw3cjgaFoffzIUvw0C5GdZ92R8TE JPeppjGFT34kJgR2P6EhFKllpAc8dKhrM8sL0YArIQ6WnWz8j+l3E+8kkW68PKz6wQUd dfq3/Y60S9g6g1WamZ3/IKT+YQDwlEdI9FPqNVPPMHZmg+LmuykDkTefLyWl1x96sX8P j579LUT2jS+Zzxq7oUjrShzLtVfe0F7IvyoRprSIo1mAzcYgkJDa6iYmrHeT3+0SKgxa 4apUIkw2qzUr9ulIb65Y3rZ6b8UHj0b0jGvT4rp7yJtL+7EoLy71nRoHSud9sfnbZ3XT Jz0Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gYUd1i5p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q1-20020a170902dac100b0019e6d854b08si12344223plx.559.2023.03.06.19.11.54; Mon, 06 Mar 2023 19:12:07 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gYUd1i5p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230311AbjCGDIS (ORCPT <rfc822;toshivichauhan@gmail.com> + 99 others); Mon, 6 Mar 2023 22:08:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231151AbjCGDHl (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 6 Mar 2023 22:07:41 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4517C87A23; Mon, 6 Mar 2023 19:06:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1678158374; x=1709694374; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ldK8JEsrDCe0H7nzPkeyohBcSS2VhVlUfGora58zyTs=; b=gYUd1i5p3deU1a1+kFJPfJk4eomdmH6lOxmc6qsu+Gr+NU5l8BlT8EmT sRChkgj/gRtxAPW9GdXvCuA7w7fVvHrevUkVf6mugSbd+xpsEPHISfMZT RXO9b+tE7G07P7WwgHukCGFr7CxfnwQsSkPKyALVks1NY6/Ut/uH0Khq4 FfLIAMXa40Gsbuf0vL/4QNjCLTFhpc+yjjpnroDWrjsTc0+wNbPQCV8iq do89m4ky2SSoiBLURfulzR2Td7/xzUJGMZ3qQEqE6189xyqyba9bUs4+/ zcu6Eu7i+BIfJXmLgq2hwRVQJqH9+VFPheQ+vSKAghgh4hYco+ajKZvCy Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10641"; a="338072644" X-IronPort-AV: E=Sophos;i="5.98,238,1673942400"; d="scan'208";a="338072644" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2023 19:05:24 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10641"; a="676409966" X-IronPort-AV: E=Sophos;i="5.98,238,1673942400"; d="scan'208";a="676409966" Received: from unknown (HELO fred..) ([172.25.112.68]) by orsmga002.jf.intel.com with ESMTP; 06 Mar 2023 19:05:23 -0800 From: Xin Li <xin3.li@intel.com> To: linux-kernel@vger.kernel.org, x86@kernel.org, kvm@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, peterz@infradead.org, andrew.cooper3@citrix.com, seanjc@google.com, pbonzini@redhat.com, ravi.v.shankar@intel.com Subject: [PATCH v5 34/34] KVM: x86/vmx: execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled Date: Mon, 6 Mar 2023 18:39:46 -0800 Message-Id: <20230307023946.14516-35-xin3.li@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230307023946.14516-1-xin3.li@intel.com> References: <20230307023946.14516-1-xin3.li@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759676966003658059?= X-GMAIL-MSGID: =?utf-8?q?1759676966003658059?= |
Series |
x86: enable FRED for x86-64
|
|
Commit Message
Li, Xin3
March 7, 2023, 2:39 a.m. UTC
Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled. Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED to unblock NMI ASAP (w/ bit 28 of CS set). And there are 2 approaches to invoke the FRED NMI handler: 1) execute "int $2", let the h/w do the job. 2) create a FRED NMI stack frame on the current kernel stack with ASM, and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S. 1) is preferred as we want less ASM. Tested-by: Shan Kang <shan.kang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> --- Changes since v4: *) Do NOT use the term "injection", which in the KVM context means to reinject an event into the guest (Sean Christopherson). *) Add the explanation of why to execute "int $2" to invoke the NMI handler in NMI caused VM exits (Sean Christopherson). --- arch/x86/kvm/vmx/vmx.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
Comments
> Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled. > > Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED > to unblock NMI ASAP (w/ bit 28 of CS set). And there are 2 approaches to > invoke the FRED NMI handler: > 1) execute "int $2", let the h/w do the job. > 2) create a FRED NMI stack frame on the current kernel stack with ASM, > and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S. > > 1) is preferred as we want less ASM. > > Tested-by: Shan Kang <shan.kang@intel.com> > Signed-off-by: Xin Li <xin3.li@intel.com> > --- > > Changes since v4: > *) Do NOT use the term "injection", which in the KVM context means to > reinject an event into the guest (Sean Christopherson). > *) Add the explanation of why to execute "int $2" to invoke the NMI handler > in NMI caused VM exits (Sean Christopherson). Sean, Do you have any further issue with the last 2 VMX patches? If not, would you ack them? Thanks! Xin > --- > arch/x86/kvm/vmx/vmx.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 3ebeaab34b2e..4f12ead2266b 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7229,7 +7229,16 @@ static noinstr void vmx_vcpu_enter_exit(struct > kvm_vcpu *vcpu, > if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && > is_nmi(vmx_get_intr_info(vcpu))) { > kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); > - vmx_do_nmi_irqoff(); > + /* > + * Like IRET for IDT, ERETS/ERETU are required to end the NMI > + * handler for FRED to unblock NMI ASAP (w/ bit 28 of CS set). > + * > + * Invoke the FRED NMI handler through executing "int $2". > + */ > + if (cpu_feature_enabled(X86_FEATURE_FRED)) > + asm volatile("int $2"); > + else > + vmx_do_nmi_irqoff(); > kvm_after_interrupt(vcpu); > } > > -- > 2.34.1
On Mon, Mar 06, 2023, Xin Li wrote: > Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled. > > Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED > to unblock NMI ASAP (w/ bit 28 of CS set). That's "CS" on the stack correct? Is bit 28 set manually by software, or is it set automatically by hardware? If it's set by hardware, does "int $2" actually set the bit since it's not a real NMI? > And there are 2 approaches to > invoke the FRED NMI handler: > 1) execute "int $2", let the h/w do the job. > 2) create a FRED NMI stack frame on the current kernel stack with ASM, > and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S. > > 1) is preferred as we want less ASM. Who is "we", and how much assembly are we talking about? E.g. I personally don't mind a trampoline in KVM if it's small and/or can share code with existing assembly subroutines.
On Wed, Mar 22, 2023, andrew.cooper3@citrix.com wrote: > On 22/03/2023 5:49 pm, Sean Christopherson wrote: > > On Mon, Mar 06, 2023, Xin Li wrote: > >> Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled. > >> > >> Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED > >> to unblock NMI ASAP (w/ bit 28 of CS set). > > That's "CS" on the stack correct? Is bit 28 set manually by software, or is it > > set automatically by hardware? If it's set by hardware, does "int $2" actually > > set the bit since it's not a real NMI? > > int $2 had better not set it...� This is the piece of state that is > intended to cause everything which isn't a real NMI to nest properly > inside a real NMI. > > It is supposed to be set on delivery of an NMI, and act as the trigger > for ERET{U,S} to drop the latch. > > Software is can set it manually in a FRED-frame in order to explicitly > unblock NMIs. Ah, found this in patch 19. That hunk really belongs in this patch, because this patch is full of magic without that information. + /* + * VM exits induced by NMIs keep NMI blocked, and we do + * "int $2" to reinject the NMI w/ NMI kept being blocked. + * However "int $2" doesn't set the nmi bit in the FRED + * stack frame, so we explicitly set it to make sure a + * later ERETS will unblock NMI immediately. + */ + regs->nmi = 1; Organization aside, this seems to defeat the purpose of _not_ unconditionally unmasking NMIs on ERET since the kernel assumes any random "int $2" is coming from KVM after an NMI VM-Exit. Eww, and "int $2" doesn't even go directly to fred_exc_nmi(), it trampolines through fred_sw_interrupt_kernel() first. Looks like "int $2" from userspace gets routed to a #GP, so at least that bit is handled. I'm not dead set against the proposed approach, but IMO it's not obviously better than a bit of assembly to have a more direct call into the NMI handler.
> > Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for > > FRED to unblock NMI ASAP (w/ bit 28 of CS set). > > That's "CS" on the stack correct? Is bit 28 set manually by software, or is it set > automatically by hardware? If it's set by hardware, does "int $2" actually set the > bit since it's not a real NMI? Right, It's the "CS" on the stack. The bit 28 is set by the FRED NMI handler: https://lore.kernel.org/lkml/20230307023946.14516-20-xin3.li@intel.com/ Upon a NMI delivery, the NMI bit is always set by H/W. However, "int $2" does NOT set it, thus we need to explicitly set it. > > And there are 2 approaches to > > invoke the FRED NMI handler: > > 1) execute "int $2", let the h/w do the job. > > 2) create a FRED NMI stack frame on the current kernel stack with ASM, > > and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S. > > > > 1) is preferred as we want less ASM. > > Who is "we", and how much assembly are we talking about? E.g. I personally don't > mind a trampoline in KVM if it's small and/or can share code with existing assembly > subroutines. I ever got such a comment: https://lore.kernel.org/lkml/8735bpbhat.ffs@tglx/ However, if ASM is also okay, I can work on it. And I don't think the ASM code will be big.
> Organization aside, this seems to defeat the purpose of _not_ unconditionally > unmasking NMIs on ERET since the kernel assumes any random "int $2" is coming > from KVM after an NMI VM-Exit. I'm a bit confused. KVM VMX is the only component needing to execute "int $2" and it surely has NMI blocked after an NMI VM-exit. > Eww, and "int $2" doesn't even go directly to fred_exc_nmi(), it trampolines > through fred_sw_interrupt_kernel() first. Looks like "int $2" from userspace gets > routed to a #GP, so at least that bit is handled. FRED does a 2-level dispatch, unless an event handler is on a hot path, we don't promote its handling. NMI seems not a frequent event. > I'm not dead set against the proposed approach, but IMO it's not obviously better > than a bit of assembly to have a more direct call into the NMI handler. I will give it a shot.
> > I'm not dead set against the proposed approach, but IMO it's not > > obviously better than a bit of assembly to have a more direct call into the NMI > handler. > > I will give it a shot. Hi Sean, I got a working patch, before I resend the whole FRED patch set again, can you please check if this is what you're expecting? When FRED is enabled, the x86 CPU always pushes an error code on the stack immediately after the return instruction address is pushed. To generate such a stack frame, call a trampoline function first to push the return instruction address on the stack, and the trampoline function then pushes an error code (0 for IRQ/NMI) and jump to fred_entrypoint_kernel. I could have vmx_do_interrupt_trampoline jump to fred_entrypoint_kernel Instead of calling external_interrupt(), but that would reenter the noinstr text again (not a big problem but seems not preferred by Peter Z). Thanks! Xin diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 631fd7da2bc3..6682b5bd202b 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -31,7 +31,7 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif -.macro VMX_DO_EVENT_IRQOFF call_insn call_target +.macro VMX_DO_EVENT_IRQOFF call_insn call_target fred=1 nmi=0 /* * Unconditionally create a stack frame, getting the correct RSP on the * stack (for x86-64) would take two instructions anyways, and RBP can @@ -46,11 +46,34 @@ * creating the synthetic interrupt stack frame for the IRQ/NMI. */ and $-16, %rsp + + .if \fred + push $0 /* Reserved by FRED, must be 0 */ + push $0 /* FRED event data, 0 for NMI and external interrupts */ + + .if \nmi + mov $(2 << 32 | 2 << 48), %_ASM_AX /* NMI event type and vector */ + .else + mov %_ASM_ARG1, %_ASM_AX + shl $32, %_ASM_AX /* external interrupt vector */ + .endif + add $__KERNEL_DS, %_ASM_AX + bts $57, %_ASM_AX /* bit 57: 64-bit mode */ + push %_ASM_AX + .else push $__KERNEL_DS + .endif + push %rbp #endif pushf + .if \nmi + mov $__KERNEL_CS, %_ASM_AX + bts $28, %_ASM_AX /* set the NMI bit */ + push %_ASM_AX + .else push $__KERNEL_CS + .endif \call_insn \call_target /* @@ -299,8 +322,19 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL) SYM_FUNC_END(__vmx_vcpu_run) +SYM_FUNC_START(vmx_do_nmi_trampoline) +#ifdef CONFIG_X86_FRED + ALTERNATIVE "jmp .Lno_errorcode_push", "", X86_FEATURE_FRED + push $0 /* FRED error code, 0 for NMI */ + jmp fred_entrypoint_kernel +#endif + +.Lno_errorcode_push: + jmp asm_exc_nmi_kvm_vmx +SYM_FUNC_END(vmx_do_nmi_trampoline) + SYM_FUNC_START(vmx_do_nmi_irqoff) - VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx + VMX_DO_EVENT_IRQOFF call vmx_do_nmi_trampoline nmi=1 SYM_FUNC_END(vmx_do_nmi_irqoff) @@ -358,5 +392,51 @@ SYM_FUNC_END(vmread_error_trampoline) #endif SYM_FUNC_START(vmx_do_interrupt_irqoff) - VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 + VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 fred=0 SYM_FUNC_END(vmx_do_interrupt_irqoff) + +#ifdef CONFIG_X86_64 +SYM_FUNC_START(vmx_do_interrupt_trampoline) + push $0 /* FRED error code, 0 for NMI and external interrupts */ + push %rdi + push %rsi + push %rdx + push %rcx + push %rax + push %r8 + push %r9 + push %r10 + push %r11 + push %rbx + push %rbp + push %r12 + push %r13 + push %r14 + push %r15 + + movq %rsp, %rdi /* %rdi -> pt_regs */ + call external_interrupt + + pop %r15 + pop %r14 + pop %r13 + pop %r12 + pop %rbp + pop %rbx + pop %r11 + pop %r10 + pop %r9 + pop %r8 + pop %rax + pop %rcx + pop %rdx + pop %rsi + pop %rdi + addq $8,%rsp /* Drop FRED error code */ + RET +SYM_FUNC_END(vmx_do_interrupt_trampoline) + +SYM_FUNC_START(vmx_do_fred_interrupt_irqoff) + VMX_DO_EVENT_IRQOFF call vmx_do_interrupt_trampoline +SYM_FUNC_END(vmx_do_fred_interrupt_irqoff) +#endif diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d2d6e1b6c788..5addfee5cc6d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6875,6 +6875,7 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu) } void vmx_do_interrupt_irqoff(unsigned long entry); +void vmx_do_fred_interrupt_irqoff(unsigned int vector); void vmx_do_nmi_irqoff(void); static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) @@ -6923,7 +6924,12 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) return; kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); - vmx_do_interrupt_irqoff(gate_offset(desc)); +#ifdef CONFIG_X86_64 + if (cpu_feature_enabled(X86_FEATURE_FRED)) + vmx_do_fred_interrupt_irqoff(vector); + else +#endif + vmx_do_interrupt_irqoff(gate_offset(desc)); kvm_after_interrupt(vcpu); vcpu->arch.at_instruction_boundary = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3ebeaab34b2e..4f12ead2266b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7229,7 +7229,16 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && is_nmi(vmx_get_intr_info(vcpu))) { kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); - vmx_do_nmi_irqoff(); + /* + * Like IRET for IDT, ERETS/ERETU are required to end the NMI + * handler for FRED to unblock NMI ASAP (w/ bit 28 of CS set). + * + * Invoke the FRED NMI handler through executing "int $2". + */ + if (cpu_feature_enabled(X86_FEATURE_FRED)) + asm volatile("int $2"); + else + vmx_do_nmi_irqoff(); kvm_after_interrupt(vcpu); }