[v5,34/34] KVM: x86/vmx: execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled

Message ID 20230307023946.14516-35-xin3.li@intel.com
State New
Headers
Series x86: enable FRED for x86-64 |

Commit Message

Li, Xin3 March 7, 2023, 2:39 a.m. UTC
  Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled.

Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED
to unblock NMI ASAP (w/ bit 28 of CS set). And there are 2 approaches to
invoke the FRED NMI handler:
1) execute "int $2", let the h/w do the job.
2) create a FRED NMI stack frame on the current kernel stack with ASM,
   and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S.

1) is preferred as we want less ASM.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v4:
*) Do NOT use the term "injection", which in the KVM context means to
   reinject an event into the guest (Sean Christopherson).
*) Add the explanation of why to execute "int $2" to invoke the NMI handler
   in NMI caused VM exits (Sean Christopherson).
---
 arch/x86/kvm/vmx/vmx.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
  

Comments

Li, Xin3 March 7, 2023, 10 p.m. UTC | #1
> Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled.
> 
> Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED
> to unblock NMI ASAP (w/ bit 28 of CS set). And there are 2 approaches to
> invoke the FRED NMI handler:
> 1) execute "int $2", let the h/w do the job.
> 2) create a FRED NMI stack frame on the current kernel stack with ASM,
>    and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S.
> 
> 1) is preferred as we want less ASM.
> 
> Tested-by: Shan Kang <shan.kang@intel.com>
> Signed-off-by: Xin Li <xin3.li@intel.com>
> ---
> 
> Changes since v4:
> *) Do NOT use the term "injection", which in the KVM context means to
>    reinject an event into the guest (Sean Christopherson).
> *) Add the explanation of why to execute "int $2" to invoke the NMI handler
>    in NMI caused VM exits (Sean Christopherson).

Sean,

Do you have any further issue with the last 2 VMX patches?

If not, would you ack them?

Thanks!
  Xin


> ---
>  arch/x86/kvm/vmx/vmx.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3ebeaab34b2e..4f12ead2266b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7229,7 +7229,16 @@ static noinstr void vmx_vcpu_enter_exit(struct
> kvm_vcpu *vcpu,
>  	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
>  	    is_nmi(vmx_get_intr_info(vcpu))) {
>  		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> -		vmx_do_nmi_irqoff();
> +		/*
> +		 * Like IRET for IDT, ERETS/ERETU are required to end the NMI
> +		 * handler for FRED to unblock NMI ASAP (w/ bit 28 of CS set).
> +		 *
> +		 * Invoke the FRED NMI handler through executing "int $2".
> +		 */
> +		if (cpu_feature_enabled(X86_FEATURE_FRED))
> +			asm volatile("int $2");
> +		else
> +			vmx_do_nmi_irqoff();
>  		kvm_after_interrupt(vcpu);
>  	}
> 
> --
> 2.34.1
  
Sean Christopherson March 22, 2023, 5:49 p.m. UTC | #2
On Mon, Mar 06, 2023, Xin Li wrote:
> Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled.
> 
> Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED
> to unblock NMI ASAP (w/ bit 28 of CS set).

That's "CS" on the stack correct?  Is bit 28 set manually by software, or is it
set automatically by hardware?  If it's set by hardware, does "int $2" actually
set the bit since it's not a real NMI?

> And there are 2 approaches to
> invoke the FRED NMI handler:
> 1) execute "int $2", let the h/w do the job.
> 2) create a FRED NMI stack frame on the current kernel stack with ASM,
>    and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S.
> 
> 1) is preferred as we want less ASM.

Who is "we", and how much assembly are we talking about?  E.g. I personally don't
mind a trampoline in KVM if it's small and/or can share code with existing
assembly subroutines.
  
Sean Christopherson March 22, 2023, 11:42 p.m. UTC | #3
On Wed, Mar 22, 2023, andrew.cooper3@citrix.com wrote:
> On 22/03/2023 5:49 pm, Sean Christopherson wrote:
> > On Mon, Mar 06, 2023, Xin Li wrote:
> >> Execute "int $2" to handle NMI in NMI caused VM exits when FRED is enabled.
> >>
> >> Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for FRED
> >> to unblock NMI ASAP (w/ bit 28 of CS set).
> > That's "CS" on the stack correct?  Is bit 28 set manually by software, or is it
> > set automatically by hardware?  If it's set by hardware, does "int $2" actually
> > set the bit since it's not a real NMI?
> 
> int $2 had better not set it...� This is the piece of state that is
> intended to cause everything which isn't a real NMI to nest properly
> inside a real NMI.
> 
> It is supposed to be set on delivery of an NMI, and act as the trigger
> for ERET{U,S} to drop the latch.
> 
> Software is can set it manually in a FRED-frame in order to explicitly
> unblock NMIs.

Ah, found this in patch 19.  That hunk really belongs in this patch, because this
patch is full of magic without that information.

+       /*
+        * VM exits induced by NMIs keep NMI blocked, and we do
+        * "int $2" to reinject the NMI w/ NMI kept being blocked.
+        * However "int $2" doesn't set the nmi bit in the FRED
+        * stack frame, so we explicitly set it to make sure a
+        * later ERETS will unblock NMI immediately.
+        */
+       regs->nmi = 1;

Organization aside, this seems to defeat the purpose of _not_ unconditionally
unmasking NMIs on ERET since the kernel assumes any random "int $2" is coming from
KVM after an NMI VM-Exit.

Eww, and "int $2" doesn't even go directly to fred_exc_nmi(), it trampolines
through fred_sw_interrupt_kernel() first.  Looks like "int $2" from userspace gets
routed to a #GP, so at least that bit is handled.

I'm not dead set against the proposed approach, but IMO it's not obviously better
than a bit of assembly to have a more direct call into the NMI handler.
  
Li, Xin3 March 22, 2023, 11:43 p.m. UTC | #4
> > Like IRET for IDT, ERETS/ERETU are required to end the NMI handler for
> > FRED to unblock NMI ASAP (w/ bit 28 of CS set).
> 
> That's "CS" on the stack correct?  Is bit 28 set manually by software, or is it set
> automatically by hardware?  If it's set by hardware, does "int $2" actually set the
> bit since it's not a real NMI?

Right, It's the "CS" on the stack. The bit 28 is set by the FRED NMI handler:
https://lore.kernel.org/lkml/20230307023946.14516-20-xin3.li@intel.com/

Upon a NMI delivery, the NMI bit is always set by H/W. However, "int $2" does
NOT set it, thus we need to explicitly set it.
 
> > And there are 2 approaches to
> > invoke the FRED NMI handler:
> > 1) execute "int $2", let the h/w do the job.
> > 2) create a FRED NMI stack frame on the current kernel stack with ASM,
> >    and then jump to fred_entrypoint_kernel in arch/x86/entry/entry_64_fred.S.
> >
> > 1) is preferred as we want less ASM.
> 
> Who is "we", and how much assembly are we talking about?  E.g. I personally don't
> mind a trampoline in KVM if it's small and/or can share code with existing assembly
> subroutines.

I ever got such a comment:
https://lore.kernel.org/lkml/8735bpbhat.ffs@tglx/

However, if ASM is also okay, I can work on it.  And I don't think the ASM code
will be big.
  
Li, Xin3 March 23, 2023, 12:26 a.m. UTC | #5
> Organization aside, this seems to defeat the purpose of _not_ unconditionally
> unmasking NMIs on ERET since the kernel assumes any random "int $2" is coming
> from KVM after an NMI VM-Exit.

I'm a bit confused.  KVM VMX is the only component needing to execute "int $2"
and it surely has NMI blocked after an NMI VM-exit.

> Eww, and "int $2" doesn't even go directly to fred_exc_nmi(), it trampolines
> through fred_sw_interrupt_kernel() first.  Looks like "int $2" from userspace gets
> routed to a #GP, so at least that bit is handled.

FRED does a 2-level dispatch, unless an event handler is on a hot path,
we don't promote its handling.  NMI seems not a frequent event.

> I'm not dead set against the proposed approach, but IMO it's not obviously better
> than a bit of assembly to have a more direct call into the NMI handler.

I will give it a shot.
  
Li, Xin3 March 24, 2023, 5:45 p.m. UTC | #6
> > I'm not dead set against the proposed approach, but IMO it's not
> > obviously better than a bit of assembly to have a more direct call into the NMI
> handler.
> 
> I will give it a shot.

Hi Sean,

I got a working patch, before I resend the whole FRED patch set again, can
you please check if this is what you're expecting?

When FRED is enabled, the x86 CPU always pushes an error code on the stack
immediately after the return instruction address is pushed. To generate such
a stack frame, call a trampoline function first to push the return instruction
address on the stack, and the trampoline function then pushes an error code
(0 for IRQ/NMI) and jump to fred_entrypoint_kernel.

I could have vmx_do_interrupt_trampoline jump to fred_entrypoint_kernel
Instead of calling external_interrupt(), but that would reenter the noinstr
text again (not a big problem but seems not preferred by Peter Z).

Thanks!
  Xin

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 631fd7da2bc3..6682b5bd202b 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -31,7 +31,7 @@
 #define VCPU_R15       __VCPU_REGS_R15 * WORD_SIZE
 #endif

-.macro VMX_DO_EVENT_IRQOFF call_insn call_target
+.macro VMX_DO_EVENT_IRQOFF call_insn call_target fred=1 nmi=0
        /*
         * Unconditionally create a stack frame, getting the correct RSP on the
         * stack (for x86-64) would take two instructions anyways, and RBP can
@@ -46,11 +46,34 @@
         * creating the synthetic interrupt stack frame for the IRQ/NMI.
         */
        and  $-16, %rsp
+
+       .if \fred
+       push $0         /* Reserved by FRED, must be 0 */
+       push $0         /* FRED event data, 0 for NMI and external interrupts */
+
+       .if \nmi
+       mov $(2 << 32 | 2 << 48), %_ASM_AX      /* NMI event type and vector */
+       .else
+       mov %_ASM_ARG1, %_ASM_AX
+       shl $32, %_ASM_AX                       /* external interrupt vector */
+       .endif
+       add $__KERNEL_DS, %_ASM_AX
+       bts $57, %_ASM_AX                       /* bit 57: 64-bit mode */
+       push %_ASM_AX
+       .else
        push $__KERNEL_DS
+       .endif
+
        push %rbp
 #endif
        pushf
+       .if \nmi
+       mov $__KERNEL_CS, %_ASM_AX
+       bts $28, %_ASM_AX                       /* set the NMI bit */
+       push %_ASM_AX
+       .else
        push $__KERNEL_CS
+       .endif
        \call_insn \call_target

        /*
@@ -299,8 +322,19 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)

 SYM_FUNC_END(__vmx_vcpu_run)

+SYM_FUNC_START(vmx_do_nmi_trampoline)
+#ifdef CONFIG_X86_FRED
+       ALTERNATIVE "jmp .Lno_errorcode_push", "", X86_FEATURE_FRED
+       push $0         /* FRED error code, 0 for NMI */
+       jmp fred_entrypoint_kernel
+#endif
+
+.Lno_errorcode_push:
+       jmp asm_exc_nmi_kvm_vmx
+SYM_FUNC_END(vmx_do_nmi_trampoline)
+
 SYM_FUNC_START(vmx_do_nmi_irqoff)
-       VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
+       VMX_DO_EVENT_IRQOFF call vmx_do_nmi_trampoline nmi=1
 SYM_FUNC_END(vmx_do_nmi_irqoff)


@@ -358,5 +392,51 @@ SYM_FUNC_END(vmread_error_trampoline)
 #endif

 SYM_FUNC_START(vmx_do_interrupt_irqoff)
-       VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
+       VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 fred=0
 SYM_FUNC_END(vmx_do_interrupt_irqoff)
+
+#ifdef CONFIG_X86_64
+SYM_FUNC_START(vmx_do_interrupt_trampoline)
+       push $0 /* FRED error code, 0 for NMI and external interrupts */
+       push %rdi
+       push %rsi
+       push %rdx
+       push %rcx
+       push %rax
+       push %r8
+       push %r9
+       push %r10
+       push %r11
+       push %rbx
+       push %rbp
+       push %r12
+       push %r13
+       push %r14
+       push %r15
+
+       movq    %rsp, %rdi      /* %rdi -> pt_regs */
+       call external_interrupt
+
+       pop %r15
+       pop %r14
+       pop %r13
+       pop %r12
+       pop %rbp
+       pop %rbx
+       pop %r11
+       pop %r10
+       pop %r9
+       pop %r8
+       pop %rax
+       pop %rcx
+       pop %rdx
+       pop %rsi
+       pop %rdi
+       addq $8,%rsp            /* Drop FRED error code */
+       RET
+SYM_FUNC_END(vmx_do_interrupt_trampoline)
+
+SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
+       VMX_DO_EVENT_IRQOFF call vmx_do_interrupt_trampoline
+SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
+#endif
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d2d6e1b6c788..5addfee5cc6d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6875,6 +6875,7 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 }

 void vmx_do_interrupt_irqoff(unsigned long entry);
+void vmx_do_fred_interrupt_irqoff(unsigned int vector);
 void vmx_do_nmi_irqoff(void);

 static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
@@ -6923,7 +6924,12 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
                return;

        kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
-       vmx_do_interrupt_irqoff(gate_offset(desc));
+#ifdef CONFIG_X86_64
+       if (cpu_feature_enabled(X86_FEATURE_FRED))
+               vmx_do_fred_interrupt_irqoff(vector);
+       else
+#endif
+               vmx_do_interrupt_irqoff(gate_offset(desc));
        kvm_after_interrupt(vcpu);

        vcpu->arch.at_instruction_boundary = true;
  

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3ebeaab34b2e..4f12ead2266b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7229,7 +7229,16 @@  static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
+		/*
+		 * Like IRET for IDT, ERETS/ERETU are required to end the NMI
+		 * handler for FRED to unblock NMI ASAP (w/ bit 28 of CS set).
+		 *
+		 * Invoke the FRED NMI handler through executing "int $2".
+		 */
+		if (cpu_feature_enabled(X86_FEATURE_FRED))
+			asm volatile("int $2");
+		else
+			vmx_do_nmi_irqoff();
 		kvm_after_interrupt(vcpu);
 	}