From patchwork Mon Jul 31 06:41:30 2023
X-Patchwork-Submitter: "Li, Xin3"
X-Patchwork-Id: 128474
From: Xin Li
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org,
    kvm@vger.kernel.org, xen-devel@lists.xenproject.org
Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, x86@kernel.org, "H . Peter Anvin", Andy Lutomirski,
    Oleg Nesterov, Tony Luck, "K . Y . Srinivasan", Haiyang Zhang, Wei Liu,
    Dexuan Cui, Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov,
    Sean Christopherson, Peter Zijlstra, Juergen Gross, Stefano Stabellini,
    Oleksandr Tyshchenko, Josh Poimboeuf, "Paul E . McKenney",
    Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips, Xin Li,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Liam R . Howlett",
    Sebastian Reichel, "Kirill A . Shutemov", Suren Baghdasaryan,
    Pawan Gupta, Jiaxi Chen, Babu Moger, Jim Mattson, Sandipan Das,
    Lai Jiangshan, Hans de Goede, Reinette Chatre, Daniel Sneddon,
    Breno Leitao, Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
    Alexander Potapenko, Andrew Morton, Arnd Bergmann, "Eric W . Biederman",
    Kees Cook, Masami Hiramatsu, Masahiro Yamada, Ze Gao, Fei Li, Conghui,
    Ashok Raj, "Jason A . Donenfeld", Mark Rutland, Jacob Pan, Jiapeng Chong,
    Jane Malalane, David Woodhouse, Boris Ostrovsky,
    Arnaldo Carvalho de Melo, Yantengsi, Christophe Leroy, Sathvika Vasireddy
Subject: [PATCH v9 33/36] KVM: VMX: Add VMX_DO_FRED_EVENT_IRQOFF for IRQ/NMI handling
Date: Sun, 30 Jul 2023 23:41:30 -0700
Message-Id: <20230731064133.3881-4-xin3.li@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>
References: <20230731064133.3881-1-xin3.li@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Compared to an IDT stack frame, a FRED stack frame has an extra 16 bytes of
information pushed at the regular stack top, and 8 bytes of error code are
_always_ pushed at the regular stack bottom. Add VMX_DO_FRED_EVENT_IRQOFF to
generate FRED stack frames with the event type and vector properly set, so
that IRQs and NMIs can be handled with the existing approach when FRED is
enabled.

For IRQ handling, the general purpose registers are pushed to the stack to
form a pt_regs structure, which is then passed to external_interrupt(). As a
result, IRQ handling no longer re-enters the noinstr code.

Tested-by: Shan Kang
Signed-off-by: Xin Li
---

Changes since v8:
* Add a new macro VMX_DO_FRED_EVENT_IRQOFF for FRED instead of refactoring
  VMX_DO_EVENT_IRQOFF (Sean Christopherson).
* Do NOT use a trampoline, just LEA+PUSH the return RIP, PUSH the error
  code, and jump to the FRED kernel entry point for NMI or call
  external_interrupt() for IRQs (Sean Christopherson).
* Call external_interrupt() only when FRED is enabled, and convert the
  non-FRED handling to external_interrupt() after FRED lands (Sean
  Christopherson).
---
 arch/x86/kvm/vmx/vmenter.S | 88 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 19 ++++++--
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 07e927d4d099..5ee6a57b59a5 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -2,12 +2,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include "kvm-asm-offsets.h"
 #include "run_flags.h"
+#include "../../entry/calling.h"
 
 #define WORD_SIZE (BITS_PER_LONG / 8)
 
@@ -31,6 +33,80 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif
 
+#ifdef CONFIG_X86_FRED
+.macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0
+	/*
+	 * Unconditionally create a stack frame, getting the correct RSP on the
+	 * stack (for x86-64) would take two instructions anyways, and RBP can
+	 * be used to restore RSP to make objtool happy (see below).
+	 */
+	push %_ASM_BP
+	mov %_ASM_SP, %_ASM_BP
+
+	/*
+	 * Don't check the FRED stack level, the call stack leading to this
+	 * helper is effectively constant and shallow (relatively speaking).
+	 *
+	 * Emulate the FRED-defined redzone and stack alignment.
+	 */
+	sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
+	and $FRED_STACK_FRAME_RSP_MASK, %rsp
+
+	/*
+	 * A FRED stack frame has extra 16 bytes of information pushed at the
+	 * regular stack top compared to an IDT stack frame.
+	 */
+	push $0		/* Reserved by FRED, must be 0 */
+	push $0		/* FRED event data, 0 for NMI and external interrupts */
+
+	shl $32, %rdi				/* FRED event type and vector */
+	.if \nmi
+	bts $FRED_SSX_NMI_BIT, %rdi		/* Set the NMI bit */
+	.endif
+	bts $FRED_SSX_64_BIT_MODE_BIT, %rdi	/* Set the 64-bit mode */
+	or $__KERNEL_DS, %rdi
+	push %rdi
+	push %rbp
+	pushf
+	mov $__KERNEL_CS, %rax
+	push %rax
+
+	/*
+	 * Unlike the IDT event delivery, FRED _always_ pushes an error code
+	 * after pushing the return RIP, thus the CALL instruction CANNOT be
+	 * used here to push the return RIP, otherwise there is no chance to
+	 * push an error code before invoking the IRQ/NMI handler.
+	 *
+	 * Use LEA to get the return RIP and push it, then push an error code.
+	 */
+	lea 1f(%rip), %rax
+	push %rax
+	push $0		/* FRED error code, 0 for NMI and external interrupts */
+
+	.if \nmi == 0
+	PUSH_REGS
+	mov %rsp, %rdi
+	.endif
+
+	\branch_insn \branch_target
+
+	.if \nmi == 0
+	POP_REGS
+	.endif
+
+1:
+	/*
+	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
+	 * the correct value.  objtool doesn't know the callee will IRET and,
+	 * without the explicit restore, thinks the stack is getting walloped.
+	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
+	 */
+	mov %_ASM_BP, %_ASM_SP
+	pop %_ASM_BP
+	RET
+.endm
+#endif
+
 .macro VMX_DO_EVENT_IRQOFF call_insn call_target
 	/*
 	 * Unconditionally create a stack frame, getting the correct RSP on the
@@ -299,6 +375,12 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
 
 SYM_FUNC_END(__vmx_vcpu_run)
 
+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_nmi_irqoff)
+	VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1
+SYM_FUNC_END(vmx_do_fred_nmi_irqoff)
+#endif
+
 SYM_FUNC_START(vmx_do_nmi_irqoff)
 	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
 SYM_FUNC_END(vmx_do_nmi_irqoff)
@@ -357,6 +439,12 @@ SYM_FUNC_START(vmread_error_trampoline)
 SYM_FUNC_END(vmread_error_trampoline)
 #endif
 
+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
+	VMX_DO_FRED_EVENT_IRQOFF call external_interrupt
+SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
+#endif
+
 SYM_FUNC_START(vmx_do_interrupt_irqoff)
 	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
 SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..4e90c69a92bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
 }
 
+#ifdef CONFIG_X86_FRED
+void vmx_do_fred_interrupt_irqoff(unsigned int vector);
+void vmx_do_fred_nmi_irqoff(unsigned int vector);
+#else
+#define vmx_do_fred_interrupt_irqoff(x) BUG()
+#define vmx_do_fred_nmi_irqoff(x) BUG()
+#endif
+
 void vmx_do_interrupt_irqoff(unsigned long entry);
 void vmx_do_nmi_irqoff(void);
 
@@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
 	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
 	    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
-	vmx_do_interrupt_irqoff(gate_offset(desc));
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */
+	else
+		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
 	kvm_after_interrupt(vcpu);
 
 	vcpu->arch.at_instruction_boundary = true;
@@ -7225,7 +7235,10 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
+		if (cpu_feature_enabled(X86_FEATURE_FRED))
+			vmx_do_fred_nmi_irqoff((EVENT_TYPE_NMI << 16) | NMI_VECTOR);
+		else
+			vmx_do_nmi_irqoff();
 		kvm_after_interrupt(vcpu);
 	}
 
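
Note for readers: the macro builds the SS slot of the synthetic FRED frame
from the single argument in %rdi, and that step is easier to follow in C.
The sketch below models only that computation. It is not part of the patch.
Every SKETCH_* constant and the function name sketch_fred_ss_slot() are
hypothetical stand-ins. Their values are assumptions about what
__KERNEL_DS, FRED_SSX_NMI_BIT, FRED_SSX_64_BIT_MODE_BIT, EVENT_TYPE_NMI and
NMI_VECTOR expand to, and should be checked against the headers in the
series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative values only. They are assumptions for this sketch and are
 * not taken from the patch above.
 */
#define SKETCH_KERNEL_DS		0x18u	/* assumed __KERNEL_DS */
#define SKETCH_SSX_NMI_BIT		18	/* assumed FRED_SSX_NMI_BIT */
#define SKETCH_SSX_64_BIT_MODE_BIT	57	/* assumed FRED_SSX_64_BIT_MODE_BIT */
#define SKETCH_EVENT_TYPE_NMI		2u	/* assumed EVENT_TYPE_NMI */
#define SKETCH_NMI_VECTOR		2u	/* assumed NMI_VECTOR */

/*
 * Model of the value VMX_DO_FRED_EVENT_IRQOFF pushes into the SS slot.
 * type_and_vector is (event type << 16) | vector, the encoding vmx.c
 * passes as the first argument.
 */
static uint64_t sketch_fred_ss_slot(uint32_t type_and_vector, int nmi)
{
	uint64_t ss;

	/* The vector occupies the low 8 bits, the type starts at bit 16. */
	assert((type_and_vector & 0xff00u) == 0);

	ss = (uint64_t)type_and_vector << 32;		/* shl $32, %rdi */
	if (nmi)
		ss |= 1ULL << SKETCH_SSX_NMI_BIT;	/* bts $FRED_SSX_NMI_BIT, %rdi */
	ss |= 1ULL << SKETCH_SSX_64_BIT_MODE_BIT;	/* bts $FRED_SSX_64_BIT_MODE_BIT, %rdi */
	ss |= SKETCH_KERNEL_DS;				/* or $__KERNEL_DS, %rdi */
	return ss;
}
```

Under these assumed values, an external interrupt on vector 0x20 (event
type 0) yields 0x0200002000000018, and the NMI call made from
vmx_vcpu_enter_exit() yields 0x0202000200040018. The shift by 32 is what
lets the caller pass the type and vector packed into one 32-bit argument.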