From patchwork Tue Aug 1 08:35:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xin3" X-Patchwork-Id: 129117 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp2560094vqg; Tue, 1 Aug 2023 03:02:20 -0700 (PDT) X-Google-Smtp-Source: APBJJlEXSt7+lXPxpjwMzcmnEbNr7t+fOSU+Z/MxsKs3iplJwHTrkL1gUztt/rcltm9krWZRS1D8 X-Received: by 2002:adf:e883:0:b0:317:74bb:82ff with SMTP id d3-20020adfe883000000b0031774bb82ffmr1936339wrm.7.1690884140273; Tue, 01 Aug 2023 03:02:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690884140; cv=none; d=google.com; s=arc-20160816; b=hyccxWTLyy2IarTnPBskaXYarGaa+4h1IEaqHcY7WqSfy+st5reLsczBwn+cZDAdGB ME3OwyA492MFA80MuWcPxSpbtfFAHDrQBgv7nWilqfXX9OGhFFCjoNbI+DiHYe+e6Oos LbA9zF5paSTBEBXflx9ypvwCFCe0vvQ6QW4Jy982M/z81uphW9eVMyUCgM/DtN2zZ9u6 Ki6pcz11q2EoGWqBetJx3hmJg59ejCic3Yna4uCg28Q004WtC1ZWeN4Gn5oqm6RY80A6 FKeD2dKHRQtPEt2Pl5/GwaQ/c7MWSUy8dY/G4XJa4gFRfrWx0rIgwfZyMdmFpVuOMRhD VoBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=HaqQzrHGuvrfMIFHi31s+cKRPkgS17G46gh7wP/aVms=; fh=nYO9ukkM7JnAAJZcChGC8UWBlCLXeJxIDT82cxoIaf0=; b=JbxT+jGJvVB5zFZvmM5r/azh2XddkOzD5KNDGdJgyjp7OiMQsOAUEDtdLBfpkZ+KJN rHmBhuGoHcVJbGkRbpjhx3hW+u5STog7NbLNmI6y1/py7P3P0hm7H2KPkoh+sIYzPowc NiZhMs+tWZESvyOKa/YEb2hHwbhonXODXRqP5SKEsJbAYXe2zUUxMDwSh/kWRVHVzMNd p8cn0uqqLEV23wf5imSWbYQ9Dd7JhojcZEw/u8FwH3Duf+JUiHlyaChlN2Z8CZ6dnb6m fj2WAkAr6C4GYFSqT/CzjjwVBOcxrDTVQfrJSMU0dU1/L2u5iLSypKO/ZrniU8eJBsIy BuRQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Q9vtqNUg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b17-20020aa7df91000000b005225522e9a0si8641786edy.597.2023.08.01.03.01.56; Tue, 01 Aug 2023 03:02:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Q9vtqNUg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232574AbjHAJJD (ORCPT + 99 others); Tue, 1 Aug 2023 05:09:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232997AbjHAJI3 (ORCPT ); Tue, 1 Aug 2023 05:08:29 -0400 Received: from mgamail.intel.com (unknown [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F0BD93AB1; Tue, 1 Aug 2023 02:06:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1690880776; x=1722416776; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nRz3qqnddKXYmfF22m/o97+CO0o46xEgUK6odjAvUmY=; b=Q9vtqNUgrv91xetqbj8sCYUHU6Kolb1g7i+76rJoKKCfFDhzH3slwBJ8 Oa6xQ/to9QDce3Qd8xZ2jJsWsZvHvEVZVZ5y5FQfl0FBGogKtWaQUGy7p Ja9VXx0EpJ1HROBuLls3BR133+x5g7N5LDNG/KNxJ1g0OPwGrAAarPpzn GvxOarbQsVzXDFb12qLJPph1wMKu0g/ZMXMEHDU/rbndcjn60/WHGnifd qNbtaxN4NNVHxztnFkE7PSvfYxP2OoCcd0Sbs0wO7K4qyuh5wbeVJDmwe NZzVt1WQrRVPAdW3riFcXnus19onqebtzz3EJ5kxaTLrtBIToJqNj6XRo g==; X-IronPort-AV: E=McAfee;i="6600,9927,10788"; a="366713683" X-IronPort-AV: E=Sophos;i="6.01,246,1684825200"; d="scan'208";a="366713683" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2023 02:04:38 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10788"; a="722420749" X-IronPort-AV: E=Sophos;i="6.01,246,1684825200"; d="scan'208";a="722420749" Received: from unknown (HELO fred..) ([172.25.112.68]) by orsmga007.jf.intel.com with ESMTP; 01 Aug 2023 02:04:36 -0700 From: Xin Li To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org Cc: Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Tony Luck , "K . Y . Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Sean Christopherson , Peter Zijlstra , Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , Josh Poimboeuf , "Paul E . McKenney" , Catalin Marinas , Randy Dunlap , Steven Rostedt , Kim Phillips , Xin Li , Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Liam R . Howlett" , Sebastian Reichel , "Kirill A . Shutemov" , Suren Baghdasaryan , Pawan Gupta , Babu Moger , Jim Mattson , Sandipan Das , Lai Jiangshan , Hans de Goede , Reinette Chatre , Daniel Sneddon , Breno Leitao , Nikunj A Dadhania , Brian Gerst , Sami Tolvanen , Alexander Potapenko , Andrew Morton , Arnd Bergmann , "Eric W . Biederman" , Kees Cook , Masami Hiramatsu , Masahiro Yamada , Ze Gao , Fei Li , Conghui , Ashok Raj , "Jason A . Donenfeld" , Mark Rutland , Jacob Pan , Jiapeng Chong , Jane Malalane , David Woodhouse , Boris Ostrovsky , Arnaldo Carvalho de Melo , Yantengsi , Christophe Leroy , Sathvika Vasireddy Subject: [PATCH RESEND v9 33/36] KVM: VMX: Add VMX_DO_FRED_EVENT_IRQOFF for IRQ/NMI handling Date: Tue, 1 Aug 2023 01:35:50 -0700 Message-Id: <20230801083553.8468-7-xin3.li@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230801083553.8468-1-xin3.li@intel.com> References: <20230801083553.8468-1-xin3.li@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773020528325343269 X-GMAIL-MSGID: 1773020528325343269 Compared to an IDT stack frame, a FRED stack frame has extra 16 bytes of information pushed at the regular stack top and 8 bytes of error code _always_ pushed at the regular stack bottom, add VMX_DO_FRED_EVENT_IRQOFF to generate FRED stack frames with event type and vector properly set. Thus, IRQ/NMI can be handled with the existing approach when FRED is enabled. For IRQ handling, general purpose registers are pushed to the stack to form a pt_regs structure, which is then used to call external_interrupt(). As a result, IRQ handling no longer re-enters the noinstr code. Tested-by: Shan Kang Signed-off-by: Xin Li --- Changes since v8: * Add a new macro VMX_DO_FRED_EVENT_IRQOFF for FRED instead of refactoring VMX_DO_EVENT_IRQOFF (Sean Christopherson). * Do NOT use a trampoline, just LEA+PUSH the return RIP, PUSH the error code, and jump to the FRED kernel entry point for NMI or call external_interrupt() for IRQs (Sean Christopherson). * Call external_interrupt() only when FRED is enabled, and convert the non-FRED handling to external_interrupt() after FRED lands (Sean Christopherson). --- arch/x86/kvm/vmx/vmenter.S | 88 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 19 ++++++-- 2 files changed, 104 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 07e927d4d099..5ee6a57b59a5 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -2,12 +2,14 @@ #include #include #include +#include #include #include #include #include #include "kvm-asm-offsets.h" #include "run_flags.h" +#include "../../entry/calling.h" #define WORD_SIZE (BITS_PER_LONG / 8) @@ -31,6 +33,80 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif +#ifdef CONFIG_X86_FRED +.macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0 + /* + * Unconditionally create a stack frame, getting the correct RSP on the + * stack (for x86-64) would take two instructions anyways, and RBP can + * be used to restore RSP to make objtool happy (see below). + */ + push %_ASM_BP + mov %_ASM_SP, %_ASM_BP + + /* + * Don't check the FRED stack level, the call stack leading to this + * helper is effectively constant and shallow (relatively speaking). + * + * Emulate the FRED-defined redzone and stack alignment. + */ + sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp + and $FRED_STACK_FRAME_RSP_MASK, %rsp + + /* + * A FRED stack frame has extra 16 bytes of information pushed at the + * regular stack top compared to an IDT stack frame. + */ + push $0 /* Reserved by FRED, must be 0 */ + push $0 /* FRED event data, 0 for NMI and external interrupts */ + + shl $32, %rdi /* FRED event type and vector */ + .if \nmi + bts $FRED_SSX_NMI_BIT, %rdi /* Set the NMI bit */ + .endif + bts $FRED_SSX_64_BIT_MODE_BIT, %rdi /* Set the 64-bit mode */ + or $__KERNEL_DS, %rdi + push %rdi + push %rbp + pushf + mov $__KERNEL_CS, %rax + push %rax + + /* + * Unlike the IDT event delivery, FRED _always_ pushes an error code + * after pushing the return RIP, thus the CALL instruction CANNOT be + * used here to push the return RIP, otherwise there is no chance to + * push an error code before invoking the IRQ/NMI handler. + * + * Use LEA to get the return RIP and push it, then push an error code. + */ + lea 1f(%rip), %rax + push %rax + push $0 /* FRED error code, 0 for NMI and external interrupts */ + + .if \nmi == 0 + PUSH_REGS + mov %rsp, %rdi + .endif + + \branch_insn \branch_target + + .if \nmi == 0 + POP_REGS + .endif + +1: + /* + * "Restore" RSP from RBP, even though IRET has already unwound RSP to + * the correct value. objtool doesn't know the callee will IRET and, + * without the explicit restore, thinks the stack is getting walloped. + * Using an unwind hint is problematic due to x86-64's dynamic alignment. + */ + mov %_ASM_BP, %_ASM_SP + pop %_ASM_BP + RET +.endm +#endif + .macro VMX_DO_EVENT_IRQOFF call_insn call_target /* * Unconditionally create a stack frame, getting the correct RSP on the @@ -299,6 +375,12 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL) SYM_FUNC_END(__vmx_vcpu_run) +#ifdef CONFIG_X86_FRED +SYM_FUNC_START(vmx_do_fred_nmi_irqoff) + VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1 +SYM_FUNC_END(vmx_do_fred_nmi_irqoff) +#endif + SYM_FUNC_START(vmx_do_nmi_irqoff) VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx SYM_FUNC_END(vmx_do_nmi_irqoff) @@ -357,6 +439,12 @@ SYM_FUNC_START(vmread_error_trampoline) SYM_FUNC_END(vmread_error_trampoline) #endif +#ifdef CONFIG_X86_FRED +SYM_FUNC_START(vmx_do_fred_interrupt_irqoff) + VMX_DO_FRED_EVENT_IRQOFF call external_interrupt +SYM_FUNC_END(vmx_do_fred_interrupt_irqoff) +#endif + SYM_FUNC_START(vmx_do_interrupt_irqoff) VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 SYM_FUNC_END(vmx_do_interrupt_irqoff) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0ecf4be2c6af..4e90c69a92bf 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu) memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir)); } +#ifdef CONFIG_X86_FRED +void vmx_do_fred_interrupt_irqoff(unsigned int vector); +void vmx_do_fred_nmi_irqoff(unsigned int vector); +#else +#define vmx_do_fred_interrupt_irqoff(x) BUG() +#define vmx_do_fred_nmi_irqoff(x) BUG() +#endif + void vmx_do_interrupt_irqoff(unsigned long entry); void vmx_do_nmi_irqoff(void); @@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) { u32 intr_info = vmx_get_intr_info(vcpu); unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK; - gate_desc *desc = (gate_desc *)host_idt_base + vector; if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, "unexpected VM-Exit interrupt info: 0x%x", intr_info)) return; kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); - vmx_do_interrupt_irqoff(gate_offset(desc)); + if (cpu_feature_enabled(X86_FEATURE_FRED)) + vmx_do_fred_interrupt_irqoff(vector); /* Event type is 0 */ + else + vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector)); kvm_after_interrupt(vcpu); vcpu->arch.at_instruction_boundary = true; @@ -7225,7 +7235,10 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && is_nmi(vmx_get_intr_info(vcpu))) { kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); - vmx_do_nmi_irqoff(); + if (cpu_feature_enabled(X86_FEATURE_FRED)) + vmx_do_fred_nmi_irqoff((EVENT_TYPE_NMI << 16) | NMI_VECTOR); + else + vmx_do_nmi_irqoff(); kvm_after_interrupt(vcpu); }