Message ID | 20230801083553.8468-7-xin3.li@intel.com |
---|---|
State | New |
Headers |
From: Xin Li <xin3.li@intel.com>
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH RESEND v9 33/36] KVM: VMX: Add VMX_DO_FRED_EVENT_IRQOFF for IRQ/NMI handling
Date: Tue, 1 Aug 2023 01:35:50 -0700
Message-Id: <20230801083553.8468-7-xin3.li@intel.com>
In-Reply-To: <20230801083553.8468-1-xin3.li@intel.com>
References: <20230801083553.8468-1-xin3.li@intel.com>
Series | x86: enable FRED for x86-64
Commit Message
Li, Xin3
Aug. 1, 2023, 8:35 a.m. UTC
Compared to an IDT stack frame, a FRED stack frame has an extra 16 bytes of information pushed at the regular stack top, and an 8-byte error code _always_ pushed at the regular stack bottom. Add VMX_DO_FRED_EVENT_IRQOFF to generate FRED stack frames with the event type and vector properly set, so that IRQs/NMIs can be handled with the existing approach when FRED is enabled.

For IRQ handling, general purpose registers are pushed to the stack to form a pt_regs structure, which is then used to call external_interrupt(). As a result, IRQ handling no longer re-enters the noinstr code.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Add a new macro VMX_DO_FRED_EVENT_IRQOFF for FRED instead of refactoring VMX_DO_EVENT_IRQOFF (Sean Christopherson).
* Do NOT use a trampoline, just LEA+PUSH the return RIP, PUSH the error code, and jump to the FRED kernel entry point for NMI or call external_interrupt() for IRQs (Sean Christopherson).
* Call external_interrupt() only when FRED is enabled, and convert the non-FRED handling to external_interrupt() after FRED lands (Sean Christopherson).
---
 arch/x86/kvm/vmx/vmenter.S | 88 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 19 ++++++--
 2 files changed, 104 insertions(+), 3 deletions(-)
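For readers who don't have the layout memorized, the 64-byte frame that the macro hand-builds can be sketched as a C struct. This is purely illustrative (the field names here are made up, not kernel identifiers); the offsets follow the FRED frame layout, lowest stack address first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative view of a 64-byte FRED stack frame, lowest address
 * (regular stack bottom) first.  Field names are hypothetical. */
struct fred_frame {
	uint64_t error_code;	/* always pushed; 0 for IRQ/NMI       */
	uint64_t rip;		/* return RIP                         */
	uint64_t csx;		/* CS plus auxiliary information      */
	uint64_t rflags;	/* saved RFLAGS                       */
	uint64_t rsp;		/* saved RSP                          */
	uint64_t ssx;		/* SS plus event information          */
	uint64_t event_data;	/* 0 for NMI and external interrupts  */
	uint64_t reserved;	/* reserved by FRED, must be 0        */
};
```

The extra 16 bytes at the top relative to an IDT frame are the event-data and reserved slots; the 8-byte error code at the bottom is the slot that IDT delivery pushes only for some exceptions but FRED pushes always.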
Comments
On Tue, Aug 01, 2023, Xin Li wrote:
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 07e927d4d099..5ee6a57b59a5 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -2,12 +2,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm/bitsperlong.h>
> +#include <asm/fred.h>
>  #include <asm/kvm_vcpu_regs.h>
>  #include <asm/nospec-branch.h>
>  #include <asm/percpu.h>
>  #include <asm/segment.h>
>  #include "kvm-asm-offsets.h"
>  #include "run_flags.h"
> +#include "../../entry/calling.h"

Rather than do the low level PUSH_REGS and POP_REGS, I vote to have core code
expose a FRED-specific wrapper for invoking external_interrupt().  More below.

>  #define WORD_SIZE (BITS_PER_LONG / 8)
>
> @@ -31,6 +33,80 @@
>  #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
>  #endif
>
> +#ifdef CONFIG_X86_FRED
> +.macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0

objtool isn't happy.

arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_do_fred_interrupt_irqoff+0x6c: return with modified stack frame
arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_do_fred_nmi_irqoff+0x37: sibling call from callable instruction with modified stack frame

The "return with modified stack frame" goes away with my suggested changes, but
the sibling call remains for the NMI case due to the JMP instead of a call.

> +	/*
> +	 * Unconditionally create a stack frame, getting the correct RSP on the
> +	 * stack (for x86-64) would take two instructions anyways, and RBP can
> +	 * be used to restore RSP to make objtool happy (see below).
> +	 */
> +	push %_ASM_BP
> +	mov %_ASM_SP, %_ASM_BP

The frame stuff is worth throwing in a macro, if only to avoid a copy+pasted
comment, which by the by, is wrong.  (a) it's ERETS, not IRET.  (b) the IRQ does
a vanilla RET, not ERETS.  E.g. like so:

.macro VMX_DO_EVENT_FRAME_BEGIN
	/*
	 * Unconditionally create a stack frame, getting the correct RSP on the
	 * stack (for x86-64) would take two instructions anyways, and RBP can
	 * be used to restore RSP to make objtool happy (see below).
	 */
	push %_ASM_BP
	mov %_ASM_SP, %_ASM_BP
.endm

.macro VMX_DO_EVENT_FRAME_END
	/*
	 * "Restore" RSP from RBP, even though {E,I}RET has already unwound RSP
	 * to the correct value *in most cases*.  KVM's IRQ handling with FRED
	 * doesn't do ERETS, and objtool doesn't know the callee will IRET/ERET
	 * and, without the explicit restore, thinks the stack is getting walloped.
	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
	 */
	mov %_ASM_BP, %_ASM_SP
	pop %_ASM_BP
.endm

> +
> +	/*
> +	 * Don't check the FRED stack level, the call stack leading to this
> +	 * helper is effectively constant and shallow (relatively speaking).
> +	 *
> +	 * Emulate the FRED-defined redzone and stack alignment.
> +	 */
> +	sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
> +	and $FRED_STACK_FRAME_RSP_MASK, %rsp
> +
> +	/*
> +	 * A FRED stack frame has extra 16 bytes of information pushed at the
> +	 * regular stack top compared to an IDT stack frame.

There is pretty much no chance that anyone remembers the layout of an IDT stack
frame off the top of their head.  I.e. saying "FRED has 16 bytes more" isn't all
that useful.  It also fails to capture the fact that FRED stuffs a hell of a lot
more information in those "common" 48 bytes.

It'll be hard/impossible to capture all of the overload info in a comment, but
showing the actual layout of the frame would be super helpful, e.g. something
like this

	/*
	 * FRED stack frames are always 64 bytes:
	 *
	 * ------------------------------
	 * | Bytes | Usage            |
	 * ----------------------------|
	 * | 63:56 | Reserved         |
	 * | 55:48 | Event Data       |
	 * | 47:40 | SS + Event Info  |
	 * | 39:32 | RSP              |
	 * | 31:24 | RFLAGS           |
	 * | 23:16 | CS + Aux Info    |
	 * | 15:8  | RIP              |
	 * | 7:0   | Error Code       |
	 * ------------------------------
	 */

> +	 */
> +	push $0				/* Reserved by FRED, must be 0 */
> +	push $0				/* FRED event data, 0 for NMI and external interrupts */
> +
> +	shl $32, %rdi			/* FRED event type and vector */
> +	.if \nmi
> +	bts $FRED_SSX_NMI_BIT, %rdi	/* Set the NMI bit */

The spec I have from May 2022 says the NMI bit is colocated with CS, not SS.  And
the cover letter's suggestion to use a search engine to find the spec ain't
exactly helpful, that just gives me the same "May 2022 Revision 3.0" spec.  So
either y'all have a spec that I can't find, or this is wrong.

> +	.endif
> +	bts $FRED_SSX_64_BIT_MODE_BIT, %rdi	/* Set the 64-bit mode */
> +	or $__KERNEL_DS, %rdi
> +	push %rdi
> +	push %rbp
> +	pushf
> +	mov $__KERNEL_CS, %rax
> +	push %rax
> +
> +	/*
> +	 * Unlike the IDT event delivery, FRED _always_ pushes an error code
> +	 * after pushing the return RIP, thus the CALL instruction CANNOT be
> +	 * used here to push the return RIP, otherwise there is no chance to
> +	 * push an error code before invoking the IRQ/NMI handler.
> +	 *
> +	 * Use LEA to get the return RIP and push it, then push an error code.
> +	 */
> +	lea 1f(%rip), %rax

This is quite misleading for IRQs.  It took me a while to figure out that the
only reason it's functionally ok is that external_interrupt() will do RET, not
ERETS, i.e. the RIP that's pushed here isn't used for IRQs!  Expanding the above
comment would be quite helpful, e.g.

	 *
	 * Use LEA to get the return RIP and push it, then push an error code.
	 * Note, only NMI handling does an ERETS to the target!  IRQ handling
	 * doesn't need to unmask NMIs and so simply uses CALL+RET, i.e. the
	 * RIP pushed here is only truly consumed for NMIs!

> +	push %rax
> +	push $0	/* FRED error code, 0 for NMI and external interrupts */
> +
> +	.if \nmi == 0
> +	PUSH_REGS
> +	mov %rsp, %rdi

Nit, *if* this stays in KVM, please use %_ASM_ARG1 instead of %rdi.  I normally
dislike unnecessary abstraction, but in this case using _ASM_ARG1 makes it clear
(without a comment) that this code is loading a param for a function call, *not*
for some FRED magic.

> +	.endif

Jumping way back to providing a wrapper for FRED, if we do that, then there's no
need to include calling.h, and the weird wrinkle about the ERET target kinda goes
away too.  E.g. provide this in arch/x86/entry/entry_64_fred.S

	.section .text, "ax"

/* Note, this is instrumentation safe, and returns via RET, not ERETS! */
#if IS_ENABLED(CONFIG_KVM_INTEL)
SYM_CODE_START(fred_irq_entry_from_kvm)
	FRED_ENTER
	call external_interrupt
	FRED_EXIT
	RET
SYM_CODE_END(fred_irq_entry_from_kvm)
EXPORT_SYMBOL_GPL(fred_irq_entry_from_kvm);
#endif

and then the KVM side for this particular chunk is more simply:

	lea 1f(%rip), %rax
	push %rax
	push $0	/* FRED error code, 0 for NMI and external interrupts */

	\branch_insn \branch_target
1:
	VMX_DO_EVENT_FRAME_END
	RET

Alternatively, the whole thing could be shoved into
arch/x86/entry/entry_64_fred.S, but at a glance I don't think that would be a
net positive due to the need to handle IRQs vs. NMIs.

> +	\branch_insn \branch_target
> +
> +	.if \nmi == 0
> +	POP_REGS
> +	.endif
> +
> +1:
> +	/*
> +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to

As mentioned above, this is incorrect on two fronts.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0ecf4be2c6af..4e90c69a92bf 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>  	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
>  }
>
> +#ifdef CONFIG_X86_FRED
> +void vmx_do_fred_interrupt_irqoff(unsigned int vector);
> +void vmx_do_fred_nmi_irqoff(unsigned int vector);
> +#else
> +#define vmx_do_fred_interrupt_irqoff(x) BUG()
> +#define vmx_do_fred_nmi_irqoff(x) BUG()
> +#endif

My slight preference is to open code the BUG() as a ud2 in assembly, purely to
avoid more #ifdefs.

> +
>  void vmx_do_interrupt_irqoff(unsigned long entry);
>  void vmx_do_nmi_irqoff(void);
>
> @@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>  {
>  	u32 intr_info = vmx_get_intr_info(vcpu);
>  	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> -	gate_desc *desc = (gate_desc *)host_idt_base + vector;
>
>  	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
>  		    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
>  		return;
>
>  	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
> -	vmx_do_interrupt_irqoff(gate_offset(desc));
> +	if (cpu_feature_enabled(X86_FEATURE_FRED))
> +		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */

I strongly prefer to use code to document what's going on.  E.g. the tail comment
just left me wondering, what event type is 0?  Whereas this makes it quite clear
that KVM is signaling a hardware interrupt.  The fact that it's a nop as far as
code generation goes is irrelevant.

		vmx_do_fred_interrupt_irqoff((EVENT_TYPE_HWINT << 16) | vector);

> +	else
> +		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
>  	kvm_after_interrupt(vcpu);
>
>  	vcpu->arch.at_instruction_boundary = true;

Here's a diff for (hopefully) everything I've suggested above.
---
 arch/x86/entry/entry_64_fred.S | 17 ++++++-
 arch/x86/kernel/traps.c        |  5 --
 arch/x86/kvm/vmx/vmenter.S     | 84 +++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.c         |  7 +--
 4 files changed, 55 insertions(+), 58 deletions(-)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 12063267d2ac..a973c0bd29f6 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -10,7 +10,6 @@
 #include "calling.h"

 	.code64
-	.section ".noinstr.text", "ax"

 .macro FRED_ENTER
 	UNWIND_HINT_END_OF_STACK
@@ -24,6 +23,22 @@
 	POP_REGS
 .endm

+	.section .text, "ax"
+
+/* Note, this is instrumentation safe, and returns via RET, not ERETS! */
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+SYM_CODE_START(fred_irq_entry_from_kvm)
+	FRED_ENTER
+	call external_interrupt
+	FRED_EXIT
+	RET
+SYM_CODE_END(fred_irq_entry_from_kvm)
+EXPORT_SYMBOL_GPL(fred_irq_entry_from_kvm);
+#endif
+
+	.section ".noinstr.text", "ax"
+
+
 /*
  * The new RIP value that FRED event delivery establishes is
  * IA32_FRED_CONFIG & ~FFFH for events that occur in ring 3.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 21eeba7b188f..cbcb83c71dab 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1566,11 +1566,6 @@ int external_interrupt(struct pt_regs *regs)
 	return 0;
 }

-#if IS_ENABLED(CONFIG_KVM_INTEL)
-/* For KVM VMX to handle IRQs in IRQ induced VM exits. */
-EXPORT_SYMBOL_GPL(external_interrupt);
-#endif
-
 #endif /* CONFIG_X86_64 */

 void __init trap_init(void)
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 79a4c91d9434..e25df565c3f8 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -9,7 +9,6 @@
 #include <asm/segment.h>
 #include "kvm-asm-offsets.h"
 #include "run_flags.h"
-#include "../../entry/calling.h"

 #define WORD_SIZE (BITS_PER_LONG / 8)

@@ -33,15 +32,31 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif

+.macro VMX_DO_EVENT_FRAME_BEGIN
+	/*
+	 * Unconditionally create a stack frame, getting the correct RSP on the
+	 * stack (for x86-64) would take two instructions anyways, and RBP can
+	 * be used to restore RSP to make objtool happy (see below).
+	 */
+	push %_ASM_BP
+	mov %_ASM_SP, %_ASM_BP
+.endm
+
+.macro VMX_DO_EVENT_FRAME_END
+	/*
+	 * "Restore" RSP from RBP, even though {E,I}RET has already unwound RSP
+	 * to the correct value *in most cases*.  KVM's IRQ handling with FRED
+	 * doesn't do ERETS, and objtool doesn't know the callee will IRET/ERET
+	 * and, without the explicit restore, thinks the stack is getting walloped.
+	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
+	 */
+	mov %_ASM_BP, %_ASM_SP
+	pop %_ASM_BP
+.endm
+
 #ifdef CONFIG_X86_FRED
 .macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0
-	/*
-	 * Unconditionally create a stack frame, getting the correct RSP on the
-	 * stack (for x86-64) would take two instructions anyways, and RBP can
-	 * be used to restore RSP to make objtool happy (see below).
-	 */
-	push %_ASM_BP
-	mov %_ASM_SP, %_ASM_BP
+	VMX_DO_EVENT_FRAME_BEGIN

 	/*
 	 * Don't check the FRED stack level, the call stack leading to this
@@ -78,43 +93,23 @@
 	 * push an error code before invoking the IRQ/NMI handler.
 	 *
 	 * Use LEA to get the return RIP and push it, then push an error code.
+	 * Note, only NMI handling does an ERETS to the target!  IRQ handling
+	 * doesn't need to unmask NMIs and so simply uses CALL+RET, i.e. the
+	 * RIP pushed here is only truly consumed for NMIs!
 	 */
 	lea 1f(%rip), %rax
 	push %rax
 	push $0	/* FRED error code, 0 for NMI and external interrupts */

-	.if \nmi == 0
-	PUSH_REGS
-	mov %rsp, %rdi
-	.endif
-
 	\branch_insn \branch_target
-
-	.if \nmi == 0
-	POP_REGS
-	.endif
-
 1:
-	/*
-	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
-	 * the correct value.  objtool doesn't know the callee will IRET and,
-	 * without the explicit restore, thinks the stack is getting walloped.
-	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
-	 */
-	mov %_ASM_BP, %_ASM_SP
-	pop %_ASM_BP
+	VMX_DO_EVENT_FRAME_END
 	RET
 .endm
 #endif

 .macro VMX_DO_EVENT_IRQOFF call_insn call_target
-	/*
-	 * Unconditionally create a stack frame, getting the correct RSP on the
-	 * stack (for x86-64) would take two instructions anyways, and RBP can
-	 * be used to restore RSP to make objtool happy (see below).
-	 */
-	push %_ASM_BP
-	mov %_ASM_SP, %_ASM_BP
+	VMX_DO_EVENT_FRAME_BEGIN

 #ifdef CONFIG_X86_64
 	/*
@@ -129,14 +124,7 @@
 	push $__KERNEL_CS
 	\call_insn \call_target

-	/*
-	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
-	 * the correct value.  objtool doesn't know the callee will IRET and,
-	 * without the explicit restore, thinks the stack is getting walloped.
-	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
-	 */
-	mov %_ASM_BP, %_ASM_SP
-	pop %_ASM_BP
+	VMX_DO_EVENT_FRAME_END
 	RET
 .endm

@@ -375,11 +363,13 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
 SYM_FUNC_END(__vmx_vcpu_run)

-#ifdef CONFIG_X86_FRED
 SYM_FUNC_START(vmx_do_fred_nmi_irqoff)
+#ifdef CONFIG_X86_FRED
 	VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1
+#else
+	ud2
+#endif
 SYM_FUNC_END(vmx_do_fred_nmi_irqoff)
-#endif

 SYM_FUNC_START(vmx_do_nmi_irqoff)
 	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
@@ -438,11 +428,13 @@ SYM_FUNC_END(vmread_error_trampoline)
 #endif

 	.section .text, "ax"

-#ifdef CONFIG_X86_FRED
 SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
-	VMX_DO_FRED_EVENT_IRQOFF call external_interrupt
+#ifdef CONFIG_X86_FRED
+	VMX_DO_FRED_EVENT_IRQOFF call fred_irq_entry_from_kvm
+#else
+	ud2
+#endif
 SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
-#endif

 SYM_FUNC_START(vmx_do_interrupt_irqoff)
 	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bf757f5071e4..cb4675dd87df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6919,13 +6919,8 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
 }

-#ifdef CONFIG_X86_FRED
 void vmx_do_fred_interrupt_irqoff(unsigned int vector);
 void vmx_do_fred_nmi_irqoff(unsigned int vector);
-#else
-#define vmx_do_fred_interrupt_irqoff(x) BUG()
-#define vmx_do_fred_nmi_irqoff(x) BUG()
-#endif

 void vmx_do_interrupt_irqoff(unsigned long entry);
 void vmx_do_nmi_irqoff(void);
@@ -6976,7 +6971,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)

 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
 	if (cpu_feature_enabled(X86_FEATURE_FRED))
-		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */
+		vmx_do_fred_interrupt_irqoff((EVENT_TYPE_HWINT << 16) | vector);
 	else
 		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
 	kvm_after_interrupt(vcpu);

base-commit: 8961078ffe509a97ec7803b17912e57c47b93fa2
--
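As an aside, the reason Sean can call the explicit `(EVENT_TYPE_HWINT << 16) | vector` form "a nop as far as code generation goes" is that the FRED event type for external hardware interrupts is 0. A hedged C sketch of the packing (the constant's value is inferred from this thread, not copied from kernel headers, and the function name is made up):

```c
#include <stdint.h>

/* Assumed from the thread: the FRED event type for external hardware
 * interrupts is 0, so OR-ing it in doesn't change the argument value. */
#define EVENT_TYPE_HWINT 0ULL

/* Pack the FRED event type (starting at bit 16) and vector (bits 7:0)
 * the way the suggested call site does. */
static inline uint64_t fred_event_info(uint64_t type, uint8_t vector)
{
	return (type << 16) | vector;
}
```

With type 0 the packed value equals the bare vector, which is why the explicit form documents intent without changing the generated code.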
On Tue, Aug 01, 2023 at 07:01:15PM +0000, Sean Christopherson wrote:
> The spec I have from May 2022 says the NMI bit colocated with CS, not SS.  And
> the cover letter's suggestion to use a search engine to find the spec ain't
> exactly helpful, that just gives me the same "May 2022 Revision 3.0" spec.  So
> either y'all have a spec that I can't find, or this is wrong.

https://intel.com/sdm

is a useful shorthand I've recently been told about.  On that page is also
"Flexible Return and Event Delivery Specification", when clicked it will gift
you a FRED v5.0 PDF.
On Tue, Aug 01, 2023, Peter Zijlstra wrote:
> On Tue, Aug 01, 2023 at 07:01:15PM +0000, Sean Christopherson wrote:
> > The spec I have from May 2022 says the NMI bit colocated with CS, not SS.  And
> > the cover letter's suggestion to use a search engine to find the spec ain't
> > exactly helpful, that just gives me the same "May 2022 Revision 3.0" spec.  So
> > either y'all have a spec that I can't find, or this is wrong.
>
> https://intel.com/sdm
>
> is a useful shorthand I've recently been told about.

Hallelujah!

> On that page is also "Flexible Return and Event Delivery Specification", when
> clicked it will gift you a FRED v5.0 PDF.

Worked for me, too.  Thanks!
> > +#include "../../entry/calling.h"
>
> Rather than do the low level PUSH_REGS and POP_REGS, I vote to have core code
> expose a FRED-specific wrapper for invoking external_interrupt().  More below.

Nice idea!

> > +	/*
> > +	 * A FRED stack frame has extra 16 bytes of information pushed at the
> > +	 * regular stack top compared to an IDT stack frame.
>
> There is pretty much no chance that anyone remembers the layout of an IDT stack
> frame off the top of their head.  I.e. saying "FRED has 16 bytes more" isn't all
> that useful.  It also fails to capture the fact that FRED stuffs a hell of a lot
> more information in those "common" 48 bytes.
>
> It'll be hard/impossible to capture all of the overload info in a comment, but
> showing the actual layout of the frame would be super helpful, e.g. something
> like this
>
> 	/*
> 	 * FRED stack frames are always 64 bytes:
> 	 *
> 	 * ------------------------------
> 	 * | Bytes | Usage            |
> 	 * ----------------------------|
> 	 * | 63:56 | Reserved         |
> 	 * | 55:48 | Event Data       |
> 	 * | 47:40 | SS + Event Info  |
> 	 * | 39:32 | RSP              |
> 	 * | 31:24 | RFLAGS           |
> 	 * | 23:16 | CS + Aux Info    |
> 	 * | 15:8  | RIP              |
> 	 * | 7:0   | Error Code       |
> 	 * ------------------------------
> 	 */

> 	 *
> 	 * Use LEA to get the return RIP and push it, then push an error code.
> 	 * Note, only NMI handling does an ERETS to the target!  IRQ handling
> 	 * doesn't need to unmask NMIs and so simply uses CALL+RET, i.e. the
> 	 * RIP pushed here is only truly consumed for NMIs!

I take these to mean the ASM code does need more love, i.e., nicer comments :/
Otherwise there is only more confusion.

> Jumping way back to providing a wrapper for FRED, if we do that, then there's no
> need to include calling.h, and the weird wrinkle about the ERET target kinda goes
> away too.  E.g. provide this in arch/x86/entry/entry_64_fred.S
>
> 	.section .text, "ax"
>
> /* Note, this is instrumentation safe, and returns via RET, not ERETS! */
> #if IS_ENABLED(CONFIG_KVM_INTEL)
> SYM_CODE_START(fred_irq_entry_from_kvm)
> 	FRED_ENTER
> 	call external_interrupt
> 	FRED_EXIT
> 	RET
> SYM_CODE_END(fred_irq_entry_from_kvm)
> EXPORT_SYMBOL_GPL(fred_irq_entry_from_kvm);
> #endif
>
> and then the KVM side for this particular chunk is more simply:
>
> 	lea 1f(%rip), %rax
> 	push %rax
> 	push $0	/* FRED error code, 0 for NMI and external interrupts */
>
> 	\branch_insn \branch_target
> 1:
> 	VMX_DO_EVENT_FRAME_END
> 	RET
>
> Alternatively, the whole thing could be shoved into
> arch/x86/entry/entry_64_fred.S, but at a glance I don't think that would be a
> net positive due to the need to handle IRQs vs. NMIs.
>
> > +	\branch_insn \branch_target
> > +
> > +	.if \nmi == 0
> > +	POP_REGS
> > +	.endif
> > +
> > +1:
> > +	/*
> > +	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
>
> As mentioned above, this is incorrect on two fronts.
>
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0ecf4be2c6af..4e90c69a92bf 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
> >  	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
> >  }
> >
> > +#ifdef CONFIG_X86_FRED
> > +void vmx_do_fred_interrupt_irqoff(unsigned int vector);
> > +void vmx_do_fred_nmi_irqoff(unsigned int vector);
> > +#else
> > +#define vmx_do_fred_interrupt_irqoff(x) BUG()
> > +#define vmx_do_fred_nmi_irqoff(x) BUG()
> > +#endif
>
> My slight preference is to open code the BUG() as a ud2 in assembly, purely to
> avoid more #ifdefs.

> > +
> >  void vmx_do_interrupt_irqoff(unsigned long entry);
> >  void vmx_do_nmi_irqoff(void);
> >
> > @@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
> >  {
> >  	u32 intr_info = vmx_get_intr_info(vcpu);
> >  	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> > -	gate_desc *desc = (gate_desc *)host_idt_base + vector;
> >
> >  	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
> >  		    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
> >  		return;
> >
> >  	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
> > -	vmx_do_interrupt_irqoff(gate_offset(desc));
> > +	if (cpu_feature_enabled(X86_FEATURE_FRED))
> > +		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */
>
> I strongly prefer to use code to document what's going on.  E.g. the tail comment
> just left me wondering, what event type is 0?  Whereas this makes it quite clear
> that KVM is signaling a hardware interrupt.  The fact that it's a nop as far as
> code generation goes is irrelevant.
>
> 		vmx_do_fred_interrupt_irqoff((EVENT_TYPE_HWINT << 16) | vector);

Better readability.

> > +	else
> > +		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
> >  	kvm_after_interrupt(vcpu);
> >
> >  	vcpu->arch.at_instruction_boundary = true;
>
> Here's a diff for (hopefully) everything I've suggested above.

Thanks a lot! I will test it and update the patch in this mail thread.
> >
> ---
>  arch/x86/entry/entry_64_fred.S | 17 ++++++-
>  arch/x86/kernel/traps.c        |  5 --
>  arch/x86/kvm/vmx/vmenter.S     | 84 +++++++++++++++-------------------
>  arch/x86/kvm/vmx/vmx.c         |  7 +--
>  4 files changed, 55 insertions(+), 58 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> index 12063267d2ac..a973c0bd29f6 100644
> --- a/arch/x86/entry/entry_64_fred.S
> +++ b/arch/x86/entry/entry_64_fred.S
> @@ -10,7 +10,6 @@
>  #include "calling.h"
>
>  .code64
> -	.section ".noinstr.text", "ax"
>
>  .macro FRED_ENTER
>  	UNWIND_HINT_END_OF_STACK
> @@ -24,6 +23,22 @@
>  	POP_REGS
>  .endm
>
> +	.section .text, "ax"
> +
> +/* Note, this is instrumentation safe, and returns via RET, not ERETS! */
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
> +SYM_CODE_START(fred_irq_entry_from_kvm)
> +	FRED_ENTER
> +	call external_interrupt
> +	FRED_EXIT
> +	RET
> +SYM_CODE_END(fred_irq_entry_from_kvm)
> +EXPORT_SYMBOL_GPL(fred_irq_entry_from_kvm);
> +#endif
> +
> +	.section ".noinstr.text", "ax"
> +
> +
>  /*
>   * The new RIP value that FRED event delivery establishes is
>   * IA32_FRED_CONFIG & ~FFFH for events that occur in ring 3.
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 21eeba7b188f..cbcb83c71dab 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1566,11 +1566,6 @@ int external_interrupt(struct pt_regs *regs)
>  	return 0;
>  }
>
> -#if IS_ENABLED(CONFIG_KVM_INTEL)
> -/* For KVM VMX to handle IRQs in IRQ induced VM exits. */
> -EXPORT_SYMBOL_GPL(external_interrupt);
> -#endif
> -
>  #endif /* CONFIG_X86_64 */
>
>  void __init trap_init(void)
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 79a4c91d9434..e25df565c3f8 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -9,7 +9,6 @@
>  #include <asm/segment.h>
>  #include "kvm-asm-offsets.h"
>  #include "run_flags.h"
> -#include "../../entry/calling.h"
>
>  #define WORD_SIZE (BITS_PER_LONG / 8)
>
> @@ -33,15 +32,31 @@
>  #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
>  #endif
>
> +.macro VMX_DO_EVENT_FRAME_BEGIN
> +	/*
> +	 * Unconditionally create a stack frame, getting the correct RSP on the
> +	 * stack (for x86-64) would take two instructions anyways, and RBP can
> +	 * be used to restore RSP to make objtool happy (see below).
> +	 */
> +	push %_ASM_BP
> +	mov %_ASM_SP, %_ASM_BP
> +.endm
> +
> +.macro VMX_DO_EVENT_FRAME_END
> +	/*
> +	 * "Restore" RSP from RBP, even though {E,I}RET has already unwound RSP
> +	 * to the correct value *in most cases*.  KVM's IRQ handling with FRED
> +	 * doesn't do ERETS, and objtool doesn't know the callee will IRET/ERET
> +	 * and, without the explicit restore, thinks the stack is getting walloped.
> +	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> +	 */
> +	mov %_ASM_BP, %_ASM_SP
> +	pop %_ASM_BP
> +.endm
> +
>  #ifdef CONFIG_X86_FRED
>  .macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0
> -	/*
> -	 * Unconditionally create a stack frame, getting the correct RSP on the
> -	 * stack (for x86-64) would take two instructions anyways, and RBP can
> -	 * be used to restore RSP to make objtool happy (see below).
> -	 */
> -	push %_ASM_BP
> -	mov %_ASM_SP, %_ASM_BP
> +	VMX_DO_EVENT_FRAME_BEGIN
>
>  	/*
>  	 * Don't check the FRED stack level, the call stack leading to this
> @@ -78,43 +93,23 @@
>  	 * push an error code before invoking the IRQ/NMI handler.
>  	 *
>  	 * Use LEA to get the return RIP and push it, then push an error code.
> +	 * Note, only NMI handling does an ERETS to the target!  IRQ handling
> +	 * doesn't need to unmask NMIs and so simply uses CALL+RET, i.e. the
> +	 * RIP pushed here is only truly consumed for NMIs!
>  	 */
>  	lea 1f(%rip), %rax
>  	push %rax
>  	push $0	/* FRED error code, 0 for NMI and external interrupts */
>
> -	.if \nmi == 0
> -	PUSH_REGS
> -	mov %rsp, %rdi
> -	.endif
> -
>  	\branch_insn \branch_target
> -
> -	.if \nmi == 0
> -	POP_REGS
> -	.endif
> -
>  1:
> -	/*
> -	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> -	 * the correct value.  objtool doesn't know the callee will IRET and,
> -	 * without the explicit restore, thinks the stack is getting walloped.
> -	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> -	 */
> -	mov %_ASM_BP, %_ASM_SP
> -	pop %_ASM_BP
> +	VMX_DO_EVENT_FRAME_END
>  	RET
>  .endm
>  #endif
>
>  .macro VMX_DO_EVENT_IRQOFF call_insn call_target
> -	/*
> -	 * Unconditionally create a stack frame, getting the correct RSP on the
> -	 * stack (for x86-64) would take two instructions anyways, and RBP can
> -	 * be used to restore RSP to make objtool happy (see below).
> -	 */
> -	push %_ASM_BP
> -	mov %_ASM_SP, %_ASM_BP
> +	VMX_DO_EVENT_FRAME_BEGIN
>
>  #ifdef CONFIG_X86_64
>  	/*
> @@ -129,14 +124,7 @@
>  	push $__KERNEL_CS
>  	\call_insn \call_target
>
> -	/*
> -	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> -	 * the correct value.  objtool doesn't know the callee will IRET and,
> -	 * without the explicit restore, thinks the stack is getting walloped.
> -	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> -	 */
> -	mov %_ASM_BP, %_ASM_SP
> -	pop %_ASM_BP
> +	VMX_DO_EVENT_FRAME_END
>  	RET
>  .endm
>
> @@ -375,11 +363,13 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
>
>  SYM_FUNC_END(__vmx_vcpu_run)
>
> -#ifdef CONFIG_X86_FRED
>  SYM_FUNC_START(vmx_do_fred_nmi_irqoff)
> +#ifdef CONFIG_X86_FRED
>  	VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1
> +#else
> +	ud2
> +#endif
>  SYM_FUNC_END(vmx_do_fred_nmi_irqoff)
> -#endif
>
>  SYM_FUNC_START(vmx_do_nmi_irqoff)
>  	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
> @@ -438,11 +428,13 @@ SYM_FUNC_END(vmread_error_trampoline)
>  #endif
>
>  .section .text, "ax"
> -#ifdef CONFIG_X86_FRED
>  SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
> -	VMX_DO_FRED_EVENT_IRQOFF call external_interrupt
> +#ifdef CONFIG_X86_FRED
> +	VMX_DO_FRED_EVENT_IRQOFF call fred_irq_entry_from_kvm
> +#else
> +	ud2
> +#endif
>  SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
> -#endif
>
>  SYM_FUNC_START(vmx_do_interrupt_irqoff)
>  	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bf757f5071e4..cb4675dd87df 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6919,13 +6919,8 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>  	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
>  }
>
> -#ifdef CONFIG_X86_FRED
>  void vmx_do_fred_interrupt_irqoff(unsigned int vector);
>  void vmx_do_fred_nmi_irqoff(unsigned int vector);
> -#else
> -#define vmx_do_fred_interrupt_irqoff(x)	BUG()
> -#define vmx_do_fred_nmi_irqoff(x)	BUG()
> -#endif
>
>  void vmx_do_interrupt_irqoff(unsigned long entry);
>  void vmx_do_nmi_irqoff(void);
>
> @@ -6976,7 +6971,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>
>  	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
>  	if (cpu_feature_enabled(X86_FEATURE_FRED))
> -		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */
> +		vmx_do_fred_interrupt_irqoff((EVENT_TYPE_HWINT << 16) | vector);
>  	else
>  		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
>  	kvm_after_interrupt(vcpu);
>
> base-commit: 8961078ffe509a97ec7803b17912e57c47b93fa2
> --
> > Jumping way back to providing a wrapper for FRED, if we do that, then
> > there's no need to include calling.h, and the weird wrinkle about the
> > ERET target kinda goes away too.  E.g. provide this in
> > arch/x86/entry/entry_64_fred.S
> >
> > 	.section .text, "ax"
> >
> > /* Note, this is instrumentation safe, and returns via RET, not ERETS! */
> > #if IS_ENABLED(CONFIG_KVM_INTEL)
> > SYM_CODE_START(fred_irq_entry_from_kvm)
> > 	FRED_ENTER
> > 	call external_interrupt
> > 	FRED_EXIT
> > 	RET
> > SYM_CODE_END(fred_irq_entry_from_kvm)
> > EXPORT_SYMBOL_GPL(fred_irq_entry_from_kvm);
> > #endif
> >
> > and then the KVM side for this particular chunk is more simply:
> >
> > 	lea 1f(%rip), %rax
> > 	push %rax
> > 	push $0	/* FRED error code, 0 for NMI and external interrupts */
> >
> > 	\branch_insn \branch_target

The CALL instruction here inserts a return address between the FRED stack
frame just created and the GP regs pushed in the FRED wrapper, so this
doesn't work.  I only realized it after looking at the stack layout in the
Intel Simics emulator.

> > 1:
> > 	VMX_DO_EVENT_FRAME_END
> > 	RET
> >
> > Alternatively, the whole thing could be shoved into
> > arch/x86/entry/entry_64_fred.S, but at a glance I don't think that
> > would be a net positive due to the need to handle IRQs vs. NMIs.

But following your idea of "the whole thing could be shoved into ...",
plus the patch you posted, I got the patch below, which works and gets
rid of the two "with modified stack frame" objtool warnings.

After calling external_interrupt() and doing POP_REGS, we can also use
ERETS, which unifies the stack adjustment.
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index de9e2dc70e40..2b638914c6ed 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -4,12 +4,87 @@
  */

 #include <asm/asm.h>
+#include <asm/export.h>
 #include <asm/fred.h>
+#include <asm/segment.h>

 #include "calling.h"

 .code64

+#if IS_ENABLED(CONFIG_KVM_INTEL)
+.macro FRED_ENTRY_FROM_KVM event_func nmi=0
+	ENDBR
+#ifdef CONFIG_X86_FRED
+	push %rbp
+	mov %rsp, %rbp
+
+	/*
+	 * Don't check the FRED stack level, the call stack leading to this
+	 * helper is effectively constant and shallow (relatively speaking).
+	 *
+	 * Emulate the FRED-defined redzone and stack alignment.
+	 */
+	sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
+	and $FRED_STACK_FRAME_RSP_MASK, %rsp
+
+	/*
+	 * Start to push a FRED stack frame, which is always 64 bytes:
+	 *
+	 *       +--------+-----------------+
+	 *       |  Bytes |      Usage      |
+	 *       +--------+-----------------+
+	 *       |  63:56 | Reserved        |
+	 *       |  55:48 | Event Data      |
+	 *       |  47:40 | SS + Event Info |
+	 *       |  39:32 | RSP             |
+	 *       |  31:24 | RFLAGS          |
+	 *       |  23:16 | CS + Aux Info   |
+	 *       |   15:8 | RIP             |
+	 *       |    7:0 | Error Code      |
+	 *       +--------+-----------------+
+	 */
+	push $0				/* Reserved, must be 0 */
+	push $0				/* Event data, 0 for IRQ/NMI */
+
+	shl $32, %rdi			/* Event type and vector */
+	.if \nmi
+	bts $FRED_SSX_NMI_BIT, %rdi
+	.endif
+	bts $FRED_SSX_64_BIT_MODE_BIT, %rdi
+	or $__KERNEL_DS, %rdi
+	push %rdi
+	push %rbp
+	pushf
+	mov $__KERNEL_CS, %rax
+	push %rax
+
+	/*
+	 * Unlike the IDT event delivery, FRED _always_ pushes an error code
+	 * after pushing the return RIP, thus the CALL instruction CANNOT be
+	 * used here to push the return RIP, otherwise there is no chance to
+	 * push an error code before invoking the IRQ/NMI handler.
+	 *
+	 * Use LEA to get the return RIP and push it, then push an error code.
+	 */
+	lea 1f(%rip), %rax
+	push %rax			/* Return RIP */
+	push $0				/* Error code, 0 for IRQ/NMI */
+
+	PUSH_AND_CLEAR_REGS
+	movq %rsp, %rdi			/* %rdi -> pt_regs */
+	call \event_func
+	POP_REGS
+	ERETS
+1:
+	pop %rbp
+	RET
+#else /* CONFIG_X86_FRED */
+	ud2
+#endif /* CONFIG_X86_FRED */
+.endm
+#endif /* CONFIG_KVM_INTEL */
+
 .macro FRED_ENTER
 	UNWIND_HINT_END_OF_STACK
 	ENDBR
@@ -22,6 +97,16 @@
 	POP_REGS
 .endm

+	.section .text, "ax"
+
+/* Note, this is instrumentation safe, and returns via RET */
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+SYM_CODE_START(fred_irq_entry_from_kvm)
+	FRED_ENTRY_FROM_KVM external_interrupt
+SYM_CODE_END(fred_irq_entry_from_kvm)
+EXPORT_SYMBOL_GPL(fred_irq_entry_from_kvm);
+#endif
+
 	.section .noinstr.text, "ax"

 /*
@@ -55,3 +140,10 @@ SYM_CODE_START_NOALIGN(fred_entrypoint_kernel)
 	FRED_EXIT
 	ERETS
 SYM_CODE_END(fred_entrypoint_kernel)
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+SYM_CODE_START(fred_nmi_entry_from_kvm)
+	FRED_ENTRY_FROM_KVM fred_exc_nmi nmi=1
+SYM_CODE_END(fred_nmi_entry_from_kvm)
+EXPORT_SYMBOL_GPL(fred_nmi_entry_from_kvm);
+#endif
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 3c91f0eae62e..c7288213de14 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -122,6 +122,10 @@ DECLARE_FRED_HANDLER(fred_exc_double_fault);
 extern asmlinkage __visible void fred_entrypoint_user(void);
 extern asmlinkage __visible void fred_entrypoint_kernel(void);

+/* For KVM VMX to handle IRQs in IRQ induced VM exits */
+extern asmlinkage __visible void fred_irq_entry_from_kvm(unsigned int event_info);
+extern asmlinkage __visible void fred_nmi_entry_from_kvm(unsigned int event_info);
+
 #endif /* __ASSEMBLY__ */

 #endif /* CONFIG_X86_FRED */
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index be275a0410a8..f6f557c6329e 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -40,6 +40,20 @@
 	push %_ASM_BP
 	mov %_ASM_SP, %_ASM_BP

+	/*
+	 * Start to push an IDT IRQ/NMI stack frame, which is 40 bytes on
+	 * x86_64 and 24 bytes on x86_32:
+	 *
+	 *       +-------+-------------------+
+	 *       | Bytes |       Usage       |
+	 *       +-------+-------------------+
+	 *       | 39:32 | SS (x86_64 only)  |
+	 *       | 31:24 | RSP (x86_64 only) |
+	 *       | 23:16 | RFLAGS            |
+	 *       |  15:8 | CS                |
+	 *       |   7:0 | RIP               |
+	 *       +-------+-------------------+
+	 */
 #ifdef CONFIG_X86_64
 	/*
 	 * Align RSP to a 16-byte boundary (to emulate CPU behavior) before
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index df461f387e20..83c89239fcff 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6961,14 +6961,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
 	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;

 	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
 	    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;

 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
-	vmx_do_interrupt_irqoff(gate_offset(desc));
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		fred_irq_entry_from_kvm((EVENT_TYPE_HWINT << 16) | vector);
+	else
+		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
 	kvm_after_interrupt(vcpu);

 	vcpu->arch.at_instruction_boundary = true;
@@ -7254,7 +7256,10 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
+		if (cpu_feature_enabled(X86_FEATURE_FRED))
+			fred_nmi_entry_from_kvm((EVENT_TYPE_NMI << 16) | NMI_VECTOR);
+		else
+			vmx_do_nmi_irqoff();
 		kvm_after_interrupt(vcpu);
 	}
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 07e927d4d099..5ee6a57b59a5 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -2,12 +2,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm/bitsperlong.h>
+#include <asm/fred.h>
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/nospec-branch.h>
 #include <asm/percpu.h>
 #include <asm/segment.h>
 #include "kvm-asm-offsets.h"
 #include "run_flags.h"
+#include "../../entry/calling.h"

 #define WORD_SIZE (BITS_PER_LONG / 8)

@@ -31,6 +33,80 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif

+#ifdef CONFIG_X86_FRED
+.macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0
+	/*
+	 * Unconditionally create a stack frame, getting the correct RSP on the
+	 * stack (for x86-64) would take two instructions anyways, and RBP can
+	 * be used to restore RSP to make objtool happy (see below).
+	 */
+	push %_ASM_BP
+	mov %_ASM_SP, %_ASM_BP
+
+	/*
+	 * Don't check the FRED stack level, the call stack leading to this
+	 * helper is effectively constant and shallow (relatively speaking).
+	 *
+	 * Emulate the FRED-defined redzone and stack alignment.
+	 */
+	sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
+	and $FRED_STACK_FRAME_RSP_MASK, %rsp
+
+	/*
+	 * A FRED stack frame has extra 16 bytes of information pushed at the
+	 * regular stack top compared to an IDT stack frame.
+	 */
+	push $0				/* Reserved by FRED, must be 0 */
+	push $0				/* FRED event data, 0 for NMI and external interrupts */
+
+	shl $32, %rdi			/* FRED event type and vector */
+	.if \nmi
+	bts $FRED_SSX_NMI_BIT, %rdi	/* Set the NMI bit */
+	.endif
+	bts $FRED_SSX_64_BIT_MODE_BIT, %rdi	/* Set the 64-bit mode */
+	or $__KERNEL_DS, %rdi
+	push %rdi
+	push %rbp
+	pushf
+	mov $__KERNEL_CS, %rax
+	push %rax
+
+	/*
+	 * Unlike the IDT event delivery, FRED _always_ pushes an error code
+	 * after pushing the return RIP, thus the CALL instruction CANNOT be
+	 * used here to push the return RIP, otherwise there is no chance to
+	 * push an error code before invoking the IRQ/NMI handler.
+	 *
+	 * Use LEA to get the return RIP and push it, then push an error code.
+	 */
+	lea 1f(%rip), %rax
+	push %rax
+	push $0				/* FRED error code, 0 for NMI and external interrupts */
+
+	.if \nmi == 0
+	PUSH_REGS
+	mov %rsp, %rdi
+	.endif
+
+	\branch_insn \branch_target
+
+	.if \nmi == 0
+	POP_REGS
+	.endif
+
+1:
+	/*
+	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
+	 * the correct value.  objtool doesn't know the callee will IRET and,
+	 * without the explicit restore, thinks the stack is getting walloped.
+	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
+	 */
+	mov %_ASM_BP, %_ASM_SP
+	pop %_ASM_BP
+	RET
+.endm
+#endif
+
 .macro VMX_DO_EVENT_IRQOFF call_insn call_target
 	/*
 	 * Unconditionally create a stack frame, getting the correct RSP on the
@@ -299,6 +375,12 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)

 SYM_FUNC_END(__vmx_vcpu_run)

+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_nmi_irqoff)
+	VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1
+SYM_FUNC_END(vmx_do_fred_nmi_irqoff)
+#endif
+
 SYM_FUNC_START(vmx_do_nmi_irqoff)
 	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
 SYM_FUNC_END(vmx_do_nmi_irqoff)
@@ -357,6 +439,12 @@ SYM_FUNC_START(vmread_error_trampoline)
 SYM_FUNC_END(vmread_error_trampoline)
 #endif

+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
+	VMX_DO_FRED_EVENT_IRQOFF call external_interrupt
+SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
+#endif
+
 SYM_FUNC_START(vmx_do_interrupt_irqoff)
 	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
 SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..4e90c69a92bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
 }

+#ifdef CONFIG_X86_FRED
+void vmx_do_fred_interrupt_irqoff(unsigned int vector);
+void vmx_do_fred_nmi_irqoff(unsigned int vector);
+#else
+#define vmx_do_fred_interrupt_irqoff(x)	BUG()
+#define vmx_do_fred_nmi_irqoff(x)	BUG()
+#endif
+
 void vmx_do_interrupt_irqoff(unsigned long entry);
 void vmx_do_nmi_irqoff(void);

@@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
 	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;

 	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
 	    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;

 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
-	vmx_do_interrupt_irqoff(gate_offset(desc));
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */
+	else
+		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
 	kvm_after_interrupt(vcpu);

 	vcpu->arch.at_instruction_boundary = true;
@@ -7225,7 +7235,10 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
+		if (cpu_feature_enabled(X86_FEATURE_FRED))
+			vmx_do_fred_nmi_irqoff((EVENT_TYPE_NMI << 16) | NMI_VECTOR);
+		else
+			vmx_do_nmi_irqoff();
 		kvm_after_interrupt(vcpu);
 	}