From patchwork Tue Aug 1 08:35:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xin3" X-Patchwork-Id: 129183 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp2592117vqg; Tue, 1 Aug 2023 04:10:20 -0700 (PDT) X-Google-Smtp-Source: APBJJlHVr632q0E0aGL6MkQC+wocZvHs9E87eQ7F8s3b9q5PZPEN3raeG4rsHNx2+b5IKyJW/CB+ X-Received: by 2002:ac2:4e81:0:b0:4f9:5396:ed1b with SMTP id o1-20020ac24e81000000b004f95396ed1bmr1689019lfr.28.1690888219830; Tue, 01 Aug 2023 04:10:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690888219; cv=none; d=google.com; s=arc-20160816; b=LqKyLMWu1DsJSBomzHrrEsH2wKIUguScBFLbtTXt3XJV9AfWcahOmKqBKP8SPs7ETf JHv8lWnrLpn3QWMSkXjtj57aaTopgycK0ySQbaBf+M+GqGeS8EcdcQM9zixjQHr9ppEY 3AQU5D/su4KMQozROjZS7BS+mAxu1Dly3R2Nk2gRWSeVmkl/6bXT4dN1ynzMJQ4eWPAZ 3nAf/0KIUXkIH/r5L4HhfF/qMtH87p2enh5l+P9z3cgki2ffrG5toEOQCQ+dOa7ng2XJ n++lHBd6jLxbiXckTjE+DPTEOUaAOnpEAPyPuVqUkDY8XEjs6APVZQrw5tpVcZ2ccY4B espQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+AYzWpwjnouLHD0Q2extCh7sVb8xamw26aML7OR/I40=; fh=nYO9ukkM7JnAAJZcChGC8UWBlCLXeJxIDT82cxoIaf0=; b=hbdkNReJe7DuHqNgb1F+O57lug7UzaT8l5Xa7fcHNAiBqr87S3+OSog1qWw+nLWXC1 xd31FL+cmEAdXXAjrFMIppIuXg1zt8ON53pm+f/wWX//SHi+kIZfnuPmBF0ePyotZhtw r4FHyfCajuB5iYNfBVmJJaX4IDjHs37pDuKIB0dQy/CHbh6vzkDkKLM9CYAsRcgt8r4K KDomdnCPn8G4fH1sjAhOPWc60D/DABojR1ZusOhbwfSoMlZKGxCLhOX+46TRvQm44V/b Rwn4llKblugWCH4ugRkoXQKk4PBzcWzRDpgRLcOwFXcVwyo/jULMYTezCoqNVmbqsuMc 2kKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fnCP3Q2D; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u19-20020a056402065300b00522ab764879si4922176edx.456.2023.08.01.04.09.55; Tue, 01 Aug 2023 04:10:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fnCP3Q2D; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232806AbjHAJJJ (ORCPT + 99 others); Tue, 1 Aug 2023 05:09:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233018AbjHAJIb (ORCPT ); Tue, 1 Aug 2023 05:08:31 -0400 Received: from mgamail.intel.com (unknown [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 890D31FCA; Tue, 1 Aug 2023 02:06:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1690880778; x=1722416778; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lojZodI3hwQLsEtOWQ3jf2ThLdccjOvBbgNRJld8Dhw=; b=fnCP3Q2Das3N7zts1f+DuNPXk2zmp4qMXLhnm2WKtqYDRB//PbD3r6HU eiwIbfgSGzHqzWpocAeZQVipGp1VNRYamX8qJytzRJl5H7V/5MGj1QVMc 2nh/GGixi0690Tc0MWNKR22XHeCyiUucbQnj/Y75HhVgYe1V+FX+0q2G9 ZKAfCBIri7ehQnexuw3RB7iZo8LTYlw+XsGhwXGXEZTM+S4paz6/hfXYb kePSMosLH9ZKAXiocll9eZPA+uWGW6bO1SB2+4QYjOiWKrTBuvuEffrTJ lk8ze7O4YPDLnvN/avzBpM4sQy17qZxCA6+v9BqGIJmP5tJOj+pvsAQzg w==; X-IronPort-AV: E=McAfee;i="6600,9927,10788"; a="366713752" X-IronPort-AV: E=Sophos;i="6.01,246,1684825200"; d="scan'208";a="366713752" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2023 02:04:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10788"; a="722420766" X-IronPort-AV: E=Sophos;i="6.01,246,1684825200"; d="scan'208";a="722420766" Received: from unknown (HELO fred..) ([172.25.112.68]) by orsmga007.jf.intel.com with ESMTP; 01 Aug 2023 02:04:40 -0700 From: Xin Li To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org Cc: Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Tony Luck , "K . Y . Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Sean Christopherson , Peter Zijlstra , Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , Josh Poimboeuf , "Paul E . McKenney" , Catalin Marinas , Randy Dunlap , Steven Rostedt , Kim Phillips , Xin Li , Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Liam R . Howlett" , Sebastian Reichel , "Kirill A . Shutemov" , Suren Baghdasaryan , Pawan Gupta , Babu Moger , Jim Mattson , Sandipan Das , Lai Jiangshan , Hans de Goede , Reinette Chatre , Daniel Sneddon , Breno Leitao , Nikunj A Dadhania , Brian Gerst , Sami Tolvanen , Alexander Potapenko , Andrew Morton , Arnd Bergmann , "Eric W . Biederman" , Kees Cook , Masami Hiramatsu , Masahiro Yamada , Ze Gao , Fei Li , Conghui , Ashok Raj , "Jason A . Donenfeld" , Mark Rutland , Jacob Pan , Jiapeng Chong , Jane Malalane , David Woodhouse , Boris Ostrovsky , Arnaldo Carvalho de Melo , Yantengsi , Christophe Leroy , Sathvika Vasireddy Subject: [PATCH RESEND v9 35/36] x86/fred: FRED initialization code Date: Tue, 1 Aug 2023 01:35:52 -0700 Message-Id: <20230801083553.8468-9-xin3.li@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230801083553.8468-1-xin3.li@intel.com> References: <20230801083553.8468-1-xin3.li@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773024806019134587 X-GMAIL-MSGID: 1773024806019134587 From: "H. Peter Anvin (Intel)" The code to initialize FRED when it's available and _not_ disabled. cpu_init_fred_exceptions() is the core function to initialize FRED, which 1. Sets up FRED entrypoints for events happening in ring 0 and 3. 2. Sets up a default stack for event handling. 3. Sets up dedicated event stacks for DB/NMI/MC/DF, equivalent to the IDT IST stacks. 4. Forces 32-bit system calls to use "int $0x80" only. 5. Enables FRED and invalidtes IDT. When the FRED is used, cpu_init_exception_handling() initializes FRED through calling cpu_init_fred_exceptions(), otherwise it sets up TSS IST and loads IDT. As FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER, it skips setting up SYSCALL/SYSENTER related MSRs, e.g., MSR_LSTAR. Signed-off-by: H. Peter Anvin (Intel) Co-developed-by: Xin Li Tested-by: Shan Kang Signed-off-by: Xin Li --- Changes since v8: * Move this patch after all required changes are in place (Thomas Gleixner). Changes since v5: * Add a comment for FRED stack level settings (Lai Jiangshan). * Define #DB/NMI/#MC/#DF stack levels using macros. --- arch/x86/include/asm/fred.h | 28 ++++++++++++++++ arch/x86/include/asm/traps.h | 4 ++- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/cpu/common.c | 28 +++++++++++++--- arch/x86/kernel/fred.c | 64 ++++++++++++++++++++++++++++++++++++ arch/x86/kernel/irqinit.c | 7 +++- arch/x86/kernel/traps.c | 11 ++++++- 7 files changed, 135 insertions(+), 8 deletions(-) create mode 100644 arch/x86/kernel/fred.c diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h index 3c91f0eae62e..6031138b778c 100644 --- a/arch/x86/include/asm/fred.h +++ b/arch/x86/include/asm/fred.h @@ -68,6 +68,19 @@ #define FRED_SSX_64_BIT_MODE_BIT 57 #define FRED_SSX_64_BIT_MODE _BITUL(FRED_SSX_64_BIT_MODE_BIT) +/* #DB in the kernel would imply the use of a kernel debugger. */ +#define FRED_DB_STACK_LEVEL 1 +#define FRED_NMI_STACK_LEVEL 2 +#define FRED_MC_STACK_LEVEL 2 +/* + * #DF is the highest level because a #DF means "something went wrong + * *while delivering an exception*." The number of cases for which that + * can happen with FRED is drastically reduced and basically amounts to + * "the stack you pointed me to is broken." Thus, always change stacks + * on #DF, which means it should be at the highest level. + */ +#define FRED_DF_STACK_LEVEL 3 + /* * FRED event delivery establishes a full supervisor context by * saving the essential information about an event to a FRED @@ -122,8 +135,23 @@ DECLARE_FRED_HANDLER(fred_exc_double_fault); extern asmlinkage __visible void fred_entrypoint_user(void); extern asmlinkage __visible void fred_entrypoint_kernel(void); +void cpu_init_fred_exceptions(void); +void fred_setup_apic(void); + #endif /* __ASSEMBLY__ */ +#else +#ifndef __ASSEMBLY__ +static inline void cpu_init_fred_exceptions(void) +{ + BUG(); +} + +static inline void fred_setup_apic(void) +{ + BUG(); +} +#endif #endif /* CONFIG_X86_FRED */ #endif /* ASM_X86_FRED_H */ diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 48daa78ee88c..da7e8ab1d66d 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -49,6 +49,7 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs, #ifdef CONFIG_X86_64 inline void set_sysvec_handler(unsigned int i, system_interrupt_handler func); +bool is_sysvec_used(unsigned int i); static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handler func) { @@ -63,7 +64,8 @@ static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handl #define sysvec_install(vector, func) { \ sysvec_setup_fred(vector, func); \ - alloc_intr_gate(vector, asm_##func); \ + if (!cpu_feature_enabled(X86_FEATURE_FRED)) \ + alloc_intr_gate(vector, asm_##func); \ } int external_interrupt(struct pt_regs *regs); diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4070a01c11b7..46d8daa11c17 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -48,6 +48,7 @@ obj-y += platform-quirks.o obj-y += process_$(BITS).o signal.o signal_$(BITS).o obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o obj-y += time.o ioport.o dumpstack.o nmi.o +obj-$(CONFIG_X86_FRED) += fred.o obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o obj-y += setup.o x86_init.o i8259.o irqinit.o obj-$(CONFIG_JUMP_LABEL) += jump_label.o diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index bb03dacc5fb8..b34a8a138755 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #include #include @@ -2062,13 +2063,24 @@ static inline void idt_syscall_init(void) X86_EFLAGS_AC|X86_EFLAGS_ID); } +static inline void fred_syscall_init(void) +{ + /* Both sysexit and sysret cause #UD when FRED is enabled */ + wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG); + wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL); + wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL); +} + /* May not be marked __init: used by software suspend */ void syscall_init(void) { /* The default user and kernel segments */ wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS); - idt_syscall_init(); + if (cpu_feature_enabled(X86_FEATURE_FRED)) + fred_syscall_init(); + else + idt_syscall_init(); } #else /* CONFIG_X86_64 */ @@ -2184,8 +2196,6 @@ void cpu_init_exception_handling(void) /* paranoid_entry() gets the CPU number from the GDT */ setup_getcpu(cpu); - /* IST vectors need TSS to be set up. */ - tss_setup_ist(tss); tss_setup_io_bitmap(tss); set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); @@ -2194,8 +2204,16 @@ void cpu_init_exception_handling(void) /* GHCB needs to be setup to handle #VC. */ setup_ghcb(); - /* Finally load the IDT */ - load_current_idt(); + if (cpu_feature_enabled(X86_FEATURE_FRED)) { + /* Set up FRED exception handling */ + cpu_init_fred_exceptions(); + } else { + /* IST vectors need TSS to be set up. */ + tss_setup_ist(tss); + + /* Finally load the IDT */ + load_current_idt(); + } } /* diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c new file mode 100644 index 000000000000..7fdf79c964a8 --- /dev/null +++ b/arch/x86/kernel/fred.c @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include + +#include +#include +#include +#include + +void cpu_init_fred_exceptions(void) +{ + wrmsrl(MSR_IA32_FRED_CONFIG, + /* Reserve for CALL emulation */ + FRED_CONFIG_REDZONE | + FRED_CONFIG_INT_STKLVL(0) | + FRED_CONFIG_ENTRYPOINT(fred_entrypoint_user)); + + /* + * The purpose of separate stacks for NMI, #DB and #MC *in the kernel* + * (remember that user space faults are always taken on stack level 0) + * is to avoid overflowing the kernel stack. + */ + wrmsrl(MSR_IA32_FRED_STKLVLS, + FRED_STKLVL(X86_TRAP_DB, FRED_DB_STACK_LEVEL) | + FRED_STKLVL(X86_TRAP_NMI, FRED_NMI_STACK_LEVEL) | + FRED_STKLVL(X86_TRAP_MC, FRED_MC_STACK_LEVEL) | + FRED_STKLVL(X86_TRAP_DF, FRED_DF_STACK_LEVEL)); + + /* The FRED equivalents to IST stacks... */ + wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB)); + wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI)); + wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF)); + + /* Not used with FRED */ + wrmsrl(MSR_LSTAR, 0ULL); + wrmsrl(MSR_CSTAR, 0ULL); + wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG); + wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL); + wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL); + + /* Enable FRED */ + cr4_set_bits(X86_CR4_FRED); + /* Any further IDT use is a bug */ + idt_invalidate(); + + /* Use int $0x80 for 32-bit system calls in FRED mode */ + setup_clear_cpu_cap(X86_FEATURE_SYSENTER32); + setup_clear_cpu_cap(X86_FEATURE_SYSCALL32); +} + +/* + * Initialize system vectors from a FRED perspective, so + * lapic_assign_system_vectors() can do its job. + */ +void __init fred_setup_apic(void) +{ + int i; + + for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++) + set_bit(i, system_vectors); + + for (i = 0; i < NR_SYSTEM_VECTORS; i++) + if (is_sysvec_used(i)) + set_bit(i + FIRST_SYSTEM_VECTOR, system_vectors); +} diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index c683666876f1..2a510f72dd11 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -28,6 +28,7 @@ #include #include #include +#include #include /* @@ -96,7 +97,11 @@ void __init native_init_IRQ(void) /* Execute any quirks before the call gates are initialised: */ x86_init.irqs.pre_vector_init(); - idt_setup_apic_and_irq_gates(); + if (cpu_feature_enabled(X86_FEATURE_FRED)) + fred_setup_apic(); + else + idt_setup_apic_and_irq_gates(); + lapic_assign_system_vectors(); if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs()) { diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 6143ad56008e..21eeba7b188f 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -1542,6 +1542,12 @@ void set_sysvec_handler(unsigned int i, system_interrupt_handler func) system_interrupt_handlers[i] = func; } +bool is_sysvec_used(unsigned int i) +{ + BUG_ON(i >= NR_SYSTEM_VECTORS); + return system_interrupt_handlers[i] != dispatch_table_spurious_interrupt; +} + int external_interrupt(struct pt_regs *regs) { unsigned int vector = regs->vector; @@ -1577,7 +1583,10 @@ void __init trap_init(void) /* Initialize TSS before setting up traps so ISTs work */ cpu_init_exception_handling(); + /* Setup traps as cpu_init() might #GP */ - idt_setup_traps(); + if (!cpu_feature_enabled(X86_FEATURE_FRED)) + idt_setup_traps(); + cpu_init(); }