From patchwork Wed Feb 1 20:43:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Usama Arif X-Patchwork-Id: 51563 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp502887wrn; Wed, 1 Feb 2023 12:47:24 -0800 (PST) X-Google-Smtp-Source: AK7set8DupeBuTB8IVw9Pjhy5F0iSRRnVE7aWVH2GrB7EqfauyHvNAmM+3dtfQOcj+4q1x6VvRi/ X-Received: by 2002:a17:903:1251:b0:196:88e0:ea11 with SMTP id u17-20020a170903125100b0019688e0ea11mr4766138plh.35.1675284444452; Wed, 01 Feb 2023 12:47:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675284444; cv=none; d=google.com; s=arc-20160816; b=T34pnZVHHlMAGOqaZSYbolGwD8WNzWYN0doA+05IkEt0S+n4OEcERG1nq+mFz2OWKs zyyjTC4RwUEN+4oHTiPigQlx6BLKlK8kAuq5xyP0a2vMrWPj1IgeoVCiflzUlcv1v4y/ 3pKObEyw/fCTv0MGD5MVY8mj2nHDDo/CojLO5OGhKBp6RZWgl/KDUC0pg41M0QtlvZ+E 0iDhpZIvCzp0GhgVcWnPAbh87Na5O5wNk1dTjeSiufd4S/SzIN2t+aofIxCkCKxmcfNp aHszxyLnGFYgS/M6UAdD3flN9IAz6Zw+bDQwbQUCl9UZ+PL71Wnu/zd3WNPb8adstTyz G9cQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=1hZJEiSfBX2wQvagNcOApmVWpKDmG3BeofJQJZakBEQ=; b=CpaQL6nt8XRsS1dFa2+F5+pBo0NWZ35711fdASq1tG1zTlkThIXe+UFPatbrh67cET opj+fbw+PJyi11jmJw7dQYevtNDfekEs+4nmDpAV0sZ8RlpDvA/EFMbePmjXMdQt7XBQ O+o4ni6a3HF2JX7Vn5jG6AxtOA0H2QyOV0c3IdWs6MkVgTEmZQN3ncYhBV+LRxFLRtsH ClKIItnJsDBPLc8DuR3PTFt9lbq+HBeBlYLAHQrjCS3oWnQ6XNrmTCdIIAfZ+5MyOEee 6eutI9TQfDbtG4yH+pRKb28u57L8pmUvjY1ujvQRSLZcC6xlFXkiKFGkrCcJjDQ9Tgk5 KmYw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=zJiYzcq9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x11-20020a170902e04b00b001978f58e452si7222173plx.309.2023.02.01.12.47.12; Wed, 01 Feb 2023 12:47:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=zJiYzcq9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231790AbjBAUqP (ORCPT + 99 others); Wed, 1 Feb 2023 15:46:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53638 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231339AbjBAUpf (ORCPT ); Wed, 1 Feb 2023 15:45:35 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C5587A4B1 for ; Wed, 1 Feb 2023 12:44:54 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id y1so18518744wru.2 for ; Wed, 01 Feb 2023 12:44:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1hZJEiSfBX2wQvagNcOApmVWpKDmG3BeofJQJZakBEQ=; b=zJiYzcq95uUppV8U6o/cbPusjbNQE0o3HbAIMVOiv8LCM8VygpLlZ5GDbSegiWpg5m DUK+f3RMDblHEw4eyuXVDoIFxoNlPu8uoDWi/ziyTsnnSXX/5/mckEK7GJv3jXVAycei 3RyzBNj9GkalObeHryYwS606ZU84B5O5gnysw3h3/xYwE+GqaUz2c5mwVsaUQYnIjaOb f9RSwG8hsrn65hvOHCQzy+z60nj9clVFtrJnWBacrMFVyvOyZHVb+LOmncIYzy5IJ9os tbeRKOWqMwAJRJD7s+R0mbYHgcuAqd708npbNyExf2ytnGFoZasNO9AkGu1tzWaGRTXt BQdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1hZJEiSfBX2wQvagNcOApmVWpKDmG3BeofJQJZakBEQ=; b=lx5mz7UwwWP4gJr2z3zoI6xc9Ak27ODo0x80hlE1xc4ukYKbKnnNQz5Xoml18LjUg/ ryiJsMDYPONzdXxxWDP/k75HOh/7hV2kHACeuzfPKOtjzcRCDAi0k3yexCE2UDQlvhi3 TCPpv3Qqw+edcE7fVOiIZwWqI3cyDxk+UQ6UK6PCTzvrej/d0wmR9ObeBdDnOiyIDsrt 8Gi0z0B3Wryjhb0lcAJ/Tr/EaJCz0dUH4rm5SNqAXqi8QkoS0URQI0ZsE0Sg1iH1QsP6 DfQcErk0xMqC2hmckn3hshme6t5XysMxwbmGbfMdBxAOwirMMVB8GW+SHV1NVjiSGgNB 5emA== X-Gm-Message-State: AO0yUKVN7YSUzr5fFXMMBQ8FmapfHNVdALaJPyKgNFZoah3OgKpxQkZA RbylminGEHiYVdbQm7MRPIQxxA== X-Received: by 2002:adf:f911:0:b0:2bd:e7a0:6b5e with SMTP id b17-20020adff911000000b002bde7a06b5emr3423929wrr.40.1675284292702; Wed, 01 Feb 2023 12:44:52 -0800 (PST) Received: from usaari01.cust.communityfibre.co.uk ([2a02:6b6a:b566:0:7611:c340:3d8d:d46c]) by smtp.gmail.com with ESMTPSA id n15-20020a5d598f000000b002bdff778d87sm19993584wri.34.2023.02.01.12.44.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Feb 2023 12:44:52 -0800 (PST) From: Usama Arif To: dwmw2@infradead.org, tglx@linutronix.de Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com, paulmck@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com, thomas.lendacky@amd.com, seanjc@google.com, pmenzel@molgen.mpg.de, fam.zheng@bytedance.com, punit.agrawal@bytedance.com, simon.evans@bytedance.com, liangma@liangbit.com, Usama Arif Subject: [PATCH 6/9] x86/smpboot: Support parallel startup of secondary CPUs Date: Wed, 1 Feb 2023 20:43:35 +0000 Message-Id: <20230201204338.1337562-7-usama.arif@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230201204338.1337562-1-usama.arif@bytedance.com> References: <20230201204338.1337562-1-usama.arif@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756663061805841292?= X-GMAIL-MSGID: =?utf-8?q?1756663061805841292?= From: Thomas Gleixner To allow for parallel AP bringup, we need to avoid the use of global variables for passing information to the APs, as well as preventing them from all trying to use the same real-mode stack simultaneously. So, introduce a 'lock' field in struct trampoline_header to use as a simple bit-spinlock for the real-mode stack. That lock also protects the global variables initial_gs, initial_stack and early_gdt_descr, which can now be calculated... So how do we calculate those addresses? Well, they they can all be found from the per_cpu data for this CPU. Simples! Except... how does it know what its CPU# is? OK, we export the cpuid_to_apicid[] array and it can search it to find its APIC ID in there. But now you whine at me that it doesn't even know its APIC ID? Well, if it's a relatively modern CPU then the APIC ID is in CPUID leaf 0x0B so we can use that. Otherwise... erm... OK, otherwise it can't have parallel CPU bringup for now. We'll still use a global variable for those CPUs and bring them up one at a time. So add a global 'smpboot_control' field which either contains the APIC ID, or a flag indicating that it can be found in CPUID. This adds the 'do_parallel_bringup' flag in preparation but doesn't actually enable parallel bringup yet. [ dwmw2: Minor tweaks, write a commit message ] [ seanc: Fix stray override of initial_gs in common_cpu_up() ] [ Usama Arif: Disable parallel bringup for AMD CPUs] Not-signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Usama Arif --- arch/x86/include/asm/realmode.h | 3 ++ arch/x86/include/asm/smp.h | 9 +++- arch/x86/kernel/acpi/sleep.c | 1 + arch/x86/kernel/apic/apic.c | 2 +- arch/x86/kernel/head_64.S | 73 ++++++++++++++++++++++++++++ arch/x86/kernel/smpboot.c | 35 +++++++++++-- arch/x86/realmode/init.c | 3 ++ arch/x86/realmode/rm/trampoline_64.S | 14 ++++++ kernel/smpboot.c | 2 +- 9 files changed, 135 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index fd6f6e5b755a..503336462264 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -52,6 +52,7 @@ struct trampoline_header { u64 efer; u32 cr4; u32 flags; + u32 lock; #endif }; @@ -65,6 +66,8 @@ extern unsigned long initial_stack; extern unsigned long initial_vc_handler; #endif +extern u32 *trampoline_lock; + extern unsigned char real_mode_blob[]; extern unsigned char real_mode_relocs[]; diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index a73bced40e24..58ddb2f78092 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -201,5 +201,12 @@ extern void nmi_selftest(void); #define nmi_selftest() do { } while (0) #endif -#endif /* __ASSEMBLY__ */ +extern unsigned int smpboot_control; + +#endif /* !__ASSEMBLY__ */ + +/* Control bits for startup_64 */ +#define STARTUP_PARALLEL 0x80000000 +#define STARTUP_SECONDARY 0x40000000 + #endif /* _ASM_X86_SMP_H */ diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index 3b7f4cdbf2e0..06adf340a0f1 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -115,6 +115,7 @@ int x86_acpi_suspend_lowlevel(void) early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(smp_processor_id()); initial_gs = per_cpu_offset(smp_processor_id()); + smpboot_control = 0; #endif initial_code = (unsigned long)wakeup_long64; saved_magic = 0x123456789abcdef0L; diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index c6876d3ea4b1..a841c4b4083f 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2374,7 +2374,7 @@ static int nr_logical_cpuids = 1; /* * Used to store mapping between logical CPU IDs and APIC IDs. */ -static int cpuid_to_apicid[] = { +int cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = -1, }; diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index d860d437631b..dd931c049acb 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -25,6 +25,7 @@ #include #include #include +#include /* * We are not able to switch in one step to the final KERNEL ADDRESS SPACE @@ -241,6 +242,66 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) UNWIND_HINT_EMPTY ANNOTATE_NOENDBR // above + /* + * Is this the boot CPU coming up? If so everything is available + * in initial_gs, initial_stack and early_gdt_descr. + */ + movl smpboot_control(%rip), %eax + testl %eax, %eax + jz .Lsetup_cpu + + /* + * Secondary CPUs find out the offsets via the APIC ID. For parallel + * boot the APIC ID is retrieved from CPUID, otherwise it's encoded + * in smpboot_control: + * Bit 0-29 APIC ID if STARTUP_PARALLEL flag is not set + * Bit 30 STARTUP_SECONDARY flag + * Bit 31 STARTUP_PARALLEL flag (use CPUID 0x0b for APIC ID) + */ + testl $STARTUP_PARALLEL, %eax + jnz .Luse_cpuid_0b + andl $0x0FFFFFFF, %eax + jmp .Lsetup_AP + +.Luse_cpuid_0b: + mov $0x0B, %eax + xorl %ecx, %ecx + cpuid + mov %edx, %eax + +.Lsetup_AP: + /* EAX contains the APICID of the current CPU */ + xorl %ecx, %ecx + leaq cpuid_to_apicid(%rip), %rbx + +.Lfind_cpunr: + cmpl (%rbx), %eax + jz .Linit_cpu_data + addq $4, %rbx + addq $8, %rcx + jmp .Lfind_cpunr + +.Linit_cpu_data: + /* Get the per cpu offset */ + leaq __per_cpu_offset(%rip), %rbx + addq %rcx, %rbx + movq (%rbx), %rbx + /* Save it for GS BASE setup */ + movq %rbx, initial_gs(%rip) + + /* Calculate the GDT address */ + movq $gdt_page, %rcx + addq %rbx, %rcx + movq %rcx, early_gdt_descr_base(%rip) + + /* Find the idle task stack */ + movq $idle_threads, %rcx + addq %rbx, %rcx + movq (%rcx), %rcx + movq TASK_threadsp(%rcx), %rcx + movq %rcx, initial_stack(%rip) + +.Lsetup_cpu: /* * We must switch to a new descriptor in kernel space for the GDT * because soon the kernel won't have access anymore to the userspace @@ -281,6 +342,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) */ movq initial_stack(%rip), %rsp + /* Drop the realmode protection. For the boot CPU the pointer is NULL! */ + movq trampoline_lock(%rip), %rax + testq %rax, %rax + jz .Lsetup_idt + lock + btrl $0, (%rax) + +.Lsetup_idt: /* Setup and Load IDT */ pushq %rsi call early_setup_idt @@ -425,6 +494,7 @@ SYM_DATA(initial_vc_handler, .quad handle_vc_boot_ghcb) * reliably detect the end of the stack. */ SYM_DATA(initial_stack, .quad init_thread_union + THREAD_SIZE - FRAME_SIZE) +SYM_DATA(trampoline_lock, .quad 0); __FINITDATA __INIT @@ -659,6 +729,9 @@ SYM_DATA_END(level1_fixmap_pgt) SYM_DATA(early_gdt_descr, .word GDT_ENTRIES*8-1) SYM_DATA_LOCAL(early_gdt_descr_base, .quad INIT_PER_CPU_VAR(gdt_page)) + .align 16 +SYM_DATA(smpboot_control, .long 0) + .align 16 /* This must match the first entry in level2_kernel_pgt */ SYM_DATA(phys_base, .quad 0x0) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 98604a3953e6..44e9f7ae5afc 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -799,6 +799,16 @@ static int __init cpu_init_udelay(char *str) } early_param("cpu_init_udelay", cpu_init_udelay); +static bool do_parallel_bringup = true; + +static int __init no_parallel_bringup(char *str) +{ + do_parallel_bringup = false; + + return 0; +} +early_param("no_parallel_bringup", no_parallel_bringup); + static void __init smp_quirk_init_udelay(void) { /* if cmdline changed it from default, leave it alone */ @@ -1086,8 +1096,6 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle) #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle); -#else - initial_gs = per_cpu_offset(cpu); #endif return 0; } @@ -1112,9 +1120,16 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, start_ip = real_mode_header->trampoline_start64; #endif idle->thread.sp = (unsigned long)task_pt_regs(idle); - early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu); initial_code = (unsigned long)start_secondary; - initial_stack = idle->thread.sp; + + if (IS_ENABLED(CONFIG_X86_32)) { + early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu); + initial_stack = idle->thread.sp; + } else if (do_parallel_bringup) { + smpboot_control = STARTUP_SECONDARY | STARTUP_PARALLEL; + } else { + smpboot_control = STARTUP_SECONDARY | apicid; + } /* Enable the espfix hack for this CPU */ init_espfix_ap(cpu); @@ -1516,6 +1531,18 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) speculative_store_bypass_ht_init(); + /* + * We can do 64-bit AP bringup in parallel if the CPU reports its + * APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not + * for SEV-ES guests because they can't use CPUID that early. + * Also, some AMD CPUs crash when doing parallel cpu bringup, disable + * it for all AMD CPUs to be on the safe side. + */ + if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B || + cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) || + boot_cpu_data.x86_vendor == X86_VENDOR_AMD) + do_parallel_bringup = false; + snp_set_wakeup_secondary_cpu(); } diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 41d7669a97ad..8658ba456254 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -154,6 +154,9 @@ static void __init setup_real_mode(void) trampoline_header->flags = 0; + trampoline_lock = &trampoline_header->lock; + *trampoline_lock = 0; + trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd); /* Map the real mode stub as virtual == physical */ diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S index e38d61d6562e..49ebc1636ffd 100644 --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -49,6 +49,19 @@ SYM_CODE_START(trampoline_start) mov %ax, %es mov %ax, %ss + /* + * Make sure only one CPU fiddles with the realmode stack + */ +.Llock_rm: + btl $0, tr_lock + jnc 2f + pause + jmp .Llock_rm +2: + lock + btsl $0, tr_lock + jc .Llock_rm + # Setup stack movl $rm_stack_end, %esp @@ -241,6 +254,7 @@ SYM_DATA_START(trampoline_header) SYM_DATA(tr_efer, .space 8) SYM_DATA(tr_cr4, .space 4) SYM_DATA(tr_flags, .space 4) + SYM_DATA(tr_lock, .space 4) SYM_DATA_END(trampoline_header) #include "trampoline_common.S" diff --git a/kernel/smpboot.c b/kernel/smpboot.c index 2c7396da470c..a18a21dff9bc 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -25,7 +25,7 @@ * For the hotplug case we keep the task structs around and reuse * them. */ -static DEFINE_PER_CPU(struct task_struct *, idle_threads); +DEFINE_PER_CPU(struct task_struct *, idle_threads); struct task_struct *idle_thread_get(unsigned int cpu) {