Message ID | 20230302111227.2102545-1-usama.arif@bytedance.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp4174805wrd; Thu, 2 Mar 2023 03:17:05 -0800 (PST) X-Google-Smtp-Source: AK7set9YrlJKvqBx7827IFvE0W8oLjg7Gp9WnQfjYLV6gCR79AQAzF7PkypNwCmrWQq37os+aXEe X-Received: by 2002:aa7:d546:0:b0:4ac:d3bc:cb0d with SMTP id u6-20020aa7d546000000b004acd3bccb0dmr9474325edr.3.1677755824945; Thu, 02 Mar 2023 03:17:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1677755824; cv=none; d=google.com; s=arc-20160816; b=yizV64JyKW54r6K3Oy91G8LR+ld8gFbZlaSuOUrmlJ2YpxzLjRobb+rzuMjhYYonXt rd+2hs/+l9CFRSkZNaB/2IMHLoa/NUk1nHBNY81Y3lscamtni54F+pJy0ZmAxoY7DeBv mChH0tGAN/0ks1uS5fhrVy36TM/rIJp/N1DxbS/SMBPCMGMPO35++GqRQxaJDDVA9XsY 2figo1zzWUtm7GLYu39z5UOQ4qAlZDa21vx4GnEAKlOGNDQqPMfzX9Wwn7W6KbAQscMo ZlSuTc3KJU5NzI2oKO+EoAAcEKfvz6DkQLORWn5MKIHqjpdDAmT0OoV8VxOyiSSeLLrG L1Iw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=SmbgYBfsiP69zAShGBqEQbH1PuV83xuFSvcJJxGMMf4=; b=N18L8mxz8Qnvhe9lVTxsUuzrnPuJw3ILS8/5EHI2iI0CINYVTrdHOjpEQSgi6zBpMd CzcjD3/TCa/ff/Jh0htMT2az4XE20N722bb63Q+MCyLO73TsQqiWWBXKM5cyglw4JwpG ujW1RejPaZz77aSkI/d2JM0JAtZCl3SViIajPZi5iFLZCx/Oba5uiQsLm690AD+MfC0i 0HvkR93CY/lmIdD8X9yeE3fLhreQ+NSRg6Z6U2PowfQwEAanX8lIKa3aoQOP4HaBeSLZ cgAP2fEQNN/jWgApkyiPC50M4jRGBQ+fpUhKxiLtR5QOg6v/yTu5PZ3Cpk+rtJ98DAL+ ZBnA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=DrUlMSLx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w3-20020a056402128300b004abb522bf9bsi7118971edv.142.2023.03.02.03.16.42; Thu, 02 Mar 2023 03:17:04 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=DrUlMSLx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230227AbjCBLNE (ORCPT <rfc822;davidbtadokoro@gmail.com> + 99 others); Thu, 2 Mar 2023 06:13:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230062AbjCBLM6 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 2 Mar 2023 06:12:58 -0500 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7245C3E082 for <linux-kernel@vger.kernel.org>; Thu, 2 Mar 2023 03:12:32 -0800 (PST) Received: by mail-wr1-x429.google.com with SMTP id j2so16119920wrh.9 for <linux-kernel@vger.kernel.org>; Thu, 02 Mar 2023 03:12:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=SmbgYBfsiP69zAShGBqEQbH1PuV83xuFSvcJJxGMMf4=; b=DrUlMSLxCfk+wenY5dGy8zdp8YdI8rwkdVkiM9b4xwhFqbBcP8gKQGiVekkShAMDGb siTZabe8yR4OsN5yFqEf8tWw23l9QEZd1B8mQN/xwWdY2GwRCgDaxdwXnDBXjWcOhFep zmavEFKBv6RbLcB3Y4wZJ7XWc2Fw37ZaUdqhU2vo9HMB8jdBrrHp5RvaQSaEkhP0n5dw ozot/+J4p8hyqVnp0u0amlEz5wYTixfb5YB7sMyyJkGiJlNuYRzrFRAk57eMqqeeEF3P UCxjArfwFZ+h97ZOj0DuNvdL4gSBPKxwVkKyrRIUuILFpN+yMY6Wce6+3NqU2ghB2TcL RRJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=SmbgYBfsiP69zAShGBqEQbH1PuV83xuFSvcJJxGMMf4=; b=kjAZDS8M++0LPY0gBhanrztmRDXCzaRIi/H6aWQmyUtTdvjoxHBtMbbpWGBmCYOEeV Zn8FQF+Xg2B6jNVdOSrIDRtW+Iqh1avCClxKanZhcPRCIzJchnLNwctVs59/xla0BeFN mTqfkUXeFtyeNoh3PFqQiNVixYhhOKoRfOyzQMVQ2xyBIExvIJlv2FSr9w5ZPvvbOVhJ n7xPK1p4dqWb7EUCLjZayCKEsROkYIQUblOxCi+1Nz9FRVZaw1ScNjCWzXsJjzBylYzx wtyfj97yNDrZEWgJALJQNKUKjL1RZxFhNiYF/0L84DlMUdsUC7VCJPuYXi0BqbjOpFD9 +3lg== X-Gm-Message-State: AO0yUKUt6dPbGGBiJVkhbPlaVX2hLoSjO1yr43o/Ticxxk3gDuOO5tp8 khDGzm1SaQ0VHjdPFe3bFj2HmQ== X-Received: by 2002:adf:fecd:0:b0:2c7:13e4:2094 with SMTP id q13-20020adffecd000000b002c713e42094mr7740020wrs.42.1677755550902; Thu, 02 Mar 2023 03:12:30 -0800 (PST) Received: from usaari01.cust.communityfibre.co.uk ([2a02:6b6a:b566:0:11aa:3c13:d3e:eb29]) by smtp.gmail.com with ESMTPSA id a5-20020a5d4565000000b002c3f81c51b6sm14724830wrc.90.2023.03.02.03.12.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Mar 2023 03:12:30 -0800 (PST) From: Usama Arif <usama.arif@bytedance.com> To: dwmw2@infradead.org, tglx@linutronix.de, kim.phillips@amd.com, brgerst@gmail.com Cc: piotrgorski@cachyos.org, oleksandr@natalenko.name, arjan@linux.intel.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com, paulmck@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com, thomas.lendacky@amd.com, seanjc@google.com, pmenzel@molgen.mpg.de, fam.zheng@bytedance.com, punit.agrawal@bytedance.com, simon.evans@bytedance.com, liangma@liangbit.com, Usama Arif <usama.arif@bytedance.com> Subject: [PATCH v13 00/11] Parallel CPU bringup for x86_64 Date: Thu, 2 Mar 2023 11:12:16 +0000 Message-Id: <20230302111227.2102545-1-usama.arif@bytedance.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759254491929699838?= X-GMAIL-MSGID: =?utf-8?q?1759254491929699838?= |
Series |
Parallel CPU bringup for x86_64
|
|
Message
Usama Arif
March 2, 2023, 11:12 a.m. UTC
The main code change over v12 is to fix the build error when CONFIG_FORCE_NR_CPUS is present. The commit message for removing initial stack has also been improved, typos have been fixed and extra comments have been added to make code clearer. Thanks, Usama Changes across versions: v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more v3: Clean up x2apic patch, add MTRR optimisation, lock topology update in preparation for more parallelisation. v4: Fixes to the real mode parallelisation patch spotted by SeanC, to avoid scribbling on initial_gs in common_cpu_up(), and to allow all 24 bits of the physical X2APIC ID to be used. That patch still needs a Signed-off-by from its original author, who once claimed not to remember writing it at all. But now we've fixed it, hopefully he'll admit it now :) v5: rebase to v6.1 and remeasure performance, disable parallel bringup for AMD CPUs. v6: rebase to v6.2-rc6, disabled parallel boot on amd as a cpu bug and reused timer calibration for secondary CPUs. v7: [David Woodhouse] iterate over all possible CPUs to find any existing cluster mask in alloc_clustermask. (patch 1/9) Keep parallel AMD support enabled in AMD, using APIC ID in CPUID leaf 0x0B (for x2APIC mode) or CPUID leaf 0x01 where 8 bits are sufficient. Included sanity checks for APIC id from 0x0B. (patch 6/9) Removed patch for reusing timer calibration for secondary CPUs. commit message and code improvements. v8: Fix CPU0 hotplug by setting up the initial_gs, initial_stack and early_gdt_descr. Drop trampoline lock and bail if APIC ID not found in find_cpunr. Code comments improved and debug prints added. v9: Drop patch to avoid repeated saves of MTRR at boot time. rebased and retested at v6.2-rc8. added kernel doc for no_parallel_bringup and made do_parallel_bringup __ro_after_init. v10: Fixed suspend/resume not working with parallel smpboot. rebased and retested to 6.2. fixed checkpatch errors. v11: Added patches from Brian Gerst to remove the global variables initial_gs, initial_stack, and early_gdt_descr from the 64-bit boot code (https://lore.kernel.org/all/20230222221301.245890-1-brgerst@gmail.com/). v12: Fixed compilation errors, acquire tr_lock for every stack setup in trampoline_64.S. Rearranged commits for a cleaner git history. v13: Fix build error with CONFIG_FORCE_NR_CPUS. Commit message improved, typos fixed and extra comments added. Brian Gerst (3): x86/smpboot: Remove initial_stack on 64-bit x86/smpboot: Remove early_gdt_descr on 64-bit x86/smpboot: Remove initial_gs David Woodhouse (8): x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h> cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU x86/smpboot: Reference count on smpboot_setup_warm_reset_vector() x86/smpboot: Split up native_cpu_up into separate phases and document them x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel x86/smpboot: Serialize topology updates for secondary bringup .../admin-guide/kernel-parameters.txt | 3 + arch/x86/include/asm/processor.h | 6 +- arch/x86/include/asm/realmode.h | 4 +- arch/x86/include/asm/smp.h | 15 +- arch/x86/include/asm/topology.h | 2 - arch/x86/kernel/acpi/sleep.c | 30 +- arch/x86/kernel/apic/apic.c | 2 +- arch/x86/kernel/apic/x2apic_cluster.c | 126 ++++--- arch/x86/kernel/asm-offsets.c | 1 + arch/x86/kernel/cpu/common.c | 6 +- arch/x86/kernel/head_64.S | 132 +++++-- arch/x86/kernel/smpboot.c | 350 +++++++++++++----- arch/x86/realmode/init.c | 3 + arch/x86/realmode/rm/trampoline_64.S | 27 +- arch/x86/xen/smp_pv.c | 4 +- arch/x86/xen/xen-head.S | 2 +- include/linux/cpuhotplug.h | 2 + include/linux/smpboot.h | 7 + kernel/cpu.c | 31 +- kernel/smpboot.h | 2 - 20 files changed, 556 insertions(+), 199 deletions(-)
Comments
On Thu, 2023-03-02 at 11:12 +0000, Usama Arif wrote: > The main code change over v12 is to fix the build error when > CONFIG_FORCE_NR_CPUS is present. > > The commit message for removing initial stack has also been improved, typos > have been fixed and extra comments have been added to make code clearer. Might something like this make it work in parallel with SEV-SNP? If so, I can clean it up and adjust the C code to actually invoke it... diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h index b8357d6ecd47..f25df4bd318e 100644 --- a/arch/x86/include/asm/sev-common.h +++ b/arch/x86/include/asm/sev-common.h @@ -70,6 +70,7 @@ /* GHCBData[63:12] */ \ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12) +#ifndef __ASSEMBLY__ /* * SNP Page State Change Operation * @@ -160,6 +161,8 @@ struct snp_psc_desc { #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK) +#endif /* __ASSEMBLY__ */ + /* * Error codes related to GHCB input that can be communicated back to the guest * by setting the lower 32-bits of the GHCB SW_EXITINFO1 field to 2. diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index defe76ee9e64..0c26f80f488c 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -204,6 +204,7 @@ extern unsigned int smpboot_control; /* Control bits for startup_64 */ #define STARTUP_APICID_CPUID_0B 0x80000000 #define STARTUP_APICID_CPUID_01 0x40000000 +#define STARTUP_APICID_SEV_SNP 0x20000000 #define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | STARTUP_APICID_CPUID_0B) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index c35f7c173832..b2571034562c 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -26,7 +26,7 @@ #include <asm/nospec-branch.h> #include <asm/fixmap.h> #include <asm/smp.h> - +#include <asm/sev-common.h> /* * We are not able to switch in one step to the final KERNEL ADDRESS SPACE * because we need identity-mapped pages. @@ -242,6 +242,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) * * Bit 31 STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b) * Bit 30 STARTUP_APICID_CPUID_01 flag (use CPUID 0x01) + * Bit 29 STARTUP_APICID_SEV_SNP flag (CPUID 0x0v via GHCB MSR) * Bit 0-24 CPU# if STARTUP_APICID_CPUID_xx flags are not set */ movl smpboot_control(%rip), %ecx @@ -249,6 +250,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) jnz .Luse_cpuid_0b testl $STARTUP_APICID_CPUID_01, %ecx jnz .Luse_cpuid_01 + testl $STARTUP_APICID_SEV_SNP, %ecx + jnz .Luse_sev_cpuid_0b andl $0x0FFFFFFF, %ecx jmp .Lsetup_cpu @@ -259,6 +262,18 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) shr $24, %edx jmp .Lsetup_AP +.Luse_sev_cpuid_0b: + movl $MSR_AMD64_SEV_ES_GHCB, %ecx + # GHCB_CPUID_REQ(0x0b, GHCB_CPUID_REQ_EDX) + movq $0xc00000040000000b, %rax + xorl %edx, %edx + vmgexit + rdmsr + andl $GHCB_MSR_INFO_MASK, %eax + cmpl $GHCB_MSR_CPUID_RESP, %eax + jne 1f + jmp .Lsetup_AP + .Luse_cpuid_0b: mov $0x0B, %eax xorl %ecx, %ecx
On 3/7/23 08:42, David Woodhouse wrote: > On Thu, 2023-03-02 at 11:12 +0000, Usama Arif wrote: >> The main code change over v12 is to fix the build error when >> CONFIG_FORCE_NR_CPUS is present. >> >> The commit message for removing initial stack has also been improved, typos >> have been fixed and extra comments have been added to make code clearer. > > Might something like this make it work in parallel with SEV-SNP? If so, > I can clean it up and adjust the C code to actually invoke it... This should be ok for both SEV-ES and SEV-SNP. > > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h > index b8357d6ecd47..f25df4bd318e 100644 > --- a/arch/x86/include/asm/sev-common.h > +++ b/arch/x86/include/asm/sev-common.h > @@ -70,6 +70,7 @@ > /* GHCBData[63:12] */ \ > (((u64)(v) & GENMASK_ULL(63, 12)) >> 12) > > +#ifndef __ASSEMBLY__ > /* > * SNP Page State Change Operation > * > @@ -160,6 +161,8 @@ struct snp_psc_desc { > > #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK) > > +#endif /* __ASSEMBLY__ */ > + > /* > * Error codes related to GHCB input that can be communicated back to the guest > * by setting the lower 32-bits of the GHCB SW_EXITINFO1 field to 2. > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h > index defe76ee9e64..0c26f80f488c 100644 > --- a/arch/x86/include/asm/smp.h > +++ b/arch/x86/include/asm/smp.h > @@ -204,6 +204,7 @@ extern unsigned int smpboot_control; > /* Control bits for startup_64 */ > #define STARTUP_APICID_CPUID_0B 0x80000000 > #define STARTUP_APICID_CPUID_01 0x40000000 > +#define STARTUP_APICID_SEV_SNP 0x20000000 > > #define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | STARTUP_APICID_CPUID_0B) > > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S > index c35f7c173832..b2571034562c 100644 > --- a/arch/x86/kernel/head_64.S > +++ b/arch/x86/kernel/head_64.S > @@ -26,7 +26,7 @@ > #include <asm/nospec-branch.h> > #include <asm/fixmap.h> > #include <asm/smp.h> > - > +#include <asm/sev-common.h> > /* > * We are not able to switch in one step to the final KERNEL ADDRESS SPACE > * because we need identity-mapped pages. > @@ -242,6 +242,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) > * > * Bit 31 STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b) > * Bit 30 STARTUP_APICID_CPUID_01 flag (use CPUID 0x01) > + * Bit 29 STARTUP_APICID_SEV_SNP flag (CPUID 0x0v via GHCB MSR) > * Bit 0-24 CPU# if STARTUP_APICID_CPUID_xx flags are not set > */ > movl smpboot_control(%rip), %ecx > @@ -249,6 +250,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) > jnz .Luse_cpuid_0b > testl $STARTUP_APICID_CPUID_01, %ecx > jnz .Luse_cpuid_01 > + testl $STARTUP_APICID_SEV_SNP, %ecx > + jnz .Luse_sev_cpuid_0b > andl $0x0FFFFFFF, %ecx > jmp .Lsetup_cpu > > @@ -259,6 +262,18 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) > shr $24, %edx > jmp .Lsetup_AP > > +.Luse_sev_cpuid_0b: > + movl $MSR_AMD64_SEV_ES_GHCB, %ecx > + # GHCB_CPUID_REQ(0x0b, GHCB_CPUID_REQ_EDX) > + movq $0xc00000040000000b, %rax > + xorl %edx, %edx > + vmgexit According to the GHCB spec, the GHCB MSR protocol is triggered when the GHCB value is non-zero in bits 11:0. For the CPUID function, bits 63:32 hold the CPUID function, bits 31:30 hold the requested register and bits 11:0 == 0x4. So this should be: /* Set the GHCB MSR to request CPUID 0xB_EDX */ movl $MSR_AMD64_SEV_ES_GHCB, %ecx movl $0xc0000004, %eax movl $0xb, %edx wrmsr /* Perform GHCB MSR protocol */ vmgexit /* * Get the result. After the RDMSR: * EAX should be 0xc0000005 * EDX should have the CPUID register value and since EDX * is the target register, no need to move the result. */ > + rdmsr > + andl $GHCB_MSR_INFO_MASK, %eax > + cmpl $GHCB_MSR_CPUID_RESP, %eax > + jne 1f > + jmp .Lsetup_AP > + > .Luse_cpuid_0b: > mov $0x0B, %eax > xorl %ecx, %ecx > Thanks, Tom >
On Tue, 2023-03-07 at 10:45 -0600, Tom Lendacky wrote: > On 3/7/23 08:42, David Woodhouse wrote: > > On Thu, 2023-03-02 at 11:12 +0000, Usama Arif wrote: > > > The main code change over v12 is to fix the build error when > > > CONFIG_FORCE_NR_CPUS is present. > > > > > > The commit message for removing initial stack has also been improved, typos > > > have been fixed and extra comments have been added to make code clearer. > > > > Might something like this make it work in parallel with SEV-SNP? If so, > > I can clean it up and adjust the C code to actually invoke it... > > This should be ok for both SEV-ES and SEV-SNP. Thanks. So... something like this then? Is static_branch_unlikely(&sev_es_enable_key) the right thing to use, and does that cover SNP too? Pushed to https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc8-v12bis From d03aed91e5cfe840f4b18820fe6aed1765687321 Mon Sep 17 00:00:00 2001 From: David Woodhouse <dwmw@amazon.co.uk> Date: Tue, 7 Mar 2023 19:06:50 +0000 Subject: [PATCH] x86/smpboot: Allow parallel bringup for SEV-ES Enable parallel bringup for SEV-ES guests. The APs can't actually execute the CPUID instruction directly during early startup, but they can make the GHCB call directly instead, just as the VC trap handler would do. Factor out a prepare_parallel_bringup() function to help reduce the level of complexity by allowing a simple 'return false' in the bail-out cases/ Thanks to Sabin for talking me through the way this works. Suggested-by: Sabin Rapan <sabrapan@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> --- arch/x86/include/asm/sev-common.h | 3 + arch/x86/include/asm/smp.h | 3 +- arch/x86/kernel/head_64.S | 27 ++++++- arch/x86/kernel/smpboot.c | 112 ++++++++++++++++++------------ 4 files changed, 98 insertions(+), 47 deletions(-) diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h index b8357d6ecd47..f25df4bd318e 100644 --- a/arch/x86/include/asm/sev-common.h +++ b/arch/x86/include/asm/sev-common.h @@ -70,6 +70,7 @@ /* GHCBData[63:12] */ \ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12) +#ifndef __ASSEMBLY__ /* * SNP Page State Change Operation * @@ -160,6 +161,8 @@ struct snp_psc_desc { #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK) +#endif /* __ASSEMBLY__ */ + /* * Error codes related to GHCB input that can be communicated back to the guest * by setting the lower 32-bits of the GHCB SW_EXITINFO1 field to 2. diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index defe76ee9e64..b3f67a764bfa 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -204,7 +204,8 @@ extern unsigned int smpboot_control; /* Control bits for startup_64 */ #define STARTUP_APICID_CPUID_0B 0x80000000 #define STARTUP_APICID_CPUID_01 0x40000000 +#define STARTUP_APICID_SEV_ES 0x20000000 -#define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | STARTUP_APICID_CPUID_0B) +#define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | STARTUP_APICID_CPUID_0B | STARTUP_APICID_SEV_ES) #endif /* _ASM_X86_SMP_H */ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index c35f7c173832..3f5904eab678 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -26,7 +26,7 @@ #include <asm/nospec-branch.h> #include <asm/fixmap.h> #include <asm/smp.h> - +#include <asm/sev-common.h> /* * We are not able to switch in one step to the final KERNEL ADDRESS SPACE * because we need identity-mapped pages. @@ -242,6 +242,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) * * Bit 31 STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b) * Bit 30 STARTUP_APICID_CPUID_01 flag (use CPUID 0x01) + * Bit 29 STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR) * Bit 0-24 CPU# if STARTUP_APICID_CPUID_xx flags are not set */ movl smpboot_control(%rip), %ecx @@ -249,6 +250,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) jnz .Luse_cpuid_0b testl $STARTUP_APICID_CPUID_01, %ecx jnz .Luse_cpuid_01 + testl $STARTUP_APICID_SEV_ES, %ecx + jnz .Luse_sev_cpuid_0b andl $0x0FFFFFFF, %ecx jmp .Lsetup_cpu @@ -259,6 +262,28 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) shr $24, %edx jmp .Lsetup_AP +.Luse_sev_cpuid_0b: + /* Set the GHCB MSR to request CPUID 0xB_EDX */ + movl $MSR_AMD64_SEV_ES_GHCB, %ecx + movl $(GHCB_CPUID_REQ_EDX << 30) | GHCB_MSR_CPUID_REQ, %eax + movl $0x0B, %edx + wrmsr + + /* Perform GHCB MSR protocol */ + vmgexit + + /* + * Get the result. After the RDMSR: + * EAX should be 0xc0000005 + * EDX should have the CPUID register value and since EDX + * is the target register, no need to move the result. + */ + rdmsr + andl $GHCB_MSR_INFO_MASK, %eax + cmpl $GHCB_MSR_CPUID_RESP, %eax + jne 1f + jmp .Lsetup_AP + .Luse_cpuid_0b: mov $0x0B, %eax xorl %ecx, %ecx diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 9d956571ecc1..d194c4ffeef8 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1510,6 +1510,71 @@ void __init smp_prepare_cpus_common(void) set_cpu_sibling_map(0); } + +/* + * We can do 64-bit AP bringup in parallel if the CPU reports its APIC + * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC + * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too + * hard. And not for SEV-ES guests because they can't use CPUID that + * early. + */ +static bool __init prepare_parallel_bringup(void) +{ + if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1) + return false; + + if (x2apic_mode) { + unsigned int eax, ebx, ecx, edx; + + if (boot_cpu_data.cpuid_level < 0xb) + return false; + + /* + * To support parallel bringup in x2apic mode, the AP will need + * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has + * only 8 bits. Check that it is present and seems correct. + */ + cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); + + /* + * AMD says that if executed with an umimplemented level in + * ECX, then it will return all zeroes in EAX. Intel says it + * will return zeroes in both EAX and EBX. Checking only EAX + * should be sufficient. + */ + if (!eax) { + pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); + return false; + } + + if (IS_ENABLED(AMD_MEM_ENCRYPT) && static_branch_unlikely(&sev_es_enable_key)) { + pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); + smpboot_control = STARTUP_APICID_SEV_ES; + } else if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) { + /* + * Other forms of memory encryption need to implement a way of + * finding the APs' APIC IDs that early. + */ + return false; + } else { + pr_debug("Using CPUID 0xb for parallel CPU startup\n"); + smpboot_control = STARTUP_APICID_CPUID_0B; + } + } else { + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) + return false; + + /* Without X2APIC, what's in CPUID 0x01 should suffice. */ + pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); + smpboot_control = STARTUP_APICID_CPUID_01; + } + + cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", + native_cpu_kick, NULL); + + return true; +} + /* * Prepare for SMP bootup. * @max_cpus: configured maximum number of CPUs, It is a legacy parameter @@ -1550,51 +1615,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) speculative_store_bypass_ht_init(); - /* - * We can do 64-bit AP bringup in parallel if the CPU reports - * its APIC ID in CPUID (either leaf 0x0B if we need the full - * APIC ID in X2APIC mode, or leaf 0x01 if 8 bits are - * sufficient). Otherwise it's too hard. And not for SEV-ES - * guests because they can't use CPUID that early. - */ - if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 || - (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) || - cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) - do_parallel_bringup = false; - - if (do_parallel_bringup && x2apic_mode) { - unsigned int eax, ebx, ecx, edx; - - /* - * To support parallel bringup in x2apic mode, the AP will need - * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has - * only 8 bits. Check that it is present and seems correct. - */ - cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); - - /* - * AMD says that if executed with an umimplemented level in - * ECX, then it will return all zeroes in EAX. Intel says it - * will return zeroes in both EAX and EBX. Checking only EAX - * should be sufficient. - */ - if (eax) { - pr_debug("Using CPUID 0xb for parallel CPU startup\n"); - smpboot_control = STARTUP_APICID_CPUID_0B; - } else { - pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); - do_parallel_bringup = false; - } - } else if (do_parallel_bringup) { - /* Without X2APIC, what's in CPUID 0x01 should suffice. */ - pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); - smpboot_control = STARTUP_APICID_CPUID_01; - } - - if (do_parallel_bringup) { - cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", - native_cpu_kick, NULL); - } + if (do_parallel_bringup) + do_parallel_bringup = prepare_parallel_bringup(); snp_set_wakeup_secondary_cpu(); }
On 3/7/23 13:18, David Woodhouse wrote: > On Tue, 2023-03-07 at 10:45 -0600, Tom Lendacky wrote: >> On 3/7/23 08:42, David Woodhouse wrote: >>> On Thu, 2023-03-02 at 11:12 +0000, Usama Arif wrote: >>>> The main code change over v12 is to fix the build error when >>>> CONFIG_FORCE_NR_CPUS is present. >>>> >>>> The commit message for removing initial stack has also been improved, typos >>>> have been fixed and extra comments have been added to make code clearer. >>> >>> Might something like this make it work in parallel with SEV-SNP? If so, >>> I can clean it up and adjust the C code to actually invoke it... >> >> This should be ok for both SEV-ES and SEV-SNP. > > Thanks. So... something like this then? > > Is static_branch_unlikely(&sev_es_enable_key) the right thing to use, > and does that cover SNP too? Yes, that will cover SNP, too. > > Pushed to > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc8-v12bis > > From d03aed91e5cfe840f4b18820fe6aed1765687321 Mon Sep 17 00:00:00 2001 > From: David Woodhouse <dwmw@amazon.co.uk> > Date: Tue, 7 Mar 2023 19:06:50 +0000 > Subject: [PATCH] x86/smpboot: Allow parallel bringup for SEV-ES > > Enable parallel bringup for SEV-ES guests. The APs can't actually > execute the CPUID instruction directly during early startup, but they > can make the GHCB call directly instead, just as the VC trap handler > would do. > > Factor out a prepare_parallel_bringup() function to help reduce the level > of complexity by allowing a simple 'return false' in the bail-out cases/ > > Thanks to Sabin for talking me through the way this works. > > Suggested-by: Sabin Rapan <sabrapan@amazon.com> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > --- > arch/x86/include/asm/sev-common.h | 3 + > arch/x86/include/asm/smp.h | 3 +- > arch/x86/kernel/head_64.S | 27 ++++++- > arch/x86/kernel/smpboot.c | 112 ++++++++++++++++++------------ > 4 files changed, 98 insertions(+), 47 deletions(-) > > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h > index b8357d6ecd47..f25df4bd318e 100644 > --- a/arch/x86/include/asm/sev-common.h > +++ b/arch/x86/include/asm/sev-common.h > @@ -70,6 +70,7 @@ > /* GHCBData[63:12] */ \ > (((u64)(v) & GENMASK_ULL(63, 12)) >> 12) > > +#ifndef __ASSEMBLY__ > /* > * SNP Page State Change Operation > * > @@ -160,6 +161,8 @@ struct snp_psc_desc { > > #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK) > > +#endif /* __ASSEMBLY__ */ > + > /* > * Error codes related to GHCB input that can be communicated back to the guest > * by setting the lower 32-bits of the GHCB SW_EXITINFO1 field to 2. > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h > index defe76ee9e64..b3f67a764bfa 100644 > --- a/arch/x86/include/asm/smp.h > +++ b/arch/x86/include/asm/smp.h > @@ -204,7 +204,8 @@ extern unsigned int smpboot_control; > /* Control bits for startup_64 */ > #define STARTUP_APICID_CPUID_0B 0x80000000 > #define STARTUP_APICID_CPUID_01 0x40000000 > +#define STARTUP_APICID_SEV_ES 0x20000000 > > -#define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | STARTUP_APICID_CPUID_0B) > +#define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | STARTUP_APICID_CPUID_0B | STARTUP_APICID_SEV_ES) > > #endif /* _ASM_X86_SMP_H */ > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S > index c35f7c173832..3f5904eab678 100644 > --- a/arch/x86/kernel/head_64.S > +++ b/arch/x86/kernel/head_64.S > @@ -26,7 +26,7 @@ > #include <asm/nospec-branch.h> > #include <asm/fixmap.h> > #include <asm/smp.h> > - > +#include <asm/sev-common.h> > /* > * We are not able to switch in one step to the final KERNEL ADDRESS SPACE > * because we need identity-mapped pages. > @@ -242,6 +242,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) > * > * Bit 31 STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b) > * Bit 30 STARTUP_APICID_CPUID_01 flag (use CPUID 0x01) > + * Bit 29 STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR) > * Bit 0-24 CPU# if STARTUP_APICID_CPUID_xx flags are not set > */ > movl smpboot_control(%rip), %ecx > @@ -249,6 +250,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) > jnz .Luse_cpuid_0b > testl $STARTUP_APICID_CPUID_01, %ecx > jnz .Luse_cpuid_01 > + testl $STARTUP_APICID_SEV_ES, %ecx > + jnz .Luse_sev_cpuid_0b > andl $0x0FFFFFFF, %ecx > jmp .Lsetup_cpu > > @@ -259,6 +262,28 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) > shr $24, %edx > jmp .Lsetup_AP > > +.Luse_sev_cpuid_0b: > + /* Set the GHCB MSR to request CPUID 0xB_EDX */ > + movl $MSR_AMD64_SEV_ES_GHCB, %ecx > + movl $(GHCB_CPUID_REQ_EDX << 30) | GHCB_MSR_CPUID_REQ, %eax > + movl $0x0B, %edx > + wrmsr > + > + /* Perform GHCB MSR protocol */ > + vmgexit > + > + /* > + * Get the result. After the RDMSR: > + * EAX should be 0xc0000005 > + * EDX should have the CPUID register value and since EDX > + * is the target register, no need to move the result. > + */ > + rdmsr > + andl $GHCB_MSR_INFO_MASK, %eax > + cmpl $GHCB_MSR_CPUID_RESP, %eax > + jne 1f > + jmp .Lsetup_AP > + > .Luse_cpuid_0b: > mov $0x0B, %eax > xorl %ecx, %ecx > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 9d956571ecc1..d194c4ffeef8 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -1510,6 +1510,71 @@ void __init smp_prepare_cpus_common(void) > set_cpu_sibling_map(0); > } > > + > +/* > + * We can do 64-bit AP bringup in parallel if the CPU reports its APIC > + * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC > + * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too > + * hard. And not for SEV-ES guests because they can't use CPUID that > + * early. > + */ > +static bool __init prepare_parallel_bringup(void) > +{ > + if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1) > + return false; > + > + if (x2apic_mode) { > + unsigned int eax, ebx, ecx, edx; > + > + if (boot_cpu_data.cpuid_level < 0xb) > + return false; > + > + /* > + * To support parallel bringup in x2apic mode, the AP will need > + * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has > + * only 8 bits. Check that it is present and seems correct. > + */ > + cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); > + > + /* > + * AMD says that if executed with an umimplemented level in > + * ECX, then it will return all zeroes in EAX. Intel says it > + * will return zeroes in both EAX and EBX. Checking only EAX > + * should be sufficient. > + */ > + if (!eax) { > + pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); > + return false; > + } > + > + if (IS_ENABLED(AMD_MEM_ENCRYPT) && static_branch_unlikely(&sev_es_enable_key)) { This should be IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) Let me take this patch and run some tests to confirm that everything works as expected. Thanks! Tom > + pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); > + smpboot_control = STARTUP_APICID_SEV_ES; > + } else if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) { > + /* > + * Other forms of memory encryption need to implement a way of > + * finding the APs' APIC IDs that early. > + */ > + return false; > + } else { > + pr_debug("Using CPUID 0xb for parallel CPU startup\n"); > + smpboot_control = STARTUP_APICID_CPUID_0B; > + } > + } else { > + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) > + return false; > + > + /* Without X2APIC, what's in CPUID 0x01 should suffice. */ > + pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); > + smpboot_control = STARTUP_APICID_CPUID_01; > + } > + > + cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", > + native_cpu_kick, NULL); > + > + return true; > +} > + > /* > * Prepare for SMP bootup. > * @max_cpus: configured maximum number of CPUs, It is a legacy parameter > @@ -1550,51 +1615,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) > > speculative_store_bypass_ht_init(); > > - /* > - * We can do 64-bit AP bringup in parallel if the CPU reports > - * its APIC ID in CPUID (either leaf 0x0B if we need the full > - * APIC ID in X2APIC mode, or leaf 0x01 if 8 bits are > - * sufficient). Otherwise it's too hard. And not for SEV-ES > - * guests because they can't use CPUID that early. > - */ > - if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 || > - (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) || > - cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) > - do_parallel_bringup = false; > - > - if (do_parallel_bringup && x2apic_mode) { > - unsigned int eax, ebx, ecx, edx; > - > - /* > - * To support parallel bringup in x2apic mode, the AP will need > - * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has > - * only 8 bits. Check that it is present and seems correct. > - */ > - cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); > - > - /* > - * AMD says that if executed with an umimplemented level in > - * ECX, then it will return all zeroes in EAX. Intel says it > - * will return zeroes in both EAX and EBX. Checking only EAX > - * should be sufficient. > - */ > - if (eax) { > - pr_debug("Using CPUID 0xb for parallel CPU startup\n"); > - smpboot_control = STARTUP_APICID_CPUID_0B; > - } else { > - pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); > - do_parallel_bringup = false; > - } > - } else if (do_parallel_bringup) { > - /* Without X2APIC, what's in CPUID 0x01 should suffice. */ > - pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); > - smpboot_control = STARTUP_APICID_CPUID_01; > - } > - > - if (do_parallel_bringup) { > - cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", > - native_cpu_kick, NULL); > - } > + if (do_parallel_bringup) > + do_parallel_bringup = prepare_parallel_bringup(); > > snp_set_wakeup_secondary_cpu(); > }
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 9d956571ecc1..d194c4ffeef8 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -1510,6 +1510,71 @@ void __init smp_prepare_cpus_common(void) > set_cpu_sibling_map(0); > } > > + > +/* > + * We can do 64-bit AP bringup in parallel if the CPU reports its APIC > + * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC > + * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too > + * hard. And not for SEV-ES guests because they can't use CPUID that > + * early. > + */ > +static bool __init prepare_parallel_bringup(void) > +{ > + if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1) > + return false; > + > + if (x2apic_mode) { > + unsigned int eax, ebx, ecx, edx; > + > + if (boot_cpu_data.cpuid_level < 0xb) > + return false; > + > + /* > + * To support parallel bringup in x2apic mode, the AP will need > + * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has > + * only 8 bits. Check that it is present and seems correct. > + */ > + cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); > + > + /* > + * AMD says that if executed with an umimplemented level in > + * ECX, then it will return all zeroes in EAX. Intel says it > + * will return zeroes in both EAX and EBX. Checking only EAX > + * should be sufficient. > + */ > + if (!eax) { > + pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); > + return false; > + } > + > + if (IS_ENABLED(AMD_MEM_ENCRYPT) && static_branch_unlikely(&sev_es_enable_key)) { > + pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); > + smpboot_control = STARTUP_APICID_SEV_ES; > + } else if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) { > + /* > + * Other forms of memory encryption need to implement a way of > + * finding the APs' APIC IDs that early. > + */ > + return false; > + } else { > + pr_debug("Using CPUID 0xb for parallel CPU startup\n"); > + smpboot_control = STARTUP_APICID_CPUID_0B; I believe TDX guests with x2apic mode will end up here and enable parallel smp if Sean was correct in this (https://lore.kernel.org/all/Y91PoIfc2jdRv0WG@google.com/). i.e. "TDX guest state is also encrypted, but TDX doesn't return true CC_ATTR_GUEST_STATE_ENCRYPT.". So I believe the above else if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) is not useful as thats set for just SEV-ES guests? which is covered in the if part. Thanks, Usama > + } > + } else { > + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) > + return false; > + > + /* Without X2APIC, what's in CPUID 0x01 should suffice. */ > + pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); > + smpboot_control = STARTUP_APICID_CPUID_01; > + } > + > + cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", > + native_cpu_kick, NULL); > + > + return true; > +} > + > /* > * Prepare for SMP bootup.
On 3/7/23 14:06, Tom Lendacky wrote: > On 3/7/23 13:18, David Woodhouse wrote: >> On Tue, 2023-03-07 at 10:45 -0600, Tom Lendacky wrote: >>> On 3/7/23 08:42, David Woodhouse wrote: >>>> On Thu, 2023-03-02 at 11:12 +0000, Usama Arif wrote: >>>>> The main code change over v12 is to fix the build error when >>>>> CONFIG_FORCE_NR_CPUS is present. >>>>> >>>>> The commit message for removing initial stack has also been improved, >>>>> typos >>>>> have been fixed and extra comments have been added to make code clearer. >>>> >>>> Might something like this make it work in parallel with SEV-SNP? If so, >>>> I can clean it up and adjust the C code to actually invoke it... >>> >>> This should be ok for both SEV-ES and SEV-SNP. >> >> Thanks. So... something like this then? >> >> Is static_branch_unlikely(&sev_es_enable_key) the right thing to use, >> and does that cover SNP too? > > Yes, that will cover SNP, too. > >> >> Pushed to >> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc8-v12bis >> >> From d03aed91e5cfe840f4b18820fe6aed1765687321 Mon Sep 17 00:00:00 2001 >> From: David Woodhouse <dwmw@amazon.co.uk> >> Date: Tue, 7 Mar 2023 19:06:50 +0000 >> Subject: [PATCH] x86/smpboot: Allow parallel bringup for SEV-ES >> >> Enable parallel bringup for SEV-ES guests. The APs can't actually >> execute the CPUID instruction directly during early startup, but they >> can make the GHCB call directly instead, just as the VC trap handler >> would do. >> >> Factor out a prepare_parallel_bringup() function to help reduce the level >> of complexity by allowing a simple 'return false' in the bail-out cases/ >> >> Thanks to Sabin for talking me through the way this works. >> >> Suggested-by: Sabin Rapan <sabrapan@amazon.com> >> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB EAX will be non-zero only if SMT is enabled. So just booting some guests without CPU topology never did parallel booting ("smpboot: Disabling parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine a bare-metal system that has diabled SMT will not do parallel booting, too (but I haven't had time to test that). I did have to change "vmgexit" to a "rep; vmmcall" based on my binutils and the IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) change I mentioned before. But with that, I was able to successfully boot 64 vCPU SEV-ES and SEV-SNP guests using parallel booting. Thanks, Tom >> --- >> arch/x86/include/asm/sev-common.h | 3 + >> arch/x86/include/asm/smp.h | 3 +- >> arch/x86/kernel/head_64.S | 27 ++++++- >> arch/x86/kernel/smpboot.c | 112 ++++++++++++++++++------------ >> 4 files changed, 98 insertions(+), 47 deletions(-) >> >> diff --git a/arch/x86/include/asm/sev-common.h >> b/arch/x86/include/asm/sev-common.h >> index b8357d6ecd47..f25df4bd318e 100644 >> --- a/arch/x86/include/asm/sev-common.h >> +++ b/arch/x86/include/asm/sev-common.h >> @@ -70,6 +70,7 @@ >> /* GHCBData[63:12] */ \ >> (((u64)(v) & GENMASK_ULL(63, 12)) >> 12) >> +#ifndef __ASSEMBLY__ >> /* >> * SNP Page State Change Operation >> * >> @@ -160,6 +161,8 @@ struct snp_psc_desc { >> #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK) >> +#endif /* __ASSEMBLY__ */ >> + >> /* >> * Error codes related to GHCB input that can be communicated back to >> the guest >> * by setting the lower 32-bits of the GHCB SW_EXITINFO1 field to 2. >> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h >> index defe76ee9e64..b3f67a764bfa 100644 >> --- a/arch/x86/include/asm/smp.h >> +++ b/arch/x86/include/asm/smp.h >> @@ -204,7 +204,8 @@ extern unsigned int smpboot_control; >> /* Control bits for startup_64 */ >> #define STARTUP_APICID_CPUID_0B 0x80000000 >> #define STARTUP_APICID_CPUID_01 0x40000000 >> +#define STARTUP_APICID_SEV_ES 0x20000000 >> -#define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | >> STARTUP_APICID_CPUID_0B) >> +#define STARTUP_PARALLEL_MASK (STARTUP_APICID_CPUID_01 | >> STARTUP_APICID_CPUID_0B | STARTUP_APICID_SEV_ES) >> #endif /* _ASM_X86_SMP_H */ >> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S >> index c35f7c173832..3f5904eab678 100644 >> --- a/arch/x86/kernel/head_64.S >> +++ b/arch/x86/kernel/head_64.S >> @@ -26,7 +26,7 @@ >> #include <asm/nospec-branch.h> >> #include <asm/fixmap.h> >> #include <asm/smp.h> >> - >> +#include <asm/sev-common.h> >> /* >> * We are not able to switch in one step to the final KERNEL ADDRESS >> SPACE >> * because we need identity-mapped pages. >> @@ -242,6 +242,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, >> SYM_L_GLOBAL) >> * >> * Bit 31 STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b) >> * Bit 30 STARTUP_APICID_CPUID_01 flag (use CPUID 0x01) >> + * Bit 29 STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR) >> * Bit 0-24 CPU# if STARTUP_APICID_CPUID_xx flags are not set >> */ >> movl smpboot_control(%rip), %ecx >> @@ -249,6 +250,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, >> SYM_L_GLOBAL) >> jnz .Luse_cpuid_0b >> testl $STARTUP_APICID_CPUID_01, %ecx >> jnz .Luse_cpuid_01 >> + testl $STARTUP_APICID_SEV_ES, %ecx >> + jnz .Luse_sev_cpuid_0b >> andl $0x0FFFFFFF, %ecx >> jmp .Lsetup_cpu >> @@ -259,6 +262,28 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, >> SYM_L_GLOBAL) >> shr $24, %edx >> jmp .Lsetup_AP >> +.Luse_sev_cpuid_0b: >> + /* Set the GHCB MSR to request CPUID 0xB_EDX */ >> + movl $MSR_AMD64_SEV_ES_GHCB, %ecx >> + movl $(GHCB_CPUID_REQ_EDX << 30) | GHCB_MSR_CPUID_REQ, %eax >> + movl $0x0B, %edx >> + wrmsr >> + >> + /* Perform GHCB MSR protocol */ >> + vmgexit >> + >> + /* >> + * Get the result. After the RDMSR: >> + * EAX should be 0xc0000005 >> + * EDX should have the CPUID register value and since EDX >> + * is the target register, no need to move the result. >> + */ >> + rdmsr >> + andl $GHCB_MSR_INFO_MASK, %eax >> + cmpl $GHCB_MSR_CPUID_RESP, %eax >> + jne 1f >> + jmp .Lsetup_AP >> + >> .Luse_cpuid_0b: >> mov $0x0B, %eax >> xorl %ecx, %ecx >> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c >> index 9d956571ecc1..d194c4ffeef8 100644 >> --- a/arch/x86/kernel/smpboot.c >> +++ b/arch/x86/kernel/smpboot.c >> @@ -1510,6 +1510,71 @@ void __init smp_prepare_cpus_common(void) >> set_cpu_sibling_map(0); >> } >> + >> +/* >> + * We can do 64-bit AP bringup in parallel if the CPU reports its APIC >> + * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC >> + * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too >> + * hard. And not for SEV-ES guests because they can't use CPUID that >> + * early. >> + */ >> +static bool __init prepare_parallel_bringup(void) >> +{ >> + if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1) >> + return false; >> + >> + if (x2apic_mode) { >> + unsigned int eax, ebx, ecx, edx; >> + >> + if (boot_cpu_data.cpuid_level < 0xb) >> + return false; >> + >> + /* >> + * To support parallel bringup in x2apic mode, the AP will need >> + * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has >> + * only 8 bits. Check that it is present and seems correct. >> + */ >> + cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); >> + >> + /* >> + * AMD says that if executed with an umimplemented level in >> + * ECX, then it will return all zeroes in EAX. Intel says it >> + * will return zeroes in both EAX and EBX. Checking only EAX >> + * should be sufficient. >> + */ >> + if (!eax) { >> + pr_info("Disabling parallel bringup because CPUID 0xb looks >> untrustworthy\n"); >> + return false; >> + } >> + >> + if (IS_ENABLED(AMD_MEM_ENCRYPT) && >> static_branch_unlikely(&sev_es_enable_key)) { > > This should be IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) > > Let me take this patch and run some tests to confirm that everything works > as expected. > > Thanks! > Tom > >> + pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); >> + smpboot_control = STARTUP_APICID_SEV_ES; >> + } else if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) { >> + /* >> + * Other forms of memory encryption need to implement a way of >> + * finding the APs' APIC IDs that early. >> + */ >> + return false; >> + } else { >> + pr_debug("Using CPUID 0xb for parallel CPU startup\n"); >> + smpboot_control = STARTUP_APICID_CPUID_0B; >> + } >> + } else { >> + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) >> + return false; >> + >> + /* Without X2APIC, what's in CPUID 0x01 should suffice. */ >> + pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); >> + smpboot_control = STARTUP_APICID_CPUID_01; >> + } >> + >> + cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", >> + native_cpu_kick, NULL); >> + >> + return true; >> +} >> + >> /* >> * Prepare for SMP bootup. >> * @max_cpus: configured maximum number of CPUs, It is a legacy parameter >> @@ -1550,51 +1615,8 @@ void __init native_smp_prepare_cpus(unsigned int >> max_cpus) >> speculative_store_bypass_ht_init(); >> - /* >> - * We can do 64-bit AP bringup in parallel if the CPU reports >> - * its APIC ID in CPUID (either leaf 0x0B if we need the full >> - * APIC ID in X2APIC mode, or leaf 0x01 if 8 bits are >> - * sufficient). Otherwise it's too hard. And not for SEV-ES >> - * guests because they can't use CPUID that early. >> - */ >> - if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 || >> - (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) || >> - cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) >> - do_parallel_bringup = false; >> - >> - if (do_parallel_bringup && x2apic_mode) { >> - unsigned int eax, ebx, ecx, edx; >> - >> - /* >> - * To support parallel bringup in x2apic mode, the AP will need >> - * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has >> - * only 8 bits. Check that it is present and seems correct. >> - */ >> - cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx); >> - >> - /* >> - * AMD says that if executed with an umimplemented level in >> - * ECX, then it will return all zeroes in EAX. Intel says it >> - * will return zeroes in both EAX and EBX. Checking only EAX >> - * should be sufficient. >> - */ >> - if (eax) { >> - pr_debug("Using CPUID 0xb for parallel CPU startup\n"); >> - smpboot_control = STARTUP_APICID_CPUID_0B; >> - } else { >> - pr_info("Disabling parallel bringup because CPUID 0xb looks >> untrustworthy\n"); >> - do_parallel_bringup = false; >> - } >> - } else if (do_parallel_bringup) { >> - /* Without X2APIC, what's in CPUID 0x01 should suffice. */ >> - pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); >> - smpboot_control = STARTUP_APICID_CPUID_01; >> - } >> - >> - if (do_parallel_bringup) { >> - cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", >> - native_cpu_kick, NULL); >> - } >> + if (do_parallel_bringup) >> + do_parallel_bringup = prepare_parallel_bringup(); >> snp_set_wakeup_secondary_cpu(); >> }
On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: > > I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB > EAX will be non-zero only if SMT is enabled. So just booting some guests > without CPU topology never did parallel booting ("smpboot: Disabling > parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine > a bare-metal system that has diabled SMT will not do parallel booting, too > (but I haven't had time to test that). Interesting, thanks. Should I change to checking for *both* EAX and EBX being zero? That's what I did first, after reading only the Intel SDM. But I changed to only EAX because the AMD doc only says that EAX will be zero for unsupported leaves. > I did have to change "vmgexit" to a "rep; vmmcall" based on my binutils > and the IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) change I mentioned before. But > with that, I was able to successfully boot 64 vCPU SEV-ES and SEV-SNP > guests using parallel booting. Thanks. I'll look at retconning that rework of can_parallel_bringup() out into a separate function, into the earlier parts of the series.
On 3/7/23 16:27, David Woodhouse wrote: > On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: >> >> I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB >> EAX will be non-zero only if SMT is enabled. So just booting some guests >> without CPU topology never did parallel booting ("smpboot: Disabling >> parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine >> a bare-metal system that has diabled SMT will not do parallel booting, too >> (but I haven't had time to test that). > > Interesting, thanks. Should I change to checking for *both* EAX and EBX > being zero? That's what I did first, after reading only the Intel SDM. > But I changed to only EAX because the AMD doc only says that EAX will > be zero for unsupported leaves. From a baremetal perspective, I think that works. Rome was the first generation to support x2apic, and the PPR for Rome states that 0's are returned in all 4 registers for undefined function numbers. For virtualization, at least Qemu/KVM, that also looks to be a safe test. Thanks, Tom > >> I did have to change "vmgexit" to a "rep; vmmcall" based on my binutils >> and the IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) change I mentioned before. But >> with that, I was able to successfully boot 64 vCPU SEV-ES and SEV-SNP >> guests using parallel booting. > > Thanks. I'll look at retconning that rework of can_parallel_bringup() > out into a separate function, into the earlier parts of the series.
On Tue, Mar 07, 2023, David Woodhouse wrote: > On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: > > > > I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB > > EAX will be non-zero only if SMT is enabled. So just booting some guests > > without CPU topology never did parallel booting ("smpboot: Disabling > > parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine > > a bare-metal system that has diabled SMT will not do parallel booting, too > > (but I haven't had time to test that). > > Interesting, thanks. Should I change to checking for *both* EAX and EBX > being zero? That's what I did first, after reading only the Intel SDM. > But I changed to only EAX because the AMD doc only says that EAX will > be zero for unsupported leaves. LOL, nice. '0' is a prefectly valid output for the shift if there's exactly one CPU at the current level, which is why Intel states that both EAX and EBX are cleared. I assume/hope this is effectively a documentation goof, in that the APM assumes that all "real" CPUs that support CPUID 0xB also support sub-function 0. Nit, the AMD says EAX will be zero for unsupported _levels_. The distinction matters because if the entire leaf is unsupported, AMD behavior is to zero out all output registers, even if the input leaf is above the max supported leaf. Anyways, the kernel already sanity checks the outputs on all x86 CPUs, just piggyback that logic, e.g. expose detect_extended_topology_leaf() or so. If AMD or a VMM is doing something crazy, the kernel is doomed anyways.
On Tue, 2023-03-07 at 15:35 -0800, Sean Christopherson wrote: > On Tue, Mar 07, 2023, David Woodhouse wrote: > > On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: > > > > > > I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB > > > EAX will be non-zero only if SMT is enabled. So just booting some guests > > > without CPU topology never did parallel booting ("smpboot: Disabling > > > parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine > > > a bare-metal system that has diabled SMT will not do parallel booting, too > > > (but I haven't had time to test that). > > > > Interesting, thanks. Should I change to checking for *both* EAX and EBX > > being zero? That's what I did first, after reading only the Intel SDM. > > But I changed to only EAX because the AMD doc only says that EAX will > > be zero for unsupported leaves. > > LOL, nice. '0' is a prefectly valid output for the shift if there's exactly one > CPU at the current level, which is why Intel states that both EAX and EBX are > cleared. I assume/hope this is effectively a documentation goof, in that the APM > assumes that all "real" CPUs that support CPUID 0xB also support sub-function 0. > > Nit, the AMD says EAX will be zero for unsupported _levels_. The distinction > matters because if the entire leaf is unsupported, AMD behavior is to zero out > all output registers, even if the input leaf is above the max supported leaf. > > Anyways, the kernel already sanity checks the outputs on all x86 CPUs, just > piggyback that logic, e.g. expose detect_extended_topology_leaf() or so. If AMD > or a VMM is doing something crazy, the kernel is doomed anyways. Well, we have to be somewhat careful with that assumption. Turns out that if CPUID 0xb doesn't have valid data, the kernel *wasn't* doomed anyway. A bunch of our early "WTF doesn't it work on AMD?" issue with this series were due to precisely that. But yeah, check_extended_topology_leaf() does check for EBX being non- zero, and also for SMT_TYPE being in CH. I'll look at just calling that; it'll make the mess of if/else in the prepare_parallel_bringup() function a bit cleaner.
On Tue, 2023-03-07 at 16:55 -0600, Tom Lendacky wrote: > On 3/7/23 16:27, David Woodhouse wrote: > > On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: > > > > > > I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB > > > EAX will be non-zero only if SMT is enabled. So just booting some guests > > > without CPU topology never did parallel booting ("smpboot: Disabling > > > parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine > > > a bare-metal system that has diabled SMT will not do parallel booting, too > > > (but I haven't had time to test that). > > > > Interesting, thanks. Should I change to checking for *both* EAX and EBX > > being zero? That's what I did first, after reading only the Intel SDM. > > But I changed to only EAX because the AMD doc only says that EAX will > > be zero for unsupported leaves. > > From a baremetal perspective, I think that works. Rome was the first > generation to support x2apic, and the PPR for Rome states that 0's are > returned in all 4 registers for undefined function numbers. > > For virtualization, at least Qemu/KVM, that also looks to be a safe test. At Sean's suggestion, I've switched it to use the existing check_extended_topology_leaf() which checks for EBX being non-zero, and CH being 1 (SMT_TYPE). I also made it work even if the kernel isn't using x2apic mode (is that even possible, or does SEV-ES require the MSR-based access anyway?) It just looked odd handling SEV-ES in the CPUID 0x0B path but not the CPUID 0x01 case, and I certainly didn't want to implement the asm side for handling CPUID 0x01 via the GHCB protocol. And this way I can pull the check for CC_ATTR_GUEST_STATE_ENCRYPT up above. Which I've kept for now for the reason described in the comment, but I won't die on that hill. https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-v14 Looks like this: /* * We can do 64-bit AP bringup in parallel if the CPU reports its APIC * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too * hard. */ static bool prepare_parallel_bringup(void) { bool has_sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && static_branch_unlikely(&sev_es_enable_key); if (IS_ENABLED(CONFIG_X86_32)) return false; /* * Encrypted guests other than SEV-ES (in the future) will need to * implement an early way of finding the APIC ID, since they will * presumably block direct CPUID too. Be kind to our future selves * by warning here instead of just letting them break. Parallel * startup doesn't have to be in the first round of enabling patches * for any such technology. */ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) || !has_sev_es) { pr_info("Disabling parallel bringup due to guest memory encryption\n"); return false; } if (x2apic_mode || has_sev_es) { if (boot_cpu_data.cpuid_level < 0x0b) return false; if (check_extended_topology_leaf(0x0b) != 0) { pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); return false; } if (has_sev_es) { pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); smpboot_control = STARTUP_APICID_SEV_ES; } else { pr_debug("Using CPUID 0xb for parallel CPU startup\n"); smpboot_control = STARTUP_APICID_CPUID_0B; } } else { /* Without X2APIC, what's in CPUID 0x01 should suffice. */ if (boot_cpu_data.cpuid_level < 0x01) return false; pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); smpboot_control = STARTUP_APICID_CPUID_01; } cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", native_cpu_kick, NULL); return true; }
On 08/03/2023 09:04, David Woodhouse wrote: > On Tue, 2023-03-07 at 16:55 -0600, Tom Lendacky wrote: >> On 3/7/23 16:27, David Woodhouse wrote: >>> On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: >>>> >>>> I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB >>>> EAX will be non-zero only if SMT is enabled. So just booting some guests >>>> without CPU topology never did parallel booting ("smpboot: Disabling >>>> parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine >>>> a bare-metal system that has diabled SMT will not do parallel booting, too >>>> (but I haven't had time to test that). >>> >>> Interesting, thanks. Should I change to checking for *both* EAX and EBX >>> being zero? That's what I did first, after reading only the Intel SDM. >>> But I changed to only EAX because the AMD doc only says that EAX will >>> be zero for unsupported leaves. >> >> From a baremetal perspective, I think that works. Rome was the first >> generation to support x2apic, and the PPR for Rome states that 0's are >> returned in all 4 registers for undefined function numbers. >> >> For virtualization, at least Qemu/KVM, that also looks to be a safe test. > > At Sean's suggestion, I've switched it to use the existing > check_extended_topology_leaf() which checks for EBX being non-zero, and > CH being 1 (SMT_TYPE). > > I also made it work even if the kernel isn't using x2apic mode (is that > even possible, or does SEV-ES require the MSR-based access anyway?) > > It just looked odd handling SEV-ES in the CPUID 0x0B path but not the > CPUID 0x01 case, and I certainly didn't want to implement the asm side > for handling CPUID 0x01 via the GHCB protocol. And this way I can pull > the check for CC_ATTR_GUEST_STATE_ENCRYPT up above. Which I've kept for > now for the reason described in the comment, but I won't die on that > hill. > > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-v14 > > Looks like this: > > /* > * We can do 64-bit AP bringup in parallel if the CPU reports its APIC > * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC > * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too > * hard. > */ > static bool prepare_parallel_bringup(void) > { > bool has_sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && > static_branch_unlikely(&sev_es_enable_key); > > if (IS_ENABLED(CONFIG_X86_32)) > return false; > > /* > * Encrypted guests other than SEV-ES (in the future) will need to > * implement an early way of finding the APIC ID, since they will > * presumably block direct CPUID too. Be kind to our future selves > * by warning here instead of just letting them break. Parallel > * startup doesn't have to be in the first round of enabling patches > * for any such technology. > */ > if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) || !has_sev_es) { > pr_info("Disabling parallel bringup due to guest memory encryption\n"); > return false; I believe this is still going to enable parallel bringup for TDX? Looking at include/linux/cc_platform.h, it looks like CC_ATTR_GUEST_STATE_ENCRYPT is only set for SEV-ES and TDX guest with x2apic will go on in this function and enable parallel bringup if leaf 0xB is ok. I guess if the apic ID is OK for the TDX guest, then its fine, but just wanted to check if anyone has tested this on TDX guest? > } > > if (x2apic_mode || has_sev_es) { > if (boot_cpu_data.cpuid_level < 0x0b) > return false; > > if (check_extended_topology_leaf(0x0b) != 0) { > pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); > return false; > } > > if (has_sev_es) { > pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); > smpboot_control = STARTUP_APICID_SEV_ES; > } else { > pr_debug("Using CPUID 0xb for parallel CPU startup\n"); > smpboot_control = STARTUP_APICID_CPUID_0B; > } > } else { > /* Without X2APIC, what's in CPUID 0x01 should suffice. */ > if (boot_cpu_data.cpuid_level < 0x01) > return false; > > pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); > smpboot_control = STARTUP_APICID_CPUID_01; > } > > cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", > native_cpu_kick, NULL); > return true; > } >
From: David Woodhouse > Sent: 08 March 2023 09:05 ... > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-v14 > > Looks like this: > > /* > * We can do 64-bit AP bringup in parallel if the CPU reports its APIC > * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC > * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too > * hard. > */ > static bool prepare_parallel_bringup(void) > { > bool has_sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && > static_branch_unlikely(&sev_es_enable_key); > > if (IS_ENABLED(CONFIG_X86_32)) > return false; > > /* > * Encrypted guests other than SEV-ES (in the future) will need to > * implement an early way of finding the APIC ID, since they will > * presumably block direct CPUID too. Be kind to our future selves > * by warning here instead of just letting them break. Parallel > * startup doesn't have to be in the first round of enabling patches > * for any such technology. > */ > if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) || !has_sev_es) { > pr_info("Disabling parallel bringup due to guest memory encryption\n"); > return false; > } That looks wrong, won't has_sev_es almost always be false so it prints the message and returns? Maybe s/||/&&/ ? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, 2023-03-08 at 12:15 +0000, David Laight wrote: > > > /* > > * Encrypted guests other than SEV-ES (in the future) will need to > > * implement an early way of finding the APIC ID, since they will > > * presumably block direct CPUID too. Be kind to our future selves > > * by warning here instead of just letting them break. Parallel > > * startup doesn't have to be in the first round of enabling patches > > * for any such technology. > > */ > > if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) || !has_sev_es) { > > pr_info("Disabling parallel bringup due to guest memory encryption\n"); > > return false; > > } > > That looks wrong, won't has_sev_es almost always be false > so it prints the message and returns? > Maybe s/||/&&/ ? D'oh! Fixed; thanks.
> static bool prepare_parallel_bringup(void) > { > bool has_sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && > static_branch_unlikely(&sev_es_enable_key); sev_es_enable_key is only defined when CONFIG_AMD_MEM_ENCRYPT is enabled, so it gives a build error when AMD_MEM_ENCRYPT is disabled. maybe below is better? diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 282cca020777..e7df41436cfe 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1519,8 +1519,12 @@ void __init smp_prepare_cpus_common(void) */ static bool prepare_parallel_bringup(void) { - bool has_sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && - static_branch_unlikely(&sev_es_enable_key); + bool has_sev_es; +#ifdef CONFIG_AMD_MEM_ENCRYPT + has_sev_es = static_branch_unlikely(&sev_es_enable_key); +#else + has_sev_es = 0; +#endif if (IS_ENABLED(CONFIG_X86_32)) return false;
On 3/8/23 03:04, David Woodhouse wrote: > On Tue, 2023-03-07 at 16:55 -0600, Tom Lendacky wrote: >> On 3/7/23 16:27, David Woodhouse wrote: >>> On Tue, 2023-03-07 at 16:22 -0600, Tom Lendacky wrote: >>>> >>>> I did some Qemu/KVM testing. One thing I noticed is that on AMD, CPUID 0xB >>>> EAX will be non-zero only if SMT is enabled. So just booting some guests >>>> without CPU topology never did parallel booting ("smpboot: Disabling >>>> parallel bringup because CPUID 0xb looks untrustworthy"). I would imagine >>>> a bare-metal system that has diabled SMT will not do parallel booting, too >>>> (but I haven't had time to test that). >>> >>> Interesting, thanks. Should I change to checking for *both* EAX and EBX >>> being zero? That's what I did first, after reading only the Intel SDM. >>> But I changed to only EAX because the AMD doc only says that EAX will >>> be zero for unsupported leaves. >> >> From a baremetal perspective, I think that works. Rome was the first >> generation to support x2apic, and the PPR for Rome states that 0's are >> returned in all 4 registers for undefined function numbers. >> >> For virtualization, at least Qemu/KVM, that also looks to be a safe test. > > At Sean's suggestion, I've switched it to use the existing > check_extended_topology_leaf() which checks for EBX being non-zero, and > CH being 1 (SMT_TYPE). > > I also made it work even if the kernel isn't using x2apic mode (is that > even possible, or does SEV-ES require the MSR-based access anyway?) > > It just looked odd handling SEV-ES in the CPUID 0x0B path but not the > CPUID 0x01 case, and I certainly didn't want to implement the asm side > for handling CPUID 0x01 via the GHCB protocol. And this way I can pull > the check for CC_ATTR_GUEST_STATE_ENCRYPT up above. Which I've kept for > now for the reason described in the comment, but I won't die on that > hill. You can boot an SEV-ES guest in apic mode, but that would be unusual, so I think this approach is fine. Thanks, Tom > > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-v14 > > Looks like this: > > /* > * We can do 64-bit AP bringup in parallel if the CPU reports its APIC > * ID in CPUID (either leaf 0x0B if we need the full APIC ID in X2APIC > * mode, or leaf 0x01 if 8 bits are sufficient). Otherwise it's too > * hard. > */ > static bool prepare_parallel_bringup(void) > { > bool has_sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && > static_branch_unlikely(&sev_es_enable_key); > > if (IS_ENABLED(CONFIG_X86_32)) > return false; > > /* > * Encrypted guests other than SEV-ES (in the future) will need to > * implement an early way of finding the APIC ID, since they will > * presumably block direct CPUID too. Be kind to our future selves > * by warning here instead of just letting them break. Parallel > * startup doesn't have to be in the first round of enabling patches > * for any such technology. > */ > if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) || !has_sev_es) { > pr_info("Disabling parallel bringup due to guest memory encryption\n"); > return false; > } > > if (x2apic_mode || has_sev_es) { > if (boot_cpu_data.cpuid_level < 0x0b) > return false; > > if (check_extended_topology_leaf(0x0b) != 0) { > pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n"); > return false; > } > > if (has_sev_es) { > pr_debug("Using SEV-ES CPUID 0xb for parallel CPU startup\n"); > smpboot_control = STARTUP_APICID_SEV_ES; > } else { > pr_debug("Using CPUID 0xb for parallel CPU startup\n"); > smpboot_control = STARTUP_APICID_CPUID_0B; > } > } else { > /* Without X2APIC, what's in CPUID 0x01 should suffice. */ > if (boot_cpu_data.cpuid_level < 0x01) > return false; > > pr_debug("Using CPUID 0x1 for parallel CPU startup\n"); > smpboot_control = STARTUP_APICID_CPUID_01; > } > > cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick", > native_cpu_kick, NULL); > return true; > } >