Message ID | 20230223191140.4155012-8-usama.arif@bytedance.com |
---|---|
State | New |
Headers |
From: Usama Arif <usama.arif@bytedance.com> To: dwmw2@infradead.org, tglx@linutronix.de, kim.phillips@amd.com, brgerst@gmail.com Cc: piotrgorski@cachyos.org, oleksandr@natalenko.name, arjan@linux.intel.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com, paulmck@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com, thomas.lendacky@amd.com, seanjc@google.com, pmenzel@molgen.mpg.de, fam.zheng@bytedance.com, punit.agrawal@bytedance.com, simon.evans@bytedance.com, liangma@liangbit.com, David Woodhouse <dwmw@amazon.co.uk>, Usama Arif <usama.arif@bytedance.com> Subject: [PATCH v11 07/12] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel Date: Thu, 23 Feb 2023 19:11:35 +0000 Message-Id: <20230223191140.4155012-8-usama.arif@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230223191140.4155012-1-usama.arif@bytedance.com> References: <20230223191140.4155012-1-usama.arif@bytedance.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
Parallel CPU bringup for x86_64
|
|
Commit Message
Usama Arif
Feb. 23, 2023, 7:11 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

When the APs can find their own APIC ID without assistance, perform the
AP bringup in parallel.

Register a CPUHP_BP_PARALLEL_DYN stage "x86/cpu:kick" which just calls
do_boot_cpu() to deliver INIT/SIPI/SIPI to each AP in turn before the
normal native_cpu_up() does the rest of the hand-holding.

The APs will then take turns through the real mode code (which has its
own bitlock for exclusion) until they make it to their own stack, then
proceed through the first few lines of start_secondary() and execute
these parts in parallel:

 start_secondary()
    -> cr4_init()
    -> (some 32-bit only stuff so not in the parallel cases)
    -> cpu_init_secondary()
       -> cpu_init_exception_handling()
       -> cpu_init()
          -> wait_for_master_cpu()

At this point they wait for the BSP to set their bit in cpu_callout_mask
(from do_wait_cpu_initialized()), and release them to continue through
the rest of cpu_init() and beyond.

This reduces the time taken for bringup on my 28-thread Haswell system
from about 120ms to 80ms. On a socket 96-thread Skylake it takes the
bringup time from 500ms to 100ms.

There is more speedup to be had by doing the remaining parts in parallel
too, especially notify_cpu_starting() in which the AP takes itself
through all the stages from CPUHP_BRINGUP_CPU to CPUHP_ONLINE. But those
require careful auditing to ensure they are reentrant, before we can go
that far.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
---
 arch/x86/kernel/smpboot.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)
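The handshake described in the commit message (APs run their early init concurrently, then each waits for the BSP to set its bit in cpu_callout_mask) can be modelled in userspace. The following is only a rough sketch under stated assumptions: pthreads stand in for APs, and every identifier in it (`bringup_parallel`, `ap_entry`, `initialized_mask`, `online_mask`, `NR_APS`) is hypothetical and not taken from arch/x86/kernel/smpboot.c. Only the idea of a per-CPU bit in a callout mask comes from the patch description.

```c
/* Userspace model only; all names here are hypothetical, not kernel code. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_APS 8 /* assumed number of simulated secondary CPUs */

static atomic_uint_fast64_t callout_mask;     /* models cpu_callout_mask */
static atomic_uint_fast64_t initialized_mask; /* assumed: APs parked at the wait point */
static atomic_uint_fast64_t online_mask;      /* assumed: APs that finished bringup */

struct ap {
	pthread_t thread;
	unsigned int cpu;
};

static void *ap_entry(void *arg)
{
	const struct ap *ap = arg;
	uint_fast64_t bit = (uint_fast64_t)1 << ap->cpu;

	/* Early init: this part runs concurrently on every simulated AP. */
	atomic_fetch_or(&initialized_mask, bit);

	/* Model of wait_for_master_cpu(): spin until the BSP sets our bit. */
	while (!(atomic_load(&callout_mask) & bit))
		sched_yield();

	/* The rest of bringup, entered one AP at a time. */
	atomic_fetch_or(&online_mask, bit);
	return NULL;
}

/* Bring up NR_APS simulated APs and return the final online mask. */
uint64_t bringup_parallel(void)
{
	struct ap aps[NR_APS];
	unsigned int i;

	atomic_store(&callout_mask, 0);
	atomic_store(&initialized_mask, 0);
	atomic_store(&online_mask, 0);

	/* Stage 1 ("kick"): start every AP before waiting for any of them. */
	for (i = 0; i < NR_APS; i++) {
		int err;

		aps[i].cpu = i + 1; /* CPU 0 plays the BSP */
		err = pthread_create(&aps[i].thread, NULL, ap_entry, &aps[i]);
		assert(err == 0);
		(void)err;
	}

	/* Stage 2: release the APs one by one, as the serialized part does. */
	for (i = 0; i < NR_APS; i++) {
		uint_fast64_t bit = (uint_fast64_t)1 << aps[i].cpu;

		while (!(atomic_load(&initialized_mask) & bit))
			sched_yield();
		atomic_fetch_or(&callout_mask, bit);
		while (!(atomic_load(&online_mask) & bit))
			sched_yield();
	}

	for (i = 0; i < NR_APS; i++)
		pthread_join(aps[i].thread, NULL);

	return (uint64_t)atomic_load(&online_mask);
}
```

Calling `bringup_parallel()` returns a mask with bits 1 through 8 set. The point of the model is the ordering: all threads are created before the first wait, so their early init overlaps, while the release through the callout mask stays strictly one AP at a time.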
Comments
From: Usama Arif <usama.arif@bytedance.com> Sent: Thursday, February 23, 2023 11:12 AM
>
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> When the APs can find their own APIC ID without assistance, perform the
> AP bringup in parallel.
>
> Register a CPUHP_BP_PARALLEL_DYN stage "x86/cpu:kick" which just calls
> do_boot_cpu() to deliver INIT/SIPI/SIPI to each AP in turn before the
> normal native_cpu_up() does the rest of the hand-holding.
>
> The APs will then take turns through the real mode code (which has its
> own bitlock for exclusion) until they make it to their own stack, then
> proceed through the first few lines of start_secondary() and execute
> these parts in parallel:
>
>  start_secondary()
>     -> cr4_init()
>     -> (some 32-bit only stuff so not in the parallel cases)
>     -> cpu_init_secondary()
>        -> cpu_init_exception_handling()
>        -> cpu_init()
>           -> wait_for_master_cpu()
>
> At this point they wait for the BSP to set their bit in cpu_callout_mask
> (from do_wait_cpu_initialized()), and release them to continue through
> the rest of cpu_init() and beyond.
>
> This reduces the time taken for bringup on my 28-thread Haswell system
> from about 120ms to 80ms. On a socket 96-thread Skylake it takes the
> bringup time from 500ms to 100ms.

I built and tested this series in a Hyper-V VM with 64 vCPUs running on
an AMD EPYC "Milan" processor. The VM has an xapic, not an x2apic. The
patch set works correctly, with and without the no_parallel_bringup
kernel boot option.

In a running Linux instance, I was looking for a way to confirm whether
it used parallel bringup. I could only find checking for the
"x86/cpu:kick" state in /sys/devices/system/cpu/hotplug/states. Always
outputting a boot message to indicate the approach might be helpful.

Interestingly, I found no reduction in elapsed time to bring up the 64
vCPUs. Depending on exactly where you measure, it is 80 to 90
milliseconds before applying the patch set, and after applying the patch
set (with or without no_parallel_bringup). Evidently, VMs already avoid
a good part of the overhead in the existing serialized approach.

[ 1.503699] smp: Bringing up secondary CPUs ...
[ 1.507339] x86: Booting SMP configuration:
[ 1.511192] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63
[ 1.588039] smp: Brought up 1 node, 64 CPUs
[ 1.595513] smpboot: Max logical packages: 1
[ 1.599186] smpboot: Total of 64 processors activated (255524.22 BogoMIPS)

The "x86/cpu:kick" state was present for the parallel bringup case, so
presumably the parallel behavior *did* happen, unless there is a later
bailout path that I missed. But there weren't any boot messages
indicating such.

Michael

For the series, on Hyper-V guests:

Tested-by: Michael Kelley <mikelley@microsoft.com>

>
> There is more speedup to be had by doing the remaining parts in parallel
> too, especially notify_cpu_starting() in which the AP takes itself
> through all the stages from CPUHP_BRINGUP_CPU to CPUHP_ONLINE. But those
> require careful auditing to ensure they are reentrant, before we can go
> that far.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Kim Phillips <kim.phillips@amd.com>
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  arch/x86/kernel/smpboot.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 74c76c78f7d2..85ce6a8978ff 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -57,6 +57,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/overflow.h>
>  #include <linux/stackprotector.h>
> +#include <linux/smpboot.h>
>
>  #include <asm/acpi.h>
>  #include <asm/cacheinfo.h>
> @@ -1325,9 +1326,12 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  {
>  	int ret;
>
> -	ret = do_cpu_up(cpu, tidle);
> -	if (ret)
> -		return ret;
> +	/* If parallel AP bringup isn't enabled, perform the first steps now. */
> +	if (!do_parallel_bringup) {
> +		ret = do_cpu_up(cpu, tidle);
> +		if (ret)
> +			return ret;
> +	}
>
>  	ret = do_wait_cpu_initialized(cpu);
>  	if (ret)
> @@ -1349,6 +1353,12 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	return ret;
>  }
>
> +/* Bringup step one: Send INIT/SIPI to the target AP */
> +static int native_cpu_kick(unsigned int cpu)
> +{
> +	return do_cpu_up(cpu, idle_thread_get(cpu));
> +}
> +
>  /**
>   * arch_disable_smp_support() - disables SMP support for x86 at runtime
>   */
> @@ -1566,6 +1576,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>  		smpboot_control = STARTUP_SECONDARY | STARTUP_APICID_CPUID_01;
>  	}
>
> +	if (do_parallel_bringup) {
> +		cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick",
> +					  native_cpu_kick, NULL);
> +	}
> +
>  	snp_set_wakeup_secondary_cpu();
>  }
>
> --
> 2.25.1
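The check described in the reply (looking for the "x86/cpu:kick" state in /sys/devices/system/cpu/hotplug/states) could be automated along the following lines. This is a minimal sketch, not part of the series: the helper names `hotplug_state_listed` and `parallel_bringup_registered` are hypothetical, and the parser assumes each line of the file has the form "<number>: <name>". As the reply itself notes, the presence of the state only suggests that parallel bringup was set up; it does not rule out a later bailout.

```c
/* Hypothetical helpers; assumes one "<number>: <name>" entry per line. */
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` is listed as a state name in the stream, else 0. */
int hotplug_state_listed(FILE *f, const char *name)
{
	char line[256];

	while (fgets(line, sizeof(line), f)) {
		/* The state name starts after the first ": " separator. */
		char *state = strstr(line, ": ");

		if (!state)
			continue;
		state += 2;
		state[strcspn(state, "\n")] = '\0'; /* drop the trailing newline */
		if (strcmp(state, name) == 0)
			return 1;
	}
	return 0;
}

/* Return 1 if "x86/cpu:kick" is registered, 0 if not, -1 if unreadable. */
int parallel_bringup_registered(void)
{
	FILE *f = fopen("/sys/devices/system/cpu/hotplug/states", "r");
	int found;

	if (!f)
		return -1;
	found = hotplug_state_listed(f, "x86/cpu:kick");
	fclose(f);
	return found;
}
```

Splitting the parser from the file access keeps the matching logic testable against any stream, for example a temporary file holding a captured copy of the states list.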
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 74c76c78f7d2..85ce6a8978ff 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@
 #include <linux/pgtable.h>
 #include <linux/overflow.h>
 #include <linux/stackprotector.h>
+#include <linux/smpboot.h>
 
 #include <asm/acpi.h>
 #include <asm/cacheinfo.h>
@@ -1325,9 +1326,12 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int ret;
 
-	ret = do_cpu_up(cpu, tidle);
-	if (ret)
-		return ret;
+	/* If parallel AP bringup isn't enabled, perform the first steps now. */
+	if (!do_parallel_bringup) {
+		ret = do_cpu_up(cpu, tidle);
+		if (ret)
+			return ret;
+	}
 
 	ret = do_wait_cpu_initialized(cpu);
 	if (ret)
@@ -1349,6 +1353,12 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	return ret;
 }
 
+/* Bringup step one: Send INIT/SIPI to the target AP */
+static int native_cpu_kick(unsigned int cpu)
+{
+	return do_cpu_up(cpu, idle_thread_get(cpu));
+}
+
 /**
  * arch_disable_smp_support() - disables SMP support for x86 at runtime
  */
@@ -1566,6 +1576,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 		smpboot_control = STARTUP_SECONDARY | STARTUP_APICID_CPUID_01;
 	}
 
+	if (do_parallel_bringup) {
+		cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick",
+					  native_cpu_kick, NULL);
+	}
+
 	snp_set_wakeup_secondary_cpu();
 }
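The effect of the `do_parallel_bringup` branch on the order of operations can be illustrated with a small single-threaded model. It is a sketch under assumptions, not kernel code: `model_bringup`, `model_cpu_up`, `struct trace` and the `STEP_*` values are hypothetical names, and the model assumes (as the commit message describes) that the "x86/cpu:kick" stage runs for each AP in turn before the per-CPU native_cpu_up() step.

```c
/* Single-threaded ordering model; every name here is hypothetical. */
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 64

enum step { STEP_KICK, STEP_WAIT };

struct event {
	enum step step;
	unsigned int cpu;
};

struct trace {
	struct event ev[MAX_EVENTS];
	size_t len;
};

static void record(struct trace *t, enum step step, unsigned int cpu)
{
	if (t->len < MAX_EVENTS) {
		t->ev[t->len].step = step;
		t->ev[t->len].cpu = cpu;
		t->len++;
	}
}

/* Models native_cpu_up(): kick here only when parallel bringup is off. */
static void model_cpu_up(struct trace *t, unsigned int cpu, bool parallel)
{
	if (!parallel)
		record(t, STEP_KICK, cpu); /* stands in for do_cpu_up() */
	record(t, STEP_WAIT, cpu);         /* stands in for do_wait_cpu_initialized() */
}

/* Record the event order for nr_aps secondary CPUs; returns event count. */
size_t model_bringup(struct trace *t, unsigned int nr_aps, bool parallel)
{
	unsigned int cpu;

	t->len = 0;

	/* Parallel case: the kick stage visits every AP first. */
	if (parallel)
		for (cpu = 1; cpu <= nr_aps; cpu++)
			record(t, STEP_KICK, cpu); /* stands in for native_cpu_kick() */

	for (cpu = 1; cpu <= nr_aps; cpu++)
		model_cpu_up(t, cpu, parallel);

	return t->len;
}
```

With three APs, the parallel trace is kick, kick, kick, wait, wait, wait, while the serialized trace alternates kick and wait per CPU. Both record the same six events; only the order differs, which is where the overlap in AP startup comes from.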