From patchwork Tue Feb 21 22:33:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Usama Arif X-Patchwork-Id: 60275 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp258014wrd; Tue, 21 Feb 2023 14:38:15 -0800 (PST) X-Google-Smtp-Source: AK7set9RdlUPeSzPPEV20Qo5uv51QLQqtVmnXPMc0qJ15f8v9cREfy0vWlur6JaWwf5pchWE7Iet X-Received: by 2002:a50:fb04:0:b0:4af:601e:6034 with SMTP id d4-20020a50fb04000000b004af601e6034mr271948edq.42.1677019095607; Tue, 21 Feb 2023 14:38:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1677019095; cv=none; d=google.com; s=arc-20160816; b=jcUc6Ju4OeJ6N0HiKBkd0z4sEwQhDZzFmKvWKxlCAtlDGU9liWaOqPxYI2mphNas1K WxoU682Rd5FbDePzKUpVe82pWCBVraaZjLZQ54hM8gp70n2fwU8IaA1CHQ+KEa58ZcEy RsTRp1W/xtNioTJuKz+fHBvVlyPCH4+oAaEwJFk6VVtApnrRZekrJFo+VZeXNcn8uz5S AWpQ3p7stQ4y/4hZwaOpGdMFJoEjoyPCgPocBra4PLE7RBUPW6PgWaqzsHXmJ2kA7WVM RYVZcpPcJWQpftJ/OX0gjsFyFofKXH9hKf8/Zuo7HpNBuRO5DgLgHrJSvD064jFF/zpa O99w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=8SFcyAg12RLJRTJdp4mtixGxiIqYpnMS3znV8hvpPts=; b=qOFbey8p/PkEaz6TvbbSQ9bG6lujPgvpZB00NI6SAWw9EXjBfjRGFdRfeETigqBMkx s1KLnsyEc77YrJJ7a/zHirtQdV4isbaKtVva80jmeCpUFtUq0JjPtm+edBi3A2tmgFXl KHbcnf0jK+dvLwmj6682t7itUHyqniGf9X6T3FQTwMtJdPAvuC6sSLb2dacfOrFF0Yyx 7AY9HnLZyftfM1Sppdu0969vKss52aEJgR930BTtzBnRFV1+mIA0ClSVnu4aHud8NABt A7fd6XJcGVqvGNlh+U6Ih7ijB+gGsyczyN254EelhIJ2Sy+mtGkzMJt3C6ovUyVhmNNu yzQQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=S3HMRrMM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n7-20020aa7c687000000b004ad4c4b5ab2si19898896edq.392.2023.02.21.14.37.51; Tue, 21 Feb 2023 14:38:15 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=S3HMRrMM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230310AbjBUWfu (ORCPT + 99 others); Tue, 21 Feb 2023 17:35:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58178 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229911AbjBUWft (ORCPT ); Tue, 21 Feb 2023 17:35:49 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F57F32CD5 for ; Tue, 21 Feb 2023 14:35:46 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id v3so5923252wrp.2 for ; Tue, 21 Feb 2023 14:35:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8SFcyAg12RLJRTJdp4mtixGxiIqYpnMS3znV8hvpPts=; b=S3HMRrMMTj8/UxA+6TiH/7TSuv5PXhAsDiDnkI9Hl0so3DWTcA4w3pMgYIp7KvOrgS Xf+a0k0uXEhs7lCYyriG4d2JCEQXHKsg/yDQSwn3owv/C5Lo2N2BWSEIDIQjh7tope9g 7ayiUl9vuvwTUMFfRwFS65OJhY9fZ9ro08u3Gn5G4eUitLER3itUXLtO21OA0/QQJg29 9l5Dsl4HffSOnSava/YjI6Jbtzib+zmg3dln1NHLxpsa21ld4pG3D2hobj6gBXXshLqN YgSMgW2edtErlqDCautMKoVcG93CMcMuhale3rYrX9lKlGF7NwWwVQb/7JJidcjQQ/f9 gwjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8SFcyAg12RLJRTJdp4mtixGxiIqYpnMS3znV8hvpPts=; b=h2ckK9R9XBSM1+se9mPMCbZFSoBQYFVrJHZUfDAqyR+wvuiof57k3J7p8esppgztZu 4bioJEz/4Cg7Pyu7Tcmb/RydWFd3EQs1V49zMnIn7Jp5Zr6ttnpdSaituJLFKFWKAY1f 3UElqPnuXKLRlz5Pzqt0DPOqHrhYDOis8Q6cHNpi3pOZeUt4sME4u97m9ARWB/hcUsib uWs8PB35aI0AK5I9paz/wB3EI2zGCuqQWbq8slUvPJ/Z6s+lklxTnuPV8xzeOqKiEgQV ut+guX2JSTeQ3UfLodfSlp/5sCAs3M5yRd04vVFiHwRjM9kawo3naLGO9v0BTjqKPlVZ wtfA== X-Gm-Message-State: AO0yUKVHsEsHOYebIsxglDY8XPZ45AnPBIrX/w87lbZr425DHcQsI7oR tWBErmHmtQp5i+GQLTM6Kpp9c21SAHqOUoAt X-Received: by 2002:adf:db4c:0:b0:2c5:4a89:f60b with SMTP id f12-20020adfdb4c000000b002c54a89f60bmr6229953wrj.38.1677018944969; Tue, 21 Feb 2023 14:35:44 -0800 (PST) Received: from usaari01.cust.communityfibre.co.uk ([2a02:6b6a:b566:0:1a14:8be6:b3a9:a95e]) by smtp.gmail.com with ESMTPSA id u13-20020a5d434d000000b002c55ec7f661sm4501254wrr.5.2023.02.21.14.35.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Feb 2023 14:35:44 -0800 (PST) From: Usama Arif To: dwmw2@infradead.org, tglx@linutronix.de, kim.phillips@amd.com Cc: piotrgorski@cachyos.org, oleksandr@natalenko.name, arjan@linux.intel.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com, paulmck@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com, thomas.lendacky@amd.com, seanjc@google.com, pmenzel@molgen.mpg.de, fam.zheng@bytedance.com, punit.agrawal@bytedance.com, simon.evans@bytedance.com, liangma@liangbit.com, David Woodhouse , Usama Arif Subject: [PATCH v10 1/8] x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel Date: Tue, 21 Feb 2023 22:33:45 +0000 Message-Id: <20230221223352.2288528-2-usama.arif@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230221223352.2288528-1-usama.arif@bytedance.com> References: <20230221223352.2288528-1-usama.arif@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758481975460327752?= X-GMAIL-MSGID: =?utf-8?q?1758481975460327752?= From: David Woodhouse Each of the sibling CPUs in a cluster uses the same clustermask. The first CPU in a cluster will need a new clustermask allocated, while subsequent siblings will use the same clustermask as the first. However, the CPU being brought up cannot yet perform memory allocations at the point that this occurs in init_x2apic_ldr(). So at present, the alloc_clustermask() function allocates a clustermask just in case it's needed, storing it in the global cluster_hotplug_mask. A CPU which is the first sibling of a cluster will "take" it from there and set cluster_hotplug_mask to NULL, in order for alloc_clustermask() to allocate a new one before bringing up the next CPU. To facilitate parallel bringup of CPUs in future, switch to a model where alloc_clustermask() prepopulates the clustermask in the per_cpu data for each present CPU in the cluster in advance. All that the CPU needs to do for itself in init_x2apic_ldr() is set its own bit in that mask. The 'node' and 'clusterid' members of struct cluster_mask are thus redundant, and it can become a simple struct cpumask instead. Suggested-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Usama Arif Tested-by: Paul E. McKenney Tested-by: Kim Phillips Tested-by: Oleksandr Natalenko --- arch/x86/kernel/apic/x2apic_cluster.c | 126 +++++++++++++++++--------- 1 file changed, 82 insertions(+), 44 deletions(-) diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c index e696e22d0531..b2b2b7f3e03f 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -9,11 +9,7 @@ #include "local.h" -struct cluster_mask { - unsigned int clusterid; - int node; - struct cpumask mask; -}; +#define apic_cluster(apicid) ((apicid) >> 4) /* * __x2apic_send_IPI_mask() possibly needs to read @@ -23,8 +19,7 @@ struct cluster_mask { static u32 *x86_cpu_to_logical_apicid __read_mostly; static DEFINE_PER_CPU(cpumask_var_t, ipi_mask); -static DEFINE_PER_CPU_READ_MOSTLY(struct cluster_mask *, cluster_masks); -static struct cluster_mask *cluster_hotplug_mask; +static DEFINE_PER_CPU_READ_MOSTLY(struct cpumask *, cluster_masks); static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id) { @@ -60,10 +55,10 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest) /* Collapse cpus in a cluster so a single IPI per cluster is sent */ for_each_cpu(cpu, tmpmsk) { - struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu); + struct cpumask *cmsk = per_cpu(cluster_masks, cpu); dest = 0; - for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask) + for_each_cpu_and(clustercpu, tmpmsk, cmsk) dest |= x86_cpu_to_logical_apicid[clustercpu]; if (!dest) @@ -71,7 +66,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest) __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); /* Remove cluster CPUs from tmpmask */ - cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask); + cpumask_andnot(tmpmsk, tmpmsk, cmsk); } local_irq_restore(flags); @@ -105,55 +100,98 @@ static u32 x2apic_calc_apicid(unsigned int cpu) static void init_x2apic_ldr(void) { - struct cluster_mask *cmsk = this_cpu_read(cluster_masks); - u32 cluster, apicid = apic_read(APIC_LDR); - unsigned int cpu; + struct cpumask *cmsk = this_cpu_read(cluster_masks); - x86_cpu_to_logical_apicid[smp_processor_id()] = apicid; + BUG_ON(!cmsk); - if (cmsk) - goto update; - - cluster = apicid >> 16; - for_each_online_cpu(cpu) { - cmsk = per_cpu(cluster_masks, cpu); - /* Matching cluster found. Link and update it. */ - if (cmsk && cmsk->clusterid == cluster) - goto update; + cpumask_set_cpu(smp_processor_id(), cmsk); +} + +/* + * As an optimisation during boot, set the cluster_mask for all present + * CPUs at once, to prevent each of them having to iterate over the others + * to find the existing cluster_mask. + */ +static void prefill_clustermask(struct cpumask *cmsk, unsigned int cpu, u32 cluster) +{ + int cpu_i; + + for_each_present_cpu(cpu_i) { + struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu_i); + u32 apicid = apic->cpu_present_to_apicid(cpu_i); + + if (apicid == BAD_APICID || cpu_i == cpu || apic_cluster(apicid) != cluster) + continue; + + if (WARN_ON_ONCE(*cpu_cmsk == cmsk)) + continue; + + BUG_ON(*cpu_cmsk); + *cpu_cmsk = cmsk; } - cmsk = cluster_hotplug_mask; - cmsk->clusterid = cluster; - cluster_hotplug_mask = NULL; -update: - this_cpu_write(cluster_masks, cmsk); - cpumask_set_cpu(smp_processor_id(), &cmsk->mask); } -static int alloc_clustermask(unsigned int cpu, int node) +static int alloc_clustermask(unsigned int cpu, u32 cluster, int node) { + struct cpumask *cmsk = NULL; + unsigned int cpu_i; + + /* + * At boot time, the CPU present mask is stable. The cluster mask is + * allocated for the first CPU in the cluster and propagated to all + * present siblings in the cluster. If the cluster mask is already set + * on entry to this function for a given CPU, there is nothing to do. + */ if (per_cpu(cluster_masks, cpu)) return 0; + + if (system_state < SYSTEM_RUNNING) + goto alloc; + /* - * If a hotplug spare mask exists, check whether it's on the right - * node. If not, free it and allocate a new one. + * On post boot hotplug for a CPU which was not present at boot time, + * iterate over all possible CPUs (even those which are not present + * any more) to find any existing cluster mask. */ - if (cluster_hotplug_mask) { - if (cluster_hotplug_mask->node == node) - return 0; - kfree(cluster_hotplug_mask); + for_each_possible_cpu(cpu_i) { + u32 apicid = apic->cpu_present_to_apicid(cpu_i); + + if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) { + cmsk = per_cpu(cluster_masks, cpu_i); + /* + * If the cluster is already initialized, just store + * the mask and return. There's no need to propagate. + */ + if (cmsk) { + per_cpu(cluster_masks, cpu) = cmsk; + return 0; + } + } } - - cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask), - GFP_KERNEL, node); - if (!cluster_hotplug_mask) + /* + * No CPU in the cluster has ever been initialized, so fall through to + * the boot time code which will also populate the cluster mask for any + * other CPU in the cluster which is (now) present. + */ +alloc: + cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node); + if (!cmsk) return -ENOMEM; - cluster_hotplug_mask->node = node; + per_cpu(cluster_masks, cpu) = cmsk; + prefill_clustermask(cmsk, cpu, cluster); + return 0; } static int x2apic_prepare_cpu(unsigned int cpu) { - if (alloc_clustermask(cpu, cpu_to_node(cpu)) < 0) + u32 phys_apicid = apic->cpu_present_to_apicid(cpu); + u32 cluster = apic_cluster(phys_apicid); + u32 logical_apicid = (cluster << 16) | (1 << (phys_apicid & 0xf)); + + x86_cpu_to_logical_apicid[cpu] = logical_apicid; + + if (alloc_clustermask(cpu, cluster, cpu_to_node(cpu)) < 0) return -ENOMEM; if (!zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL)) return -ENOMEM; @@ -162,10 +200,10 @@ static int x2apic_prepare_cpu(unsigned int cpu) static int x2apic_dead_cpu(unsigned int dead_cpu) { - struct cluster_mask *cmsk = per_cpu(cluster_masks, dead_cpu); + struct cpumask *cmsk = per_cpu(cluster_masks, dead_cpu); if (cmsk) - cpumask_clear_cpu(dead_cpu, &cmsk->mask); + cpumask_clear_cpu(dead_cpu, cmsk); free_cpumask_var(per_cpu(ipi_mask, dead_cpu)); return 0; }