Message ID | 20230314075345.1325187-1-suagrfillet@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: Song Shuai <suagrfillet@gmail.com>
To: sudeep.holla@arm.com, gregkh@linuxfoundation.org, rafael@kernel.org, conor.dooley@microchip.com, ionela.voinescu@arm.com
Cc: linux-kernel@vger.kernel.org, Song Shuai <suagrfillet@gmail.com>
Subject: [PATCH V2] arch_topology: Clear LLC sibling when cacheinfo teardown
Date: Tue, 14 Mar 2023 15:53:45 +0800
Message-Id: <20230314075345.1325187-1-suagrfillet@gmail.com>
X-Mailer: git-send-email 2.20.1 |
Series |
[V2] arch_topology: Clear LLC sibling when cacheinfo teardown
Commit Message
Song Shuai
March 14, 2023, 7:53 a.m. UTC
The teardown of CPUHP_AP_BASE_CACHEINFO_ONLINE currently only invokes
free_cache_attributes() to clear the share_cpu_map of the cacheinfo list,
while cpu_topology[].llc_sibling is cleared much later, in the teardown
code of the hotplug STARTING section.
To avoid generating incorrect LLC sibling masks, move its clearing
right after free_cache_attributes().
Link: https://lore.kernel.org/linux-kernel/20230313102752.1134472-1-suagrfillet@gmail.com/
Fixes: 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection in the CPU hotplug path")
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
---
changes from V1:
- fix implicit declaration of clear_llc_topology
---
drivers/base/arch_topology.c | 16 ++++++++++++++--
drivers/base/cacheinfo.c | 2 ++
include/linux/arch_topology.h | 3 +++
3 files changed, 19 insertions(+), 2 deletions(-)
Comments
On Tue, Mar 14, 2023 at 03:53:45PM +0800, Song Shuai wrote:
> The teardown of CPUHP_AP_BASE_CACHEINFO_ONLINE now only invokes
> free_cache_attributes() to clear share_cpu_map of cacheinfo list.
> At the same time, clearing cpu_topology[].llc_sibling is
> called quite late at the teardown code in hotplug STARTING section.
>
> To avoid the incorrect LLC sibling masks generated, move its clearing
> right after free_cache_attributes().
>
> Link: https://lore.kernel.org/linux-kernel/20230313102752.1134472-1-suagrfillet@gmail.com/

btw, I think you've added the wrong link here - this seems to be a link
to your previous submission. Was it meant to link to something else?

Cheers,
Conor.

> Fixes: 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection in the CPU hotplug path")
> Signed-off-by: Song Shuai <suagrfillet@gmail.com>
> ---
> [remainder of the quoted patch trimmed]
On Tue, Mar 14, 2023 at 03:53:45PM +0800, Song Shuai wrote:
> The teardown of CPUHP_AP_BASE_CACHEINFO_ONLINE now only invokes
> free_cache_attributes() to clear share_cpu_map of cacheinfo list.
> At the same time, clearing cpu_topology[].llc_sibling is
> called quite late at the teardown code in hotplug STARTING section.
>
> To avoid the incorrect LLC sibling masks generated, move its clearing
> right after free_cache_attributes().

Technically, in terms of flow/timing, this is correct. However, I would
like to know if you are seeing any issues without this change?

Technically, if a cpu is hotplugged out, the cacheinfo is reset first
and then the topology. Until the cpu is removed, the LLC info in the
topology is still valid. Also, I am not sure if anything gets scheduled
such that this LLC info is utilised once the teardown of
CPUHP_AP_BASE_CACHEINFO_ONLINE has started.

So I am trying to understand if we really need this change. Please let
me know if I am missing anything here.
Sudeep Holla <sudeep.holla@arm.com> wrote on Thursday, 16 March 2023 at 09:29:
>
> On Tue, Mar 14, 2023 at 03:53:45PM +0800, Song Shuai wrote:
> > [...]
>
> Technically, in terms of flow/timing, this is correct. However, I would
> like to know if you are seeing any issues without this change?
>
> Technically, if a cpu is hotplugged out, the cacheinfo is reset first
> and then the topology. Until the cpu is removed, the LLC info in the
> topology is still valid. Also, I am not sure if anything gets scheduled
> such that this LLC info is utilised once the teardown of
> CPUHP_AP_BASE_CACHEINFO_ONLINE has started.

There is no visible issue in the entire offline process (eg: echo 0 > online).

However, when I hotplugged out the CPU into the state before
CACHEINFO_ONLINE on my kernel with CONFIG_CPU_HOTPLUG_STATE_CONTROL
configured, the share_cpu_map had been updated but llc_sibling had not,
which would result in a trivial issue:

At the end of stepped hotplugging out, the cpuset_hotplug_work would be
flushed and then the sched domain would be rebuilt, where the
**cpu_coregroup_mask** in sched_domain_topology got an incorrect
llc_sibling, but the result of rebuilding was correct due to the
protection of cpu_active_mask.

The stepped hotplugging may not be used in the production environment,
but the issue existed.

Even in the entire offline process, it's possible that a future user
gets the wrong llc_sibling when accessing it concurrently or right
after the teardown of CACHEINFO_ONLINE.

> So I am trying to understand if we really need this change. Please let
> me know if I am missing anything here.
>
> --
> Regards,
> Sudeep
On Thu, Mar 16, 2023 at 10:30:52AM +0000, Song Shuai wrote:
> Sudeep Holla <sudeep.holla@arm.com> wrote on Thursday, 16 March 2023 at 09:29:
> > [...]
>
> There is no visible issue in the entire offline process (eg: echo 0 > online).
>
> However, when I hotplugged out the CPU into the state before
> CACHEINFO_ONLINE on my kernel with CONFIG_CPU_HOTPLUG_STATE_CONTROL
> configured, the share_cpu_map had been updated but llc_sibling had not,
> which would result in a trivial issue:
>
> At the end of stepped hotplugging out, the cpuset_hotplug_work would be
> flushed and then the sched domain would be rebuilt, where the
> **cpu_coregroup_mask** in sched_domain_topology got an incorrect
> llc_sibling, but the result of rebuilding was correct due to the
> protection of cpu_active_mask.

Wait, I would like to disagree there. While I agree there is inconsistency
between the cacheinfo cpu_shared_map and the llc_sibling in the teardown
path, it is still correct, and terming it an "incorrect" llc_sibling is
wrong. The cpu is not yet completely offline, and hence the llc_sibling
represents all the cpus it shares LLC with. When the cpu is offlined, the
cpu_topology is removed anyway. So I don't see it as an issue at all.
If you follow __cpu_disable()->remove_cpu_topology(), it gets updated
there. If the sched_domain_topology is not rebuilt after that, then we may
have other issues. What am I missing?

I am not bothered by the cacheinfo cpu_shared_map and cpu_topology
llc_sibling mismatch for a short window during the teardown, as
technically, until the cpu is torn down, it is sharing its LLC with
llc_sibling, and it is definitely not wrong to have it in there.

> The stepped hotplugging may not be used in the production environment,
> but the issue existed.

What issue? If it is just inconsistency, then I am fine to ignore it. That
is just artificial and momentary, and it impacts nothing.

> Even in the entire offline process, it's possible that a future user
> gets the wrong llc_sibling when accessing it concurrently or right
> after the teardown of CACHEINFO_ONLINE.

As I said, even if someone accesses it, it is not wrong information.

--
Regards,
Sudeep
Sudeep Holla <sudeep.holla@arm.com> wrote on Thursday, 16 March 2023 at 14:51:
>
> On Thu, Mar 16, 2023 at 10:30:52AM +0000, Song Shuai wrote:
> > [...]
>
> Wait, I would like to disagree there. While I agree there is inconsistency
> between the cacheinfo cpu_shared_map and the llc_sibling in the teardown
> path, it is still correct, and terming it an "incorrect" llc_sibling is
> wrong.
> [...]
>
> What issue? If it is just inconsistency, then I am fine to ignore it. That
> is just artificial and momentary, and it impacts nothing.

My original point is to clear the llc_sibling right after the clearing of
share_cpu_map, like what you did in 3fcbf1c77d08.

And the ~~issue~~ I described above was found when I manually tested the
'base/cacheinfo:online' hpstate, which can be triggered by the following
commands:

```
hpid=$(sed -n '/cacheinfo/s/:.*//p' /sys/devices/system/cpu/hotplug/states)
echo $((hpid-1)) > /sys/devices/system/cpu/cpu2/hotplug/target
```

Anyway, the short inconsistency window you explained seems acceptable to me.

> > Even in the entire offline process, it's possible that a future user
> > gets the wrong llc_sibling when accessing it concurrently or right
> > after the teardown of CACHEINFO_ONLINE.
>
> As I said, even if someone accesses it, it is not wrong information.
>
> --
> Regards,
> Sudeep
On Mon, Mar 20, 2023 at 06:20:34AM +0000, Song Shuai wrote:
> My original point is to clear the llc_sibling right after the clearing of
> share_cpu_map, like what you did in 3fcbf1c77d08.

Yes, I understood that. There were other issues that were fixed later,
hence the state of the current code.

> And the ~~issue~~ I described above was found when I manually tested the
> 'base/cacheinfo:online' hpstate, which can be triggered by the following
> commands:
>
> ```
> hpid=$(sed -n '/cacheinfo/s/:.*//p' /sys/devices/system/cpu/hotplug/states)
> echo $((hpid-1)) > /sys/devices/system/cpu/cpu2/hotplug/target
> ```

Thanks for the detailed steps. I had guessed something very similar.

> Anyway, the short inconsistency window you explained seems acceptable to me.

Yes, just inconsistency, but technically the CPU topology is still correct,
including the LLC information. I don't see the point in clearing the
cacheinfo at this state and then, just a couple of hotplug states later,
resetting all the topology as the CPU is being removed. I feel it to be
redundant, and we can add it if we absolutely need it (i.e. if there are
new users of that information who need it to be aligned with cacheinfo,
which I think is highly unlikely).
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index b1c1dd38ab01..8681654d6c07 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -769,6 +769,20 @@ void update_siblings_masks(unsigned int cpuid)
 	}
 }
 
+void clear_llc_topology(unsigned int cpu)
+{
+	int sib;
+
+	for_each_cpu(sib, topology_llc_cpumask(cpu)) {
+		if (sib == cpu)
+			continue;
+		if (last_level_cache_is_shared(cpu, sib)) {
+			cpumask_clear_cpu(cpu, topology_llc_cpumask(sib));
+			cpumask_clear_cpu(sib, topology_llc_cpumask(cpu));
+		}
+	}
+}
+
 static void clear_cpu_topology(int cpu)
 {
 	struct cpu_topology *cpu_topo = &cpu_topology[cpu];
@@ -811,8 +825,6 @@ void remove_cpu_topology(unsigned int cpu)
 		cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
 	for_each_cpu(sibling, topology_cluster_cpumask(cpu))
 		cpumask_clear_cpu(cpu, topology_cluster_cpumask(sibling));
-	for_each_cpu(sibling, topology_llc_cpumask(cpu))
-		cpumask_clear_cpu(cpu, topology_llc_cpumask(sibling));
 
 	clear_cpu_topology(cpu);
 }
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index f6573c335f4c..1c276b30fd5a 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -19,6 +19,7 @@
 #include <linux/slab.h>
 #include <linux/smp.h>
 #include <linux/sysfs.h>
+#include <linux/arch_topology.h>
 
 /* pointer to per cpu cacheinfo */
 static DEFINE_PER_CPU(struct cpu_cacheinfo, ci_cpu_cacheinfo);
@@ -814,6 +815,7 @@ static int cacheinfo_cpu_pre_down(unsigned int cpu)
 	cpu_cache_sysfs_exit(cpu);
 
 	free_cache_attributes(cpu);
+	clear_llc_topology(cpu);
 	return 0;
 }
 
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index a07b510e7dc5..569e05607934 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -89,9 +89,12 @@ void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 const struct cpumask *cpu_clustergroup_mask(int cpu);
 void update_siblings_masks(unsigned int cpu);
+void clear_llc_topology(unsigned int cpu);
 void remove_cpu_topology(unsigned int cpuid);
 void reset_cpu_topology(void);
 int parse_acpi_topology(void);
+#else
+static inline void clear_llc_topology(unsigned int cpu) { }
 #endif
 
 #endif /* _LINUX_ARCH_TOPOLOGY_H_ */