Message ID | 20221122203532.15013-6-ricardo.neri-calderon@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp2425153wrr; Tue, 22 Nov 2022 12:30:48 -0800 (PST) X-Google-Smtp-Source: AA0mqf75bCnO7+0zM9FFiFeI3nS7WbCQEoFCXuMLlZOTOhia5+Z50BZ9ju0jGKToWfe/j2XlG1hk X-Received: by 2002:a17:906:c40d:b0:7af:1534:73c9 with SMTP id u13-20020a170906c40d00b007af153473c9mr5005708ejz.558.1669149047891; Tue, 22 Nov 2022 12:30:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669149047; cv=none; d=google.com; s=arc-20160816; b=ICBVTJfTkurAdUwguKZlWjXLKYhOK62NxTJY647UfegIUPIxbWrjTiOCI4iT3G9CLq g9kRzt7HqrdMiU1s1DBrIBAUTJyqc1KhhViOH++f+Czh4K3kGT7/Ao6EFvasx0XwXeWj cRvWL6Guz+cK145d/Xvc+9+sDMUOaO8ch6odnkst+rtZ9bBGvfeVyD5MsMLEnhCyXaDW Wx9c97yYos+9MQmouJvB7gL9nbzTR3cvfdQXlwenXjjZLEbE9VQkGUC8eCzokTkE4ZmC IuN8NYvfqFg0xkWtXNyigMOkhNkMXcl5k2FBoEunu1EPplzepaRDrmPAvZKHY6mEW3AC VUAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=MmqyR15v2ZN53QVWLOcGu+qn2Cn4MiE8w7DDSiR+oT0=; b=agayLeWzEWh8ipZklgfSMC6CittNtPmqv/ANSJHT6RSZenefPNYsWl56DkprQw/azh MlXqt58Dehy78jP9Ue+PbxdjBhkx0bN+B/rFDvwqnHzXBMYrAAyGeOW3XptXCEIeBWEm uBJjSFeiVdWnaHEpt6MQ0qSPsk/GdpdVhklCwHoyUw8RigHjYT5Q25iRP56miogJFlJ7 1VtbC67DdttgJ5PQWHw4itbuQZ8UYuMp+byuB2PPfe0gZKSjvsmCs3Z/oBVhyOA6cPuM yn4PquqXABH9XZaclm5c2vTcf9eMDP77YCrj3E5G6wDG0TmyjdqtrJonAlvpHsyfdfyb 2UUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ezwAMMGq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c17-20020aa7c751000000b0046329e2724dsi11267264eds.86.2022.11.22.12.30.23; Tue, 22 Nov 2022 12:30:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ezwAMMGq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234889AbiKVU2l (ORCPT <rfc822;cjcooper78@gmail.com> + 99 others); Tue, 22 Nov 2022 15:28:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234470AbiKVU2P (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 22 Nov 2022 15:28:15 -0500 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5576A17055 for <linux-kernel@vger.kernel.org>; Tue, 22 Nov 2022 12:28:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1669148895; x=1700684895; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=vXIaD/zsIWPHn5n3olE8jagfMylwb+X/0KOXnCS5E6Q=; b=ezwAMMGqxlaicR1khS+T/6eMe09a2oG4+ubhca8mbuxeAjebbBBQvccE yzAEvaI1egytNfUP8SgWJBb7VEL8ioI+Tz1KLvCUnlhHstuikCyyrzj7P ziiLnTyZ2qqndpeHd7fW90As0qVFZKL9ayyWp6s4rnQfvVsYS1GPU1ScP tMKKJds2SCMB58oLW3toHRRzpxH9YwcTbI2jGCtxXd+kKO5ZxvH/NwzdI NZgL9e+EEUdUUislcc9xH6c+EExe3QlcVv07D3Lz3K2YFBGJ0WonXyCXs 8Neit6x8iKNal3IW2uBcWWK2VMF6J5l0k5wIZVsj/BwshL8RHETA7PZA1 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10539"; a="293616511" X-IronPort-AV: E=Sophos;i="5.96,185,1665471600"; d="scan'208";a="293616511" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Nov 2022 12:28:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10539"; a="816228141" X-IronPort-AV: E=Sophos;i="5.96,185,1665471600"; d="scan'208";a="816228141" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga005.jf.intel.com with ESMTP; 22 Nov 2022 12:28:14 -0800 From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> To: "Peter Zijlstra (Intel)" <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org> Cc: Ricardo Neri <ricardo.neri@intel.com>, "Ravi V. Shankar" <ravi.v.shankar@intel.com>, Ben Segall <bsegall@google.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Len Brown <len.brown@intel.com>, Mel Gorman <mgorman@suse.de>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, Steven Rostedt <rostedt@goodmis.org>, Tim Chen <tim.c.chen@linux.intel.com>, Valentin Schneider <vschneid@redhat.com>, x86@kernel.org, linux-kernel@vger.kernel.org, Ricardo Neri <ricardo.neri-calderon@linux.intel.com>, "Tim C . Chen" <tim.c.chen@intel.com> Subject: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain Date: Tue, 22 Nov 2022 12:35:30 -0800 Message-Id: <20221122203532.15013-6-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221122203532.15013-1-ricardo.neri-calderon@linux.intel.com> References: <20221122203532.15013-1-ricardo.neri-calderon@linux.intel.com> X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_PASS, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750229632240157968?= X-GMAIL-MSGID: =?utf-8?q?1750229632240157968?= |
Series |
x86/sched: Avoid unnecessary migrations within SMT domains
|
|
Commit Message
Ricardo Neri
Nov. 22, 2022, 8:35 p.m. UTC
There is no difference between any of the SMT siblings of a physical core.
asym_packing load balancing is not needed among siblings.
When balancing load among physical cores, the scheduler now considers the
state of the siblings when checking the priority of a CPU.
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
* Introduced this patch.
---
arch/x86/kernel/smpboot.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
Hi Ricardo, On Tuesday 22 Nov 2022 at 12:35:30 (-0800), Ricardo Neri wrote: > There is no difference between any of the SMT siblings of a physical core. > asym_packing load balancing is not needed among siblings. > > When balancing load among physical cores, the scheduler now considers the > state of the siblings when checking the priority of a CPU. > > Cc: Ben Segall <bsegall@google.com> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> > Cc: Len Brown <len.brown@intel.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Tim C. Chen <tim.c.chen@intel.com> > Cc: Valentin Schneider <vschneid@redhat.com> > Cc: x86@kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> > --- > Changes since v1: > * Introduced this patch. > --- > arch/x86/kernel/smpboot.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 3f3ea0287f69..c3de98224cb4 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -545,7 +545,7 @@ static int x86_core_flags(void) > #ifdef CONFIG_SCHED_SMT > static int x86_smt_flags(void) > { > - return cpu_smt_flags() | x86_sched_itmt_flags(); > + return cpu_smt_flags(); Based on: kernel/sched/topology.c: sd = highest_flag_domain(cpu, SD_ASYM_PACKING); rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd); and described at: include/linux/sched/sd_flags.h: /* * Place busy tasks earlier in the domain * * SHARED_CHILD: Usually set on the SMT level. Technically could be set further * up, but currently assumed to be set from the base domain * upwards (see update_top_cache_domain()). * NEEDS_GROUPS: Load balancing flag. */ SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) doesn't your change result in sd_asym_packing being NULL? The SD_ASYM_PACKING flag requires all children of a domain to have it set as well. So having SMT not setting the flag, while CLUSTER and MC having set the flag would result in a broken topology, right? Thanks, Ionela. > } > #endif > #ifdef CONFIG_SCHED_CLUSTER > -- > 2.25.1 > >
On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote: > Hi Ricardo, Hi Ionela, Thank you very much for your feedback! > > On Tuesday 22 Nov 2022 at 12:35:30 (-0800), Ricardo Neri wrote: > > There is no difference between any of the SMT siblings of a physical core. > > asym_packing load balancing is not needed among siblings. > > > > When balancing load among physical cores, the scheduler now considers the > > state of the siblings when checking the priority of a CPU. > > > > Cc: Ben Segall <bsegall@google.com> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> > > Cc: Len Brown <len.brown@intel.com> > > Cc: Mel Gorman <mgorman@suse.de> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Tim C. Chen <tim.c.chen@intel.com> > > Cc: Valentin Schneider <vschneid@redhat.com> > > Cc: x86@kernel.org > > Cc: linux-kernel@vger.kernel.org > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> > > --- > > Changes since v1: > > * Introduced this patch. > > --- > > arch/x86/kernel/smpboot.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > > index 3f3ea0287f69..c3de98224cb4 100644 > > --- a/arch/x86/kernel/smpboot.c > > +++ b/arch/x86/kernel/smpboot.c > > @@ -545,7 +545,7 @@ static int x86_core_flags(void) > > #ifdef CONFIG_SCHED_SMT > > static int x86_smt_flags(void) > > { > > - return cpu_smt_flags() | x86_sched_itmt_flags(); > > + return cpu_smt_flags(); > > Based on: > > kernel/sched/topology.c: > sd = highest_flag_domain(cpu, SD_ASYM_PACKING); > rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd); > > and described at: > > include/linux/sched/sd_flags.h: > /* > * Place busy tasks earlier in the domain > * > * SHARED_CHILD: Usually set on the SMT level. Technically could be set further > * up, but currently assumed to be set from the base domain > * upwards (see update_top_cache_domain()). > * NEEDS_GROUPS: Load balancing flag. > */ > SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) > > doesn't your change result in sd_asym_packing being NULL? Yes. This is a good catch. Thanks! > > The SD_ASYM_PACKING flag requires all children of a domain to have it set > as well. So having SMT not setting the flag, while CLUSTER and MC having > set the flag would result in a broken topology, right? I'd say that highest_flag_domain(..., flag) requires all children to have `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to be located in upper domains. Perhaps this can be fixed with a variant of highest_flag_domain() that do not require all children to have the flag? Thanks and BR, Ricardo
On 14/12/22 08:59, Ricardo Neri wrote: > On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote: >> Based on: >> >> kernel/sched/topology.c: >> sd = highest_flag_domain(cpu, SD_ASYM_PACKING); >> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd); >> >> and described at: >> >> include/linux/sched/sd_flags.h: >> /* >> * Place busy tasks earlier in the domain >> * >> * SHARED_CHILD: Usually set on the SMT level. Technically could be set further >> * up, but currently assumed to be set from the base domain >> * upwards (see update_top_cache_domain()). >> * NEEDS_GROUPS: Load balancing flag. >> */ >> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) >> >> doesn't your change result in sd_asym_packing being NULL? > > Yes. This is a good catch. Thanks! > Nice to see those being useful :-) FYI if you run your kernel with CONFIG_SCHED_DEBUG=y and sched_debug on the cmdline, you should get a warning at boot time from the topology debug code checking assertions against those flags. >> >> The SD_ASYM_PACKING flag requires all children of a domain to have it set >> as well. So having SMT not setting the flag, while CLUSTER and MC having >> set the flag would result in a broken topology, right? > > I'd say that highest_flag_domain(..., flag) requires all children to have > `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to > be located in upper domains. > > Perhaps this can be fixed with a variant of highest_flag_domain() that do > not require all children to have the flag? > So I gave that flag SDF_SHARED_CHILD because its cached SD pointer was set up using highest_flag_domain(). Looking for the highest level where it is set matches how it is used in nohz_balancer_kick(), so you might want a new helper. With that said, so far all but one flag (SD_PREFER_SIBLING, and that's because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern, if SD_ASYM_PACKING no longer does then we need to think whether we're trying to make it do funky things. I need to look at the rest of your series to get an idea, that unfortunately won't be today but it's now in my todolist. > Thanks and BR, > Ricardo
On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote: > On 14/12/22 08:59, Ricardo Neri wrote: > > On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote: > >> Based on: > >> > >> kernel/sched/topology.c: > >> sd = highest_flag_domain(cpu, SD_ASYM_PACKING); > >> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd); > >> > >> and described at: > >> > >> include/linux/sched/sd_flags.h: > >> /* > >> * Place busy tasks earlier in the domain > >> * > >> * SHARED_CHILD: Usually set on the SMT level. Technically could be set further > >> * up, but currently assumed to be set from the base domain > >> * upwards (see update_top_cache_domain()). > >> * NEEDS_GROUPS: Load balancing flag. > >> */ > >> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) > >> > >> doesn't your change result in sd_asym_packing being NULL? > > > > Yes. This is a good catch. Thanks! > > > > Nice to see those being useful :-) FYI if you run your kernel with > CONFIG_SCHED_DEBUG=y and sched_debug on the cmdline, you should get a > warning at boot time from the topology debug code checking assertions > against those flags. Thanks! I missed this warning. Indeed, it is there. > > >> > >> The SD_ASYM_PACKING flag requires all children of a domain to have it set > >> as well. So having SMT not setting the flag, while CLUSTER and MC having > >> set the flag would result in a broken topology, right? > > > > I'd say that highest_flag_domain(..., flag) requires all children to have > > `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to > > be located in upper domains. > > > > Perhaps this can be fixed with a variant of highest_flag_domain() that do > > not require all children to have the flag? > > > > So I gave that flag SDF_SHARED_CHILD because its cached SD pointer was set > up using highest_flag_domain(). Looking for the highest level where it is > set matches how it is used in nohz_balancer_kick(), so you might want a new > helper. Right. I was thinking on a highest_flag_domain_weak() or a changing highest_flag_domain(..., bool shared_child). > > With that said, so far all but one flag (SD_PREFER_SIBLING, and that's > because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern, > if SD_ASYM_PACKING no longer does then we need to think whether we're > trying to make it do funky things. My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT level because all SMT siblings are identical. There are cores of higher priority at the "MC" level (maybe in the future at the "CLS" level). Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level. > I need to look at the rest of your > series to get an idea, that unfortunately won't be today but it's now in my > todolist. Thank you! BR, Ricardo
On 19/12/22 16:42, Ricardo Neri wrote: > On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote: >> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's >> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern, >> if SD_ASYM_PACKING no longer does then we need to think whether we're >> trying to make it do funky things. > > My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT > level because all SMT siblings are identical. There are cores of higher > priority at the "MC" level (maybe in the future at the "CLS" level). > > Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level. > So with what I groked from your series, I agree with you, x86 shouldn't need it at SMT level. What about the below? --- diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h index 57bde66d95f7a..8dc16942135b4 100644 --- a/include/linux/sched/sd_flags.h +++ b/include/linux/sched/sd_flags.h @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) /* * Place busy tasks earlier in the domain * - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further - * up, but currently assumed to be set from the base domain - * upwards (see update_top_cache_domain()). + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all + * siblings of an SMT core are identical, but SMT cores themselves + * have different priorites. * NEEDS_GROUPS: Load balancing flag. */ -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) /* * Prefer to place tasks in a sibling domain diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index b1d338a740e56..2d532e29373b1 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1772,6 +1772,19 @@ queue_balance_callback(struct rq *rq, for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \ __sd; __sd = __sd->parent) +static inline struct sched_domain *__highest_flag_domain(struct sched_domain *sd, int flag) +{ + struct sched_domain *hsd = NULL; + + for (; sd; sd = sd->parent) { + if (!(sd->flags & flag)) + break; + hsd = sd; + } + + return hsd; +} + /** * highest_flag_domain - Return highest sched_domain containing flag. * @cpu: The CPU whose highest level of sched domain is to @@ -1783,15 +1796,7 @@ queue_balance_callback(struct rq *rq, */ static inline struct sched_domain *highest_flag_domain(int cpu, int flag) { - struct sched_domain *sd, *hsd = NULL; - - for_each_domain(cpu, sd) { - if (!(sd->flags & flag)) - break; - hsd = sd; - } - - return hsd; + return __highest_flag_domain(rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd), flag); } static inline struct sched_domain *lowest_flag_domain(int cpu, int flag) @@ -1806,6 +1811,16 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag) return sd; } +static inline struct sched_domain *highest_parent_flag_domain(int cpu, int flag) +{ + struct sched_domain *sd; + + SCHED_WARN_ON(!(sd_flag_debug[ilog2(flag)].meta_flags & SDF_SHARED_PARENT)); + + sd = lowest_flag_domain(cpu, flag); + return __highest_flag_domain(sd, flag); +} + DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc); DECLARE_PER_CPU(int, sd_llc_size); DECLARE_PER_CPU(int, sd_llc_id); diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 8154ef590b9f8..4e0e5b27c331b 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -692,7 +692,7 @@ static void update_top_cache_domain(int cpu) sd = lowest_flag_domain(cpu, SD_NUMA); rcu_assign_pointer(per_cpu(sd_numa, cpu), sd); - sd = highest_flag_domain(cpu, SD_ASYM_PACKING); + sd = highest_parent_flag_domain(cpu, SD_ASYM_PACKING); rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd); sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote: > On 19/12/22 16:42, Ricardo Neri wrote: > > On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote: > >> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's > >> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern, > >> if SD_ASYM_PACKING no longer does then we need to think whether we're > >> trying to make it do funky things. > > > > My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT > > level because all SMT siblings are identical. There are cores of higher > > priority at the "MC" level (maybe in the future at the "CLS" level). > > > > Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level. > > > > So with what I groked from your series, I agree with you, x86 shouldn't > need it at SMT level. > > What about the below? > > --- > > diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h > index 57bde66d95f7a..8dc16942135b4 100644 > --- a/include/linux/sched/sd_flags.h > +++ b/include/linux/sched/sd_flags.h > @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) > /* > * Place busy tasks earlier in the domain > * > - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further > - * up, but currently assumed to be set from the base domain > - * upwards (see update_top_cache_domain()). > + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all > + * siblings of an SMT core are identical, but SMT cores themselves > + * have different priorites. > * NEEDS_GROUPS: Load balancing flag. > */ > -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) > +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT sched domain. Must it have either of these flags? In Power7 SMT siblings have the different priority but, IIUC, physical cores are identical. It seems to me that asym_packing is specific to a domain. Thanks and BR, Ricardo
On 29/12/22 11:02, Ricardo Neri wrote: > On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote: >> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h >> index 57bde66d95f7a..8dc16942135b4 100644 >> --- a/include/linux/sched/sd_flags.h >> +++ b/include/linux/sched/sd_flags.h >> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) >> /* >> * Place busy tasks earlier in the domain >> * >> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further >> - * up, but currently assumed to be set from the base domain >> - * upwards (see update_top_cache_domain()). >> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all >> + * siblings of an SMT core are identical, but SMT cores themselves >> + * have different priorites. >> * NEEDS_GROUPS: Load balancing flag. >> */ >> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) >> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) > > But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT > sched domain. Must it have either of these flags? > It's not mandatory, but making sure SD flags conform to either of them means the topology debugging infra can help spot misshapen topologies... > In Power7 SMT siblings have the different priority but, IIUC, physical > cores are identical. > ...But you're right, this doesn't work with Power7 as it would need SD_ASYM_PACKING all the way up the topology to conform with SDF_SHARED_PARENT, which clearly doesn't work with how Power7 uses asym_packing. > It seems to me that asym_packing is specific to a domain. > For Power7 it is, since the asymmetry is only between siblings of a given core. For other systems where the asymmetry is between cores, that could theoretically affect several levels. Consider: DIE [ ] MC [ ][ ] SMT [ ][ ][ ][ ] CPU 0 1 2 3 4 5 6 7 prio 3 3 2 2 1 1 0 0 As done in your patch, here asym_packing doesn't make sense for SMT, but it does for MC and DIE. Anywho, I think what this means if we should drop the SDF_SHARED_* metaflag for SD_ASYM_PACKING, unless we can think of a nice way to programmatically describe how SD_ASYM_PACKING should be set. > Thanks and BR, > Ricardo
On Tue, Jan 10, 2023 at 07:17:51PM +0000, Valentin Schneider wrote: > On 29/12/22 11:02, Ricardo Neri wrote: > > On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote: > >> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h > >> index 57bde66d95f7a..8dc16942135b4 100644 > >> --- a/include/linux/sched/sd_flags.h > >> +++ b/include/linux/sched/sd_flags.h > >> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) > >> /* > >> * Place busy tasks earlier in the domain > >> * > >> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further > >> - * up, but currently assumed to be set from the base domain > >> - * upwards (see update_top_cache_domain()). > >> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all > >> + * siblings of an SMT core are identical, but SMT cores themselves > >> + * have different priorites. > >> * NEEDS_GROUPS: Load balancing flag. > >> */ > >> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) > >> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS) > > > > But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT > > sched domain. Must it have either of these flags? > > > > It's not mandatory, but making sure SD flags conform to either of them > means the topology debugging infra can help spot misshapen topologies... > > > In Power7 SMT siblings have the different priority but, IIUC, physical > > cores are identical. > > > > > ...But you're right, this doesn't work with Power7 as it would need > SD_ASYM_PACKING all the way up the topology to conform with > SDF_SHARED_PARENT, which clearly doesn't work with how Power7 uses > asym_packing. > > > It seems to me that asym_packing is specific to a domain. > > > > For Power7 it is, since the asymmetry is only between siblings of a given > core. For other systems where the asymmetry is between cores, that could > theoretically affect several levels. Consider: > > DIE [ ] > MC [ ][ ] > SMT [ ][ ][ ][ ] > CPU 0 1 2 3 4 5 6 7 > prio 3 3 2 2 1 1 0 0 > > As done in your patch, here asym_packing doesn't make sense for SMT, but it > does for MC and DIE. > > Anywho, I think what this means if we should drop the SDF_SHARED_* metaflag > for SD_ASYM_PACKING, unless we can think of a nice way to programmatically > describe how SD_ASYM_PACKING should be set. Perhaps it can be done by adding an extra check for sg::asym_prefer_cpu. In the example you give, DIE does not need SD_ASYM_PACKING if asym_prefer_cpu all the MC sched groups of have the same priority. This would satisfy both Power7 and x86. This assumes that priorities are available when checking the sanity of the topology. Thanks and BR, Ricardo
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 3f3ea0287f69..c3de98224cb4 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -545,7 +545,7 @@ static int x86_core_flags(void) #ifdef CONFIG_SCHED_SMT static int x86_smt_flags(void) { - return cpu_smt_flags() | x86_sched_itmt_flags(); + return cpu_smt_flags(); } #endif #ifdef CONFIG_SCHED_CLUSTER