[v2,5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain

Message ID 20221122203532.15013-6-ricardo.neri-calderon@linux.intel.com
State New
Headers
Series x86/sched: Avoid unnecessary migrations within SMT domains |

Commit Message

Ricardo Neri Nov. 22, 2022, 8:35 p.m. UTC
  There is no difference between any of the SMT siblings of a physical core.
asym_packing load balancing is not needed among siblings.

When balancing load among physical cores, the scheduler now considers the
state of the siblings when checking the priority of a CPU.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch.
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Ionela Voinescu Dec. 8, 2022, 4:03 p.m. UTC | #1
Hi Ricardo,

On Tuesday 22 Nov 2022 at 12:35:30 (-0800), Ricardo Neri wrote:
> There is no difference between any of the SMT siblings of a physical core.
> asym_packing load balancing is not needed among siblings.
> 
> When balancing load among physical cores, the scheduler now considers the
> state of the siblings when checking the priority of a CPU.
> 
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
>  * Introduced this patch.
> ---
>  arch/x86/kernel/smpboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 3f3ea0287f69..c3de98224cb4 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -545,7 +545,7 @@ static int x86_core_flags(void)
>  #ifdef CONFIG_SCHED_SMT
>  static int x86_smt_flags(void)
>  {
> -	return cpu_smt_flags() | x86_sched_itmt_flags();
> +	return cpu_smt_flags();

Based on:

kernel/sched/topology.c:
sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);

and described at:

include/linux/sched/sd_flags.h:
/*
 * Place busy tasks earlier in the domain
 *
 * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
 *               up, but currently assumed to be set from the base domain
 *               upwards (see update_top_cache_domain()).
 * NEEDS_GROUPS: Load balancing flag.
 */
SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

doesn't your change result in sd_asym_packing being NULL?

The SD_ASYM_PACKING flag requires all children of a domain to have it set
as well. So having SMT not setting the flag, while CLUSTER and MC having
set the flag would result in a broken topology, right?

Thanks,
Ionela.

>  }
>  #endif
>  #ifdef CONFIG_SCHED_CLUSTER
> -- 
> 2.25.1
> 
>
  
Ricardo Neri Dec. 14, 2022, 4:59 p.m. UTC | #2
On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote:
> Hi Ricardo,

Hi Ionela,

Thank you very much for your feedback!

> 
> On Tuesday 22 Nov 2022 at 12:35:30 (-0800), Ricardo Neri wrote:
> > There is no difference between any of the SMT siblings of a physical core.
> > asym_packing load balancing is not needed among siblings.
> > 
> > When balancing load among physical cores, the scheduler now considers the
> > state of the siblings when checking the priority of a CPU.
> > 
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim C. Chen <tim.c.chen@intel.com>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v1:
> >  * Introduced this patch.
> > ---
> >  arch/x86/kernel/smpboot.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 3f3ea0287f69..c3de98224cb4 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -545,7 +545,7 @@ static int x86_core_flags(void)
> >  #ifdef CONFIG_SCHED_SMT
> >  static int x86_smt_flags(void)
> >  {
> > -	return cpu_smt_flags() | x86_sched_itmt_flags();
> > +	return cpu_smt_flags();
> 
> Based on:
> 
> kernel/sched/topology.c:
> sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> 
> and described at:
> 
> include/linux/sched/sd_flags.h:
> /*
>  * Place busy tasks earlier in the domain
>  *
>  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>  *               up, but currently assumed to be set from the base domain
>  *               upwards (see update_top_cache_domain()).
>  * NEEDS_GROUPS: Load balancing flag.
>  */
> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> 
> doesn't your change result in sd_asym_packing being NULL?

Yes. This is a good catch. Thanks!

> 
> The SD_ASYM_PACKING flag requires all children of a domain to have it set
> as well. So having SMT not setting the flag, while CLUSTER and MC having
> set the flag would result in a broken topology, right?

I'd say that highest_flag_domain(..., flag) requires all children to have
`flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to
be located in upper domains.

Perhaps this can be fixed with a variant of highest_flag_domain() that do
not require all children to have the flag?

Thanks and BR,
Ricardo
  
Valentin Schneider Dec. 15, 2022, 4:48 p.m. UTC | #3
On 14/12/22 08:59, Ricardo Neri wrote:
> On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote:
>> Based on:
>>
>> kernel/sched/topology.c:
>> sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
>> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
>>
>> and described at:
>>
>> include/linux/sched/sd_flags.h:
>> /*
>>  * Place busy tasks earlier in the domain
>>  *
>>  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>>  *               up, but currently assumed to be set from the base domain
>>  *               upwards (see update_top_cache_domain()).
>>  * NEEDS_GROUPS: Load balancing flag.
>>  */
>> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
>>
>> doesn't your change result in sd_asym_packing being NULL?
>
> Yes. This is a good catch. Thanks!
>

Nice to see those being useful :-) FYI if you run your kernel with
CONFIG_SCHED_DEBUG=y and sched_debug on the cmdline, you should get a
warning at boot time from the topology debug code checking assertions
against those flags.

>>
>> The SD_ASYM_PACKING flag requires all children of a domain to have it set
>> as well. So having SMT not setting the flag, while CLUSTER and MC having
>> set the flag would result in a broken topology, right?
>
> I'd say that highest_flag_domain(..., flag) requires all children to have
> `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to
> be located in upper domains.
>
> Perhaps this can be fixed with a variant of highest_flag_domain() that do
> not require all children to have the flag?
>

So I gave that flag SDF_SHARED_CHILD because its cached SD pointer was set
up using highest_flag_domain(). Looking for the highest level where it is
set matches how it is used in nohz_balancer_kick(), so you might want a new
helper.

With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
if SD_ASYM_PACKING no longer does then we need to think whether we're
trying to make it do funky things. I need to look at the rest of your
series to get an idea, that unfortunately won't be today but it's now in my
todolist.

> Thanks and BR,
> Ricardo
  
Ricardo Neri Dec. 20, 2022, 12:42 a.m. UTC | #4
On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote:
> On 14/12/22 08:59, Ricardo Neri wrote:
> > On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote:
> >> Based on:
> >>
> >> kernel/sched/topology.c:
> >> sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> >> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> >>
> >> and described at:
> >>
> >> include/linux/sched/sd_flags.h:
> >> /*
> >>  * Place busy tasks earlier in the domain
> >>  *
> >>  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> >>  *               up, but currently assumed to be set from the base domain
> >>  *               upwards (see update_top_cache_domain()).
> >>  * NEEDS_GROUPS: Load balancing flag.
> >>  */
> >> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> >>
> >> doesn't your change result in sd_asym_packing being NULL?
> >
> > Yes. This is a good catch. Thanks!
> >
> 
> Nice to see those being useful :-) FYI if you run your kernel with
> CONFIG_SCHED_DEBUG=y and sched_debug on the cmdline, you should get a
> warning at boot time from the topology debug code checking assertions
> against those flags.

Thanks! I missed this warning. Indeed, it is there.
> 
> >>
> >> The SD_ASYM_PACKING flag requires all children of a domain to have it set
> >> as well. So having SMT not setting the flag, while CLUSTER and MC having
> >> set the flag would result in a broken topology, right?
> >
> > I'd say that highest_flag_domain(..., flag) requires all children to have
> > `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to
> > be located in upper domains.
> >
> > Perhaps this can be fixed with a variant of highest_flag_domain() that do
> > not require all children to have the flag?
> >
> 
> So I gave that flag SDF_SHARED_CHILD because its cached SD pointer was set
> up using highest_flag_domain(). Looking for the highest level where it is
> set matches how it is used in nohz_balancer_kick(), so you might want a new
> helper.

Right. I was thinking on a highest_flag_domain_weak() or a changing
highest_flag_domain(..., bool shared_child).

> 
> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
> if SD_ASYM_PACKING no longer does then we need to think whether we're
> trying to make it do funky things.

My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT
level because all SMT siblings are identical. There are cores of higher
priority at the "MC" level (maybe in the future at the "CLS" level).

Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level.

> I need to look at the rest of your
> series to get an idea, that unfortunately won't be today but it's now in my
> todolist.

Thank you!

BR,
Ricardo
  
Valentin Schneider Dec. 22, 2022, 4:56 p.m. UTC | #5
On 19/12/22 16:42, Ricardo Neri wrote:
> On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote:
>> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
>> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
>> if SD_ASYM_PACKING no longer does then we need to think whether we're
>> trying to make it do funky things.
>
> My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT
> level because all SMT siblings are identical. There are cores of higher
> priority at the "MC" level (maybe in the future at the "CLS" level).
>
> Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level.
>

So with what I groked from your series, I agree with you, x86 shouldn't
need it at SMT level.

What about the below?

---

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 57bde66d95f7a..8dc16942135b4 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 /*
  * Place busy tasks earlier in the domain
  *
- * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
- *               up, but currently assumed to be set from the base domain
- *               upwards (see update_top_cache_domain()).
+ * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
+ *                siblings of an SMT core are identical, but SMT cores themselves
+ *                have different priorites.
  * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
+SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * Prefer to place tasks in a sibling domain
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b1d338a740e56..2d532e29373b1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1772,6 +1772,19 @@ queue_balance_callback(struct rq *rq,
 	for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
 			__sd; __sd = __sd->parent)
 
+static inline struct sched_domain *__highest_flag_domain(struct sched_domain *sd, int flag)
+{
+	struct sched_domain *hsd = NULL;
+
+	for (; sd; sd = sd->parent) {
+		if (!(sd->flags & flag))
+			break;
+		hsd = sd;
+	}
+
+	return hsd;
+}
+
 /**
  * highest_flag_domain - Return highest sched_domain containing flag.
  * @cpu:	The CPU whose highest level of sched domain is to
@@ -1783,15 +1796,7 @@ queue_balance_callback(struct rq *rq,
  */
 static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
 {
-	struct sched_domain *sd, *hsd = NULL;
-
-	for_each_domain(cpu, sd) {
-		if (!(sd->flags & flag))
-			break;
-		hsd = sd;
-	}
-
-	return hsd;
+	return __highest_flag_domain(rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd), flag);
 }
 
 static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
@@ -1806,6 +1811,16 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
 	return sd;
 }
 
+static inline struct sched_domain *highest_parent_flag_domain(int cpu, int flag)
+{
+	struct sched_domain *sd;
+
+	SCHED_WARN_ON(!(sd_flag_debug[ilog2(flag)].meta_flags & SDF_SHARED_PARENT));
+
+	sd = lowest_flag_domain(cpu, flag);
+	return __highest_flag_domain(sd, flag);
+}
+
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8154ef590b9f8..4e0e5b27c331b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -692,7 +692,7 @@ static void update_top_cache_domain(int cpu)
 	sd = lowest_flag_domain(cpu, SD_NUMA);
 	rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
 
-	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
+	sd = highest_parent_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
 
 	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
  
Ricardo Neri Dec. 29, 2022, 7:02 p.m. UTC | #6
On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote:
> On 19/12/22 16:42, Ricardo Neri wrote:
> > On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote:
> >> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
> >> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
> >> if SD_ASYM_PACKING no longer does then we need to think whether we're
> >> trying to make it do funky things.
> >
> > My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT
> > level because all SMT siblings are identical. There are cores of higher
> > priority at the "MC" level (maybe in the future at the "CLS" level).
> >
> > Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level.
> >
> 
> So with what I groked from your series, I agree with you, x86 shouldn't
> need it at SMT level.
> 
> What about the below?
> 
> ---
> 
> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> index 57bde66d95f7a..8dc16942135b4 100644
> --- a/include/linux/sched/sd_flags.h
> +++ b/include/linux/sched/sd_flags.h
> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
>  /*
>   * Place busy tasks earlier in the domain
>   *
> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> - *               up, but currently assumed to be set from the base domain
> - *               upwards (see update_top_cache_domain()).
> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
> + *                siblings of an SMT core are identical, but SMT cores themselves
> + *                have different priorites.
>   * NEEDS_GROUPS: Load balancing flag.
>   */
> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT
sched domain. Must it have either of these flags?

In Power7 SMT siblings have the different priority but, IIUC, physical
cores are identical.

It seems to me that asym_packing is specific to a domain.

Thanks and BR,
Ricardo
  
Valentin Schneider Jan. 10, 2023, 7:17 p.m. UTC | #7
On 29/12/22 11:02, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote:
>> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
>> index 57bde66d95f7a..8dc16942135b4 100644
>> --- a/include/linux/sched/sd_flags.h
>> +++ b/include/linux/sched/sd_flags.h
>> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
>>  /*
>>   * Place busy tasks earlier in the domain
>>   *
>> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>> - *               up, but currently assumed to be set from the base domain
>> - *               upwards (see update_top_cache_domain()).
>> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
>> + *                siblings of an SMT core are identical, but SMT cores themselves
>> + *                have different priorites.
>>   * NEEDS_GROUPS: Load balancing flag.
>>   */
>> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
>> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
>
> But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT
> sched domain. Must it have either of these flags?
>

It's not mandatory, but making sure SD flags conform to either of them
means the topology debugging infra can help spot misshapen topologies...

> In Power7 SMT siblings have the different priority but, IIUC, physical
> cores are identical.
>


...But you're right, this doesn't work with Power7 as it would need
SD_ASYM_PACKING all the way up the topology to conform with
SDF_SHARED_PARENT, which clearly doesn't work with how Power7 uses
asym_packing.

> It seems to me that asym_packing is specific to a domain.
>

For Power7 it is, since the asymmetry is only between siblings of a given
core. For other systems where the asymmetry is between cores, that could
theoretically affect several levels. Consider:

  DIE [                      ]
  MC  [          ][          ]
  SMT [    ][    ][    ][    ]
  CPU  0  1  2  3  4  5  6  7
  prio 3  3  2  2  1  1  0  0

As done in your patch, here asym_packing doesn't make sense for SMT, but it
does for MC and DIE.

Anywho, I think what this means if we should drop the SDF_SHARED_* metaflag
for SD_ASYM_PACKING, unless we can think of a nice way to programmatically
describe how SD_ASYM_PACKING should be set.

> Thanks and BR,
> Ricardo
  
Ricardo Neri Jan. 13, 2023, 1:31 a.m. UTC | #8
On Tue, Jan 10, 2023 at 07:17:51PM +0000, Valentin Schneider wrote:
> On 29/12/22 11:02, Ricardo Neri wrote:
> > On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote:
> >> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> >> index 57bde66d95f7a..8dc16942135b4 100644
> >> --- a/include/linux/sched/sd_flags.h
> >> +++ b/include/linux/sched/sd_flags.h
> >> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> >>  /*
> >>   * Place busy tasks earlier in the domain
> >>   *
> >> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> >> - *               up, but currently assumed to be set from the base domain
> >> - *               upwards (see update_top_cache_domain()).
> >> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
> >> + *                siblings of an SMT core are identical, but SMT cores themselves
> >> + *                have different priorites.
> >>   * NEEDS_GROUPS: Load balancing flag.
> >>   */
> >> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> >> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> >
> > But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT
> > sched domain. Must it have either of these flags?
> >
> 
> It's not mandatory, but making sure SD flags conform to either of them
> means the topology debugging infra can help spot misshapen topologies...
> 
> > In Power7 SMT siblings have the different priority but, IIUC, physical
> > cores are identical.
> >
> 
> 
> ...But you're right, this doesn't work with Power7 as it would need
> SD_ASYM_PACKING all the way up the topology to conform with
> SDF_SHARED_PARENT, which clearly doesn't work with how Power7 uses
> asym_packing.
> 
> > It seems to me that asym_packing is specific to a domain.
> >
> 
> For Power7 it is, since the asymmetry is only between siblings of a given
> core. For other systems where the asymmetry is between cores, that could
> theoretically affect several levels. Consider:
> 
>   DIE [                      ]
>   MC  [          ][          ]
>   SMT [    ][    ][    ][    ]
>   CPU  0  1  2  3  4  5  6  7
>   prio 3  3  2  2  1  1  0  0
> 
> As done in your patch, here asym_packing doesn't make sense for SMT, but it
> does for MC and DIE.
> 
> Anywho, I think what this means if we should drop the SDF_SHARED_* metaflag
> for SD_ASYM_PACKING, unless we can think of a nice way to programmatically
> describe how SD_ASYM_PACKING should be set.

Perhaps it can be done by adding an extra check for sg::asym_prefer_cpu. In
the example you give, DIE does not need SD_ASYM_PACKING if asym_prefer_cpu 
all the MC sched groups of have the same priority. This would satisfy both
Power7 and x86.

This assumes that priorities are available when checking the sanity of the
topology.

Thanks and BR,
Ricardo
  

Patch

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3f3ea0287f69..c3de98224cb4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -545,7 +545,7 @@  static int x86_core_flags(void)
 #ifdef CONFIG_SCHED_SMT
 static int x86_smt_flags(void)
 {
-	return cpu_smt_flags() | x86_sched_itmt_flags();
+	return cpu_smt_flags();
 }
 #endif
 #ifdef CONFIG_SCHED_CLUSTER