[v3,2/6] sched/topology: Record number of cores in sched group

Message ID 04641eeb0e95c21224352f5743ecb93dfac44654.1688770494.git.tim.c.chen@linux.intel.com
State New
Series Enable Cluster Scheduling for x86 Hybrid CPUs

Commit Message

Tim Chen July 7, 2023, 10:57 p.m. UTC
  From: Tim C Chen <tim.c.chen@linux.intel.com>

When balancing sibling domains that have different numbers of cores,
tasks in the respective sibling domains should be proportional to the
number of cores in each domain. In preparation for implementing such a
policy, record the number of tasks in a scheduling group.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)
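
For reference, the counting idiom at the heart of this patch can be read
in isolation. The standalone C program below is a sketch, not kernel
code: the two-threads-per-core sibling layout is a made-up assumption,
and plain bitmasks with __builtin_ctz() stand in for the kernel's
cpumask API. The logic mirrors the patch's loop: count one core for the
lowest remaining CPU, then strip that CPU's SMT siblings so no core is
counted twice.

#include <stdio.h>

/* Hypothetical topology: CPUs 2k and 2k+1 are SMT siblings of one core. */
static unsigned int smt_mask(int cpu)
{
	return 3u << (cpu & ~1);
}

/* Count the distinct cores covered by @span, mirroring the patch's loop. */
static int count_cores(unsigned int span)
{
	unsigned int mask = span;
	int cores = 0;

	while (mask) {
		int cpu = __builtin_ctz(mask);	/* lowest remaining CPU */

		cores++;
		mask &= ~smt_mask(cpu);		/* drop the whole core */
	}
	return cores;
}

int main(void)
{
	/* CPUs 0-5 form three 2-thread cores: prints 3 */
	printf("%d\n", count_cores(0x3f));
	return 0;
}

Once each group records its core count, a later patch in the series can
balance proportionally, e.g. splitting 6 tasks between a 4-core and a
2-core sibling group as 4:2 rather than by SMT-thread count.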
  

Comments

Valentin Schneider July 10, 2023, 8:33 p.m. UTC | #1
On 07/07/23 15:57, Tim Chen wrote:
> From: Tim C Chen <tim.c.chen@linux.intel.com>
>
> When balancing sibling domains that have different numbers of cores,
> tasks in the respective sibling domains should be proportional to the
> number of cores in each domain. In preparation for implementing such a
> policy, record the number of tasks in a scheduling group.
>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/sched.h    |  1 +
>  kernel/sched/topology.c | 10 +++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3d0eb36350d2..5f7f36e45b87 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1860,6 +1860,7 @@ struct sched_group {
>       atomic_t		ref;
>
>       unsigned int		group_weight;
> +	unsigned int		cores;
>       struct sched_group_capacity *sgc;
>       int			asym_prefer_cpu;	/* CPU of highest priority in group */
>       int			flags;
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6d5628fcebcf..6b099dbdfb39 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
>  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>  {
>       struct sched_group *sg = sd->groups;
> +	struct cpumask *mask = sched_domains_tmpmask2;
>
>       WARN_ON(!sg);
>
>       do {
> -		int cpu, max_cpu = -1;
> +		int cpu, cores = 0, max_cpu = -1;
>
>               sg->group_weight = cpumask_weight(sched_group_span(sg));
>
> +		cpumask_copy(mask, sched_group_span(sg));
> +		for_each_cpu(cpu, mask) {
> +			cores++;
> +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
> +		}


This rekindled my desire for an SMT core cpumask/iterator. I played around
with a global mask but that's a headache: what if we end up with a core
whose SMT threads are split across two exclusive cpusets?

I ended up necro'ing a patch from Peter [1], but didn't get anywhere nice
(the LLC shared storage caused me issues).

All that to say, I couldn't think of a nicer way :(

[1]: https://lore.kernel.org/all/20180530143106.082002139@infradead.org/#t
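
A sketch of what such an iterator could look like (hypothetical, not an
existing kernel API; it assumes CONFIG_SCHED_SMT so that cpu_smt_mask()
is available, and it has the caller pass scratch storage rather than
rely on a global mask):

/* Visit one CPU per SMT core in @mask, using @tmp as scratch space. */
#define for_each_smt_core(cpu, mask, tmp)				\
	for (cpumask_copy(tmp, mask), (cpu) = cpumask_first(tmp);	\
	     (cpu) < nr_cpu_ids;					\
	     cpumask_andnot(tmp, tmp, cpu_smt_mask(cpu)),		\
	     (cpu) = cpumask_first(tmp))

With something like this, the counting loop in the patch would reduce
to bumping a counter once per core visited.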
  
Tim Chen July 10, 2023, 10:13 p.m. UTC | #2
On Mon, 2023-07-10 at 21:33 +0100, Valentin Schneider wrote:
> On 07/07/23 15:57, Tim Chen wrote:
> > From: Tim C Chen <tim.c.chen@linux.intel.com>
> > 
> > When balancing sibling domains that have different numbers of cores,
> > tasks in the respective sibling domains should be proportional to the
> > number of cores in each domain. In preparation for implementing such a
> > policy, record the number of tasks in a scheduling group.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/sched.h    |  1 +
> >  kernel/sched/topology.c | 10 +++++++++-
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 3d0eb36350d2..5f7f36e45b87 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1860,6 +1860,7 @@ struct sched_group {
> >       atomic_t		ref;
> > 
> >       unsigned int		group_weight;
> > +	unsigned int		cores;
> >       struct sched_group_capacity *sgc;
> >       int			asym_prefer_cpu;	/* CPU of highest priority in group */
> >       int			flags;
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 6d5628fcebcf..6b099dbdfb39 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
> >  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> >  {
> >       struct sched_group *sg = sd->groups;
> > +	struct cpumask *mask = sched_domains_tmpmask2;
> > 
> >       WARN_ON(!sg);
> > 
> >       do {
> > -		int cpu, max_cpu = -1;
> > +		int cpu, cores = 0, max_cpu = -1;
> > 
> >               sg->group_weight = cpumask_weight(sched_group_span(sg));
> > 
> > +		cpumask_copy(mask, sched_group_span(sg));
> > +		for_each_cpu(cpu, mask) {
> > +			cores++;
> > +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
> > +		}
> 
> 
> This rekindled my desire for an SMT core cpumask/iterator. I played around
> with a global mask but that's a headache: what if we end up with a core
> whose SMT threads are split across two exclusive cpusets?

Peter and I pondered that for a while.  But it seems like partitioning
threads in a core between two different sched domains is not a very
reasonable thing to do. 

https://lore.kernel.org/all/20230612112945.GK4253@hirez.programming.kicks-ass.net/

Tim

> 
> I ended up necro'ing a patch from Peter [1], but didn't get anywhere nice
> (the LLC shared storage caused me issues).
> 
> All that to say, I couldn't think of a nicer way :(
> 
> [1]: https://lore.kernel.org/all/20180530143106.082002139@infradead.org/#t
>
  
Tim Chen July 10, 2023, 10:40 p.m. UTC | #3
On Fri, 2023-07-07 at 15:57 -0700, Tim Chen wrote:
> From: Tim C Chen <tim.c.chen@linux.intel.com>
> 
> When balancing sibling domains that have different numbers of cores,
> tasks in the respective sibling domains should be proportional to the
> number of cores in each domain. In preparation for implementing such a
> policy, record the number of tasks in a scheduling group.

Caught a typo.  Should be "the number of cores" instead of
"the number of tasks" in a scheduling group.

Peter, should I send you another patch with the corrected commit log?

Tim

> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/sched.h    |  1 +
>  kernel/sched/topology.c | 10 +++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3d0eb36350d2..5f7f36e45b87 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1860,6 +1860,7 @@ struct sched_group {
>  	atomic_t		ref;
>  
>  	unsigned int		group_weight;
> +	unsigned int		cores;
>  	struct sched_group_capacity *sgc;
>  	int			asym_prefer_cpu;	/* CPU of highest priority in group */
>  	int			flags;
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6d5628fcebcf..6b099dbdfb39 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
>  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>  {
>  	struct sched_group *sg = sd->groups;
> +	struct cpumask *mask = sched_domains_tmpmask2;
>  
>  	WARN_ON(!sg);
>  
>  	do {
> -		int cpu, max_cpu = -1;
> +		int cpu, cores = 0, max_cpu = -1;
>  
>  		sg->group_weight = cpumask_weight(sched_group_span(sg));
>  
> +		cpumask_copy(mask, sched_group_span(sg));
> +		for_each_cpu(cpu, mask) {
> +			cores++;
> +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
> +		}
> +		sg->cores = cores;
> +
>  		if (!(sd->flags & SD_ASYM_PACKING))
>  			goto next;
>
  
Peter Zijlstra July 11, 2023, 11:31 a.m. UTC | #4
On Mon, Jul 10, 2023 at 03:40:34PM -0700, Tim Chen wrote:
> On Fri, 2023-07-07 at 15:57 -0700, Tim Chen wrote:
> > From: Tim C Chen <tim.c.chen@linux.intel.com>
> > 
> > When balancing sibling domains that have different numbers of cores,
> > tasks in the respective sibling domains should be proportional to the
> > number of cores in each domain. In preparation for implementing such a
> > policy, record the number of tasks in a scheduling group.
> 
> Caught a typo.  Should be "the number of cores" instead of
> "the number of tasks" in a scheduling group.
> 
> Peter, should I send you another patch with the corrected commit log?

I'll fix it up; I already had to fix the patch due to the robot
finding a compile fail for SCHED_SMT=n builds.



> > @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
> >  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> >  {
> >  	struct sched_group *sg = sd->groups;
> > +	struct cpumask *mask = sched_domains_tmpmask2;
> >  
> >  	WARN_ON(!sg);
> >  
> >  	do {
> > -		int cpu, max_cpu = -1;
> > +		int cpu, cores = 0, max_cpu = -1;
> >  
> >  		sg->group_weight = cpumask_weight(sched_group_span(sg));
> >  
> > +		cpumask_copy(mask, sched_group_span(sg));
> > +		for_each_cpu(cpu, mask) {
> > +			cores++;
#ifdef CONFIG_SCHED_SMT
> > +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
#else
			__cpumask_clear_cpu(cpu, mask);
#endif

or something along those lines -- should be in queue.git/sched/core
already.

> > +		}
> > +		sg->cores = cores;
> > +
> >  		if (!(sd->flags & SD_ASYM_PACKING))
> >  			goto next;
> >  
>
  
Tim Chen July 11, 2023, 4:32 p.m. UTC | #5
On Tue, 2023-07-11 at 13:31 +0200, Peter Zijlstra wrote:
> On Mon, Jul 10, 2023 at 03:40:34PM -0700, Tim Chen wrote:
> > On Fri, 2023-07-07 at 15:57 -0700, Tim Chen wrote:
> > > From: Tim C Chen <tim.c.chen@linux.intel.com>
> > > 
> > > When balancing sibling domains that have different numbers of cores,
> > > tasks in the respective sibling domains should be proportional to the
> > > number of cores in each domain. In preparation for implementing such a
> > > policy, record the number of tasks in a scheduling group.
> > 
> > Caught a typo.  Should be "the number of cores" instead of
> > "the number of tasks" in a scheduling group.
> > 
> > Peter, should I send you another patch with the corrected commit log?
> 
> I'll fix it up; I already had to fix the patch due to the robot
> finding a compile fail for SCHED_SMT=n builds.
> 
> 
> 
> > > @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
> > >  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> > >  {
> > >  	struct sched_group *sg = sd->groups;
> > > +	struct cpumask *mask = sched_domains_tmpmask2;
> > >  
> > >  	WARN_ON(!sg);
> > >  
> > >  	do {
> > > -		int cpu, max_cpu = -1;
> > > +		int cpu, cores = 0, max_cpu = -1;
> > >  
> > >  		sg->group_weight = cpumask_weight(sched_group_span(sg));
> > >  
> > > +		cpumask_copy(mask, sched_group_span(sg));
> > > +		for_each_cpu(cpu, mask) {
> > > +			cores++;
> #ifdef CONFIG_SCHED_SMT
> > > +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
> #else
> 			__cpumask_clear_cpu(cpu, mask);

Thanks for fixing up the non-SCHED_SMT case.

I think "__cpumask_clear_cpu(cpu, mask);" can be removed.

Since we have already considered the CPU in the iterator, clearing it
is unnecessary.  So effectively

for_each_cpu(cpu, mask) {
	cores++;
}

should be good enough for the non-SCHED_SMT case (the loop then just
counts every CPU, which is the group weight).

Or replace the patch with the patch below so we don't
have an #ifdef in the middle of the code body.  Either way
is fine.

---

From 9f19714db69739a7985e46bc1f8334d70a69cf2e Mon Sep 17 00:00:00 2001
Message-Id: <9f19714db69739a7985e46bc1f8334d70a69cf2e.1689092923.git.tim.c.chen@linux.intel.com>
In-Reply-To: <cover.1689092923.git.tim.c.chen@linux.intel.com>
References: <cover.1689092923.git.tim.c.chen@linux.intel.com>
From: Tim C Chen <tim.c.chen@linux.intel.com>
Date: Wed, 17 May 2023 09:09:54 -0700
Subject: [Patch v3 2/6] sched/topology: Record number of cores in sched group
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Ricardo Neri <ricardo.neri@intel.com>, Ravi V. Shankar <ravi.v.shankar@intel.com>, Ben Segall
<bsegall@google.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Len Brown <len.brown@intel.com>, Mel Gorman <mgorman@suse.de>, Rafael J. Wysocki
<rafael.j.wysocki@intel.com>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, Steven Rostedt <rostedt@goodmis.org>, Tim Chen <tim.c.chen@linux.intel.com>, Valentin Schneider
<vschneid@redhat.com>, Ionela Voinescu <ionela.voinescu@arm.com>, x86@kernel.org, linux-kernel@vger.kernel.org, Shrikanth Hegde <sshegde@linux.vnet.ibm.com>, Srikar Dronamraju
<srikar@linux.vnet.ibm.com>, naveen.n.rao@linux.vnet.ibm.com, Yicong Yang <yangyicong@hisilicon.com>, Barry Song <v-songbaohua@oppo.com>, Chen Yu <yu.c.chen@intel.com>, Hillf Danton <hdanton@sina.com>

When balancing sibling domains that have different numbers of cores,
tasks in the respective sibling domains should be proportional to the
number of cores in each domain. In preparation for implementing such a
policy, record the number of cores in a scheduling group.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d0eb36350d2..5f7f36e45b87 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1860,6 +1860,7 @@ struct sched_group {
 	atomic_t		ref;
 
 	unsigned int		group_weight;
+	unsigned int		cores;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
 	int			flags;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6d5628fcebcf..4ecdaef3f8ab 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1262,6 +1262,26 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 	return 0;
 }
 
+#ifdef CONFIG_SCHED_SMT
+static inline int sched_group_cores(struct sched_group *sg)
+{
+	struct cpumask *mask = sched_domains_tmpmask2;
+	int cpu, cores = 0;
+
+	cpumask_copy(mask, sched_group_span(sg));
+	for_each_cpu(cpu, mask) {
+		cores++;
+		cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
+	}
+	return cores;
+}
+#else
+static inline int sched_group_cores(struct sched_group *sg)
+{
+	return sg->group_weight;
+}
+#endif
+
 /*
  * Initialize sched groups cpu_capacity.
  *
@@ -1282,6 +1302,7 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 		int cpu, max_cpu = -1;
 
 		sg->group_weight = cpumask_weight(sched_group_span(sg));
+		sg->cores = sched_group_cores(sg);
 
 		if (!(sd->flags & SD_ASYM_PACKING))
 			goto next;
  
Valentin Schneider July 12, 2023, 9:27 a.m. UTC | #6
On 10/07/23 15:13, Tim Chen wrote:
> On Mon, 2023-07-10 at 21:33 +0100, Valentin Schneider wrote:
>> On 07/07/23 15:57, Tim Chen wrote:
>> > From: Tim C Chen <tim.c.chen@linux.intel.com>
>> >
>> > When balancing sibling domains that have different numbers of cores,
>> > tasks in the respective sibling domains should be proportional to the
>> > number of cores in each domain. In preparation for implementing such a
>> > policy, record the number of tasks in a scheduling group.
>> >
>> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> > ---
>> >  kernel/sched/sched.h    |  1 +
>> >  kernel/sched/topology.c | 10 +++++++++-
>> >  2 files changed, 10 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> > index 3d0eb36350d2..5f7f36e45b87 100644
>> > --- a/kernel/sched/sched.h
>> > +++ b/kernel/sched/sched.h
>> > @@ -1860,6 +1860,7 @@ struct sched_group {
>> >       atomic_t		ref;
>> >
>> >       unsigned int		group_weight;
>> > +	unsigned int		cores;
>> >       struct sched_group_capacity *sgc;
>> >       int			asym_prefer_cpu;	/* CPU of highest priority in group */
>> >       int			flags;
>> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> > index 6d5628fcebcf..6b099dbdfb39 100644
>> > --- a/kernel/sched/topology.c
>> > +++ b/kernel/sched/topology.c
>> > @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
>> >  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>> >  {
>> >       struct sched_group *sg = sd->groups;
>> > +	struct cpumask *mask = sched_domains_tmpmask2;
>> >
>> >       WARN_ON(!sg);
>> >
>> >       do {
>> > -		int cpu, max_cpu = -1;
>> > +		int cpu, cores = 0, max_cpu = -1;
>> >
>> >               sg->group_weight = cpumask_weight(sched_group_span(sg));
>> >
>> > +		cpumask_copy(mask, sched_group_span(sg));
>> > +		for_each_cpu(cpu, mask) {
>> > +			cores++;
>> > +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
>> > +		}
>>
>>
>> This rekindled my desire for an SMT core cpumask/iterator. I played around
>> with a global mask but that's a headache: what if we end up with a core
>> whose SMT threads are split across two exclusive cpusets?
>
> Peter and I pondered that for a while.  But it seems like partitioning
> threads in a core between two different sched domains is not a very
> reasonable thing to do.
>
> https://lore.kernel.org/all/20230612112945.GK4253@hirez.programming.kicks-ass.net/
>

Thanks for the link. I'll poke at this a bit more, but regardless:

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
  

Patch

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d0eb36350d2..5f7f36e45b87 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1860,6 +1860,7 @@ struct sched_group {
 	atomic_t		ref;
 
 	unsigned int		group_weight;
+	unsigned int		cores;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
 	int			flags;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6d5628fcebcf..6b099dbdfb39 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 {
 	struct sched_group *sg = sd->groups;
+	struct cpumask *mask = sched_domains_tmpmask2;
 
 	WARN_ON(!sg);
 
 	do {
-		int cpu, max_cpu = -1;
+		int cpu, cores = 0, max_cpu = -1;
 
 		sg->group_weight = cpumask_weight(sched_group_span(sg));
 
+		cpumask_copy(mask, sched_group_span(sg));
+		for_each_cpu(cpu, mask) {
+			cores++;
+			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
+		}
+		sg->cores = cores;
+
 		if (!(sd->flags & SD_ASYM_PACKING))
 			goto next;