[v2,4/7] sched/fair: Introduce sched_smt_siblings_idle()

Message ID 20221122203532.15013-5-ricardo.neri-calderon@linux.intel.com
State New
Headers
Series x86/sched: Avoid unnecessary migrations within SMT domains |

Commit Message

Ricardo Neri Nov. 22, 2022, 8:35 p.m. UTC
  Architectures that implement arch_asym_cpu_priority() may need to know the
idle state of the SMT siblings of a CPU. The scheduler has this information
and functionality. Expose it.

Move the existing functionality outside of the NUMA code.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch.
---
 include/linux/sched.h |  2 ++
 kernel/sched/fair.c   | 39 ++++++++++++++++++++++-----------------
 2 files changed, 24 insertions(+), 17 deletions(-)
  

Comments

Dietmar Eggemann Dec. 6, 2022, 6:03 p.m. UTC | #1
On 22/11/2022 21:35, Ricardo Neri wrote:
> Architectures that implement arch_asym_cpu_priority() may need to know the
> idle state of the SMT siblings of a CPU. The scheduler has this information
> and functionality. Expose it.
> 
> Move the existing functionality outside of the NUMA code.

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0e4251f83807..9517c48df50e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
>   * Scheduling class queueing methods:
>   */
>  
> +static inline bool is_core_idle(int cpu)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	int sibling;
> +
> +	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> +		if (cpu == sibling)
> +			continue;
> +
> +		if (!idle_cpu(sibling))
> +			return false;
> +	}
> +#endif
> +
> +	return true;
> +}
> +
> +bool sched_smt_siblings_idle(int cpu)
> +{
> +	return is_core_idle(cpu);
> +}

Nitpick: Can we not just have one exported function for both use-cases:
NUMA and x86 ITMT?

[...]
  
Ricardo Neri Dec. 12, 2022, 5:54 p.m. UTC | #2
On Tue, Dec 06, 2022 at 07:03:37PM +0100, Dietmar Eggemann wrote:
> On 22/11/2022 21:35, Ricardo Neri wrote:
> > Architectures that implement arch_asym_cpu_priority() may need to know the
> > idle state of the SMT siblings of a CPU. The scheduler has this information
> > and functionality. Expose it.
> > 
> > Move the existing functionality outside of the NUMA code.
> 
> [...]
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0e4251f83807..9517c48df50e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >   * Scheduling class queueing methods:
> >   */
> >  
> > +static inline bool is_core_idle(int cpu)
> > +{
> > +#ifdef CONFIG_SCHED_SMT
> > +	int sibling;
> > +
> > +	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> > +		if (cpu == sibling)
> > +			continue;
> > +
> > +		if (!idle_cpu(sibling))
> > +			return false;
> > +	}
> > +#endif
> > +
> > +	return true;
> > +}
> > +
> > +bool sched_smt_siblings_idle(int cpu)
> > +{
> > +	return is_core_idle(cpu);
> > +}
> 
> Nitpick: Can we not just have one exported function for both use-cases:
> NUMA and x86 ITMT?

By adding a new function I intend to preserve the inlinig of is_core_idle()
in update_numa_stats() (via numa_idle_core(), which is also inline). Do you
think there is no value?

A downside of having the new function is that now the code is duplicated
in update_numa_stats() and sched_smt_siblings_idle().

I can take your suggestion if losing the inline is OK.

Thanks and BR,
Ricardo
  
Dietmar Eggemann Dec. 22, 2022, 11:12 a.m. UTC | #3
On 12/12/2022 18:54, Ricardo Neri wrote:
> On Tue, Dec 06, 2022 at 07:03:37PM +0100, Dietmar Eggemann wrote:
>> On 22/11/2022 21:35, Ricardo Neri wrote:

[...]

>>> +bool sched_smt_siblings_idle(int cpu)
>>> +{
>>> +	return is_core_idle(cpu);
>>> +}
>>
>> Nitpick: Can we not just have one exported function for both use-cases:
>> NUMA and x86 ITMT?
> 
> By adding a new function I intend to preserve the inlinig of is_core_idle()
> in update_numa_stats() (via numa_idle_core(), which is also inline). Do you
> think there is no value?

OK. It's only used in NUMA balancing (task_numa_fault() -> ... ->
update_numa_stats()). I can't see that this will have a noticeable perf
impact but only benchmark can really tell.

A `static inline bool sched_is_core_idle(int cpu)` via
include/linux/sched/topology.h might work? We have similar functions
(like sched_core_cookie_match()` but in the private scheduler header
file kernel/sched/sched.h though.

> A downside of having the new function is that now the code is duplicated
> in update_numa_stats() and sched_smt_siblings_idle().
> 
> I can take your suggestion if losing the inline is OK.

I doubt that it will have an impact but can't be sure.

[...]
  
Valentin Schneider Dec. 22, 2022, 4:56 p.m. UTC | #4
On 22/11/22 12:35, Ricardo Neri wrote:
> Architectures that implement arch_asym_cpu_priority() may need to know the
> idle state of the SMT siblings of a CPU. The scheduler has this information
> and functionality. Expose it.
>
> Move the existing functionality outside of the NUMA code.
>

test_idle_cores() does something similar without an iteration, did you
consider using that instead?

> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
>  * Introduced this patch.
> ---
>  include/linux/sched.h |  2 ++
>  kernel/sched/fair.c   | 39 ++++++++++++++++++++++-----------------
>  2 files changed, 24 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ffb6eb55cd13..0d01c64ac737 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2426,4 +2426,6 @@ static inline void sched_core_fork(struct task_struct *p) { }
>  
>  extern void sched_set_stop_task(int cpu, struct task_struct *stop);
>  
> +extern bool sched_smt_siblings_idle(int cpu);
> +
>  #endif
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0e4251f83807..9517c48df50e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
>   * Scheduling class queueing methods:
>   */
>  
> +static inline bool is_core_idle(int cpu)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	int sibling;
> +
> +	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> +		if (cpu == sibling)
> +			continue;
> +
> +		if (!idle_cpu(sibling))
> +			return false;
> +	}
> +#endif
> +
> +	return true;
> +}
> +
> +bool sched_smt_siblings_idle(int cpu)
> +{
> +	return is_core_idle(cpu);
> +}
> +
>  #ifdef CONFIG_NUMA
>  #define NUMA_IMBALANCE_MIN 2
>  
> @@ -1691,23 +1713,6 @@ struct numa_stats {
>  	int idle_cpu;
>  };
>  
> -static inline bool is_core_idle(int cpu)
> -{
> -#ifdef CONFIG_SCHED_SMT
> -	int sibling;
> -
> -	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> -		if (cpu == sibling)
> -			continue;
> -
> -		if (!idle_cpu(sibling))
> -			return false;
> -	}
> -#endif
> -
> -	return true;
> -}
> -
>  struct task_numa_env {
>  	struct task_struct *p;
>  
> -- 
> 2.25.1
  
Ricardo Neri Dec. 24, 2022, 5:28 a.m. UTC | #5
On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> On 22/11/22 12:35, Ricardo Neri wrote:
> > Architectures that implement arch_asym_cpu_priority() may need to know the
> > idle state of the SMT siblings of a CPU. The scheduler has this information
> > and functionality. Expose it.
> >
> > Move the existing functionality outside of the NUMA code.
> >
> 
> test_idle_cores() does something similar without an iteration, did you
> consider using that instead?

IIUC, test_idle_cores() returns true if there is at least one idle core in
the package. In my case, I need to know the idle state of only the SMT
siblings of a specific CPU. Am I missing something?

Thanks and BR,
Ricardo
  
Chen Yu Dec. 28, 2022, 3:29 p.m. UTC | #6
On 2022-12-23 at 21:28:50 -0800, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> > On 22/11/22 12:35, Ricardo Neri wrote:
> > > Architectures that implement arch_asym_cpu_priority() may need to know the
> > > idle state of the SMT siblings of a CPU. The scheduler has this information
> > > and functionality. Expose it.
> > >
> > > Move the existing functionality outside of the NUMA code.
> > >
> > 
> > test_idle_cores() does something similar without an iteration, did you
> > consider using that instead?
> 
> IIUC, test_idle_cores() returns true if there is at least one idle core in
> the package. In my case, I need to know the idle state of only the SMT
> siblings of a specific CPU. Am I missing something?
>
I guess a similar one is select_idle_core(), but it also consider the CPU with
SCHED_IDLE task as idle. Is CPU with SCHED_IDLE task a candidate in your
scenario?

thanks,
Chenyu
> Thanks and BR,
> Ricardo
  
Ricardo Neri Dec. 30, 2022, 12:17 a.m. UTC | #7
On Wed, Dec 28, 2022 at 11:29:52PM +0800, Chen Yu wrote:
> On 2022-12-23 at 21:28:50 -0800, Ricardo Neri wrote:
> > On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> > > On 22/11/22 12:35, Ricardo Neri wrote:
> > > > Architectures that implement arch_asym_cpu_priority() may need to know the
> > > > idle state of the SMT siblings of a CPU. The scheduler has this information
> > > > and functionality. Expose it.
> > > >
> > > > Move the existing functionality outside of the NUMA code.
> > > >
> > > 
> > > test_idle_cores() does something similar without an iteration, did you
> > > consider using that instead?
> > 
> > IIUC, test_idle_cores() returns true if there is at least one idle core in
> > the package. In my case, I need to know the idle state of only the SMT
> > siblings of a specific CPU. Am I missing something?
> >
> I guess a similar one is select_idle_core(), but it also consider the CPU with
> SCHED_IDLE task as idle. Is CPU with SCHED_IDLE task a candidate in your
> scenario?

However, we are not looking for an idle CPU. We want to know the idle state of
the siblings of a CPU. I see that select_idle_core() uses available_idle_cpu(),
which in turn uses idle_cpu(). is_core_idle() also uses it.

As per 943d355d7fee ("sched/core: Distinguish between idle_cpu() calls based on
desired effect, introduce available_idle_cpu()") the load balancer can just
call idle_cpu().

Thanks and BR,
Ricardo
  
Valentin Schneider Jan. 10, 2023, 7:21 p.m. UTC | #8
On 23/12/22 21:28, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
>> On 22/11/22 12:35, Ricardo Neri wrote:
>> > Architectures that implement arch_asym_cpu_priority() may need to know the
>> > idle state of the SMT siblings of a CPU. The scheduler has this information
>> > and functionality. Expose it.
>> >
>> > Move the existing functionality outside of the NUMA code.
>> >
>> 
>> test_idle_cores() does something similar without an iteration, did you
>> consider using that instead?
>
> IIUC, test_idle_cores() returns true if there is at least one idle core in
> the package. In my case, I need to know the idle state of only the SMT
> siblings of a specific CPU. Am I missing something?
>

No, you're right, clearly I needed that end of the year break - sorry for
the noise.

> Thanks and BR,
> Ricardo
  

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd13..0d01c64ac737 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2426,4 +2426,6 @@  static inline void sched_core_fork(struct task_struct *p) { }
 
 extern void sched_set_stop_task(int cpu, struct task_struct *stop);
 
+extern bool sched_smt_siblings_idle(int cpu);
+
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0e4251f83807..9517c48df50e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1052,6 +1052,28 @@  update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
  * Scheduling class queueing methods:
  */
 
+static inline bool is_core_idle(int cpu)
+{
+#ifdef CONFIG_SCHED_SMT
+	int sibling;
+
+	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
+		if (cpu == sibling)
+			continue;
+
+		if (!idle_cpu(sibling))
+			return false;
+	}
+#endif
+
+	return true;
+}
+
+bool sched_smt_siblings_idle(int cpu)
+{
+	return is_core_idle(cpu);
+}
+
 #ifdef CONFIG_NUMA
 #define NUMA_IMBALANCE_MIN 2
 
@@ -1691,23 +1713,6 @@  struct numa_stats {
 	int idle_cpu;
 };
 
-static inline bool is_core_idle(int cpu)
-{
-#ifdef CONFIG_SCHED_SMT
-	int sibling;
-
-	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
-		if (cpu == sibling)
-			continue;
-
-		if (!idle_cpu(sibling))
-			return false;
-	}
-#endif
-
-	return true;
-}
-
 struct task_numa_env {
 	struct task_struct *p;