[v3,04/10] sched/fair: Let low-priority cores help high-priority busy SMT cores

Message ID 20230207045838.11243-5-ricardo.neri-calderon@linux.intel.com
State New
Series sched/fair: Avoid unnecessary migrations within SMT domains

Commit Message

Ricardo Neri Feb. 7, 2023, 4:58 a.m. UTC
  Using asym_packing priorities within an SMT core is straightforward. Just
follow the priorities that hardware indicates.

When balancing load from an SMT core, also consider the idle state of its
siblings. Priorities do not reflect that an SMT core divides its throughput
among all its busy siblings. They only make sense when exactly one sibling
is busy.

Indicate that active balance is needed if the destination CPU has lower
priority than the source CPU but the latter has busy SMT siblings.

Make find_busiest_queue() not skip higher-priority SMT cores with more than
one busy sibling.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
---
 kernel/sched/fair.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)
  

Comments

Vincent Guittot Feb. 8, 2023, 7:56 a.m. UTC | #1
On Tue, 7 Feb 2023 at 05:50, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Using asym_packing priorities within an SMT core is straightforward. Just
> follow the priorities that hardware indicates.
>
> When balancing load from an SMT core, also consider the idle of its
> siblings. Priorities do not reflect that an SMT core divides its throughput
> among all its busy siblings. They only makes sense when exactly one sibling
> is busy.
>
> Indicate that active balance is needed if the destination CPU has lower
> priority than the source CPU but the latter has busy SMT siblings.
>
> Make find_busiest_queue() not skip higher-priority SMT cores with more than
> busy sibling.
>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Suggested-by: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v2:
>  * Introduced this patch.
>
> Changes since v1:
>  * N/A
> ---
>  kernel/sched/fair.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 80c86462c6f6..c9d0ddfd11f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10436,11 +10436,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>                     nr_running == 1)
>                         continue;
>
> -               /* Make sure we only pull tasks from a CPU of lower priority */
> +               /*
> +                * Make sure we only pull tasks from a CPU of lower priority
> +                * when balancing between SMT siblings.
> +                *
> +                * If balancing between cores, let lower priority CPUs help
> +                * SMT cores with more than one busy sibling.
> +                */
>                 if ((env->sd->flags & SD_ASYM_PACKING) &&
>                     sched_asym_prefer(i, env->dst_cpu) &&
> -                   nr_running == 1)
> -                       continue;
> +                   nr_running == 1) {
> +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))

This 2nd if could be merged with the upper one
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10518,11 +10518,10 @@ static struct rq *find_busiest_queue(struct
lb_env *env,
                 */
                if ((env->sd->flags & SD_ASYM_PACKING) &&
                    sched_asym_prefer(i, env->dst_cpu) &&
-                   nr_running == 1) {
-                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
-                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
&& is_core_idle(i)))
+                   (nr_running == 1) &&
+                   (env->sd->flags & SD_SHARE_CPUCAPACITY ||
+                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
&& is_core_idle(i))))
                                continue;
-               }

                switch (env->migration_type) {
                case migrate_load:
---

AFAICT, you can even remove one env->sd->flags & SD_SHARE_CPUCAPACITY
test with the below, but this makes the condition far less obvious

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a6021af9de11..7dfa30c45327 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10518,11 +10518,10 @@ static struct rq *find_busiest_queue(struct
lb_env *env,
                 */
                if ((env->sd->flags & SD_ASYM_PACKING) &&
                    sched_asym_prefer(i, env->dst_cpu) &&
-                   nr_running == 1) {
-                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
-                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
&& is_core_idle(i)))
+                   (nr_running == 1) &&
+                   !(!(env->sd->flags & SD_SHARE_CPUCAPACITY) &&
+                       !is_core_idle(i)))
                                continue;
-               }

> +                               continue;
> +               }
>
>                 switch (env->migration_type) {
>                 case migrate_load:
> @@ -10530,8 +10539,20 @@ asym_active_balance(struct lb_env *env)
>          * lower priority CPUs in order to pack all tasks in the
>          * highest priority CPUs.
>          */
> -       return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> -              sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +       if (env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING)) {
> +               /* Always obey priorities between SMT siblings. */
> +               if (env->sd->flags & SD_SHARE_CPUCAPACITY)
> +                       return sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +
> +               /*
> +                * A lower priority CPU can help an SMT core with more than one
> +                * busy sibling.
> +                */
> +               return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
> +                      !is_core_idle(env->src_cpu);
> +       }
> +
> +       return false;
>  }
>
>  static inline bool
> --
> 2.25.1
>
  
Peter Zijlstra Feb. 9, 2023, 11:53 a.m. UTC | #2
On Wed, Feb 08, 2023 at 08:56:32AM +0100, Vincent Guittot wrote:

> > +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> 
> This 2nd if could be merged with the upper one

Wasn't this exact same condition used in the previous patch as well?
Does it want to be a helper perhaps?
  
Ricardo Neri Feb. 10, 2023, 12:43 a.m. UTC | #3
On Thu, Feb 09, 2023 at 12:53:41PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 08, 2023 at 08:56:32AM +0100, Vincent Guittot wrote:
> 
> > > +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > > +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> > 
> > This 2nd if could be merged with the upper one
> 
> Wasn't this exact same condition used in the previous patch as well?
> Does it wants to be a helper perhaps?

Patch 3 focuses on the destination CPU: make sure that it and its SMT
siblings are idle before attempting to do asym_packing balance.

This patch focuses on the busiest group: if the busiest group is an SMT
core with more than one busy sibling, help it even if it has higher
priority.
  
Ricardo Neri Feb. 10, 2023, 1:52 a.m. UTC | #4
On Wed, Feb 08, 2023 at 08:56:32AM +0100, Vincent Guittot wrote:
> On Tue, 7 Feb 2023 at 05:50, Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
> >
> > Using asym_packing priorities within an SMT core is straightforward. Just
> > follow the priorities that hardware indicates.
> >
> > When balancing load from an SMT core, also consider the idle of its
> > siblings. Priorities do not reflect that an SMT core divides its throughput
> > among all its busy siblings. They only makes sense when exactly one sibling
> > is busy.
> >
> > Indicate that active balance is needed if the destination CPU has lower
> > priority than the source CPU but the latter has busy SMT siblings.
> >
> > Make find_busiest_queue() not skip higher-priority SMT cores with more than
> > busy sibling.
> >
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim C. Chen <tim.c.chen@intel.com>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Suggested-by: Valentin Schneider <vschneid@redhat.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v2:
> >  * Introduced this patch.
> >
> > Changes since v1:
> >  * N/A
> > ---
> >  kernel/sched/fair.c | 31 ++++++++++++++++++++++++++-----
> >  1 file changed, 26 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 80c86462c6f6..c9d0ddfd11f2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10436,11 +10436,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >                     nr_running == 1)
> >                         continue;
> >
> > -               /* Make sure we only pull tasks from a CPU of lower priority */
> > +               /*
> > +                * Make sure we only pull tasks from a CPU of lower priority
> > +                * when balancing between SMT siblings.
> > +                *
> > +                * If balancing between cores, let lower priority CPUs help
> > +                * SMT cores with more than one busy sibling.
> > +                */
> >                 if ((env->sd->flags & SD_ASYM_PACKING) &&
> >                     sched_asym_prefer(i, env->dst_cpu) &&
> > -                   nr_running == 1)
> > -                       continue;
> > +                   nr_running == 1) {
> > +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> 
> This 2nd if could be merged with the upper one
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10518,11 +10518,10 @@ static struct rq *find_busiest_queue(struct
> lb_env *env,
>                  */
>                 if ((env->sd->flags & SD_ASYM_PACKING) &&
>                     sched_asym_prefer(i, env->dst_cpu) &&
> -                   nr_running == 1) {
> -                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> -                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
> && is_core_idle(i)))
> +                   (nr_running == 1) &&
> +                   (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
> && is_core_idle(i))))
>                                 continue;
> -               }
> 
>                 switch (env->migration_type) {
>                 case migrate_load:
> ---
> 
> AFAICT, you can even remove one env->sd->flags & SD_SHARE_CPUCAPACITY
> test with the below but this make the condition far less obvious
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a6021af9de11..7dfa30c45327 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10518,11 +10518,10 @@ static struct rq *find_busiest_queue(struct
> lb_env *env,
>                  */
>                 if ((env->sd->flags & SD_ASYM_PACKING) &&
>                     sched_asym_prefer(i, env->dst_cpu) &&
> -                   nr_running == 1) {
> -                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> -                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
> && is_core_idle(i)))
> +                   (nr_running == 1) &&
> +                   !(!(env->sd->flags & SD_SHARE_CPUCAPACITY) &&
> +                       !is_core_idle(i)))
>                                 continue;

I agree. This expression is equivalent to what I proposed. It is less
obvious but the comment above clarifies what is going on. I will take
your suggestion.

Thanks and BR,
Ricardo
  
Peter Zijlstra Feb. 10, 2023, 8:41 a.m. UTC | #5
On Thu, Feb 09, 2023 at 04:43:33PM -0800, Ricardo Neri wrote:
> On Thu, Feb 09, 2023 at 12:53:41PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 08, 2023 at 08:56:32AM +0100, Vincent Guittot wrote:
> > 
> > > > +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > > > +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> > > 
> > > This 2nd if could be merged with the upper one
> > 
> > Wasn't this exact same condition used in the previous patch as well?
> > Does it wants to be a helper perhaps?
> 
> Patch 3 focuses on the destination CPU: make sure that it and its SMT
> siblings are idle before attempting to do asym_packing balance.
> 
> This patch focuses on the busiest group: if the busiest group is an SMT
> core with more than one busy sibling, help it even if it has higher
> priority.

Yeah, so? If it's a recurring expression a helper never hurts.
  
Ricardo Neri Feb. 10, 2023, 1:05 p.m. UTC | #6
On Fri, Feb 10, 2023 at 09:41:14AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 09, 2023 at 04:43:33PM -0800, Ricardo Neri wrote:
> > On Thu, Feb 09, 2023 at 12:53:41PM +0100, Peter Zijlstra wrote:
> > > On Wed, Feb 08, 2023 at 08:56:32AM +0100, Vincent Guittot wrote:
> > > 
> > > > > +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > > > > +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> > > > 
> > > > This 2nd if could be merged with the upper one
> > > 
> > > Wasn't this exact same condition used in the previous patch as well?
> > > Does it wants to be a helper perhaps?
> > 
> > Patch 3 focuses on the destination CPU: make sure that it and its SMT
> > siblings are idle before attempting to do asym_packing balance.
> > 
> > This patch focuses on the busiest group: if the busiest group is an SMT
> > core with more than one busy sibling, help it even if it has higher
> > priority.
> 
> Yeah, so? If its a recurring expression a helper never hurts.

Ah! I get your point now. You meant a helper function. Thank you for the
clarification.

Sure! I can add this helper function.
  
Dietmar Eggemann Feb. 13, 2023, 1:40 p.m. UTC | #7
On 07/02/2023 05:58, Ricardo Neri wrote:

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 80c86462c6f6..c9d0ddfd11f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10436,11 +10436,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  		    nr_running == 1)
>  			continue;
>  
> -		/* Make sure we only pull tasks from a CPU of lower priority */
> +		/*
> +		 * Make sure we only pull tasks from a CPU of lower priority
> +		 * when balancing between SMT siblings.
> +		 *
> +		 * If balancing between cores, let lower priority CPUs help
> +		 * SMT cores with more than one busy sibling.
> +		 */
>  		if ((env->sd->flags & SD_ASYM_PACKING) &&
>  		    sched_asym_prefer(i, env->dst_cpu) &&
> -		    nr_running == 1)
> -			continue;
> +		    nr_running == 1) {
> +			if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> +			    (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> +				continue;

is_core_idle(i) returns true for !CONFIG_SCHED_SMT. So far it was always
guarded by `flags & SD_SHARE_CPUCAPACITY` which is only set for
CONFIG_SCHED_SMT.

Here it's different but still depends on `flags & SD_ASYM_PACKING`.

Can we have SD_ASYM_PACKING w/o CONFIG_SCHED_SMT? The comment just says
`If balancing between cores (MC), let lower priority CPUs help SMT cores
with more than one busy sibling.`

So this only mentions your specific asymmetric e-cores w/o SMT and
p-cores w/ SMT case.

I'm asking since numa_idle_core(), the only user of is_core_idle() so
far has an extra `!static_branch_likely(&sched_smt_present)` condition
before calling it.

> +		}
>  
>  		switch (env->migration_type) {
>  		case migrate_load:
> @@ -10530,8 +10539,20 @@ asym_active_balance(struct lb_env *env)
>  	 * lower priority CPUs in order to pack all tasks in the
>  	 * highest priority CPUs.
>  	 */
> -	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> -	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +	if (env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING)) {
> +		/* Always obey priorities between SMT siblings. */
> +		if (env->sd->flags & SD_SHARE_CPUCAPACITY)
> +			return sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +
> +		/*
> +		 * A lower priority CPU can help an SMT core with more than one
> +		 * busy sibling.
> +		 */
> +		return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
> +		       !is_core_idle(env->src_cpu);

Here it is similar.

> +	}
> +
> +	return false;
  
Ricardo Neri Feb. 13, 2023, 11:23 p.m. UTC | #8
On Mon, Feb 13, 2023 at 02:40:24PM +0100, Dietmar Eggemann wrote:
> On 07/02/2023 05:58, Ricardo Neri wrote:
> 
> [...]
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 80c86462c6f6..c9d0ddfd11f2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10436,11 +10436,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >  		    nr_running == 1)
> >  			continue;
> >  
> > -		/* Make sure we only pull tasks from a CPU of lower priority */
> > +		/*
> > +		 * Make sure we only pull tasks from a CPU of lower priority
> > +		 * when balancing between SMT siblings.
> > +		 *
> > +		 * If balancing between cores, let lower priority CPUs help
> > +		 * SMT cores with more than one busy sibling.
> > +		 */
> >  		if ((env->sd->flags & SD_ASYM_PACKING) &&
> >  		    sched_asym_prefer(i, env->dst_cpu) &&
> > -		    nr_running == 1)
> > -			continue;
> > +		    nr_running == 1) {
> > +			if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > +			    (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
> > +				continue;
> 
> is_core_idle(i) returns true for !CONFIG_SCHED_SMT. So far it was always
> guarded by `flags & SD_SHARE_CPUCAPACITY` which is only set for
> CONFIG_SCHED_SMT.
> 
> Here it's different but still depends on `flags & SD_ASYM_PACKING`.
> 
> Can we have SD_ASYM_PACKING w/o CONFIG_SCHED_SMT? The comment just says
> `If balancing between cores (MC), let lower priority CPUs help SMT cores
> with more than one busy sibling.`

We cannot have SD_ASYM_PACKING w/o CONFIG_SMP. We may have it without
CONFIG_SCHED_SMT. In the latter case we want is_core_idle() to return true
as there are no SMT siblings competing for core throughput and CPU priority
is meaningful. I can add an extra comment clarifying the !CONFIG_SCHED_SMT case.

> 
> So this only mentions your specific asymmetric e-cores w/o SMT and
> p-cores w/ SMT case.
> 
> I'm asking since numa_idle_core(), the only user of is_core_idle() so
> far has an extra `!static_branch_likely(&sched_smt_present)` condition
> before calling it.

That is a good point. Calling is_core_idle() is pointless if
!static_branch_likely(&sched_smt_present).

As per feedback from Vincent and Peter, I have put this logic in a helper
function. I'll add an extra check for this static key.

> 
> > +		}
> >  
> >  		switch (env->migration_type) {
> >  		case migrate_load:
> > @@ -10530,8 +10539,20 @@ asym_active_balance(struct lb_env *env)
> >  	 * lower priority CPUs in order to pack all tasks in the
> >  	 * highest priority CPUs.
> >  	 */
> > -	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> > -	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +	if (env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING)) {
> > +		/* Always obey priorities between SMT siblings. */
> > +		if (env->sd->flags & SD_SHARE_CPUCAPACITY)
> > +			return sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +
> > +		/*
> > +		 * A lower priority CPU can help an SMT core with more than one
> > +		 * busy sibling.
> > +		 */
> > +		return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
> > +		       !is_core_idle(env->src_cpu);
> 
> Here it is similar.

I will use my helper function here as well.

Thanks and BR,
Ricardo
  
Tim Chen March 10, 2023, 12:51 a.m. UTC | #9
On Mon, 2023-02-06 at 20:58 -0800, Ricardo Neri wrote:
> Using asym_packing priorities within an SMT core is straightforward.
> Just
> follow the priorities that hardware indicates.
> 
> When balancing load from an SMT core, also consider the idle of its
> siblings. Priorities do not reflect that an SMT core divides its
> throughput
> among all its busy siblings. They only makes sense when exactly one
> sibling
> is busy.
> 
> Indicate that active balance is needed if the destination CPU has
> lower
> priority than the source CPU but the latter has busy SMT siblings.
> 
> Make find_busiest_queue() not skip higher-priority SMT cores with
> more than
> busy sibling.
> 
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Suggested-by: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * N/A
> ---
>  kernel/sched/fair.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 80c86462c6f6..c9d0ddfd11f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10436,11 +10436,20 @@ static struct rq *find_busiest_queue(struct
> lb_env *env,
>                     nr_running == 1)
>                         continue;
>  
> -               /* Make sure we only pull tasks from a CPU of lower
> priority */
> +               /*
> +                * Make sure we only pull tasks from a CPU of lower
> priority
> +                * when balancing between SMT siblings.
> +                *
> +                * If balancing between cores, let lower priority
> CPUs help
> +                * SMT cores with more than one busy sibling.
> +                */
>                 if ((env->sd->flags & SD_ASYM_PACKING) &&
>                     sched_asym_prefer(i, env->dst_cpu) &&
> -                   nr_running == 1)
> -                       continue;
> +                   nr_running == 1) {
> +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
> && is_core_idle(i)))
> +                               continue;
> +               }
>  
>                 switch (env->migration_type) {
>                 case migrate_load:
> @@ -10530,8 +10539,20 @@ asym_active_balance(struct lb_env *env)
>          * lower priority CPUs in order to pack all tasks in the
>          * highest priority CPUs.
>          */
> -       return env->idle != CPU_NOT_IDLE && (env->sd->flags &
> SD_ASYM_PACKING) &&
> -              sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +       if (env->idle != CPU_NOT_IDLE && (env->sd->flags &
> SD_ASYM_PACKING)) {
> +               /* Always obey priorities between SMT siblings. */
> +               if (env->sd->flags & SD_SHARE_CPUCAPACITY)
> +                       return sched_asym_prefer(env->dst_cpu, env-
> >src_cpu);
> +
> +               /*
> +                * A lower priority CPU can help an SMT core with
> more than one
> +                * busy sibling.
> +                */
> +               return sched_asym_prefer(env->dst_cpu, env->src_cpu)
> ||
> +                      !is_core_idle(env->src_cpu);
> +       }

Suppose we have the Atom cores in a sched group (e.g. a cluster),
this will pull the tasks from those cores to an SMT thread even if
its sibling thread is busy.  Suggest this change:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da1afa99cd55..b671cb0d7ab3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10473,9 +10473,11 @@ asym_active_balance(struct lb_env *env)
                /*
                 * A lower priority CPU can help an SMT core with more than one
                 * busy sibling.
+                * Pull only if no SMT sibling busy.
                 */
-               return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
-                      !is_core_idle(env->src_cpu);
+               if (is_core_idle(env->dst_cpu))
+                       return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
+                              !is_core_idle(env->src_cpu);
        }


Tim

> +
> +       return false;
>  }
>  
>  static inline bool
  
Ricardo Neri March 14, 2023, 11:54 p.m. UTC | #10
On Thu, Mar 09, 2023 at 04:51:35PM -0800, Tim Chen wrote:
> On Mon, 2023-02-06 at 20:58 -0800, Ricardo Neri wrote:
> > Using asym_packing priorities within an SMT core is straightforward.
> > Just
> > follow the priorities that hardware indicates.
> > 
> > When balancing load from an SMT core, also consider the idle of its
> > siblings. Priorities do not reflect that an SMT core divides its
> > throughput
> > among all its busy siblings. They only makes sense when exactly one
> > sibling
> > is busy.
> > 
> > Indicate that active balance is needed if the destination CPU has
> > lower
> > priority than the source CPU but the latter has busy SMT siblings.
> > 
> > Make find_busiest_queue() not skip higher-priority SMT cores with
> > more than
> > busy sibling.
> > 
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim C. Chen <tim.c.chen@intel.com>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Suggested-by: Valentin Schneider <vschneid@redhat.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v2:
> >  * Introduced this patch.
> > 
> > Changes since v1:
> >  * N/A
> > ---
> >  kernel/sched/fair.c | 31 ++++++++++++++++++++++++++-----
> >  1 file changed, 26 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 80c86462c6f6..c9d0ddfd11f2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10436,11 +10436,20 @@ static struct rq *find_busiest_queue(struct
> > lb_env *env,
> >                     nr_running == 1)
> >                         continue;
> >  
> > -               /* Make sure we only pull tasks from a CPU of lower
> > priority */
> > +               /*
> > +                * Make sure we only pull tasks from a CPU of lower
> > priority
> > +                * when balancing between SMT siblings.
> > +                *
> > +                * If balancing between cores, let lower priority
> > CPUs help
> > +                * SMT cores with more than one busy sibling.
> > +                */
> >                 if ((env->sd->flags & SD_ASYM_PACKING) &&
> >                     sched_asym_prefer(i, env->dst_cpu) &&
> > -                   nr_running == 1)
> > -                       continue;
> > +                   nr_running == 1) {
> > +                       if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
> > +                           (!(env->sd->flags & SD_SHARE_CPUCAPACITY)
> > && is_core_idle(i)))
> > +                               continue;
> > +               }
> >  
> >                 switch (env->migration_type) {
> >                 case migrate_load:
> > @@ -10530,8 +10539,20 @@ asym_active_balance(struct lb_env *env)
> >          * lower priority CPUs in order to pack all tasks in the
> >          * highest priority CPUs.
> >          */
> > -       return env->idle != CPU_NOT_IDLE && (env->sd->flags &
> > SD_ASYM_PACKING) &&
> > -              sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +       if (env->idle != CPU_NOT_IDLE && (env->sd->flags &
> > SD_ASYM_PACKING)) {
> > +               /* Always obey priorities between SMT siblings. */
> > +               if (env->sd->flags & SD_SHARE_CPUCAPACITY)
> > +                       return sched_asym_prefer(env->dst_cpu, env-
> > >src_cpu);
> > +
> > +               /*
> > +                * A lower priority CPU can help an SMT core with
> > more than one
> > +                * busy sibling.
> > +                */
> > +               return sched_asym_prefer(env->dst_cpu, env->src_cpu)
> > ||
> > +                      !is_core_idle(env->src_cpu);
> > +       }
> 
> Suppose we have the Atom cores in a sched group (e.g. a cluster),
> this will pull the tasks from those core to a SMT thread even if
> its sibling thread is busy.  Suggest this change
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da1afa99cd55..b671cb0d7ab3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10473,9 +10473,11 @@ asym_active_balance(struct lb_env *env)
>                 /*
>                  * A lower priority CPU can help an SMT core with more than one
>                  * busy sibling.
> +                * Pull only if no SMT sibling busy.
>                  */
> -               return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
> -                      !is_core_idle(env->src_cpu);
> +               if (is_core_idle(env->dst_cpu))
> +                       return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
> +                              !is_core_idle(env->src_cpu);

Thank you Tim! Patch 3 does this check for asym_packing, but we could land
here from other types of idle load balancing.

I will integrate this change into the series.

Thanks and BR,
Ricardo
  

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 80c86462c6f6..c9d0ddfd11f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10436,11 +10436,20 @@  static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
-		/* Make sure we only pull tasks from a CPU of lower priority */
+		/*
+		 * Make sure we only pull tasks from a CPU of lower priority
+		 * when balancing between SMT siblings.
+		 *
+		 * If balancing between cores, let lower priority CPUs help
+		 * SMT cores with more than one busy sibling.
+		 */
 		if ((env->sd->flags & SD_ASYM_PACKING) &&
 		    sched_asym_prefer(i, env->dst_cpu) &&
-		    nr_running == 1)
-			continue;
+		    nr_running == 1) {
+			if (env->sd->flags & SD_SHARE_CPUCAPACITY ||
+			    (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i)))
+				continue;
+		}
 
 		switch (env->migration_type) {
 		case migrate_load:
@@ -10530,8 +10539,20 @@  asym_active_balance(struct lb_env *env)
 	 * lower priority CPUs in order to pack all tasks in the
 	 * highest priority CPUs.
 	 */
-	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
-	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
+	if (env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING)) {
+		/* Always obey priorities between SMT siblings. */
+		if (env->sd->flags & SD_SHARE_CPUCAPACITY)
+			return sched_asym_prefer(env->dst_cpu, env->src_cpu);
+
+		/*
+		 * A lower priority CPU can help an SMT core with more than one
+		 * busy sibling.
+		 */
+		return sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
+		       !is_core_idle(env->src_cpu);
+	}
+
+	return false;
 }
 
 static inline bool