[5/6] sched/fair: Consider the idle state of the whole core for load balance

Message ID 849d3fee51d218de71ecc9d557667ff6b137ac2d.1683156492.git.tim.c.chen@linux.intel.com
State New
Headers
Series Enable Cluster Scheduling for x86 Hybrid CPUs |

Commit Message

Tim Chen May 4, 2023, 4:09 p.m. UTC
  From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

should_we_balance() traverses the group_balance_mask (AND'ed with lb_env::
cpus) starting from lower numbered CPUs looking for the first idle CPU.

In hybrid x86 systems, the siblings of SMT cores get CPU numbers, before
non-SMT cores:

	[0, 1] [2, 3] [4, 5] 5 6 7 8
         b  i   b  i   b  i  b i i i

In the figure above, CPUs in brackets are siblings of an SMT core. The
rest are non-SMT cores. 'b' indicates a busy CPU, 'i' indicates an
idle CPU.

We should let a CPU on a fully idle core get the first chance to idle
load balance as it has more CPU capacity than a CPU on an idle SMT
CPU with busy sibling.  So for the figure above, if we are running
should_we_balance() to CPU 1, we should return false to let CPU 6 on
idle core to have a chance first to idle load balance.

A partially busy (i.e., of type group_has_spare) local group with SMT 
cores will often have only one SMT sibling busy. If the destination CPU
is a non-SMT core, partially busy, lower-numbered, SMT cores should not
be considered when finding the first idle CPU. 

However, in should_we_balance(), when we encounter idle SMT first in partially
busy core, we prematurely break the search for the first idle CPU.

Higher-numbered, non-SMT cores is not given the chance to have
idle balance done on their behalf. Those CPUs will only be considered
for idle balancing by chance via CPU_NEWLY_IDLE.

Instead, consider the idle state of the whole SMT core.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
  

Comments

Peter Zijlstra May 5, 2023, 1:23 p.m. UTC | #1
On Thu, May 04, 2023 at 09:09:55AM -0700, Tim Chen wrote:

> @@ -10709,11 +10709,26 @@ static int should_we_balance(struct lb_env *env)
>  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
>  		if (!idle_cpu(cpu))
>  			continue;
> +		else {
> +			/*
> +			 * Don't balance to idle SMT in busy core right away when
> +			 * balancing cores, but remember the first idle SMT CPU for
> +			 * later consideration.  Find CPU on an idle core first.
> +			 */
> +			if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
> +				if (idle_smt == -1)
> +					idle_smt = cpu;
> +				continue;
> +			}
> +		}

Not only does that bust CodingStyle, it's also entirely daft. What
exactly is the purpose of that else statement?

>  
>  		/* Are we the first idle CPU? */
>  		return cpu == env->dst_cpu;
>  	}
  
Tim Chen May 5, 2023, 10:51 p.m. UTC | #2
On Fri, 2023-05-05 at 15:23 +0200, Peter Zijlstra wrote:
> On Thu, May 04, 2023 at 09:09:55AM -0700, Tim Chen wrote:
> 
> > @@ -10709,11 +10709,26 @@ static int should_we_balance(struct lb_env *env)
> >  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
> >  		if (!idle_cpu(cpu))
> >  			continue;
> > +		else {
> > +			/*
> > +			 * Don't balance to idle SMT in busy core right away when
> > +			 * balancing cores, but remember the first idle SMT CPU for
> > +			 * later consideration.  Find CPU on an idle core first.
> > +			 */
> > +			if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
> > +				if (idle_smt == -1)
> > +					idle_smt = cpu;
> > +				continue;
> > +			}
> > +		}
> 
> Not only does that bust CodingStyle, it's also entirely daft. What
> exactly is the purpose of that else statement?
> 
> 

Yeah, that's a dumb "else" statement.  Will remove that.

Tim
  

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 58ef7d529731..c77fadba063d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10683,7 +10683,7 @@  static int active_load_balance_cpu_stop(void *data);
 static int should_we_balance(struct lb_env *env)
 {
 	struct sched_group *sg = env->sd->groups;
-	int cpu;
+	int cpu, idle_smt = -1;
 
 	/*
 	 * Ensure the balancing environment is consistent; can happen
@@ -10709,11 +10709,26 @@  static int should_we_balance(struct lb_env *env)
 	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
 		if (!idle_cpu(cpu))
 			continue;
+		else {
+			/*
+			 * Don't balance to idle SMT in busy core right away when
+			 * balancing cores, but remember the first idle SMT CPU for
+			 * later consideration.  Find CPU on an idle core first.
+			 */
+			if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
+				if (idle_smt == -1)
+					idle_smt = cpu;
+				continue;
+			}
+		}
 
 		/* Are we the first idle CPU? */
 		return cpu == env->dst_cpu;
 	}
 
+	if (idle_smt == env->dst_cpu)
+		return true;
+
 	/* Are we the first CPU of this group ? */
 	return group_balance_cpu(sg) == env->dst_cpu;
 }