[2/7] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions

Message ID: 20240301110951.3707367-3-mingo@kernel.org
State: New
Series: sched/balancing: Misc updates & cleanups

Commit Message

Ingo Molnar March 1, 2024, 11:09 a.m. UTC
  The cpu_idle_type enum has the confusingly inverted property
that 'not idle' is 1, and 'idle' is 0.

This resulted in a number of unnecessary complications in the code.

Reverse the order, remove the CPU_NOT_IDLE type, and convert
all code to a natural boolean form.

It's much more readable:

  -       enum cpu_idle_type idle = this_rq->idle_balance ?
  -                                               CPU_IDLE : CPU_NOT_IDLE;
  -
  +       enum cpu_idle_type idle = this_rq->idle_balance;

  --------------------------------

  -       if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
  +       if (!env->idle || !busiest->sum_nr_running)

  --------------------------------

And gets rid of the double negation in these usages:

  -               if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
  +               if (env->idle && env->src_rq->nr_running <= 1)

Furthermore, this makes it much more obvious where the code
differentiates between CPU_IDLE and CPU_NEWLY_IDLE.
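
For example, where the distinction still matters, the code compares
against the specific enumerator - roughly like this check in
should_we_balance() (paraphrased sketch, not a hunk from this patch):

	if (env->idle == CPU_NEWLY_IDLE) {
		/* Newly-idle balancing only: bail if dst_rq is no longer empty */
		if (env->dst_rq->nr_running > 0 || env->dst_rq->ttwu_pending)
			return 0;
		return 1;
	}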

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Valentin Schneider <vschneid@redhat.com>
---
 include/linux/sched/idle.h |  3 +--
 kernel/sched/fair.c        | 27 ++++++++++++---------------
 2 files changed, 13 insertions(+), 17 deletions(-)
  

Comments

Shrikanth Hegde March 2, 2024, 9:04 a.m. UTC | #1
On 3/1/24 4:39 PM, Ingo Molnar wrote:
> The cpu_idle_type enum has the confusingly inverted property
> that 'not idle' is 1, and 'idle' is 0.
> 
> This resulted in a number of unnecessary complications in the code.
> 
> Reverse the order, remove the CPU_NOT_IDLE type, and convert
> all code to a natural boolean form.
> 
> It's much more readable:
> 
>   -       enum cpu_idle_type idle = this_rq->idle_balance ?
>   -                                               CPU_IDLE : CPU_NOT_IDLE;
>   -
>   +       enum cpu_idle_type idle = this_rq->idle_balance;
> 
>   --------------------------------
> 
>   -       if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
>   +       if (!env->idle || !busiest->sum_nr_running)
> 
>   --------------------------------
> 
> And gets rid of the double negation in these usages:
> 
>   -               if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
>   +               if (env->idle && env->src_rq->nr_running <= 1)
> 
> Furthermore, this makes it much more obvious where the code
> differentiates between CPU_IDLE and CPU_NEWLY_IDLE.
> 
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Valentin Schneider <vschneid@redhat.com>
> ---
>  include/linux/sched/idle.h |  3 +--
>  kernel/sched/fair.c        | 27 ++++++++++++---------------
>  2 files changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
> index 478084f9105e..4a6423700ffc 100644
> --- a/include/linux/sched/idle.h
> +++ b/include/linux/sched/idle.h
> @@ -5,8 +5,7 @@
>  #include <linux/sched.h>
>  
>  enum cpu_idle_type {
> -	CPU_IDLE,
> -	CPU_NOT_IDLE,
> +	CPU_IDLE = 1,
>  	CPU_NEWLY_IDLE,
>  	CPU_MAX_IDLE_TYPES
>  };

[...]

>  	struct rq *this_rq = this_rq();
> -	enum cpu_idle_type idle = this_rq->idle_balance ?
> -						CPU_IDLE : CPU_NOT_IDLE;
> -
> +	enum cpu_idle_type idle = this_rq->idle_balance;
>  	/*
>  	 * If this CPU has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle CPUs whose ticks are

Hi Ingo.
This is more readable code indeed.

But the schedstat area also needs a fix. I applied the patch, and I see stats
displayed only for CPU_IDLE and CPU_NEWLY_IDLE, since stats.c starts printing
from CPU_IDLE (which is now 1, so the 'not idle' stats at index 0 are skipped).


I did the below test:
echo 1 > /proc/sys/kernel/sched_schedstats
sleep 300
stress-ng --cpu=$(nproc) -t 300
cat /proc/schedstat


6.8-rc5 -- each domain line has 36 fields.
cpu0 0 0 4400 1485 1624 1229 301472313236 120382198 7714                        
domain0 00000000,00000000,00000055 1661 1661 0 0 0 0 0 1661 2495 2495 0 0 0 0 0 2495 67 66 1 2 0 0 0 66 0 0 0 0 0 0 0 0 0 133 38 0
domain1 ff000000,00ff0000,ffffffff 382 369 13 13 4 0 2 207 198 195 3 36 0 0 0 195 67 64 3 3 0 0 0 64 4 0 4 0 0 0 0 0 0 124 9 0
domain2 ff00ffff,00ffffff,ffffffff 586 585 1 6 0 0 0 365 118 116 2 96 0 0 0 116 67 67 0 0 0 0 0 67 0 0 0 0 0 0 0 0 0 59 0 0
domain3 ffffffff,ffffffff,ffffffff 481 479 2 58 0 0 0 387 97 97 0 0 0 0 0 96 67 67 0 0 0 0 0 67 0 0 0 0 0 0 0 0 0 79 0 0

+Patch -- each domain line has 28 fields.
cpu0 0 0 2520 1244 1283 974 2273868054 212337506 6911                           
domain0 00000000,00000000,00000055 1975 1975 0 0 0 0 0 1975 45 45 0 0 0 0 0 45 0 0 0 0 0 0 0 0 0 65 3 0
domain1 ff000000,00ff0000,ffffffff 441 438 3 3 1 0 0 242 45 43 2 2 0 0 0 43 1 0 1 0 0 0 0 0 0 48 0 0
domain2 ff00ffff,00ffffff,ffffffff 655 654 1 2 0 0 0 468 45 45 0 0 0 0 0 45 0 0 0 0 0 0 0 0 0 152 0 0
domain3 ffffffff,ffffffff,ffffffff 521 521 0 0 0 0 0 472 44 44 0 0 0 0 0 44 0 0 0 0 0 0 0 0 0 44 0 0
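
That is exactly one idle type's worth of stats missing per domain line.
Assuming the documented v15 layout (8 load-balance fields per idle type,
plus 12 other per-domain fields):

  36 fields = 8 * 3 idle types + 12
  28 fields = 8 * 2 idle types + 12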


I think it's all getting accounted; it's just not being printed.
With the below change, I'm able to see all three types, albeit not in the
right order per the schedstat documentation (the 'not idle' stats, now at
index 0, come out first).

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..f36b54bdd9fa 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -150,7 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
                        seq_printf(seq, "domain%d %*pb", dcount++,
                                   cpumask_pr_args(sched_domain_span(sd)));
-                       for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
+                       for (itype = 0; itype < CPU_MAX_IDLE_TYPES;
                                        itype++) {
                                seq_printf(seq, " %u %u %u %u %u %u %u %u",
                                    sd->lb_count[itype],
  
Ingo Molnar March 4, 2024, 9:26 a.m. UTC | #2
* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:

> I think it's all getting accounted; it's just not being printed.
> With the below change, I'm able to see all three types, albeit not in
> the right order per the schedstat documentation.
> 
> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
> index 857f837f52cb..f36b54bdd9fa 100644
> --- a/kernel/sched/stats.c
> +++ b/kernel/sched/stats.c
> @@ -150,7 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
>  
>                         seq_printf(seq, "domain%d %*pb", dcount++,
>                                    cpumask_pr_args(sched_domain_span(sd)));
> -                       for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
> +                       for (itype = 0; itype < CPU_MAX_IDLE_TYPES;
>                                         itype++) {
>                                 seq_printf(seq, " %u %u %u %u %u %u %u %u",
>                                     sd->lb_count[itype],

Indeed, good catch - there was an implicit reliance on CPU_IDLE's position 
in the loop above that my patch didn't take into account.

Fixed.

Thanks,

	Ingo
  

Patch

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 478084f9105e..4a6423700ffc 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -5,8 +5,7 @@ 
 #include <linux/sched.h>
 
 enum cpu_idle_type {
-	CPU_IDLE,
-	CPU_NOT_IDLE,
+	CPU_IDLE = 1,
 	CPU_NEWLY_IDLE,
 	CPU_MAX_IDLE_TYPES
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 64ae3d8dc93b..f11fc6dd39b1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9070,7 +9070,7 @@  static int detach_tasks(struct lb_env *env)
 		 * We don't want to steal all, otherwise we may be treated likewise,
 		 * which could at worst lead to a livelock crash.
 		 */
-		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
+		if (env->idle && env->src_rq->nr_running <= 1)
 			break;
 
 		env->loop++;
@@ -9803,7 +9803,7 @@  static inline bool smt_vs_nonsmt_groups(struct sched_group *sg1,
 static inline bool smt_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 			       struct sched_group *group)
 {
-	if (env->idle == CPU_NOT_IDLE)
+	if (!env->idle)
 		return false;
 
 	/*
@@ -9827,7 +9827,7 @@  static inline long sibling_imbalance(struct lb_env *env,
 	int ncores_busiest, ncores_local;
 	long imbalance;
 
-	if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
+	if (!env->idle || !busiest->sum_nr_running)
 		return 0;
 
 	ncores_busiest = sds->busiest->cores;
@@ -9927,8 +9927,7 @@  static inline void update_sg_lb_stats(struct lb_env *env,
 				sgs->group_misfit_task_load = rq->misfit_task_load;
 				*sg_status |= SG_OVERLOAD;
 			}
-		} else if ((env->idle != CPU_NOT_IDLE) &&
-			   sched_reduced_capacity(rq, env->sd)) {
+		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
 			if (sgs->group_misfit_task_load < load)
 				sgs->group_misfit_task_load = load;
@@ -9940,7 +9939,7 @@  static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_weight = group->group_weight;
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (!local_group && env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	if (!local_group && env->idle && sgs->sum_h_nr_running &&
 	    sched_group_asym(env, sgs, group))
 		sgs->group_asym_packing = 1;
 
@@ -10698,7 +10697,7 @@  static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 * waiting task in this overloaded busiest group. Let's
 			 * try to pull it.
 			 */
-			if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) {
+			if (env->idle && env->imbalance == 0) {
 				env->migration_type = migrate_task;
 				env->imbalance = 1;
 			}
@@ -10913,7 +10912,7 @@  static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	if (busiest->group_type != group_overloaded) {
-		if (env->idle == CPU_NOT_IDLE) {
+		if (!env->idle) {
 			/*
 			 * If the busiest group is not overloaded (and as a
 			 * result the local one too) but this CPU is already
@@ -11121,7 +11120,7 @@  asym_active_balance(struct lb_env *env)
 	 * the lower priority @env::dst_cpu help it. Do not follow
 	 * CPU priority.
 	 */
-	return env->idle != CPU_NOT_IDLE && sched_use_asym_prio(env->sd, env->dst_cpu) &&
+	return env->idle && sched_use_asym_prio(env->sd, env->dst_cpu) &&
 	       (sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
 		!sched_use_asym_prio(env->sd, env->src_cpu));
 }
@@ -11159,7 +11158,7 @@  static int need_active_balance(struct lb_env *env)
 	 * because of other sched_class or IRQs if more capacity stays
 	 * available on dst_cpu.
 	 */
-	if ((env->idle != CPU_NOT_IDLE) &&
+	if (env->idle &&
 	    (env->src_rq->cfs.h_nr_running == 1)) {
 		if ((check_cpu_capacity(env->src_rq, sd)) &&
 		    (capacity_of(env->src_cpu)*sd->imbalance_pct < capacity_of(env->dst_cpu)*100))
@@ -11735,8 +11734,8 @@  static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 				 * env->dst_cpu, so we can't know our idle
 				 * state even if we migrated tasks. Update it.
 				 */
-				idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
-				busy = idle != CPU_IDLE && !sched_idle_cpu(cpu);
+				idle = idle_cpu(cpu);
+				busy = !idle && !sched_idle_cpu(cpu);
 			}
 			sd->last_balance = jiffies;
 			interval = get_sd_balance_interval(sd, busy);
@@ -12416,9 +12415,7 @@  static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {
 	struct rq *this_rq = this_rq();
-	enum cpu_idle_type idle = this_rq->idle_balance ?
-						CPU_IDLE : CPU_NOT_IDLE;
-
+	enum cpu_idle_type idle = this_rq->idle_balance;
 	/*
 	 * If this CPU has a pending nohz_balance_kick, then do the
 	 * balancing on behalf of the other idle CPUs whose ticks are