[2/3] sched/nohz: Update comments about NEWILB_KICK

Message ID 20231020014031.919742-2-joel@joelfernandes.org
State New
Series [1/3] sched/nohz: Update nohz.next_balance directly without IPIs (v2)

Commit Message

Joel Fernandes Oct. 20, 2023, 1:40 a.m. UTC
How ILB (idle load balancing) is triggered without IPIs is cryptic. Out of
mercy for future code readers, document it in code comments.

The comments are derived from a discussion with Vincent in a past
review.

Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
  

Comments

Ingo Molnar Oct. 20, 2023, 7:51 a.m. UTC | #1
* Joel Fernandes (Google) <joel@joelfernandes.org> wrote:

> How ILB is triggered without IPIs is cryptic. Out of mercy for future
> code readers, document it in code comments.
> 
> The comments are derived from a discussion with Vincent in a past
> review.
> 
> Cc: Suleiman Souhlal <suleiman@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/sched/fair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 965c30fbbe5c..8e276d12c3cb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11959,8 +11959,19 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  }
>  
>  /*
> - * Check if we need to run the ILB for updating blocked load before entering
> - * idle state.
> + * Check if we need to directly run the ILB for updating blocked load before
> + * entering idle state. Here we run ILB directly without issuing IPIs.
> + *
> + * Note that when this function is called, the tick may not yet be stopped on
> + * this CPU. nohz.idle_cpus_mask is updated only when the tick is stopped and
> + * cleared on the next busy tick. In other words, nohz.idle_cpus_mask updates
> + * don't align with CPUs entering/exiting idle, to avoid bottlenecks due to a
> + * high idle entry/exit rate (usec). So _nohz_idle_balance() may be called
> + * from this function on (this) CPU while it is not yet in the mask. That's
> + * OK because the goal of nohz_run_idle_balance() is to run ILB only for
> + * updating the blocked load of already idle CPUs without waking up one of
> + * those idle CPUs, and outside the preempt disable / irq off phase of the local
> + * CPU about to enter idle, because that update can take a long time.

Much appreciated! Feel free to update comments across the entire relevant
codebase: a lot of it has become cryptic and under-documented at best as
complexity increased ...

Thanks,

	Ingo
  

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 965c30fbbe5c..8e276d12c3cb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11959,8 +11959,19 @@  static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 }
 
 /*
- * Check if we need to run the ILB for updating blocked load before entering
- * idle state.
+ * Check if we need to directly run the ILB for updating blocked load before
+ * entering idle state. Here we run ILB directly without issuing IPIs.
+ *
+ * Note that when this function is called, the tick may not yet be stopped on
+ * this CPU. nohz.idle_cpus_mask is updated only when the tick is stopped and
+ * cleared on the next busy tick. In other words, nohz.idle_cpus_mask updates
+ * don't align with CPUs entering/exiting idle, to avoid bottlenecks due to a
+ * high idle entry/exit rate (usec). So _nohz_idle_balance() may be called
+ * from this function on (this) CPU while it is not yet in the mask. That's
+ * OK because the goal of nohz_run_idle_balance() is to run ILB only for
+ * updating the blocked load of already idle CPUs without waking up one of
+ * those idle CPUs, and outside the preempt disable / irq off phase of the local
+ * CPU about to enter idle, because that update can take a long time.
  */
 void nohz_run_idle_balance(int cpu)
 {