[10/10] sched/timers: Explain why idle task schedules out on remote timer enqueue

Message ID 20230811170049.308866-11-frederic@kernel.org
State New
Series: timers/cpuidle: Fixes and cleanups

Commit Message

Frederic Weisbecker Aug. 11, 2023, 5 p.m. UTC
Trying to avoid the schedule() round trip didn't bring much value after
testing; add a comment explaining why the idle task schedules out on
remote timer enqueue.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/sched/core.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Rafael J. Wysocki Aug. 11, 2023, 5:43 p.m. UTC | #1
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Trying to avoid the schedule() round trip didn't bring much value after
> testing; add a comment explaining why the idle task schedules out on
> remote timer enqueue.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Rafael J. Wysocki <rafael@kernel.org>


Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..e53b892167ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1135,6 +1135,28 @@ static void wake_up_idle_cpu(int cpu)
 	if (cpu == smp_processor_id())
 		return;
 
+	/*
+	 * Set TIF_NEED_RESCHED and send an IPI if in the non-polling
+	 * part of the idle loop. This forces an exit from the idle loop
+	 * and a round trip to schedule(). Now this could be optimized
+	 * because a simple new idle loop iteration is enough to
+	 * re-evaluate the next tick. This would require some re-ordering
+	 * of the tick nohz functions so that they follow the clearing
+	 * of TIF_POLLING_NRFLAG:
+	 *
+	 * - On most archs, a simple fetch_or on ti::flags with a
+	 *   "0" value would be enough to know if an IPI needs to be sent.
+	 *
+	 * - x86 needs to perform a last need_resched() check between
+	 *   monitor and mwait, which doesn't take timers into account.
+	 *   There a dedicated TIF_TIMER flag would be required to
+	 *   fetch_or here and be checked along with TIF_NEED_RESCHED
+	 *   before mwait().
+	 *
+	 * However, remote timer enqueue is not such a frequent event
+	 * and testing of the above solutions didn't appear to bring
+	 * much benefit.
+	 */
 	if (set_nr_and_not_polling(rq->idle))
 		smp_send_reschedule(cpu);
 	else
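
Notes

For illustration, here is a minimal userspace sketch (C11 atomics, not
kernel code) of the polling handshake that set_nr_and_not_polling()
relies on above: the waker atomically sets TIF_NEED_RESCHED and only
sends the IPI if the idle task was not polling on its flags word. The
bit values and the atomic_ulong stand-in for ti::flags are illustrative,
not the kernel's definitions.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative bit values, not the kernel's definitions. */
	#define _TIF_NEED_RESCHED	(1UL << 0)
	#define _TIF_POLLING_NRFLAG	(1UL << 1)

	/* Stand-in for the idle task's thread_info::flags. */
	static atomic_ulong ti_flags = _TIF_POLLING_NRFLAG;

	/* Analogue of set_nr_and_not_polling(): set TIF_NEED_RESCHED and
	 * report whether an IPI is needed, i.e. whether the idle task was
	 * NOT polling on its flags word. */
	static bool set_nr_and_not_polling(void)
	{
		unsigned long old = atomic_fetch_or(&ti_flags, _TIF_NEED_RESCHED);
		return !(old & _TIF_POLLING_NRFLAG);
	}

	int main(void)
	{
		if (set_nr_and_not_polling())
			puts("idle CPU not polling: send reschedule IPI");
		else
			puts("idle CPU polls on flags: the store alone wakes it");
		return 0;
	}

The optimization the comment describes and rejects could then look like
the hypothetical sketch below: instead of forcing TIF_NEED_RESCHED and a
schedule() round trip, the waker would set only a dedicated timer flag
(TIF_TIMER here is the comment's proposal; it does not exist in the
kernel), letting the idle loop re-evaluate the next tick on its next
iteration. The fetch_or on ti::flags with a "0" value mentioned for
most archs is the degenerate case that merely reads the flags
atomically.

	/* Hypothetical: the TIF_TIMER flag proposed by the comment. */
	#define _TIF_TIMER		(1UL << 2)

	static bool set_timer_and_not_polling(void)
	{
		/* Same handshake, but without forcing a reschedule: the
		 * idle loop (and, on x86, the last check between monitor
		 * and mwait) would have to test _TIF_TIMER as well. */
		unsigned long old = atomic_fetch_or(&ti_flags, _TIF_TIMER);
		return !(old & _TIF_POLLING_NRFLAG);
	}

As the comment concludes, remote timer enqueue is rare enough that this
extra complexity didn't show measurable benefit in testing.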