[v10,20/20] timers: Always queue timers on the local CPU

Message ID 20240115143743.27827-21-anna-maria@linutronix.de
State New
Headers
Series timers: Move from a push remote at enqueue to a pull at expiry model |

Commit Message

Anna-Maria Behnsen Jan. 15, 2024, 2:37 p.m. UTC
  The timer pull model is in place so we can remove the heuristics which try
to guess the best target CPU at enqueue/modification time.

All non pinned timers are queued on the local CPU in the separate storage
and eventually pulled at expiry time to a remote CPU.

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
---
v9:
 - Update to the changes of the preceding patches

v6:
 - Update TIMER_PINNED flag description.

v5:
 - Move WARN_ONCE() in add_timer_on() into a previous patch
 - Fold crystallball magic related hunks into this patch

v4: Update comment about TIMER_PINNED flag (heristic is removed)
---
 include/linux/timer.h | 14 ++++----------
 kernel/time/timer.c   | 34 +++++++++++++---------------------
 2 files changed, 17 insertions(+), 31 deletions(-)
  

Comments

Frederic Weisbecker Feb. 1, 2024, 5:36 p.m. UTC | #1
Le Mon, Jan 15, 2024 at 03:37:43PM +0100, Anna-Maria Behnsen a écrit :
> The timer pull model is in place so we can remove the heuristics which try
> to guess the best target CPU at enqueue/modification time.
> 
> All non pinned timers are queued on the local CPU in the separate storage
> and eventually pulled at expiry time to a remote CPU.
> 
> Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Just one detail below:

> @@ -590,10 +590,13 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
>  
>  	/*
>  	 * We might have to IPI the remote CPU if the base is idle and the
> -	 * timer is not deferrable. If the other CPU is on the way to idle
> -	 * then it can't set base->is_idle as we hold the base lock:
> +	 * timer is pinned. If it is a non pinned timer, it is only queued
> +	 * on the remote CPU, when timer was running during queueing. Then
> +	 * everything is handled by remote CPU anyway. If the other CPU is
> +	 * on the way to idle then it can't set base->is_idle as we hold
> +	 * the base lock:
>  	 */
> -	if (base->is_idle)
> +	if (base->is_idle && timer->flags & TIMER_PINNED)

Is the TIMER_PINNED test necessary? If base->is_idle, then the timer
is now guaranteed to be TIMER_PINNED, right?

Thanks.
  
Anna-Maria Behnsen Feb. 1, 2024, 8:58 p.m. UTC | #2
Frederic Weisbecker <frederic@kernel.org> writes:

> Le Mon, Jan 15, 2024 at 03:37:43PM +0100, Anna-Maria Behnsen a écrit :
>> The timer pull model is in place so we can remove the heuristics which try
>> to guess the best target CPU at enqueue/modification time.
>> 
>> All non pinned timers are queued on the local CPU in the separate storage
>> and eventually pulled at expiry time to a remote CPU.
>> 
>> Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
>> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> Just one detail below:
>
>> @@ -590,10 +590,13 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
>>  
>>  	/*
>>  	 * We might have to IPI the remote CPU if the base is idle and the
>> -	 * timer is not deferrable. If the other CPU is on the way to idle
>> -	 * then it can't set base->is_idle as we hold the base lock:
>> +	 * timer is pinned. If it is a non pinned timer, it is only queued
>> +	 * on the remote CPU, when timer was running during queueing. Then
>> +	 * everything is handled by remote CPU anyway. If the other CPU is
>> +	 * on the way to idle then it can't set base->is_idle as we hold
>> +	 * the base lock:
>>  	 */
>> -	if (base->is_idle)
>> +	if (base->is_idle && timer->flags & TIMER_PINNED)
>
> Is the TIMER_PINNED test necessary? If base->is_idle, then the timer
> is now guaranteed to be TIMER_PINNED, right?
>

Yes, you are right. Should I drop it? To clarify it, I could add a

  WARN_ON_ONCE(!timer->flags & TIMER_PINNED)

instead.

Thanks,

	Anna-Maria
  
Frederic Weisbecker Feb. 2, 2024, 11:57 a.m. UTC | #3
On Thu, Feb 01, 2024 at 09:58:38PM +0100, Anna-Maria Behnsen wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > Le Mon, Jan 15, 2024 at 03:37:43PM +0100, Anna-Maria Behnsen a écrit :
> >> The timer pull model is in place so we can remove the heuristics which try
> >> to guess the best target CPU at enqueue/modification time.
> >> 
> >> All non pinned timers are queued on the local CPU in the separate storage
> >> and eventually pulled at expiry time to a remote CPU.
> >> 
> >> Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
> >> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
> >
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Just one detail below:
> >
> >> @@ -590,10 +590,13 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> >>  
> >>  	/*
> >>  	 * We might have to IPI the remote CPU if the base is idle and the
> >> -	 * timer is not deferrable. If the other CPU is on the way to idle
> >> -	 * then it can't set base->is_idle as we hold the base lock:
> >> +	 * timer is pinned. If it is a non pinned timer, it is only queued
> >> +	 * on the remote CPU, when timer was running during queueing. Then
> >> +	 * everything is handled by remote CPU anyway. If the other CPU is
> >> +	 * on the way to idle then it can't set base->is_idle as we hold
> >> +	 * the base lock:
> >>  	 */
> >> -	if (base->is_idle)
> >> +	if (base->is_idle && timer->flags & TIMER_PINNED)
> >
> > Is the TIMER_PINNED test necessary? If base->is_idle, then the timer
> > is now guaranteed to be TIMER_PINNED, right?
> >
> 
> Yes, you are right. Should I drop it? To clarify it, I could add a
> 
>   WARN_ON_ONCE(!timer->flags & TIMER_PINNED)

Yep, that looks good!

Thanks.

> 
> instead.
> 
> Thanks,
> 
> 	Anna-Maria
>
  

Patch

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 404bb31a95c7..4dd59e4e5681 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -50,16 +50,10 @@  struct timer_list {
  * workqueue locking issues. It's not meant for executing random crap
  * with interrupts disabled. Abuse is monitored!
  *
- * @TIMER_PINNED: A pinned timer will not be affected by any timer
- * placement heuristics (like, NOHZ) and will always expire on the CPU
- * on which the timer was enqueued.
- *
- * Note: Because enqueuing of timers can migrate the timer from one
- * CPU to another, pinned timers are not guaranteed to stay on the
- * initialy selected CPU.  They move to the CPU on which the enqueue
- * function is invoked via mod_timer() or add_timer().  If the timer
- * should be placed on a particular CPU, then add_timer_on() has to be
- * used.
+ * @TIMER_PINNED: A pinned timer will always expire on the CPU on which the
+ * timer was enqueued. When a particular CPU is required, add_timer_on()
+ * has to be used. Enqueue via mod_timer() and add_timer() is always done
+ * on the local CPU.
  */
 #define TIMER_CPUMASK		0x0003FFFF
 #define TIMER_MIGRATING		0x00040000
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a52f68be8dfd..8d53718af21d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -590,10 +590,13 @@  trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 
 	/*
 	 * We might have to IPI the remote CPU if the base is idle and the
-	 * timer is not deferrable. If the other CPU is on the way to idle
-	 * then it can't set base->is_idle as we hold the base lock:
+	 * timer is pinned. If it is a non pinned timer, it is only queued
+	 * on the remote CPU, when timer was running during queueing. Then
+	 * everything is handled by remote CPU anyway. If the other CPU is
+	 * on the way to idle then it can't set base->is_idle as we hold
+	 * the base lock:
 	 */
-	if (base->is_idle)
+	if (base->is_idle && timer->flags & TIMER_PINNED)
 		wake_up_nohz_cpu(base->cpu);
 }
 
@@ -941,17 +944,6 @@  static inline struct timer_base *get_timer_base(u32 tflags)
 	return get_timer_cpu_base(tflags, tflags & TIMER_CPUMASK);
 }
 
-static inline struct timer_base *
-get_target_base(struct timer_base *base, unsigned tflags)
-{
-#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-	if (static_branch_likely(&timers_migration_enabled) &&
-	    !(tflags & TIMER_PINNED))
-		return get_timer_cpu_base(tflags, get_nohz_timer_target());
-#endif
-	return get_timer_this_cpu_base(tflags);
-}
-
 static inline void __forward_timer_base(struct timer_base *base,
 					unsigned long basej)
 {
@@ -1106,7 +1098,7 @@  __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	if (!ret && (options & MOD_TIMER_PENDING_ONLY))
 		goto out_unlock;
 
-	new_base = get_target_base(base, timer->flags);
+	new_base = get_timer_this_cpu_base(timer->flags);
 
 	if (base != new_base) {
 		/*
@@ -2237,7 +2229,7 @@  static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
 		 * granularity skew (by design).
 		 */
 		if (!base_local->is_idle && time_after(nextevt, basej + 1)) {
-			base_local->is_idle = base_global->is_idle = true;
+			base_local->is_idle = true;
 			trace_timer_base_idle(true, base_local->cpu);
 		}
 		*idle = base_local->is_idle;
@@ -2303,13 +2295,13 @@  u64 timer_base_try_to_set_idle(unsigned long basej, u64 basem, bool *idle)
 void timer_clear_idle(void)
 {
 	/*
-	 * We do this unlocked. The worst outcome is a remote enqueue sending
-	 * a pointless IPI, but taking the lock would just make the window for
-	 * sending the IPI a few instructions smaller for the cost of taking
-	 * the lock in the exit from idle path.
+	 * We do this unlocked. The worst outcome is a remote pinned timer
+	 * enqueue sending a pointless IPI, but taking the lock would just
+	 * make the window for sending the IPI a few instructions smaller
+	 * for the cost of taking the lock in the exit from idle
+	 * path. Required for BASE_LOCAL only.
 	 */
 	__this_cpu_write(timer_bases[BASE_LOCAL].is_idle, false);
-	__this_cpu_write(timer_bases[BASE_GLOBAL].is_idle, false);
 	trace_timer_base_idle(false, smp_processor_id());
 
 	/* Activate without holding the timer_base->lock */