[v4,05/16] add_timer_on(): Make sure callers have TIMER_PINNED flag

Message ID 20221104145737.71236-6-anna-maria@linutronix.de
State New
Series timer: Move from a push remote at enqueue to a pull at expiry model

Commit Message

Anna-Maria Behnsen Nov. 4, 2022, 2:57 p.m. UTC
  The implementation of the hierarchical timer pull model will change the
timer bases per CPU. Timers that have to expire on a specific CPU require
the TIMER_PINNED flag. Otherwise they will be queued on the dedicated CPU
but in the global timer base, and those timers could also expire on other
CPUs. Timers with the TIMER_DEFERRABLE flag end up in a separate base anyway
and are executed on the local CPU only.

Therefore add the missing TIMER_PINNED flag for those callers who use
add_timer_on() without the flag. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: John Stultz <jstultz@google.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
---
v4:
  - Move patch before local and global base are introduced
  - Add missing user (drivers/char/random.c) of add_timer_on() without
    TIMER_PINNED flag (kernel test robot)
---
 arch/x86/kernel/tsc_sync.c | 3 ++-
 drivers/char/random.c      | 2 +-
 kernel/time/clocksource.c  | 2 +-
 kernel/workqueue.c         | 7 +++++--
 4 files changed, 9 insertions(+), 5 deletions(-)
  

Comments

Frederic Weisbecker Nov. 4, 2022, 4:43 p.m. UTC | #1
On Fri, Nov 04, 2022 at 03:57:26PM +0100, Anna-Maria Behnsen wrote:
> The implementation of the hierarchical timer pull model will change the
> timer bases per CPU. Timers that have to expire on a specific CPU require
> the TIMER_PINNED flag. Otherwise they will be queued on the dedicated CPU
> but in the global timer base, and those timers could also expire on other
> CPUs. Timers with the TIMER_DEFERRABLE flag end up in a separate base anyway
> and are executed on the local CPU only.
> 
> Therefore add the missing TIMER_PINNED flag for those callers who use
> add_timer_on() without the flag. No functional change.

You're fixing the current callers but what about the future ones?

add_timer_on() should always guarantee that a timer runs on the
right destination, which is not the case after your patchset if the
timer hasn't been set to TIMER_PINNED.

Therefore I think we should either have:

* add_timer_on() enforce TIMER_PINNED (doesn't work because if the timer is
  later called with mod_timer(), we should expect it to run anywhere)

or

* add_timer_on() warns if !TIMER_PINNED

or

* have an internal flag TIMER_LOCAL, that is turned on when
  add_timer_on() is called or add_timer()/mod_timer() is called
  on a TIMER_PINNED. Otherwise it is turned off.

The last solution should work with existing API and you don't need to
chase the current and future users of add_timer_on().

Thanks.
  
Anna-Maria Behnsen Nov. 7, 2022, 8:11 a.m. UTC | #2
On Fri, 4 Nov 2022, Frederic Weisbecker wrote:

> On Fri, Nov 04, 2022 at 03:57:26PM +0100, Anna-Maria Behnsen wrote:
> > The implementation of the hierarchical timer pull model will change the
> > timer bases per CPU. Timers that have to expire on a specific CPU require
> > the TIMER_PINNED flag. Otherwise they will be queued on the dedicated CPU
> > but in the global timer base, and those timers could also expire on other
> > CPUs. Timers with the TIMER_DEFERRABLE flag end up in a separate base anyway
> > and are executed on the local CPU only.
> > 
> > Therefore add the missing TIMER_PINNED flag for those callers who use
> > add_timer_on() without the flag. No functional change.
> 
> You're fixing the current callers but what about the future ones?
> 
> add_timer_on() should always guarantee that a timer runs on the
> right destination, which is not the case after your patchset if the
> timer hasn't been set to TIMER_PINNED.
> 
> Therefore I think we should either have:
> 
> * add_timer_on() enforce TIMER_PINNED (doesn't work because if the timer is
>   later called with mod_timer(), we should expect it to run anywhere)
> 
> or
> 
> * add_timer_on() warns if !TIMER_PINNED

This is already part of the last patch of the queue, where the crystal
ball logic is also removed. But the patch where I added the WARN_ONCE()
might be the wrong one; it would be better placed in the next patch, where
the new timer bases are introduced.

> or
> 
> * have an internal flag TIMER_LOCAL, that is turned on when
>   add_timer_on() is called or add_timer()/mod_timer() is called
>   on a TIMER_PINNED. Otherwise it is turned off.
> 
> The last solution should work with existing API and you don't need to
> chase the current and future users of add_timer_on().

With the last approach it doesn't matter how the timer is set up. Everything
is done by the timer code implicitly. When a future caller uses add_timer_on()
and wants to modify this "implicitly pinned timer", they will call
mod_timer() and the timer is no longer pinned (if it does not end up in the
same bucket it was in before). To a user this does not seem very
obvious, or am I wrong?

But if the caller sets up the timer correctly, we do not need this extra
timer flag. With the WARN_ONCE() in place, callers need to do the timer
setup properly, and it is clearer to the caller what should be done.

BTW, the hunk in this patch for the workqueue is also not a final fix in my
opinion. I'm preparing a cleanup queue (it's part of the deferrable cleanup
queue), where I want to set the timer flags properly when
initializing/defining the workers. I should have added a comment here...

Thanks,

	Anna-Maria
  
Frederic Weisbecker Nov. 7, 2022, 10:11 a.m. UTC | #3
On Mon, Nov 07, 2022 at 09:11:11AM +0100, Anna-Maria Behnsen wrote:
> On Fri, 4 Nov 2022, Frederic Weisbecker wrote:
> 
> > On Fri, Nov 04, 2022 at 03:57:26PM +0100, Anna-Maria Behnsen wrote:
> > > The implementation of the hierarchical timer pull model will change the
> > > timer bases per CPU. Timers that have to expire on a specific CPU require
> > > the TIMER_PINNED flag. Otherwise they will be queued on the dedicated CPU
> > > but in the global timer base, and those timers could also expire on other
> > > CPUs. Timers with the TIMER_DEFERRABLE flag end up in a separate base anyway
> > > and are executed on the local CPU only.
> > > 
> > > Therefore add the missing TIMER_PINNED flag for those callers who use
> > > add_timer_on() without the flag. No functional change.
> > 
> > You're fixing the current callers but what about the future ones?
> > 
> > add_timer_on() should always guarantee that a timer runs on the
> > right destination, which is not the case after your patchset if the
> > timer hasn't been set to TIMER_PINNED.
> > 
> > Therefore I think we should either have:
> > 
> > * add_timer_on() enforce TIMER_PINNED (doesn't work because if the timer is
> >   later called with mod_timer(), we should expect it to run anywhere)
> > 
> > or
> > 
> > * add_timer_on() warns if !TIMER_PINNED
> 
> This is already part of the last patch of the queue, where the crystal
> ball logic is also removed. But the patch where I added the WARN_ONCE()
> might be the wrong one; it would be better placed in the next patch, where
> the new timer bases are introduced.

Ok.

> 
> > or
> > 
> > * have an internal flag TIMER_LOCAL, that is turned on when
> >   add_timer_on() is called or add_timer()/mod_timer() is called
> >   on a TIMER_PINNED. Otherwise it is turned off.
> > 
> > The last solution should work with existing API and you don't need to
> > chase the current and future users of add_timer_on().
> 
> With the last approach it doesn't matter how the timer is set up. Everything
> is done by the timer code implicitly. When a future caller uses add_timer_on()
> and wants to modify this "implicitly pinned timer", they will call
> mod_timer() and the timer is no longer pinned (if it does not end up in the
> same bucket it was in before). To a user this does not seem very
> obvious, or am I wrong?

That's right indeed.

> 
> But if the caller sets up the timer correctly, we do not need this extra
> timer flag. With the WARN_ONCE() in place, callers need to do the timer
> setup properly, and it is clearer to the caller what should be done.

Yeah that sounds better.

> BTW, the hunk in this patch for the workqueue is also not a final fix in my
> opinion. I'm preparing a cleanup queue (it's part of the deferrable cleanup
> queue), where I want to set the timer flags properly when
> initializing/defining the workers. I should have added a comment here...

Ok, if we have some pinned initializers such as DECLARE_DELAYED_WORK_PINNED()
and DECLARE_DEFERRABLE_WORK_PINNED() then I think that cleans up the situation.

Sounds good, thanks!

> 
> Thanks,
> 
> 	Anna-Maria
>
  

Patch

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 9452dc9664b5..eab827288e0f 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -110,7 +110,8 @@  static int __init start_sync_check_timer(void)
 	if (!cpu_feature_enabled(X86_FEATURE_TSC_ADJUST) || tsc_clocksource_reliable)
 		return 0;
 
-	timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
+	timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn,
+		    TIMER_PINNED);
 	tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
 	add_timer(&tsc_sync_check_timer);
 
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 69754155300e..2cae98dc86dc 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -949,7 +949,7 @@  static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = {
 #define FASTMIX_PERM HSIPHASH_PERMUTATION
 	.pool = { HSIPHASH_CONST_0, HSIPHASH_CONST_1, HSIPHASH_CONST_2, HSIPHASH_CONST_3 },
 #endif
-	.mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0)
+	.mix = __TIMER_INITIALIZER(mix_interrupt_randomness, TIMER_PINNED)
 };
 
 /*
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 8058bec87ace..f8c310e62758 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -523,7 +523,7 @@  static inline void clocksource_start_watchdog(void)
 {
 	if (watchdog_running || !watchdog || list_empty(&watchdog_list))
 		return;
-	timer_setup(&watchdog_timer, clocksource_watchdog, 0);
+	timer_setup(&watchdog_timer, clocksource_watchdog, TIMER_PINNED);
 	watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
 	add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
 	watchdog_running = 1;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7cd5f5e7e0a1..a0f7bf7be6f2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1670,10 +1670,13 @@  static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	dwork->cpu = cpu;
 	timer->expires = jiffies + delay;
 
-	if (unlikely(cpu != WORK_CPU_UNBOUND))
+	if (unlikely(cpu != WORK_CPU_UNBOUND)) {
+		timer->flags |= TIMER_PINNED;
 		add_timer_on(timer, cpu);
-	else
+	} else {
+		timer->flags &= ~TIMER_PINNED;
 		add_timer(timer);
+	}
 }
 
 /**