[v8,08/25] timer: Rework idle logic

Message ID 20231004123454.15691-9-anna-maria@linutronix.de
State New
Series timer: Move from a push remote at enqueue to a pull at expiry model

Commit Message

Anna-Maria Behnsen Oct. 4, 2023, 12:34 p.m. UTC
  From: Thomas Gleixner <tglx@linutronix.de>

To improve readability of the code, split base->idle calculation and
expires calculation into separate parts.

Thereby the following subtle change happens if the next event is just one
jiffy ahead and the tick was already stopped: Originally base->is_idle
remains true in this situation. Now base->is_idle turns to false. This may
spare an IPI if a timer is enqueued remotely to an idle CPU that is going
to tick on the next jiffy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
---
v4: Change condition to force 0 delta and update commit message (Frederic)
---
 kernel/time/timer.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)
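
The behaviour change described in the commit message can be reproduced
in userspace. The following is a minimal sketch, not kernel code: the
struct base_model type, the TICK_NSEC value (HZ=250 is assumed) and the
initial expires value of KTIME_MAX (not visible in the hunk) are
assumptions made for illustration, and the time_*() macros are
simplified stand-ins for the ones in include/linux/jiffies.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; the real values depend on the kernel config. */
#define TICK_NSEC 4000000ULL			/* assumes HZ=250 */
#define KTIME_MAX ((uint64_t)INT64_MAX)		/* assumed initial expires */

/* Wrap-safe jiffies comparisons, without the kernel's typecheck(). */
#define time_after(a, b)	((long)((b) - (a)) < 0)
#define time_before(a, b)	time_after(b, a)
#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)
#define time_before_eq(a, b)	time_after_eq(b, a)

/* Hypothetical cut-down model of the two timer_base fields involved. */
struct base_model {
	bool is_idle;
	bool timers_pending;
};

/* The logic removed by this patch. Returns expires in nanoseconds. */
uint64_t old_logic(struct base_model *base, unsigned long nextevt,
		   unsigned long basej, uint64_t basem)
{
	uint64_t expires = KTIME_MAX;

	if (time_before_eq(nextevt, basej)) {
		expires = basem;
		base->is_idle = false;
	} else {
		if (base->timers_pending)
			expires = basem + (uint64_t)(nextevt - basej) * TICK_NSEC;
		/* is_idle is only ever set in this branch, never cleared. */
		if ((expires - basem) > TICK_NSEC)
			base->is_idle = true;
	}
	return expires;
}

/* The logic added by this patch. Returns expires in nanoseconds. */
uint64_t new_logic(struct base_model *base, unsigned long nextevt,
		   unsigned long basej, uint64_t basem)
{
	uint64_t expires = KTIME_MAX;

	/* Unconditional assignment: is_idle is cleared as well as set. */
	base->is_idle = time_after(nextevt, basej + 1);

	if (base->timers_pending) {
		/* If we missed a tick already, force 0 delta */
		if (time_before(nextevt, basej))
			nextevt = basej;
		expires = basem + (uint64_t)(nextevt - basej) * TICK_NSEC;
	}
	return expires;
}

/* Tick already stopped (is_idle set), next event exactly one jiffy ahead. */
void one_jiffy_example(void)
{
	struct base_model o = { .is_idle = true, .timers_pending = true };
	struct base_model n = o;

	/* Both variants compute the same expiry ... */
	assert(old_logic(&o, 1001, 1000, 0) == new_logic(&n, 1001, 1000, 0));
	/* ... but only the new one clears is_idle. */
	assert(o.is_idle);
	assert(!n.is_idle);
}
```

Calling one_jiffy_example() exercises exactly the case from the commit
message. For events two or more jiffies ahead, or at/before basej, both
variants in this model leave is_idle in the same state.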
  

Comments

Thomas Gleixner Oct. 9, 2023, 10:15 p.m. UTC | #1
On Wed, Oct 04 2023 at 14:34, Anna-Maria Behnsen wrote:
>  
> -	if (time_before_eq(nextevt, basej)) {
> -		expires = basem;
> -		base->is_idle = false;
> -	} else {
> -		if (base->timers_pending)
> -			expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
> -		/*
> -		 * If we expect to sleep more than a tick, mark the base idle.
> -		 * Also the tick is stopped so any added timer must forward
> -		 * the base clk itself to keep granularity small. This idle
> -		 * logic is only maintained for the BASE_STD base, deferrable
> -		 * timers may still see large granularity skew (by design).
> -		 */
> -		if ((expires - basem) > TICK_NSEC)
> -			base->is_idle = true;
> +	/*
> +	 * Base is idle if the next event is more than a tick away. Also
> +	 * the tick is stopped so any added timer must forward the base clk
> +	 * itself to keep granularity small. This idle logic is only
> +	 * maintained for the BASE_STD base, deferrable timers may still
> +	 * see large granularity skew (by design).
> +	 */
> +	base->is_idle = time_after(nextevt, basej + 1);

This is wrongly ordered. base->is_idle must be updated _after_
evaluating base->timers_pending because the below can change nextevt,
no?

> +	if (base->timers_pending) {
> +		/* If we missed a tick already, force 0 delta */
> +		if (time_before(nextevt, basej))
> +			nextevt = basej;
> +		expires = basem + (u64)(nextevt - basej) * TICK_NSEC;

Thanks,

        tglx
  
Frederic Weisbecker Oct. 10, 2023, 11:19 a.m. UTC | #2
On Tue, Oct 10, 2023 at 12:15:09AM +0200, Thomas Gleixner wrote:
> On Wed, Oct 04 2023 at 14:34, Anna-Maria Behnsen wrote:
> >  
> > -	if (time_before_eq(nextevt, basej)) {
> > -		expires = basem;
> > -		base->is_idle = false;
> > -	} else {
> > -		if (base->timers_pending)
> > -			expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
> > -		/*
> > -		 * If we expect to sleep more than a tick, mark the base idle.
> > -		 * Also the tick is stopped so any added timer must forward
> > -		 * the base clk itself to keep granularity small. This idle
> > -		 * logic is only maintained for the BASE_STD base, deferrable
> > -		 * timers may still see large granularity skew (by design).
> > -		 */
> > -		if ((expires - basem) > TICK_NSEC)
> > -			base->is_idle = true;
> > +	/*
> > +	 * Base is idle if the next event is more than a tick away. Also
> > +	 * the tick is stopped so any added timer must forward the base clk
> > +	 * itself to keep granularity small. This idle logic is only
> > +	 * maintained for the BASE_STD base, deferrable timers may still
> > +	 * see large granularity skew (by design).
> > +	 */
> > +	base->is_idle = time_after(nextevt, basej + 1);
> 
> This is wrongly ordered. base->is_idle must be updated _after_
> evaluating base->timers_pending because the below can change nextevt,
> no?
> 
> > +	if (base->timers_pending) {
> > +		/* If we missed a tick already, force 0 delta */
> > +		if (time_before(nextevt, basej))
> > +			nextevt = basej;
> > +		expires = basem + (u64)(nextevt - basej) * TICK_NSEC;

I suspect it doesn't matter in practice: base->is_idle will remain false
if it's before/equal jiffies.

Still it hurts the eyes so I agree the re-ordering should happen here and
this will even simplify a bit the next patch.

Thanks.


> Thanks,
> 
>         tglx
  
Thomas Gleixner Oct. 10, 2023, 11:48 a.m. UTC | #3
On Tue, Oct 10 2023 at 13:19, Frederic Weisbecker wrote:
> On Tue, Oct 10, 2023 at 12:15:09AM +0200, Thomas Gleixner wrote:
>> > +	base->is_idle = time_after(nextevt, basej + 1);
>> 
>> This is wrongly ordered. base->is_idle must be updated _after_
>> evaluating base->timers_pending because the below can change nextevt,
>> no?
>> 
>> > +	if (base->timers_pending) {
>> > +		/* If we missed a tick already, force 0 delta */
>> > +		if (time_before(nextevt, basej))
>> > +			nextevt = basej;
>> > +		expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
>
> I suspect it doesn't matter in practice: base->is_idle will remain false
> if it's before/equal jiffies.
>
> Still it hurts the eyes so I agree the re-ordering should happen here and
> this will even simplify a bit the next patch.

Right. Anna-Maria just pointed that out to me before, but we are all in
violent agreement that it sucks :)
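
For reference, the ordering question above can be checked with a small
userspace sketch. This is only one possible reading of the reordering,
not the follow-up patch: idle_as_posted(), idle_reordered() and
check_orderings_agree() are hypothetical helpers, and the time_*()
macros are simplified stand-ins for the kernel ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe jiffies comparisons, without the kernel's typecheck(). */
#define time_after(a, b)	((long)((b) - (a)) < 0)
#define time_before(a, b)	time_after(b, a)

/* Order as posted in v8: is_idle is computed before nextevt is clamped. */
bool idle_as_posted(unsigned long nextevt, unsigned long basej,
		    bool timers_pending)
{
	bool is_idle = time_after(nextevt, basej + 1);

	if (timers_pending && time_before(nextevt, basej))
		nextevt = basej;	/* too late to influence is_idle */
	(void)nextevt;
	return is_idle;
}

/* Reordered: clamp nextevt first, then derive is_idle from it. */
bool idle_reordered(unsigned long nextevt, unsigned long basej,
		    bool timers_pending)
{
	if (timers_pending && time_before(nextevt, basej))
		nextevt = basej;
	return time_after(nextevt, basej + 1);
}

/* Compare both orderings for events from 8 jiffies behind to 8 ahead. */
void check_orderings_agree(unsigned long basej)
{
	for (long d = -8; d <= 8; d++) {
		unsigned long nextevt = basej + (unsigned long)d;

		for (int pending = 0; pending <= 1; pending++)
			assert(idle_as_posted(nextevt, basej, pending) ==
			       idle_reordered(nextevt, basej, pending));
	}
}
```

In this model the clamp only triggers when nextevt is before basej, and
in that range time_after(nextevt, basej + 1) is false both before and
after clamping. That matches the observation that the ordering does not
matter in practice, while the reordered form is easier to read.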
  

Patch

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index dc58c479d35a..18f8aac9b19a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1943,21 +1943,20 @@  u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	 */
 	__forward_timer_base(base, basej);
 
-	if (time_before_eq(nextevt, basej)) {
-		expires = basem;
-		base->is_idle = false;
-	} else {
-		if (base->timers_pending)
-			expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
-		/*
-		 * If we expect to sleep more than a tick, mark the base idle.
-		 * Also the tick is stopped so any added timer must forward
-		 * the base clk itself to keep granularity small. This idle
-		 * logic is only maintained for the BASE_STD base, deferrable
-		 * timers may still see large granularity skew (by design).
-		 */
-		if ((expires - basem) > TICK_NSEC)
-			base->is_idle = true;
+	/*
+	 * Base is idle if the next event is more than a tick away. Also
+	 * the tick is stopped so any added timer must forward the base clk
+	 * itself to keep granularity small. This idle logic is only
+	 * maintained for the BASE_STD base, deferrable timers may still
+	 * see large granularity skew (by design).
+	 */
+	base->is_idle = time_after(nextevt, basej + 1);
+
+	if (base->timers_pending) {
+		/* If we missed a tick already, force 0 delta */
+		if (time_before(nextevt, basej))
+			nextevt = basej;
+		expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
 	}
 	raw_spin_unlock(&base->lock);