[v9,12/32] timers: Fix nextevt calculation when no timers are pending

Message ID 20231201092654.34614-13-anna-maria@linutronix.de
State New
Headers
Series timers: Move from a push remote at enqueue to a pull at expiry model |

Commit Message

Anna-Maria Behnsen Dec. 1, 2023, 9:26 a.m. UTC
  When no timer is queued into an empty timer base, the next_expiry will not
be updated. It was originally calculated as

  base->clk + NEXT_TIMER_MAX_DELTA

When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the
next_expiry value of the empty base suggests that there is a timer pending
soon. This might be more a kind of a theoretical problem, but the fix
doesn't hurt.

Use only base->next_expiry value as nextevt when timers are
pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all
information is in place, update base->next_expiry value of the empty timer
base as well.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
---
v9: New patch
---
 kernel/time/timer.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
  

Comments

Sebastian Andrzej Siewior Dec. 4, 2023, 4:03 p.m. UTC | #1
On 2023-12-01 10:26:34 [+0100], Anna-Maria Behnsen wrote:
> When no timer is queued into an empty timer base, the next_expiry will not
> be updated. It was originally calculated as
> 
>   base->clk + NEXT_TIMER_MAX_DELTA
> 
> When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the
> next_expiry value of the empty base suggests that there is a timer pending
> soon. This might be more a kind of a theoretical problem, but the fix
> doesn't hurt.

So __run_timers() sets base::next_expiry to base->clk +
NEXT_TIMER_MAX_DELTA and then we have no more timers enqueued.

But wouldn't base->timers_pending remain false? Therefore it would use
"expires = KTIME_MAX" as return value (well cmp_next_hrtimer_event())?

Based on the code as of #11, it would only set timer_base::is_idle
wrongly false if it wraps around. Other than that, I don't see an issue.
What do I miss?

If you update it regardless here then it would make a difference to
run_local_timers() assuming we have still hrtimer which expire and this
next_expiry check might raise a softirq since it does not consider the
timers_pending value.

> Use only base->next_expiry value as nextevt when timers are
> pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all
> information is in place, update base->next_expiry value of the empty timer
> base as well.

or consider timers_pending in run_local_timers()? An additional read vs
write?

> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1944,10 +1943,20 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
>  	__forward_timer_base(base, basej);
>  
>  	if (base->timers_pending) {
> +		nextevt = base->next_expiry;
> +
>  		/* If we missed a tick already, force 0 delta */
>  		if (time_before(nextevt, basej))
>  			nextevt = basej;
>  		expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
> +	} else {
> +		/*
> +		 * Move next_expiry for the empty base into the future to
> +		 * prevent a unnecessary raise of the timer softirq when the
                           an
> +		 * next_expiry value will be reached even if there is no timer
> +		 * pending.
> +		 */
> +		base->next_expiry = nextevt;
>  	}
>  
>  	/*

Sebastian
  
Anna-Maria Behnsen Dec. 5, 2023, 11:53 a.m. UTC | #2
Sebastian Siewior <bigeasy@linutronix.de> writes:

> On 2023-12-01 10:26:34 [+0100], Anna-Maria Behnsen wrote:
>> When no timer is queued into an empty timer base, the next_expiry will not
>> be updated. It was originally calculated as
>> 
>>   base->clk + NEXT_TIMER_MAX_DELTA
>> 
>> When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the
>> next_expiry value of the empty base suggests that there is a timer pending
>> soon. This might be more a kind of a theoretical problem, but the fix
>> doesn't hurt.
>
> So __run_timers() sets base::next_expiry to base->clk +
> NEXT_TIMER_MAX_DELTA and then we have no more timers enqueued.
>
> But wouldn't base->timers_pending remain false? Therefore it would use
> "expires = KTIME_MAX" as return value (well cmp_next_hrtimer_event())?

Jupp.

> Based on the code as of #11, it would only set timer_base::is_idle
> wrongly false if it wraps around. Other than that, I don't see an issue.
> What do I miss?

And it will raise an unnecessary softirq when it wraps around as you
also mentioned on the next paragraph.

> If you update it regardless here then it would make a difference to
> run_local_timers() assuming we have still hrtimer which expire and this
> next_expiry check might raise a softirq since it does not consider the
> timers_pending value.

The only difference with this change would be that the softirq will not
be raised when it wraps around.

>> Use only base->next_expiry value as nextevt when timers are
>> pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all
>> information is in place, update base->next_expiry value of the empty timer
>> base as well.
>
> or consider timers_pending in run_local_timers()? An additional read vs
> write?

This would also be a possibility to add the check in run_local_timers()
with timers_pending. And we also have to make the is_idle marking in
get_next_timer_interrupt() dependant on base::timers_pending bit. But
this also means, we cannot rely on next_expiry when no timer is pending.

Frederic, what do you think?

Thanks,

	Anna-Maria
  
Frederic Weisbecker Dec. 10, 2023, 12:35 a.m. UTC | #3
Le Tue, Dec 05, 2023 at 12:53:03PM +0100, Anna-Maria Behnsen a écrit :
> Sebastian Siewior <bigeasy@linutronix.de> writes:
> >> Use only base->next_expiry value as nextevt when timers are
> >> pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all
> >> information is in place, update base->next_expiry value of the empty timer
> >> base as well.
> >
> > or consider timers_pending in run_local_timers()? An additional read vs
> > write?
> 
> This would also be a possibility to add the check in run_local_timers()
> with timers_pending.

We could but do we really care about avoiding a potential softirq every 12 days
(on 1000 Hz...)

> And we also have to make the is_idle marking in
> get_next_timer_interrupt() dependant on base::timers_pending bit.

Yes that, on the other hand, looks mandatory! Because if the CPU sleeps for 12
days and then gets an interrupt and then go back to sleep, base->is_idle will be
set as false and remote enqueues won't be notified.

> But this also means, we cannot rely on next_expiry when no timer is pending.

But note that this patch only fixes that partially anyway. Suppose the tick is
stopped entirely and the CPU sleeps for 13 days without any interruption.
Then it's woken up with TIF_RESCHED without any timer queued,
get_next_timer_interrupt() won't be called upon tick restart to fix
->next_expiry.

> 
> Frederic, what do you think?

So it looks like is_idle must be fixed.

As for the timer softirq, ->next_expiry is already unreliable because when
a timer is removed, ->next_expiry is not updated (even though that removed
timer might have been the earliest). So ->next_expiry can already carry a
"too early" value. The only constraint is that ->next_expiry can't be later
than the first timer.

So I'd rather put a comment somewhere about the fact that wrapping is expected
to behave ok. But it's your call.

Thanks.
  
Anna-Maria Behnsen Dec. 12, 2023, 1:21 p.m. UTC | #4
Frederic Weisbecker <frederic@kernel.org> writes:

> Le Tue, Dec 05, 2023 at 12:53:03PM +0100, Anna-Maria Behnsen a écrit :
>> 
>> Frederic, what do you think?
>
> So it looks like is_idle must be fixed.
>
> As for the timer softirq, ->next_expiry is already unreliable because when
> a timer is removed, ->next_expiry is not updated (even though that removed
> timer might have been the earliest). So ->next_expiry can already carry a
> "too early" value. The only constraint is that ->next_expiry can't be later
> than the first timer.
>
> So I'd rather put a comment somewhere about the fact that wrapping is expected
> to behave ok. But it's your call.

Ok. If both solutions are fine, I would like to take the solution with
updating the next_expiry values for empty bases. It will make the
compare of expiry values of global and local timer base easier in one of
the patches later on.

Thanks,

	Anna-Maria
  
Frederic Weisbecker Dec. 12, 2023, 1:37 p.m. UTC | #5
On Tue, Dec 12, 2023 at 02:21:25PM +0100, Anna-Maria Behnsen wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > Le Tue, Dec 05, 2023 at 12:53:03PM +0100, Anna-Maria Behnsen a écrit :
> >> 
> >> Frederic, what do you think?
> >
> > So it looks like is_idle must be fixed.
> >
> > As for the timer softirq, ->next_expiry is already unreliable because when
> > a timer is removed, ->next_expiry is not updated (even though that removed
> > timer might have been the earliest). So ->next_expiry can already carry a
> > "too early" value. The only constraint is that ->next_expiry can't be later
> > than the first timer.
> >
> > So I'd rather put a comment somewhere about the fact that wrapping is expected
> > to behave ok. But it's your call.
> 
> Ok. If both solutions are fine, I would like to take the solution with
> updating the next_expiry values for empty bases. It will make the
> compare of expiry values of global and local timer base easier in one of
> the patches later on.

Fine by me at least!

Thanks.

> Thanks,
> 
> 	Anna-Maria
>
  
Frederic Weisbecker Dec. 20, 2023, 2:49 p.m. UTC | #6
Le Fri, Dec 01, 2023 at 10:26:34AM +0100, Anna-Maria Behnsen a écrit :
> When no timer is queued into an empty timer base, the next_expiry will not
> be updated. It was originally calculated as
> 
>   base->clk + NEXT_TIMER_MAX_DELTA
> 
> When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the
> next_expiry value of the empty base suggests that there is a timer pending
> soon. This might be more a kind of a theoretical problem, but the fix
> doesn't hurt.

This solves a real issue. I suggest removing the last sentence and add instead:

If the CPU sleeps in idle for a bit more than NEXT_TIMER_MAX_DELTA
(~12 days in HZ=1000) and then an interrupt fires, upon going back to idle
get_next_timer_interrupt() will still return KTIME_MAX but incorrectly set
is_idle to false. Therefore the CPU will keep the tick stopped and go back to
sleep though further remote enqueue of timers to this CPU will fail to send an IPI.
As a result the timer will remain ignored.

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
  

Patch

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0826018d9873..4dffe966424c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1922,8 +1922,8 @@  static u64 cmp_next_hrtimer_event(u64 basem, u64 expires)
 u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	unsigned long nextevt = basej + NEXT_TIMER_MAX_DELTA;
 	u64 expires = KTIME_MAX;
-	unsigned long nextevt;
 
 	/*
 	 * Pretend that there is no timer pending if the cpu is offline.
@@ -1935,7 +1935,6 @@  u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	raw_spin_lock(&base->lock);
 	if (base->next_expiry_recalc)
 		next_expiry_recalc(base);
-	nextevt = base->next_expiry;
 
 	/*
 	 * We have a fresh next event. Check whether we can forward the
@@ -1944,10 +1943,20 @@  u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	__forward_timer_base(base, basej);
 
 	if (base->timers_pending) {
+		nextevt = base->next_expiry;
+
 		/* If we missed a tick already, force 0 delta */
 		if (time_before(nextevt, basej))
 			nextevt = basej;
 		expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
+	} else {
+		/*
+		 * Move next_expiry for the empty base into the future to
+		 * prevent a unnecessary raise of the timer softirq when the
+		 * next_expiry value will be reached even if there is no timer
+		 * pending.
+		 */
+		base->next_expiry = nextevt;
 	}
 
 	/*