timers: fix LVL_START macro

Message ID: 20221115025614.79537-1-yun.zhou@windriver.com
State: New
Series: timers: fix LVL_START macro

Commit Message

Yun Zhou Nov. 15, 2022, 2:56 a.m. UTC
  The number of buckets per level should be LVL_SIZE, not LVL_SIZE-1.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Frederic Weisbecker Nov. 15, 2022, 12:02 p.m. UTC | #1
Hi Yun Zhou,

On Tue, Nov 15, 2022 at 10:56:14AM +0800, Yun Zhou wrote:
> The number of buckets per level should be LVL_SIZE, not LVL_SIZE-1.
> 
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
> ---
>  kernel/time/timer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 717fcb9fb14a..1116b208093e 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -161,7 +161,7 @@ EXPORT_SYMBOL(jiffies_64);
>   * time. We start from the last possible delta of the previous level
>   * so that we can later add an extra LVL_GRAN(n) to n (see calc_index()).
>   */
> -#define LVL_START(n)	((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))
> +#define LVL_START(n)	(LVL_SIZE << (((n) - 1) * LVL_CLK_SHIFT))

See the comment above:

   "We start from the last possible delta of the previous level
    so that we can later add an extra LVL_GRAN(n) to n (see calc_index())."

Thanks.

>  
>  /* Size of each clock level */
>  #define LVL_BITS	6
> -- 
> 2.35.2
>
  
Frederic Weisbecker Nov. 15, 2022, 10:40 p.m. UTC | #2
On Tue, Nov 15, 2022 at 01:15:11PM +0000, Zhou, Yun wrote:
> Hi Frederic,
> 
> The issue now is that a timer may be placed in a bucket of a higher level than expected. For example, with HZ = 1000 and expires = 4090, the timer should be in level 2, but it is now placed in level 3. Is this expected?
> 
>  * HZ 1000 steps
>  * Level Offset  Granularity            Range
>  *  0      0         1 ms                0 ms -         63 ms
>  *  1     64         8 ms               64 ms -        511 ms
>  *  2    128        64 ms              512 ms -       4095 ms (512ms - ~4s)
>  *  3    192       512 ms             4096 ms -      32767 ms (~4s - ~32s)
>  *  4    256      4096 ms (~4s)      32768 ms -     262143 ms (~32s - ~4m)

The rule is that a timer is not allowed to expire too early. But it can expire
a bit late. Hence why it is always rounded up. So in the case of 4090, we have
the choice between:

1) expiring at bucket 2 after 4096 - 64 = 4032 ms
2) expiring at bucket 3 after 4096 ms

The 1) rounds down and expires too early. The 2) rounds up and expires a bit
late. So the second solution is preferred.

Thanks.
  
Thomas Gleixner Nov. 16, 2022, 11:48 p.m. UTC | #3
On Tue, Nov 15 2022 at 23:40, Frederic Weisbecker wrote:
> The rule is that a timer is not allowed to expire too early. But it can expire
> a bit late. Hence why it is always rounded up. So in the case of 4090, we have
> the choice between:
>
> 1) expiring at bucket 2 after 4096 - 64 = 4032 ms
> 2) expiring at bucket 3 after 4096 ms
>
> The 1) rounds down and expires too early. The 2) rounds up and expires a bit
> late. So the second solution is preferred.

It's not only preferred, it's required simply because the timer wheel
has only one guarantee: Not to expire early.

Timer wheel based timers are fundamentally not precise unless the timeout
is short and hits the first level.

But even hrtimers which are designed to be precise have only one real
guarantee: Not to expire early.

hrtimers do not have the side effect of batching on long timeouts like
timer wheel based timers have, but that's it.

Timers in the kernel come with a choice:

  -  Imprecise and inexpensive to arm and cancel (timer_list)
  -  Precise and expensive to arm and cancel (hrtimer)

You can't have both. That's well documented.

Thanks,

        tglx
  
Frederic Weisbecker Nov. 17, 2022, 12:14 p.m. UTC | #4
On Thu, Nov 17, 2022 at 12:48:05AM +0100, Thomas Gleixner wrote:
> It's not only preferred, it's required simply because the timer wheel
> has only one guarantee: Not to expire early.
> 
> Timer wheel based timers are fundamentally not precise unless the timeout
> is short and hits the first level.
> 
> But even hrtimers which are designed to be precise have only one real
> guarantee: Not to expire early.
> 
> hrtimers do not have the side effect of batching on long timeouts like
> timer wheel based timers have, but that's it.
> 
> Timers in the kernel come with a choice:
> 
>   -  Imprecise and inexpensive to arm and cancel (timer_list)
>   -  Precise and expensive to arm and cancel (hrtimer)
> 
> You can't have both. That's well documented.

Actually I'm pretty sure we can manage imprecise and expensive to arm and
cancel. It's a matter of willpower!

Anyway, thanks for confirming what I thought about timers guarantees.

Thanks.

  

Patch

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 717fcb9fb14a..1116b208093e 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -161,7 +161,7 @@  EXPORT_SYMBOL(jiffies_64);
  * time. We start from the last possible delta of the previous level
  * so that we can later add an extra LVL_GRAN(n) to n (see calc_index()).
  */
-#define LVL_START(n)	((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))
+#define LVL_START(n)	(LVL_SIZE << (((n) - 1) * LVL_CLK_SHIFT))
 
 /* Size of each clock level */
 #define LVL_BITS	6