[clocksource,1/2] clocksource: Add comments to classify bogus measurements

Message ID 20221102184009.1306751-1-paulmck@kernel.org
State New
Series Clocksource-watchdog classification and backoff

Commit Message

Paul E. McKenney Nov. 2, 2022, 6:40 p.m. UTC
An extremely busy system can delay the clocksource watchdog, so that
the corresponding too-long bogus-measurement error does not necessarily
imply an error in the system.  However, a too-short bogus-measurement
error likely indicates a bug in hardware, firmware or software.

Therefore, add comments clarifying these bogus-measurement pr_warn()s.

Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Waiman Long <longman@redhat.com>
---
 kernel/time/clocksource.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Waiman Long Nov. 3, 2022, 2:23 a.m. UTC | #1
On 11/2/22 14:40, Paul E. McKenney wrote:
> An extremely busy system can delay the clocksource watchdog, so that
> the corresponding too-long bogus-measurement error does not necessarily
> imply an error in the system.  However, a too-short bogus-measurement
> error likely indicates a bug in hardware, firmware or software.
>
> Therefore, add comments clarifying these bogus-measurement pr_warn()s.
>
> Reported-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Waiman Long <longman@redhat.com>
> ---
>   kernel/time/clocksource.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index dcaf38c062161..3f5317faf891f 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -443,10 +443,12 @@ static void clocksource_watchdog(struct timer_list *unused)
>   		/* Check for bogus measurements. */
>   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
>   		if (wd_nsec < (wdi >> 2)) {
> +			/* This usually indicates broken timer code or hardware. */
>   			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>   			continue;
>   		}
>   		if (wd_nsec > (wdi << 2)) {
> +			/* This can happen on busy systems, which can delay the watchdog. */
>   			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>   			continue;
>   		}

Looks good.

Reviewed-by: Waiman Long <longman@redhat.com>
  
Paul E. McKenney Nov. 3, 2022, 8:47 p.m. UTC | #2
On Wed, Nov 02, 2022 at 10:23:25PM -0400, Waiman Long wrote:
> Looks good.
> 
> Reviewed-by: Waiman Long <longman@redhat.com>

Applied, thank you!

							Thanx, Paul
  
Feng Tang Nov. 4, 2022, 2:13 a.m. UTC | #3
On Wed, Nov 02, 2022 at 11:40:08AM -0700, Paul E. McKenney wrote:
> An extremely busy system can delay the clocksource watchdog, so that
> the corresponding too-long bogus-measurement error does not necessarily
> imply an error in the system.  However, a too-short bogus-measurement
> error likely indicates a bug in hardware, firmware or software.
> 
> Therefore, add comments clarifying these bogus-measurement pr_warn()s.

Looks good to me.

Reviewed-by: Feng Tang <feng.tang@intel.com>

  

Patch

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index dcaf38c062161..3f5317faf891f 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -443,10 +443,12 @@  static void clocksource_watchdog(struct timer_list *unused)
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
 		if (wd_nsec < (wdi >> 2)) {
+			/* This usually indicates broken timer code or hardware. */
 			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
 		if (wd_nsec > (wdi << 2)) {
+			/* This can happen on busy systems, which can delay the watchdog. */
 			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
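
For readers who want to see the two thresholds in isolation, the check in the hunk above can be sketched as a small userspace function. This is only an illustration, not the kernel code: the enum, the function name classify_wd_interval() and the int64_t parameter types are hypothetical, and only the two comparisons (wdi >> 2 and wdi << 2) are taken from the patch.

```c
#include <stdint.h>

/* Hypothetical labels for the three outcomes of the bogus-measurement check. */
enum wd_verdict {
	WD_TOO_SHORT,	/* advanced less than 1/4 of the expected interval */
	WD_TOO_LONG,	/* advanced more than 4x the expected interval */
	WD_PLAUSIBLE,	/* within [wdi/4, wdi*4], so the skew check can proceed */
};

/*
 * Classify one watchdog measurement (userspace sketch, names are assumptions).
 * wd_nsec: nanoseconds the watchdog clocksource advanced since the last check.
 * wdi:     expected interval between checks, in nanoseconds.
 */
static enum wd_verdict classify_wd_interval(int64_t wd_nsec, int64_t wdi)
{
	/* Too short: per the patch, usually broken timer code or hardware. */
	if (wd_nsec < (wdi >> 2))
		return WD_TOO_SHORT;
	/* Too long: per the patch, can happen when a busy system delays the watchdog. */
	if (wd_nsec > (wdi << 2))
		return WD_TOO_LONG;
	return WD_PLAUSIBLE;
}
```

Assuming, purely for illustration, an expected interval of 500,000,000 ns, the cutoffs work out to 125,000,000 ns and 2,000,000,000 ns. Both comparisons are strict, so a measurement exactly on either boundary is still treated as plausible.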