[clocksource,2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
Message ID | 20221102184009.1306751-2-paulmck@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp74661wru; Wed, 2 Nov 2022 11:41:46 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7lSvsxMqnonPecMVHQLi2biXnIe59LsqjCB8+hJwNmoRn4SIBy5hJ9aC54FI8ON76motTr X-Received: by 2002:a17:902:ea03:b0:187:30ec:2b9e with SMTP id s3-20020a170902ea0300b0018730ec2b9emr12671360plg.75.1667414506610; Wed, 02 Nov 2022 11:41:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667414506; cv=none; d=google.com; s=arc-20160816; b=qpFFZxoD+HEu9kNDv4Dx1Tbynxyo5B1pWC2jiJt0dQBCFmXiXxzxmBK9MA88r/ViIy P3f1Tg9BLe49oZhlS9tWibmga0qr4BiEdMipeCKa8n0xUF61DxE+67KN+37NjgFornpT vWAmdhgM6iepX/mmzscmeY/x0yJbSdUbvWIW8AfnrpcsjXd6GCVePa4a4m5OO3WylqFK DQsutP29FDWBBo6LfxJBm12hbOnu9VCzLTFjYfrTM6LhiOuj5S9VFlNVW/mZvJN665Np N0EChBheDfC5oFOQ6cN2ZJr9rle08gYnVx3oG3FehQ2bCwcNz83p9Gh4VrXTWaKBezRt rCyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=bGhZu1zlDrMXZs5FwJmwokVVd20jO/qjJwh0TLJj5Pc=; b=AD2vNjAKfK4C639lvwNlgDPIbblofmqp3uboVdiVVxUlt2fQu+D7YNVQNA6eQOF9d1 y0Qk1JMFpV9NPZ/I6lk2ZgceqaKNlmK7PaYunrtih6aUUZFoJjVnAox8W4JB5Qy/Vpya fcyWTvLZTzgCthusqAeWfm3rpSXfFUAtOXG/Rw0N+1CP5pemu+WyJJAslSjHs9KoWgdz iPATDuZ5bwYt2LTrmDJe0F56VxQYhGbrb9Q9QeNjQFB7M96GrwDKk3S8h7/cqGtJfYDO KHvwONk6NfIzEz3EInejOT6YkhQMz39wVb5MGRf6JF9aIc9gbFgCstDk1KbOOP5oI5kV 91+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Kov6kfM7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 32-20020a630c60000000b0044034f2c3b8si15832957pgm.310.2022.11.02.11.41.33; Wed, 02 Nov 2022 11:41:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Kov6kfM7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231327AbiKBSkT (ORCPT <rfc822;yves.mi.zy@gmail.com> + 99 others); Wed, 2 Nov 2022 14:40:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231211AbiKBSkP (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 2 Nov 2022 14:40:15 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5293831A for <linux-kernel@vger.kernel.org>; Wed, 2 Nov 2022 11:40:13 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id E89DFB82437 for <linux-kernel@vger.kernel.org>; Wed, 2 Nov 2022 18:40:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 886BFC433C1; Wed, 2 Nov 2022 18:40:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667414410; bh=SaIxDonz7jxIjNGEnJ7dwqSnqRS3l2440YhJ0ssJuBw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Kov6kfM75q2SWRX1ME+CRswKJu3gMjUkIPttchOTUJmiWdjGywmC5DZOozwtM1ifP mTkaZddazjVVesBcV2Ha+G93BUI4LAdnsgI1iItRo4JES9GatHV5SZZLgVT1AxvXGA bFPUDn8/Ym3Jx+KYXZD90JWTV6q5mh5KDL2700RLTkkcaqzuoxWJUeCTtOk3Yge1gA VeHchcw4DsR4edRDmz2PnmUg01rYCwVuzQT6Vv0Nik7SxjSRIDZ+ZGYJlkH0QDRGhY /JeKgDro1PnkiI6e+RryEbkZYE0w1zj9m7FYRlJzcRm/X1pCzm31xjmBmEtEqr7qPg dMHpaCMOlx4EQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 41F415C18FC; Wed, 2 Nov 2022 11:40:10 -0700 (PDT) From: "Paul E. McKenney" <paulmck@kernel.org> To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, "Paul E. McKenney" <paulmck@kernel.org>, Waiman Long <longman@redhat.com>, John Stultz <jstultz@google.com> Subject: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads Date: Wed, 2 Nov 2022 11:40:09 -0700 Message-Id: <20221102184009.1306751-2-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20221102184001.GA1306489@paulmck-ThinkPad-P17-Gen-1> References: <20221102184001.GA1306489@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748410834059213080?= X-GMAIL-MSGID: =?utf-8?q?1748410834059213080?= |
Series |
Clocksource-watchdog classification and backoff
|
|
Commit Message
Paul E. McKenney
Nov. 2, 2022, 6:40 p.m. UTC
The clocksource watchdog will reject measurements that are excessively
delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
watchdog interval. On an extremely busy system, this can result in a
console message being printed every two seconds. This is excessively
noisy for a non-error condition.
Therefore, apply exponential backoff to these messages. This exponential
backoff is capped at 1024 times the watchdog interval, which comes to
not quite one message per ten minutes.
Please note that the bogus watchdog reads that occur when the watchdog
interval is less than 0.125 seconds are still printed unconditionally
because these likely correspond to a serious error condition in the
timer code or hardware.
[ paulmck: Apply Feng Tang feedback. ]
Reported-by: Waiman Long <longman@redhat.com>
Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Waiman Long <longman@redhat.com>
---
include/linux/clocksource.h | 4 ++++
kernel/time/clocksource.c | 31 +++++++++++++++++++++++++------
2 files changed, 29 insertions(+), 6 deletions(-)
Comments
On 11/2/22 14:40, Paul E. McKenney wrote: > The clocksource watchdog will reject measurements that are excessively > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > watchdog interval. On an extremely busy system, this can result in a > console message being printed every two seconds. This is excessively > noisy for a non-error condition. > > Therefore, apply exponential backoff to these messages. This exponential > backoff is capped at 1024 times the watchdog interval, which comes to > not quite one message per ten minutes. > > Please note that the bogus watchdog reads that occur when the watchdog > interval is less than 0.125 seconds are still printed unconditionally > because these likely correspond to a serious error condition in the > timer code or hardware. > > [ paulmck: Apply Feng Tang feedback. ] > > Reported-by: Waiman Long <longman@redhat.com> > Reported-by: Feng Tang <feng.tang@intel.com> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: John Stultz <jstultz@google.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Stephen Boyd <sboyd@kernel.org> > Cc: Feng Tang <feng.tang@intel.com> > Cc: Waiman Long <longman@redhat.com> > --- > include/linux/clocksource.h | 4 ++++ > kernel/time/clocksource.c | 31 +++++++++++++++++++++++++------ > 2 files changed, 29 insertions(+), 6 deletions(-) > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 1d42d4b173271..daac05aedf56a 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -125,6 +125,10 @@ struct clocksource { > struct list_head wd_list; > u64 cs_last; > u64 wd_last; > + u64 wd_last_bogus; > + int wd_bogus_shift; > + unsigned long wd_bogus_count; > + unsigned long wd_bogus_count_last; > #endif > struct module *owner; > }; > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index 3f5317faf891f..de8047b6720f5 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > - if (wd_nsec < (wdi >> 2)) { > - /* This usually indicates broken timer code or hardware. */ > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > + bool needwarn = false; > + u64 wd_lb; > + > + cs->wd_bogus_count++; > + if (!cs->wd_bogus_shift) { > + needwarn = true; > + } else { > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > + needwarn = true; > + } > + if (needwarn) { > + /* This can happen on busy systems, which can delay the watchdog. */ > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last); > + cs->wd_last_bogus = wdnow; > + if (cs->wd_bogus_shift < 10) > + cs->wd_bogus_shift++; > + cs->wd_bogus_count_last = cs->wd_bogus_count; I don't think you need to keep 2 counters to store the number of skipped warnings. How about just use a single wd_bogus_skip_count like If (needwarn) { pr_warn("... %lu additonal ...", ... cs->wd_bogus_skip_count); cs->wd_bogus_skip_count = 0: } else { cs->wd_bogus_skip_count++; } Cheers, Longman
On Wed, Nov 02, 2022 at 10:28:05PM -0400, Waiman Long wrote: > On 11/2/22 14:40, Paul E. McKenney wrote: > > The clocksource watchdog will reject measurements that are excessively > > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > > watchdog interval. On an extremely busy system, this can result in a > > console message being printed every two seconds. This is excessively > > noisy for a non-error condition. > > > > Therefore, apply exponential backoff to these messages. This exponential > > backoff is capped at 1024 times the watchdog interval, which comes to > > not quite one message per ten minutes. > > > > Please note that the bogus watchdog reads that occur when the watchdog > > interval is less than 0.125 seconds are still printed unconditionally > > because these likely correspond to a serious error condition in the > > timer code or hardware. > > > > [ paulmck: Apply Feng Tang feedback. ] > > > > Reported-by: Waiman Long <longman@redhat.com> > > Reported-by: Feng Tang <feng.tang@intel.com> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Cc: John Stultz <jstultz@google.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Stephen Boyd <sboyd@kernel.org> > > Cc: Feng Tang <feng.tang@intel.com> > > Cc: Waiman Long <longman@redhat.com> > > --- > > include/linux/clocksource.h | 4 ++++ > > kernel/time/clocksource.c | 31 +++++++++++++++++++++++++------ > > 2 files changed, 29 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > > index 1d42d4b173271..daac05aedf56a 100644 > > --- a/include/linux/clocksource.h > > +++ b/include/linux/clocksource.h > > @@ -125,6 +125,10 @@ struct clocksource { > > struct list_head wd_list; > > u64 cs_last; > > u64 wd_last; > > + u64 wd_last_bogus; > > + int wd_bogus_shift; > > + unsigned long wd_bogus_count; > > + unsigned long wd_bogus_count_last; > > #endif > > struct module *owner; > > }; > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > index 3f5317faf891f..de8047b6720f5 100644 > > --- a/kernel/time/clocksource.c > > +++ b/kernel/time/clocksource.c > > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > > - if (wd_nsec < (wdi >> 2)) { > > - /* This usually indicates broken timer code or hardware. */ > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > > + bool needwarn = false; > > + u64 wd_lb; > > + > > + cs->wd_bogus_count++; > > + if (!cs->wd_bogus_shift) { > > + needwarn = true; > > + } else { > > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > > + needwarn = true; > > + } > > + if (needwarn) { > > + /* This can happen on busy systems, which can delay the watchdog. */ > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last); > > + cs->wd_last_bogus = wdnow; > > + if (cs->wd_bogus_shift < 10) > > + cs->wd_bogus_shift++; > > + cs->wd_bogus_count_last = cs->wd_bogus_count; > > I don't think you need to keep 2 counters to store the number of skipped > warnings. How about just use a single wd_bogus_skip_count like > > If (needwarn) { > pr_warn("... %lu additonal ...", ... cs->wd_bogus_skip_count); > cs->wd_bogus_skip_count = 0: > } else { > cs->wd_bogus_skip_count++; > } Good point! Perhaps I am too used to there being multiple entities taking snapshots? Anyway, does the updated version shown below look better? Thanx, Paul ------------------------------------------------------------------------ commit da44b8af99222ff8761a98ca8c00837a7d607d28 Author: Paul E. McKenney <paulmck@kernel.org> Date: Fri Oct 28 10:38:58 2022 -0700 clocksource: Exponential backoff for load-induced bogus watchdog reads The clocksource watchdog will reject measurements that are excessively delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second watchdog interval. On an extremely busy system, this can result in a console message being printed every two seconds. This is excessively noisy for a non-error condition. Therefore, apply exponential backoff to these messages. This exponential backoff is capped at 1024 times the watchdog interval, which comes to not quite one message per ten minutes. Please note that the bogus watchdog reads that occur when the watchdog interval is less than 0.125 seconds are still printed unconditionally because these likely correspond to a serious error condition in the timer code or hardware. [ paulmck: Apply Feng Tang feedback. ] [ paulmck: Apply Waiman Long feedback. ] Reported-by: Waiman Long <longman@redhat.com> Reported-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Waiman Long <longman@redhat.com> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 1d42d4b173271..23b73f2293d6d 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -125,6 +125,9 @@ struct clocksource { struct list_head wd_list; u64 cs_last; u64 wd_last; + u64 wd_last_bogus; + int wd_bogus_shift; + unsigned long wd_bogus_count; #endif struct module *owner; }; diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 3f5317faf891f..1eefb56505350 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) /* Check for bogus measurements. */ wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); - if (wd_nsec < (wdi >> 2)) { - /* This usually indicates broken timer code or hardware. */ - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { + bool needwarn = false; + u64 wd_lb; + + cs->wd_bogus_count++; + if (!cs->wd_bogus_shift) { + needwarn = true; + } else { + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) + needwarn = true; + } + if (needwarn) { + /* This can happen on busy systems, which can delay the watchdog. */ + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); + cs->wd_last_bogus = wdnow; + if (cs->wd_bogus_shift < 10) + cs->wd_bogus_shift++; + cs->wd_bogus_count = 0; + } continue; } - if (wd_nsec > (wdi << 2)) { - /* This can happen on busy systems, which can delay the watchdog. */ - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); + /* Check too-short measurements second to handle wrap. */ + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { + /* This usually indicates broken timer code or hardware. */ + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); continue; }
On 11/3/22 16:49, Paul E. McKenney wrote: > commit da44b8af99222ff8761a98ca8c00837a7d607d28 > Author: Paul E. McKenney<paulmck@kernel.org> > Date: Fri Oct 28 10:38:58 2022 -0700 > > clocksource: Exponential backoff for load-induced bogus watchdog reads > > The clocksource watchdog will reject measurements that are excessively > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > watchdog interval. On an extremely busy system, this can result in a > console message being printed every two seconds. This is excessively > noisy for a non-error condition. > > Therefore, apply exponential backoff to these messages. This exponential > backoff is capped at 1024 times the watchdog interval, which comes to > not quite one message per ten minutes. > > Please note that the bogus watchdog reads that occur when the watchdog > interval is less than 0.125 seconds are still printed unconditionally > because these likely correspond to a serious error condition in the > timer code or hardware. > > [ paulmck: Apply Feng Tang feedback. ] > [ paulmck: Apply Waiman Long feedback. ] > > Reported-by: Waiman Long<longman@redhat.com> > Reported-by: Feng Tang<feng.tang@intel.com> > Signed-off-by: Paul E. McKenney<paulmck@kernel.org> > Cc: John Stultz<jstultz@google.com> > Cc: Thomas Gleixner<tglx@linutronix.de> > Cc: Stephen Boyd<sboyd@kernel.org> > Cc: Feng Tang<feng.tang@intel.com> > Cc: Waiman Long<longman@redhat.com> > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 1d42d4b173271..23b73f2293d6d 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -125,6 +125,9 @@ struct clocksource { > struct list_head wd_list; > u64 cs_last; > u64 wd_last; > + u64 wd_last_bogus; > + int wd_bogus_shift; > + unsigned long wd_bogus_count; > #endif > struct module *owner; > }; > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index 3f5317faf891f..1eefb56505350 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > - if (wd_nsec < (wdi >> 2)) { > - /* This usually indicates broken timer code or hardware. */ > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > + bool needwarn = false; > + u64 wd_lb; > + > + cs->wd_bogus_count++; > + if (!cs->wd_bogus_shift) { > + needwarn = true; > + } else { > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > + needwarn = true; > + } > + if (needwarn) { > + /* This can happen on busy systems, which can delay the watchdog. */ > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); Just one question, does "%lu additional" means the number of bogus count that doesn't meet the needwarn requirement and hence skipped. If so, I think you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks good to me. Cheers, Longman
On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: > On 11/3/22 16:49, Paul E. McKenney wrote: > > commit da44b8af99222ff8761a98ca8c00837a7d607d28 > > Author: Paul E. McKenney<paulmck@kernel.org> > > Date: Fri Oct 28 10:38:58 2022 -0700 > > > > clocksource: Exponential backoff for load-induced bogus watchdog reads > > The clocksource watchdog will reject measurements that are excessively > > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > > watchdog interval. On an extremely busy system, this can result in a > > console message being printed every two seconds. This is excessively > > noisy for a non-error condition. > > Therefore, apply exponential backoff to these messages. This exponential > > backoff is capped at 1024 times the watchdog interval, which comes to > > not quite one message per ten minutes. > > Please note that the bogus watchdog reads that occur when the watchdog > > interval is less than 0.125 seconds are still printed unconditionally > > because these likely correspond to a serious error condition in the > > timer code or hardware. > > [ paulmck: Apply Feng Tang feedback. ] > > [ paulmck: Apply Waiman Long feedback. ] > > Reported-by: Waiman Long<longman@redhat.com> > > Reported-by: Feng Tang<feng.tang@intel.com> > > Signed-off-by: Paul E. McKenney<paulmck@kernel.org> > > Cc: John Stultz<jstultz@google.com> > > Cc: Thomas Gleixner<tglx@linutronix.de> > > Cc: Stephen Boyd<sboyd@kernel.org> > > Cc: Feng Tang<feng.tang@intel.com> > > Cc: Waiman Long<longman@redhat.com> > > > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > > index 1d42d4b173271..23b73f2293d6d 100644 > > --- a/include/linux/clocksource.h > > +++ b/include/linux/clocksource.h > > @@ -125,6 +125,9 @@ struct clocksource { > > struct list_head wd_list; > > u64 cs_last; > > u64 wd_last; > > + u64 wd_last_bogus; > > + int wd_bogus_shift; > > + unsigned long wd_bogus_count; > > #endif > > struct module *owner; > > }; > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > index 3f5317faf891f..1eefb56505350 100644 > > --- a/kernel/time/clocksource.c > > +++ b/kernel/time/clocksource.c > > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > > - if (wd_nsec < (wdi >> 2)) { > > - /* This usually indicates broken timer code or hardware. */ > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > > + bool needwarn = false; > > + u64 wd_lb; > > + > > + cs->wd_bogus_count++; > > + if (!cs->wd_bogus_shift) { > > + needwarn = true; > > + } else { > > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > > + needwarn = true; > > + } > > + if (needwarn) { > > + /* This can happen on busy systems, which can delay the watchdog. */ > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); > > Just one question, does "%lu additional" means the number of bogus count > that doesn't meet the needwarn requirement and hence skipped. If so, I think > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks > good to me. It means the number since the last report, or, for the first report, the number since boot. Does that work for you? Thanx, Paul
On 11/3/22 20:26, Paul E. McKenney wrote: > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: >> On 11/3/22 16:49, Paul E. McKenney wrote: >>> commit da44b8af99222ff8761a98ca8c00837a7d607d28 >>> Author: Paul E. McKenney<paulmck@kernel.org> >>> Date: Fri Oct 28 10:38:58 2022 -0700 >>> >>> clocksource: Exponential backoff for load-induced bogus watchdog reads >>> The clocksource watchdog will reject measurements that are excessively >>> delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second >>> watchdog interval. On an extremely busy system, this can result in a >>> console message being printed every two seconds. This is excessively >>> noisy for a non-error condition. >>> Therefore, apply exponential backoff to these messages. This exponential >>> backoff is capped at 1024 times the watchdog interval, which comes to >>> not quite one message per ten minutes. >>> Please note that the bogus watchdog reads that occur when the watchdog >>> interval is less than 0.125 seconds are still printed unconditionally >>> because these likely correspond to a serious error condition in the >>> timer code or hardware. >>> [ paulmck: Apply Feng Tang feedback. ] >>> [ paulmck: Apply Waiman Long feedback. ] >>> Reported-by: Waiman Long<longman@redhat.com> >>> Reported-by: Feng Tang<feng.tang@intel.com> >>> Signed-off-by: Paul E. McKenney<paulmck@kernel.org> >>> Cc: John Stultz<jstultz@google.com> >>> Cc: Thomas Gleixner<tglx@linutronix.de> >>> Cc: Stephen Boyd<sboyd@kernel.org> >>> Cc: Feng Tang<feng.tang@intel.com> >>> Cc: Waiman Long<longman@redhat.com> >>> >>> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h >>> index 1d42d4b173271..23b73f2293d6d 100644 >>> --- a/include/linux/clocksource.h >>> +++ b/include/linux/clocksource.h >>> @@ -125,6 +125,9 @@ struct clocksource { >>> struct list_head wd_list; >>> u64 cs_last; >>> u64 wd_last; >>> + u64 wd_last_bogus; >>> + int wd_bogus_shift; >>> + unsigned long wd_bogus_count; >>> #endif >>> struct module *owner; >>> }; >>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c >>> index 3f5317faf891f..1eefb56505350 100644 >>> --- a/kernel/time/clocksource.c >>> +++ b/kernel/time/clocksource.c >>> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) >>> /* Check for bogus measurements. */ >>> wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); >>> - if (wd_nsec < (wdi >> 2)) { >>> - /* This usually indicates broken timer code or hardware. */ >>> - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); >>> + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { >>> + bool needwarn = false; >>> + u64 wd_lb; >>> + >>> + cs->wd_bogus_count++; >>> + if (!cs->wd_bogus_shift) { >>> + needwarn = true; >>> + } else { >>> + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); >>> + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); >>> + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) >>> + needwarn = true; >>> + } >>> + if (needwarn) { >>> + /* This can happen on busy systems, which can delay the watchdog. */ >>> + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); >> Just one question, does "%lu additional" means the number of bogus count >> that doesn't meet the needwarn requirement and hence skipped. If so, I think >> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks >> good to me. > It means the number since the last report, or, for the first report, > the number since boot. > > Does that work for you? OK, I think the word "additional" tricks me into thinking about extra bogus messages in additional to the current one. Using another word like "total" may be less confusing. Cheers, Longman
On Wed, Nov 02, 2022 at 11:40:09AM -0700, Paul E. McKenney wrote: > The clocksource watchdog will reject measurements that are excessively > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > watchdog interval. On an extremely busy system, this can result in a > console message being printed every two seconds. This is excessively > noisy for a non-error condition. > > Therefore, apply exponential backoff to these messages. This exponential > backoff is capped at 1024 times the watchdog interval, which comes to > not quite one message per ten minutes. > > Please note that the bogus watchdog reads that occur when the watchdog > interval is less than 0.125 seconds are still printed unconditionally > because these likely correspond to a serious error condition in the > timer code or hardware. I saw there is ongoing discussion about some wording, overall it looks good to me, thanks! Reviewed-by: Feng Tang <feng.tang@intel.com> > [ paulmck: Apply Feng Tang feedback. ] > > Reported-by: Waiman Long <longman@redhat.com> > Reported-by: Feng Tang <feng.tang@intel.com> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: John Stultz <jstultz@google.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Stephen Boyd <sboyd@kernel.org> > Cc: Feng Tang <feng.tang@intel.com> > Cc: Waiman Long <longman@redhat.com> > --- > include/linux/clocksource.h | 4 ++++ > kernel/time/clocksource.c | 31 +++++++++++++++++++++++++------ > 2 files changed, 29 insertions(+), 6 deletions(-) > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 1d42d4b173271..daac05aedf56a 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -125,6 +125,10 @@ struct clocksource { > struct list_head wd_list; > u64 cs_last; > u64 wd_last; > + u64 wd_last_bogus; > + int wd_bogus_shift; > + unsigned long wd_bogus_count; > + unsigned long wd_bogus_count_last; > #endif > struct module *owner; > }; > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index 3f5317faf891f..de8047b6720f5 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > - if (wd_nsec < (wdi >> 2)) { > - /* This usually indicates broken timer code or hardware. */ > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > + bool needwarn = false; > + u64 wd_lb; > + > + cs->wd_bogus_count++; > + if (!cs->wd_bogus_shift) { > + needwarn = true; > + } else { > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > + needwarn = true; > + } > + if (needwarn) { > + /* This can happen on busy systems, which can delay the watchdog. */ > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last); > + cs->wd_last_bogus = wdnow; > + if (cs->wd_bogus_shift < 10) > + cs->wd_bogus_shift++; > + cs->wd_bogus_count_last = cs->wd_bogus_count; > + } > continue; > } > - if (wd_nsec > (wdi << 2)) { > - /* This can happen on busy systems, which can delay the watchdog. */ > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > + /* Check too-short measurements second to handle wrap. */ > + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { > + /* This usually indicates broken timer code or hardware. */ > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > continue; > } > > -- > 2.31.1.189.g2e36527f23 >
On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote: > On 11/3/22 20:26, Paul E. McKenney wrote: > > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: > > > On 11/3/22 16:49, Paul E. McKenney wrote: > > > > commit da44b8af99222ff8761a98ca8c00837a7d607d28 > > > > Author: Paul E. McKenney<paulmck@kernel.org> > > > > Date: Fri Oct 28 10:38:58 2022 -0700 > > > > > > > > clocksource: Exponential backoff for load-induced bogus watchdog reads > > > > The clocksource watchdog will reject measurements that are excessively > > > > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > > > > watchdog interval. On an extremely busy system, this can result in a > > > > console message being printed every two seconds. This is excessively > > > > noisy for a non-error condition. > > > > Therefore, apply exponential backoff to these messages. This exponential > > > > backoff is capped at 1024 times the watchdog interval, which comes to > > > > not quite one message per ten minutes. > > > > Please note that the bogus watchdog reads that occur when the watchdog > > > > interval is less than 0.125 seconds are still printed unconditionally > > > > because these likely correspond to a serious error condition in the > > > > timer code or hardware. > > > > [ paulmck: Apply Feng Tang feedback. ] > > > > [ paulmck: Apply Waiman Long feedback. ] > > > > Reported-by: Waiman Long<longman@redhat.com> > > > > Reported-by: Feng Tang<feng.tang@intel.com> > > > > Signed-off-by: Paul E. McKenney<paulmck@kernel.org> > > > > Cc: John Stultz<jstultz@google.com> > > > > Cc: Thomas Gleixner<tglx@linutronix.de> > > > > Cc: Stephen Boyd<sboyd@kernel.org> > > > > Cc: Feng Tang<feng.tang@intel.com> > > > > Cc: Waiman Long<longman@redhat.com> > > > > > > > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > > > > index 1d42d4b173271..23b73f2293d6d 100644 > > > > --- a/include/linux/clocksource.h > > > > +++ b/include/linux/clocksource.h > > > > @@ -125,6 +125,9 @@ struct clocksource { > > > > struct list_head wd_list; > > > > u64 cs_last; > > > > u64 wd_last; > > > > + u64 wd_last_bogus; > > > > + int wd_bogus_shift; > > > > + unsigned long wd_bogus_count; > > > > #endif > > > > struct module *owner; > > > > }; > > > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > > > index 3f5317faf891f..1eefb56505350 100644 > > > > --- a/kernel/time/clocksource.c > > > > +++ b/kernel/time/clocksource.c > > > > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > > > /* Check for bogus measurements. */ > > > > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > > > > - if (wd_nsec < (wdi >> 2)) { > > > > - /* This usually indicates broken timer code or hardware. */ > > > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > > > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > > > > + bool needwarn = false; > > > > + u64 wd_lb; > > > > + > > > > + cs->wd_bogus_count++; > > > > + if (!cs->wd_bogus_shift) { > > > > + needwarn = true; > > > > + } else { > > > > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > > > > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > > > > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > > > > + needwarn = true; > > > > + } > > > > + if (needwarn) { > > > > + /* This can happen on busy systems, which can delay the watchdog. */ > > > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); > > > Just one question, does "%lu additional" means the number of bogus count > > > that doesn't meet the needwarn requirement and hence skipped. If so, I think > > > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks > > > good to me. > > It means the number since the last report, or, for the first report, > > the number since boot. > > > > Does that work for you? > > OK, I think the word "additional" tricks me into thinking about extra bogus > messages in additional to the current one. Using another word like "total" > may be less confusing. My concern with "total" is that people might think that the numbers meant the total number of instances since boot. So how about "(9 since last message)" or similar? Thanx, Paul
On Fri, Nov 04, 2022 at 10:16:53AM +0800, Feng Tang wrote: > On Wed, Nov 02, 2022 at 11:40:09AM -0700, Paul E. McKenney wrote: > > The clocksource watchdog will reject measurements that are excessively > > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > > watchdog interval. On an extremely busy system, this can result in a > > console message being printed every two seconds. This is excessively > > noisy for a non-error condition. > > > > Therefore, apply exponential backoff to these messages. This exponential > > backoff is capped at 1024 times the watchdog interval, which comes to > > not quite one message per ten minutes. > > > > Please note that the bogus watchdog reads that occur when the watchdog > > interval is less than 0.125 seconds are still printed unconditionally > > because these likely correspond to a serious error condition in the > > timer code or hardware. > > I saw there is ongoing discussion about some wording, overall it > looks good to me, thanks! > > Reviewed-by: Feng Tang <feng.tang@intel.com> Thank you! I will apply both of your Reviewed-by tags on the next rebase, by which time hopefully Waiman and I come to agreement on wording. ;-) Thanx, Paul > > [ paulmck: Apply Feng Tang feedback. ] > > > > Reported-by: Waiman Long <longman@redhat.com> > > Reported-by: Feng Tang <feng.tang@intel.com> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Cc: John Stultz <jstultz@google.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Stephen Boyd <sboyd@kernel.org> > > Cc: Feng Tang <feng.tang@intel.com> > > Cc: Waiman Long <longman@redhat.com> > > --- > > include/linux/clocksource.h | 4 ++++ > > kernel/time/clocksource.c | 31 +++++++++++++++++++++++++------ > > 2 files changed, 29 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > > index 1d42d4b173271..daac05aedf56a 100644 > > --- a/include/linux/clocksource.h > > +++ b/include/linux/clocksource.h > > @@ -125,6 +125,10 @@ struct clocksource { > > struct list_head wd_list; > > u64 cs_last; > > u64 wd_last; > > + u64 wd_last_bogus; > > + int wd_bogus_shift; > > + unsigned long wd_bogus_count; > > + unsigned long wd_bogus_count_last; > > #endif > > struct module *owner; > > }; > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > index 3f5317faf891f..de8047b6720f5 100644 > > --- a/kernel/time/clocksource.c > > +++ b/kernel/time/clocksource.c > > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > > > /* Check for bogus measurements. */ > > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > > - if (wd_nsec < (wdi >> 2)) { > > - /* This usually indicates broken timer code or hardware. */ > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > > + bool needwarn = false; > > + u64 wd_lb; > > + > > + cs->wd_bogus_count++; > > + if (!cs->wd_bogus_shift) { > > + needwarn = true; > > + } else { > > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > > + needwarn = true; > > + } > > + if (needwarn) { > > + /* This can happen on busy systems, which can delay the watchdog. */ > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last); > > + cs->wd_last_bogus = wdnow; > > + if (cs->wd_bogus_shift < 10) > > + cs->wd_bogus_shift++; > > + cs->wd_bogus_count_last = cs->wd_bogus_count; > > + } > > continue; > > } > > - if (wd_nsec > (wdi << 2)) { > > - /* This can happen on busy systems, which can delay the watchdog. */ > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > + /* Check too-short measurements second to handle wrap. */ > > + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { > > + /* This usually indicates broken timer code or hardware. */ > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > continue; > > } > > > > -- > > 2.31.1.189.g2e36527f23 > >
On 11/3/22 22:23, Paul E. McKenney wrote: > On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote: >> On 11/3/22 20:26, Paul E. McKenney wrote: >>> On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: >>>> On 11/3/22 16:49, Paul E. McKenney wrote: >>>>> commit da44b8af99222ff8761a98ca8c00837a7d607d28 >>>>> Author: Paul E. McKenney<paulmck@kernel.org> >>>>> Date: Fri Oct 28 10:38:58 2022 -0700 >>>>> >>>>> clocksource: Exponential backoff for load-induced bogus watchdog reads >>>>> The clocksource watchdog will reject measurements that are excessively >>>>> delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second >>>>> watchdog interval. On an extremely busy system, this can result in a >>>>> console message being printed every two seconds. This is excessively >>>>> noisy for a non-error condition. >>>>> Therefore, apply exponential backoff to these messages. This exponential >>>>> backoff is capped at 1024 times the watchdog interval, which comes to >>>>> not quite one message per ten minutes. >>>>> Please note that the bogus watchdog reads that occur when the watchdog >>>>> interval is less than 0.125 seconds are still printed unconditionally >>>>> because these likely correspond to a serious error condition in the >>>>> timer code or hardware. >>>>> [ paulmck: Apply Feng Tang feedback. ] >>>>> [ paulmck: Apply Waiman Long feedback. ] >>>>> Reported-by: Waiman Long<longman@redhat.com> >>>>> Reported-by: Feng Tang<feng.tang@intel.com> >>>>> Signed-off-by: Paul E. McKenney<paulmck@kernel.org> >>>>> Cc: John Stultz<jstultz@google.com> >>>>> Cc: Thomas Gleixner<tglx@linutronix.de> >>>>> Cc: Stephen Boyd<sboyd@kernel.org> >>>>> Cc: Feng Tang<feng.tang@intel.com> >>>>> Cc: Waiman Long<longman@redhat.com> >>>>> >>>>> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h >>>>> index 1d42d4b173271..23b73f2293d6d 100644 >>>>> --- a/include/linux/clocksource.h >>>>> +++ b/include/linux/clocksource.h >>>>> @@ -125,6 +125,9 @@ struct clocksource { >>>>> struct list_head wd_list; >>>>> u64 cs_last; >>>>> u64 wd_last; >>>>> + u64 wd_last_bogus; >>>>> + int wd_bogus_shift; >>>>> + unsigned long wd_bogus_count; >>>>> #endif >>>>> struct module *owner; >>>>> }; >>>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c >>>>> index 3f5317faf891f..1eefb56505350 100644 >>>>> --- a/kernel/time/clocksource.c >>>>> +++ b/kernel/time/clocksource.c >>>>> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) >>>>> /* Check for bogus measurements. */ >>>>> wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); >>>>> - if (wd_nsec < (wdi >> 2)) { >>>>> - /* This usually indicates broken timer code or hardware. */ >>>>> - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); >>>>> + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { >>>>> + bool needwarn = false; >>>>> + u64 wd_lb; >>>>> + >>>>> + cs->wd_bogus_count++; >>>>> + if (!cs->wd_bogus_shift) { >>>>> + needwarn = true; >>>>> + } else { >>>>> + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); >>>>> + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); >>>>> + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) >>>>> + needwarn = true; >>>>> + } >>>>> + if (needwarn) { >>>>> + /* This can happen on busy systems, which can delay the watchdog. */ >>>>> + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); >>>> Just one question, does "%lu additional" means the number of bogus count >>>> that doesn't meet the needwarn requirement and hence skipped. If so, I think >>>> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks >>>> good to me. >>> It means the number since the last report, or, for the first report, >>> the number since boot. >>> >>> Does that work for you? >> OK, I think the word "additional" tricks me into thinking about extra bogus >> messages in additional to the current one. Using another word like "total" >> may be less confusing. > My concern with "total" is that people might think that the numbers > meant the total number of instances since boot. > > So how about "(9 since last message)" or similar? > > Thanx, Paul Yes, that looks good to me. Thanks! Longman
On Fri, Nov 04, 2022 at 09:55:02AM -0400, Waiman Long wrote: > On 11/3/22 22:23, Paul E. McKenney wrote: > > On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote: > > > On 11/3/22 20:26, Paul E. McKenney wrote: > > > > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: > > > > > On 11/3/22 16:49, Paul E. McKenney wrote: [ . . . ] > > > > > > + needwarn = true; > > > > > > + } > > > > > > + if (needwarn) { > > > > > > + /* This can happen on busy systems, which can delay the watchdog. */ > > > > > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); > > > > > Just one question, does "%lu additional" means the number of bogus count > > > > > that doesn't meet the needwarn requirement and hence skipped. If so, I think > > > > > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks > > > > > good to me. > > > > It means the number since the last report, or, for the first report, > > > > the number since boot. > > > > > > > > Does that work for you? > > > OK, I think the word "additional" tricks me into thinking about extra bogus > > > messages in additional to the current one. Using another word like "total" > > > may be less confusing. > > My concern with "total" is that people might think that the numbers > > meant the total number of instances since boot. > > > > So how about "(9 since last message)" or similar? > > > > Thanx, Paul > > Yes, that looks good to me. Thank you, and please see below for the updated patch. Thanx, Paul ------------------------------------------------------------------------ commit a7dc308e8359eafb58df360e06b66ecbf79a4d0b Author: Paul E. McKenney <paulmck@kernel.org> Date: Fri Oct 28 10:38:58 2022 -0700 clocksource: Exponential backoff for load-induced bogus watchdog reads The clocksource watchdog will reject measurements that are excessively delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second watchdog interval. On an extremely busy system, this can result in a console message being printed every two seconds. This is excessively noisy for a non-error condition. Therefore, apply exponential backoff to these messages. This exponential backoff is capped at 1024 times the watchdog interval, which comes to not quite one message per ten minutes. Please note that the bogus watchdog reads that occur when the watchdog interval is less than 0.125 seconds are still printed unconditionally because these likely correspond to a serious error condition in the timer code or hardware. [ paulmck: Apply Feng Tang feedback. ] [ paulmck: Apply Waiman Long feedback. ] Reported-by: Waiman Long <longman@redhat.com> Reported-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Feng Tang <feng.tang@intel.com> Cc: John Stultz <jstultz@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Waiman Long <longman@redhat.com> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 1d42d4b173271..23b73f2293d6d 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -125,6 +125,9 @@ struct clocksource { struct list_head wd_list; u64 cs_last; u64 wd_last; + u64 wd_last_bogus; + int wd_bogus_shift; + unsigned long wd_bogus_count; #endif struct module *owner; }; diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 3f5317faf891f..4015ec6503a52 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) /* Check for bogus measurements. */ wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); - if (wd_nsec < (wdi >> 2)) { - /* This usually indicates broken timer code or hardware. */ - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { + bool needwarn = false; + u64 wd_lb; + + cs->wd_bogus_count++; + if (!cs->wd_bogus_shift) { + needwarn = true; + } else { + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) + needwarn = true; + } + if (needwarn) { + /* This can happen on busy systems, which can delay the watchdog. */ + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu since last message), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); + cs->wd_last_bogus = wdnow; + if (cs->wd_bogus_shift < 10) + cs->wd_bogus_shift++; + cs->wd_bogus_count = 0; + } continue; } - if (wd_nsec > (wdi << 2)) { - /* This can happen on busy systems, which can delay the watchdog. */ - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); + /* Check too-short measurements second to handle wrap. */ + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { + /* This usually indicates broken timer code or hardware. */ + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); continue; }
On 11/4/22 22:38, Paul E. McKenney wrote: > On Fri, Nov 04, 2022 at 09:55:02AM -0400, Waiman Long wrote: >> On 11/3/22 22:23, Paul E. McKenney wrote: >>> On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote: >>>> On 11/3/22 20:26, Paul E. McKenney wrote: >>>>> On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: >>>>>> On 11/3/22 16:49, Paul E. McKenney wrote: > [ . . . ] > >>>>>>> + needwarn = true; >>>>>>> + } >>>>>>> + if (needwarn) { >>>>>>> + /* This can happen on busy systems, which can delay the watchdog. */ >>>>>>> + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); >>>>>> Just one question, does "%lu additional" means the number of bogus count >>>>>> that doesn't meet the needwarn requirement and hence skipped. If so, I think >>>>>> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks >>>>>> good to me. >>>>> It means the number since the last report, or, for the first report, >>>>> the number since boot. >>>>> >>>>> Does that work for you? >>>> OK, I think the word "additional" tricks me into thinking about extra bogus >>>> messages in additional to the current one. Using another word like "total" >>>> may be less confusing. >>> My concern with "total" is that people might think that the numbers >>> meant the total number of instances since boot. >>> >>> So how about "(9 since last message)" or similar? >>> >>> Thanx, Paul >> Yes, that looks good to me. > Thank you, and please see below for the updated patch. > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit a7dc308e8359eafb58df360e06b66ecbf79a4d0b > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Fri Oct 28 10:38:58 2022 -0700 > > clocksource: Exponential backoff for load-induced bogus watchdog reads > > The clocksource watchdog will reject measurements that are excessively > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > watchdog interval. On an extremely busy system, this can result in a > console message being printed every two seconds. This is excessively > noisy for a non-error condition. > > Therefore, apply exponential backoff to these messages. This exponential > backoff is capped at 1024 times the watchdog interval, which comes to > not quite one message per ten minutes. > > Please note that the bogus watchdog reads that occur when the watchdog > interval is less than 0.125 seconds are still printed unconditionally > because these likely correspond to a serious error condition in the > timer code or hardware. > > [ paulmck: Apply Feng Tang feedback. ] > [ paulmck: Apply Waiman Long feedback. ] > > Reported-by: Waiman Long <longman@redhat.com> > Reported-by: Feng Tang <feng.tang@intel.com> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Reviewed-by: Feng Tang <feng.tang@intel.com> > Cc: John Stultz <jstultz@google.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Stephen Boyd <sboyd@kernel.org> > Cc: Waiman Long <longman@redhat.com> > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 1d42d4b173271..23b73f2293d6d 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -125,6 +125,9 @@ struct clocksource { > struct list_head wd_list; > u64 cs_last; > u64 wd_last; > + u64 wd_last_bogus; > + int wd_bogus_shift; > + unsigned long wd_bogus_count; > #endif > struct module *owner; > }; > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index 3f5317faf891f..4015ec6503a52 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > - if (wd_nsec < (wdi >> 2)) { > - /* This usually indicates broken timer code or hardware. */ > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > + bool needwarn = false; > + u64 wd_lb; > + > + cs->wd_bogus_count++; > + if (!cs->wd_bogus_shift) { > + needwarn = true; > + } else { > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > + needwarn = true; > + } > + if (needwarn) { > + /* This can happen on busy systems, which can delay the watchdog. */ > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu since last message), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); > + cs->wd_last_bogus = wdnow; > + if (cs->wd_bogus_shift < 10) > + cs->wd_bogus_shift++; > + cs->wd_bogus_count = 0; > + } > continue; > } > - if (wd_nsec > (wdi << 2)) { > - /* This can happen on busy systems, which can delay the watchdog. */ > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > + /* Check too-short measurements second to handle wrap. */ > + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { > + /* This usually indicates broken timer code or hardware. */ > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > continue; > } > > Looks good to me. Reviewed-by: Waiman Long <longman@redhat.com>
On Fri, Nov 04, 2022 at 10:49:13PM -0400, Waiman Long wrote: > > On 11/4/22 22:38, Paul E. McKenney wrote: > > On Fri, Nov 04, 2022 at 09:55:02AM -0400, Waiman Long wrote: > > > On 11/3/22 22:23, Paul E. McKenney wrote: > > > > On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote: > > > > > On 11/3/22 20:26, Paul E. McKenney wrote: > > > > > > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote: > > > > > > > On 11/3/22 16:49, Paul E. McKenney wrote: > > [ . . . ] > > > > > > > > > > + needwarn = true; > > > > > > > > + } > > > > > > > > + if (needwarn) { > > > > > > > > + /* This can happen on busy systems, which can delay the watchdog. */ > > > > > > > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); > > > > > > > Just one question, does "%lu additional" means the number of bogus count > > > > > > > that doesn't meet the needwarn requirement and hence skipped. If so, I think > > > > > > > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks > > > > > > > good to me. > > > > > > It means the number since the last report, or, for the first report, > > > > > > the number since boot. > > > > > > > > > > > > Does that work for you? > > > > > OK, I think the word "additional" tricks me into thinking about extra bogus > > > > > messages in additional to the current one. Using another word like "total" > > > > > may be less confusing. > > > > My concern with "total" is that people might think that the numbers > > > > meant the total number of instances since boot. > > > > > > > > So how about "(9 since last message)" or similar? > > > > > > > > Thanx, Paul > > > Yes, that looks good to me. > > Thank you, and please see below for the updated patch. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit a7dc308e8359eafb58df360e06b66ecbf79a4d0b > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Fri Oct 28 10:38:58 2022 -0700 > > > > clocksource: Exponential backoff for load-induced bogus watchdog reads > > The clocksource watchdog will reject measurements that are excessively > > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second > > watchdog interval. On an extremely busy system, this can result in a > > console message being printed every two seconds. This is excessively > > noisy for a non-error condition. > > Therefore, apply exponential backoff to these messages. This exponential > > backoff is capped at 1024 times the watchdog interval, which comes to > > not quite one message per ten minutes. > > Please note that the bogus watchdog reads that occur when the watchdog > > interval is less than 0.125 seconds are still printed unconditionally > > because these likely correspond to a serious error condition in the > > timer code or hardware. > > [ paulmck: Apply Feng Tang feedback. ] > > [ paulmck: Apply Waiman Long feedback. ] > > Reported-by: Waiman Long <longman@redhat.com> > > Reported-by: Feng Tang <feng.tang@intel.com> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Reviewed-by: Feng Tang <feng.tang@intel.com> > > Cc: John Stultz <jstultz@google.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Stephen Boyd <sboyd@kernel.org> > > Cc: Waiman Long <longman@redhat.com> > > > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > > index 1d42d4b173271..23b73f2293d6d 100644 > > --- a/include/linux/clocksource.h > > +++ b/include/linux/clocksource.h > > @@ -125,6 +125,9 @@ struct clocksource { > > struct list_head wd_list; > > u64 cs_last; > > u64 wd_last; > > + u64 wd_last_bogus; > > + int wd_bogus_shift; > > + unsigned long wd_bogus_count; > > #endif > > struct module *owner; > > }; > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > index 3f5317faf891f..4015ec6503a52 100644 > > --- a/kernel/time/clocksource.c > > +++ b/kernel/time/clocksource.c > > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) > > /* Check for bogus measurements. */ > > wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); > > - if (wd_nsec < (wdi >> 2)) { > > - /* This usually indicates broken timer code or hardware. */ > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { > > + bool needwarn = false; > > + u64 wd_lb; > > + > > + cs->wd_bogus_count++; > > + if (!cs->wd_bogus_shift) { > > + needwarn = true; > > + } else { > > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); > > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); > > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) > > + needwarn = true; > > + } > > + if (needwarn) { > > + /* This can happen on busy systems, which can delay the watchdog. */ > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu since last message), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count); > > + cs->wd_last_bogus = wdnow; > > + if (cs->wd_bogus_shift < 10) > > + cs->wd_bogus_shift++; > > + cs->wd_bogus_count = 0; > > + } > > continue; > > } > > - if (wd_nsec > (wdi << 2)) { > > - /* This can happen on busy systems, which can delay the watchdog. */ > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > + /* Check too-short measurements second to handle wrap. */ > > + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { > > + /* This usually indicates broken timer code or hardware. */ > > + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); > > continue; > > } > > > Looks good to me. > > Reviewed-by: Waiman Long <longman@redhat.com> Thank you! I will apply this on my next rebase. Thanx, Paul
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 1d42d4b173271..daac05aedf56a 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -125,6 +125,10 @@ struct clocksource { struct list_head wd_list; u64 cs_last; u64 wd_last; + u64 wd_last_bogus; + int wd_bogus_shift; + unsigned long wd_bogus_count; + unsigned long wd_bogus_count_last; #endif struct module *owner; }; diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 3f5317faf891f..de8047b6720f5 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused) /* Check for bogus measurements. */ wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL); - if (wd_nsec < (wdi >> 2)) { - /* This usually indicates broken timer code or hardware. */ - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) { + bool needwarn = false; + u64 wd_lb; + + cs->wd_bogus_count++; + if (!cs->wd_bogus_shift) { + needwarn = true; + } else { + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask); + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift); + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb) + needwarn = true; + } + if (needwarn) { + /* This can happen on busy systems, which can delay the watchdog. */ + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last); + cs->wd_last_bogus = wdnow; + if (cs->wd_bogus_shift < 10) + cs->wd_bogus_shift++; + cs->wd_bogus_count_last = cs->wd_bogus_count; + } continue; } - if (wd_nsec > (wdi << 2)) { - /* This can happen on busy systems, which can delay the watchdog. */ - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); + /* Check too-short measurements second to handle wrap. */ + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) { + /* This usually indicates broken timer code or hardware. */ + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL); continue; }