[v5,02/18] watchdog/perf: More properly prevent false positives with turbo modes

Message ID 20230519101840.v5.2.I843b0d1de3e096ba111a179f3adb16d576bef5c7@changeid
State New
Headers
Series watchdog/hardlockup: Add the buddy hardlockup detector |

Commit Message

Doug Anderson May 19, 2023, 5:18 p.m. UTC
  Currently, in the watchdog_overflow_callback() we first check to see
if the watchdog had been touched and _then_ we handle the workaround
for turbo mode. This order should be reversed.

Specifically, "touching" the hardlockup detector's watchdog should
avoid lockups being detected for one period that should be roughly the
same regardless of whether we're running turbo or not. That means that
we should do the extra accounting for turbo _before_ we look at (and
clear) the global indicating that we've been touched.

NOTE: this fix is made based on code inspection. I am not aware of any
reports where the old code would have generated false positives. That
being said, this order seems more correct and also makes it easier
down the line to share code with the "buddy" hardlockup detector.

Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- ("More properly prevent false ...") promoted to its own patch for v5.

 kernel/watchdog_hld.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Petr Mladek May 23, 2023, 9:35 a.m. UTC | #1
On Fri 2023-05-19 10:18:26, Douglas Anderson wrote:
> Currently, in the watchdog_overflow_callback() we first check to see
> if the watchdog had been touched and _then_ we handle the workaround
> for turbo mode. This order should be reversed.
> 
> Specifically, "touching" the hardlockup detector's watchdog should
> avoid lockups being detected for one period that should be roughly the
> same regardless of whether we're running turbo or not. That means that
> we should do the extra accounting for turbo _before_ we look at (and
> clear) the global indicating that we've been touched.

The ideal solution would be to reset the turbo-mode-related
variables when the watchdog is touched. And keep checking
watchdog_nmi_touch first.

But this ordering change should be good enough. It causes that
we always check watchdog_nmi_touch when the turbo-more-related
variables are already reset.

> NOTE: this fix is made based on code inspection. I am not aware of any
> reports where the old code would have generated false positives. That
> being said, this order seems more correct and also makes it easier
> down the line to share code with the "buddy" hardlockup detector.
> 
> Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr
  

Patch

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..1e8a49dc956e 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -114,14 +114,14 @@  static void watchdog_overflow_callback(struct perf_event *event,
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
+	if (!watchdog_check_timestamp())
+		return;
+
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;
 	}
 
-	if (!watchdog_check_timestamp())
-		return;
-
 	/* check for a hardlockup
 	 * This is done by making sure our timer interrupt
 	 * is incrementing.  The timer interrupt should have