From patchwork Mon Feb 13 18:48:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner X-Patchwork-Id: 56447 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2513300wrn; Mon, 13 Feb 2023 10:53:52 -0800 (PST) X-Google-Smtp-Source: AK7set+1XdRJQ3TmBLqMV2Pz6e3OkXfxvmFgBt6DsTQCWJRecyuwTEdjY8Aj8N7gH45b4pdlMpPn X-Received: by 2002:a17:907:20da:b0:8ab:a378:5f96 with SMTP id qq26-20020a17090720da00b008aba3785f96mr18179763ejb.3.1676314432434; Mon, 13 Feb 2023 10:53:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676314432; cv=none; d=google.com; s=arc-20160816; b=WvP9jNc1zpZm/lqv9QfoiEPkduGC+Mddr3xEOOeYLIs4Qj8W0P5KigUBcufgn8rBsy 1k8LBGm9bVXhdqqxxgiMpawS3eA74ylqkYuFurjTyFX19QjUsCevnnfQgcBXf/3om4LR yuVsd6gRRDlfhg58H/eFYLlHIh3lr4rYzjMIAZPN8wQrG3QJVpvfBYtSOexd+yPUjLQU T9jiZdKXlPljFREVyPPcojcZqpMzVpZnFkDyledWgwWTALjCmAs9H6AUOLYXdZfxKso2 wAvXOJhNlx4UByA3T7jvGCloB5pljrlKbEp4TA0UAxzgofMvTiRIHUK0BIqfuM8gTDCb lSpA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:robot-unsubscribe :robot-id:message-id:mime-version:cc:subject:to:reply-to:sender:from :dkim-signature:dkim-signature:date; bh=dHNwsm3yjCynJiuBVDd4PecaPqNLpHOjRsyZeHKr3qY=; b=mlEV3bhXDC1GTgULNEaXOdapM81evie6psOUaMTdfsx0uBMR74yfUiWCuvZAsMDDKm QIOgIBLIRU7eY7QsXJu1phc1kP0FcQQXWBs5RaQ3upjKF6lfEjAOvivYsCsH6ftKGBBv 2w0c9uVRrwNJq6lptbwKUeQNvH+/RtE6GfYY49+FOrW95RX5LK1jJVkK4T2nt8Bq0QZP EKEsXMCUq9fGnS1au76JV6+FLxWOipdRe0fuBT8qLSRRCdU2nQZMXtNZTGD30OL8eWq0 vOUdNCkpkkFDp2Tet1qz3by2H8GdyARvLkbBXusFLbkw/qJ51EOYCeeEF6XVi166Veak SP3Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=pmO8TXtc; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=4AZpTZsT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 1-20020a170906100100b008b12654c973si717345ejm.269.2023.02.13.10.53.29; Mon, 13 Feb 2023 10:53:52 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=pmO8TXtc; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=4AZpTZsT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229946AbjBMSsx (ORCPT + 99 others); Mon, 13 Feb 2023 13:48:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229785AbjBMSsg (ORCPT ); Mon, 13 Feb 2023 13:48:36 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87E861CAF7; Mon, 13 Feb 2023 10:48:34 -0800 (PST) Date: Mon, 13 Feb 2023 18:48:32 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1676314113; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=dHNwsm3yjCynJiuBVDd4PecaPqNLpHOjRsyZeHKr3qY=; b=pmO8TXtcqUB4dMLm7Sydu/bUb+k99sw4nS6k3ZFSGA6qUWaegXw2S6fIHKA/ER37MKBkHb SzcpguwcdjqK9uZ46lzMSHk6RX/t7HNdq3lymr8LJBLlpxCEnhD2rHS6hkLO8Ci6GWr6mH gS0qGzCs0kAiFMIPOoKWCNsfV4Q7jaH/lEk/z1ENhHyFzumD37Vw2K/jkIDJF+9azBQJ4f vhxfJV5PJyWNEdHCIk0XGyX66xxFUkWlcgJXKC1j7RTJwhrlg0w2kbQcAhSMtSXEChj3+A 7tWIhMFp28x4CYbuOCtx3Axucf5UB7b6gfup1zcyZ8RqF4ijUVuVC+FeNhuXUg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1676314113; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=dHNwsm3yjCynJiuBVDd4PecaPqNLpHOjRsyZeHKr3qY=; b=4AZpTZsTzJkzLnqTbYHBIOZ9ylCcYLficUw7uwbh0t9UZE8NesAy+xa+nk1qoKGWs1lSBb Hx83R7GDqKYEt0Bg== From: "tip-bot2 for Feng Tang" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: timers/core] clocksource: Suspend the watchdog temporarily when high read latency detected Cc: Feng Tang , Waiman Long , John Stultz , Thomas Gleixner , Stephen Boyd , "Paul E. McKenney" , x86@kernel.org, linux-kernel@vger.kernel.org MIME-Version: 1.0 Message-ID: <167631411278.4906.130996180925938138.tip-bot2@tip-bot2> Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757743082201607372?= X-GMAIL-MSGID: =?utf-8?q?1757743082201607372?= The following commit has been merged into the timers/core branch of tip: Commit-ID: b7082cdfc464bf9231300605d03eebf943dda307 Gitweb: https://git.kernel.org/tip/b7082cdfc464bf9231300605d03eebf943dda307 Author: Feng Tang AuthorDate: Tue, 20 Dec 2022 16:25:12 +08:00 Committer: Paul E. McKenney CommitterDate: Tue, 24 Jan 2023 15:12:48 -08:00 clocksource: Suspend the watchdog temporarily when high read latency detected Bugs have been reported on 8 sockets x86 machines in which the TSC was wrongly disabled when the system is under heavy workload. [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped! [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped! [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998) [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564. [ 821.067990] clocksource: Switched to clocksource hpet This can be reproduced by running memory intensive 'stream' tests, or some of the stress-ng subcases such as 'ioport'. The reason for these issues is the when system is under heavy load, the read latency of the clocksources can be very high. Even lightweight TSC reads can show high latencies, and latencies are much worse for external clocksources such as HPET or the APIC PM timer. These latencies can result in false-positive clocksource-unstable determinations. These issues were initially reported by a customer running on a production system, and this problem was reproduced on several generations of Xeon servers, especially when running the stress-ng test. These Xeon servers were not production systems, but they did have the latest steppings and firmware. Given that the clocksource watchdog is a continual diagnostic check with frequency of twice a second, there is no need to rush it when the system is under heavy load. Therefore, when high clocksource read latencies are detected, suspend the watchdog timer for 5 minutes. Signed-off-by: Feng Tang Acked-by: Waiman Long Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Feng Tang Signed-off-by: Paul E. McKenney --- kernel/time/clocksource.c | 45 +++++++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index fc486cd..91836b7 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -387,6 +387,15 @@ void clocksource_verify_percpu(struct clocksource *cs) } EXPORT_SYMBOL_GPL(clocksource_verify_percpu); +static inline void clocksource_reset_watchdog(void) +{ + struct clocksource *cs; + + list_for_each_entry(cs, &watchdog_list, wd_list) + cs->flags &= ~CLOCK_SOURCE_WATCHDOG; +} + + static void clocksource_watchdog(struct timer_list *unused) { u64 csnow, wdnow, cslast, wdlast, delta; @@ -394,6 +403,7 @@ static void clocksource_watchdog(struct timer_list *unused) int64_t wd_nsec, cs_nsec; struct clocksource *cs; enum wd_read_status read_ret; + unsigned long extra_wait = 0; u32 md; spin_lock(&watchdog_lock); @@ -413,13 +423,30 @@ static void clocksource_watchdog(struct timer_list *unused) read_ret = cs_watchdog_read(cs, &csnow, &wdnow); - if (read_ret != WD_READ_SUCCESS) { - if (read_ret == WD_READ_UNSTABLE) - /* Clock readout unreliable, so give it up. */ - __clocksource_unstable(cs); + if (read_ret == WD_READ_UNSTABLE) { + /* Clock readout unreliable, so give it up. */ + __clocksource_unstable(cs); continue; } + /* + * When WD_READ_SKIP is returned, it means the system is likely + * under very heavy load, where the latency of reading + * watchdog/clocksource is very big, and affect the accuracy of + * watchdog check. So give system some space and suspend the + * watchdog check for 5 minutes. + */ + if (read_ret == WD_READ_SKIP) { + /* + * As the watchdog timer will be suspended, and + * cs->last could keep unchanged for 5 minutes, reset + * the counters. + */ + clocksource_reset_watchdog(); + extra_wait = HZ * 300; + break; + } + /* Clocksource initialized ? */ if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) || atomic_read(&watchdog_reset_pending)) { @@ -523,7 +550,7 @@ static void clocksource_watchdog(struct timer_list *unused) * pair clocksource_stop_watchdog() clocksource_start_watchdog(). */ if (!timer_pending(&watchdog_timer)) { - watchdog_timer.expires += WATCHDOG_INTERVAL; + watchdog_timer.expires += WATCHDOG_INTERVAL + extra_wait; add_timer_on(&watchdog_timer, next_cpu); } out: @@ -548,14 +575,6 @@ static inline void clocksource_stop_watchdog(void) watchdog_running = 0; } -static inline void clocksource_reset_watchdog(void) -{ - struct clocksource *cs; - - list_for_each_entry(cs, &watchdog_list, wd_list) - cs->flags &= ~CLOCK_SOURCE_WATCHDOG; -} - static void clocksource_resume_watchdog(void) { atomic_inc(&watchdog_reset_pending);