Message ID | 20230125002730.1471349-6-paulmck@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp12411wrn; Tue, 24 Jan 2023 16:30:26 -0800 (PST) X-Google-Smtp-Source: AMrXdXt8f8EPtpB1QHjR1weQKb3s/LfSeQr3hpU4qVr6o8/6fZ4JUiq7YJ7e3BkWUGqKCYViY/59 X-Received: by 2002:a17:907:2129:b0:86f:ed46:c07f with SMTP id qo9-20020a170907212900b0086fed46c07fmr30587913ejb.75.1674606626396; Tue, 24 Jan 2023 16:30:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674606626; cv=none; d=google.com; s=arc-20160816; b=l7kqnGkkyhMhqKQ4lGlfv0Np3WXlP8shOt2NJ0UwbdcIUDVrKeTyHxDkJKAy+HrKGX roXyipW/yDXcdasATM1OGSCsa3fg/nCGRa8T5f2ofvjTISPi+5Se99Q31+M6FmCtKJiR 9GI5/658ls85/hPI/ywkm+2wQanxPElZrso0SY/kgS7beJOip3KpsPedyMjPZo2IEvZB zYm5GCsy/KICNbfbhSXtLPCFm4+uWzs1qTavoVkajhg6Ttr1DiN73eSMvfOzu6pmnUph 33eL8nLIycTnwHCnJ17R8Piw3GNun4TMAVKZple0UXWcVZCcq4Eyx97RRK8P/Bsuhlck ysHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=25mF64haFbH/+4jnakI7eErLPLKssRORE9tlub0E9JE=; b=OOrv++tzcZu5OYgm1Ob61fpvMAzS0mqXrjlXIp3srZNhVH6vaLgRuYwoNqYaGKGtX6 /Js7lkJgL8uhOxCLMwDVl5UVB03Kl9W8ulYeEKRqg+ciDpOTJhwmJ4bInnZ3GLpgEzVV x1qnCZ/7QlfGqkpSYNNz7Bk+DsvT0Xfh2FER1CVBatlOcAE0/wUl0TYO7nkhQ7XxeFIV q59AsLLI2TxmVTPzWHmqPGLNmMmUM56cGYevFK70ucl3mhagZlgD9yQjB/1Ytt/4XwKy 8RP0fob5gzy5ARUGZuwzSgU3h4siUzxNPdhOip27KBLa/19eKkOj9YMhMiRnblygkbzv ysUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=P890L5dM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gj38-20020a170907742600b00877da429e45si3599851ejc.922.2023.01.24.16.29.58; Tue, 24 Jan 2023 16:30:26 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=P890L5dM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234870AbjAYA2p (ORCPT <rfc822;rust.linux@gmail.com> + 99 others); Tue, 24 Jan 2023 19:28:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41114 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234845AbjAYA2m (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 24 Jan 2023 19:28:42 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76E5D65AF for <linux-kernel@vger.kernel.org>; Tue, 24 Jan 2023 16:28:11 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 089CF61425 for <linux-kernel@vger.kernel.org>; Wed, 25 Jan 2023 00:27:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 20CB2C43321; Wed, 25 Jan 2023 00:27:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606453; bh=2niA6R2oYPuWrzIU0B6wVlNq4jSpbdluB2kfj2HTDEE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=P890L5dMPd0z14y217P4NJAsrRGhnNTPWJb+uAeG8aIx0xrLSURfo8KclMKBjrr/j QVpNEjrbBNAWCWVdYlzAXYuyhgIZfGFxc+VZ7iLzq+DDd/ty5W/VEE9aR3TGBwy3Eo TG2lF20+d1MZzl3woOiq2XdeWEo6b3P9ZzJJt7qrfMRvL2BnvkRHToVIZHpizGtF9S rgmcbOO7HCBGo4p7/AWWMq39sHof8LIUrc6OIw0LMExGdS8BDKMchdWlOmTU6QWUH3 RNgxElyg24b5zVZwYzBOvMfj2tGgxSmqPyjPDM0KYnRtYiFHI1mhuq2E/Mh88XW80N c5h9N7Nc69zBw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 7B5EF5C1CF4; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. McKenney" <paulmck@kernel.org> To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, "Paul E. McKenney" <paulmck@kernel.org>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Daniel Lezcano <daniel.lezcano@linaro.org>, Waiman Long <longman@redhat.com>, x86@kernel.org Subject: [PATCH v2 clocksource 6/7] clocksource: Verify HPET and PMTMR when TSC unverified Date: Tue, 24 Jan 2023 16:27:29 -0800 Message-Id: <20230125002730.1471349-6-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755952317852444720?= X-GMAIL-MSGID: =?utf-8?q?1755952317852444720?= |
Series |
Clocksource watchdog updates for v6.3
|
|
Commit Message
Paul E. McKenney
Jan. 25, 2023, 12:27 a.m. UTC
On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the TSC is disabled. This works well much of the time, but there is the occasional production-level system that meets all of these criteria, but which still has a TSC that skews significantly from atomic-clock time. This is usually attributed to a firmware or hardware fault. Yes, the various NTP daemons do express their opinions of userspace-to-atomic-clock time skew, but they put them in various places, depending on the daemon and distro in question. It would therefore be good for the kernel to have some clue that there is a problem. The old behavior of marking the TSC unstable is a non-starter because a great many workloads simply cannot tolerate the overheads and latencies of the various non-TSC clocksources. In addition, NTP-corrected systems sometimes can tolerate significant kernel-space time skew as long as the userspace time sources are within epsilon of atomic-clock time. Therefore, when watchdog verification of TSC is disabled, enable it for HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel time-skew diagnostic without degrading the system's performance. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Waiman Long <longman@redhat.com> Cc: <x86@kernel.org> Tested-by: Feng Tang <feng.tang@intel.com> --- arch/x86/include/asm/time.h | 1 + arch/x86/kernel/hpet.c | 2 ++ arch/x86/kernel/tsc.c | 5 +++++ drivers/clocksource/acpi_pm.c | 6 ++++-- 4 files changed, 12 insertions(+), 2 deletions(-)
Comments
Hi Thomas, are you ok with this patch ? Shall I pick it ? Thanks -- Daniel On 25/01/2023 01:27, Paul E. McKenney wrote: > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > TSC is disabled. This works well much of the time, but there is the > occasional production-level system that meets all of these criteria, but > which still has a TSC that skews significantly from atomic-clock time. > This is usually attributed to a firmware or hardware fault. Yes, the > various NTP daemons do express their opinions of userspace-to-atomic-clock > time skew, but they put them in various places, depending on the daemon > and distro in question. It would therefore be good for the kernel to > have some clue that there is a problem. > > The old behavior of marking the TSC unstable is a non-starter because a > great many workloads simply cannot tolerate the overheads and latencies > of the various non-TSC clocksources. In addition, NTP-corrected systems > sometimes can tolerate significant kernel-space time skew as long as > the userspace time sources are within epsilon of atomic-clock time. > > Therefore, when watchdog verification of TSC is disabled, enable it for > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > time-skew diagnostic without degrading the system's performance. > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Daniel Lezcano <daniel.lezcano@linaro.org> > Cc: Waiman Long <longman@redhat.com> > Cc: <x86@kernel.org> > Tested-by: Feng Tang <feng.tang@intel.com> > --- > arch/x86/include/asm/time.h | 1 + > arch/x86/kernel/hpet.c | 2 ++ > arch/x86/kernel/tsc.c | 5 +++++ > drivers/clocksource/acpi_pm.c | 6 ++++-- > 4 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h > index 8ac563abb567b..a53961c64a567 100644 > --- a/arch/x86/include/asm/time.h > +++ b/arch/x86/include/asm/time.h > @@ -8,6 +8,7 @@ > extern void hpet_time_init(void); > extern void time_init(void); > extern bool pit_timer_init(void); > +extern bool tsc_clocksource_watchdog_disabled(void); > > extern struct clock_event_device *global_clock_event; > > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c > index 71f336425e58a..c8eb1ac5125ab 100644 > --- a/arch/x86/kernel/hpet.c > +++ b/arch/x86/kernel/hpet.c > @@ -1091,6 +1091,8 @@ int __init hpet_enable(void) > if (!hpet_counting()) > goto out_nohpet; > > + if (tsc_clocksource_watchdog_disabled()) > + clocksource_hpet.flags |= CLOCK_SOURCE_MUST_VERIFY; > clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq); > > if (id & HPET_ID_LEGSUP) { > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > index a78e73da4a74b..af3782fb6200c 100644 > --- a/arch/x86/kernel/tsc.c > +++ b/arch/x86/kernel/tsc.c > @@ -1186,6 +1186,11 @@ static void __init tsc_disable_clocksource_watchdog(void) > clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY; > } > > +bool tsc_clocksource_watchdog_disabled(void) > +{ > + return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY); > +} > + > static void __init check_system_tsc_reliable(void) > { > #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC) > diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c > index 279ddff81ab49..82338773602ca 100644 > --- a/drivers/clocksource/acpi_pm.c > +++ b/drivers/clocksource/acpi_pm.c > @@ -23,6 +23,7 @@ > #include <linux/pci.h> > #include <linux/delay.h> > #include <asm/io.h> > +#include <asm/time.h> > > /* > * The I/O port the PMTMR resides at. > @@ -210,8 +211,9 @@ static int __init init_acpi_pm_clocksource(void) > return -ENODEV; > } > > - return clocksource_register_hz(&clocksource_acpi_pm, > - PMTMR_TICKS_PER_SEC); > + if (tsc_clocksource_watchdog_disabled()) > + clocksource_acpi_pm.flags |= CLOCK_SOURCE_MUST_VERIFY; > + return clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC); > } > > /* We use fs_initcall because we want the PCI fixups to have run
On Thu, Jan 26, 2023 at 11:57:41AM +0100, Daniel Lezcano wrote: > > Hi Thomas, > > are you ok with this patch ? Shall I pick it ? Seeing no response, I sent a pull request. If it would be helpful for me to make the pilgrimmage to (say) Intel Jones Farm, I can easily make that trip. Thanx, Paul > Thanks > > -- Daniel > > > On 25/01/2023 01:27, Paul E. McKenney wrote: > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > > TSC is disabled. This works well much of the time, but there is the > > occasional production-level system that meets all of these criteria, but > > which still has a TSC that skews significantly from atomic-clock time. > > This is usually attributed to a firmware or hardware fault. Yes, the > > various NTP daemons do express their opinions of userspace-to-atomic-clock > > time skew, but they put them in various places, depending on the daemon > > and distro in question. It would therefore be good for the kernel to > > have some clue that there is a problem. > > > > The old behavior of marking the TSC unstable is a non-starter because a > > great many workloads simply cannot tolerate the overheads and latencies > > of the various non-TSC clocksources. In addition, NTP-corrected systems > > sometimes can tolerate significant kernel-space time skew as long as > > the userspace time sources are within epsilon of atomic-clock time. > > > > Therefore, when watchdog verification of TSC is disabled, enable it for > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > > time-skew diagnostic without degrading the system's performance. > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: Borislav Petkov <bp@alien8.de> > > Cc: Dave Hansen <dave.hansen@linux.intel.com> > > Cc: "H. Peter Anvin" <hpa@zytor.com> > > Cc: Daniel Lezcano <daniel.lezcano@linaro.org> > > Cc: Waiman Long <longman@redhat.com> > > Cc: <x86@kernel.org> > > Tested-by: Feng Tang <feng.tang@intel.com> > > --- > > arch/x86/include/asm/time.h | 1 + > > arch/x86/kernel/hpet.c | 2 ++ > > arch/x86/kernel/tsc.c | 5 +++++ > > drivers/clocksource/acpi_pm.c | 6 ++++-- > > 4 files changed, 12 insertions(+), 2 deletions(-) > > > > diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h > > index 8ac563abb567b..a53961c64a567 100644 > > --- a/arch/x86/include/asm/time.h > > +++ b/arch/x86/include/asm/time.h > > @@ -8,6 +8,7 @@ > > extern void hpet_time_init(void); > > extern void time_init(void); > > extern bool pit_timer_init(void); > > +extern bool tsc_clocksource_watchdog_disabled(void); > > extern struct clock_event_device *global_clock_event; > > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c > > index 71f336425e58a..c8eb1ac5125ab 100644 > > --- a/arch/x86/kernel/hpet.c > > +++ b/arch/x86/kernel/hpet.c > > @@ -1091,6 +1091,8 @@ int __init hpet_enable(void) > > if (!hpet_counting()) > > goto out_nohpet; > > + if (tsc_clocksource_watchdog_disabled()) > > + clocksource_hpet.flags |= CLOCK_SOURCE_MUST_VERIFY; > > clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq); > > if (id & HPET_ID_LEGSUP) { > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > > index a78e73da4a74b..af3782fb6200c 100644 > > --- a/arch/x86/kernel/tsc.c > > +++ b/arch/x86/kernel/tsc.c > > @@ -1186,6 +1186,11 @@ static void __init tsc_disable_clocksource_watchdog(void) > > clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY; > > } > > +bool tsc_clocksource_watchdog_disabled(void) > > +{ > > + return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY); > > +} > > + > > static void __init check_system_tsc_reliable(void) > > { > > #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC) > > diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c > > index 279ddff81ab49..82338773602ca 100644 > > --- a/drivers/clocksource/acpi_pm.c > > +++ b/drivers/clocksource/acpi_pm.c > > @@ -23,6 +23,7 @@ > > #include <linux/pci.h> > > #include <linux/delay.h> > > #include <asm/io.h> > > +#include <asm/time.h> > > /* > > * The I/O port the PMTMR resides at. > > @@ -210,8 +211,9 @@ static int __init init_acpi_pm_clocksource(void) > > return -ENODEV; > > } > > - return clocksource_register_hz(&clocksource_acpi_pm, > > - PMTMR_TICKS_PER_SEC); > > + if (tsc_clocksource_watchdog_disabled()) > > + clocksource_acpi_pm.flags |= CLOCK_SOURCE_MUST_VERIFY; > > + return clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC); > > } > > /* We use fs_initcall because we want the PCI fixups to have run > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog >
Paul! On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > TSC is disabled. This works well much of the time, but there is the > occasional production-level system that meets all of these criteria, but > which still has a TSC that skews significantly from atomic-clock time. > This is usually attributed to a firmware or hardware fault. Yes, the > various NTP daemons do express their opinions of userspace-to-atomic-clock > time skew, but they put them in various places, depending on the daemon > and distro in question. It would therefore be good for the kernel to > have some clue that there is a problem. > > The old behavior of marking the TSC unstable is a non-starter because a > great many workloads simply cannot tolerate the overheads and latencies > of the various non-TSC clocksources. In addition, NTP-corrected systems > sometimes can tolerate significant kernel-space time skew as long as > the userspace time sources are within epsilon of atomic-clock time. > > Therefore, when watchdog verification of TSC is disabled, enable it for > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > time-skew diagnostic without degrading the system's performance. I'm more than unhappy about this. We finally have a point where the TSC watchdog overhead can go away without adding TSC=reliable to the kernel commandline. Now you add an unconditionally enforce the watchdog again in a way which even cannot be disabled on the kernel command line. Patently bad idea, no cookies for you! Thanks, tglx
Hi Thomas, On Wed, Feb 01, 2023 at 11:24:14AM +0100, Thomas Gleixner wrote: > Paul! > > On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > > TSC is disabled. This works well much of the time, but there is the > > occasional production-level system that meets all of these criteria, but > > which still has a TSC that skews significantly from atomic-clock time. > > This is usually attributed to a firmware or hardware fault. Yes, the > > various NTP daemons do express their opinions of userspace-to-atomic-clock > > time skew, but they put them in various places, depending on the daemon > > and distro in question. It would therefore be good for the kernel to > > have some clue that there is a problem. > > > > The old behavior of marking the TSC unstable is a non-starter because a > > great many workloads simply cannot tolerate the overheads and latencies > > of the various non-TSC clocksources. In addition, NTP-corrected systems > > sometimes can tolerate significant kernel-space time skew as long as > > the userspace time sources are within epsilon of atomic-clock time. > > > > Therefore, when watchdog verification of TSC is disabled, enable it for > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > > time-skew diagnostic without degrading the system's performance. > > I'm more than unhappy about this. We finally have a point where the TSC > watchdog overhead can go away without adding TSC=reliable to the kernel > commandline. > > Now you add an unconditionally enforce the watchdog again in a way which > even cannot be disabled on the kernel command line. Yes, this is a valid concern. Waiman, Paul and I discussed this and had some proposal to handle this side effect, like only watchdoging HPET/ACPI-PM timer for a short period of time in this case. https://lore.kernel.org/lkml/20221227183819.GI4001@paulmck-ThinkPad-P17-Gen-1/ My bad that I didn't follow up as my proposed code looked ugly as bringing more complexsities. Does the idea of setting a watchdog time limit sound fine to you? Thanks, Feng > Patently bad idea, no cookies for you! > > Thanks, > > tglx
On 2/1/23 05:24, Thomas Gleixner wrote: > Paul! > > On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: >> On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, >> NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the >> TSC is disabled. This works well much of the time, but there is the >> occasional production-level system that meets all of these criteria, but >> which still has a TSC that skews significantly from atomic-clock time. >> This is usually attributed to a firmware or hardware fault. Yes, the >> various NTP daemons do express their opinions of userspace-to-atomic-clock >> time skew, but they put them in various places, depending on the daemon >> and distro in question. It would therefore be good for the kernel to >> have some clue that there is a problem. >> >> The old behavior of marking the TSC unstable is a non-starter because a >> great many workloads simply cannot tolerate the overheads and latencies >> of the various non-TSC clocksources. In addition, NTP-corrected systems >> sometimes can tolerate significant kernel-space time skew as long as >> the userspace time sources are within epsilon of atomic-clock time. >> >> Therefore, when watchdog verification of TSC is disabled, enable it for >> HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel >> time-skew diagnostic without degrading the system's performance. > I'm more than unhappy about this. We finally have a point where the TSC > watchdog overhead can go away without adding TSC=reliable to the kernel > commandline. > > Now you add an unconditionally enforce the watchdog again in a way which > even cannot be disabled on the kernel command line. > > Patently bad idea, no cookies for you! I have a similar concern about this patch as well. That is why I was suggesting to have this enabled for a limited time after boot for sanity checking purpose only. The previous "[PATCH clocksource 5/6] clocksource: Suspend the watchdog temporarily when high read latency detected" patch, however, should be fine. Right? Cheers, Longman
On Wed, Feb 01, 2023 at 11:24:14AM +0100, Thomas Gleixner wrote: > Paul! > > On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > > TSC is disabled. This works well much of the time, but there is the > > occasional production-level system that meets all of these criteria, but > > which still has a TSC that skews significantly from atomic-clock time. > > This is usually attributed to a firmware or hardware fault. Yes, the > > various NTP daemons do express their opinions of userspace-to-atomic-clock > > time skew, but they put them in various places, depending on the daemon > > and distro in question. It would therefore be good for the kernel to > > have some clue that there is a problem. > > > > The old behavior of marking the TSC unstable is a non-starter because a > > great many workloads simply cannot tolerate the overheads and latencies > > of the various non-TSC clocksources. In addition, NTP-corrected systems > > sometimes can tolerate significant kernel-space time skew as long as > > the userspace time sources are within epsilon of atomic-clock time. > > > > Therefore, when watchdog verification of TSC is disabled, enable it for > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > > time-skew diagnostic without degrading the system's performance. > > I'm more than unhappy about this. We finally have a point where the TSC > watchdog overhead can go away without adding TSC=reliable to the kernel > commandline. > > Now you add an unconditionally enforce the watchdog again in a way which > even cannot be disabled on the kernel command line. > > Patently bad idea, no cookies for you! What can I say? 40,000 parts per million TSC clock skew did raise some eyebrows, and therefore the complete suppressing of that diagnostic completely was not at all a welcome development. So how about the (untested) patch below, either on top of the existing series or folded into e57818b20b0b ("clocksource: Verify HPET and PMTMR when TSC unverified")? The idea is to provide TSC checking of HPET and PMTMR only when the TSC is deemed reliable and the new tsc=watchdog kernel boot parameter is provided. If both tsc=watchdog and tsc=nowatchdog are provided, tsc=watchdog wins and a console message is emitted (no splat). To restate, with this patch, unless the sysadmin asks for it, there will be no clocksource watchdog unless there also would have been one without this patch. Thoughts? Thanx, Paul ------------------------------------------------------------------------ diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 115829d71d0ca..d681f9252aaa7 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6382,6 +6382,12 @@ (HPET or PM timer) on systems whose TSC frequency was obtained from HW or FW using either an MSR or CPUID(0x15). Warn if the difference is more than 500 ppm. + [x86] watchdog: Use TSC as the watchdog clocksource with + which to check other HW timers (HPET or PM timer), but + only on systems where TSC has been deemed trustworthy. + This will be suppressed by an earlier tsc=nowatchdog and + can be overridden by a later tsc=nowatchdog. A console + message will flag any such suppression or overriding. tsc_early_khz= [X86] Skip early TSC calibration and use the given value instead. Useful when the early TSC frequency discovery diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a5371c6d4b64b..306c233c98d84 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -294,6 +294,7 @@ __setup("notsc", notsc_setup); static int no_sched_irq_time; static int no_tsc_watchdog; +static int tsc_as_watchdog; static int __init tsc_setup(char *str) { @@ -303,10 +304,22 @@ static int __init tsc_setup(char *str) no_sched_irq_time = 1; if (!strcmp(str, "unstable")) mark_tsc_unstable("boot parameter"); - if (!strcmp(str, "nowatchdog")) + if (!strcmp(str, "nowatchdog")) { no_tsc_watchdog = 1; + if (tsc_as_watchdog) + pr_alert("%s: Overriding earlier tsc=watchdog with tsc=nowatchdog\n", + __func__); + tsc_as_watchdog = 0; + } if (!strcmp(str, "recalibrate")) tsc_force_recalibrate = 1; + if (!strcmp(str, "watchdog")) { + if (no_tsc_watchdog) + pr_alert("%s: tsc=watchdog overridden by earlier tsc=nowatchdog\n", + __func__); + else + tsc_as_watchdog = 1; + } return 1; } @@ -1192,7 +1205,8 @@ static void __init tsc_disable_clocksource_watchdog(void) bool tsc_clocksource_watchdog_disabled(void) { - return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY); + return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY) && + tsc_as_watchdog && !no_tsc_watchdog; } static void __init check_system_tsc_reliable(void)
On Wed, Feb 01, 2023 at 02:26:29PM -0500, Waiman Long wrote: > On 2/1/23 05:24, Thomas Gleixner wrote: > > Paul! > > > > On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: > > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > > > TSC is disabled. This works well much of the time, but there is the > > > occasional production-level system that meets all of these criteria, but > > > which still has a TSC that skews significantly from atomic-clock time. > > > This is usually attributed to a firmware or hardware fault. Yes, the > > > various NTP daemons do express their opinions of userspace-to-atomic-clock > > > time skew, but they put them in various places, depending on the daemon > > > and distro in question. It would therefore be good for the kernel to > > > have some clue that there is a problem. > > > > > > The old behavior of marking the TSC unstable is a non-starter because a > > > great many workloads simply cannot tolerate the overheads and latencies > > > of the various non-TSC clocksources. In addition, NTP-corrected systems > > > sometimes can tolerate significant kernel-space time skew as long as > > > the userspace time sources are within epsilon of atomic-clock time. > > > > > > Therefore, when watchdog verification of TSC is disabled, enable it for > > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > > > time-skew diagnostic without degrading the system's performance. > > I'm more than unhappy about this. We finally have a point where the TSC > > watchdog overhead can go away without adding TSC=reliable to the kernel > > commandline. > > > > Now you add an unconditionally enforce the watchdog again in a way which > > even cannot be disabled on the kernel command line. > > > > Patently bad idea, no cookies for you! > > I have a similar concern about this patch as well. That is why I was > suggesting to have this enabled for a limited time after boot for sanity > checking purpose only. Fair enough! If the watchdog checking of HPET and/or PMTMR against TSC only happens only when the sysadm asks for it, would you still want to have the ability to enable such watchdog checking at boot time, and then to disable it once the system had been running for some limited time? Thanx, Paul
On 2/1/23 14:55, Paul E. McKenney wrote: > On Wed, Feb 01, 2023 at 02:26:29PM -0500, Waiman Long wrote: >> On 2/1/23 05:24, Thomas Gleixner wrote: >>> Paul! >>> >>> On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: >>>> On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, >>>> NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the >>>> TSC is disabled. This works well much of the time, but there is the >>>> occasional production-level system that meets all of these criteria, but >>>> which still has a TSC that skews significantly from atomic-clock time. >>>> This is usually attributed to a firmware or hardware fault. Yes, the >>>> various NTP daemons do express their opinions of userspace-to-atomic-clock >>>> time skew, but they put them in various places, depending on the daemon >>>> and distro in question. It would therefore be good for the kernel to >>>> have some clue that there is a problem. >>>> >>>> The old behavior of marking the TSC unstable is a non-starter because a >>>> great many workloads simply cannot tolerate the overheads and latencies >>>> of the various non-TSC clocksources. In addition, NTP-corrected systems >>>> sometimes can tolerate significant kernel-space time skew as long as >>>> the userspace time sources are within epsilon of atomic-clock time. >>>> >>>> Therefore, when watchdog verification of TSC is disabled, enable it for >>>> HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel >>>> time-skew diagnostic without degrading the system's performance. >>> I'm more than unhappy about this. We finally have a point where the TSC >>> watchdog overhead can go away without adding TSC=reliable to the kernel >>> commandline. >>> >>> Now you add an unconditionally enforce the watchdog again in a way which >>> even cannot be disabled on the kernel command line. >>> >>> Patently bad idea, no cookies for you! >> I have a similar concern about this patch as well. That is why I was >> suggesting to have this enabled for a limited time after boot for sanity >> checking purpose only. > Fair enough! > > If the watchdog checking of HPET and/or PMTMR against TSC only happens > only when the sysadm asks for it, would you still want to have the ability > to enable such watchdog checking at boot time, and then to disable it > once the system had been running for some limited time? Yes, being optional is another way to avoid the overhead for the majority of users. The paranoids can turn it on if they want to. Cheers, Longman
On Wed, Feb 01, 2023 at 10:40:56PM -0500, Waiman Long wrote: > On 2/1/23 14:55, Paul E. McKenney wrote: > > On Wed, Feb 01, 2023 at 02:26:29PM -0500, Waiman Long wrote: > > > On 2/1/23 05:24, Thomas Gleixner wrote: > > > > Paul! > > > > > > > > On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote: > > > > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, > > > > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the > > > > > TSC is disabled. This works well much of the time, but there is the > > > > > occasional production-level system that meets all of these criteria, but > > > > > which still has a TSC that skews significantly from atomic-clock time. > > > > > This is usually attributed to a firmware or hardware fault. Yes, the > > > > > various NTP daemons do express their opinions of userspace-to-atomic-clock > > > > > time skew, but they put them in various places, depending on the daemon > > > > > and distro in question. It would therefore be good for the kernel to > > > > > have some clue that there is a problem. > > > > > > > > > > The old behavior of marking the TSC unstable is a non-starter because a > > > > > great many workloads simply cannot tolerate the overheads and latencies > > > > > of the various non-TSC clocksources. In addition, NTP-corrected systems > > > > > sometimes can tolerate significant kernel-space time skew as long as > > > > > the userspace time sources are within epsilon of atomic-clock time. > > > > > > > > > > Therefore, when watchdog verification of TSC is disabled, enable it for > > > > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > > > > > time-skew diagnostic without degrading the system's performance. > > > > I'm more than unhappy about this. We finally have a point where the TSC > > > > watchdog overhead can go away without adding TSC=reliable to the kernel > > > > commandline. > > > > > > > > Now you add an unconditionally enforce the watchdog again in a way which > > > > even cannot be disabled on the kernel command line. > > > > > > > > Patently bad idea, no cookies for you! > > > I have a similar concern about this patch as well. That is why I was > > > suggesting to have this enabled for a limited time after boot for sanity > > > checking purpose only. > > Fair enough! > > > > If the watchdog checking of HPET and/or PMTMR against TSC only happens > > only when the sysadm asks for it, would you still want to have the ability > > to enable such watchdog checking at boot time, and then to disable it > > once the system had been running for some limited time? > > Yes, being optional is another way to avoid the overhead for the majority of > users. The paranoids can turn it on if they want to. Very good, thank you! Thanx, Paul
On Wed, Feb 01 2023 at 22:40, Waiman Long wrote: > On 2/1/23 14:55, Paul E. McKenney wrote: >>>>> Therefore, when watchdog verification of TSC is disabled, enable it for >>>>> HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel >>>>> time-skew diagnostic without degrading the system's performance. >>>> I'm more than unhappy about this. We finally have a point where the TSC >>>> watchdog overhead can go away without adding TSC=reliable to the kernel >>>> commandline. >>>> >>>> Now you add an unconditionally enforce the watchdog again in a way which >>>> even cannot be disabled on the kernel command line. >>>> >>>> Patently bad idea, no cookies for you! >>> I have a similar concern about this patch as well. That is why I was >>> suggesting to have this enabled for a limited time after boot for sanity >>> checking purpose only. >> Fair enough! >> >> If the watchdog checking of HPET and/or PMTMR against TSC only happens >> only when the sysadm asks for it, would you still want to have the ability >> to enable such watchdog checking at boot time, and then to disable it >> once the system had been running for some limited time? > > Yes, being optional is another way to avoid the overhead for the > majority of users. The paranoids can turn it on if they want to. Yes, opt-in is good enough. Thanks, tglx
On Thu, Feb 02, 2023 at 08:57:39AM +0100, Thomas Gleixner wrote: > On Wed, Feb 01 2023 at 22:40, Waiman Long wrote: > > On 2/1/23 14:55, Paul E. McKenney wrote: > >>>>> Therefore, when watchdog verification of TSC is disabled, enable it for > >>>>> HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel > >>>>> time-skew diagnostic without degrading the system's performance. > >>>> I'm more than unhappy about this. We finally have a point where the TSC > >>>> watchdog overhead can go away without adding TSC=reliable to the kernel > >>>> commandline. > >>>> > >>>> Now you add an unconditionally enforce the watchdog again in a way which > >>>> even cannot be disabled on the kernel command line. > >>>> > >>>> Patently bad idea, no cookies for you! > >>> I have a similar concern about this patch as well. That is why I was > >>> suggesting to have this enabled for a limited time after boot for sanity > >>> checking purpose only. > >> Fair enough! > >> > >> If the watchdog checking of HPET and/or PMTMR against TSC only happens > >> only when the sysadm asks for it, would you still want to have the ability > >> to enable such watchdog checking at boot time, and then to disable it > >> once the system had been running for some limited time? > > > > Yes, being optional is another way to avoid the overhead for the > > majority of users. The paranoids can turn it on if they want to. > > Yes, opt-in is good enough. I have added this commit to my clocksource branch: 2ff7dacc88b0 clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested It is passing my internal tests, and if it does fine for a couple of days in -next, I will send an updated pull request. Thanx, Paul ------------------------------------------------------------------------ The full clocksource branch: beaa1ffe551c3 clocksource: Print clocksource name when clocksource is tested unstable c37e85c135cea clocksource: Loosen clocksource watchdog constraints f092eb34b3304 clocksource: Improve read-back-delay message dd029269947a3 clocksource: Improve "skew is too large" messages b7082cdfc464b clocksource: Suspend the watchdog temporarily when high read latency detected a7ec817d55421 x86/tsc: Add option to force frequency recalibration with HW timer efc8b329c7fdc clocksource: Verify HPET and PMTMR when TSC unverified 2ff7dacc88b0c clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h index 8ac563abb567b..a53961c64a567 100644 --- a/arch/x86/include/asm/time.h +++ b/arch/x86/include/asm/time.h @@ -8,6 +8,7 @@ extern void hpet_time_init(void); extern void time_init(void); extern bool pit_timer_init(void); +extern bool tsc_clocksource_watchdog_disabled(void); extern struct clock_event_device *global_clock_event; diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 71f336425e58a..c8eb1ac5125ab 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1091,6 +1091,8 @@ int __init hpet_enable(void) if (!hpet_counting()) goto out_nohpet; + if (tsc_clocksource_watchdog_disabled()) + clocksource_hpet.flags |= CLOCK_SOURCE_MUST_VERIFY; clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq); if (id & HPET_ID_LEGSUP) { diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a78e73da4a74b..af3782fb6200c 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1186,6 +1186,11 @@ static void __init tsc_disable_clocksource_watchdog(void) clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY; } +bool tsc_clocksource_watchdog_disabled(void) +{ + return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY); +} + static void __init check_system_tsc_reliable(void) { #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC) diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c index 279ddff81ab49..82338773602ca 100644 --- a/drivers/clocksource/acpi_pm.c +++ b/drivers/clocksource/acpi_pm.c @@ -23,6 +23,7 @@ #include <linux/pci.h> #include <linux/delay.h> #include <asm/io.h> +#include <asm/time.h> /* * The I/O port the PMTMR resides at. @@ -210,8 +211,9 @@ static int __init init_acpi_pm_clocksource(void) return -ENODEV; } - return clocksource_register_hz(&clocksource_acpi_pm, - PMTMR_TICKS_PER_SEC); + if (tsc_clocksource_watchdog_disabled()) + clocksource_acpi_pm.flags |= CLOCK_SOURCE_MUST_VERIFY; + return clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC); } /* We use fs_initcall because we want the PCI fixups to have run