[v2] x86/tsc: Have tsc=recalibrate override things

Message ID 20231101111621.GC19106@noisy.programming.kicks-ass.net
State New
Headers
Series [v2] x86/tsc: Have tsc=recalibrate override things |

Commit Message

Peter Zijlstra Nov. 1, 2023, 11:16 a.m. UTC
  Subject: x86/tsc: Have tsc=recalibrate override things
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 30 Oct 2023 17:00:50 +0100

My brand-spanking new SPR supermicro workstation was reporting NTP
failures:

Oct 30 13:00:26 spr ntpd[3517]: CLOCK: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Oct 30 13:00:58 spr ntpd[3517]: CLOCK: time stepped by 32.316775
Oct 30 13:00:58 spr ntpd[3517]: CLOCK: frequency error 41699 PPM exceeds tolerance 500 PPM

CPUID provides:

    Time Stamp Counter/Core Crystal Clock Information (0x15):
       TSC/clock ratio = 200/2
       nominal core crystal clock = 25000000 Hz
    Processor Frequency Information (0x16):
       Core Base Frequency (MHz) = 0x9c4 (2500)
       Core Maximum Frequency (MHz) = 0x12c0 (4800)
       Bus (Reference) Frequency (MHz) = 0x64 (100)

and the kernel believes this. Since commit a7ec817d5542 ("x86/tsc: Add
option to force frequency recalibration with HW timer") there is the
tsc=recalibrate option, which forces the recalibrate.

This duely reports:

Oct 30 12:42:39 spr kernel: tsc: Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!
Oct 30 12:42:39 spr kernel: tsc: Previous calibrated TSC freq:         2500.000 MHz
Oct 30 12:42:39 spr kernel: tsc: TSC freq recalibrated by [HPET]:         2399.967 MHz

but then continues running at 2500, offering no solace and keeping NTP
upset -- drifting ~30 seconds for every 15 minutes.

Have tsc=recalibrate override the CPUID provided numbers. This makes the
machine usable and keeps NTP 'happy':

Oct 30 16:48:44 spr ntpd[2200]: CLOCK: time stepped by -0.768679

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/tsc.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
  

Comments

Feng Tang Nov. 1, 2023, 1:17 p.m. UTC | #1
Add Paul as he also met similar issue before.

On Wed, Nov 01, 2023 at 12:16:21PM +0100, Peter Zijlstra wrote:
> Subject: x86/tsc: Have tsc=recalibrate override things
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon, 30 Oct 2023 17:00:50 +0100
> 
> My brand-spanking new SPR supermicro workstation was reporting NTP
> failures:
> 
> Oct 30 13:00:26 spr ntpd[3517]: CLOCK: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
> Oct 30 13:00:58 spr ntpd[3517]: CLOCK: time stepped by 32.316775
> Oct 30 13:00:58 spr ntpd[3517]: CLOCK: frequency error 41699 PPM exceeds tolerance 500 PPM
> 
> CPUID provides:
> 
>     Time Stamp Counter/Core Crystal Clock Information (0x15):
>        TSC/clock ratio = 200/2
>        nominal core crystal clock = 25000000 Hz
>     Processor Frequency Information (0x16):
>        Core Base Frequency (MHz) = 0x9c4 (2500)
>        Core Maximum Frequency (MHz) = 0x12c0 (4800)
>        Bus (Reference) Frequency (MHz) = 0x64 (100)
> 
> and the kernel believes this. Since commit a7ec817d5542 ("x86/tsc: Add
> option to force frequency recalibration with HW timer") there is the
> tsc=recalibrate option, which forces the recalibrate.
> 
> This duely reports:
> 
> Oct 30 12:42:39 spr kernel: tsc: Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!
> Oct 30 12:42:39 spr kernel: tsc: Previous calibrated TSC freq:         2500.000 MHz
> Oct 30 12:42:39 spr kernel: tsc: TSC freq recalibrated by [HPET]:         2399.967 MHz
> 
> but then continues running at 2500, offering no solace and keeping NTP
> upset -- drifting ~30 seconds for every 15 minutes.
> 
> Have tsc=recalibrate override the CPUID provided numbers. This makes the
> machine usable and keeps NTP 'happy':
> 
> Oct 30 16:48:44 spr ntpd[2200]: CLOCK: time stepped by -0.768679
 
Hi Peter,

The patch looks fine to me.

One thought is the original semantics of  'tsc=recalibrate' is just
to recalibrate and print out the frequency, and with this change,
it adds 'override' semantics, so the description for 'recalibrate'
in kernel-parameters.txt may also need some update. 

Thanks,
Feng

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/tsc.c |   15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1430,14 +1430,13 @@ static void tsc_refine_calibration_work(
>  			hpet ? "HPET" : "PM_TIMER",
>  			(unsigned long)freq / 1000,
>  			(unsigned long)freq % 1000);
> +	} else {
>  
> -		return;
> +		/* Make sure we're within 1% */
> +		if (abs(tsc_khz - freq) > tsc_khz/100)
> +			goto out;
>  	}
>  
> -	/* Make sure we're within 1% */
> -	if (abs(tsc_khz - freq) > tsc_khz/100)
> -		goto out;
> -
>  	tsc_khz = freq;
>  	pr_info("Refined TSC clocksource calibration: %lu.%03lu MHz\n",
>  		(unsigned long)tsc_khz / 1000,
> @@ -1479,14 +1478,12 @@ static int __init init_tsc_clocksource(v
>  	 * When TSC frequency is known (retrieved via MSR or CPUID), we skip
>  	 * the refined calibration and directly register it as a clocksource.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
> +	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ) && !tsc_force_recalibrate) {
>  		if (boot_cpu_has(X86_FEATURE_ART))
>  			art_related_clocksource = &clocksource_tsc;
>  		clocksource_register_khz(&clocksource_tsc, tsc_khz);
>  		clocksource_unregister(&clocksource_tsc_early);
> -
> -		if (!tsc_force_recalibrate)
> -			return 0;
> +		return 0;
>  	}
>  
>  	schedule_delayed_work(&tsc_irqwork, 0);
  
Peter Zijlstra Nov. 1, 2023, 5:50 p.m. UTC | #2
On Wed, Nov 01, 2023 at 12:16:21PM +0100, Peter Zijlstra wrote:
> Subject: x86/tsc: Have tsc=recalibrate override things
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon, 30 Oct 2023 17:00:50 +0100
> 
> My brand-spanking new SPR supermicro workstation was reporting NTP
> failures:
> 
> Oct 30 13:00:26 spr ntpd[3517]: CLOCK: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
> Oct 30 13:00:58 spr ntpd[3517]: CLOCK: time stepped by 32.316775
> Oct 30 13:00:58 spr ntpd[3517]: CLOCK: frequency error 41699 PPM exceeds tolerance 500 PPM
> 
> CPUID provides:
> 
>     Time Stamp Counter/Core Crystal Clock Information (0x15):
>        TSC/clock ratio = 200/2
>        nominal core crystal clock = 25000000 Hz
>     Processor Frequency Information (0x16):
>        Core Base Frequency (MHz) = 0x9c4 (2500)
>        Core Maximum Frequency (MHz) = 0x12c0 (4800)
>        Bus (Reference) Frequency (MHz) = 0x64 (100)
> 
> and the kernel believes this. Since commit a7ec817d5542 ("x86/tsc: Add
> option to force frequency recalibration with HW timer") there is the
> tsc=recalibrate option, which forces the recalibrate.
> 
> This duely reports:
> 
> Oct 30 12:42:39 spr kernel: tsc: Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!
> Oct 30 12:42:39 spr kernel: tsc: Previous calibrated TSC freq:         2500.000 MHz
> Oct 30 12:42:39 spr kernel: tsc: TSC freq recalibrated by [HPET]:         2399.967 MHz

Additionally, we could consider something like the below. This makes the
machine print:

[    0.000000] DMI: Supermicro SYS-531A-I/X13SRA-TF, BIOS 1.1b 08/01/2023
[    0.000000] tsc: [Firmware Bug]: DMI based CPUID-15h crystal frequency override: 24000000

This way I can boot without additional parameters.

---
Index: linux-2.6/arch/x86/kernel/tsc.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/tsc.c
+++ linux-2.6/arch/x86/kernel/tsc.c
@@ -15,6 +15,7 @@
 #include <linux/timex.h>
 #include <linux/static_key.h>
 #include <linux/static_call.h>
+#include <linux/dmi.h>
 
 #include <asm/hpet.h>
 #include <asm/timer.h>
@@ -651,6 +652,35 @@ success:
 	return delta;
 }
 
+static unsigned long dmi_crystal_hz;
+
+static int dmi_crystal_hz_override(const struct dmi_system_id *d)
+{
+	if (dmi_crystal_hz != (unsigned long)d->driver_data) {
+		dmi_crystal_hz = (unsigned long)d->driver_data;
+		pr_err(FW_BUG "DMI based CPUID-15h crystal frequency override: %lu\n",
+				dmi_crystal_hz);
+	}
+	return 1;
+}
+
+/*
+ * List of systems that managed to screw up CPUID-15h :-(
+ */
+static const struct dmi_system_id tsc_dmi_table[] = {
+	{
+		.callback = dmi_crystal_hz_override,
+		.ident = "Supermicro X13SRA-TF",
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "Supermicro"),
+			DMI_MATCH(DMI_BOARD_NAME, "X13SRA-TF"),
+		},
+		/* The board advertises a 25MHz crystal, in reality it has 24MHz */
+		.driver_data = (void *)24000000UL,
+	},
+	{}
+};
+
 /**
  * native_calibrate_tsc
  * Determine TSC frequency via CPUID, else return 0.
@@ -671,6 +701,10 @@ unsigned long native_calibrate_tsc(void)
 	/* CPUID 15H TSC/Crystal ratio, plus optionally Crystal Hz */
 	cpuid(0x15, &eax_denominator, &ebx_numerator, &ecx_hz, &edx);
 
+	dmi_crystal_hz = ecx_hz;
+	if (ecx_hz && dmi_check_system(tsc_dmi_table))
+		ecx_hz = dmi_crystal_hz;
+
 	if (ebx_numerator == 0 || eax_denominator == 0)
 		return 0;
  

Patch

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1430,14 +1430,13 @@  static void tsc_refine_calibration_work(
 			hpet ? "HPET" : "PM_TIMER",
 			(unsigned long)freq / 1000,
 			(unsigned long)freq % 1000);
+	} else {
 
-		return;
+		/* Make sure we're within 1% */
+		if (abs(tsc_khz - freq) > tsc_khz/100)
+			goto out;
 	}
 
-	/* Make sure we're within 1% */
-	if (abs(tsc_khz - freq) > tsc_khz/100)
-		goto out;
-
 	tsc_khz = freq;
 	pr_info("Refined TSC clocksource calibration: %lu.%03lu MHz\n",
 		(unsigned long)tsc_khz / 1000,
@@ -1479,14 +1478,12 @@  static int __init init_tsc_clocksource(v
 	 * When TSC frequency is known (retrieved via MSR or CPUID), we skip
 	 * the refined calibration and directly register it as a clocksource.
 	 */
-	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ) && !tsc_force_recalibrate) {
 		if (boot_cpu_has(X86_FEATURE_ART))
 			art_related_clocksource = &clocksource_tsc;
 		clocksource_register_khz(&clocksource_tsc, tsc_khz);
 		clocksource_unregister(&clocksource_tsc_early);
-
-		if (!tsc_force_recalibrate)
-			return 0;
+		return 0;
 	}
 
 	schedule_delayed_work(&tsc_irqwork, 0);