[v5] x86/tsc: Add option to force frequency recalibration with HW timer

Message ID 20230104081938.1014511-1-feng.tang@intel.com
State New

Commit Message

Feng Tang Jan. 4, 2023, 8:19 a.m. UTC
The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.

As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. There was a case
where a user measured the TSC frequency against an atomic clock and
reported an inaccuracy, which was later fixed in firmware.

Add an option 'recalibrate' to the 'tsc' kernel parameter to force
TSC frequency recalibration against HPET or PM timer, and warn if the
deviation from the previous value is more than about 500 PPM. This
provides a way to verify the data from hardware / firmware.

There is no functional change to the existing workflow.

Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1]. On
that platform the TSC frequency of 1896 MHz was obtained from
CPUID(0x15), while forced recalibration against HPET and against the
PM timer both gave 1975 MHz, which also matched the check from
'chronyd', indicating a BIOS or firmware problem.

[Thanks tglx for helping improve the commit log]

[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
Changelog:

  since v4:
    * add the real-world case, where the patch helped to root-cause
      a BIOS/FW problem of inaccurate CPUID-0x15 info
    * rebase against v6.2-rc1

  since v3:
    * add a real-world case to the commit log
    * rebase against v6.0-rc1

  since v2:
    * revise the option description in kernel-parameters.txt
    * rebase against v5.19-rc2

  since v1:
    * refine commit log to state clearly the problem and intention
      of the patch by copying Thomas' words.

.../admin-guide/kernel-parameters.txt         |  4 +++
 arch/x86/kernel/tsc.c                         | 34 ++++++++++++++++---
 2 files changed, 34 insertions(+), 4 deletions(-)
  

Comments

Paul E. McKenney Jan. 4, 2023, 2:32 p.m. UTC | #1
On Wed, Jan 04, 2023 at 04:19:38PM +0800, Feng Tang wrote:
> The kernel assumes that the TSC frequency which is provided by the
> hardware / firmware via MSRs or CPUID(0x15) is correct after applying
> a few basic consistency checks. This disables the TSC recalibration
> against HPET or PM timer.
> 
> As a result there is no mechanism to validate that frequency in cases
> where a firmware or hardware defect is suspected. And there was case
> that some user used atomic clock to measure the TSC frequency and
> reported an inaccuracy issue, which was later fixed in firmware.
> 
> Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> tsc freq recalibration with HPET or PM timer, and warn if the
> deviation from previous value is more than about 500 PPM, which
> provides a way to verify the data from hardware / firmware.
> 
> There is no functional change to existing work flow.
> 
> Recently there was a real-world case: "The 40ms/s divergence between
> TSC and HPET was observed on hardware that is quite recent" [1], on
> that platform the TSC frequence 1896 MHz was got from CPUID(0x15),
> and the force-reclibration with HPET/PMTIMER both calibrated out
> value of 1975 MHz, which also matched with check from software
> 'chronyd', indicating it's a problem of BIOS or firmware.
> 
> [Thanks tglx for helping improving the commit log]
> 
> [1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
> Signed-off-by: Feng Tang <feng.tang@intel.com>

Nice!!!

Tested-by: Paul E. McKenney <paulmck@kernel.org>

  
Paul E. McKenney Jan. 4, 2023, 5:46 p.m. UTC | #2
On Wed, Jan 04, 2023 at 06:32:27AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 04, 2023 at 04:19:38PM +0800, Feng Tang wrote:
> > The kernel assumes that the TSC frequency which is provided by the
> > hardware / firmware via MSRs or CPUID(0x15) is correct after applying
> > a few basic consistency checks. This disables the TSC recalibration
> > against HPET or PM timer.
> > 
> > As a result there is no mechanism to validate that frequency in cases
> > where a firmware or hardware defect is suspected. And there was case
> > that some user used atomic clock to measure the TSC frequency and
> > reported an inaccuracy issue, which was later fixed in firmware.
> > 
> > Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> > tsc freq recalibration with HPET or PM timer, and warn if the
> > deviation from previous value is more than about 500 PPM, which
> > provides a way to verify the data from hardware / firmware.
> > 
> > There is no functional change to existing work flow.
> > 
> > Recently there was a real-world case: "The 40ms/s divergence between
> > TSC and HPET was observed on hardware that is quite recent" [1], on
> > that platform the TSC frequence 1896 MHz was got from CPUID(0x15),
> > and the force-reclibration with HPET/PMTIMER both calibrated out
> > value of 1975 MHz, which also matched with check from software
> > 'chronyd', indicating it's a problem of BIOS or firmware.
> > 
> > [Thanks tglx for helping improving the commit log]
> > 
> > [1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> Nice!!!
> 
> Tested-by: Paul E. McKenney <paulmck@kernel.org>

And I have queued this on -rcu for further review and testing, in
particular, to get it into -next sooner rather than later.  Hope that
is OK!

I was thinking that this recalibrate patch made mine unnecessary:

b32498162f5c ("clocksource: Verify HPET and PMTMR when TSC unverified")

But upon further thought, I remembered that what we here at Meta need is
for TSC to remain in use on systems for which it is deemed trustworthy.
The reason is that even a short switch to HPET can terminally annoy some
of our systems.  So I must therefore keep b32498162f5c.

							Thanx, Paul

  
Feng Tang Jan. 5, 2023, 2:56 p.m. UTC | #3
On Wed, Jan 04, 2023 at 09:46:34AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 04, 2023 at 06:32:27AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 04, 2023 at 04:19:38PM +0800, Feng Tang wrote:
> > > The kernel assumes that the TSC frequency which is provided by the
> > > hardware / firmware via MSRs or CPUID(0x15) is correct after applying
> > > a few basic consistency checks. This disables the TSC recalibration
> > > against HPET or PM timer.
> > > 
> > > As a result there is no mechanism to validate that frequency in cases
> > > where a firmware or hardware defect is suspected. And there was case
> > > that some user used atomic clock to measure the TSC frequency and
> > > reported an inaccuracy issue, which was later fixed in firmware.
> > > 
> > > Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> > > tsc freq recalibration with HPET or PM timer, and warn if the
> > > deviation from previous value is more than about 500 PPM, which
> > > provides a way to verify the data from hardware / firmware.
> > > 
> > > There is no functional change to existing work flow.
> > > 
> > > Recently there was a real-world case: "The 40ms/s divergence between
> > > TSC and HPET was observed on hardware that is quite recent" [1], on
> > > that platform the TSC frequence 1896 MHz was got from CPUID(0x15),
> > > and the force-reclibration with HPET/PMTIMER both calibrated out
> > > value of 1975 MHz, which also matched with check from software
> > > 'chronyd', indicating it's a problem of BIOS or firmware.
> > > 
> > > [Thanks tglx for helping improving the commit log]
> > > 
> > > [1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > 
> > Nice!!!
> > 
> > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> 
> And I have queued this on -rcu for further review and testing, in
> particular, to get it into -next sooner rather than later.  Hope that
> is OK!
> 
> I was thinking that this recalibrate patch made mine unnecessary:
> 
> b32498162f5c ("clocksource: Verify HPET and PMTMR when TSC unverified")
> 
> But upon further thought, I remembered that what we here at Meta need is
> for TSC to remain in use on systems for which it is deemed trustworthy.
> The reason is that even a short switch to HPET can terminally annoy some
> of our systems.  So I must therefore keep b32498162f5c.

Yes, my patch only adds an optional kernel cmdline parameter, and it
can't use both HPET and PM_TIMER to do the calibration in a single run.
Also, the purposes of the two patches are completely different.

Thanks,
Feng

> 
> 							Thanx, Paul
> 
  

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3996cf..d9eb98e748d5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6369,6 +6369,10 @@ 
 			in situations with strict latency requirements (where
 			interruptions from clocksource watchdog are not
 			acceptable).
+			[x86] recalibrate: force to do frequency recalibration
+			with a HW timer (HPET or PM timer) for systems whose
+			TSC frequency comes from HW or FW through MSR or CPUID(0x15),
+			and warn if the difference is more than 500 ppm.
 
 	tsc_early_khz=  [X86] Skip early TSC calibration and use the given
 			value instead. Useful when the early TSC frequency discovery
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a78e73da4a74..92bbc4a6b3fc 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -48,6 +48,8 @@  static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static int __read_mostly tsc_force_recalibrate;
+
 static u32 art_to_tsc_numerator;
 static u32 art_to_tsc_denominator;
 static u64 art_to_tsc_offset;
@@ -303,6 +305,8 @@  static int __init tsc_setup(char *str)
 		mark_tsc_unstable("boot parameter");
 	if (!strcmp(str, "nowatchdog"))
 		no_tsc_watchdog = 1;
+	if (!strcmp(str, "recalibrate"))
+		tsc_force_recalibrate = 1;
 	return 1;
 }
 
@@ -1374,6 +1378,25 @@  static void tsc_refine_calibration_work(struct work_struct *work)
 	else
 		freq = calc_pmtimer_ref(delta, ref_start, ref_stop);
 
+	/* Will hit this only if tsc_force_recalibrate has been set */
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+
+		/* Warn if the deviation exceeds 500 ppm */
+		if (abs(tsc_khz - freq) > (tsc_khz >> 11)) {
+			pr_warn("Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!\n");
+			pr_info("Previous calibrated TSC freq:\t %lu.%03lu MHz\n",
+				(unsigned long)tsc_khz / 1000,
+				(unsigned long)tsc_khz % 1000);
+		}
+
+		pr_info("TSC freq recalibrated by [%s]:\t %lu.%03lu MHz\n",
+			hpet ? "HPET" : "PM_TIMER",
+			(unsigned long)freq / 1000,
+			(unsigned long)freq % 1000);
+
+		return;
+	}
+
 	/* Make sure we're within 1% */
 	if (abs(tsc_khz - freq) > tsc_khz/100)
 		goto out;
@@ -1407,8 +1430,10 @@  static int __init init_tsc_clocksource(void)
 	if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
 		return 0;
 
-	if (tsc_unstable)
-		goto unreg;
+	if (tsc_unstable) {
+		clocksource_unregister(&clocksource_tsc_early);
+		return 0;
+	}
 
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
@@ -1421,9 +1446,10 @@  static int __init init_tsc_clocksource(void)
 		if (boot_cpu_has(X86_FEATURE_ART))
 			art_related_clocksource = &clocksource_tsc;
 		clocksource_register_khz(&clocksource_tsc, tsc_khz);
-unreg:
 		clocksource_unregister(&clocksource_tsc_early);
-		return 0;
+
+		if (!tsc_force_recalibrate)
+			return 0;
 	}
 
 	schedule_delayed_work(&tsc_irqwork, 0);