[RFC] x86/tsc: use topology_max_packages() in tsc watchdog check

Message ID 20221017132942.1646934-1-feng.tang@intel.com
State New
Headers
Series [RFC] x86/tsc: use topology_max_packages() in tsc watchdog check |

Commit Message

Feng Tang Oct. 17, 2022, 1:29 p.m. UTC
  Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve problem that
sometimes TSC clocksource is wrongly judged as unstable by watchdog
like 'jiffies', HPET, etc.

In it, the hardware socket number is a key factor for judging
whether to disable the watchdog for TSC, and 'nr_online_nodes' was
chosen as an estimation due to it is needed in early boot phase
before registering 'tsc-early' clocksource, where all none-boot
CPUs are not brought up yet.

In recent patch review, Dave Hansen pointed out there are many
cases that 'nr_online_nodes' could have issue, like:
* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes.

Peter Zijlstra suggested to use logical package ids, but it is
only usable after smp_init() and all CPUs are initialized.

One solution is to skip the watchdog for 'tsc-early' clocksource,
and move the check after smp_init(), while before 'tsc'
clocksoure is registered, where topology_max_packages() could
be used as a much more accurate socket number.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/kernel/tsc.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)
  

Comments

Dave Hansen Oct. 17, 2022, 11:15 p.m. UTC | #1
On 10/17/22 06:29, Feng Tang wrote:
> +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> +	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> +	    topology_max_packages() <= 2)
> +		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;

I couldn't help but notice the comment in here:

> void __init calculate_max_logical_packages(void)
> {
>         int ncpus;
> 
>         /*
>          * Today neither Intel nor AMD support heterogeneous systems so
>          * extrapolate the boot cpu's data to all packages.
>          */
>         ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
>         __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>         pr_info("Max logical packages: %u\n", __max_logical_packages);
> }

Could you double check for me that the Alder Lake combination Core/Atom
CPUs don't count as "heterogeneous systems" in this case?
  
Feng Tang Oct. 18, 2022, 1:07 a.m. UTC | #2
On Mon, Oct 17, 2022 at 04:15:24PM -0700, Dave Hansen wrote:
> On 10/17/22 06:29, Feng Tang wrote:
> > +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > +	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > +	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > +	    topology_max_packages() <= 2)
> > +		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> 
> I couldn't help but notice the comment in here:
> 
> > void __init calculate_max_logical_packages(void)
> > {
> >         int ncpus;
> > 
> >         /*
> >          * Today neither Intel nor AMD support heterogeneous systems so
> >          * extrapolate the boot cpu's data to all packages.
> >          */
> >         ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
> >         __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
> >         pr_info("Max logical packages: %u\n", __max_logical_packages);
> > }
> 
> Could you double check for me that the Alder Lake combination Core/Atom
> CPUs don't count as "heterogeneous systems" in this case?

Thanks for the great catch! This API seems to be obsolete and shouldn't
be used, even if the number happens to equal to 'logical_packages'  

I found a Alder Lake platform which has 6 P-cores and 8 E-cores (Thanks
Rui for sharing it), the log shows "smpboot: Max logical packages: 1",
and 'cat /proc/cpuinfo | grep "physical id"' shows all its CPU's package
ID is '0'

When writing the RFC patch, I followed the smp_init() discussed by Peter
and Rui, where 'logical_packages' is the appropriate number, which is
updated in topology_update_package_map() after each CPU gets initialized.

Possible fix for this could be:
* export 'logical_packages' for tsc use
* update calculate_max_logical_packages() to return 'logical_packages'

Thoughts?

Thanks,
Feng
  

Patch

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..8dc7a0aeaf4d 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1131,8 +1131,7 @@  static struct clocksource clocksource_tsc_early = {
 	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
 	.read			= read_tsc,
 	.mask			= CLOCKSOURCE_MASK(64),
-	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
-				  CLOCK_SOURCE_MUST_VERIFY,
+	.flags			= CLOCK_SOURCE_IS_CONTINUOUS,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
@@ -1180,12 +1179,6 @@  void mark_tsc_unstable(char *reason)
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
 
-static void __init tsc_disable_clocksource_watchdog(void)
-{
-	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-}
-
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1202,23 +1195,6 @@  static void __init check_system_tsc_reliable(void)
 #endif
 	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
 		tsc_clocksource_reliable = 1;
-
-	/*
-	 * Disable the clocksource watchdog when the system has:
-	 *  - TSC running at constant frequency
-	 *  - TSC which does not stop in C-States
-	 *  - the TSC_ADJUST register which allows to detect even minimal
-	 *    modifications
-	 *  - not more than two sockets. As the number of sockets cannot be
-	 *    evaluated at the early boot stage where this has to be
-	 *    invoked, check the number of online memory nodes as a
-	 *    fallback solution which is an reasonable estimate.
-	 */
-	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
-	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
-	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 2)
-		tsc_disable_clocksource_watchdog();
 }
 
 /*
@@ -1413,6 +1389,20 @@  static int __init init_tsc_clocksource(void)
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 
+	/*
+	 * Disable the clocksource watchdog when the system has:
+	 *  - TSC running at constant frequency
+	 *  - TSC which does not stop in C-States
+	 *  - the TSC_ADJUST register which allows to detect even minimal
+	 *    modifications
+	 *  - not more than two sockets.
+	 */
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+	    topology_max_packages() <= 2)
+		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+
 	/*
 	 * When TSC frequency is known (retrieved via MSR or CPUID), we skip
 	 * the refined calibration and directly register it as a clocksource.
@@ -1547,7 +1537,7 @@  void __init tsc_init(void)
 	}
 
 	if (tsc_clocksource_reliable || no_tsc_watchdog)
-		tsc_disable_clocksource_watchdog();
+		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 
 	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
 	detect_art();