[V2,1/1] x86/topology: fix erroneous smp_num_siblings on Intel Hybrid platform

Message ID 20230220032856.661884-2-rui.zhang@intel.com
State New
Headers
Series x86: cpu topology fix and question on x86_max_cores |

Commit Message

Zhang, Rui Feb. 20, 2023, 3:28 a.m. UTC
  The SMT siblings value returned by CPUID.1F SMT level EBX differs
among CPUs on Intel Hybrid platforms like AlderLake and MeteorLake.
It returns 2 for Pcore CPUs which have SMT siblings and returns 1 for
Ecore CPUs which do not have SMT siblings.

Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2.  If it
is an Ecore, smp_num_siblings == 1.

smp_num_siblings describes if the *system* supports SMT.  It should
specify the maximum number of SMT threads among all cores.

Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.

On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.

Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21

And this also breaks userspace tools like lscpu
-Core(s) per socket:  1
-Socket(s):           11
+Core(s) per socket:  16
+Socket(s):           1

CC: stable@kernel.org
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 arch/x86/kernel/cpu/topology.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
  

Comments

Peter Zijlstra Feb. 20, 2023, 11:22 a.m. UTC | #1
On Mon, Feb 20, 2023 at 11:28:56AM +0800, Zhang Rui wrote:
> The SMT siblings value returned by CPUID.1F SMT level EBX differs
> among CPUs on Intel Hybrid platforms like AlderLake and MeteorLake.
> It returns 2 for Pcore CPUs which have SMT siblings and returns 1 for
> Ecore CPUs which do not have SMT siblings.
> 
> Today, the CPU boot code sets the global variable smp_num_siblings when
> every CPU thread is brought up. The last thread to boot will overwrite
> it with the number of siblings of *that* thread. That last thread to
> boot will "win". If the thread is a Pcore, smp_num_siblings == 2.  If it
> is an Ecore, smp_num_siblings == 1.
> 
> smp_num_siblings describes if the *system* supports SMT.  It should
> specify the maximum number of SMT threads among all cores.
> 
> Ensure that smp_num_siblings represents the system-wide maximum number
> of siblings by always increasing its value. Never allow it to decrease.
> 
> On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
> not updated in any cpu sibling map because the system is treated as an
> UP system when probing Ecore CPUs.
> 
> Below shows part of the CPU topology information before and after the
> fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
> ...
> -/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
> -/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
> +/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
> +/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
> ...
> -/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
> -/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
> +/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
> +/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
> 
> And this also breaks userspace tools like lscpu
> -Core(s) per socket:  1
> -Socket(s):           11
> +Core(s) per socket:  16
> +Socket(s):           1
> 
> CC: stable@kernel.org
> Suggested-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  arch/x86/kernel/cpu/topology.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 5e868b62a7c4..0270925fe013 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -79,7 +79,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
>  	 * initial apic id, which also represents 32-bit extended x2apic id.
>  	 */
>  	c->initial_apicid = edx;
> -	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
>  #endif
>  	return 0;
>  }
> @@ -109,7 +109,8 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
>  	 */
>  	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
>  	c->initial_apicid = edx;
> -	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
>  	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>  	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
>  	pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

Seems ok, but perhaps you can stick an 'int' cast in
LEVEL_MAX_SIGLINGS instead and write a simpler max() -- and/or convert
smt_num_siblings to unsigned int.

Regardless,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  
Zhang, Rui Feb. 21, 2023, 8:34 a.m. UTC | #2
Hi, Peter,

> > ---
> >  arch/x86/kernel/cpu/topology.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/topology.c
> > b/arch/x86/kernel/cpu/topology.c
> > index 5e868b62a7c4..0270925fe013 100644
> > --- a/arch/x86/kernel/cpu/topology.c
> > +++ b/arch/x86/kernel/cpu/topology.c
> > @@ -79,7 +79,7 @@ int detect_extended_topology_early(struct
> > cpuinfo_x86 *c)
> >  	 * initial apic id, which also represents 32-bit extended
> > x2apic id.
> >  	 */
> >  	c->initial_apicid = edx;
> > -	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> > +	smp_num_siblings = max_t(int, smp_num_siblings,
> > LEVEL_MAX_SIBLINGS(ebx));
> >  #endif
> >  	return 0;
> >  }
> > @@ -109,7 +109,8 @@ int detect_extended_topology(struct cpuinfo_x86
> > *c)
> >  	 */
> >  	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> >  	c->initial_apicid = edx;
> > -	core_level_siblings = smp_num_siblings =
> > LEVEL_MAX_SIBLINGS(ebx);
> > +	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> > +	smp_num_siblings = max_t(int, smp_num_siblings,
> > LEVEL_MAX_SIBLINGS(ebx));
> >  	core_plus_mask_width = ht_mask_width =
> > BITS_SHIFT_NEXT_LEVEL(eax);
> >  	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> >  	pkg_mask_width = die_plus_mask_width =
> > BITS_SHIFT_NEXT_LEVEL(eax);
> 
> Seems ok, but perhaps you can stick an 'int' cast in
> LEVEL_MAX_SIGLINGS instead and write a simpler max() -- and/or
> convert
> smt_num_siblings to unsigned int.
> 
yeah, it is doable. I'd prefer to use the current version to keep this
fix simpler if you don't mind.

> Regardless,
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks for your ACK.

-rui
  
Zhang, Rui March 13, 2023, 2:05 a.m. UTC | #3
On Tue, 2023-02-21 at 16:34 +0800, Zhang Rui wrote:
> Hi, Peter,
> 
> > > ---
> > >  arch/x86/kernel/cpu/topology.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/topology.c
> > > b/arch/x86/kernel/cpu/topology.c
> > > index 5e868b62a7c4..0270925fe013 100644
> > > --- a/arch/x86/kernel/cpu/topology.c
> > > +++ b/arch/x86/kernel/cpu/topology.c
> > > @@ -79,7 +79,7 @@ int detect_extended_topology_early(struct
> > > cpuinfo_x86 *c)
> > >  	 * initial apic id, which also represents 32-bit extended
> > > x2apic id.
> > >  	 */
> > >  	c->initial_apicid = edx;
> > > -	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> > > +	smp_num_siblings = max_t(int, smp_num_siblings,
> > > LEVEL_MAX_SIBLINGS(ebx));
> > >  #endif
> > >  	return 0;
> > >  }
> > > @@ -109,7 +109,8 @@ int detect_extended_topology(struct
> > > cpuinfo_x86
> > > *c)
> > >  	 */
> > >  	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> > >  	c->initial_apicid = edx;
> > > -	core_level_siblings = smp_num_siblings =
> > > LEVEL_MAX_SIBLINGS(ebx);
> > > +	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> > > +	smp_num_siblings = max_t(int, smp_num_siblings,
> > > LEVEL_MAX_SIBLINGS(ebx));
> > >  	core_plus_mask_width = ht_mask_width =
> > > BITS_SHIFT_NEXT_LEVEL(eax);
> > >  	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> > >  	pkg_mask_width = die_plus_mask_width =
> > > BITS_SHIFT_NEXT_LEVEL(eax);
> > 
> > Seems ok, but perhaps you can stick an 'int' cast in
> > LEVEL_MAX_SIGLINGS instead and write a simpler max() -- and/or
> > convert
> > smt_num_siblings to unsigned int.
> > 
> yeah, it is doable. I'd prefer to use the current version to keep
> this
> fix simpler if you don't mind.
> 
> > Regardless,
> > 
> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Thanks for your ACK.

Hi, all,

Despite the discussions about future improvements in the cover letter
of this patch series, is there any further changes needed for this one?

thanks,
rui
  

Patch

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..0270925fe013 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -79,7 +79,7 @@  int detect_extended_topology_early(struct cpuinfo_x86 *c)
 	 * initial apic id, which also represents 32-bit extended x2apic id.
 	 */
 	c->initial_apicid = edx;
-	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 #endif
 	return 0;
 }
@@ -109,7 +109,8 @@  int detect_extended_topology(struct cpuinfo_x86 *c)
 	 */
 	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 	c->initial_apicid = edx;
-	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);