[1/1] x86/topology: fix erroneous smp_num_siblings on Intel Hybrid platform

Message ID 20230217163724.581513-2-rui.zhang@intel.com
State New
Headers
Series x86: cpu topology fix and question on x86_max_cores |

Commit Message

Zhang, Rui Feb. 17, 2023, 4:37 p.m. UTC
  The SMT siblings value returned by CPUID.1F SMT level EBX differs
among CPUs on Intel Hybrid platforms like AlderLake and MeteorLake.
It returns 2 for Pcore CPUs which have SMT siblings and returns 1 for
Ecore CPUs which do not have SMT siblings.

Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2.  If it
is an Ecore, smp_num_siblings == 1.

smp_num_siblings describes if the *system* supports SMT.  It should
specify the maximum number of SMT threads among all cores.

On AlderLake-P/S platforms, it does not cause any functional issues so
far.
But on MeteorLake-P platform, when probing an Ecore CPU,
a). smp_num_siblings varies like AlderLake and it is set to 1 for Ecore.
b). x86_max_cores is totally broken and it is set to 1 for the boot cpu.
Altogether, these two issues make the system being treated as an UP
system in set_cpu_sibling_map() when probing Ecore CPUs, and the Ecore
CPUs are not updated in any cpu sibling maps erroneously.

Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21

And this also breaks userspace tools like lscpu
-Core(s) per socket:  1
-Socket(s):           11
+Core(s) per socket:  16
+Socket(s):           1

To fix the first issue, ensure that smp_num_siblings represents the
system-wide maximum number of siblings by always increasing its value.
Never allow it to decrease.

Note that this fix is sufficient to make set_cpu_sibling_map() work
correctly. And how to fix the bogus cpuinfo_x86.x86_max_cores will be
addressed separately.

CC: stable@kernel.org
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 arch/x86/kernel/cpu/topology.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
  

Comments

Dave Hansen Feb. 17, 2023, 6:03 p.m. UTC | #1
On 2/17/23 08:37, Zhang Rui wrote:
> The SMT siblings value returned by CPUID.1F SMT level EBX differs
> among CPUs on Intel Hybrid platforms like AlderLake and MeteorLake.
> It returns 2 for Pcore CPUs which have SMT siblings and returns 1 for
> Ecore CPUs which do not have SMT siblings.
> 
> Today, the CPU boot code sets the global variable smp_num_siblings when
> every CPU thread is brought up. The last thread to boot will overwrite
> it with the number of siblings of *that* thread. That last thread to
> boot will "win". If the thread is a Pcore, smp_num_siblings == 2.  If it
> is an Ecore, smp_num_siblings == 1.
> 
> smp_num_siblings describes if the *system* supports SMT.  It should
> specify the maximum number of SMT threads among all cores.

I was with you until here, but I'm having a hard time parsing this:

> On AlderLake-P/S platforms, it does not cause any functional issues so
> far.
> But on MeteorLake-P platform, when probing an Ecore CPU,
> a). smp_num_siblings varies like AlderLake and it is set to 1 for Ecore.
> b). x86_max_cores is totally broken and it is set to 1 for the boot cpu.
> Altogether, these two issues make the system being treated as an UP
> system in set_cpu_sibling_map() when probing Ecore CPUs, and the Ecore
> CPUs are not updated in any cpu sibling maps erroneously.

Let's try and focus this changelog on the problem at hand which is a
broken smp_num_siblings on MeterorLake.  Right?

> Below shows part of the CPU topology information before and after the
> fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
> ...
> -/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
> -/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
> +/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
> +/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
> ...
> -/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
> -/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
> +/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
> +/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
> 
> And this also breaks userspace tools like lscpu
> -Core(s) per socket:  1
> -Socket(s):           11
> +Core(s) per socket:  16
> +Socket(s):           1

Heh, yeah, 11 sockets is a tiny bug.

> To fix the first issue, ensure that smp_num_siblings represents the
> system-wide maximum number of siblings by always increasing its value.
> Never allow it to decrease.
> 
> Note that this fix is sufficient to make set_cpu_sibling_map() work
> correctly. And how to fix the bogus cpuinfo_x86.x86_max_cores will be
> addressed separately.

Having this note here is probably OK.  But, I'm not sure even mentioning
x86_max_cores is worth it.  Doesn't it just add confusion?

> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 5e868b62a7c4..0270925fe013 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -79,7 +79,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
>  	 * initial apic id, which also represents 32-bit extended x2apic id.
>  	 */
>  	c->initial_apicid = edx;
> -	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
>  #endif
>  	return 0;
>  }
> @@ -109,7 +109,8 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
>  	 */
>  	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
>  	c->initial_apicid = edx;
> -	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
>  	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>  	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
>  	pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

The fix seems simple enough.
  
Zhang, Rui Feb. 18, 2023, 4:11 p.m. UTC | #2
Hi, Dave,

Thanks for reviewing this.

On Fri, 2023-02-17 at 10:03 -0800, Dave Hansen wrote:
> On 2/17/23 08:37, Zhang Rui wrote:
> > The SMT siblings value returned by CPUID.1F SMT level EBX differs
> > among CPUs on Intel Hybrid platforms like AlderLake and MeteorLake.
> > It returns 2 for Pcore CPUs which have SMT siblings and returns 1
> > for
> > Ecore CPUs which do not have SMT siblings.
> > 
> > Today, the CPU boot code sets the global variable smp_num_siblings
> > when
> > every CPU thread is brought up. The last thread to boot will
> > overwrite
> > it with the number of siblings of *that* thread. That last thread
> > to
> > boot will "win". If the thread is a Pcore, smp_num_siblings ==
> > 2.  If it
> > is an Ecore, smp_num_siblings == 1.
> > 
> > smp_num_siblings describes if the *system* supports SMT.  It should
> > specify the maximum number of SMT threads among all cores.
> 
> I was with you until here, but I'm having a hard time parsing this:
> 
> > On AlderLake-P/S platforms, it does not cause any functional issues
> > so
> > far.
> > But on MeteorLake-P platform, when probing an Ecore CPU,
> > a). smp_num_siblings varies like AlderLake and it is set to 1 for
> > Ecore.
> > b). x86_max_cores is totally broken and it is set to 1 for the boot
> > cpu.
> > Altogether, these two issues make the system being treated as an UP
> > system in set_cpu_sibling_map() when probing Ecore CPUs, and the
> > Ecore
> > CPUs are not updated in any cpu sibling maps erroneously.
> 
> Let's try and focus this changelog on the problem at hand which is a
> broken smp_num_siblings on MeterorLake.  Right?

yes. I totally agree with this.

But when showing the (cpu topology info and lscpu) problem below, I
want to deliver a clear message that
1. there are two bugs and *both* of them are required in order to
   trigger the problem
2. this patch just fixes one of the bugs

Do you mean that I don't need to mention the x86_max_cores issue here?

> 
> > Below shows part of the CPU topology information before and after
> > the
> > fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
> > ...
> > -/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
> > -/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
> > +/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
> > +/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
> > ...
> > -/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
> > -/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
> > +/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
> > +/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
> > 
> > And this also breaks userspace tools like lscpu
> > -Core(s) per socket:  1
> > -Socket(s):           11
> > +Core(s) per socket:  16
> > +Socket(s):           1
> 
> Heh, yeah, 11 sockets is a tiny bug.
> 
> > To fix the first issue, ensure that smp_num_siblings represents the
> > system-wide maximum number of siblings by always increasing its
> > value.
> > Never allow it to decrease.
> > 
> > Note that this fix is sufficient to make set_cpu_sibling_map() work
> > correctly. And how to fix the bogus cpuinfo_x86.x86_max_cores will
> > be
> > addressed separately.
> 
> Having this note here is probably OK.  But, I'm not sure even
> mentioning
> x86_max_cores is worth it.  Doesn't it just add confusion?

Even without the CPU topology problem we see, changing smp_num_siblings
from a larger value to a smaller value is wrong. In that sense, I can
remove this from the changelog.

thanks,
rui
  
Dave Hansen Feb. 18, 2023, 5:19 p.m. UTC | #3
On 2/18/23 08:11, Zhang, Rui wrote:
> yes. I totally agree with this.
> 
> But when showing the (cpu topology info and lscpu) problem below, I
> want to deliver a clear message that
> 1. there are two bugs and *both* of them are required in order to
>    trigger the problem
> 2. this patch just fixes one of the bugs

That's fine, but please deliver that message in the cover letter, not
the patch changelog.

> Do you mean that I don't need to mention the x86_max_cores issue here?

Yes.
  

Patch

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..0270925fe013 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -79,7 +79,7 @@  int detect_extended_topology_early(struct cpuinfo_x86 *c)
 	 * initial apic id, which also represents 32-bit extended x2apic id.
 	 */
 	c->initial_apicid = edx;
-	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 #endif
 	return 0;
 }
@@ -109,7 +109,8 @@  int detect_extended_topology(struct cpuinfo_x86 *c)
 	 */
 	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 	c->initial_apicid = edx;
-	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);