[V4,02/41] cpu/SMT: Make SMT control more robust against enumeration failures

Message ID 20230814085112.149440843@linutronix.de
State New
Headers
Series x86/cpu: Rework the topology evaluation |

Commit Message

Thomas Gleixner Aug. 14, 2023, 8:53 a.m. UTC
  The SMT control mechanism got added as speculation attack vector
mitigation. The implemented logic relies on the primary thread mask to
be set up properly.

This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.

This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So smt_hotplug_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.

This stopped "working" with the upcoming overhaul of the topology
evaluation which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the core CPU hot-plug core to
refuse to bring up the APs.

This happens because smt_hotplug_control is set to CPU_SMT_NOT_SUPPORTED
which causes cpu_smt_allowed() to evaluate the unpopulated primary thread
mask with the conclusion that all non-boot CPUs are not valid to be
plugged.

Make cpu_smt_allowed() more robust and take CPU_SMT_NOT_SUPPORTED and
CPU_SMT_NOT_IMPLEMENTED into account. Rename it to cpu_bootable() while at
it as that makes it more clear what the function is about.

The primary mask issue on x86 XEN/PV needs to be addressed separately as
there are users outside of the CPU hotplug code too.

Fixes: 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>

---
V2: Rename cpu_smt_allowed() - Borislav
---
 kernel/cpu.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
  

Comments

Thomas Gleixner Oct. 10, 2023, 12:18 p.m. UTC | #1
On Tue, Aug 15 2023 at 14:15, Dave Hansen wrote:
> On 8/14/23 01:53, Thomas Gleixner wrote:
>> -static inline bool cpu_smt_allowed(unsigned int cpu)
>> +static inline bool cpu_bootable(unsigned int cpu)
>>  {
>>  	if (cpu_smt_control == CPU_SMT_ENABLED)
>>  		return true;
>>  
>> +	if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
>> +		return true;
>
> I found this new pair of if()'s rather counterintuitive to read.
>
> The first one reads like:
>
> 	"If SMT is not supported, the CPU is always bootable"
>
> but "supported" could easily mean CONFIG_SMP==n (which is actually
> covered in the next case).  Would this be better named:
>
> 	CPU_SMT_NOT_ENUMERATED
> or
> 	CPU_SMT_NOT_DETECTED
>
> ?

Yes, no, maybe. I rather keep them as is because the strings which are
exposed via sysfs cannot be changed and are matching.

> 	/* Every CPU is bootable on non-SMT systems: */
> 	if (cpu_smt_control == CPU_SMT_NOT_DETECTED)
> 		return true;
>
> For the next one:
>
>> +	if (cpu_smt_control == CPU_SMT_NOT_IMPLEMENTED)
>> +		return true;
>
> This reads a bit like "SMT is not implemented" rather than "SMT controls
> are not implemented".  Maybe a comment would help:
>
> 	/* All CPUs are bootable if controls are not implemented: */

Sure.
  

Patch

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -625,11 +625,17 @@  static int __init smt_cmdline_disable(ch
 }
 early_param("nosmt", smt_cmdline_disable);
 
-static inline bool cpu_smt_allowed(unsigned int cpu)
+static inline bool cpu_bootable(unsigned int cpu)
 {
 	if (cpu_smt_control == CPU_SMT_ENABLED)
 		return true;
 
+	if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+		return true;
+
+	if (cpu_smt_control == CPU_SMT_NOT_IMPLEMENTED)
+		return true;
+
 	if (topology_is_primary_thread(cpu))
 		return true;
 
@@ -660,7 +666,7 @@  static inline const struct cpumask *cpuh
 	return cpu_primary_thread_mask;
 }
 #else
-static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+static inline bool cpu_bootable(unsigned int cpu) { return true; }
 static inline bool cpuhp_smt_aware(void) { return false; }
 static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
 {
@@ -768,10 +774,10 @@  static int bringup_wait_for_ap_online(un
 	 * SMT soft disabling on X86 requires to bring the CPU out of the
 	 * BIOS 'wait for SIPI' state in order to set the CR4.MCE bit.  The
 	 * CPU marked itself as booted_once in notify_cpu_starting() so the
-	 * cpu_smt_allowed() check will now return false if this is not the
+	 * cpu_bootable() check will now return false if this is not the
 	 * primary sibling.
 	 */
-	if (!cpu_smt_allowed(cpu))
+	if (!cpu_bootable(cpu))
 		return -ECANCELED;
 	return 0;
 }
@@ -1699,7 +1705,7 @@  static int cpu_up(unsigned int cpu, enum
 		err = -EBUSY;
 		goto out;
 	}
-	if (!cpu_smt_allowed(cpu)) {
+	if (!cpu_bootable(cpu)) {
 		err = -EPERM;
 		goto out;
 	}