[RFC] x86/acpi: Ignore invalid x2APIC entries

Message ID 20230702162802.344176-1-rui.zhang@intel.com
State New
Headers
Series [RFC] x86/acpi: Ignore invalid x2APIC entries |

Commit Message

Zhang, Rui July 2, 2023, 4:28 p.m. UTC
  Currently, kernel enumerates the possible CPUs by parsing both ACPI MADT
Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
even if they have duplicated APIC IDs in Local APIC and x2APIC, are
always enumerated.

Below is what ACPI MADT Local APIC and x2APIC describes on an
Ivebridge-EP system,

[02Ch 0044   1]                Subtable Type : 00 [Processor Local APIC]
[02Fh 0047   1]                Local Apic ID : 00
...
[164h 0356   1]                Subtable Type : 00 [Processor Local APIC]
[167h 0359   1]                Local Apic ID : 39
[16Ch 0364   1]                Subtable Type : 00 [Processor Local APIC]
[16Fh 0367   1]                Local Apic ID : FF
...
[3ECh 1004   1]                Subtable Type : 09 [Processor Local x2APIC]
[3F0h 1008   4]                Processor x2Apic ID : 00000000
...
[B5Ch 2908   1]                Subtable Type : 09 [Processor Local x2APIC]
[B60h 2912   4]                Processor x2Apic ID : 00000077

As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
And this wastes significant amount of memory for the per-cpu data.
Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
because __max_logical_packages is over-estimated by the APIC IDs in
the x2APIC entries.

According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure,
"[Compatibility note] On some legacy OSes, Logical processors with APIC
ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
Processor Local APIC structure to convey their APIC information to OSPM,
and those processors must be declared in the DSDT using the Processor()
keyword. Logical processors with APIC ID values 255 and greater must use
the Processor Local x2APIC structure and be declared using the Device()
keyword.".

Enumerate CPUs from x2APIC enties with APIC ID values 255 or greater,
when valid CPU from Local APIC is already detected.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
I didn't find any clear statement in the ACPI spec about if a mixture of
Local APIC and x2APIC entries is allowed or not. So it would be great if
this can be clarified.

And FYI, I have auditted a series of Intel servers, and one IVB-EP in
LKP lab and one IVB-EP from PeterZ are the only ones with a mixture of
Local APIC entries + x2APIC entries.

Plat    Status
IVB-EP  valid LAPIC + invalid LAPIC (APIC ID 0xFF) + unknown x2APIC entries (valid APIC ID + Enable bit cleared)
IVB-EP  valid LAPIC + invalid LAPIC (APIC ID 0xFF) + unknown x2APIC entries (valid APIC ID + Enable bit cleared)
CLX     valid LAPIC + invalid LAPIC (APIC ID 0xFF) + invalid x2APIC entries (APIC ID 0xFFFFFFFF)
CLX     valid LAPIC + invalid LAPIC (APIC ID 0xFF) + invalid x2APIC entries (APIC ID 0xFFFFFFFF)
ICX     valid LAPIC only
SPR     valid LAPIC only
SPR     valid x2APIC only
---
 arch/x86/kernel/acpi/boot.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)
  

Comments

Thomas Gleixner July 28, 2023, 12:51 p.m. UTC | #1
On Mon, Jul 03 2023 at 00:28, Zhang Rui wrote:
>  
> +static bool has_lapic_cpus;

Yet another random flag. Sigh.

I really hate this. Why not doing the obvious?

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2452,6 +2452,9 @@ int generic_processor_info(int apicid, i
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
 				phys_cpu_present_map);
 
+	if (physid_isset(apicid, phys_cpu_present_map))
+		return -EBUSY;
+
 	/*
 	 * boot_cpu_physical_apicid is designed to have the apicid
 	 * returned by read_apic_id(), i.e, the apicid of the

As the call sites during MADT parsing ignore the return value anyway,
there is no harm and this is a proper defense against broken tables
which enumerate an APIC twice.

Thanks,

        tglx
  
Thomas Gleixner July 28, 2023, 12:55 p.m. UTC | #2
On Fri, Jul 28 2023 at 14:51, Thomas Gleixner wrote:
> On Mon, Jul 03 2023 at 00:28, Zhang Rui wrote:
>>  
>> +static bool has_lapic_cpus;
>
> Yet another random flag. Sigh.
>
> I really hate this. Why not doing the obvious?
>
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2452,6 +2452,9 @@ int generic_processor_info(int apicid, i
>  	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
>  				phys_cpu_present_map);
>  
> +	if (physid_isset(apicid, phys_cpu_present_map))
> +		return -EBUSY;
> +
>  	/*
>  	 * boot_cpu_physical_apicid is designed to have the apicid
>  	 * returned by read_apic_id(), i.e, the apicid of the
>
> As the call sites during MADT parsing ignore the return value anyway,
> there is no harm and this is a proper defense against broken tables
> which enumerate an APIC twice.

In fact that function should not have a return value at all, but because
it's not clearly separated between boot time and physical hotplug, it
has to have one ...
  
Zhang, Rui July 28, 2023, 4:47 p.m. UTC | #3
On Fri, 2023-07-28 at 14:51 +0200, Thomas Gleixner wrote:
> On Mon, Jul 03 2023 at 00:28, Zhang Rui wrote:
> >  
> > +static bool has_lapic_cpus;
> 
> Yet another random flag. Sigh.
> 
> I really hate this. Why not doing the obvious?
> 
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2452,6 +2452,9 @@ int generic_processor_info(int apicid, i
>         bool boot_cpu_detected =
> physid_isset(boot_cpu_physical_apicid,
>                                 phys_cpu_present_map);
>  
> +       if (physid_isset(apicid, phys_cpu_present_map))
> +               return -EBUSY;
> +
>         /*
>          * boot_cpu_physical_apicid is designed to have the apicid
>          * returned by read_apic_id(), i.e, the apicid of the
> 
> As the call sites during MADT parsing ignore the return value anyway,
> there is no harm and this is a proper defense against broken tables
> which enumerate an APIC twice.

Yeah, this can fix the duplicate APIC ID issue.

But for x2APIC CPUs with unique APIC ID, but smaller than 255, should
we still enumerate them when we already have valid LAPIC entries?

For the Ivebridge-EP 2-socket system,

LAPIC: APIC ID from 0x0 - 0xB, 0x10 - 0x1B, 0x20 - 0x2B, 0x30 - 0x3B
x2APIC: APIC ID from 0x0 - 0x77

# cpuid -1 -l 0xb -s 1
CPU:
      --- level 1 (core) ---
      bits to shift APIC ID to get next = 0x5 (5)
      logical processors at this level  = 0x18 (24)
      level number                      = 0x1 (1)
      level type                        = core (2)
      extended APIC ID                  = 0

If we still enumerates all the x2APIC entries,
1. we got 72 extra possible CPUs from x2APIC
2. with the patch at https://lore.kernel.org/all/87edm36qqb.ffs@tglx/ ,
_max_logical_packages is set to 4 instead of 2.

this is still a problem, right?

thanks,
rui
  
Thomas Gleixner July 29, 2023, 7:07 a.m. UTC | #4
On Fri, Jul 28 2023 at 16:47, Rui Zhang wrote:
> On Fri, 2023-07-28 at 14:51 +0200, Thomas Gleixner wrote:
>> As the call sites during MADT parsing ignore the return value anyway,
>> there is no harm and this is a proper defense against broken tables
>> which enumerate an APIC twice.
>
> Yeah, this can fix the duplicate APIC ID issue.

We want it independent of the below.

> But for x2APIC CPUs with unique APIC ID, but smaller than 255, should
> we still enumerate them when we already have valid LAPIC entries?
>
> For the Ivebridge-EP 2-socket system,
>
> LAPIC: APIC ID from 0x0 - 0xB, 0x10 - 0x1B, 0x20 - 0x2B, 0x30 - 0x3B
> x2APIC: APIC ID from 0x0 - 0x77
>
> # cpuid -1 -l 0xb -s 1
> CPU:
>       --- level 1 (core) ---
>       bits to shift APIC ID to get next = 0x5 (5)
>       logical processors at this level  = 0x18 (24)
>       level number                      = 0x1 (1)
>       level type                        = core (2)
>       extended APIC ID                  = 0
>
> If we still enumerates all the x2APIC entries,
> 1. we got 72 extra possible CPUs from x2APIC
> 2. with the patch at https://lore.kernel.org/all/87edm36qqb.ffs@tglx/ ,
> _max_logical_packages is set to 4 instead of 2.
>
> this is still a problem, right?

Yes, you are right.

But I still don't like the indirection of the returned CPU number. It's
an ACPI selfcontained issue, no?

So something like this should do the trick:

+		count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
+					acpi_parse_lapic, MAX_LOCAL_APIC);
+		if (count)
+			has_lapic_cpus = true;
+		x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
+					acpi_parse_x2apic, MAX_LOCAL_APIC);
 	}
 	if (!count && !x2count) {
 		pr_err("No LAPIC entries present\n");
  
Zhang, Rui July 31, 2023, 1:04 p.m. UTC | #5
On Sat, 2023-07-29 at 09:07 +0200, Thomas Gleixner wrote:
> On Fri, Jul 28 2023 at 16:47, Rui Zhang wrote:
> > On Fri, 2023-07-28 at 14:51 +0200, Thomas Gleixner wrote:
> > > As the call sites during MADT parsing ignore the return value
> > > anyway,
> > > there is no harm and this is a proper defense against broken
> > > tables
> > > which enumerate an APIC twice.
> > 
> > Yeah, this can fix the duplicate APIC ID issue.
> 
> We want it independent of the below.
> 
> > But for x2APIC CPUs with unique APIC ID, but smaller than 255,
> > should
> > we still enumerate them when we already have valid LAPIC entries?
> > 
> > For the Ivebridge-EP 2-socket system,
> > 
> > LAPIC: APIC ID from 0x0 - 0xB, 0x10 - 0x1B, 0x20 - 0x2B, 0x30 -
> > 0x3B
> > x2APIC: APIC ID from 0x0 - 0x77
> > 
> > # cpuid -1 -l 0xb -s 1
> > CPU:
> >       --- level 1 (core) ---
> >       bits to shift APIC ID to get next = 0x5 (5)
> >       logical processors at this level  = 0x18 (24)
> >       level number                      = 0x1 (1)
> >       level type                        = core (2)
> >       extended APIC ID                  = 0
> > 
> > If we still enumerates all the x2APIC entries,
> > 1. we got 72 extra possible CPUs from x2APIC
> > 2. with the patch at
> > https://lore.kernel.org/all/87edm36qqb.ffs@tglx/ ,
> > _max_logical_packages is set to 4 instead of 2.
> > 
> > this is still a problem, right?
> 
> Yes, you are right.
> 
> But I still don't like the indirection of the returned CPU number.
> It's
> an ACPI selfcontained issue, no?
> 
> So something like this should do the trick:
> 
> +               count =
> acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
> +                                       acpi_parse_lapic,
> MAX_LOCAL_APIC);
> +               if (count)
> +                       has_lapic_cpus = true;
> +               x2count =
> acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
> +                                       acpi_parse_x2apic,
> MAX_LOCAL_APIC);
>         }
>         if (!count && !x2count) {
>                 pr_err("No LAPIC entries present\n");

Agreed, thanks for the advice.
Let me try to do this in v2 patch series.

thanks,
rui
  

Patch

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 21b542a6866c..a41124d58e29 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -204,6 +204,8 @@  static bool __init acpi_is_processor_usable(u32 lapic_flags)
 	return false;
 }
 
+static bool has_lapic_cpus;
+
 static int __init
 acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
 {
@@ -232,6 +234,14 @@  acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
 	if (!acpi_is_processor_usable(processor->lapic_flags))
 		return 0;
 
+	/*
+	 * According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure
+	 * when MADT provides both valid LAPIC and x2APIC entries, the APIC ID
+	 * in x2APIC must be equal or greater than 0xff.
+	 */
+	if (has_lapic_cpus && apic_id < 0xff)
+		return 0;
+
 	/*
 	 * We need to register disabled CPU as well to permit
 	 * counting disabled CPUs. This allows us to size
@@ -257,6 +267,7 @@  static int __init
 acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_local_apic *processor = NULL;
+	int cpu;
 
 	processor = (struct acpi_madt_local_apic *)header;
 
@@ -280,10 +291,11 @@  acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
 	 * to not preallocating memory for all NR_CPUS
 	 * when we use CPU hotplug.
 	 */
-	acpi_register_lapic(processor->id,	/* APIC ID */
+	cpu = acpi_register_lapic(processor->id,	/* APIC ID */
 			    processor->processor_id, /* ACPI ID */
 			    processor->lapic_flags & ACPI_MADT_ENABLED);
-
+	if (cpu >= 0)
+		has_lapic_cpus = true;
 	return 0;
 }
 
@@ -1123,21 +1135,10 @@  static int __init acpi_parse_madt_lapic_entries(void)
 				      acpi_parse_sapic, MAX_LOCAL_APIC);
 
 	if (!count) {
-		memset(madt_proc, 0, sizeof(madt_proc));
-		madt_proc[0].id = ACPI_MADT_TYPE_LOCAL_APIC;
-		madt_proc[0].handler = acpi_parse_lapic;
-		madt_proc[1].id = ACPI_MADT_TYPE_LOCAL_X2APIC;
-		madt_proc[1].handler = acpi_parse_x2apic;
-		ret = acpi_table_parse_entries_array(ACPI_SIG_MADT,
-				sizeof(struct acpi_table_madt),
-				madt_proc, ARRAY_SIZE(madt_proc), MAX_LOCAL_APIC);
-		if (ret < 0) {
-			pr_err("Error parsing LAPIC/X2APIC entries\n");
-			return ret;
-		}
-
-		count = madt_proc[0].count;
-		x2count = madt_proc[1].count;
+		count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
+					acpi_parse_lapic, MAX_LOCAL_APIC);
+		x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
+					acpi_parse_x2apic, MAX_LOCAL_APIC);
 	}
 	if (!count && !x2count) {
 		pr_err("No LAPIC entries present\n");