[v2] acpi,pci: warn about duplicate IRQ routing entries returned from _PRT

Message ID 20221113173442.5770-1-mat.jonczyk@o2.pl
State New

Commit Message

Mateusz Jończyk Nov. 13, 2022, 5:34 p.m. UTC
On some platforms, the ACPI _PRT function returns duplicate interrupt
routing entries. Linux uses the first matching entry, but sometimes the
second matching entry contains the correct interrupt vector.

Print a warning to dmesg if duplicate interrupt routing entries are
present, so that we can check how many models are affected.

This happens on a Dell Latitude E6500 laptop with the i2c-i801 Intel
SMBus controller. This controller was nonfunctional unless its interrupt
usage was disabled (using the "disable_features=0x10" module parameter).

After investigation, it turned out that the driver was using an
incorrect interrupt vector: in lspci output for this device there was:
        Interrupt: pin B routed to IRQ 19
but after running i2cdetect (without using any i2c-i801 module
parameters) the following was logged to dmesg:

        [...]
        i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
        i801_smbus 0000:00:1f.3: Transaction timeout
        i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
        i801_smbus 0000:00:1f.3: Transaction timeout
        irq 17: nobody cared (try booting with the "irqpoll" option)

Existence of duplicate entries in a table returned by the _PRT method
was confirmed by disassembling the ACPI DSDT table.
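
As an illustration only, the "use the first match, warn about later ones"
behaviour described above can be sketched in plain userspace C. The
struct prt_row type and the prt_find_first() helper are hypothetical
simplifications made up for this example; they are not the kernel's
struct acpi_pci_routing_table or any function in drivers/acpi/pci_irq.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical, simplified stand-in for one routing row returned by _PRT.
 * This is NOT the kernel's struct acpi_pci_routing_table.
 */
struct prt_row {
	unsigned int device;	/* PCI device number */
	unsigned int pin;	/* 0 = INTA ... 3 = INTD */
	const char *source;	/* interrupt link name, "" if none */
	unsigned int index;	/* index into the source (or a GSI) */
};

/*
 * Return the first row matching (device, pin), or NULL if none matches.
 * Every later matching row is counted in *dups and reported on stderr,
 * but the first match is still the one that is returned.
 */
static const struct prt_row *prt_find_first(const struct prt_row *rows,
					    size_t n, unsigned int device,
					    unsigned int pin,
					    unsigned int *dups)
{
	const struct prt_row *match = NULL;
	size_t i;

	*dups = 0;
	for (i = 0; i < n; i++) {
		if (rows[i].device != device || rows[i].pin != pin)
			continue;
		if (!match) {
			match = &rows[i];	/* remember the first match */
			continue;
		}
		(*dups)++;
		fprintf(stderr,
			"duplicate routing entry for device %02x[INT%c]: %s[%u] and %s[%u]\n",
			device, (char)('A' + pin),
			match->source, match->index,
			rows[i].source, rows[i].index);
	}
	return match;
}
```

Calling prt_find_first() on a table that contains two rows for the same
device and pin returns the first row and sets *dups to 1, which mirrors
the "warn but keep the first entry" policy of this patch.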

Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jean Delvare <jdelvare@suse.com>

--
v2: - add a newline at the end of the kernel log message,
    - replace: "if (match == NULL)" -> "if (!match)"
    - patch description tweaks.

Tested on two computers, including the affected Dell Latitude E6500 laptop.

 drivers/acpi/pci_irq.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)


base-commit: f0c4d9fc9cc9462659728d168387191387e903cc
  

Comments

Jean Delvare Nov. 15, 2022, 8:36 a.m. UTC | #1
Hi Mateusz,

On Sun, 13 Nov 2022 18:34:42 +0100, Mateusz Jończyk wrote:
> On some platforms, the ACPI _PRT function returns duplicate interrupt
> routing entries. Linux uses the first matching entry, but sometimes the
> second matching entry contains the correct interrupt vector.
> 
> Print a warning to dmesg if duplicate interrupt routing entries are
> present, so that we could check how many models are affected.

Excellent idea. We want hardware manufacturers to fix such bugs in the
firmware, and the best way for this to happen is to report them
whenever they are encountered.

> This happens on a Dell Latitude E6500 laptop with the i2c-i801 Intel
> SMBus controller. This controller was nonfunctional unless its interrupt
> usage was disabled (using the "disable_features=0x10" module parameter).
> 
> After investigation, it turned out that the driver was using an
> incorrect interrupt vector: in lspci output for this device there was:
>         Interrupt: pin B routed to IRQ 19
> but after running i2cdetect (without using any i2c-i801 module
> parameters) the following was logged to dmesg:
> 
>         [...]
>         i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
>         i801_smbus 0000:00:1f.3: Transaction timeout
>         i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
>         i801_smbus 0000:00:1f.3: Transaction timeout
>         irq 17: nobody cared (try booting with the "irqpoll" option)
> 
> Existence of duplicate entries in a table returned by the _PRT method
> was confirmed by disassembling the ACPI DSDT table.

Excuse a probably stupid question, but what would happen if we simply
ignored the IRQ routing information from ACPI in this case? Would we
fall back to some pure-PCI routing logic which might have a chance of
finding the right IRQ routing (matching the second ACPI routing entry in
this case)?

> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Jean Delvare <jdelvare@suse.com>
> 
> --
> v2: - add a newline at the end of the kernel log message,
>     - replace: "if (match == NULL)" -> "if (!match)"
>     - patch description tweaks.
> 
> Tested on two computers, including the affected Dell Latitude E6500 laptop.
> 
>  drivers/acpi/pci_irq.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index 08e15774fb9f..a4e41b7b71ed 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -203,6 +203,8 @@ static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
>  	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
>  	struct acpi_pci_routing_table *entry;
>  	acpi_handle handle = NULL;
> +	struct acpi_prt_entry *match = NULL;
> +	const char *match_int_source = NULL;
>  
>  	if (dev->bus->bridge)
>  		handle = ACPI_HANDLE(dev->bus->bridge);
> @@ -219,13 +221,30 @@ static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
>  
>  	entry = buffer.pointer;
>  	while (entry && (entry->length > 0)) {
> -		if (!acpi_pci_irq_check_entry(handle, dev, pin,
> -						 entry, entry_ptr))
> -			break;
> +		struct acpi_prt_entry *curr;
> +
> +		if (!acpi_pci_irq_check_entry(handle, dev, pin, entry, &curr)) {
> +			if (!match) {
> +				match = curr;
> +				match_int_source = entry->source;
> +			} else {
> +				pr_warn(FW_BUG
> +				"ACPI _PRT returned duplicate IRQ routing entries for device "
> +					"%04x:%02x:%02x[INT%c]: %s[%d] and %s[%d].\n",

The beginning of the string should be aligned with the opening
parenthesis, and the string should be on a single line (this is an
encouraged exception to the 80-column rule). I would also omit the
trailing dot for consistency.

> +					curr->id.segment, curr->id.bus, curr->id.device,

Is the IRQ per PCI device, or per PCI function? If the latter, then you
should print "%02x.%x" instead of just "%02x", with the extra element
being curr->id.function.

> +					pin_name(curr->pin),
> +					match_int_source, match->index,
> +					entry->source, curr->index);
> +				// we use the first matching entry nonetheless

The rest of the file uses /* C89-style comments */ so I would stick to
that for consistency.

> +			}
> +		}
> +
>  		entry = (struct acpi_pci_routing_table *)
>  		    ((unsigned long)entry + entry->length);
>  	}
>  
> +	*entry_ptr = match;
> +
>  	kfree(buffer.pointer);
>  	return 0;
>  }

Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>

(Tested on a Dell OptiPlex 9020 not affected by the problem.)
  
Mateusz Jończyk Nov. 23, 2022, 8:28 p.m. UTC | #2
Hello,

On 15.11.2022 at 09:36, Jean Delvare wrote:
> Hi Mateusz,
>
> On Sun, 13 Nov 2022 18:34:42 +0100, Mateusz Jończyk wrote:
>> On some platforms, the ACPI _PRT function returns duplicate interrupt
>> routing entries. Linux uses the first matching entry, but sometimes the
>> second matching entry contains the correct interrupt vector.
>>
>> Print a warning to dmesg if duplicate interrupt routing entries are
>> present, so that we could check how many models are affected.
> Excellent idea. We want hardware manufacturers to fix such bugs in the
> firmware, and the best way for this to happen is to report them
> whenever they are encountered.
>
>> This happens on a Dell Latitude E6500 laptop with the i2c-i801 Intel
>> SMBus controller. This controller was nonfunctional unless its interrupt
>> usage was disabled (using the "disable_features=0x10" module parameter).
>>
>> After investigation, it turned out that the driver was using an
>> incorrect interrupt vector: in lspci output for this device there was:
>>         Interrupt: pin B routed to IRQ 19
>> but after running i2cdetect (without using any i2c-i801 module
>> parameters) the following was logged to dmesg:
>>
>>         [...]
>>         i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
>>         i801_smbus 0000:00:1f.3: Transaction timeout
>>         i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
>>         i801_smbus 0000:00:1f.3: Transaction timeout
>>         irq 17: nobody cared (try booting with the "irqpoll" option)
>>
>> Existence of duplicate entries in a table returned by the _PRT method
>> was confirmed by disassembling the ACPI DSDT table.
> Excuse a probably stupid question, but what would happen if we simply
> ignored the IRQ routing information from ACPI in this case? Would we
> fall back to some pure-PCI routing logic which might have a chance of
> finding the right IRQ routing (matching the second ACPI routing entry in
> this case)?

From what I understand, the PCI IRQ routing information is not discoverable
by probing the hardware (in the general case). It has to be obtained from
the ACPI tables (or perhaps from the obsolete MP tables, also provided by
firmware). See https://docs.kernel.org/PCI/acpi-info.html :

> For example, there’s no standard hardware mechanism for enumerating PCI
> host bridges, so the ACPI namespace must describe each host bridge,
> the method for accessing PCI config space below it, the address space
> windows the host bridge forwards to PCI (using _CRS), and the routing
> of legacy INTx interrupts (using _PRT).

(A PCI host bridge connects the CPU cores to the PCI bus; it is the root of
the PCI device tree. This patch concerns the "legacy INTx interrupts"
mentioned above.)

In the case of this particular laptop, however, it should be possible to obtain
the information by reading chipset registers, which are documented at
https://www.intel.com/content/www/us/en/io/io-controller-hub-9-datasheet.html
But this would be difficult to implement for every chipset.

>> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Len Brown <lenb@kernel.org>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Jean Delvare <jdelvare@suse.com>
>>
>> --
>> v2: - add a newline at the end of the kernel log message,
>>     - replace: "if (match == NULL)" -> "if (!match)"
>>     - patch description tweaks.
>>
>> Tested on two computers, including the affected Dell Latitude E6500 laptop.
>>
>>  drivers/acpi/pci_irq.c | 25 ++++++++++++++++++++++---
>>  1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>> index 08e15774fb9f..a4e41b7b71ed 100644
>> --- a/drivers/acpi/pci_irq.c
>> +++ b/drivers/acpi/pci_irq.c
>> @@ -203,6 +203,8 @@ static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
>>  	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
>>  	struct acpi_pci_routing_table *entry;
>>  	acpi_handle handle = NULL;
>> +	struct acpi_prt_entry *match = NULL;
>> +	const char *match_int_source = NULL;
>>  
>>  	if (dev->bus->bridge)
>>  		handle = ACPI_HANDLE(dev->bus->bridge);
>> @@ -219,13 +221,30 @@ static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
>>  
>>  	entry = buffer.pointer;
>>  	while (entry && (entry->length > 0)) {
>> -		if (!acpi_pci_irq_check_entry(handle, dev, pin,
>> -						 entry, entry_ptr))
>> -			break;
>> +		struct acpi_prt_entry *curr;
>> +
>> +		if (!acpi_pci_irq_check_entry(handle, dev, pin, entry, &curr)) {
>> +			if (!match) {
>> +				match = curr;
>> +				match_int_source = entry->source;
>> +			} else {
>> +				pr_warn(FW_BUG
>> +				"ACPI _PRT returned duplicate IRQ routing entries for device "
>> +					"%04x:%02x:%02x[INT%c]: %s[%d] and %s[%d].\n",
> The beginning of the string should be aligned with the opening
> parenthesis, and the string should be on a single line (this is an
> encouraged exception to the 80-column rule). I would also omit the
> trailing dot for consistency.
OK
>> +					curr->id.segment, curr->id.bus, curr->id.device,
> Is the IRQ per PCI device, or per PCI function? If the latter, then you
> should print "%02x.%x" instead of just "%02x", with the extra element
> being curr->id.function.

This is per PCI device.
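
To illustrate, here is a minimal userspace sketch of how a _PRT row is
matched, assuming the field layout given in the ACPI specification: the
high word of the Address field holds the device number, the low word is
0xFFFF (meaning all functions), and the Pin field is 0-based (0 = INTA).
The helper names below are made up for this example and are not kernel
functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device number: the high word of the _PRT Address field. */
static unsigned int prt_address_device(uint64_t address)
{
	return (unsigned int)((address >> 16) & 0xFFFF);
}

/* _PRT pins are 0-based (0 = INTA), so this returns 'A'..'D' for 0..3. */
static char prt_pin_letter(unsigned int prt_pin)
{
	return (char)('A' + prt_pin);
}

/*
 * pci_pin is the 1-based value of the PCI config space Interrupt Pin
 * register (1 = INTA). The function number takes no part in the match,
 * because the low word of the Address field does not name a function.
 */
static bool prt_row_matches(uint64_t address, unsigned int prt_pin,
			    unsigned int pci_device, unsigned int pci_pin)
{
	return prt_address_device(address) == pci_device &&
	       prt_pin + 1 == pci_pin;
}
```

With this layout, a row with Address 0x001FFFFF and Pin 1 matches INTB of
every function of device 0x1f, so adding the function number to the
warning would not identify the entry any more precisely.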

[snip]

> Reviewed-by: Jean Delvare <jdelvare@suse.de>
> Tested-by: Jean Delvare <jdelvare@suse.de>
>
> (Tested on a Dell OptiPlex 9020 not affected by the problem.)
>
Thank you for reviewing.

Greetings,

Mateusz
  

Patch

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 08e15774fb9f..a4e41b7b71ed 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -203,6 +203,8 @@  static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
 	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 	struct acpi_pci_routing_table *entry;
 	acpi_handle handle = NULL;
+	struct acpi_prt_entry *match = NULL;
+	const char *match_int_source = NULL;
 
 	if (dev->bus->bridge)
 		handle = ACPI_HANDLE(dev->bus->bridge);
@@ -219,13 +221,30 @@  static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
 
 	entry = buffer.pointer;
 	while (entry && (entry->length > 0)) {
-		if (!acpi_pci_irq_check_entry(handle, dev, pin,
-						 entry, entry_ptr))
-			break;
+		struct acpi_prt_entry *curr;
+
+		if (!acpi_pci_irq_check_entry(handle, dev, pin, entry, &curr)) {
+			if (!match) {
+				match = curr;
+				match_int_source = entry->source;
+			} else {
+				pr_warn(FW_BUG
+				"ACPI _PRT returned duplicate IRQ routing entries for device "
+					"%04x:%02x:%02x[INT%c]: %s[%d] and %s[%d].\n",
+					curr->id.segment, curr->id.bus, curr->id.device,
+					pin_name(curr->pin),
+					match_int_source, match->index,
+					entry->source, curr->index);
+				// we use the first matching entry nonetheless
+			}
+		}
+
 		entry = (struct acpi_pci_routing_table *)
 		    ((unsigned long)entry + entry->length);
 	}
 
+	*entry_ptr = match;
+
 	kfree(buffer.pointer);
 	return 0;
 }