[V2] PCI: Clear errors logged in Secondary Status Register

Message ID 20240116143258.483235-1-vidyas@nvidia.com
State New
Headers
Series [V2] PCI: Clear errors logged in Secondary Status Register |

Commit Message

Vidya Sagar Jan. 16, 2024, 2:32 p.m. UTC
  The enumeration process leaves the 'Received Master Abort' bit set in
the Secondary Status Register of the downstream port in the following
scenarios.

(1) The device connected to the downstream port has ARI capability
    and that makes the kernel set the 'ARI Forwarding Enable' bit in
    the Device Control 2 Register of the downstream port. This
    effectively makes the downstream port forward the configuration
    requests targeting the devices downstream of it, even though they
    don't exist in reality. It causes the downstream devices return
    completions with UR set in the status in turn causing 'Received
    Master Abort' bit set.

    In contrast, if the downstream device doesn't have ARI capability,
    the 'ARI Forwarding Enable' bit in the downstream port is not set
    and any configuration requests targeting the downstream devices
    that don't exist are terminated (section 6.13 of PCI Express Base
    6.0 spec) in the downstream port itself resulting in no change of
    the 'Received Master Abort' bit.

(2) A PCIe switch is connected to the downstream port and when the
    enumeration flow tries to explore the presence of devices that
    don't really exist downstream of the switch, the downstream
    port receives the completions with UR set causing the 'Received
    Master Abort' bit set.

Clear 'Received Master Abort' bit to keep the bridge device in a clean
state post enumeration.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
V2:
* Changed commit message based on Bjorn's feedback

 drivers/pci/probe.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Bjorn Helgaas Jan. 22, 2024, 11 p.m. UTC | #1
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
> The enumeration process leaves the 'Received Master Abort' bit set in
> the Secondary Status Register of the downstream port in the following
> scenarios.
> 
> (1) The device connected to the downstream port has ARI capability
>     and that makes the kernel set the 'ARI Forwarding Enable' bit in
>     the Device Control 2 Register of the downstream port. This
>     effectively makes the downstream port forward the configuration
>     requests targeting the devices downstream of it, even though they
>     don't exist in reality. It causes the downstream devices return
>     completions with UR set in the status in turn causing 'Received
>     Master Abort' bit set.
> 
>     In contrast, if the downstream device doesn't have ARI capability,
>     the 'ARI Forwarding Enable' bit in the downstream port is not set
>     and any configuration requests targeting the downstream devices
>     that don't exist are terminated (section 6.13 of PCI Express Base
>     6.0 spec) in the downstream port itself resulting in no change of
>     the 'Received Master Abort' bit.
> 
> (2) A PCIe switch is connected to the downstream port and when the
>     enumeration flow tries to explore the presence of devices that
>     don't really exist downstream of the switch, the downstream
>     port receives the completions with UR set causing the 'Received
>     Master Abort' bit set.

Are these the only possible ways this error is logged?  I expected
them to be logged when we enumerate below a Root Port that has nothing
attached, for example.

Does clearing them in pci_scan_bridge_extend() cover all ways this
error might be logged during enumeration?  I can't remember whether
all enumeration goes through this path.

> Clear 'Received Master Abort' bit to keep the bridge device in a clean
> state post enumeration.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
> V2:
> * Changed commit message based on Bjorn's feedback
> 
>  drivers/pci/probe.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 795534589b98..640d2871b061 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
>  	}
>  
>  out:
> +	/* Clear errors in the Secondary Status Register */
> +	pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff);
> +
>  	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
>  
>  	pm_runtime_put(&dev->dev);
> -- 
> 2.25.1
>
  

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 795534589b98..640d2871b061 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1470,6 +1470,9 @@  static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 	}
 
 out:
+	/* Clear errors in the Secondary Status Register */
+	pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff);
+
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
 
 	pm_runtime_put(&dev->dev);