[v2,1/1] PCI: Fix link activation wait logic

Message ID 20240208132205.4550-1-ilpo.jarvinen@linux.intel.com
State New

Commit Message

Ilpo Järvinen Feb. 8, 2024, 1:22 p.m. UTC
If link retraining fails, pcie_failed_link_retrain() returns false, but
pcie_wait_for_link_delay() treats that return value as an error code, so
it translates the failure into success and returns true after the delay.

As a result, pci_bridge_wait_for_secondary_bus() does not print a
message and return failure. Instead it proceeds to pci_dev_wait(), which
spends more than 60 seconds waiting for a device that will not come up.

The long resume delay problem has been observed to occur when resuming
devices that got disconnected while suspended:

pcieport 0000:00:07.2: power state changed by ACPI to D3cold
..
thunderbolt 1-701: device disconnected
pcieport 0000:00:07.2: power state changed by ACPI to D0
pcieport 0000:00:07.2: waiting 100 ms for downstream link
pcieport 0000:57:03.0: waiting 100 ms for downstream link, after activation
pcieport 0000:57:03.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:57:03.0: retraining failed
pcieport 0000:57:03.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:57:03.0: retraining failed
pcieport 0000:73:00.0: not ready 1023ms after resume; waiting
pcieport 0000:73:00.0: not ready 2047ms after resume; waiting
pcieport 0000:73:00.0: not ready 4095ms after resume; waiting
pcieport 0000:73:00.0: not ready 8191ms after resume; waiting
pcieport 0000:73:00.0: not ready 16383ms after resume; waiting
pcieport 0000:73:00.0: not ready 32767ms after resume; waiting
pcieport 0000:73:00.0: not ready 65535ms after resume; giving up
pcieport 0000:57:03.0: pciehp: pciehp_check_link_active: lnk_status = 5041
pcieport 0000:73:00.0: Unable to change power state from D3cold to D0, device inaccessible
pcieport 0000:57:03.0: pciehp: Slot(3): Card not present

Fix the logic error by returning false immediately if
pcie_failed_link_retrain() fails.
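
To make the before/after behaviour concrete, here is a small userspace
model of the decision only. It is a sketch under stated assumptions: the
function names are hypothetical, status_rc stands in for the result of
pcie_wait_for_link_status() (assumed to be 0 or a negative errno, as the
"rc < 0" test in the diff suggests), and retrain_ok stands in for the
bool returned by pcie_failed_link_retrain(). The real code only calls
the retrain helper when the status wait failed; the model simply takes
its result as a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Pre-patch decision, mirroring the three removed lines of the diff. */
bool link_wait_old(int status_rc, bool retrain_ok)
{
	int rc = status_rc;

	if (rc)
		rc = retrain_ok;	/* bool stored in an errno-style variable */
	if (rc)
		return false;		/* reached when retraining succeeded */
	return true;			/* link came up, or retraining failed (the bug) */
}

/* Post-patch decision, mirroring the single added line of the diff. */
bool link_wait_new(int status_rc, bool retrain_ok)
{
	if (status_rc < 0 && !retrain_ok)
		return false;		/* wait failed and retraining failed */
	return true;
}

/* Walk the three cases; aborts if the model disagrees with the comments. */
void check_link_wait_models(void)
{
	/* Wait timed out, retraining failed: old says success, new says failure. */
	assert(link_wait_old(-ETIMEDOUT, false) == true);
	assert(link_wait_new(-ETIMEDOUT, false) == false);

	/* Wait timed out, retraining succeeded: old says failure, new says success. */
	assert(link_wait_old(-ETIMEDOUT, true) == false);
	assert(link_wait_new(-ETIMEDOUT, true) == true);

	/* Link came up on its own: both report success. */
	assert(link_wait_old(0, false) == true);
	assert(link_wait_new(0, false) == true);
}
```

The first case is the one described in this commit message: a failed
retrain was reported as an active link, which sent the caller into the
long wait.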

Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Link: https://lore.kernel.org/linux-pci/a0b070b7-14ce-7cc5-4e6c-6e15f3fcab75@linux.intel.com/T/#t
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---

I think this fix should land together with the Target Speed quirk fix
(making the quirk return false when no retraining was attempted),
because otherwise the intermediate state has additional logic problems.

v2:
- Removed quirks part (still needed but Maciej planned to test and send
  another patch for that)
- Improved commit message

---
 drivers/pci/pci.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
  

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d8f11a078924..ca4159472a72 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5068,9 +5068,7 @@  static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 		msleep(20);
 	rc = pcie_wait_for_link_status(pdev, false, active);
 	if (active) {
-		if (rc)
-			rc = pcie_failed_link_retrain(pdev);
-		if (rc)
+		if (rc < 0 && !pcie_failed_link_retrain(pdev))
 			return false;
 
 		msleep(delay);