[v2,1/2] PCI: pciehp: Add support for async hotplug with native AER and DPC/EDR

Message ID 20230418210526.36514-2-Smita.KoralahalliChannabasappa@amd.com
State New
Series PCI: pciehp: Add support for native AER and DPC handling on async remove

Commit Message

Smita Koralahalli April 18, 2023, 9:05 p.m. UTC
  According to Section 6.7.6 of the PCIe Base Specification [1], async
removal with DPC and EDR may be unexpected and may result in a Surprise
Down error. This error is just a side effect of hot remove. Most of the
time these errors are hidden from the OS, as current systems rely on the
Firmware-First model for AER and DPC, where the error handling (the side
effects of async remove) and other necessary HW sequencing actions are
taken care of by the FW and are invisible to the OS. However, the
FW-First model poses issues when rolling out updates or fixing bugs, as
the servers need to be brought down for firmware updates.

Add support for async hot-plug with native AER and DPC/EDR. Here, the OS
is responsible for handling async add and remove, along with the AER and
DPC events which are generated as a side effect of async remove.

The implementation is as follows: on an async remove, a DPC is triggered
along with a Presence Detect State change. Determine whether it is an
async remove by checking that the DPC Trigger Status in the DPC Status
Register and the Surprise Down Error Status in the AER Uncorrectable
Error Status Register are non-zero. If so, treat the DPC event as a side
effect of async remove, clear the error status registers and continue
with the hot-plug teardown routines. If not, follow the existing routine
to handle AER and DPC errors.

Dmesg before:

  pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
  pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
  pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
  pcieport 0000:00:01.4:   device [1022:14ab] error status/mask=00000020/04004000
  pcieport 0000:00:01.4:    [ 5] SDES (First)
  nvme nvme2: frozen state error detected, reset controller
  pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
  pcieport 0000:00:01.4: AER: subordinate device reset failed
  pcieport 0000:00:01.4: AER: device recovery failed
  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
  nvme2n1: detected capacity change from 1953525168 to 0
  pci 0000:04:00.0: Removing from iommu group 49

Dmesg after:

 pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
 nvme1n1: detected capacity change from 1953525168 to 0
 pci 0000:04:00.0: Removing from iommu group 37

[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
    https://members.pcisig.com/wg/PCI-SIG/document/16609

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
v2:
	Fixed indentation. (Bjorn)
	Removed irrelevant dmesg logs. (Bjorn)
	Rephrased commit message, to be clear on native vs FW-First
	handling. (Bjorn and Sathyanarayanan)
	Prefix changed from pciehp_ to dpc_. (Lukas)
	Clearing ARI and AtomicOp Requester is performed as part of
	(de-)enumeration in pciehp_unconfigure_device(). (Lukas)
	Changed to clearing all optional capabilities in DEVCTL2.
	OS-First -> native. (Sathyanarayanan)

Please note that I have provided an explanation of why I'm not setting
the Surprise Down bit in the AER Uncorrectable Error Mask Register here:
https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/

Also, while testing I noticed that PCI_STATUS and PCI_EXP_DEVSTA will be
set on an async remove and will not be cleared while the device is
brought down. I have included clearing them here in order to mask any
appearance that there was an error, as well as to duplicate our BIOS
functionality. I can remove this if it's not necessary.

On AMD systems we observe a Presence Detect State change along with the
DPC event on an async remove. Hence, the errors observed are benign on
AMD systems and the device will be brought down normally via PDSC. But
the errors logged might confuse users.
---
 drivers/pci/pcie/dpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
  

Comments

Kuppuswamy Sathyanarayanan May 9, 2023, 9:10 p.m. UTC | #1
On 4/18/23 2:05 PM, Smita Koralahalli wrote:
> According to Section 6.7.6 of PCIe Base Specification [1], async removal
> with DPC and EDR may be unexpected and may result in a Surprise Down error.
> This error is just a side effect of hot remove. Most of the time, these
> errors will be abstract to the OS as current systems rely on Firmware-First
> model for AER and DPC, where the error handling (side effects of async
> remove) and other necessary HW sequencing actions is taken care by the FW
> and is abstract to the OS. However, FW-First model poses issues while
> rolling out updates or fixing bugs as the servers need to be brought down
> for firmware updates.
> 
> Add support for async hot-plug with native AER and DPC/EDR. Here, OS is
> responsible for handling async add and remove along with handling of AER
> and DPC events which are generated as a side-effect of async remove.

PCIe spec r6.0, sec 6.7.6 mentions that async removal can be handled
via DPC. So why treat it as a special case here? What do we gain with this
patch other than preventing the error recovery process?

> 
> The implementation is as follows: On an async remove a DPC is triggered
> along with a Presence Detect State change. Determine it's an async remove
> by checking for DPC Trigger Status in DPC Status Register and Surprise Down
> Error Status in AER Uncorrected Error Status to be non-zero. If true, treat
> the DPC event as a side-effect of async remove, clear the error status
> registers and continue with hot-plug tear down routines. If not, follow the
> existing routine to handle AER and DPC errors.
> 
> Dmesg before:
> 
>   pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
>   pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
>   pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
>   pcieport 0000:00:01.4:   device [1022:14ab] error status/mask=00000020/04004000
>   pcieport 0000:00:01.4:    [ 5] SDES (First)
>   nvme nvme2: frozen state error detected, reset controller
>   pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
>   pcieport 0000:00:01.4: AER: subordinate device reset failed
>   pcieport 0000:00:01.4: AER: device recovery failed
>   pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
>   nvme2n1: detected capacity change from 1953525168 to 0
>   pci 0000:04:00.0: Removing from iommu group 49
> 
> Dmesg after:
> 
>  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
>  nvme1n1: detected capacity change from 1953525168 to 0
>  pci 0000:04:00.0: Removing from iommu group 37
> 
> [1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
>     https://members.pcisig.com/wg/PCI-SIG/document/16609
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> v2:
> 	Indentation is taken care. (Bjorn)
> 	Unrelevant dmesg logs are removed. (Bjorn)
> 	Rephrased commit message, to be clear on native vs FW-First
> 	handling. (Bjorn and Sathyanarayanan)
> 	Prefix changed from pciehp_ to dpc_. (Lukas)
> 	Clearing ARI and AtomicOp Requester are performed as a part of
> 	(de-)enumeration in pciehp_unconfigure_device(). (Lukas)
> 	Changed to clearing all optional capabilities in DEVCTL2.
> 	OS-First -> native. (Sathyanarayanan)
> 
> Please note that, I have provided explanation why I'm not setting the
> Surprise Down bit in uncorrectable error mask register in AER.
> https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/
> 
> Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set
> on an async remove and will not be cleared while the device is brought
> down. I have included clearing them here in order to mask any kind of
> appearance that there was an error and as well duplicating our BIOS
> functionality. I can remove if its not necessary.
> 
> On AMD systems we observe Presence Detect State change along with DPC
> event on an async remove. Hence, the errors observed are benign on AMD
> systems and the device will be brought down normally with PDSC. But the
> errors logged might confuse users.
> ---
>  drivers/pci/pcie/dpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index a5d7c69b764e..78559188b9ac 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -293,10 +293,60 @@ void dpc_process_error(struct pci_dev *pdev)
>  	}
>  }
>  
> +static void pci_clear_surpdn_errors(struct pci_dev *pdev)
> +{
> +	u16 reg16;
> +	u32 reg32;
> +
> +	pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, &reg32);
> +	pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32);
> +
> +	pci_read_config_word(pdev, PCI_STATUS, &reg16);
> +	pci_write_config_word(pdev, PCI_STATUS, reg16);
> +
> +	pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &reg16);
> +	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
> +}
> +
> +static void dpc_handle_surprise_removal(struct pci_dev *pdev)
> +{
> +	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
> +		return;
> +
> +	/*
> +	 * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
> +	 * removal might be unexpected, errors might be reported as a side
> +	 * effect of the event and software should handle them as an expected
> +	 * part of this event.
> +	 */
> +	pci_aer_raw_clear_status(pdev);
> +	pci_clear_surpdn_errors(pdev);
> +
> +	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
> +			      PCI_EXP_DPC_STATUS_TRIGGER);
> +}
> +
> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
> +{
> +	u16 status;
> +
> +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
> +
> +	if (!(status & PCI_ERR_UNC_SURPDN))
> +		return false;
> +
> +	dpc_handle_surprise_removal(pdev);
> +
> +	return true;
> +}
> +
>  static irqreturn_t dpc_handler(int irq, void *context)
>  {
>  	struct pci_dev *pdev = context;
>  
> +	if (dpc_is_surprise_removal(pdev))
> +		return IRQ_HANDLED;
> +
>  	dpc_process_error(pdev);
>  
>  	/* We configure DPC so it only triggers on ERR_FATAL */
  
Lukas Wunner May 11, 2023, 6:56 a.m. UTC | #2
On Wed, May 10, 2023 at 02:42:13PM -0700, Smita Koralahalli wrote:
> As far as I can see, async removal solely with DPC is not handled properly
> in Linux.

The dpc driver can react to a DPC event and attempt reset recovery.

But it doesn't do de-enumeration or re-enumeration of subordinate devices.
It also doesn't do slot handling (enable/disable Power Controller etc).
That's only implemented in the hotplug driver.

PCIe r6.0.1 contains appendix I.2 which basically suggests to "use DPC"
for async hot-plug but that doesn't really seem to make sense.


> On AMD systems, PDSC is triggered along with DPC on an async remove. And
> this PDSC event (hotplug handler) will unconfigure and uninitialize the
> driver and device.
> This is one thing which I wanted clarity on, as per my question in v1:
> whether all systems trigger PDSC on an async remove along with DPC?

In principle, yes.  Actually the hotplug driver will see both a DLLSC
*and* a PDC event and will react to whichever comes first.  Experience
has shown that the two events may occur in arbitrary order and with
significant delays in-between.

There are systems which erroneously hardwire Presence Detect to zero.
The hotplug driver works even with those.  It solely relies on the
DLLSC event then, see commit 80696f991424 ("PCI: pciehp: Tolerate
Presence Detect hardwired to zero").


> I feel there are two approaches going forward. Since the hotplug handler
> is also triggered with PDSC, rely on it to bring down the device and
> prevent calling the error recovery process in the dpc handler, as it's
> not a true error event. I have taken this approach.
> 
> Or, don't call the hotplug handler at all and rely on DPC solely to
> bring down the device. But here, there would need to be additional
> callbacks to unconfigure and uninitialize the PCIe driver, and currently
> I only see report_slot_reset() being called from error_recovery(), and I
> don't think it unconfigures the driver/device.

The latter approach doesn't really make sense to me because we'd have to
duplicate all the slot handling and device de-/re-enumeration in the dpc
driver.

Let's try masking Surprise Down Errors first and see how that goes.
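
What I mean by masking, as an untested sketch: set the Surprise Down bit
in the port's AER Uncorrectable Error Mask register, e.g. when DPC is set
up, so the port no longer signals an interrupt for it:

	u32 mask;

	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, &mask);
	mask |= PCI_ERR_UNC_SURPDN;
	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, mask);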

Thanks,

Lukas
  
Lukas Wunner May 16, 2023, 10:10 a.m. UTC | #3
On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:
> According to Section 6.7.6 of PCIe Base Specification [1], async removal
> with DPC and EDR may be unexpected and may result in a Surprise Down error.
> This error is just a side effect of hot remove. Most of the time, these
> errors will be abstract to the OS as current systems rely on Firmware-First
> model for AER and DPC, where the error handling (side effects of async
> remove) and other necessary HW sequencing actions is taken care by the FW
> and is abstract to the OS. However, FW-First model poses issues while
> rolling out updates or fixing bugs as the servers need to be brought down
> for firmware updates.
> 
> Add support for async hot-plug with native AER and DPC/EDR. Here, OS is
> responsible for handling async add and remove along with handling of AER
> and DPC events which are generated as a side-effect of async remove.

I think you can probably leave out the details about Firmware-First.
Pointing to 6.7.6 and the fact that Surprise Down Errors may occur as
an expected side-effect of surprise removal is sufficient.  They should
be treated as such.

You also want to point out what you're trying to achieve here, i.e. get
rid of irritating log messages and a 1 sec delay while pciehp waits for
DPC recovery.


> Please note that, I have provided explanation why I'm not setting the
> Surprise Down bit in uncorrectable error mask register in AER.
> https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/

Add an explanation to the commit message that masking Surprise Down Errors
was explored as an alternative approach, but scrapped due to the odd
behavior that masking only avoids the interrupt, but still records an
error per PCIe r6.0.1 sec 6.2.3.2.2.  That stale error is going to be
reported the next time some error other than Surprise Down is handled.


> Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set
> on an async remove and will not be cleared while the device is brought
> down. I have included clearing them here in order to mask any kind of
> appearance that there was an error and as well duplicating our BIOS
> functionality. I can remove if its not necessary.

Which bits are set exactly?  Can you constrain the register write to
those bits?


> +static void dpc_handle_surprise_removal(struct pci_dev *pdev)
> +{
> +	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
> +		return;

Emit an error message here?
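
E.g. (untested, message wording is just a suggestion) so the early return
doesn't silently swallow the event:

	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) {
		pci_warn(pdev, "surprise removal: root port not inactive\n");
		return;
	}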


> +	/*
> +	 * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
> +	 * removal might be unexpected, errors might be reported as a side
> +	 * effect of the event and software should handle them as an expected
> +	 * part of this event.
> +	 */

I'd move that code comment to into dpc_handler() above the
"if (dpc_is_surprise_removal(pdev))" check.


> +	pci_aer_raw_clear_status(pdev);
> +	pci_clear_surpdn_errors(pdev);
> +
> +	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
> +			      PCI_EXP_DPC_STATUS_TRIGGER);
> +}

Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
of the function to wake up a pciehp handler waiting for DPC recovery?


> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
> +{
> +	u16 status;
> +
> +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
> +
> +	if (!(status & PCI_ERR_UNC_SURPDN))
> +		return false;
> +

You need an additional check for pdev->is_hotplug_bridge here.

And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.

Return false if either of them isn't set.


> +	dpc_handle_surprise_removal(pdev);
> +
> +	return true;
> +}

A function named "..._is_..." should not have any side effects.
So move the dpc_handle_surprise_removal() call down into dpc_handler()
before the "return IRQ_HANDLED;" statement.


What about ports which support AER but not DPC?  Don't you need to
amend aer.c in that case?  I suppose you don't have hardware without
DPC to test that?

Thanks,

Lukas
  
Smita Koralahalli May 22, 2023, 10:23 p.m. UTC | #4
Hi,

On 5/16/2023 3:10 AM, Lukas Wunner wrote:
> On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:
>> According to Section 6.7.6 of PCIe Base Specification [1], async removal
>> with DPC and EDR may be unexpected and may result in a Surprise Down error.
>> This error is just a side effect of hot remove. Most of the time, these
>> errors will be abstract to the OS as current systems rely on Firmware-First
>> model for AER and DPC, where the error handling (side effects of async
>> remove) and other necessary HW sequencing actions is taken care by the FW
>> and is abstract to the OS. However, FW-First model poses issues while
>> rolling out updates or fixing bugs as the servers need to be brought down
>> for firmware updates.
>>
>> Add support for async hot-plug with native AER and DPC/EDR. Here, OS is
>> responsible for handling async add and remove along with handling of AER
>> and DPC events which are generated as a side-effect of async remove.
> 
> I think you can probably leave out the details about Firmware-First.
> Pointing to 6.7.6 and the fact that Surprise Down Errors may occur as
> an expected side-effect of surprise removal is sufficient.  They should
> be treated as such.
> 
> You also want to point out what you're trying to achieve here, i.e. get
> rid of irritating log messages and a 1 sec delay while pciehp waits for
> DPC recovery.

Okay.

> 
> 
>> Please note that, I have provided explanation why I'm not setting the
>> Surprise Down bit in uncorrectable error mask register in AER.
>> https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/
> 
> Add an explanation to the commit message that masking Surprise Down Errors
> was explored as an alternative approach, but scrapped due to the odd
> behavior that masking only avoids the interrupt, but still records an
> error per PCIe r6.0.1 sec 6.2.3.2.2.  That stale error is going to be
> reported the next time some error other than Surprise Down is handled.
> 
> 

Okay.

>> Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set
>> on an async remove and will not be cleared while the device is brought
>> down. I have included clearing them here in order to mask any kind of
>> appearance that there was an error and as well duplicating our BIOS
>> functionality. I can remove if its not necessary.
> 
> Which bits are set exactly?  Can you constrain the register write to
> those bits?
>
Hmm, I was mostly trying to follow the approach taken for AER in
pci_aer_raw_clear_status(), where the status registers are cleared
unconditionally. I also thought it might be better in case we are dealing
with legacy endpoints or the like.

I see these bits set in status registers:
PCI_ERR_UNCOR_STATUS 0x20 (Surprise Down)
PCI_EXP_DPC_RP_PIO_STATUS 0x10000 (Memory Request received UR Completion)
PCI_EXP_DEVSTA 0x604 (fatal error detected)

> 
>> +static void dpc_handle_surprise_removal(struct pci_dev *pdev)
>> +{
>> +	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
>> +		return;
> 
> Emit an error message here?

Okay

> 
> 
>> +	/*
>> +	 * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
>> +	 * removal might be unexpected, errors might be reported as a side
>> +	 * effect of the event and software should handle them as an expected
>> +	 * part of this event.
>> +	 */
> 
> I'd move that code comment to into dpc_handler() above the
> "if (dpc_is_surprise_removal(pdev))" check.

Okay.

> 
> 
>> +	pci_aer_raw_clear_status(pdev);
>> +	pci_clear_surpdn_errors(pdev);
>> +
>> +	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
>> +			      PCI_EXP_DPC_STATUS_TRIGGER);
>> +}
> 
> Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
> of the function to wake up a pciehp handler waiting for DPC recovery?

I don't think so. The pciehp handler is, however, getting invoked
simultaneously due to a PDSC or DLLSC state change, right? Let me know if
I'm missing anything here.

> 
> 
>> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
>> +{
>> +	u16 status;
>> +
>> +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
>> +
>> +	if (!(status & PCI_ERR_UNC_SURPDN))
>> +		return false;
>> +
> 
> You need an additional check for pdev->is_hotplug_bridge here.
> 
> And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.
> 
> Return false if either of them isn't set.

Return false only if PCI_EXP_SLTCAP isn't set, correct?
PCI_EXP_SLTCAP_HPS should be disabled if DPC is enabled.

Implementation notes in 6.7.6 says that:
"The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
Surprise bit in the Slot Capabilities Register being Set, is deprecated
for use with async hot-plug. DPC is the recommended mechanism for 
supporting async hot-plug."

Platform FW will disable the SLTCAP_HPS bit at boot time to enable async 
hotplug on AMD devices.

Probably check if the SLTCAP_HPS bit is set and return false if it is?

> 
> 
>> +	dpc_handle_surprise_removal(pdev);
>> +
>> +	return true;
>> +}
> 
> A function named "..._is_..." should not have any side effects.
> So move the dpc_handle_surprise_removal() call down into dpc_handler()
> before the "return IRQ_HANDLED;" statement.

Okay.

> 
> 
> What about ports which support AER but not DPC?  Don't you need to
> amend aer.c in that case?  I suppose you don't have hardware without
> DPC to test that?

Yeah, right. Also, if DPC isn't supported we leave the HPS bit set and we
won't see a DPC event at that time.

Thanks,
Smita

> 
> Thanks,
> 
> Lukas
  
Lukas Wunner June 16, 2023, 5:31 p.m. UTC | #5
On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote:
> On 5/16/2023 3:10 AM, Lukas Wunner wrote:
> > On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:
> > > Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set
> > > on an async remove and will not be cleared while the device is brought
> > > down. I have included clearing them here in order to mask any kind of
> > > appearance that there was an error and as well duplicating our BIOS
> > > functionality. I can remove if its not necessary.
> > 
> > Which bits are set exactly?  Can you constrain the register write to
> > those bits?
> 
> Hmm, I was mostly trying to follow the similar approach done for AER.
> pci_aer_raw_clear_status(), where they clear status registers
> unconditionally. Also, thought might be better if we are dealing with legacy
> endpoints or so.
> 
> I see these bits set in status registers:
> PCI_ERR_UNCOR_STATUS 0x20 (Surprise Down)
> PCI_EXP_DPC_RP_PIO_STATUS 0x10000 (Memory Request received URCompletion)
> PCI_EXP_DEVSTA 0x604 (fatal error detected)

I'd recommend clearing only PCI_EXP_DEVSTA_FED in PCI_EXP_DEVSTA.

As for PCI_EXP_DPC_RP_PIO_STATUS, PCIe r6.1 sec 2.9.3 says that
during DPC, either UR or CA completions are returned depending on
the DPC Completion Control bit in the DPC Control register.
The kernel doesn't touch that bit, so it will contain whatever value
the BIOS has set. It seems fine to me to just clear all bits in
PCI_EXP_DPC_RP_PIO_STATUS, as you've done in your patch.

However, the RP PIO Status register is present only in Root Ports
that support RP Extensions for DPC, per PCIe r6.1 sec 7.9.14.6.
So you need to constrain that to "if (pdev->dpc_rp_extensions)".
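
I.e. something like this (untested sketch, keeping the PCI_STATUS
handling from your patch):

static void pci_clear_surpdn_errors(struct pci_dev *pdev)
{
	u16 reg16;

	/* RP PIO Status exists only with RP Extensions for DPC */
	if (pdev->dpc_rp_extensions)
		pci_write_config_dword(pdev, pdev->dpc_cap +
				       PCI_EXP_DPC_RP_PIO_STATUS, ~0);

	pci_read_config_word(pdev, PCI_STATUS, &reg16);
	pci_write_config_word(pdev, PCI_STATUS, reg16);

	/* Constrain DEVSTA to the Fatal Error Detected bit */
	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, PCI_EXP_DEVSTA_FED);
}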


> > > +	pci_aer_raw_clear_status(pdev);
> > > +	pci_clear_surpdn_errors(pdev);
> > > +
> > > +	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
> > > +			      PCI_EXP_DPC_STATUS_TRIGGER);
> > > +}
> > 
> > Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
> > of the function to wake up a pciehp handler waiting for DPC recovery?
> 
> I don't think so. The pciehp handler is however getting invoked
> simultaneously due to PDSC or DLLSC state change right.. Let me know if I'm
> missing anything here.

I think you need to follow the procedure in dpc_reset_link().

That function first waits for the link to go down, in accordance with
PCIe r6.1 sec 6.2.11:

	if (!pcie_wait_for_link(pdev, false))
	...

Note that the link should not come back up due to a newly hot-added
device until DPC Trigger Status is cleared.

The function then waits for the Root Port to quiesce:

	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
	...

And only then does the function clear DPC Trigger Status.

You definitely need to wake_up_all(&dpc_completed_waitqueue) because
pciehp may be waiting for DPC Trigger Status to clear.

And you need to "clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags)"
before calling wake_up_all().
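
Roughly (untested, mirroring the procedure in dpc_reset_link()):

static void dpc_handle_surprise_removal(struct pci_dev *pdev)
{
	if (!pcie_wait_for_link(pdev, false)) {
		pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n");
		goto out;
	}

	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
		goto out;

	pci_aer_raw_clear_status(pdev);
	pci_clear_surpdn_errors(pdev);

	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
			      PCI_EXP_DPC_STATUS_TRIGGER);

out:
	clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
	wake_up_all(&dpc_completed_waitqueue);
}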


> > > +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
> > > +{
> > > +	u16 status;
> > > +
> > > +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
> > > +
> > > +	if (!(status & PCI_ERR_UNC_SURPDN))
> > > +		return false;
> > > +
> > 
> > You need an additional check for pdev->is_hotplug_bridge here.
> > 
> > And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.
> > 
> > Return false if either of them isn't set.
> 
> Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS
> should be disabled if DPC is enabled.
> 
> Implementation notes in 6.7.6 says that:
> "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
> Surprise bit in the Slot Capabilities Register being Set, is deprecated
> for use with async hot-plug. DPC is the recommended mechanism for supporting
> async hot-plug."
> 
> Platform FW will disable the SLTCAP_HPS bit at boot time to enable async
> hotplug on AMD devices.

Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question?

If it's not set, why do you get Surprise Down Errors in the first place?

How do you bring down the slot without surprise-removal capability?
Via sysfs?


> Probably check if SLTCAP_HPS bit is set and return false?

Quite the opposite!  If it's not set, return false.


Thanks,

Lukas
  
Smita Koralahalli June 16, 2023, 11:30 p.m. UTC | #6
On 6/16/2023 10:31 AM, Lukas Wunner wrote:
> On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote:
>> On 5/16/2023 3:10 AM, Lukas Wunner wrote:
>>> On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:

> 
> I'd recommend clearing only PCI_EXP_DEVSTA_FED in PCI_EXP_DEVSTA.
> 
> As for PCI_EXP_DPC_RP_PIO_STATUS, PCIe r6.1 sec 2.9.3 says that
> during DPC, either UR or CA completions are returned depending on
> the DPC Completion Control bit in the DPC Control register.
> The kernel doesn't touch that bit, so it will contain whatever value
> the BIOS has set. It seems fine to me to just clear all bits in
> PCI_EXP_DPC_RP_PIO_STATUS, as you've done in your patch.
> 
> However, the RP PIO Status register is present only in Root Ports
> that support RP Extensions for DPC, per PCIe r6.1 sec 7.9.14.6.
> So you need to constrain that to "if (pdev->dpc_rp_extensions)".
>

Okay will make changes.

> 
>>>> +	pci_aer_raw_clear_status(pdev);
>>>> +	pci_clear_surpdn_errors(pdev);
>>>> +
>>>> +	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
>>>> +			      PCI_EXP_DPC_STATUS_TRIGGER);
>>>> +}
>>>
>>> Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
>>> of the function to wake up a pciehp handler waiting for DPC recovery?
>>
>> I don't think so. The pciehp handler is however getting invoked
>> simultaneously due to PDSC or DLLSC state change right.. Let me know if I'm
>> missing anything here.
> 
> I think you need to follow the procedure in dpc_reset_link().
> 
> That function first waits for the link to go down, in accordance with
> PCIe r6.1 sec 6.2.11:
> 
> 	if (!pcie_wait_for_link(pdev, false))
> 	...
> 
> Note that the link should not come back up due to a newly hot-added
> device until DPC Trigger Status is cleared.
> 
> The function then waits for the Root Port to quiesce:
> 
> 	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
> 	...
> 
> And only then does the function clear DPC Trigger Status.
> 
> You definitely need to wake_up_all(&dpc_completed_waitqueue) because
> pciehp may be waiting for DPC Trigger Status to clear.
> 
> And you need to "clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags)"
> before calling wake_up_all().
> 
>

Okay. I did not consider the fact that the pciehp handler "may" wait on
DPC Trigger Status to be cleared, because in my case both handlers were
invoked due to their respective bit changes and I did not come across the
case where the pciehp handler was waiting on DPC to complete.


>>>> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
>>>> +{
>>>> +	u16 status;
>>>> +
>>>> +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
>>>> +
>>>> +	if (!(status & PCI_ERR_UNC_SURPDN))
>>>> +		return false;
>>>> +
>>>
>>> You need an additional check for pdev->is_hotplug_bridge here.
>>>
>>> And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.
>>>
>>> Return false if either of them isn't set.
>>
>> Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS
>> should be disabled if DPC is enabled.
>>
>> Implementation notes in 6.7.6 says that:
>> "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
>> Surprise bit in the Slot Capabilities Register being Set, is deprecated
>> for use with async hot-plug. DPC is the recommended mechanism for supporting
>> async hot-plug."
>>
>> Platform FW will disable the SLTCAP_HPS bit at boot time to enable async
>> hotplug on AMD devices.
> 
> Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question?
> 
> If it's not set, why do you get Surprise Down Errors in the first place?
> 
> How do you bring down the slot without surprise-removal capability?
> Via sysfs?
>

As per the spec, sec 6.7.6, "Either Downstream Port Containment (DPC) or
the Hot-Plug Surprise (HPS) mechanism may be used to support async
removal as part of an overall async hot-plug architecture".

Also, the implementation note below it conveys that HPS is deprecated and
DPC is the recommended mechanism. More details can be found in Appendix
I, I.1 Async Hot-Plug Initial Configuration:
...
If DPC capability then,
	If HPS bit not Set, use DPC
	Else attempt to Clear HPS bit (§ Section 6.7.4.4 )
		If successful, use DPC
		Else use HPS
...

So, this is "likely" a new feature support patch where DPC supports 
async remove. HPS bit will be disabled by BIOS if DPC is chosen as 
recommended mechanism to handle async removal.

I see the slot is being brought down by PDC or DLLSC event, which is 
triggered alongside DPC.

pciehp_handle_presence_or_link_change() -> pciehp_disable_slot() -> 
__pciehp_disable_slot() -> remove_board()..

But I want to clarify one thing: are you implying that PDC or DLLSC
shouldn't be triggered when HPS is disabled?

Thanks,
Smita

> 
>> Probably check if SLTCAP_HPS bit is set and return false?
> 
> Quite the opposite!  If it's not set, return false.
> 
> 
> Thanks,
> 
> Lukas
  
Lukas Wunner June 17, 2023, 7:59 a.m. UTC | #7
On Fri, Jun 16, 2023 at 04:30:49PM -0700, Smita Koralahalli wrote:
> On 6/16/2023 10:31 AM, Lukas Wunner wrote:
> > On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote:
> > > On 5/16/2023 3:10 AM, Lukas Wunner wrote:
> > > > On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:
> > > > > +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
> > > > > +{
> > > > > +	u16 status;
> > > > > +
> > > > > +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
> > > > > +
> > > > > +	if (!(status & PCI_ERR_UNC_SURPDN))
> > > > > +		return false;
> > > > > +
> > > > 
> > > > You need an additional check for pdev->is_hotplug_bridge here.
> > > > 
> > > > And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.
> > > > 
> > > > Return false if either of them isn't set.
> > > 
> > > Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS
> > > should be disabled if DPC is enabled.
> > > 
> > > Implementation notes in 6.7.6 says that:
> > > "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
> > > Surprise bit in the Slot Capabilities Register being Set, is deprecated
> > > for use with async hot-plug. DPC is the recommended mechanism for supporting
> > > async hot-plug."
> > > 
> > > Platform FW will disable the SLTCAP_HPS bit at boot time to enable async
> > > hotplug on AMD devices.
> > 
> > Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question?
> > 
> > If it's not set, why do you get Surprise Down Errors in the first place?
> > 
> > How do you bring down the slot without surprise-removal capability?
> > Via sysfs?
> 
> As per SPEC 6.7.6, "Either Downstream Port Containment (DPC) or the Hot-Plug
> Surprise (HPS) mechanism may be used to support async removal as part of an
> overall async hot-plug architecture".
> 
> Also, the implementation notes below, it conveys that HPS is deprecated and
> DPC is recommended mechanism. More details can be found in Appendix I, I.1
> Async Hot-Plug Initial Configuration:
> ...
> If DPC capability then,
> 	If HPS bit not Set, use DPC
> 	Else attempt to Clear HPS bit (§ Section 6.7.4.4 )
> 		If successful, use DPC
> 		Else use HPS
> ...
> 
> So, this is "likely" a new feature support patch where DPC supports async
> remove. HPS bit will be disabled by BIOS if DPC is chosen as recommended
> mechanism to handle async removal.
> 
> I see the slot is being brought down by PDC or DLLSC event, which is
> triggered alongside DPC.
> 
> pciehp_handle_presence_or_link_change() -> pciehp_disable_slot() ->
> __pciehp_disable_slot() -> remove_board()..
> 
> But I want to clear one thing, are you implying that PDC or DLLSC shouldn't
> be triggered when HPS is disabled?

Sorry, please ignore my advice to check PCI_EXP_SLTCAP_HPS in
dpc_is_surprise_removal().

Instead, only check for pdev->is_hotplug_bridge.  The rationale being
that Surprise Down errors are par for the course on hotplug ports,
but they're an anomaly that must be reported on non-hotplug ports.
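
E.g. (untested):

static bool dpc_is_surprise_removal(struct pci_dev *pdev)
{
	u16 status;

	if (!pdev->is_hotplug_bridge)
		return false;

	if (pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS,
				 &status))
		return false;

	return status & PCI_ERR_UNC_SURPDN;
}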

PCI_EXP_SLTCAP_HPS has no bearing on pciehp's behavior.  If there's
a hotplug port and hotplug control was granted to the OS, pciehp will
bind to the device.  pciehp will react to any DLLSC and PDC event.

I think the target audience for PCIe r6.1 sec 6.7.6 are hardware
designers and the advice given there is to add DPC capability to
hotplug ports.  That's fine.  It doesn't affect how Linux handles
async removal.

I think the wording in the spec to "use DPC" for async removal is
confusing.  We don't want to duplicate the code for device
de-/re-enumeration and slot bringup / bringdown in the dpc driver.
We just continue letting pciehp do that (and thus retain compatibility
with older, non-DPC-capable hardware).  But we wire up pciehp with
the dpc driver to add DPC-awareness for hotplug.

One part of the equation was to ignore link changes which occur
as a side effect of a DPC event from which we've successfully
recovered.  That was added with commit a97396c6eb13 ("PCI: pciehp:
Ignore Link Down/Up caused by DPC").

The other part of the equation (which you're adding here) is to
ignore Surprise Down errors in the dpc driver which occur as a side
effect of async removal.

I think that makes us compliant with PCIe r6.1 sec 6.7.6, although
we're interpreting "use DPC" (or async removal) somewhat liberally.
Actually I'd prefer the term "pragmatically" instead of "liberally". ;)

We don't want to duplicate code just for the sake of being word-by-word
compliant with the spec.  If it works in the real world, it's fine. :)

Thanks,

Lukas
  

Patch

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a5d7c69b764e..78559188b9ac 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -293,10 +293,60 @@  void dpc_process_error(struct pci_dev *pdev)
 	}
 }
 
+static void pci_clear_surpdn_errors(struct pci_dev *pdev)
+{
+	u16 reg16;
+	u32 reg32;
+
+	pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, &reg32);
+	pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32);
+
+	pci_read_config_word(pdev, PCI_STATUS, &reg16);
+	pci_write_config_word(pdev, PCI_STATUS, reg16);
+
+	pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &reg16);
+	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+}
+
+static void dpc_handle_surprise_removal(struct pci_dev *pdev)
+{
+	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
+		return;
+
+	/*
+	 * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
+	 * removal might be unexpected, errors might be reported as a side
+	 * effect of the event and software should handle them as an expected
+	 * part of this event.
+	 */
+	pci_aer_raw_clear_status(pdev);
+	pci_clear_surpdn_errors(pdev);
+
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
+			      PCI_EXP_DPC_STATUS_TRIGGER);
+}
+
+static bool dpc_is_surprise_removal(struct pci_dev *pdev)
+{
+	u16 status;
+
+	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
+
+	if (!(status & PCI_ERR_UNC_SURPDN))
+		return false;
+
+	dpc_handle_surprise_removal(pdev);
+
+	return true;
+}
+
 static irqreturn_t dpc_handler(int irq, void *context)
 {
 	struct pci_dev *pdev = context;
 
+	if (dpc_is_surprise_removal(pdev))
+		return IRQ_HANDLED;
+
 	dpc_process_error(pdev);
 
 	/* We configure DPC so it only triggers on ERR_FATAL */