[v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node

Message ID qcoqksikfvdqxk6stezbzc7l2br37ccgqswztzqejmhrkhbrwt@ta4npsm35mqk
State New

Commit Message

Lucas Karpinski Jan. 9, 2024, 3:20 p.m. UTC
pcie2a and pcie3a both cause interrupt storms to occur. However, when
both are enabled simultaneously, the two combined interrupt storms will
lead to rcu stalls. Red Hat is the only company still using this board
and since we still need pcie3a, just disable pcie2a.

Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
---
v2:
- don't remove the entire pcie2a node, just set status to disabled.
- update commit message.

 arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Brian Masney Jan. 11, 2024, 2:02 p.m. UTC | #1
On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
> 
> Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>

Reviewed-by: Brian Masney <bmasney@redhat.com>

To elaborate further: Leaving both pcie2a and pcie3a enabled leads to
RCU stalls, and the board fails to boot. We have the latest firmware
that we've been able to get from QC. Disabling one of the pcie nodes
works around the boot issue. There's nothing interesting on pcie2a on
the development board, and pcie3a is enabled because it has 10 Gb
Ethernet that works upstream.

The interrupt storm on pcie3a can still occur on this platform, however
that's a separate issue.

Brian
  
Andrew Halaney Jan. 11, 2024, 3:06 p.m. UTC | #2
On Thu, Jan 11, 2024 at 09:02:41AM -0500, Brian Masney wrote:
> On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> > pcie2a and pcie3a both cause interrupt storms to occur. However, when
> > both are enabled simultaneously, the two combined interrupt storms will
> > lead to rcu stalls. Red Hat is the only company still using this board
> > and since we still need pcie3a, just disable pcie2a.
> > 
> > Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> 
> Reviewed-by: Brian Masney <bmasney@redhat.com>
> 
> To elaborate further: Leaving both pcie2a and pcie3a enabled leads to
> RCU stalls, and the board fails to boot. We have the latest firmware
> that we've been able to get from QC. Disabling one of the pcie nodes
> works around the boot issue. There's nothing interesting on pcie2a on
> the development board, and pcie3a is enabled because it has 10 Gb
> Ethernet that works upstream.
> 
> The interrupt storm on pcie3a can still occur on this platform, however
> that's a separate issue.

Here is a related workaround for that, in case anyone is interested in
the paper trail:

    https://lore.kernel.org/all/89c13962f5502a89d48f1efb7a6203d155a7e18d.camel@redhat.com/
  
Bjorn Andersson Jan. 30, 2024, 9:29 p.m. UTC | #3
On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
> 

Why are there interrupt storms? What interrupt(s) is(are) involved?

Do you consider this a temporary fix?

Are you okay with pcie3a misbehaving?

Regards,
Bjorn

  
Lucas Karpinski Jan. 30, 2024, 10:15 p.m. UTC | #4
> Why are there interrupt storms? What interrupt(s) is(are) involved?
As discussed in the thread Andrew linked earlier, the DesignWare PCIe
driver uses a chained interrupt to demultiplex the downstream MSI
interrupts. As a result, we couldn't identify the MSI interrupt source,
so it is not clear what is causing the hardware to misbehave.
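
To illustrate what demultiplexing behind a chained interrupt means, here
is a minimal userspace sketch. It is not the DesignWare driver code: the
names demux_msi_status(), count_vector(), vector_hits and the 32-vector
status word are assumptions made up for the example. The point is that
the parent interrupt only signals that some MSI fired, and the handler
has to scan a status word to find out which vectors are pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed width of the status word; chosen for illustration only. */
#define NUM_VECTORS 32u

typedef void (*vector_handler_t)(unsigned int vector);

/* Per-vector hit counters, purely for illustration. */
static unsigned int vector_hits[NUM_VECTORS];

static void count_vector(unsigned int vector)
{
	vector_hits[vector]++;
}

/*
 * Hypothetical chained handler: the parent interrupt only says "some MSI
 * fired"; the status word must be scanned to learn which vectors did.
 * Returns the number of vectors dispatched.
 */
static unsigned int demux_msi_status(uint32_t status, vector_handler_t handler)
{
	unsigned int dispatched = 0;

	assert(handler != NULL);

	for (unsigned int vector = 0; vector < NUM_VECTORS; vector++) {
		if (status & (UINT32_C(1) << vector)) {
			handler(vector);
			dispatched++;
		}
	}
	return dispatched;
}
```

Calling demux_msi_status(0x5, count_vector) dispatches vectors 0 and 2.
In this sketch the per-vector counts exist only inside the demultiplexer,
and the parent line itself sees a single undifferentiated interrupt.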
                                                   
> Do you consider this a temporary fix?            
This will likely be a permanent fix. Qualcomm disabled pcie2a in their
downstream kernel as well, quite some time ago, so this may never
actually be fixed.
                                                   
> Are you okay with pcie3a misbehaving?            
Yes. It would be great if the underlying issue were addressed, but at
least the boards are usable with just pcie3a enabled, and the NIC will
be available.
                                                   
Lucas  

 
  
Bjorn Andersson Feb. 20, 2024, 5:57 p.m. UTC | #5
On Tue, 09 Jan 2024 10:20:50 -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
> 
> 

Applied, thanks!

[1/1] arm64: dts: qcom: sa8540p-ride: disable pcie2a node
      commit: 07bbe3fd0704ab47d365756a31f45a86e3b45c0a

Best regards,
  

Patch

diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
index b04f72ec097c..177b9dad6ff7 100644
--- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
+++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
@@ -376,14 +376,14 @@  &pcie2a {
 	pinctrl-names = "default";
 	pinctrl-0 = <&pcie2a_default>;
 
-	status = "okay";
+	status = "disabled";
 };
 
 &pcie2a_phy {
 	vdda-phy-supply = <&vreg_l11a>;
 	vdda-pll-supply = <&vreg_l3a>;
 
-	status = "okay";
+	status = "disabled";
 };
 
 &pcie3a {
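
The patch works because of how the kernel interprets the "status"
property of a devicetree node. The sketch below is a userspace
approximation of that check, modelled on the behaviour of
of_device_is_available(): a node counts as available when it has no
status property, or when the property is "okay" or "ok". The function
name node_is_available() and the use of NULL for a missing property are
assumptions for illustration, not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: status == NULL models a node that has no
 * "status" property at all, which is treated as available.
 */
static bool node_is_available(const char *status)
{
	if (status == NULL)
		return true;

	/* Only "okay" and "ok" enable a node; anything else disables it. */
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```

With this logic, node_is_available("disabled") is false, so setting
status = "disabled" on both &pcie2a and &pcie2a_phy keeps drivers from
probing either node while leaving their descriptions in the tree.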