Message ID | pmodcoakbs25z2a7mlo5gpuz63zluh35vbgb5itn6k5aqhjnny@jvphbpvahtse |
---|---|
State | New |
Headers |
Date: Fri, 2 Jun 2023 15:33:21 -0400 From: Lucas Karpinski <lkarpins@redhat.com> To: linux-kernel@vger.kernel.org Cc: agross@kernel.org, andersson@kernel.org, konrad.dybcio@linaro.org, robh+dt@kernel.org, krzysztof.kozlowski+dt@linaro.org, linux-arm-msm@vger.kernel.org, devicetree@vger.kernel.org, ahalaney@redhat.com, echanude@redhat.com, bmasney@redhat.com, quic_shazhuss@quicinc.com Subject: [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node" Message-ID: <pmodcoakbs25z2a7mlo5gpuz63zluh35vbgb5itn6k5aqhjnny@jvphbpvahtse> User-Agent: NeoMutt/20230517 List-ID: <linux-kernel.vger.kernel.org> |
Series |
Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node"
|
|
Commit Message
Lucas Karpinski
June 2, 2023, 7:33 p.m. UTC
This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.
The patch introduced a sporadic error where the Qdrive3 will fail to
boot occasionally due to an rcu preempt stall.
Qualcomm has disabled pcie2a downstream:
https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
rcu: (t=5253 jiffies g=-175 q=2552 ncpus=8)
Call trace:
__do_softirq
____do_softirq
call_on_irq_stack
do_softirq_own_stack
__irq_exit_rcu
irq_exit_rcu
The issue occurs normally once every 3-4 boot cycles.
There is likely a race condition caused when setting up the two pcie
domains concurrently (pcie2a and pcie3a).
The issue is not present when only pcie2a is enabled or when only pcie3a
is enabled.
A workaround was found that allowed the Qdrive3 to boot with both pcie2a
and pcie3a enabled.
Set the .probe_type to PROBE_FORCE_SYNCHRONOUS and add an msleep() to
the probing function.
This is not a solution, so this patch is disabling pcie2a as it seems
Red Hat are the only ones working on the board,
we're fine with disabling the node until a root cause is found. If
anyone has further suggestions for debugging, let me know.
Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
---
During debugging:
- Added additional time for clock/regulator stabilization.
- Reduced the bandwidth across pcie2a and pcie3a.
- Replaced the interconnect setup from another driver.
- The 32-bit/64-bit/config-io space for both pcie2a and pcie3a look to be mapped correctly.
- Verified interconnects were started successfully.
arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 44 -----------------------
1 file changed, 44 deletions(-)
Comments
Hi Lucas,

On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.

I am all for reverting this commit however I think your commit message
needs to be cleaned up.

> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f

Personally I'd remove the mention of the downstream kernel in this case.
Also your paragraphs are formatted weird with a newline at the end of
every sentence. Get them to flow together as a regular paragraph. This
is the relevant line that I have in my muttrc file to help.

set editor="vim -c 'set spell spelllang=en' -c 'set tw=72' -c 'set wrap'"

> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu: (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
> __do_softirq
> ____do_softirq
> call_on_irq_stack
> do_softirq_own_stack
> __irq_exit_rcu
> irq_exit_rcu
>
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).

I would also add that Qualcomm told us that upgrading the firmware on
the PCIe switch would correct this issue. We've upgraded the PCIe switch
to the latest firmware and this issue is still present. Apparently we
need to use a specific older version of the firmware that we can't get
from the PCIe switch vendor or Qualcomm.

Nothing is hooked up to pcie2a on the QDrive3 so there's no loss in
functionality by disabling this. We always have to remember to revert
this commit when working with an upstream kernel.

> This is not a solution, so this patch is disabling pcie2a as it seems
> Red Hat are the only ones working on the board,
> we're fine with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.

This should go under the ---.

Brian
On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.
>
> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu: (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
> __do_softirq
> ____do_softirq
> call_on_irq_stack
> do_softirq_own_stack
> __irq_exit_rcu
> irq_exit_rcu
>
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).
>
> The issue is not present when only pcie2a is enabled or when only pcie3a
> is enabled.
> A workaround was found that allowed the Qdrive3 to boot with both pcie2a
> and pcie3a enabled.
> Set the .probe_type to PROBE_FORCE_SYNCHRONOUS and add an msleep() to
> the probing function.
> This is not a solution, so this patch is disabling pcie2a as it seems
> Red Hat are the only ones working on the board,
> we're fine with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.
>
> Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> ---
> During debugging:
> - Added additional time for clock/regulator stabilization.
> - Reduced the bandwidth across pcie2a and pcie3a.
> - Replaced the interconnect setup from another driver.
> - The 32-bit/64-bit/config-io space for both pcie2a and pcie3a look to be mapped correctly.
> - Verified interconnects were started successfully.

I was looking at another issue downstream triggering a soft lock on
CPU0, but it turns out this could be the same thing except the symptoms
are less noticeable (the 3-4 boot cycles you mention).

Using next-20230609, if I add a return kprobe on dw_handle_msi_irq:

  echo 'r:dwmsi_probe dw_handle_msi_irq $retval' > /sys/kernel/debug/tracing/kprobe_events
  echo 1 > /sys/kernel/debug/tracing/events/kprobes/dwmsi_probe/enable
  cat /sys/kernel/debug/tracing/trace_pipe
  <idle>-0 [000] d.h1. 690.417268: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
  <idle>-0 [000] d.h1. 690.417272: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
  <idle>-0 [000] d.h1. 690.417276: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
  <idle>-0 [000] d.h1. 690.417281: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
  <idle>-0 [000] d.h1. 690.417284: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
  <idle>-0 [000] d.h1. 690.417288: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
  [...]

dw_handle_msi_irq constantly fires and never returns IRQ_HANDLED. It
happens consistently for pcie2a or pcie3a, after I disable one or the
other. I presume having both might be enough to overwhelm the system and
trigger the stall?

Looking at the handler, the status is always 0 after:

  status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
                             (i * MSI_REG_CTRL_BLOCK_SIZE));

Unfortunately I do not know why that is yet.

>
> arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 44 -----------------------
> 1 file changed, 44 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> index 24fa449d48a6..d492723ccf7c 100644
> --- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> +++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> @@ -186,27 +186,6 @@ &i2c18 {
>  	status = "okay";
>  };
>
> -&pcie2a {
> -	ranges = <0x01000000 0x0 0x3c200000 0x0 0x3c200000 0x0 0x100000>,
> -		 <0x02000000 0x0 0x3c300000 0x0 0x3c300000 0x0 0x1d00000>,
> -		 <0x03000000 0x5 0x00000000 0x5 0x00000000 0x1 0x00000000>;
> -
> -	perst-gpios = <&tlmm 143 GPIO_ACTIVE_LOW>;
> -	wake-gpios = <&tlmm 145 GPIO_ACTIVE_HIGH>;
> -
> -	pinctrl-names = "default";
> -	pinctrl-0 = <&pcie2a_default>;
> -
> -	status = "okay";
> -};
> -
> -&pcie2a_phy {
> -	vdda-phy-supply = <&vreg_l11a>;
> -	vdda-pll-supply = <&vreg_l3a>;
> -
> -	status = "okay";
> -};
> -
>  &pcie3a {
>  	ranges = <0x01000000 0x0 0x40200000 0x0 0x40200000 0x0 0x100000>,
>  		 <0x02000000 0x0 0x40300000 0x0 0x40300000 0x0 0x20000000>,
> @@ -356,29 +335,6 @@ i2c18_default: i2c18-default-state {
>  		bias-pull-up;
>  	};
>
> -	pcie2a_default: pcie2a-default-state {
> -		perst-pins {
> -			pins = "gpio143";
> -			function = "gpio";
> -			drive-strength = <2>;
> -			bias-pull-down;
> -		};
> -
> -		clkreq-pins {
> -			pins = "gpio142";
> -			function = "pcie2a_clkreq";
> -			drive-strength = <2>;
> -			bias-pull-up;
> -		};
> -
> -		wake-pins {
> -			pins = "gpio145";
> -			function = "gpio";
> -			drive-strength = <2>;
> -			bias-pull-up;
> -		};
> -	};
> -
>  	pcie3a_default: pcie3a-default-state {
>  		perst-pins {
>  			pins = "gpio151";
> --
> 2.40.1
>
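
For readers following the kprobe observation in the reply above (return value 0x0 on every call, status register always reading 0), the following is a minimal userspace sketch of a per-controller MSI status scan. It is not the kernel's dw_handle_msi_irq: the names model_handle_msi, MODEL_IRQ_NONE, MODEL_IRQ_HANDLED and MODEL_VECTORS_PER_CTRL are hypothetical stand-ins, and the array status_regs is assumed to play the role of the values read back from PCIE_MSI_INTR0_STATUS + (i * MSI_REG_CTRL_BLOCK_SIZE). It only illustrates why a handler of this shape would report "not handled" whenever every status word reads zero.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum model_irq_result { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Assumed number of MSI vectors tracked by one status word in this model. */
#define MODEL_VECTORS_PER_CTRL 32u

/*
 * Scan one status word per controller block.
 * status_regs[i] models the value read from the i-th status register.
 * Returns MODEL_IRQ_HANDLED if at least one status bit was set,
 * MODEL_IRQ_NONE otherwise. If dispatched is non-NULL it receives the
 * number of pending vectors that would have been dispatched.
 */
enum model_irq_result model_handle_msi(const uint32_t *status_regs,
				       unsigned int num_ctrls,
				       unsigned int *dispatched)
{
	enum model_irq_result ret = MODEL_IRQ_NONE;
	unsigned int count = 0;

	for (unsigned int i = 0; i < num_ctrls; i++) {
		uint32_t status = status_regs[i];

		/* Nothing pending in this block: skip it. */
		if (!status)
			continue;

		ret = MODEL_IRQ_HANDLED;

		/* Count each pending vector; a real handler would dispatch it. */
		for (unsigned int pos = 0; pos < MODEL_VECTORS_PER_CTRL; pos++) {
			if (status & (UINT32_C(1) << pos))
				count++;
		}
	}

	if (dispatched != NULL)
		*dispatched = count;

	return ret;
}
```

In this model, calling model_handle_msi() with an all-zero status_regs array returns MODEL_IRQ_NONE, which corresponds to the arg1=0x0 lines in the trace. The sketch says nothing about why the interrupt keeps firing while the status reads zero; that remains the open question in the thread.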
diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
index 24fa449d48a6..d492723ccf7c 100644
--- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
+++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
@@ -186,27 +186,6 @@ &i2c18 {
 	status = "okay";
 };
 
-&pcie2a {
-	ranges = <0x01000000 0x0 0x3c200000 0x0 0x3c200000 0x0 0x100000>,
-		 <0x02000000 0x0 0x3c300000 0x0 0x3c300000 0x0 0x1d00000>,
-		 <0x03000000 0x5 0x00000000 0x5 0x00000000 0x1 0x00000000>;
-
-	perst-gpios = <&tlmm 143 GPIO_ACTIVE_LOW>;
-	wake-gpios = <&tlmm 145 GPIO_ACTIVE_HIGH>;
-
-	pinctrl-names = "default";
-	pinctrl-0 = <&pcie2a_default>;
-
-	status = "okay";
-};
-
-&pcie2a_phy {
-	vdda-phy-supply = <&vreg_l11a>;
-	vdda-pll-supply = <&vreg_l3a>;
-
-	status = "okay";
-};
-
 &pcie3a {
 	ranges = <0x01000000 0x0 0x40200000 0x0 0x40200000 0x0 0x100000>,
 		 <0x02000000 0x0 0x40300000 0x0 0x40300000 0x0 0x20000000>,
@@ -356,29 +335,6 @@ i2c18_default: i2c18-default-state {
 		bias-pull-up;
 	};
 
-	pcie2a_default: pcie2a-default-state {
-		perst-pins {
-			pins = "gpio143";
-			function = "gpio";
-			drive-strength = <2>;
-			bias-pull-down;
-		};
-
-		clkreq-pins {
-			pins = "gpio142";
-			function = "pcie2a_clkreq";
-			drive-strength = <2>;
-			bias-pull-up;
-		};
-
-		wake-pins {
-			pins = "gpio145";
-			function = "gpio";
-			drive-strength = <2>;
-			bias-pull-up;
-		};
-	};
-
 	pcie3a_default: pcie3a-default-state {
 		perst-pins {
 			pins = "gpio151";