Message ID | 20230509153912.515218-1-vincenzopalazzodev@gmail.com |
---|---|
State | New |
Headers |
From: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
To: linux-pci@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kw@linux.com, robh@kernel.org, bhelgaas@google.com, heiko@sntech.de, lgirdwood@gmail.com, broonie@kernel.org, skhan@linuxfoundation.org, shawn.lin@rock-chips.com, lpieralisi@kernel.org, linux-kernel-mentees@lists.linuxfoundation.org, Vincenzo Palazzo <vincenzopalazzodev@gmail.com>, Dan Johansen <strit@manjaro.org>
Subject: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan
Date: Tue, 9 May 2023 17:39:12 +0200
Message-Id: <20230509153912.515218-1-vincenzopalazzodev@gmail.com> |
Series | [v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan |
Commit Message
Vincenzo Palazzo
May 9, 2023, 3:39 p.m. UTC
Add a configurable delay to the Rockchip PCIe driver to address
crashes that occur on some old devices, such as the Pine64 RockPro64.
This issue affects the ARM community, but there is no
upstream solution for it yet.
The crash I worked on is described below.
Co-developed-by: Dan Johansen <strit@manjaro.org>
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Crash dump (customized Manjaro kernel before this patch):
[ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
[ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM #1
[ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
[ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
[ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
[ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
[ 1.229870] sp : ffff80001004b850
[ 1.229872] x29: ffff80001004b850 x28: 0000000000000001
[ 1.229877] x27: 0000000000000000 x26: ffff00007a795000
[ 1.229882] x25: ffff00007a7910b0 x24: 0000000000000000
[ 1.229887] x23: 0000000000000000 x22: ffff00007b3a4380
[ 1.229891] x21: ffff80001004b8c4 x20: 0000000000000004
[ 1.229895] x19: 0000000000100000 x18: 0000000000000020
[ 1.229900] x17: 0000000000000001 x16: 0000000000000019
[ 1.229904] x15: ffff00007b222fd8 x14: ffffffffffffffff
[ 1.229908] x13: ffff00007a79ba1c x12: ffff00007a79b290
[ 1.229912] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[ 1.229917] x9 : ff72646268756463 x8 : 0000000000000391
[ 1.229921] x7 : ffff80001004b880 x6 : 0000000000000001
[ 1.229925] x5 : 0000000000000000 x4 : 0000000000000000
[ 1.229930] x3 : 0000000000c00008 x2 : 000000000080000a
[ 1.229934] x1 : 0000000000000000 x0 : ffff800014000000
[ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1.229942] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM #1
[ 1.229944] Hardware name: Pine64 RockPro64 v2.1 (DT)
[ 1.229946] Call trace:
[ 1.229948] dump_backtrace+0x0/0x1d0
[ 1.229949] show_stack+0x18/0x24
[ 1.229951] dump_stack+0xc0/0x118
[ 1.229953] panic+0x148/0x320
[ 1.229955] nmi_panic+0x8c/0x90
[ 1.229956] arm64_serror_panic+0x78/0x84
[ 1.229958] do_serror+0x15c/0x160
[ 1.229960] el1_error+0x84/0x100
[ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
[ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
[ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
[ 1.229968] pci_scan_single_device+0xa4/0x144
[ 1.229970] pci_scan_slot+0x40/0x12c
[ 1.229972] pci_scan_child_bus_extend+0x58/0x34c
[ 1.229974] pci_scan_bridge_extend+0x310/0x590
[ 1.229976] pci_scan_child_bus_extend+0x210/0x34c
[ 1.229978] pci_scan_root_bus_bridge+0x68/0xdc
[ 1.229980] pci_host_probe+0x18/0xc4
[ 1.229981] rockchip_pcie_probe+0x204/0x330
[ 1.229984] platform_drv_probe+0x54/0xb0
[ 1.229985] really_probe+0xe8/0x500
[ 1.229987] driver_probe_device+0xd8/0xf0
[ 1.229989] device_driver_attach+0xc0/0xcc
[ 1.229991] __driver_attach+0xa4/0x170
[ 1.229993] bus_for_each_dev+0x70/0xc0
[ 1.229994] driver_attach+0x24/0x30
[ 1.229996] bus_add_driver+0x140/0x234
[ 1.229998] driver_register+0x78/0x130
[ 1.230000] __platform_driver_register+0x4c/0x60
[ 1.230002] rockchip_pcie_driver_init+0x1c/0x28
[ 1.230004] do_one_initcall+0x54/0x1c0
[ 1.230005] do_initcalls+0xf4/0x130
[ 1.230007] kernel_init_freeable+0x144/0x19c
[ 1.230009] kernel_init+0x14/0x11c
[ 1.230011] ret_from_fork+0x10/0x34
[ 1.230035] SMP: stopping secondary CPUs
[ 1.230037] Kernel Offset: disabled
[ 1.230039] CPU features: 0x0240022,2100200c
[ 1.230041] Memory Limit: none
---
.../admin-guide/kernel-parameters.txt | 8 +++++
.../boot/dts/rockchip/rk3399-rockpro64.dtsi | 3 +-
drivers/pci/controller/pcie-rockchip-host.c | 31 +++++++++++++++++++
drivers/pci/controller/pcie-rockchip.c | 5 +++
drivers/pci/controller/pcie-rockchip.h | 10 ++++++
5 files changed, 56 insertions(+), 1 deletion(-)
Comments
Hi Vincenzo,

Thanks for raising this issue. Let's see what we can do to address it.

On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> Add a configurable delay to the Rockchip PCIe driver to address
> crashes that occur on some old devices, such as the Pine64 RockPro64.
>
> This issue is affecting the ARM community, but there is no
> upstream solution for it yet.

It sounds like this happens with several endpoints, right? And I assume the endpoints work fine in other non-Rockchip systems? If that's the case, my guess is the problem is with the Rockchip host controller and how it's initialized, not with the endpoints.

The only delays and timeouts I see in the driver now are in rockchip_pcie_host_init_port(), where it waits for link training to complete. I assume the link training did complete successfully, since you don't mention either a gen1 or gen2 timeout (although the gen2 message is a dev_dbg() that normally wouldn't go to the console).

I don't know that the spec contains a retrain timeout value. Several other drivers use 1 second, while rockchip uses 500ms (for example, see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).

I think we need to understand the issue better before adding a DT property and a module parameter. Those are hard for users to deal with. If we can figure out a value that works for everybody, it would be better to just hard-code it in the driver and use that all the time.
A few minor style/formatting comments below just for future reference:

> Crash dump (customized Manjaro kernel before this patch):
> [    1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
> [...]
> [    1.229962] rockchip_pcie_rd_conf+0xb4/0x270
> [    1.229964] pci_bus_read_config_dword+0x6c/0xd0
> [    1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
> [...]
> [    1.229980] pci_host_probe+0x18/0xc4
> [    1.229981] rockchip_pcie_probe+0x204/0x330

Include only the parts of the crash dump that are needed to debug the problem or identify the problem enough to find this patch. Timestamps probably aren't necessary. Register contents -- probably not either.

The rest of the backtrace (below here) probably isn't useful.

> [...]
> [    1.230041] Memory Limit: none

> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4286,6 +4286,14 @@
>  			any pair of devices, possibly at the cost of
>  			reduced performance.  This also guarantees
>  			that hot-added devices will work.
> +	pcie_rockchip_host.bus_scan_delay=	[PCIE] Delay in ms before
> +		scanning PCIe bus in Rockchip PCIe host driver. Some PCIe
> +		cards seem to need delays that can be several hundred ms.
> +		If set to greater than or equal to 0 this parameter will
> +		override delay that can be set in device tree.
> +		Values less than 0 the module will hit an assertion
> +		during the init.
> +		The default value is 0.

Generally speaking module-specific stuff like this doesn't get documented in kernel-parameters.txt.

> +++ b/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi
> @@ -663,7 +663,8 @@ &pcie0 {
>  	pinctrl-0 = <&pcie_perst>;
>  	vpcie12v-supply = <&vcc12v_dcin>;
>  	vpcie3v3-supply = <&vcc3v3_pcie>;
> -	status = "okay";
> +	bus-scan-delay-ms = <0>;
> +        status = "okay";

Please don't add arbitrary whitespace changes (it looks like this uses leading spaces instead of tabs).

> +/* bus_scan_delay - module parameter to override the
> + * device tree value, which is 0 by default. */

Please follow comment style in the file, i.e.,

  /*
   * bus_scan_delay - ...
   */

Wrap comments to fill 78 columns or so to match the rest of the file.

> @@ -987,6 +996,23 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>
>  	rockchip_pcie_enable_interrupts(rockchip);
>
> +	prob_delay = rockchip->bus_scan_delay;
> +	if (bus_scan_delay)
> +		prob_delay = bus_scan_delay;
> +
> +	/*
> +	 * FIXME: This is a workaround for some devices that crash on calls to pci_host_probe()
> +	 * or pci_scan_root_bus_bridge(). We add a delay before bus scanning to avoid the crash.
> +	 * The call trace reaches rockchip_pcie_rd_conf() while attempting to read the vendor ID
> +	 * (pci_bus_generic_read_dev_vendor_id() is in the call stack) before panicking.
> +	 *
> +	 * I'm not sure why this workaround is effective or what causes the panic. It may be related
> +	 * to the cansleep value.

Wrap comments to fit in 78 columns to match the rest of the file.

> +	 */
> +	dev_info(dev, "wait %u ms before bus scan\n", prob_delay);
> +	if (prob_delay > 0)
> +		msleep(prob_delay);

I don't think we want to just add a random delay here that's not connected to anything else. I assume it could go in rockchip_pcie_host_init_port() or perhaps rockchip_pcie_init_port() (which deasserts resets, and there are usually timing constraints related to deasserting resets). Hopefully Shawn can shed some light on this.

> 	err = pci_host_probe(bridge);
> 	if (err < 0)
> 		goto err_remove_irq_domain;
> @@ -1055,6 +1081,11 @@ static struct platform_driver rockchip_pcie_driver = {
>  };
>  module_platform_driver(rockchip_pcie_driver);
>
> +/** Allow to override the device tree default configuration with
> + * a command line argument.
> + **/

Use multi-line comment style that matches the rest of the file.

> +module_param_named(bus_scan_delay, bus_scan_delay, int, S_IRUGO);

This should go right next to the bus_scan_delay definition above.

> +++ b/drivers/pci/controller/pcie-rockchip.c
> @@ -149,6 +149,11 @@ int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
>  		return PTR_ERR(rockchip->clk_pcie_pm);
>  	}
>
> +	err = of_property_read_u32(node, "bus-scan-delay-ms", &rockchip->bus_scan_delay);
> +	if (err) {
> +		dev_info(dev, "no bus scan delay, default to 0 ms\n");
> +		rockchip->bus_scan_delay = 0;

I hope we don't need this property at all, but if we do, I assume it should be optional, with no message needed if it's not present.

> +++ b/drivers/pci/controller/pcie-rockchip.h
> @@ -299,6 +299,16 @@ struct rockchip_pcie {
>  	phys_addr_t msg_bus_addr;
>  	bool is_rc;
>  	struct resource *mem_res;
> +
> +	/* It seems that the driver crashes on some
> +	 * older devices. To work around this, we
> +	 * should add a sleep delay before probing.
> +	 *
> +	 * FIXME: need more investigated with an,
> +	 * but looks like the problem can be related with
> +	 * the cansleep value?
> +	 **/

We need better understanding of what's going on here. Then this comment could be made more specific, shorter, and formatted like others.

> +	u32 bus_scan_delay;
On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> [...]
> I think we need to understand the issue better before adding a DT
> property and a module parameter.  Those are hard for users to deal
> with.  If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.

Good Evening,

The main issue with the rk3399 is the PCIe controller is buggy and triggers a SoC panic when certain error conditions occur that should be handled gracefully. One of those conditions is when an endpoint requests an access to wait and retry later. Many years ago we ran that issue to ground, and with Robin Murphy's help we found that while it's possible to gracefully handle that condition, it required hijacking the entire arm64 error handling routine. Not exactly scalable for just one SoC.

The configurable waits allow us to program reasonable times for the 90% of endpoints that come up in the normal amount of time, while being able to adjust it for the other 10% that do not. Some require multiple seconds before they return without error. Part of the reason we don't want to hardcode the wait time is because the probe isn't handled asynchronously, so the kernel appears to hang while waiting for the timeout.

I'm curious if it's been tested with this series on top:
https://lore.kernel.org/linux-arm-kernel/20230418074700.1083505-8-rick.wertenbroek@gmail.com/T/

I'm particularly curious if [v5,04/11] "PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked" makes a difference in the behavior. Please test this and see if it improves the timeouts you need for the endpoints you're testing against.
Very Respectfully,
Peter Geis
On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -38,6 +38,10 @@
>  #include "../pci.h"
>  #include "pcie-rockchip.h"
>
> +/* bus_scan_delay - module parameter to override the
> + * device tree value, which is 0 by default. */
> +static int bus_scan_delay = -1;

Please do not add new module parameters, this is not the 1990's anymore. We have per-device settings everywhere, this makes that impossible.

Just use a DT value, if it is wrong, then fix the DT value! No need to have the kernel override it, that's not what DT files are for.

thanks,

greg k-h
> Please do not add new module parameters, this is not the 1990's anymore.
> We have per-device settings everywhere, this makes that impossible.
> Just use a DT value, if it is wrong, then fix the DT value!  No need to
> have the kernel override it, that's not what DT files are for.

Thanks!

Cheers!

Vincent.
> On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > Hi Vincenzo, > > > > Thanks for raising this issue. Let's see what we can do to address > > it. > > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > > > Add a configurable delay to the Rockchip PCIe driver to address > > > crashes that occur on some old devices, such as the Pine64 RockPro64. > > > > > > This issue is affecting the ARM community, but there is no > > > upstream solution for it yet. > > > > It sounds like this happens with several endpoints, right? And I > > assume the endpoints work fine in other non-Rockchip systems? If > > that's the case, my guess is the problem is with the Rockchip host > > controller and how it's initialized, not with the endpoints. > > > > The only delays and timeouts I see in the driver now are in > > rockchip_pcie_host_init_port(), where it waits for link training to > > complete. I assume the link training did completely successfully > > since you don't mention either a gen1 or gen2 timeout (although the > > gen2 message is a dev_dbg() that normally wouldn't go to the console). > > > > I don't know that the spec contains a retrain timeout value. Several > > other drivers use 1 second, while rockchip uses 500ms (for example, > > see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT). > > > > I think we need to understand the issue better before adding a DT > > property and a module parameter. Those are hard for users to deal > > with. If we can figure out a value that works for everybody, it would > > be better to just hard-code it in the driver and use that all the > > time. > > Good Evening, > > The main issue with the rk3399 is the PCIe controller is buggy and > triggers a SoC panic when certain error conditions occur that should > be handled gracefully. One of those conditions is when an endpoint > requests an access to wait and retry later. 
Many years ago we ran that
> issue to ground and with Robin Murphy's help we found that while it's
> possible to gracefully handle that condition it required hijacking the
> entire arm64 error handling routine. Not exactly scalable for just one
> SoC. The configurable waits allow us to program reasonable times for
> 90% of the endpoints that come up in the normal amount of time, while
> being able to adjust it for the other 10% that do not. Some require
> multiple seconds before they return without error. Part of the reason
> we don't want to hardcode the wait time is because the probe isn't
> handled asynchronously, so the kernel appears to hang while waiting
> for the timeout.

Yeah, I smell a hardware bug behind my workaround. I hate waiting like
this inside the code; needing to do something like that usually means
something else is wrong.

During my research, I also found this patch
(https://bugzilla.redhat.com/show_bug.cgi?id=2134177) that provides a
fix in another, possibly cleaner way. But I don't understand the
reasoning behind it yet, so maybe I haven't spent enough time thinking
about it.

> I'm curious if it's been tested with this series on top:
> https://lore.kernel.org/linux-arm-kernel/20230418074700.1083505-8-rick.wertenbroek@gmail.com/T/
> I'm particularly curious if
> [v5,04/11] PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
> makes a difference in the behavior. Please test this and see if it
> improves the timeouts you need for the endpoints you're testing
> against.

Mh, I can try to cherry-pick the commit and test it in my own test
environment. Currently, I haven't been able to test it due to a lack of
hardware, but I'm seeking a way to obtain one. Luckily, I have someone
on the Manjaro arm team who can help me test it, so I'll try to do
that.

Cheers!

Vincent.
> Hi Vincenzo,

Hi :)

> Thanks for raising this issue. Let's see what we can do to address
> it.

Yeah, as I said in my cover letter, I am not happy with my solution,
but we should start somewhere to discuss it.

> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> >
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right? And I
> assume the endpoints work fine in other non-Rockchip systems? If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.

Yeah, the crash is only reproducible with the Rockchip system, or
rather, the crash is reproducible only on some modern devices that use
the old Rockchip driver mentioned in this patch.

> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete. I assume the link training did completely successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value. Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter. Those are hard for users to deal
> with. If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.

Yeah, I see, I see. This makes sense. Is there any path that I can
follow in order to better understand what's going on at the hardware
level? In other words, how can I help to understand this issue better
and provide a single solution that works for everybody?
Thanks for the nits in the patch, I will take a look with a fresh mind later in the day. Cheers! Vincent.
On Wed, May 10, 2023 at 7:16 AM Vincenzo Palazzo <vincenzopalazzodev@gmail.com> wrote: > > > On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > Hi Vincenzo, > > > > > > Thanks for raising this issue. Let's see what we can do to address > > > it. > > > > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > > > > Add a configurable delay to the Rockchip PCIe driver to address > > > > crashes that occur on some old devices, such as the Pine64 RockPro64. > > > > > > > > This issue is affecting the ARM community, but there is no > > > > upstream solution for it yet. > > > > > > It sounds like this happens with several endpoints, right? And I > > > assume the endpoints work fine in other non-Rockchip systems? If > > > that's the case, my guess is the problem is with the Rockchip host > > > controller and how it's initialized, not with the endpoints. > > > > > > The only delays and timeouts I see in the driver now are in > > > rockchip_pcie_host_init_port(), where it waits for link training to > > > complete. I assume the link training did completely successfully > > > since you don't mention either a gen1 or gen2 timeout (although the > > > gen2 message is a dev_dbg() that normally wouldn't go to the console). > > > > > > I don't know that the spec contains a retrain timeout value. Several > > > other drivers use 1 second, while rockchip uses 500ms (for example, > > > see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT). > > > > > > I think we need to understand the issue better before adding a DT > > > property and a module parameter. Those are hard for users to deal > > > with. If we can figure out a value that works for everybody, it would > > > be better to just hard-code it in the driver and use that all the > > > time. > > > > Good Evening, > > > > The main issue with the rk3399 is the PCIe controller is buggy and > > triggers a SoC panic when certain error conditions occur that should > > be handled gracefully. 
One of those conditions is when an endpoint > > requests an access to wait and retry later. Many years ago we ran that > > issue to ground and with Robin Murphy's help we found that while it's > > possible to gracefully handle that condition it required hijacking the > > entire arm64 error handling routine. Not exactly scalable for just one > > SoC. The configurable waits allow us to program reasonable times for > > 90% of the endpoints that come up in the normal amount of time, while > > being able to adjust it for the other 10% that do not. Some require > > multiple seconds before they return without error. Part of the reason > > we don't want to hardcode the wait time is because the probe isn't > > handled asynchronously, so the kernel appears to hang while waiting > > for the timeout. > > Yeah, I smell a hardware bug in my code. I hate waiting in this way inside > the code, so it's usually wrong when you need to do something like that. Correct, it's the unfortunate way of arm64 development. Almost none of the SoCs implement all of their hardware in a completely specification compliant way. > > During my research, I also found this patch (https://bugzilla.redhat.com/show_bug.cgi?id=2134177) > that provides a fix in another possibly cleaner way. > > But I don't understand the reasoning behind it, so maybe I > haven't spent enough time thinking about it. That is a completely different issue, unrelated to the PCI crash. > > > I'm curious if it's been tested with this series on top: > > https://lore.kernel.org/linux-arm-kernel/20230418074700.1083505-8-rick.wertenbroek@gmail.com/T/ > > I'm particularly curious if > > [v5,04/11] PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked > > makes a difference in the behavior. Please test this and see if it > > improves the timeouts you need for the endpoints you're testing > > against. > > Mh, I can try to cherry-pick the commit and test it in my own test environment. 
Currently, I haven't been > able to test it due to a lack of hardware, but I'm seeking a way to obtain one. > Luckily, I have someone on the Manjaro arm team who can help me test it, > so I'll try to do that. > > Cheers! > > Vincent.
On Tue, May 09, 2023 at 08:11:29PM -0400, Peter Geis wrote: > On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > > > Add a configurable delay to the Rockchip PCIe driver to address > > > crashes that occur on some old devices, such as the Pine64 RockPro64. > > > > > > This issue is affecting the ARM community, but there is no > > > upstream solution for it yet. > > > > It sounds like this happens with several endpoints, right? And I > > assume the endpoints work fine in other non-Rockchip systems? If > > that's the case, my guess is the problem is with the Rockchip host > > controller and how it's initialized, not with the endpoints. > > ... > The main issue with the rk3399 is the PCIe controller is buggy and > triggers a SoC panic when certain error conditions occur that should > be handled gracefully. One of those conditions is when an endpoint > requests an access to wait and retry later. I assume this refers to a Completion with Request Retry Status (RRS)? > Many years ago we ran that issue to ground and with Robin Murphy's > help we found that while it's possible to gracefully handle that > condition it required hijacking the entire arm64 error handling > routine. Not exactly scalable for just one SoC. Do you have a pointer to that discussion? The URL might save repeating the whole exercise and could be useful for the commit log when we try to resolve this. > The configurable waits allow us to program reasonable times for > 90% of the endpoints that come up in the normal amount of time, while > being able to adjust it for the other 10% that do not. Some require > multiple seconds before they return without error. Part of the reason > we don't want to hardcode the wait time is because the probe isn't > handled asynchronously, so the kernel appears to hang while waiting > for the timeout. Is there some way for users to figure out that they would need this property? 
Or is it just "if your kernel panics on boot, try adding or increasing
'bus-scan-delay-ms' in your DT"?

Bjorn
On Wed, May 10, 2023 at 4:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, May 09, 2023 at 08:11:29PM -0400, Peter Geis wrote: > > On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > > > > Add a configurable delay to the Rockchip PCIe driver to address > > > > crashes that occur on some old devices, such as the Pine64 RockPro64. > > > > > > > > This issue is affecting the ARM community, but there is no > > > > upstream solution for it yet. > > > > > > It sounds like this happens with several endpoints, right? And I > > > assume the endpoints work fine in other non-Rockchip systems? If > > > that's the case, my guess is the problem is with the Rockchip host > > > controller and how it's initialized, not with the endpoints. > > > ... > > > The main issue with the rk3399 is the PCIe controller is buggy and > > triggers a SoC panic when certain error conditions occur that should > > be handled gracefully. One of those conditions is when an endpoint > > requests an access to wait and retry later. > > I assume this refers to a Completion with Request Retry Status (RRS)? I'm not sure the full coverage, the test patch from Shawn Lin that allowed the system to handle the errors has the following description: "Native defect prevents this RC far from supporting any response from EP which UR filed is set." > > > Many years ago we ran that issue to ground and with Robin Murphy's > > help we found that while it's possible to gracefully handle that > > condition it required hijacking the entire arm64 error handling > > routine. Not exactly scalable for just one SoC. > > Do you have a pointer to that discussion? The URL might save > repeating the whole exercise and could be useful for the commit log > when we try to resolve this. 
The link to the patch email is here, the full discussion is pretty easy to follow: https://lore.kernel.org/linux-pci/2a381384-9d47-a7e2-679c-780950cd862d@rock-chips.com/ Also: https://lore.kernel.org/linux-rockchip/1f180d4b-9e5d-c829-555b-c9750940361e@web.de/T/#m9c9d4a28a0d3aa064864f188b8ee3b16ce107aff > > > The configurable waits allow us to program reasonable times for > > 90% of the endpoints that come up in the normal amount of time, while > > being able to adjust it for the other 10% that do not. Some require > > multiple seconds before they return without error. Part of the reason > > we don't want to hardcode the wait time is because the probe isn't > > handled asynchronously, so the kernel appears to hang while waiting > > for the timeout. > > Is there some way for users to figure out that they would need this > property? Or is it just "if your kernel panics on boot, try > adding or increasing "bus-scan-delay-ms" in your DT? There's a listing of tested cards at: https://wiki.pine64.org/wiki/ROCKPro64_Hardware_compatibility Most cards work fine that don't require a large BAR. PCIe switches are completely dead without the above hack patch. Cards that lie in the middle are ones that expect BIOS / EFI support to initialize, or ones that have complex boot roms and don't initialize quickly. But yes, it's unfortunately going to be "if you panic, increase the delay" unless a more complete database of cards can be generated. Very Respectfully, Peter Geis > > Bjorn
> > > Many years ago we ran that issue to ground and with Robin Murphy's
> > > help we found that while it's possible to gracefully handle that
> > > condition it required hijacking the entire arm64 error handling
> > > routine. Not exactly scalable for just one SoC.
> >
> > Do you have a pointer to that discussion? The URL might save
> > repeating the whole exercise and could be useful for the commit log
> > when we try to resolve this.
>
> The link to the patch email is here, the full discussion is pretty
> easy to follow:
> https://lore.kernel.org/linux-pci/2a381384-9d47-a7e2-679c-780950cd862d@rock-chips.com/
> Also:
> https://lore.kernel.org/linux-rockchip/1f180d4b-9e5d-c829-555b-c9750940361e@web.de/T/#m9c9d4a28a0d3aa064864f188b8ee3b16ce107aff

I have some concerns about the patch proposed in the email that you
shared. It looks quite extensive (it touches code that is not just
related to the HW) just to work around a hardware issue. I would have
expected the fix to be integrated into the driver itself, so that if
the hardware dies at some point in the future, the workaround code dies
with it.

However, it is possible that I may have missed something in the patch,
and my thoughts could be wrong.

> >
> > > The configurable waits allow us to program reasonable times for
> > > 90% of the endpoints that come up in the normal amount of time, while
> > > being able to adjust it for the other 10% that do not. Some require
> > > multiple seconds before they return without error. Part of the reason
> > > we don't want to hardcode the wait time is because the probe isn't
> > > handled asynchronously, so the kernel appears to hang while waiting
> > > for the timeout.
> >
> > Is there some way for users to figure out that they would need this
> > property? Or is it just "if your kernel panics on boot, try
> > adding or increasing "bus-scan-delay-ms" in your DT?
>
> There's a listing of tested cards at:
> https://wiki.pine64.org/wiki/ROCKPro64_Hardware_compatibility
>
> Most cards work fine that don't require a large BAR. PCIe switches are
> completely dead without the above hack patch. Cards that lie in the
> middle are ones that expect BIOS / EFI support to initialize, or ones
> that have complex boot roms and don't initialize quickly.
> But yes, it's unfortunately going to be "if you panic, increase the
> delay" unless a more complete database of cards can be generated.

This is really unfortunate because, as mentioned in some previous
emails, sleeping like this slows down the kernel. Is there any way to
tell the kernel "hey, we need some more time here", or in computer
science terms, to load a driver asynchronously?

Thanks.

Vincent.
>
> Thanks for raising this issue. Let's see what we can do to address
> it.
>
> On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> >
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right? And I
> assume the endpoints work fine in other non-Rockchip systems? If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.
>
> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete. I assume the link training did completely successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value. Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter. Those are hard for users to deal
> with. If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.
>
> A few minor style/formatting comments below just for future reference:

Given the recent emails, do you think it is worth making a version 2 of
this patch with your suggestions and iterating on it another time?

Cheers!

Vincent.
[+cc ARM64 folks, in case you have abort handling tips; thread at: https://lore.kernel.org/r/20230509153912.515218-1-vincenzopalazzodev@gmail.com] Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a delay avoids the panic. My theory is a PCIe Request Retry Status to a Vendor ID config read causes an abort that we don't handle. > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: >> ... >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM >> #1 >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT) >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--) >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270 >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270 >> ... >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt >> ... >> [ 1.229955] nmi_panic+0x8c/0x90 >> [ 1.229956] arm64_serror_panic+0x78/0x84 >> [ 1.229958] do_serror+0x15c/0x160 >> [ 1.229960] el1_error+0x84/0x100 >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270 >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0 >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0 >> [ 1.229968] pci_scan_single_device+0xa4/0x144 On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote: > ... Is there any way to tell the kernel "hey we need some more time > here"? We enumerate PCI devices by trying to read the Vendor ID of every possible device address (see pci_scan_slot()). On PCIe, if a device doesn't exist at that address, the Vendor ID config read will be terminated with Unsupported Request (UR) status. This is normal and happens every time we enumerate devices. The crash doesn't happen every time we enumerate, so I don't think this UR is the problem. Also, if it *were* the problem, adding a delay would not make any difference. There *is* a way for a PCIe device to say "I need more time". 
It does this by responding to that Vendor ID config read with Request Retry Status (RRS, aka CRS in older specs), which means "I'm not ready yet, but I will be ready in the future." Adding a delay would definitely make a difference here, so my guess is this is what's happening. Most root complexes return ~0 data to the CPU when a config read terminates with UR or RRS. It sounds like rockchip does this for UR but possibly not for RRS. There is a "RRS Software Visibility" feature, which is supposed to turn the RRS into a special value (Vendor ID == 0x0001), but per [1], rockchip doesn't support it (lspci calls it "CRSVisible"). But the CPU load instruction corresponding to the config read has to complete by reading *something* or else be aborted. It sounds like it's aborted in this case. I don't know the arm64 details, but if we could catch that abort and determine that it was an RRS and not a UR, maybe we could fabricate the magic RRS 0x0001 value. imx6q_pcie_abort_handler() does something like that, although I think it's for arm32, not arm64. But obviously we already catch the abort enough to dump the register state and panic, so maybe there's a way to extend that? Bjorn [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/
On Fri, May 12, 2023 at 9:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > [+cc ARM64 folks, in case you have abort handling tips; thread at: > https://lore.kernel.org/r/20230509153912.515218-1-vincenzopalazzodev@gmail.com] > > Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a > delay avoids the panic. My theory is a PCIe Request Retry Status to a > Vendor ID config read causes an abort that we don't handle. > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > >> ... > >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError > >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM > >> #1 > >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT) > >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--) > >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270 > >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270 > >> ... > >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt > >> ... > >> [ 1.229955] nmi_panic+0x8c/0x90 > >> [ 1.229956] arm64_serror_panic+0x78/0x84 > >> [ 1.229958] do_serror+0x15c/0x160 > >> [ 1.229960] el1_error+0x84/0x100 > >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270 > >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0 > >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0 > >> [ 1.229968] pci_scan_single_device+0xa4/0x144 > > On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote: > > ... Is there any way to tell the kernel "hey we need some more time > > here"? > > We enumerate PCI devices by trying to read the Vendor ID of every > possible device address (see pci_scan_slot()). On PCIe, if a device > doesn't exist at that address, the Vendor ID config read will be > terminated with Unsupported Request (UR) status. This is normal > and happens every time we enumerate devices. > > The crash doesn't happen every time we enumerate, so I don't think > this UR is the problem. 
Also, if it *were* the problem, adding a > delay would not make any difference. Is this behavior different if there is a switch device forwarding on the UR? On rk3399 switches are completely non-functional because of the panic, which is observed in the output of the dmesg in [2] with the hack patch enabled. Considering what you just described it looks like the forwarded UR for each non-existent device behind the switch is causing an serror. > > There *is* a way for a PCIe device to say "I need more time". It does > this by responding to that Vendor ID config read with Request Retry > Status (RRS, aka CRS in older specs), which means "I'm not ready yet, > but I will be ready in the future." Adding a delay would definitely > make a difference here, so my guess is this is what's happening. > > Most root complexes return ~0 data to the CPU when a config read > terminates with UR or RRS. It sounds like rockchip does this for UR > but possibly not for RRS. > > There is a "RRS Software Visibility" feature, which is supposed to > turn the RRS into a special value (Vendor ID == 0x0001), but per [1], > rockchip doesn't support it (lspci calls it "CRSVisible"). > > But the CPU load instruction corresponding to the config read has to > complete by reading *something* or else be aborted. It sounds like > it's aborted in this case. I don't know the arm64 details, but if we > could catch that abort and determine that it was an RRS and not a UR, > maybe we could fabricate the magic RRS 0x0001 value. > > imx6q_pcie_abort_handler() does something like that, although I think > it's for arm32, not arm64. But obviously we already catch the abort > enough to dump the register state and panic, so maybe there's a way to > extend that? Perhaps a hook mechanism that allows drivers to register with the serror handler and offer to handle specific errors before the generic code causes the system panic? 
Very Respectfully, Peter Geis [2] https://lore.kernel.org/linux-pci/CAMdYzYqn3L7x-vc+_K6jG0EVTiPGbz8pQ-N1Q1mRbcVXE822Yg@mail.gmail.com/ > > Bjorn > > [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/
> >
> > There *is* a way for a PCIe device to say "I need more time". It does
> > this by responding to that Vendor ID config read with Request Retry
> > Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> > but I will be ready in the future." Adding a delay would definitely
> > make a difference here, so my guess is this is what's happening.
> >
> > Most root complexes return ~0 data to the CPU when a config read
> > terminates with UR or RRS. It sounds like rockchip does this for UR
> > but possibly not for RRS.
> >
> > There is a "RRS Software Visibility" feature, which is supposed to
> > turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> > rockchip doesn't support it (lspci calls it "CRSVisible").
> >
> > But the CPU load instruction corresponding to the config read has to
> > complete by reading *something* or else be aborted. It sounds like
> > it's aborted in this case. I don't know the arm64 details, but if we
> > could catch that abort and determine that it was an RRS and not a UR,
> > maybe we could fabricate the magic RRS 0x0001 value.
> >
> > imx6q_pcie_abort_handler() does something like that, although I think
> > it's for arm32, not arm64. But obviously we already catch the abort
> > enough to dump the register state and panic, so maybe there's a way to
> > extend that?
>
> Perhaps a hook mechanism that allows drivers to register with the
> serror handler and offer to handle specific errors before the generic
> code causes the system panic?

This sounds to me like a good general solution that would also help
handle future HW like this one. So this is a Concept Ack from me.

Cheers!

Vincent.
On Sat, May 13, 2023 at 07:40:12AM -0400, Peter Geis wrote: > On Fri, May 12, 2023 at 9:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > [+cc ARM64 folks, in case you have abort handling tips; thread at: > > https://lore.kernel.org/r/20230509153912.515218-1-vincenzopalazzodev@gmail.com] > > > > Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a > > delay avoids the panic. My theory is a PCIe Request Retry Status to a > > Vendor ID config read causes an abort that we don't handle. > > > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > > >> ... > > >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError > > >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM > > >> #1 > > >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT) > > >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--) > > >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270 > > >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270 > > >> ... > > >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt > > >> ... > > >> [ 1.229955] nmi_panic+0x8c/0x90 > > >> [ 1.229956] arm64_serror_panic+0x78/0x84 > > >> [ 1.229958] do_serror+0x15c/0x160 > > >> [ 1.229960] el1_error+0x84/0x100 > > >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270 > > >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0 > > >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0 > > >> [ 1.229968] pci_scan_single_device+0xa4/0x144 > > > > On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote: > > > ... Is there any way to tell the kernel "hey we need some more time > > > here"? > > > > We enumerate PCI devices by trying to read the Vendor ID of every > > possible device address (see pci_scan_slot()). On PCIe, if a device > > doesn't exist at that address, the Vendor ID config read will be > > terminated with Unsupported Request (UR) status. This is normal > > and happens every time we enumerate devices. 
> > > > The crash doesn't happen every time we enumerate, so I don't think > > this UR is the problem. Also, if it *were* the problem, adding a > > delay would not make any difference. > > Is this behavior different if there is a switch device forwarding on > the UR? On rk3399 switches are completely non-functional because of > the panic, which is observed in the output of the dmesg in [2] with > the hack patch enabled. Considering what you just described it looks > like the forwarded UR for each non-existent device behind the switch > is causing an serror. I don't know exactly what the panic looks like, but I wouldn't expect UR handling to be different when there's a switch. pcie-rockchip-host.c does handle devices on the root bus (00) differently than others because rockchip_pcie_valid_device() knows that device 00:00 is the only device on the root bus. That part makes sense because 00:00 is built into the SoC. I'm a little suspicious of the fact that rockchip_pcie_valid_device() also enforces that bus 01 can only have a single device on it. No other *_pcie_valid_device() implementations enforce that. It's true that traditional PCIe devices can only implement device 00, but ARI relaxes that by reusing the Device Number as extended Function Number bits. > > There *is* a way for a PCIe device to say "I need more time". It does > > this by responding to that Vendor ID config read with Request Retry > > Status (RRS, aka CRS in older specs), which means "I'm not ready yet, > > but I will be ready in the future." Adding a delay would definitely > > make a difference here, so my guess is this is what's happening. > > > > Most root complexes return ~0 data to the CPU when a config read > > terminates with UR or RRS. It sounds like rockchip does this for UR > > but possibly not for RRS. 
> > > > There is a "RRS Software Visibility" feature, which is supposed to > > turn the RRS into a special value (Vendor ID == 0x0001), but per [1], > > rockchip doesn't support it (lspci calls it "CRSVisible"). > > > > But the CPU load instruction corresponding to the config read has to > > complete by reading *something* or else be aborted. It sounds like > > it's aborted in this case. I don't know the arm64 details, but if we > > could catch that abort and determine that it was an RRS and not a UR, > > maybe we could fabricate the magic RRS 0x0001 value. > > > > imx6q_pcie_abort_handler() does something like that, although I think > > it's for arm32, not arm64. But obviously we already catch the abort > > enough to dump the register state and panic, so maybe there's a way to > > extend that? > > Perhaps a hook mechanism that allows drivers to register with the > serror handler and offer to handle specific errors before the generic > code causes the system panic? > > Very Respectfully, > Peter Geis > > [2] https://lore.kernel.org/linux-pci/CAMdYzYqn3L7x-vc+_K6jG0EVTiPGbz8pQ-N1Q1mRbcVXE822Yg@mail.gmail.com/ > > > > > Bjorn > > > > [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/ > _______________________________________________ > Linux-kernel-mentees mailing list > Linux-kernel-mentees@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
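[Editor's sketch] The hook mechanism Peter proposes could look something like the registry below: drivers register a callback that gets first crack at an SError, and only if no handler claims it does the generic code fall through to the panic. Everything here — struct and function names included — is invented for illustration; no such kernel API exists today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the proposed mechanism: drivers register a
 * callback that may claim an SError; only if nobody claims it would
 * the caller fall through to arm64_serror_panic().  All names here
 * are invented for illustration.
 */
struct serror_hook {
	bool (*handle)(uint64_t esr);
	struct serror_hook *next;
};

static struct serror_hook *serror_hooks;

static void serror_register_hook(struct serror_hook *hook)
{
	hook->next = serror_hooks;
	serror_hooks = hook;
}

/* Returns true when a driver handled the error, so no panic is needed. */
static bool serror_run_hooks(uint64_t esr)
{
	struct serror_hook *hook;

	for (hook = serror_hooks; hook; hook = hook->next)
		if (hook->handle(esr))
			return true;
	return false;
}

/* Self-test: a hook that claims only the ESR seen in the log above. */
static bool fake_rockchip_handler(uint64_t esr)
{
	return esr == 0xbf000002;
}

static struct serror_hook fake_rockchip_hook = {
	.handle = fake_rockchip_handler,
};

static bool demo_serror(uint64_t esr)
{
	if (!serror_hooks)
		serror_register_hook(&fake_rockchip_hook);
	return serror_run_hooks(esr);
}
```

A real implementation would also need to decide what value the faulting load should complete with (e.g. fabricating the RRS 0x0001 value), which this sketch does not attempt.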
On Mon, May 15, 2023 at 12:51 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Sat, May 13, 2023 at 07:40:12AM -0400, Peter Geis wrote: > > On Fri, May 12, 2023 at 9:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > [+cc ARM64 folks, in case you have abort handling tips; thread at: > > > https://lore.kernel.org/r/20230509153912.515218-1-vincenzopalazzodev@gmail.com] > > > > > > Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a > > > delay avoids the panic. My theory is a PCIe Request Retry Status to a > > > Vendor ID config read causes an abort that we don't handle. > > > > > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote: > > > >> ... > > > >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError > > > >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM > > > >> #1 > > > >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT) > > > >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--) > > > >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270 > > > >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270 > > > >> ... > > > >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt > > > >> ... > > > >> [ 1.229955] nmi_panic+0x8c/0x90 > > > >> [ 1.229956] arm64_serror_panic+0x78/0x84 > > > >> [ 1.229958] do_serror+0x15c/0x160 > > > >> [ 1.229960] el1_error+0x84/0x100 > > > >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270 > > > >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0 > > > >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0 > > > >> [ 1.229968] pci_scan_single_device+0xa4/0x144 > > > > > > On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote: > > > > ... Is there any way to tell the kernel "hey we need some more time > > > > here"? > > > > > > We enumerate PCI devices by trying to read the Vendor ID of every > > > possible device address (see pci_scan_slot()). 
On PCIe, if a device > > > doesn't exist at that address, the Vendor ID config read will be > > > terminated with Unsupported Request (UR) status. This is normal > > > and happens every time we enumerate devices. > > > > > > The crash doesn't happen every time we enumerate, so I don't think > > > this UR is the problem. Also, if it *were* the problem, adding a > > > delay would not make any difference. > > > > Is this behavior different if there is a switch device forwarding on > > the UR? On rk3399 switches are completely non-functional because of > > the panic, which is observed in the output of the dmesg in [2] with > > the hack patch enabled. Considering what you just described it looks > > like the forwarded UR for each non-existent device behind the switch > > is causing an SError. > > I don't know exactly what the panic looks like, but I wouldn't expect > UR handling to be different when there's a switch. > > pcie-rockchip-host.c does handle devices on the root bus (00) > differently than others because rockchip_pcie_valid_device() knows > that device 00:00 is the only device on the root bus. That part makes > sense because 00:00 is built into the SoC. > > I'm a little suspicious of the fact that rockchip_pcie_valid_device() > also enforces that bus 01 can only have a single device on it. No > other *_pcie_valid_device() implementations enforce that. It's true > that traditional PCIe devices can only implement device 00, but ARI > relaxes that by reusing the Device Number as extended Function Number > bits. Bjorn, great catch, thank you! I suspect you're actually onto the core of the problem. Looking through various other drivers that implement *_pcie_valid_device(), they all appear to place similar restrictions on scanning for devices. The drivers are all similar enough that I am starting to suspect they are all running some version of the same buggy IP.
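[Editor's sketch] For reference, the restriction under discussion amounts to something like the model below. This is simplified for illustration — the real rockchip_pcie_valid_device() works on struct pci_bus and compares against the root bus number — but it captures the two checks:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the check being questioned: device 0 is the
 * only device allowed on the root bus (that part is correct -- the
 * RC is built into the SoC), but bus 1 is *also* limited to device
 * 0, which is the suspicious restriction given that ARI reuses the
 * Device Number bits as extra Function Number bits.
 */
static bool model_valid_device(int bus, int dev)
{
	if (bus == 0 && dev > 0)	/* only the RC on the root bus */
		return false;
	if (bus == 1 && dev > 0)	/* the questionable extra limit */
		return false;
	return true;
}
```

With a switch downstream, buses 2 and beyond are scanned at every device number, which is where the forwarded completions for non-existent devices come into play.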
Then I came across advk_pcie_pio_is_running() in pci-aardvark.c, which describes our issue spot on, including the exact same SError. Interestingly enough, they made a TF-A patch [3] to catch and handle the error without ever passing it to the kernel. Another limitation they added is ensuring reads are not attempted while the link is down. pci-aardvark.c also implements handling for Completion Retry Status. This has given me some ideas for solving the problem. Very Respectfully, Peter Geis [3] https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50 > > > > There *is* a way for a PCIe device to say "I need more time". It does > > > this by responding to that Vendor ID config read with Request Retry > > > Status (RRS, aka CRS in older specs), which means "I'm not ready yet, > > > but I will be ready in the future." Adding a delay would definitely > > > make a difference here, so my guess is this is what's happening. > > > > > > Most root complexes return ~0 data to the CPU when a config read > > > terminates with UR or RRS. It sounds like rockchip does this for UR > > > but possibly not for RRS. > > > > > > There is a "RRS Software Visibility" feature, which is supposed to > > > turn the RRS into a special value (Vendor ID == 0x0001), but per [1], > > > rockchip doesn't support it (lspci calls it "CRSVisible"). > > > > > > But the CPU load instruction corresponding to the config read has to > > > complete by reading *something* or else be aborted. It sounds like > > > it's aborted in this case. I don't know the arm64 details, but if we > > > could catch that abort and determine that it was an RRS and not a UR, > > > maybe we could fabricate the magic RRS 0x0001 value. > > > > > > imx6q_pcie_abort_handler() does something like that, although I think > > > it's for arm32, not arm64. But obviously we already catch the abort > > > enough to dump the register state and panic, so maybe there's a way to > > > extend that?
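[Editor's sketch] The aardvark-style guard mentioned above — refuse to touch config space while the link is down and complete the read with all-ones instead, the way a UR would read back — can be sketched as below. link_up and hw_read are simplified stand-ins; the real driver checks its link-training status registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the link-down guard: if the link is not up, never issue
 * the config read (which is what can end in an SError here); instead
 * fabricate ~0 data, the same value a UR completion reads back as.
 */
static uint32_t guarded_cfg_read(bool link_up, uint32_t (*hw_read)(void))
{
	if (!link_up)
		return 0xffffffff;	/* "no device", no bus cycle issued */
	return hw_read();
}

/* Self-test helper standing in for a device that answers normally. */
static uint32_t fake_hw_read(void)
{
	return 0x01691d87u;
}
```

The enumeration code already treats all-ones as "no device there", so nothing upstream needs to change for this guard to be safe.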
> > > > Perhaps a hook mechanism that allows drivers to register with the > > serror handler and offer to handle specific errors before the generic > > code causes the system panic? > > > > Very Respectfully, > > Peter Geis > > > > [2] https://lore.kernel.org/linux-pci/CAMdYzYqn3L7x-vc+_K6jG0EVTiPGbz8pQ-N1Q1mRbcVXE822Yg@mail.gmail.com/ > > > > > > > > Bjorn > > > > > > [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/ > > _______________________________________________ > > Linux-kernel-mentees mailing list > > Linux-kernel-mentees@lists.linuxfoundation.org > > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
Hi all, > Perhaps a hook mechanism that allows drivers to register with the > serror handler and offer to handle specific errors before the generic > code causes the system panic? I have some time to work on it. I'm interested in exploring the solution proposed here. However, I would appreciate some guidance in understanding how to implement this type of hook to report the error. Cheers. Vincent.
My RockPro64 occasionally failed on boot with this crash dump, at least since Linux 5.15. Since Linux 6.5, every boot fails in this manner. I applied a similar patch[0] against Linux 6.6 that sleeps during probe, and I'm now able to boot successfully each time. 0. https://gitlab.manjaro.org/manjaro-arm/packages/core/linux/-/blob/44e81d83b7e002e9955ac3c54e276218dc9ac76d/1005-rk3399-rp64-pcie-Reimplement-rockchip-PCIe-bus-scan-delay.patch
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7016cb12dc4e..3f0e0d670126 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4286,6 +4286,14 @@
 			any pair of devices, possibly at the cost of
 			reduced performance.  This also guarantees
 			that hot-added devices will work.
+	pcie_rockchip_host.bus_scan_delay=	[PCIE] Delay in ms before
+			scanning the PCIe bus in the Rockchip PCIe host
+			driver.  Some PCIe cards seem to need delays that
+			can be several hundred ms.  If set to a value
+			greater than or equal to 0, this parameter
+			overrides the delay that can be set in the device
+			tree.  Negative values leave the device tree
+			setting in effect.  The default value is -1.
 	cbiosize=nn[KMG]	The fixed amount of bus space which is
 			reserved for the CardBus bridge's IO window.
 			The default value is 256 bytes.
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi
index bca2b50e0a93..95a4623b8c18 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi
@@ -663,7 +663,8 @@ &pcie0 {
 	pinctrl-0 = <&pcie_perst>;
 	vpcie12v-supply = <&vcc12v_dcin>;
 	vpcie3v3-supply = <&vcc3v3_pcie>;
-	status = "okay";
+	bus-scan-delay-ms = <0>;
+	status = "okay";
 };
 
 &pcie_phy {
diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
index c96c0f454570..b97f3102a628 100644
--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -38,6 +38,10 @@
 #include "../pci.h"
 #include "pcie-rockchip.h"
 
+/* bus_scan_delay - module parameter that overrides the device tree
+ * value (which defaults to 0 when the property is absent). */
+static int bus_scan_delay = -1;
+
 static void rockchip_pcie_enable_bw_int(struct rockchip_pcie *rockchip)
 {
 	u32 status;
@@ -931,6 +935,11 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
 	struct rockchip_pcie *rockchip;
 	struct device *dev = &pdev->dev;
 	struct pci_host_bridge *bridge;
+	/* It seems that the driver crashes on some older devices.
+	 * To work around this, add a sleep delay before the bus
+	 * is probed.
+	 */
+	u32 prob_delay;
 	int err;
 
 	if (!dev->of_node)
@@ -987,6 +996,23 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
 
 	rockchip_pcie_enable_interrupts(rockchip);
 
+	prob_delay = rockchip->bus_scan_delay;
+	if (bus_scan_delay >= 0)
+		prob_delay = bus_scan_delay;
+
+	/*
+	 * FIXME: This is a workaround for some devices that crash on
+	 * calls to pci_host_probe() or pci_scan_root_bus_bridge().  We
+	 * add a delay before bus scanning to avoid the crash.  The call
+	 * trace reaches rockchip_pcie_rd_conf() while trying to read
+	 * the Vendor ID (via pci_bus_generic_read_dev_vendor_id())
+	 * before panicking.  It is not yet clear why the delay helps
+	 * or what exactly causes the panic.
+	 */
+	dev_info(dev, "wait %u ms before bus scan\n", prob_delay);
+	if (prob_delay > 0)
+		msleep(prob_delay);
+
 	err = pci_host_probe(bridge);
 	if (err < 0)
 		goto err_remove_irq_domain;
@@ -1055,6 +1081,11 @@ static struct platform_driver rockchip_pcie_driver = {
 };
 module_platform_driver(rockchip_pcie_driver);
 
+/* Allow the device tree default configuration to be overridden
+ * from the command line.
+ */
+module_param_named(bus_scan_delay, bus_scan_delay, int, S_IRUGO);
+
 MODULE_AUTHOR("Rockchip Inc");
 MODULE_DESCRIPTION("Rockchip AXI PCIe driver");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pcie-rockchip.c b/drivers/pci/controller/pcie-rockchip.c
index 990a00e08bc5..7350be04f662 100644
--- a/drivers/pci/controller/pcie-rockchip.c
+++ b/drivers/pci/controller/pcie-rockchip.c
@@ -149,6 +149,11 @@ int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
 		return PTR_ERR(rockchip->clk_pcie_pm);
 	}
 
+	err = of_property_read_u32(node, "bus-scan-delay-ms", &rockchip->bus_scan_delay);
+	if (err) {
+		dev_info(dev, "no bus scan delay, default to 0 ms\n");
+		rockchip->bus_scan_delay = 0;
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(rockchip_pcie_parse_dt);
diff --git a/drivers/pci/controller/pcie-rockchip.h b/drivers/pci/controller/pcie-rockchip.h
index 32c3a859c26b..76503dde6099 100644
--- a/drivers/pci/controller/pcie-rockchip.h
+++ b/drivers/pci/controller/pcie-rockchip.h
@@ -299,6 +299,16 @@ struct rockchip_pcie {
 	phys_addr_t msg_bus_addr;
 	bool is_rc;
 	struct resource *mem_res;
+
+	/*
+	 * It seems that the driver crashes on some older devices
+	 * unless a sleep delay is added before the bus is
+	 * probed.
+	 *
+	 * FIXME: this needs more investigation, but it looks like
+	 * the problem may be related to the cansleep value.
+	 */
+	u32 bus_scan_delay;
 };
 
 static u32 rockchip_pcie_read(struct rockchip_pcie *rockchip, u32 reg)