Message ID | 20240212165043.26961-1-johan+linaro@kernel.org |
---|---|
Headers |
Return-Path: <linux-kernel+bounces-62016-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp40969dyb; Mon, 12 Feb 2024 08:55:40 -0800 (PST) X-Google-Smtp-Source: AGHT+IHoaVlSFUSAJSid4C4GDxjJg64clbKIHURHp99vWZey+97/5RETZdJaoqg2rBFy+wCZK46r X-Received: by 2002:ac8:7e83:0:b0:42b:f656:c0a5 with SMTP id w3-20020ac87e83000000b0042bf656c0a5mr10940142qtj.18.1707756940321; Mon, 12 Feb 2024 08:55:40 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707756940; cv=pass; d=google.com; s=arc-20160816; b=jJoNai4NTFUWIQ4y7mnaUkcqmOG1UNJZMCUcRHgzINld6lbbu2jPnGLzYbYLzvSMs4 7h8b+j4Muuhh0dt4vtQJ1Xr+idOpkaOsRKYovze2uU50Y8OB2Nz1Oc0z/fgc4x1eq0us E6UEReN/X3K9WafEF01ON1RGl1GoQ/fu51q020QLuIh+CfsNUJ+GvU/NdGQb+7eRarL/ al6RXP6HUZJGMnex/MvfhrUINSH6RLqFyB8gikX1MtI6/VDat1UC4r1DAEcGBikwyMDS HvNyiqCo2kKcRUYr92AAc5jTtQj5/5ZT1x9m/ELzEeKALF/VViuZ9BZ1Wc6HAOLn3T3B SulA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from:dkim-signature; bh=bMmX1dzIW69fE6TrUlwFCmfUU1Eg+YbiZRi9/1uTryE=; fh=UAmdvHK0oL23w2a8HO8Szi10i+pf9CC1qiPx9dUjln0=; b=nCfotu2/tQKVfl86ssugIgbckSSXZ4Be0PRVoAh8pXtIf4LmpIOeqd0wuTWEYlzlt+ jjUYghfW7kgehWvfXBTwNB9LZmMP8zk4RpyTV+9d6cw6tmkqR07aWcIkzB1AiH1zgHwW nFG5YAZuDg7q64TdAv8wlTStPfJkeizLJD8FIP/ehRRPHUYIeln83KQOUeNVZ8kSsS8z ++Q8c5ULA5nQ+nZSf+qGK3m/CMQsonZE2OpCpzvlSmm4M4ncl+Cb9dyVdtrfDGpH0pcJ 8zfFj5ilzbSeZn4O4bJj3Q9JgSuDEjAc/PvTwxNqWDFUCepjnL4H6TCZK4P4eJkhuZKU TvSw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="rCSJub/z"; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-62016-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-62016-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=2; AJvYcCXhAY8Dd9TN2mC5tedYgx3Bwjt+uN13WhD53yLdhKa8P+is4lyAB0XONyb0R1vxInSjruN9HyQVn6iYhiWy0TlbCCVq2A== Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id p10-20020ac8740a000000b0042c71310957si751150qtq.175.2024.02.12.08.55.40 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Feb 2024 08:55:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-62016-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="rCSJub/z"; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-62016-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-62016-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id C07131C20B7A for <ouuuleilei@gmail.com>; Mon, 12 Feb 2024 16:55:38 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 403D24120B; Mon, 12 Feb 2024 16:53:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rCSJub/z" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28B373D3BC; Mon, 12 Feb 2024 16:53:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707756819; cv=none; b=O5n3kHed5DvBQ3R+tgTTfBfQqHnpMb20Mx49bZ/r/etcGXgq6MrCgPWRusf4i2A81rXWOqRQKSzqOLTJhgzAlYQzr/mFdfTeuuibNslCnsSehkcm9LVt/vI2uS6Xvwlzuxjd3v5MqMvJFuNumb6NRAlc0FgZ2aceWm7zmoHN34g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707756819; c=relaxed/simple; bh=Lcwh9oBaUlbIJIUDyERFA6U/A3CwBvNwqmxbv7vqjcY=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=X+L353M1/afzvH90AHJEtLaHJ4FXOQcimMfkpcfP7VaF831oda3qAptDlQZEh1xM46Zj6dtQHa27UvEiT7zF4EGR2f933TLMSNf57Ad05wsDZRfH943nJ0qGdkXLOVAS4cYZ9jGmWXVCvBkvgaCXiDOPo+V+YZdu6A3Z6FBSCK0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rCSJub/z; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA1ABC433A6; Mon, 12 Feb 2024 16:53:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707756818; bh=Lcwh9oBaUlbIJIUDyERFA6U/A3CwBvNwqmxbv7vqjcY=; h=From:To:Cc:Subject:Date:From; b=rCSJub/zIPFFAM8/y0sjGaoc1aoHaLB4akvR3z1UCkPSmfbdf2Y0mbZrRqVt7u7MT 1T1vwvsrtdGPsirrJo/ZcU7lBGb/lK4OsQL9WwMXZpDSIdeWlRUn7i7010+rX2cGoT p34MYWWKmdsqshE9HFLCecZfOWB8B2EKUCmAB7Q3j8Z1etlBPbxnQSG6CP99DPKCOi 4KuSz8PQ3Sy6fNnvGKWFLhHg1mJCtcrwNzUru24sQxxteh4CSA0s29ZU1Eyw2qLDAL FICIb1YikFVSXgRubXFjvew+7c2Dq+sXsqlrQfqtGVLGrzaVTJa7Gk9NF74K5eky5h 4dQkGd8UKnhLg== Received: from johan by xi.lan with local (Exim 4.97.1) (envelope-from <johan+linaro@kernel.org>) id 1rZZZ2-000000007Mx-0uIS; Mon, 12 Feb 2024 17:53:52 +0100 From: Johan Hovold <johan+linaro@kernel.org> To: Bjorn Andersson <andersson@kernel.org>, Bjorn Helgaas <bhelgaas@google.com> Cc: Konrad Dybcio <konrad.dybcio@linaro.org>, Lorenzo Pieralisi <lpieralisi@kernel.org>, =?utf-8?q?Krzysztof_Wilczy=C5=84?= =?utf-8?q?ski?= <kw@linux.com>, Rob Herring <robh@kernel.org>, Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>, Conor Dooley <conor+dt@kernel.org>, Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>, linux-arm-msm@vger.kernel.org, linux-pci@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, Johan Hovold <johan+linaro@kernel.org> Subject: [PATCH 00/10] arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe Date: Mon, 12 Feb 2024 17:50:33 +0100 Message-ID: <20240212165043.26961-1-johan+linaro@kernel.org> X-Mailer: git-send-email 2.43.0 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790712941569344927 X-GMAIL-MSGID: 1790712941569344927 |
Series |
arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe
|
|
Message
Johan Hovold
Feb. 12, 2024, 4:50 p.m. UTC
This series addresses a few problems with the sc8280xp PCIe implementation. The DWC PCIe controller can either use its internal MSI controller or an external one such as the GICv3 ITS. Enabling the latter allows for assigning affinity to individual interrupts, but results in a large amount of Correctable Errors being logged on both the Lenovo ThinkPad X13s and the sc8280xp-crd reference design. It turns out that these errors are always generated, but for some yet to be determined reason, the AER interrupts are never received when using the internal MSI controller, which makes the link errors harder to notice. On the X13s, there is a large number of errors generated when bringing up the link on boot. This is related to the fact that UEFI firmware has already enabled the Wi-Fi PCIe link at Gen2 speed and restarting the link at Gen3 generates a massive amount of errors until the Wi-Fi firmware is restarted. A recent commit enabling ASPM on certain Qualcomm platforms introduced further errors when using the Wi-Fi on the X13s as well as when accessing the NVMe on the CRD. The exact reason for this has not yet been identified, but disabling ASPM L0s makes the errors go away. This could suggest that either the current ASPM implementation is incomplete or that L0s is not supported with these devices. Note that the X13s and CRD use the same Wi-Fi controller, but the errors are only generated on the X13s. The NVMe controller on my X13s does not support L0s so there are no issues there, unlike on the CRD which uses a different controller. The modem on the CRD does not generate any errors, but both the NVMe and modem keeps bouncing in and out of L0s/L1 also when not used, which could indicate that there are bigger problems with the ASPM implementation. I don't have a modem on my X13s so I have not been able to test whether L0s causes an trouble there. Enabling AER error reporting on sc8280xp could similarly also reveal existing problems with the related sa8295p and sa8540p platforms as they share the base dtsi. The last four patches, marked as RFC, adds support for disabling ASPM L0s in the devicetree and disables it selectively for the X13s Wi-Fi and CRD NVMe. If it turns out that the Qualcomm PCIe implementation is incomplete, we may need to disable ASPM (L0s) completely in the driver instead. Note that disabling ASPM L0s for the X13s Wi-Fi does not seem to have a significant impact on the power consumption The DT bindings and PCI patch are expected to go through the PCI tree, while Bjorn A takes the devicetree updates through the Qualcomm tree. Johan Johan Hovold (10): dt-bindings: PCI: qcom: Allow 'required-opps' dt-bindings: PCI: qcom: Do not require 'msi-map-mask' arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe dt-bindings: PCI: qcom: Allow 'aspm-no-l0s' PCI: qcom: Add support for disabling ASPM L0s in devicetree arm64: dts: qcom: sc8280xp-crd: disable ASPM L0s for NVMe arm64: dts: qcom: sc8280xp-x13s: disable ASPM L0s for Wi-Fi .../devicetree/bindings/pci/qcom,pcie.yaml | 6 +++++- arch/arm64/boot/dts/qcom/sc8280xp-crd.dts | 4 ++++ .../qcom/sc8280xp-lenovo-thinkpad-x13s.dts | 3 +++ arch/arm64/boot/dts/qcom/sc8280xp.dtsi | 17 +++++++++++++++- drivers/pci/controller/dwc/pcie-qcom.c | 20 +++++++++++++++++++ 5 files changed, 48 insertions(+), 2 deletions(-)
Comments
On Mon, Feb 12, 2024 at 05:50:33PM +0100, Johan Hovold wrote: > This series addresses a few problems with the sc8280xp PCIe > implementation. > > The DWC PCIe controller can either use its internal MSI controller or an > external one such as the GICv3 ITS. Enabling the latter allows for > assigning affinity to individual interrupts, but results in a large > amount of Correctable Errors being logged on both the Lenovo ThinkPad > X13s and the sc8280xp-crd reference design. > > It turns out that these errors are always generated, How did you confirm this? > but for some yet to > be determined reason, the AER interrupts are never received when using > the internal MSI controller, which makes the link errors harder to > notice. > If you manually inject the errors using "aer-inject", are you not seeing the AER errors with internal MSI controller as well? > On the X13s, there is a large number of errors generated when bringing > up the link on boot. This is related to the fact that UEFI firmware has > already enabled the Wi-Fi PCIe link at Gen2 speed and restarting the > link at Gen3 generates a massive amount of errors until the Wi-Fi > firmware is restarted. > > A recent commit enabling ASPM on certain Qualcomm platforms introduced > further errors when using the Wi-Fi on the X13s as well as when > accessing the NVMe on the CRD. The exact reason for this has not yet > been identified, but disabling ASPM L0s makes the errors go away. This > could suggest that either the current ASPM implementation is incomplete > or that L0s is not supported with these devices. > What are those "further errors" you are seeing with ASPM enabled? Are those errors appear with GIC ITS or with internal MSI controller as well? > Note that the X13s and CRD use the same Wi-Fi controller, but the errors > are only generated on the X13s. The NVMe controller on my X13s does not > support L0s so there are no issues there, unlike on the CRD which uses a > different controller. The modem on the CRD does not generate any errors, > but both the NVMe and modem keeps bouncing in and out of L0s/L1 also > when not used, which could indicate that there are bigger problems with > the ASPM implementation. I don't have a modem on my X13s so I have not > been able to test whether L0s causes an trouble there. > > Enabling AER error reporting on sc8280xp could similarly also reveal > existing problems with the related sa8295p and sa8540p platforms as they > share the base dtsi. > > The last four patches, marked as RFC, adds support for disabling ASPM > L0s in the devicetree and disables it selectively for the X13s Wi-Fi > and CRD NVMe. If it turns out that the Qualcomm PCIe implementation is > incomplete, we may need to disable ASPM (L0s) completely in the driver > instead. > If the device is not supporting L0s, then it as to be disabled in the device, not in the PCIe controller, no? > Note that disabling ASPM L0s for the X13s Wi-Fi does not seem to have a > significant impact on the power consumption > > The DT bindings and PCI patch are expected to go through the PCI tree, > while Bjorn A takes the devicetree updates through the Qualcomm tree. > Since I took a stab at enabling the GIC ITS previously, I noticed that the NVMe performance got a slight dip. And that was one of the reasons (apart from AER errors) that I never submitted the patch. Could you share the NVMe benchmark (fio) with this series? > Johan > > > Johan Hovold (10): > dt-bindings: PCI: qcom: Allow 'required-opps' > dt-bindings: PCI: qcom: Do not require 'msi-map-mask' > arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP > arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed > arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed > arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe Is this patch based on the version I shared with you long back? If so, I'd expect to have some credit. If you came up with your own version, then ignore this comment. - Mani > dt-bindings: PCI: qcom: Allow 'aspm-no-l0s' > PCI: qcom: Add support for disabling ASPM L0s in devicetree > arm64: dts: qcom: sc8280xp-crd: disable ASPM L0s for NVMe > arm64: dts: qcom: sc8280xp-x13s: disable ASPM L0s for Wi-Fi > > .../devicetree/bindings/pci/qcom,pcie.yaml | 6 +++++- > arch/arm64/boot/dts/qcom/sc8280xp-crd.dts | 4 ++++ > .../qcom/sc8280xp-lenovo-thinkpad-x13s.dts | 3 +++ > arch/arm64/boot/dts/qcom/sc8280xp.dtsi | 17 +++++++++++++++- > drivers/pci/controller/dwc/pcie-qcom.c | 20 +++++++++++++++++++ > 5 files changed, 48 insertions(+), 2 deletions(-) > > -- > 2.43.0 >
On Wed, Feb 14, 2024 at 12:05:54PM +0530, Manivannan Sadhasivam wrote: > On Mon, Feb 12, 2024 at 05:50:33PM +0100, Johan Hovold wrote: > > This series addresses a few problems with the sc8280xp PCIe > > implementation. > > > > The DWC PCIe controller can either use its internal MSI controller or an > > external one such as the GICv3 ITS. Enabling the latter allows for > > assigning affinity to individual interrupts, but results in a large > > amount of Correctable Errors being logged on both the Lenovo ThinkPad > > X13s and the sc8280xp-crd reference design. > > > > It turns out that these errors are always generated, > > How did you confirm this? You can see that error flags being set in the controller and endpoint, for example, using lspci -vv: CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- AdvNonFatalErr- > > but for some yet to > > be determined reason, the AER interrupts are never received when using > > the internal MSI controller, which makes the link errors harder to > > notice. > > If you manually inject the errors using "aer-inject", are you not seeing the AER > errors with internal MSI controller as well? I haven't tried that, I'm just reporting that that piece of functionality is currently broken and that that partly explains why the ASPM problems went unnoticed. > > On the X13s, there is a large number of errors generated when bringing > > up the link on boot. This is related to the fact that UEFI firmware has > > already enabled the Wi-Fi PCIe link at Gen2 speed and restarting the > > link at Gen3 generates a massive amount of errors until the Wi-Fi > > firmware is restarted. > > > > A recent commit enabling ASPM on certain Qualcomm platforms introduced > > further errors when using the Wi-Fi on the X13s as well as when > > accessing the NVMe on the CRD. The exact reason for this has not yet > > been identified, but disabling ASPM L0s makes the errors go away. This > > could suggest that either the current ASPM implementation is incomplete > > or that L0s is not supported with these devices. > > What are those "further errors" you are seeing with ASPM enabled? Are those > errors appear with GIC ITS or with internal MSI controller as well? Further errors as in further correctable errors that are not related to the errors seen when resetting the X13s Wi-Fi link at boot. These show up, for example, when accessing the NVMe on the CRD or when using the Wi-Fi on the X13s. These errors go away when L0s is disabled. And yes, you see them with both the external and internal MSI controller (in the latter case, by looking at the error flags mentioned above). > > Note that the X13s and CRD use the same Wi-Fi controller, but the errors > > are only generated on the X13s. The NVMe controller on my X13s does not > > support L0s so there are no issues there, unlike on the CRD which uses a > > different controller. The modem on the CRD does not generate any errors, > > but both the NVMe and modem keeps bouncing in and out of L0s/L1 also > > when not used, which could indicate that there are bigger problems with > > the ASPM implementation. I don't have a modem on my X13s so I have not > > been able to test whether L0s causes an trouble there. > > > > Enabling AER error reporting on sc8280xp could similarly also reveal > > existing problems with the related sa8295p and sa8540p platforms as they > > share the base dtsi. > > > > The last four patches, marked as RFC, adds support for disabling ASPM > > L0s in the devicetree and disables it selectively for the X13s Wi-Fi > > and CRD NVMe. If it turns out that the Qualcomm PCIe implementation is > > incomplete, we may need to disable ASPM (L0s) completely in the driver > > instead. > > If the device is not supporting L0s, then it as to be disabled in the device, > not in the PCIe controller, no? Well, we don't know yet where the problem lies, just that enabling L0s results in a large number of correctable errors. Until yesterday I had not seen any such errors for the Wi-Fi on the CRD, which uses essentially the same ath11k controller, so there was no clear indication that this was necessarily a problem with the devices either. > > Note that disabling ASPM L0s for the X13s Wi-Fi does not seem to have a > > significant impact on the power consumption > > > > The DT bindings and PCI patch are expected to go through the PCI tree, > > while Bjorn A takes the devicetree updates through the Qualcomm tree. > > Since I took a stab at enabling the GIC ITS previously, I noticed that the NVMe > performance got a slight dip. And that was one of the reasons (apart from AER > errors) that I never submitted the patch. > > Could you share the NVMe benchmark (fio) with this series? Did you have any particular benchmark in mind? I have run multiple fio benchmarks and while the results vary with the parameters, the impact of switching to ITS (so that not all PCIe interrupts are processed on CPU0) is generally favourable. A raw sequential read shows no change in throughput on either the X13s or the CRD even if for some reason this test performs really badly on the X13s (i.e. regardless of which MSI controller is used): crd-rseq-read: IOPS=11.1k, BW=2764MiB/s (2898MB/s)(81.0GiB/30003msec) X13s-rseq-read: IOPS=508, BW=127MiB/s (134MB/s)(3841MiB/30169msec) Another benchmark I've used against a mounted ext4 partition shows a 2x improvement in throughput with ITS for sequential and random reads and writes on the X13s: seq-read: IOPS=88.4k, BW=345MiB/s (362MB/s)(10.0GiB/29657msec) rand-read: IOPS=21.2k, BW=82.8MiB/s (86.8MB/s)(4967MiB/60001msec) seq-write: IOPS=162k, BW=632MiB/s (662MB/s)(10.0GiB/16213msec) rand-write: IOPS=142k, BW=555MiB/s (582MB/s)(10.0GiB/18439msec) while the results are essentially unchanged with a larger block size and queue depth (32/2m instead of 4/4k): seq-read: IOPS=1095, BW=2191MiB/s (2298MB/s)(10.0GiB/4673msec) rand-read: IOPS=1020, BW=2041MiB/s (2140MB/s)(10.0GiB/5017msec) seq-write: IOPS=918, BW=1837MiB/s (1926MB/s)(10.0GiB/5574msec) rand-write: IOPS=826, BW=1653MiB/s (1734MB/s)(10.0GiB/6194msec) > > Johan Hovold (10): > > dt-bindings: PCI: qcom: Allow 'required-opps' > > dt-bindings: PCI: qcom: Do not require 'msi-map-mask' > > arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP > > arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed > > arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed > > arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe > > Is this patch based on the version I shared with you long back? If so, I'd > expect to have some credit. If you came up with your own version, then ignore > this comment. No, this patch has beeen created and evaluated from scratch based on the downstream direwolf dts, which has these five 'msi-map' properties. I debated whether I should base it on your version instead, but in the end it would have a new commit message and only these properties from the downstream dtsi would remain (you also removed existing properties IIRC). So while it's certainly inspired by your work, this has been done from scratch, including the testing. If you prefer I can make this clear in the commit message, but adding a Co-developed-by didn't seem quite right either as I did this work without your involvement. But perhaps that would be better? Johan
On Wed, Feb 14, 2024 at 12:09:15PM +0100, Johan Hovold wrote: > On Wed, Feb 14, 2024 at 12:05:54PM +0530, Manivannan Sadhasivam wrote: > > On Mon, Feb 12, 2024 at 05:50:33PM +0100, Johan Hovold wrote: > > > This series addresses a few problems with the sc8280xp PCIe > > > implementation. > > > > > > The DWC PCIe controller can either use its internal MSI controller or an > > > external one such as the GICv3 ITS. Enabling the latter allows for > > > assigning affinity to individual interrupts, but results in a large > > > amount of Correctable Errors being logged on both the Lenovo ThinkPad > > > X13s and the sc8280xp-crd reference design. > > > > > > It turns out that these errors are always generated, > > > > How did you confirm this? > > You can see that error flags being set in the controller and endpoint, > for example, using lspci -vv: > > CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- AdvNonFatalErr- > Okay. > > > but for some yet to > > > be determined reason, the AER interrupts are never received when using > > > the internal MSI controller, which makes the link errors harder to > > > notice. > > > > If you manually inject the errors using "aer-inject", are you not seeing the AER > > errors with internal MSI controller as well? > > I haven't tried that, I'm just reporting that that piece of > functionality is currently broken and that that partly explains why the > ASPM problems went unnoticed. > I just gave it a shot and I could see the AER interrupts raised for correctable errors with internal MSI controller. Now I'm puzzled why this is not getting triggered by default. I'll check with the hardware team if they have any clue. > > > On the X13s, there is a large number of errors generated when bringing > > > up the link on boot. This is related to the fact that UEFI firmware has > > > already enabled the Wi-Fi PCIe link at Gen2 speed and restarting the > > > link at Gen3 generates a massive amount of errors until the Wi-Fi > > > firmware is restarted. > > > > > > A recent commit enabling ASPM on certain Qualcomm platforms introduced > > > further errors when using the Wi-Fi on the X13s as well as when > > > accessing the NVMe on the CRD. The exact reason for this has not yet > > > been identified, but disabling ASPM L0s makes the errors go away. This > > > could suggest that either the current ASPM implementation is incomplete > > > or that L0s is not supported with these devices. > > > > What are those "further errors" you are seeing with ASPM enabled? Are those > > errors appear with GIC ITS or with internal MSI controller as well? > > Further errors as in further correctable errors that are not related to > the errors seen when resetting the X13s Wi-Fi link at boot. > > These show up, for example, when accessing the NVMe on the CRD or when > using the Wi-Fi on the X13s. These errors go away when L0s is disabled. > > And yes, you see them with both the external and internal MSI controller > (in the latter case, by looking at the error flags mentioned above). > Hmm. > > > Note that the X13s and CRD use the same Wi-Fi controller, but the errors > > > are only generated on the X13s. The NVMe controller on my X13s does not > > > support L0s so there are no issues there, unlike on the CRD which uses a > > > different controller. The modem on the CRD does not generate any errors, > > > but both the NVMe and modem keeps bouncing in and out of L0s/L1 also > > > when not used, which could indicate that there are bigger problems with > > > the ASPM implementation. I don't have a modem on my X13s so I have not > > > been able to test whether L0s causes an trouble there. > > > > > > Enabling AER error reporting on sc8280xp could similarly also reveal > > > existing problems with the related sa8295p and sa8540p platforms as they > > > share the base dtsi. > > > > > > The last four patches, marked as RFC, adds support for disabling ASPM > > > L0s in the devicetree and disables it selectively for the X13s Wi-Fi > > > and CRD NVMe. If it turns out that the Qualcomm PCIe implementation is > > > incomplete, we may need to disable ASPM (L0s) completely in the driver > > > instead. > > > > If the device is not supporting L0s, then it as to be disabled in the device, > > not in the PCIe controller, no? > > Well, we don't know yet where the problem lies, just that enabling L0s > results in a large number of correctable errors. > > Until yesterday I had not seen any such errors for the Wi-Fi on the CRD, > which uses essentially the same ath11k controller, so there was no clear > indication that this was necessarily a problem with the devices either. > I'll confirm the L0s compatibility with the hardware team. > > > Note that disabling ASPM L0s for the X13s Wi-Fi does not seem to have a > > > significant impact on the power consumption > > > > > > The DT bindings and PCI patch are expected to go through the PCI tree, > > > while Bjorn A takes the devicetree updates through the Qualcomm tree. > > > > Since I took a stab at enabling the GIC ITS previously, I noticed that the NVMe > > performance got a slight dip. And that was one of the reasons (apart from AER > > errors) that I never submitted the patch. > > > > Could you share the NVMe benchmark (fio) with this series? > > Did you have any particular benchmark in mind? > > I have run multiple fio benchmarks and while the results vary with the > parameters, the impact of switching to ITS (so that not all PCIe > interrupts are processed on CPU0) is generally favourable. > > A raw sequential read shows no change in throughput on either the X13s > or the CRD even if for some reason this test performs really badly on > the X13s (i.e. regardless of which MSI controller is used): > > crd-rseq-read: IOPS=11.1k, BW=2764MiB/s (2898MB/s)(81.0GiB/30003msec) > X13s-rseq-read: IOPS=508, BW=127MiB/s (134MB/s)(3841MiB/30169msec) > > Another benchmark I've used against a mounted ext4 partition shows a 2x > improvement in throughput with ITS for sequential and random reads and > writes on the X13s: > > seq-read: IOPS=88.4k, BW=345MiB/s (362MB/s)(10.0GiB/29657msec) > rand-read: IOPS=21.2k, BW=82.8MiB/s (86.8MB/s)(4967MiB/60001msec) > seq-write: IOPS=162k, BW=632MiB/s (662MB/s)(10.0GiB/16213msec) > rand-write: IOPS=142k, BW=555MiB/s (582MB/s)(10.0GiB/18439msec) > > while the results are essentially unchanged with a larger block size and > queue depth (32/2m instead of 4/4k): > > seq-read: IOPS=1095, BW=2191MiB/s (2298MB/s)(10.0GiB/4673msec) > rand-read: IOPS=1020, BW=2041MiB/s (2140MB/s)(10.0GiB/5017msec) > seq-write: IOPS=918, BW=1837MiB/s (1926MB/s)(10.0GiB/5574msec) > rand-write: IOPS=826, BW=1653MiB/s (1734MB/s)(10.0GiB/6194msec) > Ok, this looks promising. Long back when I tried the benchmark (seq & rand r/w), performance dropped slightly with GIC ITS. But looks like things have changed. > > > Johan Hovold (10): > > > dt-bindings: PCI: qcom: Allow 'required-opps' > > > dt-bindings: PCI: qcom: Do not require 'msi-map-mask' > > > arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP > > > arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed > > > arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed > > > arm64: dts: qcom: sc8280xp: enable GICv3 ITS for PCIe > > > > Is this patch based on the version I shared with you long back? If so, I'd > > expect to have some credit. If you came up with your own version, then ignore > > this comment. > > No, this patch has beeen created and evaluated from scratch based on the > downstream direwolf dts, which has these five 'msi-map' properties. > > I debated whether I should base it on your version instead, but in the > end it would have a new commit message and only these properties from > the downstream dtsi would remain (you also removed existing properties > IIRC). So while it's certainly inspired by your work, this has been done > from scratch, including the testing. > > If you prefer I can make this clear in the commit message, but adding a > Co-developed-by didn't seem quite right either as I did this work > without your involvement. But perhaps that would be better? > Nah. As I said, if you have created the patch without basing on my version, then no credit is required. I just wanted to know since I shared the patch earlier. - Mani