Message ID | 20230525163448.v1.1.Id388e4e2aa48fc56f9cd2d413aabd461ff81d615@changeid |
---|---|
State | New |
Headers |
From: Owen Yang <ecs.taipeikernel@gmail.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Bob Moragues <moragues@google.com>, Abner Yen <abner.yen@ecs.com.tw>, Doug Anderson <dianders@chromium.org>, Matthias Kaehlcke <mka@google.com>, Stephen Boyd <swboyd@chromium.org>, Harvey <hunge@google.com>, Gavin Lee <gavin.lee@ecs.com.tw>, Owen Yang <ecs.taipeikernel@gmail.com>, Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org
Subject: [PATCH v1] drivers: pci: quirks: Add suspend fixup for SSD on sc7280
Date: Thu, 25 May 2023 16:35:12 +0800
Message-Id: <20230525163448.v1.1.Id388e4e2aa48fc56f9cd2d413aabd461ff81d615@changeid>
X-Mailer: git-send-email 2.17.1
List-ID: <linux-kernel.vger.kernel.org> |
Series | [v1] drivers: pci: quirks: Add suspend fixup for SSD on sc7280 |
Commit Message
Owen Yang
May 25, 2023, 8:35 a.m. UTC
Implement this workaround until Qualcomm fixed the
correct NVMe suspend process.
Signed-off-by: Owen Yang <ecs.taipeikernel@gmail.com>
---
drivers/pci/quirks.c | 10 ++++++++++
1 file changed, 10 insertions(+)
Comments
On Thu, May 25, 2023 at 04:35:12PM +0800, Owen Yang wrote:
> Implement this workaround until Qualcomm fixed the
> correct NVMe suspend process.

Thanks for the patch.  Before I can do anything, this needs:

  - Subject line in style of the file (use "git log --oneline
    drivers/pci/quirks.c").

  - Format commit log correctly (fill 75 columns, no leading spaces).

  - Description of incorrect behavior.  What does the user see?  If
    there's a bug report, include a link to it.

  - Multi-line code comments in style of the file (look at existing
    comments in the file).

  - Details of "the correct ASPM state".  ASPM may be enabled or
    disabled by the user, so you can't assume any particular ASPM
    configuration.

  - Details on the Qualcomm sc7280 connection.  This quirk would
    affect Phison SSDs on *all* platforms, not just sc7280.  I don't
    want to slow down suspend on all platforms just for a sc7280
    issue.

  - Drop the "until Qualcomm fixes NVMe suspend" text.  Even if
    Qualcomm fixes something, we can't just drop this quirk because
    there will be platforms in the field that don't have the Qualcomm
    fix.

Bjorn

> Signed-off-by: Owen Yang <ecs.taipeikernel@gmail.com>
> ---
>
>  drivers/pci/quirks.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f4e2a88729fd..b57876dc2624 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5945,6 +5945,16 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
>
> +/* In Qualcomm 7c gen 3 sc7280 platform. Some of the SSD won't enter
> + * the correct ASPM state properly. Therefore. Implement this workaround
> + * until Qualcomm fixed the correct NVMe suspend process*/
> +static void phison_suspend_fixup(struct pci_dev *pdev)
> +{
> +	msleep(30);
> +}
> +DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5013, phison_suspend_fixup);
> +DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5015, phison_suspend_fixup);
> +
>  static void rom_bar_overlap_defect(struct pci_dev *dev)
>  {
>  	pci_info(dev, "working around ROM BAR overlap defect\n");
> --
> 2.17.1
>
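The scope concern raised in this review (a quirk keyed only on vendor and device ID applies on every platform that carries the device) can be illustrated with a small userspace model. This is a sketch and not kernel code: `struct fixup_entry`, `suspend_delay_ms()`, the two tables, and the platform strings are hypothetical names invented for illustration. Only the IDs 0x1987/0x5013/0x5015 and the 30 ms delay come from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of an ID-matched fixup table. */
struct fixup_entry {
    uint16_t vendor;
    uint16_t device;
    const char *platform;    /* NULL means "apply on every platform" */
    unsigned int delay_ms;   /* delay added to suspend when the entry matches */
};

/* Matches on vendor/device only, as the patch under review does. */
static const struct fixup_entry id_only[] = {
    { 0x1987, 0x5013, NULL, 30 },
    { 0x1987, 0x5015, NULL, 30 },
};

/* Same IDs, narrowed by an assumed platform name. */
static const struct fixup_entry platform_gated[] = {
    { 0x1987, 0x5013, "sc7280", 30 },
    { 0x1987, 0x5015, "sc7280", 30 },
};

/* Returns the suspend delay (ms) the table adds for this device on this platform. */
static unsigned int suspend_delay_ms(const struct fixup_entry *tbl, size_t n,
                                     uint16_t vendor, uint16_t device,
                                     const char *platform)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].vendor != vendor || tbl[i].device != device)
            continue;
        if (tbl[i].platform && strcmp(tbl[i].platform, platform) != 0)
            continue;
        total += tbl[i].delay_ms;
    }
    return total;
}
```

In this model the `id_only` table charges 30 ms to a matching SSD on any platform string, while `platform_gated` charges it only when the platform is "sc7280". How a real quirk would detect the platform is left open here.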
On Thu, May 25, 2023 at 04:35:12PM +0800, Owen Yang wrote:
> Implement this workaround until Qualcomm fixed the
> correct NVMe suspend process.
>
> Signed-off-by: Owen Yang <ecs.taipeikernel@gmail.com>
> ---
>
>  drivers/pci/quirks.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f4e2a88729fd..b57876dc2624 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5945,6 +5945,16 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
>
> +/* In Qualcomm 7c gen 3 sc7280 platform. Some of the SSD won't enter
> + * the correct ASPM state properly. Therefore. Implement this workaround
> + * until Qualcomm fixed the correct NVMe suspend process*/

What is there to fix during suspend? Currently, Qcom PCIe driver just votes for
low interconnect bandwidth and keeps the resources (clocks, regulators) ON
during suspend. So there is no way the device would move to D3Cold.

Earlier Qcom reported that during suspend, link down event happens when the
resources are turned OFF without waiting for the link to enter L1ss. But as I
said above, we are _not_ turning OFF any resources.

I believe this patch is addressing an issue that is caused by an out-of-tree
patch.

- Mani

> +static void phison_suspend_fixup(struct pci_dev *pdev)
> +{
> +	msleep(30);
> +}
> +DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5013, phison_suspend_fixup);
> +DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5015, phison_suspend_fixup);
> +
>  static void rom_bar_overlap_defect(struct pci_dev *dev)
>  {
>  	pci_info(dev, "working around ROM BAR overlap defect\n");
> --
> 2.17.1
>
On Mon, May 29, 2023 at 02:24:53PM +0800, 楊宗翰 wrote:
> Hi Bjorn,
>
> Thanks for your kind directions.
>
>   - Subject line in style of the file (use "git log --oneline
>     drivers/pci/quirks.c").
> Done, and I resend in topic "[PATCH v1] PCI: Add suspend fixup for SSD
>  on sc7280", please review it.
>
>   - Format commit log correctly (fill 75 columns, no leading spaces).
> Done.
>
>   - Description of incorrect behavior.  What does the user see?  If
>     there's a bug report, include a link to it.
> This issue seems to be discovered in ChromeOS only. SSD will randomly
> crashed at 100~250+ suspend/resume cycle. Phison and Qualcomm
> found that its due to NVMe entering D3cold instead of L1ss.

It should be noted that D3cold (or whatever condition causes the issue)
is not always entered, but only in the failure case (at least that was the
case for the Kioxia NVMe, which has a similar issue).

>   - Multi-line code comments in style of the file (look at existing
>     comments in the file).
> Done.
>
>   - Details of "the correct ASPM state".  ASPM may be enabled or
>     disabled by the user, so you can't assume any particular ASPM
>     configuration.
> According to Qualcomm. This issue has been found last year and they have
> attempt to submit some patches to fix the pci suspend behavior.
> (ref:https://patchwork.kernel.org/project/linux-arm-msm/list/?
> series=665060&state=%2A&archive=both).
> But somehow these patches were rejected because of its complexity. And
> we've got advise from Google that it will be more efficient that we implement
> a quirks to fix this issue.

IIRC the primary goal of this series was to be able to turn off the PCI clocks
during suspend, to allow the SoC to enter a lower power state. The part of the
series that fixes the NVMe issue described above is the retry loop of
"PCI: qcom: Add retry logic for link to be stable in L1ss" [1].

It is currently unclear why *some* NVMe *sometimes* need a longer time to enter
the L1 sub-state. That's something Qualcomm and the vendors of impacted NVMes
should figure out.

[1] https://patchwork.kernel.org/project/linux-arm-msm/patch/1659526134-22978-4-git-send-email-quic_krichai@quicinc.com/

>   - Details on the Qualcomm sc7280 connection.  This quirk would
>     affect Phison SSDs on *all* platforms, not just sc7280.  I don't
>     want to slow down suspend on all platforms just for a sc7280
>     issue.

As of now the issue has only been observed on QC SC7280; I don't know if ECS
has tried this part on other platforms. The issue could be QC/SC7280-specific
or not.

> The DECLARE_PCI_FIXUP_SUSPEND function has already specify the PCI device
> ID. And this SSD will only be used at our Chromebook device only.

It could be used in devices that are produced by other manufacturers.

A dedicated Kconfig option for the Phison NVMe could be an option. Or a QC
specific #ifdef (ugh ...) with a comment explaining that the issue has been
only observed on QC SC7280 *so far*.
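The difference between the fixed `msleep(30)` in the patch and a retry approach of the kind this reply refers to can be sketched in userspace C. This is a model under stated assumptions, not the code from the referenced Qualcomm patch: `wait_for_l1ss()`, the two callback types, `struct fake_link`, and the poll interval and retry budget are all hypothetical, and the link-state reader is a stub instead of a real register access.

```c
#include <stdbool.h>

/* Hypothetical callback: returns true once the link reports L1ss. */
typedef bool (*link_in_l1ss_fn)(void *ctx);
/* Hypothetical callback: sleeps for the given number of milliseconds. */
typedef void (*sleep_ms_fn)(void *ctx, unsigned int ms);

/*
 * Poll until the link is in L1ss or the retry budget is used up.
 * The state is checked max_polls + 1 times with a sleep between checks.
 * Returns the number of milliseconds waited, or -1 on timeout.
 */
static int wait_for_l1ss(link_in_l1ss_fn in_l1ss, sleep_ms_fn sleep_ms,
                         void *ctx, unsigned int poll_ms,
                         unsigned int max_polls)
{
    unsigned int waited = 0;

    for (unsigned int i = 0; i <= max_polls; i++) {
        if (in_l1ss(ctx))
            return (int)waited;
        if (i == max_polls)
            break;              /* budget exhausted, do not sleep again */
        sleep_ms(ctx, poll_ms);
        waited += poll_ms;
    }
    return -1;
}

/* Stub link for the sketch: enters L1ss after ready_at_ms of simulated time. */
struct fake_link {
    unsigned int now_ms;
    unsigned int ready_at_ms;
};

static bool fake_in_l1ss(void *ctx)
{
    const struct fake_link *link = ctx;

    return link->now_ms >= link->ready_at_ms;
}

static void fake_sleep_ms(void *ctx, unsigned int ms)
{
    struct fake_link *link = ctx;

    link->now_ms += ms;         /* advance simulated time instead of sleeping */
}
```

With this shape, a link that is already in L1ss pays no delay, a slow link pays only as many poll intervals as it needs, and a link that never gets there is reported to the caller as a timeout. A fixed delay charges every suspend the full amount and cannot tell these cases apart.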
On Mon, May 29, 2023 at 10:18:56PM +0530, Manivannan Sadhasivam wrote:
> On Thu, May 25, 2023 at 04:35:12PM +0800, Owen Yang wrote:
> > Implement this workaround until Qualcomm fixed the
> > correct NVMe suspend process.
> >
> > Signed-off-by: Owen Yang <ecs.taipeikernel@gmail.com>
> > ---
> >
> >  drivers/pci/quirks.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f4e2a88729fd..b57876dc2624 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5945,6 +5945,16 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
> >  }
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
> >
> > +/* In Qualcomm 7c gen 3 sc7280 platform. Some of the SSD won't enter
> > + * the correct ASPM state properly. Therefore. Implement this workaround
> > + * until Qualcomm fixed the correct NVMe suspend process*/
>
> What is there to fix during suspend? Currently, Qcom PCIe driver just votes for
> low interconnect bandwidth and keeps the resources (clocks, regulators) ON
> during suspend. So there is no way the device would move to D3Cold.
>
> Earlier Qcom reported that during suspend, link down event happens when the
> resources are turned OFF without waiting for the link to enter L1ss. But as I
> said above, we are _not_ turning OFF any resources.

Right, it makes little sense that the NVMe would move to D3Cold. And why does
the issue only reproduces sometimes (with certain NVMes) and not consistently?

> I believe this patch is addressing an issue that is caused by an out-of-tree
> patch.

I think ECS observed this with Chrome OS v5.15 kernel. On the PCI side this
kernel only has backported changes from upstream (mostly clean picks), no
downstream patches, so it seems unlikely that the issue is caused by a
downstream patch.
On Tue, May 30, 2023 at 09:17:02PM +0000, Matthias Kaehlcke wrote:
> On Mon, May 29, 2023 at 10:18:56PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, May 25, 2023 at 04:35:12PM +0800, Owen Yang wrote:
> > > Implement this workaround until Qualcomm fixed the
> > > correct NVMe suspend process.
> > >
> > > Signed-off-by: Owen Yang <ecs.taipeikernel@gmail.com>
> > > ---
> > >
> > >  drivers/pci/quirks.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index f4e2a88729fd..b57876dc2624 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -5945,6 +5945,16 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
> > >  }
> > >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
> > >
> > > +/* In Qualcomm 7c gen 3 sc7280 platform. Some of the SSD won't enter
> > > + * the correct ASPM state properly. Therefore. Implement this workaround
> > > + * until Qualcomm fixed the correct NVMe suspend process*/
> >
> > What is there to fix during suspend? Currently, Qcom PCIe driver just votes for
> > low interconnect bandwidth and keeps the resources (clocks, regulators) ON
> > during suspend. So there is no way the device would move to D3Cold.
> >
> > Earlier Qcom reported that during suspend, link down event happens when the
> > resources are turned OFF without waiting for the link to enter L1ss. But as I
> > said above, we are _not_ turning OFF any resources.
>
> Right, it makes little sense that the NVMe would move to D3Cold. And why does
> the issue only reproduces sometimes (with certain NVMes) and not consistently?
>

Honestly, I don't have any idea why it is happening. The link should transition
to L1ss during suspend and we keep all resources ON.

Did ECS only observe this issue when ASPM is enabled (powersupersave)? If so,
then it is an NVMe firmware issue, not a Qualcomm one.

> > I believe this patch is addressing an issue that is caused by an out-of-tree
> > patch.
>
> I think ECS observed this with Chrome OS v5.15 kernel. On the PCI side this
> kernel only has backported changes from upstream (mostly clean picks), no
> downstream patches, so it seems unlikely that the issue is caused by a
> downstream patch.

Okay, thanks for the clarification. Is it possible to reproduce it on mainline?
Just to rule out the upstream vs downstream difference elsewhere. That would
also make the case for submitting a patch against mainline.

- Mani
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f4e2a88729fd..b57876dc2624 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5945,6 +5945,16 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
 
+/* In Qualcomm 7c gen 3 sc7280 platform. Some of the SSD won't enter
+ * the correct ASPM state properly. Therefore. Implement this workaround
+ * until Qualcomm fixed the correct NVMe suspend process*/
+static void phison_suspend_fixup(struct pci_dev *pdev)
+{
+	msleep(30);
+}
+DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5013, phison_suspend_fixup);
+DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5015, phison_suspend_fixup);
+
 static void rom_bar_overlap_defect(struct pci_dev *dev)
 {
 	pci_info(dev, "working around ROM BAR overlap defect\n");