Message ID | 20221229122640.239859-1-rajat.khandelwal@linux.intel.com |
---|---|
State | New |
Headers |
From: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
To: jesse.brandeburg@intel.com, anthony.l.nguyen@intel.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rajat.khandelwal@intel.com, Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
Subject: [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
Date: Thu, 29 Dec 2022 17:56:40 +0530
Message-Id: <20221229122640.239859-1-rajat.khandelwal@linux.intel.com> |
Series |
igc: Mask replay rollover/timeout errors in I225_LMVP
|
|
Commit Message
Rajat Khandelwal
Dec. 29, 2022, 12:26 p.m. UTC
The CPU logs get flooded with replay rollover/timeout AER errors on
systems with an i225_lmvp connected, usually inside Thunderbolt devices.

One of the prominent TBT4 docks we use is the HP G4 Hook2, which
incorporates an Intel Foxville chipset that uses the igc driver.

On connecting Ethernet, the CPU logs get inundated with these errors. We
shouldn't be spamming the logs with such correctable errors, as they
confuse other kernel developers less familiar with PCI errors, support
staff, and users who happen to look at the logs.
Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
Comments
On 12/29/2022 14:26, Rajat Khandelwal wrote:
> The CPU logs get flooded with replay rollover/timeout AER errors in
> the system with i225_lmvp connected, usually inside thunderbolt devices.
> [...]
> +#ifdef CONFIG_PCIEAER
> +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> [...]
> +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
> +}
> +#endif

Hello Rajat,

Could we use the private flag approach instead, giving the user control
over masking some advanced errors?

Although... why does this happen in the first place? Wouldn't you prefer
to investigate it rather than mask it? (I have concerns about the PCIe
link over the Thunderbolt tunnel.)
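A minimal userspace sketch of what a user-controlled toggle could compute, assuming a hypothetical driver-private flag (no such igc flag exists; the flag state is just the `mask_on` parameter here) and modelling the AER Correctable Error Mask register as a plain `uint32_t`. The two bit values are those of `PCI_ERR_COR_REP_ROLL` and `PCI_ERR_COR_REP_TIMER` in include/uapi/linux/pci_regs.h. Turning the flag off restores the replay bits to their probe-time value rather than forcing them to zero, so a mask set earlier by firmware would not be silently undone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values of PCI_ERR_COR_REP_ROLL / PCI_ERR_COR_REP_TIMER (pci_regs.h). */
#define COR_REP_ROLL  0x00000100u /* REPLAY_NUM Rollover */
#define COR_REP_TIMER 0x00001000u /* Replay Timer Timeout */
#define REPLAY_BITS   (COR_REP_ROLL | COR_REP_TIMER)

/*
 * New Correctable Error Mask value when a hypothetical private flag changes.
 *   probe_mask: mask value saved at probe time, before any change
 *   cur_mask:   current mask value
 *   mask_on:    new state of the hypothetical flag
 * Only the two replay bits change; every other mask bit is preserved.
 */
static uint32_t replay_mask_for_flag(uint32_t probe_mask, uint32_t cur_mask,
				     bool mask_on)
{
	if (mask_on)
		return cur_mask | REPLAY_BITS;
	/* Flag off: put the replay bits back to their probe-time state. */
	return (cur_mask & ~REPLAY_BITS) | (probe_mask & REPLAY_BITS);
}
```

For example, `replay_mask_for_flag(0x0100, 0x1100, false)` returns `0x0100`: the rollover bit that was already masked at probe time stays masked, and only the timer bit is reported again.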
On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> The CPU logs get flooded with replay rollover/timeout AER errors in
> the system with i225_lmvp connected, usually inside thunderbolt devices.
> [...]
> +	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
> +
> +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);

Shouldn't this igc_mask_aer_replay_correctible function be implemented
in drivers/pci/quirks.c and not in igc_probe()?

Thanks
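For comparison, a userspace model of the quirk-table idea: a table of vendor/device pairs, each with a hook that runs when a matching device is found. In the kernel this role is played by macros such as `DECLARE_PCI_FIXUP_FINAL()` in drivers/pci/quirks.c, whose hooks the PCI core runs for matching devices whether or not igc is ever loaded. Everything below (`struct fake_pci_dev`, `apply_fixups()`, the hook name, and the device ID `0x5502`, assumed but not verified to match `IGC_DEV_ID_I225_LMVP`) is illustrative, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL   0x8086u
#define HYPO_DEV_ID_I225_LMVP 0x5502u /* assumed value; check igc_hw.h */

struct fake_pci_dev {
	uint16_t vendor, device;
	uint32_t cor_mask; /* stand-in for the AER Correctable Error Mask */
};

struct fixup {
	uint16_t vendor, device;
	void (*hook)(struct fake_pci_dev *dev);
};

/* Hypothetical quirk hook: set the two replay bits (REP_ROLL | REP_TIMER). */
static void hypo_mask_replay_quirk(struct fake_pci_dev *dev)
{
	dev->cor_mask |= 0x00000100u | 0x00001000u;
}

static const struct fixup fixups[] = {
	{ PCI_VENDOR_ID_INTEL, HYPO_DEV_ID_I225_LMVP, hypo_mask_replay_quirk },
};

/* Run every hook whose vendor/device pair matches; return how many ran. */
static size_t apply_fixups(struct fake_pci_dev *dev)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].vendor == dev->vendor &&
		    fixups[i].device == dev->device) {
			fixups[i].hook(dev);
			n++;
		}
	}
	return n;
}
```

The design trade-off the thread goes on to discuss is visible here: the table is consulted for every device at enumeration time, independent of which driver later binds.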
[Cc: +Bjorn, +linux-pci]

Dear Rajat,

Thank you for your patch.

On 29.12.22 at 13:26, Rajat Khandelwal wrote:
> The CPU logs get flooded with replay rollover/timeout AER errors in
> the system with i225_lmvp connected, usually inside thunderbolt devices.

Please add one example log message to the commit message.

> One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates

I couldn’t find that device. Is that the correct name?

> an Intel Foxville chipset, which uses the igc driver.

Please add a blank line between paragraphs.

> On connecting ethernet, CPU logs get inundated with these errors. The point
> is we shouldn't be spamming the logs with such correctible errors as it

correctable

> confuses other kernel developers less familiar with PCI errors, support
> staff, and users who happen to look at the logs.

Please reference the bug reports you know of (bug tracker and mailing
list) where this was reported.

> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
> [...]
> +#ifdef CONFIG_PCIEAER
> +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)

correctable

> +{
> +	struct pci_dev *pdev = adapter->pdev;
> +	u32 aer_pos, corr_mask;

Instead of using the preprocessor, use a normal C conditional. From
`Documentation/process/coding-style.rst`:

> Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
> symbol into a C boolean expression, and use it in a normal C conditional:
>
> .. code-block:: c
>
> 	if (IS_ENABLED(CONFIG_SOMETHING)) {
> 		...
> 	}
>
> The compiler will constant-fold the conditional away, and include or exclude
> the block of code just as with an #ifdef, so this will not add any runtime
> overhead. However, this approach still allows the C compiler to see the code
> inside the block, and check it for correctness (syntax, types, symbol
> references, etc). Thus, you still have to use an #ifdef if the code inside the
> block references symbols that will not exist if the condition is not met.

> [...]
> +#ifdef CONFIG_PCIEAER
> +	igc_mask_aer_replay_correctible(adapter);
> +#endif

Kind regards,

Paul
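A userspace sketch of the pattern quoted above. It reproduces, in simplified form and under renamed macros, the preprocessor trick behind `IS_ENABLED()` in include/linux/kconfig.h (only the built-in `=1` case; the real macro also checks the `_MODULE` variant). The helper is defined unconditionally, which assumes, as in the patch, that it only needs config-space accessors and `PCI_ERR_*` constants that exist regardless of `CONFIG_PCIEAER`; the call is then guarded by an ordinary `if` that the compiler folds away.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified copy of the kconfig.h trick: if the option expands to 1,
 * KC_PLACEHOLDER_1 injects an extra "0," so KC_SECOND picks 1; otherwise
 * the pasted token is junk and KC_SECOND picks 0.
 */
#define KC_PLACEHOLDER_1 0,
#define KC_SECOND(ignored, val, ...) val
#define KC_IS_DEFINED(x) KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val) KC_IS_DEFINED__(KC_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(arg_or_junk) KC_SECOND(arg_or_junk 1, 0, 0)
#define IS_ENABLED(option) KC_IS_DEFINED(option)

#define CONFIG_PCIEAER 1 /* comment out to model a CONFIG_PCIEAER=n build */

/* Defined in every configuration, so the compiler always type-checks it. */
static uint32_t mask_replay(uint32_t corr_mask)
{
	return corr_mask | 0x00000100u | 0x00001000u; /* REP_ROLL | REP_TIMER */
}

static uint32_t probe_step(uint32_t corr_mask)
{
	if (IS_ENABLED(CONFIG_PCIEAER)) /* constant 1 or 0 after preprocessing */
		corr_mask = mask_replay(corr_mask);
	return corr_mask;
}
```

With `CONFIG_PCIEAER` removed, `probe_step()` still compiles (and `mask_replay()` is still checked) but returns its input unchanged.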
[Cc: +Bjorn, +linux-pci]

Dear Leon, dear Rajat,

On 01.01.23 at 09:32, Leon Romanovsky wrote:
> On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
>> [...]
>> +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
>> +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
>
> Shouldn't this igc_mask_aer_replay_correctible function be implemented
> in drivers/pci/quirks.c and not in igc_probe()?

Probably. Though I think the PCI quirk file is getting too big.

Kind regards,

Paul
Hi Paul, Sasha,

Thanks for the acknowledgement!

-> I will add example logs.
-> Device: https://www.hp.com/us-en/monitors-accessories/computer-accessories/thunderbolt-G4-dock.html
-> correctible -> correctable
-> I guess, according to the convention, I still have to use #ifdef for my function, since it references variables that won't exist if the condition is not met. However, I have used the IS_ENABLED macro to call the function inside igc_probe(). I hope that's okay!
-> One last thing: I was also skeptical about the location of this function, but then I came across the netxen_mask_aer_correctable() function in net/ethernet/qlogic/netxen/netxen_nic_main.c, which masks correctable errors on its PCIe device. Also, I don't see any function guarded by CONFIG_PCIEAER in pci/quirks.c! I would still keep the function in igc_main.c, but I am waiting for your judgement.

@Neftin, Sasha: my team and I prefer masking these errors to debugging them. First, they are correctable and non-fatal. Second, I have observed these replay errors on many of the devices I have worked with. Maybe something universal has to be done in the Thunderbolt domain for these specific replay errors in the long term? Anyhow, we would like to mask these errors for now to avoid any confusion when Ethernet is connected to the dock. I hope that is okay. Waiting for your judgement :)

Let me know of any further questions or suggestions before I roll out v2.
Thanks
Rajat

-----Original Message-----
From: Paul Menzel <pmenzel@molgen.mpg.de>
Sent: Sunday, January 1, 2023 4:02 PM
To: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org; Khandelwal, Rajat <rajat.khandelwal@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP

[...]
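One caveat with calling an `#ifdef`-only function under `IS_ENABLED()`: the call is still compiled when the option is off, so the function must at least be declared in that configuration. A common kernel idiom is an empty stub in the `#else` branch, sketched below in userspace with hypothetical names (`struct fake_adapter`, `fake_probe()`); with the stub in place, the call site needs neither `#ifdef` nor `IS_ENABLED()`.

```c
#include <assert.h>
#include <stdint.h>

#define CONFIG_PCIEAER 1 /* remove to model a CONFIG_PCIEAER=n build */

struct fake_adapter {
	uint32_t corr_mask; /* stand-in for the AER Correctable Error Mask */
};

#ifdef CONFIG_PCIEAER
static void mask_aer_replay_correctable(struct fake_adapter *a)
{
	a->corr_mask |= 0x00000100u | 0x00001000u; /* REP_ROLL | REP_TIMER */
}
#else
/* Empty stub: keeps every call site free of #ifdef in this configuration. */
static inline void mask_aer_replay_correctable(struct fake_adapter *a)
{
	(void)a;
}
#endif

static void fake_probe(struct fake_adapter *a)
{
	mask_aer_replay_correctable(a); /* compiles in both configurations */
}
```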
On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> [Cc: +Bjorn, +linux-pci]
>
> Dear Leon, dear Rajat,
>
> On 01.01.23 at 09:32, Leon Romanovsky wrote:
> > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > [...]
> > > +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
> >
> > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > in drivers/pci/quirks.c and not in igc_probe()?
>
> Probably. Though I think, the PCI quirk file, is getting too big.

As long as that file is the right location, we should use it.
The quirk file can be refactored later.

Thanks
On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
> On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> > On 01.01.23 at 09:32, Leon Romanovsky wrote:
> > > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > > The CPU logs get flooded with replay rollover/timeout AER errors in
> > > > the system with i225_lmvp connected, usually inside thunderbolt devices.
> > > > [...]
> > > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > > > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> > >
> > > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > > in drivers/pci/quirks.c and not in igc_probe()?
> >
> > Probably. Though I think, the PCI quirk file, is getting too big.
>
> As long as that file is right location, we should use it.
> One can refactor quirk file later.

If a quirk like this is only needed when the driver is loaded, I think
the driver is a better place than drivers/pci/quirks.c. If it's in
quirks.c, either we have to replicate driver Kconfig via #ifdefs, or
the kernel contains the quirk for systems that don't need it.

I'm generally not a fan of simply masking errors because they're
annoying. I'd prefer to figure out the root cause and fix it if
possible. Or maybe we can tone down or rate-limit the logging so it's
not so alarming.

Bjorn
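The rate-limiting alternative can be pictured as an interval/burst limiter: at most `burst` messages per `interval`, with the rest counted as suppressed. The sketch below is a simplified userspace model of that idea, loosely following the kernel's `struct ratelimit_state` and `__ratelimit()` but not their actual code; time is passed in explicitly, in arbitrary units, so the behaviour is easy to test.

```c
#include <assert.h>
#include <stdbool.h>

struct rl_state {
	long interval; /* window length, in arbitrary time units */
	int burst;     /* messages allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* messages allowed so far in this window */
	int missed;    /* messages suppressed so far in this window */
};

/* Return true if a message arriving at time @now may be logged. */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* The window has expired: start a new one. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With `interval = 5` and `burst = 2`, errors at times 0 and 1 are logged, one at time 2 is suppressed, and logging resumes once the window restarts at time 5, so a flood of correctable errors produces a bounded number of log lines instead of none.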
On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
> > On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> > > On 01.01.23 at 09:32, Leon Romanovsky wrote:
> > > > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > > > [...]
> > > > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > > > in drivers/pci/quirks.c and not in igc_probe()?
> > >
> > > Probably. Though I think, the PCI quirk file, is getting too big.
> >
> > As long as that file is right location, we should use it.
> > One can refactor quirk file later.
>
> If a quirk like this is only needed when the driver is loaded,

This is always the case with PCI devices managed through the kernel,
isn't it? Users aren't aware of (or don't care about) "broken" devices
unless they start to use them.

Thanks
On Tue, Jan 03, 2023 at 02:00:04PM +0200, Leon Romanovsky wrote:
> On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
> > [...]
> > If a quirk like this is only needed when the driver is loaded,
>
> This is always the case with PCI devices managed through kernel, isn't it?
> Users don't care/aware about "broken" devices unless they start to use them.

Indeed, that's usually the case. There's a lot of stuff in quirks.c
that could probably be in drivers instead.

Bjorn
On Tue, Jan 03, 2023 at 08:21:04AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 03, 2023 at 02:00:04PM +0200, Leon Romanovsky wrote:
> > On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
> > > [...]
> > > If a quirk like this is only needed when the driver is loaded,
> >
> > This is always the case with PCI devices managed through kernel, isn't it?
> > Users don't care/aware about "broken" devices unless they start to use them.
>
> Indeed, that's usually the case. There's a lot of stuff in quirks.c
> that could probably be in drivers instead.

NP. So either deprecate quirks.c and prohibit any change to that file,
or don't allow drivers to mangle PCI in their probe routines. Anything
in between will cause an enormous mess in the long run.

Thanks
On Tue, Jan 03, 2023 at 07:16:58PM +0200, Leon Romanovsky wrote:
> On Tue, Jan 03, 2023 at 08:21:04AM -0600, Bjorn Helgaas wrote:
<...>
> > > > If a quirk like this is only needed when the driver is loaded,
> > >
> > > This is always the case with PCI devices managed through kernel, isn't it?
> > > Users don't care/aware about "broken" devices unless they start to use them.
> >
> > Indeed, that's usually the case. There's a lot of stuff in quirks.c
> > that could probably be in drivers instead.
>
> NP, so or deprecate quirks.c and prohibit any change to that file or
> don't allow drivers to mangle PCI in their probe routines.
> Everything in-between will cause to enormous mess in long run.

Another thing to consider: if you go with the "probe variant", users
will see behavioral differences between drivers and subsystems in how
these quirks are controlled. As an example, see the proposal in this
thread to add an ethtool private flag to enable/disable the quirk. In
other places, it will be a module parameter, sysfs, or a
subsystem-specific tool.

Thanks
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ebff0e04045d..a3a6e8086c8d 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
 	return value;
 }
 
+#ifdef CONFIG_PCIEAER
+static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
+{
+	struct pci_dev *pdev = adapter->pdev;
+	u32 aer_pos, corr_mask;
+
+	if (pdev->device != IGC_DEV_ID_I225_LMVP)
+		return;
+
+	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
+	if (!aer_pos)
+		return;
+
+	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
+
+	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
+	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
+}
+#endif
+
 /**
  * igc_probe - Device Initialization Routine
  * @pdev: PCI device information struct
@@ -6236,8 +6256,6 @@ static int igc_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_pci_reg;
 
-	pci_enable_pcie_error_reporting(pdev);
-
 	err = pci_enable_ptm(pdev, NULL);
 	if (err < 0)
 		dev_info(&pdev->dev, "PCIe PTM not supported by PCIe bus/controller\n");
@@ -6272,6 +6290,12 @@ static int igc_probe(struct pci_dev *pdev,
 	if (!adapter->io_addr)
 		goto err_ioremap;
 
+#ifdef CONFIG_PCIEAER
+	igc_mask_aer_replay_correctible(adapter);
+#endif
+
+	pci_enable_pcie_error_reporting(pdev);
+
 	/* hw->hw_addr can be zeroed, so use adapter->io_addr for unmap */
 	hw->hw_addr = adapter->io_addr;
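To make the patch's register access concrete, here is a userspace model of its two steps against a fake 4 KiB configuration space: a simplified walk of the PCIe extended capability list (the job `pci_find_ext_capability()` does in the kernel; the list starts at offset 0x100, each header holds a 16-bit ID and a next pointer in bits 31:20), followed by the same read-modify-write of the AER Correctable Error Mask register at offset 0x14. Constants mirror include/uapi/linux/pci_regs.h; the `cfg` array and its accessors are stand-ins for real hardware access and use host byte order, which is fine for a model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CFG_SPACE_SIZE     256
#define PCI_CFG_SPACE_EXP_SIZE 4096
#define PCI_EXT_CAP_ID_ERR     0x01
#define PCI_ERR_COR_MASK       0x14        /* offset within the AER capability */
#define PCI_ERR_COR_REP_ROLL   0x00000100u /* REPLAY_NUM Rollover */
#define PCI_ERR_COR_REP_TIMER  0x00001000u /* Replay Timer Timeout */

/* Fake config space standing in for the device. */
static uint8_t cfg[PCI_CFG_SPACE_EXP_SIZE];

static uint32_t cfg_read32(int pos)
{
	uint32_t v;

	memcpy(&v, &cfg[pos], sizeof(v));
	return v;
}

static void cfg_write32(int pos, uint32_t v)
{
	memcpy(&cfg[pos], &v, sizeof(v));
}

/* Return the offset of extended capability @cap_id, or 0 if absent. */
static int find_ext_cap(int cap_id)
{
	int pos = PCI_CFG_SPACE_SIZE;
	/* Upper bound on list length, guards against a looping list. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0) {
		uint32_t header = cfg_read32(pos);

		if (header == 0)
			return 0; /* no extended capabilities at all */
		if ((header & 0xffff) == (uint32_t)cap_id)
			return pos;
		pos = (header >> 20) & 0xffc; /* next-capability pointer */
		if (pos < PCI_CFG_SPACE_SIZE)
			return 0; /* end of list */
	}
	return 0;
}

/* Same read-modify-write as the patch: set only the two replay bits. */
static int mask_replay_errors(void)
{
	int aer = find_ext_cap(PCI_EXT_CAP_ID_ERR);
	uint32_t mask;

	if (!aer)
		return -1; /* no AER capability: nothing to mask */
	mask = cfg_read32(aer + PCI_ERR_COR_MASK);
	mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
	cfg_write32(aer + PCI_ERR_COR_MASK, mask);
	return aer;
}
```

Because the update ORs bits into the value just read, any mask bits already set (by firmware or the AER core) survive; only replay rollover and replay timer timeout reports are suppressed.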