Message ID | 20240129112710.2852-3-ilpo.jarvinen@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-42616-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp579374dyb; Mon, 29 Jan 2024 05:56:23 -0800 (PST) X-Google-Smtp-Source: AGHT+IGRbObnWDwA4MhX0EDNmlzBi5mH8TE7VkxMi7XO9x3LBMjHhxoAJG09Oagvl9xqTbhsIk62 X-Received: by 2002:a2e:9a96:0:b0:2cd:a2e:fba6 with SMTP id p22-20020a2e9a96000000b002cd0a2efba6mr3916094lji.11.1706536582936; Mon, 29 Jan 2024 05:56:22 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706536582; cv=pass; d=google.com; s=arc-20160816; b=mBKSWttnqfzpDwlPja/utl3vKjOE7EzYbwPRbRO4Vo2WSOYZrDSq2eihSa6W7yanp4 IV1vAKJC8vJQfTYugeHkSomXIXueeMEY1LFW1ItIpfF+WPwhbTlGQPE9VgJS44blTYu7 Bvql7nfa7hFSEdygwUvVOODXeAdOQ/k6xT+lGWdi790R8PHAddvg1duSTA9NEy//7QLO Cs10MyXyusufjQPIko6LWndxGFv9f1baJ23z08gb3kn3Ob0oWsRuo2teqj8YDyZnNObq d4H87cfONjSUH/hcderkGzZyc6cFgQbimJ9lwavO1I4njns49M+lj8FfMB6SmUsBTRia 0pZg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=HcEo05hyo+jZVmOhy0EQW/yxr+A9Ydvy2rS7DgLU+cU=; fh=uaDhNdmOckUBDf+XjXUykn5C4NpzoLZK8jGmWlP+92w=; b=NBXUNwkEPwYQ8Eq5cft2lrA0BbLxd4Zry1B6utA4/7YXJblBnA0Su0y/NBpGfNh9WT DcIN5KMQE80mWt/WNPQ2Jlvq5/1YHlzH6pkpF2n0KCzKYbGi6HTycPYmQ3hBvSrHXQib d81Z1FrGAcgx5SfhRW5Qm2mRwnu1n6HywhOlJiNDdkYOua3lzDoNmOpP++U7ap/SHgVR GH6Xx99TqTuN/5tT8fH5VgiZzDhjDoqsnZPGGdYgnFnCgDWa51FGOGb6X8Z2uA5rwWHf tOXT/ZNAo/Yq9SgcVNEIt20E7lt775bc7UC4GX4a76H/lGtu0bmL8rObgm/Ko6Wz2xbg N/3A== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=YPY9ehdi; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-42616-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-42616-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id e19-20020a056402191300b0055f1b88200asi371847edz.631.2024.01.29.05.56.22 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 05:56:22 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-42616-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=YPY9ehdi; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-42616-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-42616-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id B9EA11F24139 for <ouuuleilei@gmail.com>; Mon, 29 Jan 2024 11:30:15 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6AE835BAC0; Mon, 29 Jan 2024 11:29:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="YPY9ehdi" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED17B5B5B2; Mon, 29 Jan 2024 11:29:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706527791; cv=none; b=DkG8Qm8CkK2rZzvnAUwBJw8Etb3BciaMpcqEAt8QZu+KS9ntZVY2WCidAQSOk7CHDpWSzZ8D5H47jE93zedpizfcU4B5ECYrXhDN0/p6KvsTchGcxq2tJIJYp/AdRHjoDeVLpg0R4nypOn+3JDmbwk9SljHsDPL98A62CgVxMVo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706527791; c=relaxed/simple; bh=C1bZO06qv1Eiozo75zTNq7EL0T6h5w//IZNDQaGVD68=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=ONnMykk2rqbQeP9XbVl/txQ8XXcEMCV+RC4dUXtZF7znEMRhWKD15Pgnj5dyoljHx//CgWagZKQa2kBUmIxgMkdqPvkHU/5G1btmjlzZY9b1I+rWfyttYTmNx+qk+IHyT1kfUISVOSiGpe9TY96q/0QZBCVibS/mjx1BJkHTc9E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=YPY9ehdi; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706527790; x=1738063790; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C1bZO06qv1Eiozo75zTNq7EL0T6h5w//IZNDQaGVD68=; b=YPY9ehdi779HyVu/ug4cNFRoALxjw5V+hY3VV8MRtC0KN8fUHppzqCCv B4x7iLNGA3XtDUx0PBvj/0roMj80r4i/ECKbiSzfox+McX6tHwjTiZ352 FisRAyFZL9O0J3mscofoj4BfLGcmvVLq6Sib+VRcom94Onj/mPlPz183R CXEJGqbCwtqXpnjQTZCtKFuPpTpkf6lF9FMSgMsRQBdk55Jx028zYSXXQ Upi4842QMaCse9mZ4yJK0wR8wjiynnpNXCMh7VjYjdcOA81g29iUb/JnK nszDdKiTrX9tnmatsU1SWtZN1Mq0aKlAXDiLvtjdU7JDL89r0sTE5NOwg Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10967"; a="16284307" X-IronPort-AV: E=Sophos;i="6.05,227,1701158400"; d="scan'208";a="16284307" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 03:29:45 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10967"; a="960868400" X-IronPort-AV: E=Sophos;i="6.05,227,1701158400"; d="scan'208";a="960868400" Received: from ijarvine-desk1.ger.corp.intel.com (HELO localhost) ([10.94.253.213]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 03:29:18 -0800 From: =?utf-8?q?Ilpo_J=C3=A4rvinen?= <ilpo.jarvinen@linux.intel.com> To: Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org, "Maciej W. Rozycki" <macro@orcam.me.uk>, linux-kernel@vger.kernel.org Cc: Mika Westerberg <mika.westerberg@linux.intel.com>, =?utf-8?q?Ilpo_J?= =?utf-8?q?=C3=A4rvinen?= <ilpo.jarvinen@linux.intel.com> Subject: [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming Date: Mon, 29 Jan 2024 13:27:10 +0200 Message-Id: <20240129112710.2852-3-ilpo.jarvinen@linux.intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240129112710.2852-1-ilpo.jarvinen@linux.intel.com> References: <20240129112710.2852-1-ilpo.jarvinen@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789433304072700290 X-GMAIL-MSGID: 1789433304072700290 |
Series |
PCI: Fix disconnect related issues
|
|
Commit Message
Ilpo Järvinen
Jan. 29, 2024, 11:27 a.m. UTC
On runtime resume, pci_dev_wait() is called: pci_pm_runtime_resume() pci_pm_bridge_power_up_actions() pci_bridge_wait_for_secondary_bus() pci_dev_wait() While a device is runtime suspended along with its PCIe hierarchy, the device could get disconnected. In such case, the link will not come up no matter how log pci_dev_wait() waits for it. Besides the above mentioned case, there could be other ways to get the device disconnected while pci_dev_wait() is waiting for the link to come up. Make pci_dev_wait() to exit if the device is already disconnected to avoid unnecessary delay. As disconnected device is not really even a failure in the same sense as link failing to come up for whatever reason, return 0 instead of errno. Also make sure compiler does not become too clever with dev->error_state and use READ_ONCE() to force a fetch for the up-to-date value. Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> --- drivers/pci/pci.c | 5 +++++ drivers/pci/pci.h | 4 +++- 2 files changed, 8 insertions(+), 1 deletion(-)
Comments
On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote: > On runtime resume, pci_dev_wait() is called: > pci_pm_runtime_resume() > pci_pm_bridge_power_up_actions() > pci_bridge_wait_for_secondary_bus() > pci_dev_wait() > > While a device is runtime suspended along with its PCIe hierarchy, the > device could get disconnected. In such case, the link will not come up > no matter how log pci_dev_wait() waits for it. s/PCIe/PCI/ (unless this is a PCIe-specific thing) s/log/long/ > Besides the above mentioned case, there could be other ways to get the > device disconnected while pci_dev_wait() is waiting for the link to > come up. > > Make pci_dev_wait() to exit if the device is already disconnected to > avoid unnecessary delay. As disconnected device is not really even a > failure in the same sense as link failing to come up for whatever > reason, return 0 instead of errno. The device being disconnected is not the same as a link failure. Do all the callers do the right thing if pci_dev_wait() returns success when there's no device there? > Also make sure compiler does not become too clever with > dev->error_state and use READ_ONCE() to force a fetch for the > up-to-date value. I think we should have a comment there to say why READ_ONCE() is needed. Otherwise it's hard to know whether a future change might make it unnecessary. > Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> > --- > drivers/pci/pci.c | 5 +++++ > drivers/pci/pci.h | 4 +++- > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index d8f11a078924..ec9bf6c90312 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1250,6 +1250,11 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout) > for (;;) { > u32 id; > > + if (pci_dev_is_disconnected(dev)) { > + pci_dbg(dev, "disconnected; not waiting\n"); > + return 0; > + } > + > pci_read_config_dword(dev, PCI_COMMAND, &id); > if (!PCI_POSSIBLE_ERROR(id)) > break; > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 2336a8d1edab..563a275dff67 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -4,6 +4,8 @@ > > #include <linux/pci.h> > > +#include <asm/rwonce.h> > + > /* Number of possible devfns: 0.0 to 1f.7 inclusive */ > #define MAX_NR_DEVFNS 256 > > @@ -370,7 +372,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) > > static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) > { > - return dev->error_state == pci_channel_io_perm_failure; > + return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure; > } > > /* pci_dev priv_flags */ > -- > 2.39.2 >
On Mon, 29 Jan 2024, Bjorn Helgaas wrote: > On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote: > > On runtime resume, pci_dev_wait() is called: > > pci_pm_runtime_resume() > > pci_pm_bridge_power_up_actions() > > pci_bridge_wait_for_secondary_bus() > > pci_dev_wait() > > > > While a device is runtime suspended along with its PCIe hierarchy, the > > device could get disconnected. In such case, the link will not come up > > no matter how log pci_dev_wait() waits for it. > > s/PCIe/PCI/ (unless this is a PCIe-specific thing) > s/log/long/ > > > Besides the above mentioned case, there could be other ways to get the > > device disconnected while pci_dev_wait() is waiting for the link to > > come up. > > > > Make pci_dev_wait() to exit if the device is already disconnected to > > avoid unnecessary delay. As disconnected device is not really even a > > failure in the same sense as link failing to come up for whatever > > reason, return 0 instead of errno. > > The device being disconnected is not the same as a link failure. This we agree and it's what I tried to write above. > Do > all the callers do the right thing if pci_dev_wait() returns success > when there's no device there? It's a bit complicated. I honestly don't know what is the best approach here so I'm very much open to your suggestion what would be preferrable. There are two main use cases (more than two callers but they seem boil down to two use cases). One use case is reset related functions and those would probably prefer to have error returned if the wait, and thus reset, failed. Then the another is wait for buses, that is, pci_bridge_wait_for_secondary_bus() which return 0 if there's no device (wait successful). For it, it would make sense to return 0 also when device is disconnected because it seems analoguous to the case where there's no device in the first place. Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check for that disconnected condition inside pci_bridge_wait_for_secondary_bus()? To further complicate things, however, DPC also depends on the return value of pci_bridge_wait_for_secondary_bus() and from its perspective, returning error when there is a device that is disconnected might be the desirable alternative (but I'm not fully sure how everything in DPC works but I highly suspect I'm correct here). Either way, the fix itself does care and seemed to work regardless of the actual value returned by pci_dev_wait(). > > Also make sure compiler does not become too clever with > > dev->error_state and use READ_ONCE() to force a fetch for the > > up-to-date value. > > I think we should have a comment there to say why READ_ONCE() is > needed. Otherwise it's hard to know whether a future change might > make it unnecessary. Sure, I'll add a comment there.
On Tue, 30 Jan 2024, Ilpo Järvinen wrote: > On Mon, 29 Jan 2024, Bjorn Helgaas wrote: > > > On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote: > > > On runtime resume, pci_dev_wait() is called: > > > pci_pm_runtime_resume() > > > pci_pm_bridge_power_up_actions() > > > pci_bridge_wait_for_secondary_bus() > > > pci_dev_wait() > > > > > > While a device is runtime suspended along with its PCIe hierarchy, the > > > device could get disconnected. In such case, the link will not come up > > > no matter how log pci_dev_wait() waits for it. > > > > s/PCIe/PCI/ (unless this is a PCIe-specific thing) > > s/log/long/ > > > > > Besides the above mentioned case, there could be other ways to get the > > > device disconnected while pci_dev_wait() is waiting for the link to > > > come up. > > > > > > Make pci_dev_wait() to exit if the device is already disconnected to > > > avoid unnecessary delay. As disconnected device is not really even a > > > failure in the same sense as link failing to come up for whatever > > > reason, return 0 instead of errno. > > > > The device being disconnected is not the same as a link failure. > > This we agree and it's what I tried to write above. > > > Do > > all the callers do the right thing if pci_dev_wait() returns success > > when there's no device there? > > It's a bit complicated. I honestly don't know what is the best approach > here so I'm very much open to your suggestion what would be preferrable. > > There are two main use cases (more than two callers but they seem boil > down to two use cases). > > One use case is reset related functions and those would probably prefer to > have error returned if the wait, and thus reset, failed. > > Then the another is wait for buses, that is, > pci_bridge_wait_for_secondary_bus() which return 0 if there's no device > (wait successful). For it, it would make sense to return 0 also when > device is disconnected because it seems analoguous to the case where > there's no device in the first place. > > Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check > for that disconnected condition inside > pci_bridge_wait_for_secondary_bus()? To further complicate things, > however, DPC also depends on the return value of > pci_bridge_wait_for_secondary_bus() and from its perspective, returning > error when there is a device that is disconnected might be the desirable > alternative (but I'm not fully sure how everything in DPC works but I > highly suspect I'm correct here). Just to note here I intend to reverse the return to -ENOTTY in v2. It is easier and doing anything more complicated than that felt over-engineering because it would have just avoided marking same disconnected devices disconnected for another time which is harmless.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d8f11a078924..ec9bf6c90312 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1250,6 +1250,11 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout) for (;;) { u32 id; + if (pci_dev_is_disconnected(dev)) { + pci_dbg(dev, "disconnected; not waiting\n"); + return 0; + } + pci_read_config_dword(dev, PCI_COMMAND, &id); if (!PCI_POSSIBLE_ERROR(id)) break; diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 2336a8d1edab..563a275dff67 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -4,6 +4,8 @@ #include <linux/pci.h> +#include <asm/rwonce.h> + /* Number of possible devfns: 0.0 to 1f.7 inclusive */ #define MAX_NR_DEVFNS 256 @@ -370,7 +372,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) { - return dev->error_state == pci_channel_io_perm_failure; + return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure; } /* pci_dev priv_flags */