[v2] pci: fix device presence detection for VFs

Message ID 20221026060912.173250-1-mst@redhat.com
State New
Series: [v2] pci: fix device presence detection for VFs

Commit Message

Michael S. Tsirkin Oct. 26, 2022, 6:11 a.m. UTC
  virtio uses the same driver for VFs and PFs.  Accordingly,
pci_device_is_present is used to detect device presence. This function
isn't currently working properly for VFs since it attempts reading
device and vendor ID. As VFs are present if and only if PF is present,
just return the value for that device.

Reported-by: Wei Gong <gongwei833x@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Wei Gong, thanks for your testing of the RFC!
As I made a small change, would appreciate re-testing.

Thanks!

changes from RFC:
	use pci_physfn() wrapper to make the code build without PCI_IOV


 drivers/pci/pci.c | 5 +++++
 1 file changed, 5 insertions(+)
  

Comments

Wei Gong Oct. 26, 2022, 1:46 p.m. UTC | #1
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs.  Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
> 
> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Tested-by: Wei Gong <gongwei833x@gmail.com>

> ---
> 
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.

I updated your change and retested in my environment.
It worked well and fixed my problem.

> 
> Thanks!
> 
> changes from RFC:
> 	use pci_physfn() wrapper to make the code build without PCI_IOV
> 
> 
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2127aba3550b..899b3f52e84e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
>  
>  bool pci_device_is_present(struct pci_dev *pdev)
>  {
> +	struct pci_dev *physfn = pci_physfn(pdev);
>  	u32 v;
>  
> +	/* Not a PF? Switch to the PF. */
> +	if (physfn != pdev)
> +		return pci_device_is_present(physfn);
> +
>  	if (pci_dev_is_disconnected(pdev))
>  		return false;
>  	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> -- 
> MST
>
  
Wei Gong Nov. 8, 2022, 4:52 a.m. UTC | #2
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs.  Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
> 
> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
> 
> Thanks!
> 
> changes from RFC:
> 	use pci_physfn() wrapper to make the code build without PCI_IOV
> 
> 
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)

Tested-by: Wei Gong <gongwei833x@gmail.com>

Retest done; it works well.

I would rephrase the bug as follows.

According to the SR-IOV specification, the Vendor ID and
Device ID fields in all VFs return FFFFh when read, so when
a VF is passed to pci_device_is_present() it is misjudged
as a surprise removal.

When I/O is in flight on the VF and the virtio_blk VF
devices are then disabled normally, the disable operation
hangs, and virtio-blk device I/O hangs with it.

task:bash            state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x2ee/0x900
 schedule+0x4f/0xc0
 blk_mq_freeze_queue_wait+0x69/0xa0
 ? wait_woken+0x80/0x80
 blk_mq_freeze_queue+0x1b/0x20
 blk_cleanup_queue+0x3d/0xd0
 virtblk_remove+0x3c/0xb0 [virtio_blk]
 virtio_dev_remove+0x4b/0x80
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 bus_remove_device+0xe1/0x150
 device_del+0x192/0x3f0
 device_unregister+0x1b/0x60
 unregister_virtio_device+0x18/0x30
 virtio_pci_remove+0x41/0x80
 pci_device_remove+0x3e/0xb0
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 pci_stop_bus_device+0x79/0xa0
 pci_stop_and_remove_bus_device+0x13/0x20
 pci_iov_remove_virtfn+0xc5/0x130
 ? pci_get_device+0x4a/0x60
 sriov_disable+0x33/0xf0
 pci_disable_sriov+0x26/0x30
 virtio_pci_sriov_configure+0x6f/0xa0
 sriov_numvfs_store+0x104/0x140
 dev_attr_store+0x17/0x30
 sysfs_kf_write+0x3e/0x50
 kernfs_fop_write_iter+0x138/0x1c0
 new_sync_write+0x117/0x1b0
 vfs_write+0x185/0x250
 ksys_write+0x67/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x61/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f21bd1f3ba4
RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0

When virtio_blk performs I/O, there are two stages:
1. I/O is dispatched: queue_usage_counter++;
2. I/O completes after the interrupt is received: queue_usage_counter--;

When the virtio_blk VFs are disabled:
  if (!pci_device_is_present(pci_dev))
    virtio_break_device(&vp_dev->vdev);
the virtqueues of the VF devices are marked broken. When the
interrupt notifying I/O completion arrives, the completion is
skipped because the virtqueue is marked broken, so
queue_usage_counter is never decremented back to zero.
Meanwhile, removing the disk at the same time freezes the
queue, and the freeze must wait for queue_usage_counter to
drain to zero. Therefore, the removal hangs.



Thanks,
Wei Gong
  
Michael S. Tsirkin Nov. 8, 2022, 4:58 a.m. UTC | #3
On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > virtio uses the same driver for VFs and PFs.  Accordingly,
> > pci_device_is_present is used to detect device presence. This function
> > isn't currently working properly for VFs since it attempts reading
> > device and vendor ID. As VFs are present if and only if PF is present,
> > just return the value for that device.
> > 
> > Reported-by: Wei Gong <gongwei833x@gmail.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Wei Gong, thanks for your testing of the RFC!
> > As I made a small change, would appreciate re-testing.
> > 
> > Thanks!
> > 
> > changes from RFC:
> > 	use pci_physfn() wrapper to make the code build without PCI_IOV
> > 
> > 
> >  drivers/pci/pci.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> 
> Tested-by: Wei Gong <gongwei833x@gmail.com>



Thanks!
Bjorn, could you queue this fix in your tree or should I queue this
in the virtio tree? I also think this would be nice to have for stable too -
do you agree?
Thanks!


> Retest done; it works well.
> 
> I would rephrase the bug as follows.
> 
> According to the SR-IOV specification, the Vendor ID and
> Device ID fields in all VFs return FFFFh when read, so when
> a VF is passed to pci_device_is_present() it is misjudged
> as a surprise removal.
> 
> When I/O is in flight on the VF and the virtio_blk VF
> devices are then disabled normally, the disable operation
> hangs, and virtio-blk device I/O hangs with it.
> 
> task:bash            state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
> Call Trace:
>  <TASK>
>  __schedule+0x2ee/0x900
>  schedule+0x4f/0xc0
>  blk_mq_freeze_queue_wait+0x69/0xa0
>  ? wait_woken+0x80/0x80
>  blk_mq_freeze_queue+0x1b/0x20
>  blk_cleanup_queue+0x3d/0xd0
>  virtblk_remove+0x3c/0xb0 [virtio_blk]
>  virtio_dev_remove+0x4b/0x80
>  device_release_driver_internal+0x103/0x1d0
>  device_release_driver+0x12/0x20
>  bus_remove_device+0xe1/0x150
>  device_del+0x192/0x3f0
>  device_unregister+0x1b/0x60
>  unregister_virtio_device+0x18/0x30
>  virtio_pci_remove+0x41/0x80
>  pci_device_remove+0x3e/0xb0
>  device_release_driver_internal+0x103/0x1d0
>  device_release_driver+0x12/0x20
>  pci_stop_bus_device+0x79/0xa0
>  pci_stop_and_remove_bus_device+0x13/0x20
>  pci_iov_remove_virtfn+0xc5/0x130
>  ? pci_get_device+0x4a/0x60
>  sriov_disable+0x33/0xf0
>  pci_disable_sriov+0x26/0x30
>  virtio_pci_sriov_configure+0x6f/0xa0
>  sriov_numvfs_store+0x104/0x140
>  dev_attr_store+0x17/0x30
>  sysfs_kf_write+0x3e/0x50
>  kernfs_fop_write_iter+0x138/0x1c0
>  new_sync_write+0x117/0x1b0
>  vfs_write+0x185/0x250
>  ksys_write+0x67/0xe0
>  __x64_sys_write+0x1a/0x20
>  do_syscall_64+0x61/0xb0
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f21bd1f3ba4
> RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
> 
> When virtio_blk performs I/O, there are two stages:
> 1. I/O is dispatched: queue_usage_counter++;
> 2. I/O completes after the interrupt is received: queue_usage_counter--;
> 
> When the virtio_blk VFs are disabled:
>   if (!pci_device_is_present(pci_dev))
>     virtio_break_device(&vp_dev->vdev);
> the virtqueues of the VF devices are marked broken. When the
> interrupt notifying I/O completion arrives, the completion is
> skipped because the virtqueue is marked broken, so
> queue_usage_counter is never decremented back to zero.
> Meanwhile, removing the disk at the same time freezes the
> queue, and the freeze must wait for queue_usage_counter to
> drain to zero. Therefore, the removal hangs.
> 
> 
> 
> Thanks,
> Wei Gong
>
  
Bjorn Helgaas Nov. 8, 2022, 5:06 a.m. UTC | #4
On Mon, Nov 07, 2022 at 11:58:21PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > virtio uses the same driver for VFs and PFs.  Accordingly,
> > > pci_device_is_present is used to detect device presence. This function
> > > isn't currently working properly for VFs since it attempts reading
> > > device and vendor ID. As VFs are present if and only if PF is present,
> > > just return the value for that device.
> > > 
> > > Reported-by: Wei Gong <gongwei833x@gmail.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > Wei Gong, thanks for your testing of the RFC!
> > > As I made a small change, would appreciate re-testing.
> > > 
> > > Thanks!
> > > 
> > > changes from RFC:
> > > 	use pci_physfn() wrapper to make the code build without PCI_IOV
> > > 
> > > 
> > >  drivers/pci/pci.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > 
> > Tested-by: Wei Gong <gongwei833x@gmail.com>
> 
> 
> 
> Thanks!
> Bjorn, could you queue this fix in your tree or should I queue this
> in the virtio tree? I also think this would be nice to have for stable too -
> do you agree?
> Thanks!

It's on my list to look at, sorry for the delay.

Bjorn
  
Bjorn Helgaas Nov. 8, 2022, 2:53 p.m. UTC | #5
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs.  Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID.

> As VFs are present if and only if PF is present,
> just return the value for that device.

VFs are only present when the PF is present *and* the PF has VF Enable
set.  Do you care about the possibility that VF Enable has been
cleared?

> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
> 
> Thanks!
> 
> changes from RFC:
> 	use pci_physfn() wrapper to make the code build without PCI_IOV
> 
> 
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2127aba3550b..899b3f52e84e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
>  
>  bool pci_device_is_present(struct pci_dev *pdev)
>  {
> +	struct pci_dev *physfn = pci_physfn(pdev);
>  	u32 v;
>  
> +	/* Not a PF? Switch to the PF. */
> +	if (physfn != pdev)
> +		return pci_device_is_present(physfn);
> +
>  	if (pci_dev_is_disconnected(pdev))
>  		return false;
>  	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> -- 
> MST
>
  
Bjorn Helgaas Nov. 8, 2022, 3:02 p.m. UTC | #6
On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > virtio uses the same driver for VFs and PFs.  Accordingly,
> > pci_device_is_present is used to detect device presence. This function
> > isn't currently working properly for VFs since it attempts reading
> > device and vendor ID.
> 
> > As VFs are present if and only if PF is present,
> > just return the value for that device.
> 
> VFs are only present when the PF is present *and* the PF has VF Enable
> set.  Do you care about the possibility that VF Enable has been
> cleared?

Can you also include a hint about how the problem manifests, and a URL
to the report if available?

It's beyond the scope of this patch, but I've never liked the
semantics of pci_device_is_present() because it's racy by design.  All
it tells us is that some time in the *past*, the device was present.
It's telling that almost all calls test for !pci_device_is_present(),
which does make a little more sense.

> > Reported-by: Wei Gong <gongwei833x@gmail.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Wei Gong, thanks for your testing of the RFC!
> > As I made a small change, would appreciate re-testing.
> > 
> > Thanks!
> > 
> > changes from RFC:
> > 	use pci_physfn() wrapper to make the code build without PCI_IOV
> > 
> > 
> >  drivers/pci/pci.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 2127aba3550b..899b3f52e84e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
> >  
> >  bool pci_device_is_present(struct pci_dev *pdev)
> >  {
> > +	struct pci_dev *physfn = pci_physfn(pdev);
> >  	u32 v;
> >  
> > +	/* Not a PF? Switch to the PF. */
> > +	if (physfn != pdev)
> > +		return pci_device_is_present(physfn);
> > +
> >  	if (pci_dev_is_disconnected(pdev))
> >  		return false;
> >  	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> > -- 
> > MST
> >
  
Michael S. Tsirkin Nov. 8, 2022, 3:19 p.m. UTC | #7
On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > virtio uses the same driver for VFs and PFs.  Accordingly,
> > > pci_device_is_present is used to detect device presence. This function
> > > isn't currently working properly for VFs since it attempts reading
> > > device and vendor ID.
> > 
> > > As VFs are present if and only if PF is present,
> > > just return the value for that device.
> > 
> > VFs are only present when the PF is present *and* the PF has VF Enable
> > set.  Do you care about the possibility that VF Enable has been
> > cleared?
> 
> Can you also include a hint about how the problem manifests, and a URL
> to the report if available?

Here you go:
lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz

is it enough to include this link or do you want me
to repost copying the text from there?

> It's beyond the scope of this patch, but I've never liked the
> semantics of pci_device_is_present() because it's racy by design.  All
> it tells us is that some time in the *past*, the device was present.
> It's telling that almost all calls test for !pci_device_is_present(),
> which does make a little more sense.

I agree. The problem is really in the API.
What people want is pci_device_was_removed().

With surprise removal, at least at the PCI Express level,
we know that there was a surprise removal event. The PCI
subsystem seems to have chosen to discard that information.
There's nothing a driver can do to reliably detect it: if
someone pulls the card out and then sticks it back in
quickly, the driver will assume it's the old card and
attempt a graceful removal, which is likely to fail.

However, some of the problem is at the hardware level too.
If you are poking at the device's config space and the card
is pulled out and another is put back in quickly, your
config access might land on the new card. That does not
feel robust. I don't have a good solution for this beyond
"try to avoid config cycles as much as you can".




> > > Reported-by: Wei Gong <gongwei833x@gmail.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > Wei Gong, thanks for your testing of the RFC!
> > > As I made a small change, would appreciate re-testing.
> > > 
> > > Thanks!
> > > 
> > > changes from RFC:
> > > 	use pci_physfn() wrapper to make the code build without PCI_IOV
> > > 
> > > 
> > >  drivers/pci/pci.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 2127aba3550b..899b3f52e84e 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
> > >  
> > >  bool pci_device_is_present(struct pci_dev *pdev)
> > >  {
> > > +	struct pci_dev *physfn = pci_physfn(pdev);
> > >  	u32 v;
> > >  
> > > +	/* Not a PF? Switch to the PF. */
> > > +	if (physfn != pdev)
> > > +		return pci_device_is_present(physfn);
> > > +
> > >  	if (pci_dev_is_disconnected(pdev))
> > >  		return false;
> > >  	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> > > -- 
> > > MST
> > >
  
Bjorn Helgaas Nov. 8, 2022, 5:58 p.m. UTC | #8
On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > virtio uses the same driver for VFs and PFs.  Accordingly,
> > > > pci_device_is_present is used to detect device presence. This function
> > > > isn't currently working properly for VFs since it attempts reading
> > > > device and vendor ID.
> > > 
> > > > As VFs are present if and only if PF is present,
> > > > just return the value for that device.
> > > 
> > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > set.  Do you care about the possibility that VF Enable has been
> > > cleared?

I think you missed this question.

> > Can you also include a hint about how the problem manifests, and a URL
> > to the report if available?
> 
> Here you go:
> lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
> 
> is it enough to include this link or do you want me
> to repost copying the text from there?

Uh, well, OK, I guess I could dig through that and figure out
what's relevant.  I'd like the commit log to contain a hint of what the
problem looks like and some justification for why it should be
backported to stable.

I still look at Documentation/process/stable-kernel-rules.rst
occasionally to decide things like this, but I get the feeling that
it's a little out-of-date and more restrictive than current practice.

But I do think the "PF exists but VF disabled" situation needs to be
clarified somehow, too.

Bjorn
  
Michael S. Tsirkin Nov. 8, 2022, 6:02 p.m. UTC | #9
On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > virtio uses the same driver for VFs and PFs.  Accordingly,
> > > > > pci_device_is_present is used to detect device presence. This function
> > > > > isn't currently working properly for VFs since it attempts reading
> > > > > device and vendor ID.
> > > > 
> > > > > As VFs are present if and only if PF is present,
> > > > > just return the value for that device.
> > > > 
> > > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > > set.  Do you care about the possibility that VF Enable has been
> > > > cleared?
> 
> I think you missed this question.

I was hoping Wei will answer that, I don't have the hardware.

> > > Can you also include a hint about how the problem manifests, and a URL
> > > to the report if available?
> > 
> > Here you go:
> > lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
> > 
> > is it enough to include this link or do you want me
> > to repost copying the text from there?
> 
> Uh, well, OK, I guess I could dig through that and figure out
> what's relevant.  I'd like the commit log to contain a hint of what the
> problem looks like and some justification for why it should be
> backported to stable.
> 
> I still look at Documentation/process/stable-kernel-rules.rst
> occasionally to decide things like this, but I get the feeling that
> it's a little out-of-date and more restrictive than current practice.
> 
> But I do think the "PF exists but VF disabled" situation needs to be
> clarified somehow, too.
> 
> Bjorn
  
Wei Gong Nov. 9, 2022, 4:36 a.m. UTC | #10
On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > virtio uses the same driver for VFs and PFs.  Accordingly,
> > > > > > pci_device_is_present is used to detect device presence. This function
> > > > > > isn't currently working properly for VFs since it attempts reading
> > > > > > device and vendor ID.
> > > > > 
> > > > > > As VFs are present if and only if PF is present,
> > > > > > just return the value for that device.
> > > > > 
> > > > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > > > set.  Do you care about the possibility that VF Enable has been
> > > > > cleared?
> > 
> > I think you missed this question.
> 
> I was hoping Wei will answer that, I don't have the hardware.

In my case I don't care that VF Enable has been cleared.

> 
> > > > Can you also include a hint about how the problem manifests, and a URL
> > > > to the report if available?
> > > 
> > > Here you go:
> > > lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
> > > 
> > > is it enough to include this link or do you want me
> > > to repost copying the text from there?
> > 
> > Uh, well, OK, I guess I could dig through that and figure what what's
> > relevant.  I'd like the commit log to contain a hint of what the
> > problem looks like and some justification for why it should be
> > backported to stable.
> > 
> > I still look at Documentation/process/stable-kernel-rules.rst
> > occasionally to decide things like this, but I get the feeling that
> > it's a little out-of-date and more restrictive than current practice.
> > 
> > But I do think the "PF exists but VF disabled" situation needs to be
> > clarified somehow, too.
> > 
> > Bjorn
>
  
Bjorn Helgaas Nov. 9, 2022, 5:12 a.m. UTC | #11
On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > device presence. This function isn't currently working
> > > > > > > properly for VFs since it attempts reading device and
> > > > > > > vendor ID.
> > > > > > 
> > > > > > > As VFs are present if and only if PF is present,
> > > > > > > just return the value for that device.
> > > > > > 
> > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > has VF Enable set.  Do you care about the possibility that
> > > > > > VF Enable has been cleared?
> > > 
> > > I think you missed this question.
> > 
> > I was hoping Wei will answer that, I don't have the hardware.
> 
> In my case I don't care that VF Enable has been cleared.

OK, let me rephrase that :)

I think pci_device_is_present(VF) should return "false" if the PF is
present but VFs are disabled.

If you think it should return "true" when the PF is present and VFs
are disabled, we should explain why.

We would also need to fix the commit log, because "VFs are present if
and only if PF is present" is not actually true.  "VFs are present
only if PF is present" is true, but "VFs are present if PF is present"
is not.

Bjorn
  
Wei Gong Nov. 9, 2022, 7 a.m. UTC | #12
On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > device presence. This function isn't currently working
> > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > vendor ID.
> > > > > > > 
> > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > just return the value for that device.
> > > > > > > 
> > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > has VF Enable set.  Do you care about the possibility that
> > > > > > > VF Enable has been cleared?
> > > > 
> > > > I think you missed this question.
> > > 
> > > I was hoping Wei will answer that, I don't have the hardware.
> > 
> > In my case I don't care that VF Enable has been cleared.
> 
> OK, let me rephrase that :)
> 
> I think pci_device_is_present(VF) should return "false" if the PF is
> present but VFs are disabled.

I agree.

> 
> If you think it should return "true" when the PF is present and VFs
> are disabled, we should explain why.

I don't think it should return "true" when the PF is present and VFs
are disabled.

I think pci_device_is_present(VF) should return "true" if the PF is
present and VFs are enabled.
In the current implementation, it cannot correctly judge whether the
VF is present.
When the PF is present and VFs are enabled, I think it should return
"true", but in fact it returns "false".

Through your comments, I realize that this patch is inaccurate in
judging whether a VF is present in the case of "the PF is present and
VFs are disabled".

Thanks,
Wei

> 
> We would also need to fix the commit log, because "VFs are present if
> and only if PF is present" is not actually true.  "VFs are present
> only if PF is present" is true, but "VFs are present if PF is present"
> is not.
> 
> Bjorn
  
Michael S. Tsirkin Nov. 9, 2022, 7:10 a.m. UTC | #13
On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > device presence. This function isn't currently working
> > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > vendor ID.
> > > > > > > 
> > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > just return the value for that device.
> > > > > > > 
> > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > has VF Enable set.  Do you care about the possibility that
> > > > > > > VF Enable has been cleared?
> > > > 
> > > > I think you missed this question.
> > > 
> > > I was hoping Wei will answer that, I don't have the hardware.
> > 
> > In my case I don't care that VF Enable has been cleared.
> 
> OK, let me rephrase that :)
> 
> I think pci_device_is_present(VF) should return "false" if the PF is
> present but VFs are disabled.
> 
> If you think it should return "true" when the PF is present and VFs
> are disabled, we should explain why.
> 
> We would also need to fix the commit log, because "VFs are present if
> and only if PF is present" is not actually true.  "VFs are present
> only if PF is present" is true, but "VFs are present if PF is present"
> is not.
> 
> Bjorn

Bjorn, I don't really understand the question.

How does one get a VF pointer without enabling SR-IOV?
They are only created by sriov_add_vfs() after calling
pcibios_sriov_enable().
  
Bjorn Helgaas Nov. 9, 2022, 5:30 p.m. UTC | #14
On Wed, Nov 09, 2022 at 02:10:30AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> > On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > > On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > > device presence. This function isn't currently working
> > > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > > vendor ID.
> > > > > > > > 
> > > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > > just return the value for that device.
> > > > > > > > 
> > > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > > has VF Enable set.  Do you care about the possibility that
> > > > > > > > VF Enable has been cleared?
> > > > > 
> > > > > I think you missed this question.
> > > > 
> > > > I was hoping Wei would answer that; I don't have the hardware.
> > > 
> > > In my case I don't care that VF Enable has been cleared.
> > 
> > OK, let me rephrase that :)
> > 
> > I think pci_device_is_present(VF) should return "false" if the PF is
> > present but VFs are disabled.
> > 
> > If you think it should return "true" when the PF is present and VFs
> > are disabled, we should explain why.
> > 
> > We would also need to fix the commit log, because "VFs are present if
> > and only if PF is present" is not actually true.  "VFs are present
> > only if PF is present" is true, but "VFs are present if PF is present"
> > is not.
> 
> Bjorn, I don't really understand the question.
> 
> How does one get a vf pointer without enabling sriov?
> They are only created by sriov_add_vfs after calling
> pcibios_sriov_enable.

Oh, I think I see where you're coming from.  The fact that we have a
VF pointer means VFs were enabled in the past, and as long as the PF
is still present, the VFs should still be enabled.

Since the continued existence of the VF device depends on VF Enable, I
guess my question is whether we need to worry about VF Enable being
cleared, e.g., via sysfs reset or a buggy PF driver.

Taking a step back, I don't understand the
"if (!pci_device_is_present()) virtio_break_device()" strategy because
checking for device presence is always unreliable.  I assume the
consumer of vq->broken, e.g., virtnet_send_command(), would see a
failed PCI read that probably returns ~0 data.  Could it not check for
that and then figure out whether that's valid data or an error
indication?

It looks like today, virtnet_send_command() might sit in that "while"
loop calling virtqueue_get_buf() repeatedly until virtio_pci_remove()
notices the device is gone and marks it broken.  Something must be
failing in virtqueue_get_buf() in that interval between the device
disappearing and virtio_pci_remove() noticing it.

Bjorn
  
Michael S. Tsirkin Nov. 9, 2022, 5:49 p.m. UTC | #15
On Wed, Nov 09, 2022 at 11:30:29AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 02:10:30AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> > > On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > > > On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > virtio uses the same driver for VFs and PFs.
> > > > > > > > > > Accordingly, pci_device_is_present is used to detect
> > > > > > > > > > device presence. This function isn't currently working
> > > > > > > > > > properly for VFs since it attempts reading device and
> > > > > > > > > > vendor ID.
> > > > > > > > > 
> > > > > > > > > > As VFs are present if and only if PF is present,
> > > > > > > > > > just return the value for that device.
> > > > > > > > > 
> > > > > > > > > VFs are only present when the PF is present *and* the PF
> > > > > > > > > has VF Enable set.  Do you care about the possibility that
> > > > > > > > > VF Enable has been cleared?
> > > > > > 
> > > > > > I think you missed this question.
> > > > > 
> > > > > I was hoping Wei would answer that; I don't have the hardware.
> > > > 
> > > > In my case I don't care that VF Enable has been cleared.
> > > 
> > > OK, let me rephrase that :)
> > > 
> > > I think pci_device_is_present(VF) should return "false" if the PF is
> > > present but VFs are disabled.
> > > 
> > > If you think it should return "true" when the PF is present and VFs
> > > are disabled, we should explain why.
> > > 
> > > We would also need to fix the commit log, because "VFs are present if
> > > and only if PF is present" is not actually true.  "VFs are present
> > > only if PF is present" is true, but "VFs are present if PF is present"
> > > is not.
> > 
> > Bjorn, I don't really understand the question.
> > 
> > How does one get a vf pointer without enabling sriov?
> > They are only created by sriov_add_vfs after calling
> > pcibios_sriov_enable.
> 
> Oh, I think I see where you're coming from.  The fact that we have a
> VF pointer means VFs were enabled in the past, and as long as the PF
> is still present, the VFs should still be enabled.
> 
> Since the continued existence of the VF device depends on VF Enable, I
> guess my question is whether we need to worry about VF Enable being
> cleared, e.g., via sysfs reset or a buggy PF driver.
> 
> Taking a step back, I don't understand the
> "if (!pci_device_is_present()) virtio_break_device()" strategy because
> checking for device presence is always unreliable.

The point is to break out of loops.


>  I assume the
> consumer of vq->broken, e.g., virtnet_send_command(), would see a
> failed PCI read that probably returns ~0 data.  Could it not check for
> that and then figure out whether that's valid data or an error
> indication?

No, it's not doing any reads - it is waiting for a DMA.

> It looks like today, virtnet_send_command() might sit in that "while"
> loop calling virtqueue_get_buf() repeatedly until virtio_pci_remove()
> notices the device is gone and marks it broken.  Something must be
> failing in virtqueue_get_buf() in that interval between the device
> disappearing and virtio_pci_remove() noticing it.
> 
> Bjorn

Nope - it is just doing posted writes, these disappear into thin ether
if there's no target.
  
Bjorn Helgaas Nov. 10, 2022, 7:35 p.m. UTC | #16
Hi Wei,

I can't quite parse this.  Is the problem that you had some virtio I/O
in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
I/O operation hangs?

Is there any indication to the user, e.g., softlockup oops?

More questions below.

On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:

> according to the SR-IOV specification, the Vendor ID and
> Device ID fields of all VFs return FFFFh when read,
> so when a VF is checked with pci_device_is_present() it is
> misjudged as a surprise removal.
> 
> When I/O is in flight on the VF and the virtio_blk VF device
> is then disabled normally, the disable operation hangs, and
> virtio-blk device I/O hangs.
>
> task:bash            state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
> Call Trace:
>  <TASK>
>  __schedule+0x2ee/0x900
>  schedule+0x4f/0xc0
>  blk_mq_freeze_queue_wait+0x69/0xa0
>  ? wait_woken+0x80/0x80
>  blk_mq_freeze_queue+0x1b/0x20
>  blk_cleanup_queue+0x3d/0xd0
>  virtblk_remove+0x3c/0xb0 [virtio_blk]
>  virtio_dev_remove+0x4b/0x80
>  device_release_driver_internal+0x103/0x1d0
>  device_release_driver+0x12/0x20
>  bus_remove_device+0xe1/0x150
>  device_del+0x192/0x3f0
>  device_unregister+0x1b/0x60
>  unregister_virtio_device+0x18/0x30
>  virtio_pci_remove+0x41/0x80
>  pci_device_remove+0x3e/0xb0
>  device_release_driver_internal+0x103/0x1d0
>  device_release_driver+0x12/0x20
>  pci_stop_bus_device+0x79/0xa0
>  pci_stop_and_remove_bus_device+0x13/0x20
>  pci_iov_remove_virtfn+0xc5/0x130
>  ? pci_get_device+0x4a/0x60
>  sriov_disable+0x33/0xf0
>  pci_disable_sriov+0x26/0x30
>  virtio_pci_sriov_configure+0x6f/0xa0
>  sriov_numvfs_store+0x104/0x140
>  dev_attr_store+0x17/0x30
>  sysfs_kf_write+0x3e/0x50
>  kernfs_fop_write_iter+0x138/0x1c0
>  new_sync_write+0x117/0x1b0
>  vfs_write+0x185/0x250
>  ksys_write+0x67/0xe0
>  __x64_sys_write+0x1a/0x20
>  do_syscall_64+0x61/0xb0
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f21bd1f3ba4
> RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
> 
> When virtio_blk is performing I/O, there are two stages:
> 1. dispatch I/O: queue_usage_counter++;
> 2. I/O completes after receiving the interrupt: queue_usage_counter--;
> 
> On disabling virtio_blk VFs:
>   if (!pci_device_is_present(pci_dev))
>     virtio_break_device(&vp_dev->vdev);
> the virtqueues for the VF devices are marked broken.
> When the interrupt signals I/O completion, the virtqueue is seen
> as broken, so the completion is never processed and
> queue_usage_counter is not decremented.
> When the disk is removed at the same time, the queue is frozen,
> and removal must wait for queue_usage_counter to drain to zero.
> Therefore the removal hangs.

I want to follow along in the code, but I need some hints.

"queue_usage_counter" looks like it's supposed to be a symbol, but I
can't find it.

Where (which function) is the I/O dispatched and queue_usage_counter
incremented?  Where is queue_usage_counter decremented?

Prior to this change pci_device_is_present(VF) returned "false"
(because the VF Vendor ID is 0xffff); after the change it will return
"true" (because it will look at the PF Vendor ID instead).

Previously virtio_pci_remove() called virtio_break_device().  I guess
that meant the virtio I/O operation will never be completed?

But if we don't call virtio_break_device(), the virtio I/O operation
*will* be completed?

Bjorn
  
Michael S. Tsirkin Nov. 10, 2022, 8:15 p.m. UTC | #17
On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> Hi Wei,
> 
> I can't quite parse this.  Is the problem that you had some virtio I/O
> in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
> I/O operation hangs?

I think so. I also think that just attempting to
remove the module or to unbind the driver from it
will have the same effect.

> Is there any indication to the user, e.g., softlockup oops?
> 
> More questions below.
> 
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> 
> > according to the SR-IOV specification, the Vendor ID and
> > Device ID fields of all VFs return FFFFh when read,
> > so when a VF is checked with pci_device_is_present() it is
> > misjudged as a surprise removal.
> > 
> > When I/O is in flight on the VF and the virtio_blk VF device
> > is then disabled normally, the disable operation hangs, and
> > virtio-blk device I/O hangs.
> >
> > task:bash            state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  __schedule+0x2ee/0x900
> >  schedule+0x4f/0xc0
> >  blk_mq_freeze_queue_wait+0x69/0xa0
> >  ? wait_woken+0x80/0x80
> >  blk_mq_freeze_queue+0x1b/0x20
> >  blk_cleanup_queue+0x3d/0xd0
> >  virtblk_remove+0x3c/0xb0 [virtio_blk]
> >  virtio_dev_remove+0x4b/0x80
> >  device_release_driver_internal+0x103/0x1d0
> >  device_release_driver+0x12/0x20
> >  bus_remove_device+0xe1/0x150
> >  device_del+0x192/0x3f0
> >  device_unregister+0x1b/0x60
> >  unregister_virtio_device+0x18/0x30
> >  virtio_pci_remove+0x41/0x80
> >  pci_device_remove+0x3e/0xb0
> >  device_release_driver_internal+0x103/0x1d0
> >  device_release_driver+0x12/0x20
> >  pci_stop_bus_device+0x79/0xa0
> >  pci_stop_and_remove_bus_device+0x13/0x20
> >  pci_iov_remove_virtfn+0xc5/0x130
> >  ? pci_get_device+0x4a/0x60
> >  sriov_disable+0x33/0xf0
> >  pci_disable_sriov+0x26/0x30
> >  virtio_pci_sriov_configure+0x6f/0xa0
> >  sriov_numvfs_store+0x104/0x140
> >  dev_attr_store+0x17/0x30
> >  sysfs_kf_write+0x3e/0x50
> >  kernfs_fop_write_iter+0x138/0x1c0
> >  new_sync_write+0x117/0x1b0
> >  vfs_write+0x185/0x250
> >  ksys_write+0x67/0xe0
> >  __x64_sys_write+0x1a/0x20
> >  do_syscall_64+0x61/0xb0
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f21bd1f3ba4
> > RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> > RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> > RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> > R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> > R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
> > 
> > When virtio_blk is performing I/O, there are two stages:
> > 1. dispatch I/O: queue_usage_counter++;
> > 2. I/O completes after receiving the interrupt: queue_usage_counter--;
> > 
> > On disabling virtio_blk VFs:
> >   if (!pci_device_is_present(pci_dev))
> >     virtio_break_device(&vp_dev->vdev);
> > the virtqueues for the VF devices are marked broken.
> > When the interrupt signals I/O completion, the virtqueue is seen
> > as broken, so the completion is never processed and
> > queue_usage_counter is not decremented.
> > When the disk is removed at the same time, the queue is frozen,
> > and removal must wait for queue_usage_counter to drain to zero.
> > Therefore the removal hangs.
> 
> I want to follow along in the code, but I need some hints.
> 
> "queue_usage_counter" looks like it's supposed to be a symbol, but I
> can't find it.

I think it refers to q->q_usage_counter in blk core.

> Where (which function) is the I/O dispatched and queue_usage_counter
> incremented?  Where is queue_usage_counter decremented?
> 
> Prior to this change pci_device_is_present(VF) returned "false"
> (because the VF Vendor ID is 0xffff); after the change it will return
> "true" (because it will look at the PF Vendor ID instead).
> 
> Previously virtio_pci_remove() called virtio_break_device().  I guess
> that meant the virtio I/O operation will never be completed?
> 
> But if we don't call virtio_break_device(), the virtio I/O operation
> *will* be completed?
> 
> Bjorn

It's completed anyway - nothing special happened at the device
level - but the driver does not detect it.

Calling virtio_break_device will mark all queues as broken, as
a result attempts to check whether operation completed
will return false.

This probably means we need to work on handling surprise removal
better in virtio-blk, since it looks like actual surprise
removal will hang too. But that I think is a separate issue.
  
Wei Gong Nov. 11, 2022, 4 a.m. UTC | #18
On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> Hi Wei,
> 
> I can't quite parse this.  Is the problem that you had some virtio I/O
> in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
> I/O operation hangs?

It is, and unbinding the driver has the same effect.

> 
> Is there any indication to the user, e.g., softlockup oops?

There is no user output beyond the call trace listed below.

> 
> More questions below.
> 
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> 
> > according to the SR-IOV specification, the Vendor ID and
> > Device ID fields of all VFs return FFFFh when read,
> > so when a VF is checked with pci_device_is_present() it is
> > misjudged as a surprise removal.
> > 
> > When I/O is in flight on the VF and the virtio_blk VF device
> > is then disabled normally, the disable operation hangs, and
> > virtio-blk device I/O hangs.
> >
> > task:bash            state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  __schedule+0x2ee/0x900
> >  schedule+0x4f/0xc0
> >  blk_mq_freeze_queue_wait+0x69/0xa0
> >  ? wait_woken+0x80/0x80
> >  blk_mq_freeze_queue+0x1b/0x20
> >  blk_cleanup_queue+0x3d/0xd0
> >  virtblk_remove+0x3c/0xb0 [virtio_blk]
> >  virtio_dev_remove+0x4b/0x80
> >  device_release_driver_internal+0x103/0x1d0
> >  device_release_driver+0x12/0x20
> >  bus_remove_device+0xe1/0x150
> >  device_del+0x192/0x3f0
> >  device_unregister+0x1b/0x60
> >  unregister_virtio_device+0x18/0x30
> >  virtio_pci_remove+0x41/0x80
> >  pci_device_remove+0x3e/0xb0
> >  device_release_driver_internal+0x103/0x1d0
> >  device_release_driver+0x12/0x20
> >  pci_stop_bus_device+0x79/0xa0
> >  pci_stop_and_remove_bus_device+0x13/0x20
> >  pci_iov_remove_virtfn+0xc5/0x130
> >  ? pci_get_device+0x4a/0x60
> >  sriov_disable+0x33/0xf0
> >  pci_disable_sriov+0x26/0x30
> >  virtio_pci_sriov_configure+0x6f/0xa0
> >  sriov_numvfs_store+0x104/0x140
> >  dev_attr_store+0x17/0x30
> >  sysfs_kf_write+0x3e/0x50
> >  kernfs_fop_write_iter+0x138/0x1c0
> >  new_sync_write+0x117/0x1b0
> >  vfs_write+0x185/0x250
> >  ksys_write+0x67/0xe0
> >  __x64_sys_write+0x1a/0x20
> >  do_syscall_64+0x61/0xb0
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f21bd1f3ba4
> > RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
> > RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
> > RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
> > R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> > R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0
> > 
> > When virtio_blk is performing I/O, there are two stages:
> > 1. dispatch I/O: queue_usage_counter++;
> > 2. I/O completes after receiving the interrupt: queue_usage_counter--;
> > 
> > On disabling virtio_blk VFs:
> >   if (!pci_device_is_present(pci_dev))
> >     virtio_break_device(&vp_dev->vdev);
> > the virtqueues for the VF devices are marked broken.
> > When the interrupt signals I/O completion, the virtqueue is seen
> > as broken, so the completion is never processed and
> > queue_usage_counter is not decremented.
> > When the disk is removed at the same time, the queue is frozen,
> > and removal must wait for queue_usage_counter to drain to zero.
> > Therefore the removal hangs.
> 
> I want to follow along in the code, but I need some hints.
> 
> "queue_usage_counter" looks like it's supposed to be a symbol, but I
> can't find it.
> 
> Where (which function) is the I/O dispatched and queue_usage_counter
> incremented?  Where is queue_usage_counter decremented?

queue_usage_counter is the blk core's q->q_usage_counter.

In blk_mq:
submit_bio
  -> submit_bio_noacct
    -> __submit_bio_noacct_mq
      -> bio_queue_enter
bio_queue_enter: q_usage_counter++


Complete io:
vp_vring_interrupt
  -> vring_interrupt (   if (vq->broken) { return; }  )
    -> virtblk_done
      -> blk_mq_complete_request
        -> virtblk_request_done
	  -> blk_mq_end_request
	    -> blk_mq_free_request
	      -> blk_queue_exit
blk_queue_exit: q_usage_counter--
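
The interaction between the broken flag and q_usage_counter in the call
chains above can be sketched as a tiny self-contained model (plain C,
illustrative only; the names mirror, but are not, the real blk-mq and
virtio symbols):

```c
#include <assert.h>

/* Illustrative model, not kernel code: q_usage_counter stands in for
 * blk-mq's q->q_usage_counter, broken for the virtqueue's vq->broken. */
struct queue_model {
	int q_usage_counter;
	int broken;
};

/* Models bio_queue_enter() on I/O dispatch. */
static void dispatch_io(struct queue_model *q)
{
	q->q_usage_counter++;
}

/* Models vring_interrupt() -> ... -> blk_queue_exit(): the completion
 * path returns early when the virtqueue is marked broken, so the
 * counter is never decremented. */
static void complete_io(struct queue_model *q)
{
	if (q->broken)
		return;
	q->q_usage_counter--;
}

/* blk_mq_freeze_queue_wait() blocks until the counter drains to zero. */
static int queue_can_freeze(const struct queue_model *q)
{
	return q->q_usage_counter == 0;
}
```

With I/O in flight, marking the queue broken before the completion
interrupt leaves the counter elevated forever, which is the
blk_mq_freeze_queue_wait() hang in the trace above.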

> 
> Prior to this change pci_device_is_present(VF) returned "false"
> (because the VF Vendor ID is 0xffff); after the change it will return
> "true" (because it will look at the PF Vendor ID instead).
> 
> Previously virtio_pci_remove() called virtio_break_device().  I guess
> that meant the virtio I/O operation will never be completed?
> 
> But if we don't call virtio_break_device(), the virtio I/O operation
> *will* be completed?
> 
> Bjorn

I think so.
In virtio-blk I don't want virtio_break_device() to be called here,
because the I/O needs to complete; it would be fine if it were
called at the right time.

The problem now is that it is being called in a situation where it
shouldn't be, and that's because pci_device_is_present() returns an
incorrect result.

Wei
  
Bjorn Helgaas Nov. 11, 2022, 11:39 p.m. UTC | #19
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs.  Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
> 
> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Applied as below to pci/enumeration for v6.2, thanks!

commit 98b04dd0b457 ("PCI: Fix pci_device_is_present() for VFs by checking PF")
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Wed Oct 26 02:11:21 2022 -0400

    PCI: Fix pci_device_is_present() for VFs by checking PF
    
    pci_device_is_present() previously didn't work for VFs because it reads the
    Vendor and Device ID, which are 0xffff for VFs, which looks like they
    aren't present.  Check the PF instead.
    
    Wei Gong reported that if virtio I/O is in progress when the driver is
    unbound or "0" is written to /sys/.../sriov_numvfs, the virtio I/O
    operation hangs, which may result in output like this:
    
      task:bash state:D stack:    0 pid: 1773 ppid:  1241 flags:0x00004002
      Call Trace:
       schedule+0x4f/0xc0
       blk_mq_freeze_queue_wait+0x69/0xa0
       blk_mq_freeze_queue+0x1b/0x20
       blk_cleanup_queue+0x3d/0xd0
       virtblk_remove+0x3c/0xb0 [virtio_blk]
       virtio_dev_remove+0x4b/0x80
       ...
       device_unregister+0x1b/0x60
       unregister_virtio_device+0x18/0x30
       virtio_pci_remove+0x41/0x80
       pci_device_remove+0x3e/0xb0
    
    This happened because pci_device_is_present(VF) returned "false" in
    virtio_pci_remove(), so it called virtio_break_device().  The broken vq
    meant that vring_interrupt() skipped the vq.callback() that would have
    completed the virtio I/O operation via virtblk_done().
    
    [bhelgaas: commit log, simplify to always use pci_physfn(), add stable tag]
    Link: https://lore.kernel.org/r/20221026060912.173250-1-mst@redhat.com
    Reported-by: Wei Gong <gongwei833x@gmail.com>
    Tested-by: Wei Gong <gongwei833x@gmail.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9f3cc829dfee..fba95486caaf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6447,6 +6447,8 @@ bool pci_device_is_present(struct pci_dev *pdev)
 {
 	u32 v;
 
+	/* Check PF if pdev is a VF, since VF Vendor/Device IDs are 0xffff */
+	pdev = pci_physfn(pdev);
 	if (pci_dev_is_disconnected(pdev))
 		return false;
 	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
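
The effect of the added pci_physfn() call can be sketched with a
simplified model (illustrative, not the real helper; the actual one in
include/linux/pci.h reduces to returning the device unchanged when
CONFIG_PCI_IOV is off, which is why the patch builds either way):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of struct pci_dev for illustration only. */
struct pci_dev_model {
	int is_virtfn;
	struct pci_dev_model *physfn;	/* the owning PF, if is_virtfn */
};

/* Models pci_physfn(): for a VF, fall through to its PF so the
 * presence check reads a Vendor ID that is not 0xffff. */
static struct pci_dev_model *physfn(struct pci_dev_model *dev)
{
	if (dev->is_virtfn)
		dev = dev->physfn;
	return dev;
}
```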
  
Bjorn Helgaas Nov. 11, 2022, 11:42 p.m. UTC | #20
On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> ...

> > Prior to this change pci_device_is_present(VF) returned "false"
> > (because the VF Vendor ID is 0xffff); after the change it will return
> > "true" (because it will look at the PF Vendor ID instead).
> > 
> > Previously virtio_pci_remove() called virtio_break_device().  I guess
> > that meant the virtio I/O operation will never be completed?
> > 
> > But if we don't call virtio_break_device(), the virtio I/O operation
> > *will* be completed?
> 
> It's completed anyway - nothing special happened at the device
> level - but driver does not detect it.
> 
> Calling virtio_break_device will mark all queues as broken, as
> a result attempts to check whether operation completed
> will return false.
> 
> This probably means we need to work on handling surprise removal
> > better in virtio blk - since it looks like actual surprise
> removal will hang too. But that I think is a separate issue.

Yeah, this situation doesn't seem like it's inherently special for
virtio or VFs, so it's a little surprising to see
pci_device_is_present() used there.

Bjorn
  
Michael S. Tsirkin Nov. 13, 2022, 8:46 a.m. UTC | #21
On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > ...
> 
> > > Prior to this change pci_device_is_present(VF) returned "false"
> > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > "true" (because it will look at the PF Vendor ID instead).
> > > 
> > > Previously virtio_pci_remove() called virtio_break_device().  I guess
> > > that meant the virtio I/O operation will never be completed?
> > > 
> > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > *will* be completed?
> > 
> > It's completed anyway - nothing special happened at the device
> > level - but driver does not detect it.
> > 
> > Calling virtio_break_device will mark all queues as broken, as
> > a result attempts to check whether operation completed
> > will return false.
> > 
> > This probably means we need to work on handling surprise removal
> > > better in virtio blk - since it looks like actual surprise
> > removal will hang too. But that I think is a separate issue.
> 
> Yeah, this situation doesn't seem like it's inherently special for
> virtio or VFs, so it's a little surprising to see
> pci_device_is_present() used there.
> 
> Bjorn


Just making sure - pci_device_is_present *is* the suggested way
to distinguish between graceful and surprise removal, isn't it?
  
Bjorn Helgaas Nov. 15, 2022, 4:24 p.m. UTC | #22
[+cc Lukas; you can probably give a better answer here :)]

On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > ...
> > 
> > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > > "true" (because it will look at the PF Vendor ID instead).
> > > > 
> > > > Previously virtio_pci_remove() called virtio_break_device().  I guess
> > > > that meant the virtio I/O operation will never be completed?
> > > > 
> > > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > > *will* be completed?
> > > 
> > > It's completed anyway - nothing special happened at the device
> > > level - but driver does not detect it.
> > > 
> > > Calling virtio_break_device will mark all queues as broken, as
> > > a result attempts to check whether operation completed
> > > will return false.
> > > 
> > > This probably means we need to work on handling surprise removal
> > > better in virtio blk - since it looks like actual suprise
> > > removal will hang too. But that I think is a separate issue.
> > 
> > Yeah, this situation doesn't seem like it's inherently special for
> > virtio or VFs, so it's a little surprising to see
> > pci_device_is_present() used there.
> 
> Just making sure - pci_device_is_present *is* the suggested way
> to distinguish between graceful and surprise removal, isn't it?

I'm not quite sure what you're asking here.  A driver would learn
about a graceful removal when its .remove() method is called before
the device is physically removed.  The device is still accessible and
everything should just work.

A driver would learn about a surprise removal either by a read result
that is PCI_POSSIBLE_ERROR() or possibly when its .error_detected()
callback is called.  The .remove() method will eventually be called
when we destroy the pci_dev.

I guess .remove() might use pci_device_is_present() and assume that if
it returns "true", this is a graceful removal.  But that's not
reliable since the device could be physically removed between the
pci_device_is_present() call and the driver's next access to it.

I think the best thing would be for .remove() to just do whatever it
needs to do and look for errors, e.g., using PCI_POSSIBLE_ERROR(),
without relying on pci_device_is_present().
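
A minimal sketch of that read-and-check pattern (illustrative, not
kernel code; the macro is simplified to the 32-bit case, and
read_vendor_device() is a hypothetical stand-in for a config-space read
that completes with all-ones once the device is gone):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 32-bit version of the kernel's PCI_POSSIBLE_ERROR():
 * a read from a removed device completes with all-ones data. */
#define PCI_POSSIBLE_ERROR(val) ((val) == 0xffffffffu)

/* Hypothetical config-space read of the Vendor/Device ID word. */
static uint32_t read_vendor_device(int device_present)
{
	return device_present ? 0x10421af4 : 0xffffffff;
}

/* A .remove() path can do the access it needs and then check the
 * result, instead of asking pci_device_is_present() up front. */
static int device_looks_gone(int device_present)
{
	return PCI_POSSIBLE_ERROR(read_vendor_device(device_present));
}
```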

If .remove() wants to avoid doing something that might cause an error,
maybe we should expose pci_dev_is_disconnected().  That's set by the
hotplug remove paths before .remove() is called and feels a little
less racy.

Bjorn
  
Lukas Wunner Nov. 16, 2022, 11:16 a.m. UTC | #23
[cc += Parav Pandit, author of 43bb40c5b926]

On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > > "true" (because it will look at the PF Vendor ID instead).
> > > > 
> > > > Previously virtio_pci_remove() called virtio_break_device().  I guess
> > > > that meant the virtio I/O operation will never be completed?
> > > > 
> > > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > > *will* be completed?
> 
> Just making sure - pci_device_is_present *is* the suggested way
> to distinguish between graceful and surprise removal, isn't it?

No, it's not.  Instead of !pci_device_is_present() you really want to
call pci_dev_is_disconnected() instead.

While the fix Bjorn applied for v6.2 may solve the issue and may make
sense on its own, it's not the solution you're looking for.  You want
to swap the call to !pci_device_is_present() with pci_dev_is_disconnected(),
move pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h
and add a Fixes tag referencing 43bb40c5b926.

If you don't want to move pci_dev_is_disconnected(), you can alternatively
check for "pdev->error_state == pci_channel_io_perm_failure" or call
pci_channel_offline().  The latter will also return true though on
transient inaccessibility of the device (e.g. if it's being reset).

The theory of operation is as follows:  The PCI layer does indeed know
whether the device was surprise removed or gracefully removed and that
information is passed in the "presence" flag to pciehp_unconfigure_device()
(in drivers/pci/hotplug/pciehp_pci.c).  That function does the following:

	if (!presence)
		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);

In other words, pdev->error_state is set to pci_channel_io_perm_failure
on the entire hierarchy below the hotplug port.  And pci_dev_is_disconnected()
simply checks whether that's the device's error_state.

pci_dev_is_disconnected() makes sense if you definitely know the device
is gone and want to skip certain steps or delays on device teardown.
However be aware that the device may be hot-removed after graceful
removal was initiated.  In such a situation, pci_dev_is_disconnected()
may return false and you'll try to access the device as normal, even
though it was yanked from the slot after the pci_dev_is_disconnected()
call was performed.  Ideally you should be able to cope with such
scenarios as well.
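
The distinction between pci_dev_is_disconnected() and
pci_channel_offline() can be modeled from the error_state values
involved (a simplified sketch; the real enum is pci_channel_state_t in
include/linux/pci.h and the state names are shortened here):

```c
#include <assert.h>

/* Simplified model of the channel states discussed above. */
enum channel_state {
	io_normal = 1,
	io_frozen = 2,		/* transient, e.g. device being reset */
	io_perm_failure = 3,	/* set on surprise removal */
};

struct dev_model {
	enum channel_state error_state;
};

/* Models pci_dev_is_disconnected(): true only on permanent failure,
 * i.e. the device is definitely gone. */
static int dev_is_disconnected(const struct dev_model *d)
{
	return d->error_state == io_perm_failure;
}

/* Models pci_channel_offline(): true on transient *or* permanent
 * inaccessibility. */
static int channel_offline(const struct dev_model *d)
{
	return d->error_state != io_normal;
}
```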

For some more background info, refer to this LWN article (scroll down
to the "Surprise removal" section):
https://lwn.net/Articles/767885/

Thanks,

Lukas
  
Parav Pandit Nov. 17, 2022, 5:36 a.m. UTC | #24
> From: Lukas Wunner <lukas@wunner.de>
> Sent: Wednesday, November 16, 2022 6:16 AM
> 
> [cc += Parav Pandit, author of 43bb40c5b926]
> 
> On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > >
> > > > > Previously virtio_pci_remove() called virtio_break_device().  I
> > > > > guess that meant the virtio I/O operation will never be completed?
> > > > >
> > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > operation
> > > > > *will* be completed?
> >
> > Just making sure - pci_device_is_present *is* the suggested way to
> > distinguish between graceful and surprise removal, isn't it?
> 
> No, it's not.  Instead of !pci_device_is_present() you really want to call
> pci_dev_is_disconnected() instead.
> 
> While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> on it's own, it's not the solution you're looking for.  You want to swap the
> call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> add a Fixes tag referencing 43bb40c5b926.
> 
> If you don't want to move pci_dev_is_disconnected(), you can alternatively
> check for "pdev->error_state == pci_channel_io_perm_failure" or call
> pci_channel_offline().  The latter will also return true though on transient
> inaccessibility of the device (e.g. if it's being reset).
> 
pci_device_is_present() calls pci_dev_is_disconnected().
pci_dev_is_disconnected() avoids reading the vendor id.
So pci_dev_is_disconnected() looks like a weaker check.
I see that it can return a valid value in a recoverable-error case.

In that case, is pci_channel_offline() a more precise check that covers both transient and permanent errors?

And if that is the right check, we need to fix all the callers, mainly the widely used nvme driver [1].

[1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228

Also, we need to add API documentation on when to use this API in the context of hotplug, so that all related drivers can consistently use a single API.

> The theory of operation is as follows:  The PCI layer does indeed know
> whether the device was surprise removed or gracefully removed and that
> information is passed in the "presence" flag to pciehp_unconfigure_device()
> (in drivers/pci/hotplug/pciehp_pci.c).  That function does the following:
> 
> 	if (!presence)
> 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> 
> In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> the entire hierarchy below the hotplug port.  And pci_dev_is_disconnected()
> simply checks whether that's the device's error_state.
> 
> pci_dev_is_disconnected() makes sense if you definitely know the device is
> gone and want to skip certain steps or delays on device teardown.
> However be aware that the device may be hot-removed after graceful
> removal was initiated.  In such a situation, pci_dev_is_disconnected() may
> return false and you'll try to access the device as normal, even though it was
> yanked from the slot after the pci_dev_is_disconnected() call was
> performed.  Ideally you should be able to cope with such scenarios as well.
> 
> For some more background info, refer to this LWN article (scroll down to the
> "Surprise removal" section):
> https://lwn.net/Articles/767885/
> 
> Thanks,
> 
> Lukas
  
Michael S. Tsirkin Dec. 19, 2022, 5:56 a.m. UTC | #25
On Thu, Nov 17, 2022 at 05:36:48AM +0000, Parav Pandit wrote:
> 
> > From: Lukas Wunner <lukas@wunner.de>
> > Sent: Wednesday, November 16, 2022 6:16 AM
> > 
> > [cc += Parav Pandit, author of 43bb40c5b926]
> > 
> > On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > > >
> > > > > > Previously virtio_pci_remove() called virtio_break_device().  I
> > > > > > guess that meant the virtio I/O operation will never be completed?
> > > > > >
> > > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > > operation
> > > > > > *will* be completed?
> > >
> > > Just making sure - pci_device_is_present *is* the suggested way to
> > > distinguish between graceful and surprise removal, isn't it?
> > 
> > No, it's not.  Instead of !pci_device_is_present() you really want to call
> > pci_dev_is_disconnected() instead.
> > 
> > While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> > on it's own, it's not the solution you're looking for.  You want to swap the
> > call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> > pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> > add a Fixes tag referencing 43bb40c5b926.
> > 
> > If you don't want to move pci_dev_is_disconnected(), you can alternatively
> > check for "pdev->error_state == pci_channel_io_perm_failure" or call
> > pci_channel_offline().  The latter will also return true though on transient
> > inaccessibility of the device (e.g. if it's being reset).
> > 
> pci_device_is_present() is calling pci_dev_is_disconnected().
> pci_dev_is_disconnected() avoids reading the vendor id.
> So pci_dev_is_disconnected() looks less strong check.
> I see that it can return a valid value on recoverable error case.
> 
> In that case, is pci_channel_offline() a more precise way to check that covers transient and permanent error?
> 
> And if that is the right check, we need to fix all the callers, mainly widely used nvme driver [1].
> 
> [1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228
> 
> Also, we need to add API documentation on when to use this API in context of hotplug, so that all related drivers can consistently use single API.

Bjorn, Lukas, what's your take on this idea?
Thanks!


> > The theory of operation is as follows:  The PCI layer does indeed know
> > whether the device was surprise removed or gracefully removed and that
> > information is passed in the "presence" flag to pciehp_unconfigure_device()
> > (in drivers/pci/hotplug/pciehp_pci.c).  That function does the following:
> > 
> > 	if (!presence)
> > 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> > 
> > In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> > the entire hierarchy below the hotplug port.  And pci_dev_is_disconnected()
> > simply checks whether that's the device's error_state.
> > 
> > pci_dev_is_disconnected() makes sense if you definitely know the device is
> > gone and want to skip certain steps or delays on device teardown.
> > However be aware that the device may be hot-removed after graceful
> > removal was initiated.  In such a situation, pci_dev_is_disconnected() may
> > return false and you'll try to access the device as normal, even though it was
> > yanked from the slot after the pci_dev_is_disconnected() call was
> > performed.  Ideally you should be able to cope with such scenarios as well.
> > 
> > For some more background info, refer to this LWN article (scroll down to the
> > "Surprise removal" section):
> > https://lwn.net/Articles/767885/
> > 
> > Thanks,
> > 
> > Lukas
  
Lukas Wunner Dec. 19, 2022, 8:22 a.m. UTC | #26
On Mon, Dec 19, 2022 at 12:56:15AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 17, 2022 at 05:36:48AM +0000, Parav Pandit wrote:
> > > From: Lukas Wunner <lukas@wunner.de>
> > > Sent: Wednesday, November 16, 2022 6:16 AM
> > > 
> > > [cc += Parav Pandit, author of 43bb40c5b926]
> > > 
> > > On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > > > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > > > >
> > > > > > > Previously virtio_pci_remove() called virtio_break_device().  I
> > > > > > > guess that meant the virtio I/O operation will never be completed?
> > > > > > >
> > > > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > > > operation
> > > > > > > *will* be completed?
> > > >
> > > > Just making sure - pci_device_is_present *is* the suggested way to
> > > > distinguish between graceful and surprise removal, isn't it?
> > > 
> > > No, it's not.  Instead of !pci_device_is_present() you really want to call
> > > pci_dev_is_disconnected() instead.
> > > 
> > > While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> > > on it's own, it's not the solution you're looking for.  You want to swap the
> > > call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> > > pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> > > add a Fixes tag referencing 43bb40c5b926.
> > > 
> > > If you don't want to move pci_dev_is_disconnected(), you can alternatively
> > > check for "pdev->error_state == pci_channel_io_perm_failure" or call
> > > pci_channel_offline().  The latter will also return true though on transient
> > > inaccessibility of the device (e.g. if it's being reset).
> > > 
> > pci_device_is_present() is calling pci_dev_is_disconnected().
> > pci_dev_is_disconnected() avoids reading the vendor id.
> > So pci_dev_is_disconnected() looks less strong check.
> > I see that it can return a valid value on recoverable error case.
> > 
> > In that case, is pci_channel_offline() a more precise way to check that covers transient and permanent error?
> > 
> > And if that is the right check, we need to fix all the callers, mainly widely used nvme driver [1].
> > 
> > [1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228
> > 
> > Also, we need to add API documentation on when to use this API in context of hotplug, so that all related drivers can consistently use single API.
> 
> Bjorn, Lukas, what's your take on this idea?

I don't really know what to add to my e-mail of Nov 16
(quoted here in full).

Yes, pci_channel_offline() returns true on transient and permanent
failure.  Whether that's what you want, depends on your use case.
If you want to check for a surprise-removed device, then you only
want to check for permanent failure, so pci_channel_offline() is
not correct and you should rather check for
"pdev->error_state == pci_channel_io_perm_failure" or move
pci_dev_is_disconnected() to include/linux/pci.h.  But again,
I've already explained this in my e-mail of Nov 16, so I don't
know what's unclear.

Thanks,

Lukas

> > > The theory of operation is as follows:  The PCI layer does indeed know
> > > whether the device was surprise removed or gracefully removed and that
> > > information is passed in the "presence" flag to pciehp_unconfigure_device()
> > > (in drivers/pci/hotplug/pciehp_pci.c).  That function does the following:
> > > 
> > > 	if (!presence)
> > > 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> > > 
> > > In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> > > the entire hierarchy below the hotplug port.  And pci_dev_is_disconnected()
> > > simply checks whether that's the device's error_state.
> > > 
> > > pci_dev_is_disconnected() makes sense if you definitely know the device is
> > > gone and want to skip certain steps or delays on device teardown.
> > > However be aware that the device may be hot-removed after graceful
> > > removal was initiated.  In such a situation, pci_dev_is_disconnected() may
> > > return false and you'll try to access the device as normal, even though it was
> > > yanked from the slot after the pci_dev_is_disconnected() call was
> > > performed.  Ideally you should be able to cope with such scenarios as well.
> > > 
> > > For some more background info, refer to this LWN article (scroll down to the
> > > "Surprise removal" section):
> > > https://lwn.net/Articles/767885/
> > > 
> > > Thanks,
> > > 
> > > Lukas
  

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2127aba3550b..899b3f52e84e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6445,8 +6445,13 @@  bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
 
 bool pci_device_is_present(struct pci_dev *pdev)
 {
+	struct pci_dev *physfn = pci_physfn(pdev);
 	u32 v;
 
+	/* Not a PF? Switch to the PF. */
+	if (physfn != pdev)
+		return pci_device_is_present(physfn);
+
 	if (pci_dev_is_disconnected(pdev))
 		return false;
 	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);