Message ID | 20221026060912.173250-1-mst@redhat.com |
---|---|
State | New |
Headers | Date: Wed, 26 Oct 2022 02:11:21 -0400; From: "Michael S. Tsirkin" <mst@redhat.com>; To: linux-kernel@vger.kernel.org; Cc: Wei Gong <gongwei833x@gmail.com>, Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org; Message-ID: <20221026060912.173250-1-mst@redhat.com> |
Series | [v2] pci: fix device presence detection for VFs |
Commit Message
Michael S. Tsirkin
Oct. 26, 2022, 6:11 a.m. UTC
virtio uses the same driver for VFs and PFs. Accordingly,
pci_device_is_present is used to detect device presence. This function
isn't currently working properly for VFs since it attempts reading
device and vendor ID. As VFs are present if and only if PF is present,
just return the value for that device.
Reported-by: Wei Gong <gongwei833x@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Wei Gong, thanks for your testing of the RFC!
As I made a small change, would appreciate re-testing.
Thanks!
changes from RFC:
use pci_physfn() wrapper to make the code build without PCI_IOV
drivers/pci/pci.c | 5 +++++
1 file changed, 5 insertions(+)
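
The patch body itself is quoted verbatim in the replies below; reassembled here for convenience:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2127aba3550b..899b3f52e84e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
 
 bool pci_device_is_present(struct pci_dev *pdev)
 {
+	struct pci_dev *physfn = pci_physfn(pdev);
 	u32 v;
 
+	/* Not a PF? Switch to the PF. */
+	if (physfn != pdev)
+		return pci_device_is_present(physfn);
+
 	if (pci_dev_is_disconnected(pdev))
 		return false;
 	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);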
Comments
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Tested-by: Wei Gong <gongwei833x@gmail.com>

> ---
>
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.

I updated your change and retested in my environment.
It worked well and fixed my problem.

> Thanks!
>
> changes from RFC:
> 	use pci_physfn() wrapper to make the code build without PCI_IOV
>
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2127aba3550b..899b3f52e84e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
>
>  bool pci_device_is_present(struct pci_dev *pdev)
>  {
> +	struct pci_dev *physfn = pci_physfn(pdev);
>  	u32 v;
>
> +	/* Not a PF? Switch to the PF. */
> +	if (physfn != pdev)
> +		return pci_device_is_present(physfn);
> +
>  	if (pci_dev_is_disconnected(pdev))
>  		return false;
>  	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
> --
> MST
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
>
> Thanks!
>
> changes from RFC:
> 	use pci_physfn() wrapper to make the code build without PCI_IOV
>
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)

Tested-by: Wei Gong <gongwei833x@gmail.com>

Retest done, and all is well.

I would rephrase the bug.

According to the SR-IOV specification, the vendor_id and device_id
fields of all VFs return FFFFh when read, so when a VF is passed to
pci_device_is_present() it is misjudged as a surprise removal.

When I/O is issued on the VF and the virtio_blk VF devs are then
disabled normally, the disable operation hangs, and the virtio-blk
device I/O hangs with it:

task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x2ee/0x900
 schedule+0x4f/0xc0
 blk_mq_freeze_queue_wait+0x69/0xa0
 ? wait_woken+0x80/0x80
 blk_mq_freeze_queue+0x1b/0x20
 blk_cleanup_queue+0x3d/0xd0
 virtblk_remove+0x3c/0xb0 [virtio_blk]
 virtio_dev_remove+0x4b/0x80
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 bus_remove_device+0xe1/0x150
 device_del+0x192/0x3f0
 device_unregister+0x1b/0x60
 unregister_virtio_device+0x18/0x30
 virtio_pci_remove+0x41/0x80
 pci_device_remove+0x3e/0xb0
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 pci_stop_bus_device+0x79/0xa0
 pci_stop_and_remove_bus_device+0x13/0x20
 pci_iov_remove_virtfn+0xc5/0x130
 ? pci_get_device+0x4a/0x60
 sriov_disable+0x33/0xf0
 pci_disable_sriov+0x26/0x30
 virtio_pci_sriov_configure+0x6f/0xa0
 sriov_numvfs_store+0x104/0x140
 dev_attr_store+0x17/0x30
 sysfs_kf_write+0x3e/0x50
 kernfs_fop_write_iter+0x138/0x1c0
 new_sync_write+0x117/0x1b0
 vfs_write+0x185/0x250
 ksys_write+0x67/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x61/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f21bd1f3ba4
RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0

When virtio_blk is performing I/O, there are two stages:
1. dispatch I/O: queue_usage_counter++;
2. I/O is completed after receiving the interrupt: queue_usage_counter--;

Disable virtio_blk VFs:
	if (!pci_device_is_present(pci_dev))
		virtio_break_device(&vp_dev->vdev);
The virtqueues of the VF devs are marked broken, so when the interrupt
notifying I/O completion arrives, the virtqueue is seen as broken and
the completed I/O is never processed; queue_usage_counter is therefore
never cleared. When the disk is removed at the same time, the queue is
frozen, and removal must wait for queue_usage_counter to drop to zero.
Therefore it leads to the removal hang.

Thanks,
Wei Gong
On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > virtio uses the same driver for VFs and PFs. Accordingly,
> > pci_device_is_present is used to detect device presence. [...]
> >
> >  drivers/pci/pci.c | 5 +++++
> >  1 file changed, 5 insertions(+)
>
> Tested-by: Wei Gong <gongwei833x@gmail.com>

Thanks!
Bjorn, could you queue this fix in your tree or should I queue this
in the virtio tree? I also think this would be nice to have for stable
too - do you agree?

Thanks!

> Retest done, and all is well.
>
> I would rephrase the bug.
>
> According to the SR-IOV specification, the vendor_id and device_id
> fields of all VFs return FFFFh when read, so when a VF is passed to
> pci_device_is_present() it is misjudged as a surprise removal.
>
> When I/O is issued on the VF and the virtio_blk VF devs are then
> disabled normally, the disable operation hangs, and the virtio-blk
> device I/O hangs with it:
>
> task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
> Call Trace:
> [...]
>
> When virtio_blk is performing I/O, there are two stages:
> 1. dispatch I/O: queue_usage_counter++;
> 2. I/O is completed after receiving the interrupt: queue_usage_counter--;
>
> Disable virtio_blk VFs:
> 	if (!pci_device_is_present(pci_dev))
> 		virtio_break_device(&vp_dev->vdev);
> The virtqueues of the VF devs are marked broken, so when the interrupt
> notifying I/O completion arrives, the virtqueue is seen as broken and
> the completed I/O is never processed; queue_usage_counter is therefore
> never cleared. When the disk is removed at the same time, the queue is
> frozen, and removal must wait for queue_usage_counter to drop to zero.
> Therefore it leads to the removal hang.
>
> Thanks,
> Wei Gong
On Mon, Nov 07, 2022 at 11:58:21PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > pci_device_is_present is used to detect device presence. [...]
> > >
> > >  drivers/pci/pci.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> >
> > Tested-by: Wei Gong <gongwei833x@gmail.com>
>
> Thanks!
> Bjorn, could you queue this fix in your tree or should I queue this
> in the virtio tree? I also think this would be nice to have for stable
> too - do you agree?
>
> Thanks!

It's on my list to look at, sorry for the delay.

Bjorn
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID.

> As VFs are present if and only if PF is present,
> just return the value for that device.

VFs are only present when the PF is present *and* the PF has VF Enable
set.  Do you care about the possibility that VF Enable has been
cleared?

> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> [...]
On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > virtio uses the same driver for VFs and PFs. Accordingly,
> > pci_device_is_present is used to detect device presence. This function
> > isn't currently working properly for VFs since it attempts reading
> > device and vendor ID.
>
> > As VFs are present if and only if PF is present,
> > just return the value for that device.
>
> VFs are only present when the PF is present *and* the PF has VF Enable
> set.  Do you care about the possibility that VF Enable has been
> cleared?

Can you also include a hint about how the problem manifests, and a URL
to the report if available?

It's beyond the scope of this patch, but I've never liked the
semantics of pci_device_is_present() because it's racy by design.  All
it tells us is that some time in the *past*, the device was present.
It's telling that almost all calls test for !pci_device_is_present(),
which does make a little more sense.

> > Reported-by: Wei Gong <gongwei833x@gmail.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > [...]
On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > pci_device_is_present is used to detect device presence. This function
> > > isn't currently working properly for VFs since it attempts reading
> > > device and vendor ID.
> >
> > > As VFs are present if and only if PF is present,
> > > just return the value for that device.
> >
> > VFs are only present when the PF is present *and* the PF has VF Enable
> > set.  Do you care about the possibility that VF Enable has been
> > cleared?
>
> Can you also include a hint about how the problem manifests, and a URL
> to the report if available?

Here you go:
lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz

is it enough to include this link or do you want me
to repost copying the text from there?

> It's beyond the scope of this patch, but I've never liked the
> semantics of pci_device_is_present() because it's racy by design.  All
> it tells us is that some time in the *past*, the device was present.
> It's telling that almost all calls test for !pci_device_is_present(),
> which does make a little more sense.

I agree.  The problem is in the API really.  What people want is

	pci_device_was_removed()

With surprise removal, at least at the PCI Express level, we know that
there was a surprise removal event; the PCI subsystem seems to choose
to discard that information.  There's nothing a driver can do to
reliably detect that - if someone pulled the card out then stuck it
back in quickly, the driver will assume it's the old card and attempt
graceful removal, which is likely to fail.

However, some of the problem is at the hardware level too.  If you are
poking at the device's config space and it's pulled out and another is
put back in quickly, your config access might land at the new card.
Does not feel robust.  I don't have a good solution for this except
"try to avoid config cycles as much as you can".

> > > Reported-by: Wei Gong <gongwei833x@gmail.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > [...]
On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > pci_device_is_present is used to detect device presence. [...]
> > >
> > > > As VFs are present if and only if PF is present,
> > > > just return the value for that device.
> > >
> > > VFs are only present when the PF is present *and* the PF has VF Enable
> > > set.  Do you care about the possibility that VF Enable has been
> > > cleared?

I think you missed this question.

> > Can you also include a hint about how the problem manifests, and a URL
> > to the report if available?
>
> Here you go:
> lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
>
> is it enough to include this link or do you want me
> to repost copying the text from there?

Uh, well, OK, I guess I could dig through that and figure out what's
relevant.  I'd like the commit log to contain a hint of what the
problem looks like and some justification for why it should be
backported to stable.

I still look at Documentation/process/stable-kernel-rules.rst
occasionally to decide things like this, but I get the feeling that
it's a little out-of-date and more restrictive than current practice.

But I do think the "PF exists but VF disabled" situation needs to be
clarified somehow, too.

Bjorn
On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > > pci_device_is_present is used to detect device presence. [...]
> > > >
> > > > > As VFs are present if and only if PF is present,
> > > > > just return the value for that device.
> > > >
> > > > VFs are only present when the PF is present *and* the PF has VF
> > > > Enable set.  Do you care about the possibility that VF Enable has
> > > > been cleared?
>
> I think you missed this question.

I was hoping Wei would answer that, I don't have the hardware.

> > > Can you also include a hint about how the problem manifests, and a
> > > URL to the report if available?
> >
> > Here you go:
> > lore.kernel.org/all/20221108044819.GA861843%40zander/t.mbox.gz
> >
> > is it enough to include this link or do you want me
> > to repost copying the text from there?
>
> Uh, well, OK, I guess I could dig through that and figure out what's
> relevant.  I'd like the commit log to contain a hint of what the
> problem looks like and some justification for why it should be
> backported to stable.
>
> I still look at Documentation/process/stable-kernel-rules.rst
> occasionally to decide things like this, but I get the feeling that
> it's a little out-of-date and more restrictive than current practice.
>
> But I do think the "PF exists but VF disabled" situation needs to be
> clarified somehow, too.
>
> Bjorn
On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > > > pci_device_is_present is used to detect device presence. [...]
> > > > >
> > > > > > As VFs are present if and only if PF is present,
> > > > > > just return the value for that device.
> > > > >
> > > > > VFs are only present when the PF is present *and* the PF has VF
> > > > > Enable set.  Do you care about the possibility that VF Enable
> > > > > has been cleared?
> >
> > I think you missed this question.
>
> I was hoping Wei would answer that, I don't have the hardware.

In my case I don't care that VF Enable has been cleared.

> > > > Can you also include a hint about how the problem manifests, and
> > > > a URL to the report if available?
> > [...]
On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 08, 2022 at 11:58:53AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Nov 08, 2022 at 10:19:07AM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 08, 2022 at 09:02:28AM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Nov 08, 2022 at 08:53:00AM -0600, Bjorn Helgaas wrote:
> > > > > > On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> > > > > > > virtio uses the same driver for VFs and PFs. Accordingly,
> > > > > > > pci_device_is_present is used to detect device presence. [...]
> > > > > >
> > > > > > > As VFs are present if and only if PF is present,
> > > > > > > just return the value for that device.
> > > > > >
> > > > > > VFs are only present when the PF is present *and* the PF has
> > > > > > VF Enable set.  Do you care about the possibility that VF
> > > > > > Enable has been cleared?
> > >
> > > I think you missed this question.
> >
> > I was hoping Wei would answer that, I don't have the hardware.
>
> In my case I don't care that VF Enable has been cleared.

OK, let me rephrase that :)

I think pci_device_is_present(VF) should return "false" if the PF is
present but VFs are disabled.

If you think it should return "true" when the PF is present and VFs
are disabled, we should explain why.

We would also need to fix the commit log, because "VFs are present if
and only if PF is present" is not actually true.  "VFs are present
only if PF is present" is true, but "VFs are present if PF is present"
is not.

Bjorn
On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > On Tue, Nov 08, 2022 at 01:02:35PM -0500, Michael S. Tsirkin wrote:
> > > [...]
> > > I was hoping Wei would answer that, I don't have the hardware.
> >
> > In my case I don't care that VF Enable has been cleared.
>
> OK, let me rephrase that :)
>
> I think pci_device_is_present(VF) should return "false" if the PF is
> present but VFs are disabled.

I agree.

> If you think it should return "true" when the PF is present and VFs
> are disabled, we should explain why.

I don't think it should return "true" when the PF is present and VFs
are disabled.  I think pci_device_is_present(VF) should return "true"
if the PF is present and VFs are enabled.

In the current implementation, it cannot correctly judge whether the
VF is present: when the PF is present and VFs are enabled, it should
return "true", but in fact it returns "false".

Through your comments, I realize that this patch is inaccurate in
judging whether the VF is present in the case of "the PF is present
and VFs are disabled".

Thanks,
Wei

> We would also need to fix the commit log, because "VFs are present if
> and only if PF is present" is not actually true.  "VFs are present
> only if PF is present" is true, but "VFs are present if PF is present"
> is not.
>
> Bjorn
On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 04:36:17AM +0000, Wei Gong wrote:
> > [...]
> > In my case I don't care that VF Enable has been cleared.
>
> OK, let me rephrase that :)
>
> I think pci_device_is_present(VF) should return "false" if the PF is
> present but VFs are disabled.
>
> If you think it should return "true" when the PF is present and VFs
> are disabled, we should explain why.
>
> We would also need to fix the commit log, because "VFs are present if
> and only if PF is present" is not actually true.  "VFs are present
> only if PF is present" is true, but "VFs are present if PF is present"
> is not.
>
> Bjorn

Bjorn, I don't really understand the question.

How does one get a vf pointer without enabling sriov?
They are only created by sriov_add_vfs after calling
pcibios_sriov_enable.
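
For reference, the pci_physfn() wrapper the patch relies on resolves a VF to its owning PF and returns any other device unchanged; a sketch paraphrased from include/linux/pci.h (the exact form may differ across kernel versions). Note that, as Bjorn points out above, this mapping says nothing about whether VF Enable is still set on the PF:

static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
{
#ifdef CONFIG_PCI_IOV
	if (dev->is_virtfn)
		dev = dev->physfn;	/* VF: hand back the owning PF */
#endif
	return dev;			/* PF or non-SR-IOV device: unchanged */
}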
On Wed, Nov 09, 2022 at 02:10:30AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 08, 2022 at 11:12:34PM -0600, Bjorn Helgaas wrote:
> > [...]
> > We would also need to fix the commit log, because "VFs are present if
> > and only if PF is present" is not actually true.  "VFs are present
> > only if PF is present" is true, but "VFs are present if PF is present"
> > is not.
>
> Bjorn, I don't really understand the question.
>
> How does one get a vf pointer without enabling sriov?
> They are only created by sriov_add_vfs after calling
> pcibios_sriov_enable.

Oh, I think I see where you're coming from.  The fact that we have a
VF pointer means VFs were enabled in the past, and as long as the PF
is still present, the VFs should still be enabled.

Since the continued existence of the VF device depends on VF Enable, I
guess my question is whether we need to worry about VF Enable being
cleared, e.g., via sysfs reset or a buggy PF driver.

Taking a step back, I don't understand the
"if (!pci_device_is_present()) virtio_break_device()" strategy because
checking for device presence is always unreliable.  I assume the
consumer of vq->broken, e.g., virtnet_send_command(), would see a
failed PCI read that probably returns ~0 data.  Could it not check for
that and then figure out whether that's valid data or an error
indication?

It looks like today, virtnet_send_command() might sit in that "while"
loop calling virtqueue_get_buf() repeatedly until virtio_pci_remove()
notices the device is gone and marks it broken.  Something must be
failing in virtqueue_get_buf() in that interval between the device
disappearing and virtio_pci_remove() noticing it.

Bjorn
On Wed, Nov 09, 2022 at 11:30:29AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 02:10:30AM -0500, Michael S. Tsirkin wrote:
> > [...]
> > How does one get a vf pointer without enabling sriov?
> > They are only created by sriov_add_vfs after calling
> > pcibios_sriov_enable.
>
> Oh, I think I see where you're coming from.  The fact that we have a
> VF pointer means VFs were enabled in the past, and as long as the PF
> is still present, the VFs should still be enabled.
>
> Since the continued existence of the VF device depends on VF Enable, I
> guess my question is whether we need to worry about VF Enable being
> cleared, e.g., via sysfs reset or a buggy PF driver.
>
> Taking a step back, I don't understand the
> "if (!pci_device_is_present()) virtio_break_device()" strategy because
> checking for device presence is always unreliable.

The point is to break out of loops.

> I assume the
> consumer of vq->broken, e.g., virtnet_send_command(), would see a
> failed PCI read that probably returns ~0 data.  Could it not check for
> that and then figure out whether that's valid data or an error
> indication?

No, it's not doing any reads - it is waiting for a DMA.

> It looks like today, virtnet_send_command() might sit in that "while"
> loop calling virtqueue_get_buf() repeatedly until virtio_pci_remove()
> notices the device is gone and marks it broken.  Something must be
> failing in virtqueue_get_buf() in that interval between the device
> disappearing and virtio_pci_remove() noticing it.
>
> Bjorn

Nope - it is just doing posted writes; these disappear into thin ether
if there's no target.
Hi Wei,

I can't quite parse this.  Is the problem that you had some virtio I/O
in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
I/O operation hangs?

Is there any indication to the user, e.g., a softlockup oops?

More questions below.

On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> According to the SR-IOV specification, the vendor_id and device_id
> fields of all VFs return FFFFh when read, so when a VF is passed to
> pci_device_is_present() it is misjudged as a surprise removal.
>
> When I/O is issued on the VF and the virtio_blk VF devs are then
> disabled normally, the disable operation hangs, and the virtio-blk
> device I/O hangs with it:
>
> task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
> Call Trace:
> [...]
>
> When virtio_blk is performing I/O, there are two stages:
> 1. dispatch I/O: queue_usage_counter++;
> 2. I/O is completed after receiving the interrupt: queue_usage_counter--;
>
> Disable virtio_blk VFs:
> 	if (!pci_device_is_present(pci_dev))
> 		virtio_break_device(&vp_dev->vdev);
> The virtqueues of the VF devs are marked broken, so when the interrupt
> notifying I/O completion arrives, the virtqueue is seen as broken and
> the completed I/O is never processed; queue_usage_counter is therefore
> never cleared. When the disk is removed at the same time, the queue is
> frozen, and removal must wait for queue_usage_counter to drop to zero.
> Therefore it leads to the removal hang.

I want to follow along in the code, but I need some hints.

"queue_usage_counter" looks like it's supposed to be a symbol, but I
can't find it.

Where (which function) is the I/O dispatched and queue_usage_counter
incremented?  Where is queue_usage_counter decremented?

Prior to this change pci_device_is_present(VF) returned "false"
(because the VF Vendor ID is 0xffff); after the change it will return
"true" (because it will look at the PF Vendor ID instead).

Previously virtio_pci_remove() called virtio_break_device().  I guess
that meant the virtio I/O operation will never be completed?

But if we don't call virtio_break_device(), the virtio I/O operation
*will* be completed?

Bjorn
On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> Hi Wei,
>
> I can't quite parse this.  Is the problem that you had some virtio I/O
> in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
> I/O operation hangs?

I think so.  I also think that just attempting to remove the module or
to unbind the driver from the device will have the same effect.

> Is there any indication to the user, e.g., a softlockup oops?
>
> More questions below.
>
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> > [...]
>
> I want to follow along in the code, but I need some hints.
>
> "queue_usage_counter" looks like it's supposed to be a symbol, but I
> can't find it.

I think it refers to q->q_usage_counter in blk core.

> Where (which function) is the I/O dispatched and queue_usage_counter
> incremented?  Where is queue_usage_counter decremented?
>
> Prior to this change pci_device_is_present(VF) returned "false"
> (because the VF Vendor ID is 0xffff); after the change it will return
> "true" (because it will look at the PF Vendor ID instead).
>
> Previously virtio_pci_remove() called virtio_break_device().  I guess
> that meant the virtio I/O operation will never be completed?
>
> But if we don't call virtio_break_device(), the virtio I/O operation
> *will* be completed?
>
> Bjorn

It's completed anyway - nothing special happened at the device level -
but the driver does not detect it.

Calling virtio_break_device will mark all queues as broken; as a
result, attempts to check whether the operation completed will return
false.

This probably means we need to work on handling surprise removal
better in virtio blk - since it looks like actual surprise removal
will hang too.  But that I think is a separate issue.
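
For readers following the virtio side: virtio_break_device() simply walks the device's virtqueues and flags each one broken, which is why completions are subsequently ignored. A simplified sketch, paraphrased from drivers/virtio/virtio_ring.c (locking and field names are approximate and vary by kernel version):

void virtio_break_device(struct virtio_device *dev)
{
	struct virtqueue *_vq;

	spin_lock(&dev->vqs_list_lock);
	list_for_each_entry(_vq, &dev->vqs, list) {
		struct vring_virtqueue *vq = to_vvq(_vq);

		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
		WRITE_ONCE(vq->broken, true);
	}
	spin_unlock(&dev->vqs_list_lock);
}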
On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> Hi Wei,
>
> I can't quite parse this.  Is the problem that you had some virtio I/O
> in progress, you wrote "0" to /sys/.../sriov_numvfs, and the virtio
> I/O operation hangs?

It is, and unbinding the driver has the same effect.

> Is there any indication to the user, e.g., a softlockup oops?

There is no user output beyond the trace quoted earlier in the thread.

> More questions below.
>
> On Tue, Nov 08, 2022 at 04:52:16AM +0000, Wei Gong wrote:
> > [...]
>
> I want to follow along in the code, but I need some hints.
>
> "queue_usage_counter" looks like it's supposed to be a symbol, but I
> can't find it.
>
> Where (which function) is the I/O dispatched and queue_usage_counter
> incremented?  Where is queue_usage_counter decremented?

queue_usage_counter is blk core's q->q_usage_counter.

In blk_mq, it is incremented on dispatch:

  submit_bio
    -> submit_bio_noacct
      -> __submit_bio_noacct_mq
        -> bio_queue_enter          // q_usage_counter++

and decremented when the I/O completes:

  vp_vring_interrupt
    -> vring_interrupt              // if (vq->broken) { return; }
      -> virtblk_done
        -> blk_mq_complete_request
          -> virtblk_request_done
            -> blk_mq_end_request
              -> blk_mq_free_request
                -> blk_queue_exit   // q_usage_counter--

> Prior to this change pci_device_is_present(VF) returned "false"
> (because the VF Vendor ID is 0xffff); after the change it will return
> "true" (because it will look at the PF Vendor ID instead).
>
> Previously virtio_pci_remove() called virtio_break_device().  I guess
> that meant the virtio I/O operation will never be completed?
>
> But if we don't call virtio_break_device(), the virtio I/O operation
> *will* be completed?
>
> Bjorn

I think so.  In virtblk I don't want virtio_break_device() to be
called here, because the I/O needs to complete; it would be fine if it
were called at the right time.  The problem now is that it's being
called in a situation where it shouldn't be, because
pci_device_is_present() is returning an incorrect result.

Wei
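
The early return Wei mentions lives in vring_interrupt(): once a vq is marked broken, the interrupt handler bails out before invoking the completion callback (virtblk_done() for virtio_blk), so blk_queue_exit() never runs and q_usage_counter never drops. A simplified sketch, paraphrased from drivers/virtio/virtio_ring.c (details vary by kernel version):

irqreturn_t vring_interrupt(int irq, void *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	if (!more_used(vq)) {
		pr_debug("virtqueue interrupt with no work for %p\n", vq);
		return IRQ_NONE;
	}

	if (unlikely(vq->broken))	/* set by virtio_break_device() */
		return IRQ_HANDLED;	/* completion is silently dropped */

	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
	if (vq->vq.callback)
		vq->vq.callback(&vq->vq);	/* -> virtblk_done() */

	return IRQ_HANDLED;
}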
On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <gongwei833x@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Applied as below to pci/enumeration for v6.2, thanks!

commit 98b04dd0b457 ("PCI: Fix pci_device_is_present() for VFs by checking PF")
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Wed Oct 26 02:11:21 2022 -0400

    PCI: Fix pci_device_is_present() for VFs by checking PF

    pci_device_is_present() previously didn't work for VFs because it reads
    the Vendor and Device ID, which are 0xffff for VFs, which looks like
    they aren't present.  Check the PF instead.

    Wei Gong reported that if virtio I/O is in progress when the driver is
    unbound or "0" is written to /sys/.../sriov_numvfs, the virtio I/O
    operation hangs, which may result in output like this:

        task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
        Call Trace:
         schedule+0x4f/0xc0
         blk_mq_freeze_queue_wait+0x69/0xa0
         blk_mq_freeze_queue+0x1b/0x20
         blk_cleanup_queue+0x3d/0xd0
         virtblk_remove+0x3c/0xb0 [virtio_blk]
         virtio_dev_remove+0x4b/0x80
         ...
         device_unregister+0x1b/0x60
         unregister_virtio_device+0x18/0x30
         virtio_pci_remove+0x41/0x80
         pci_device_remove+0x3e/0xb0

    This happened because pci_device_is_present(VF) returned "false" in
    virtio_pci_remove(), so it called virtio_break_device().  The broken vq
    meant that vring_interrupt() skipped the vq.callback() that would have
    completed the virtio I/O operation via virtblk_done().

    [bhelgaas: commit log, simplify to always use pci_physfn(), add stable
    tag]
    Link: https://lore.kernel.org/r/20221026060912.173250-1-mst@redhat.com
    Reported-by: Wei Gong <gongwei833x@gmail.com>
    Tested-by: Wei Gong <gongwei833x@gmail.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9f3cc829dfee..fba95486caaf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6447,6 +6447,8 @@ bool pci_device_is_present(struct pci_dev *pdev)
 {
 	u32 v;
 
+	/* Check PF if pdev is a VF, since VF Vendor/Device IDs are 0xffff */
+	pdev = pci_physfn(pdev);
 	if (pci_dev_is_disconnected(pdev))
 		return false;
 	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> ...
> > Prior to this change pci_device_is_present(VF) returned "false"
> > (because the VF Vendor ID is 0xffff); after the change it will return
> > "true" (because it will look at the PF Vendor ID instead).
> >
> > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > that meant the virtio I/O operation will never be completed?
> >
> > But if we don't call virtio_break_device(), the virtio I/O operation
> > *will* be completed?
>
> It's completed anyway - nothing special happened at the device
> level - but the driver does not detect it.
>
> Calling virtio_break_device() will mark all queues as broken; as
> a result, attempts to check whether the operation completed
> will return false.
>
> This probably means we need to work on handling surprise removal
> better in virtio blk - since it looks like actual surprise
> removal will hang too. But that I think is a separate issue.

Yeah, this situation doesn't seem like it's inherently special for
virtio or VFs, so it's a little surprising to see
pci_device_is_present() used there.

Bjorn
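The "driver does not detect it" behavior comes from the top of vring_interrupt(). The sketch below is condensed from drivers/virtio/virtio_ring.c of roughly this era (pr_debug noise trimmed; to_vvq() and more_used() are virtio_ring internals, so this is an excerpt, not a standalone program): once the vq is marked broken, the interrupt handler returns before the callback that would reap the request.

```c
irqreturn_t vring_interrupt(int irq, void *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	if (!more_used(vq))
		return IRQ_NONE;	/* no completed buffers */

	/* A broken vq swallows the completion: virtblk_done() is never
	 * called, so q_usage_counter never drops and
	 * blk_mq_freeze_queue_wait() waits forever. */
	if (unlikely(vq->broken))
		return IRQ_HANDLED;

	if (vq->vq.callback)
		vq->vq.callback(&vq->vq);	/* -> virtblk_done() */

	return IRQ_HANDLED;
}
```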
On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > ...
> > > Prior to this change pci_device_is_present(VF) returned "false"
> > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > "true" (because it will look at the PF Vendor ID instead).
> > >
> > > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > > that meant the virtio I/O operation will never be completed?
> > >
> > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > *will* be completed?
> >
> > It's completed anyway - nothing special happened at the device
> > level - but the driver does not detect it.
> >
> > Calling virtio_break_device() will mark all queues as broken; as
> > a result, attempts to check whether the operation completed
> > will return false.
> >
> > This probably means we need to work on handling surprise removal
> > better in virtio blk - since it looks like actual surprise
> > removal will hang too. But that I think is a separate issue.
>
> Yeah, this situation doesn't seem like it's inherently special for
> virtio or VFs, so it's a little surprising to see
> pci_device_is_present() used there.
>
> Bjorn

Just making sure - pci_device_is_present() *is* the suggested way to
distinguish between graceful and surprise removal, isn't it?
[+cc Lukas; you can probably give a better answer here :)]

On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > ...
> > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > > "true" (because it will look at the PF Vendor ID instead).
> > > >
> > > > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > > > that meant the virtio I/O operation will never be completed?
> > > >
> > > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > > *will* be completed?
> > >
> > > It's completed anyway - nothing special happened at the device
> > > level - but the driver does not detect it.
> > >
> > > Calling virtio_break_device() will mark all queues as broken; as
> > > a result, attempts to check whether the operation completed
> > > will return false.
> > >
> > > This probably means we need to work on handling surprise removal
> > > better in virtio blk - since it looks like actual surprise
> > > removal will hang too. But that I think is a separate issue.
> >
> > Yeah, this situation doesn't seem like it's inherently special for
> > virtio or VFs, so it's a little surprising to see
> > pci_device_is_present() used there.
>
> Just making sure - pci_device_is_present *is* the suggested way
> to distinguish between graceful and surprise removal, isn't it?

I'm not quite sure what you're asking here.

A driver would learn about a graceful removal when its .remove() method
is called before the device is physically removed. The device is still
accessible and everything should just work.

A driver would learn about a surprise removal either by a read result
that is PCI_POSSIBLE_ERROR() or possibly when its .error_detected()
callback is called. The .remove() method will eventually be called
when we destroy the pci_dev.

I guess .remove() might use pci_device_is_present() and assume that if
it returns "true", this is a graceful removal. But that's not reliable,
since the device could be physically removed between the
pci_device_is_present() call and the driver's next access to it.

I think the best thing would be for .remove() to just do whatever it
needs to do and look for errors, e.g., using PCI_POSSIBLE_ERROR(),
without relying on pci_device_is_present().

If .remove() wants to avoid doing something that might cause an error,
maybe we should expose pci_dev_is_disconnected(). That's set by the
hotplug remove paths before .remove() is called and feels a little less
racy.

Bjorn
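The pattern Bjorn describes - just do the access and inspect the read value instead of asking "is the device present?" up front - looks roughly like the sketch below. Everything prefixed mydrv_/MYDRV_ is invented for illustration; PCI_POSSIBLE_ERROR() is the real helper (true when a read came back as all-ones, i.e. PCI_ERROR_RESPONSE).

```c
#include <linux/io.h>
#include <linux/pci.h>

#define MYDRV_STATUS 0x04	/* hypothetical MMIO status register */

struct mydrv_priv {
	void __iomem *regs;	/* BAR mapping set up in probe() */
};

static void mydrv_remove(struct pci_dev *pdev)
{
	struct mydrv_priv *priv = pci_get_drvdata(pdev);
	u32 status = readl(priv->regs + MYDRV_STATUS);

	if (PCI_POSSIBLE_ERROR(status)) {
		/* Read returned all-ones: device is likely gone
		 * (surprise removal).  Skip further device I/O and
		 * tear down only software state. */
		return;
	}

	/* Graceful path: the device still answers, so quiesce the
	 * hardware normally before freeing software state. */
}
```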
[cc += Parav Pandit, author of 43bb40c5b926]

On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > (because the VF Vendor ID is 0xffff); after the change it will return
> > > > "true" (because it will look at the PF Vendor ID instead).
> > > >
> > > > Previously virtio_pci_remove() called virtio_break_device(). I guess
> > > > that meant the virtio I/O operation will never be completed?
> > > >
> > > > But if we don't call virtio_break_device(), the virtio I/O operation
> > > > *will* be completed?
>
> Just making sure - pci_device_is_present *is* the suggested way
> to distinguish between graceful and surprise removal, isn't it?

No, it's not. Instead of !pci_device_is_present() you really want to
call pci_dev_is_disconnected() instead.

While the fix Bjorn applied for v6.2 may solve the issue and may make
sense on its own, it's not the solution you're looking for. You want to
swap the call to !pci_device_is_present() with pci_dev_is_disconnected(),
move pci_dev_is_disconnected() from drivers/pci/pci.h to
include/linux/pci.h and add a Fixes tag referencing 43bb40c5b926.

If you don't want to move pci_dev_is_disconnected(), you can
alternatively check for "pdev->error_state == pci_channel_io_perm_failure"
or call pci_channel_offline(). The latter will also return true though
on transient inaccessibility of the device (e.g. if it's being reset).

The theory of operation is as follows: The PCI layer does indeed know
whether the device was surprise removed or gracefully removed and that
information is passed in the "presence" flag to pciehp_unconfigure_device()
(in drivers/pci/hotplug/pciehp_pci.c). That function does the following:

	if (!presence)
		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);

In other words, pdev->error_state is set to pci_channel_io_perm_failure
on the entire hierarchy below the hotplug port. And
pci_dev_is_disconnected() simply checks whether that's the device's
error_state.

pci_dev_is_disconnected() makes sense if you definitely know the device
is gone and want to skip certain steps or delays on device teardown.
However, be aware that the device may be hot-removed after graceful
removal was initiated. In such a situation, pci_dev_is_disconnected()
may return false and you'll try to access the device as normal, even
though it was yanked from the slot after the pci_dev_is_disconnected()
call was performed. Ideally you should be able to cope with such
scenarios as well.

For some more background info, refer to this LWN article (scroll down
to the "Surprise removal" section):
https://lwn.net/Articles/767885/

Thanks,

Lukas
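For reference, the helper Lukas proposes moving out of drivers/pci/pci.h is essentially a one-line test of the state that pci_dev_set_disconnected() installs:

```c
/* Essentially what pci_dev_is_disconnected() checks: pciehp's
 * surprise-removal path has already walked the bus and set
 * error_state to pci_channel_io_perm_failure on every device
 * below the hotplug port. */
static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
{
	return dev->error_state == pci_channel_io_perm_failure;
}
```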
> From: Lukas Wunner <lukas@wunner.de>
> Sent: Wednesday, November 16, 2022 6:16 AM
>
> [cc += Parav Pandit, author of 43bb40c5b926]
>
> On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > >
> > > > > Previously virtio_pci_remove() called virtio_break_device(). I
> > > > > guess that meant the virtio I/O operation will never be completed?
> > > > >
> > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > operation *will* be completed?
> >
> > Just making sure - pci_device_is_present *is* the suggested way to
> > distinguish between graceful and surprise removal, isn't it?
>
> No, it's not. Instead of !pci_device_is_present() you really want to call
> pci_dev_is_disconnected() instead.
>
> While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> on its own, it's not the solution you're looking for. You want to swap the
> call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> add a Fixes tag referencing 43bb40c5b926.
>
> If you don't want to move pci_dev_is_disconnected(), you can alternatively
> check for "pdev->error_state == pci_channel_io_perm_failure" or call
> pci_channel_offline(). The latter will also return true though on transient
> inaccessibility of the device (e.g. if it's being reset).

pci_device_is_present() is calling pci_dev_is_disconnected().
pci_dev_is_disconnected() avoids reading the vendor ID, so it looks
like the weaker check. I see that it can return a valid value in the
recoverable-error case.

In that case, is pci_channel_offline() a more precise check that
covers both transient and permanent errors?

And if that is the right check, we need to fix all the callers, mainly
the widely used nvme driver [1].

[1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228

Also, we need to add API documentation on when to use this API in the
context of hotplug, so that all related drivers consistently use a
single API.

> The theory of operation is as follows: The PCI layer does indeed know
> whether the device was surprise removed or gracefully removed and that
> information is passed in the "presence" flag to pciehp_unconfigure_device()
> (in drivers/pci/hotplug/pciehp_pci.c). That function does the following:
>
> 	if (!presence)
> 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>
> In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
> simply checks whether that's the device's error_state.
>
> pci_dev_is_disconnected() makes sense if you definitely know the device is
> gone and want to skip certain steps or delays on device teardown.
> However be aware that the device may be hot-removed after graceful
> removal was initiated. In such a situation, pci_dev_is_disconnected() may
> return false and you'll try to access the device as normal, even though it
> was yanked from the slot after the pci_dev_is_disconnected() call was
> performed. Ideally you should be able to cope with such scenarios as well.
>
> For some more background info, refer to this LWN article (scroll down to
> the "Surprise removal" section):
> https://lwn.net/Articles/767885/
>
> Thanks,
>
> Lukas
On Thu, Nov 17, 2022 at 05:36:48AM +0000, Parav Pandit wrote:
> > From: Lukas Wunner <lukas@wunner.de>
> > Sent: Wednesday, November 16, 2022 6:16 AM
> >
> > [cc += Parav Pandit, author of 43bb40c5b926]
> >
> > On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > > >
> > > > > > Previously virtio_pci_remove() called virtio_break_device(). I
> > > > > > guess that meant the virtio I/O operation will never be completed?
> > > > > >
> > > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > > operation *will* be completed?
> > >
> > > Just making sure - pci_device_is_present *is* the suggested way to
> > > distinguish between graceful and surprise removal, isn't it?
> >
> > No, it's not. Instead of !pci_device_is_present() you really want to call
> > pci_dev_is_disconnected() instead.
> >
> > While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> > on its own, it's not the solution you're looking for. You want to swap the
> > call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> > pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> > add a Fixes tag referencing 43bb40c5b926.
> >
> > If you don't want to move pci_dev_is_disconnected(), you can alternatively
> > check for "pdev->error_state == pci_channel_io_perm_failure" or call
> > pci_channel_offline(). The latter will also return true though on transient
> > inaccessibility of the device (e.g. if it's being reset).
>
> pci_device_is_present() is calling pci_dev_is_disconnected().
> pci_dev_is_disconnected() avoids reading the vendor ID, so it looks
> like the weaker check. I see that it can return a valid value in the
> recoverable-error case.
>
> In that case, is pci_channel_offline() a more precise check that
> covers both transient and permanent errors?
>
> And if that is the right check, we need to fix all the callers, mainly
> the widely used nvme driver [1].
>
> [1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228
>
> Also, we need to add API documentation on when to use this API in the
> context of hotplug, so that all related drivers consistently use a
> single API.

Bjorn, Lukas, what's your take on this idea? Thanks!

> > The theory of operation is as follows: The PCI layer does indeed know
> > whether the device was surprise removed or gracefully removed and that
> > information is passed in the "presence" flag to pciehp_unconfigure_device()
> > (in drivers/pci/hotplug/pciehp_pci.c). That function does the following:
> >
> > 	if (!presence)
> > 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> >
> > In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> > the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
> > simply checks whether that's the device's error_state.
> >
> > pci_dev_is_disconnected() makes sense if you definitely know the device is
> > gone and want to skip certain steps or delays on device teardown.
> > However be aware that the device may be hot-removed after graceful
> > removal was initiated. In such a situation, pci_dev_is_disconnected() may
> > return false and you'll try to access the device as normal, even though it
> > was yanked from the slot after the pci_dev_is_disconnected() call was
> > performed. Ideally you should be able to cope with such scenarios as well.
> >
> > For some more background info, refer to this LWN article (scroll down to
> > the "Surprise removal" section):
> > https://lwn.net/Articles/767885/
> >
> > Thanks,
> >
> > Lukas
On Mon, Dec 19, 2022 at 12:56:15AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 17, 2022 at 05:36:48AM +0000, Parav Pandit wrote:
> > > From: Lukas Wunner <lukas@wunner.de>
> > > Sent: Wednesday, November 16, 2022 6:16 AM
> > >
> > > [cc += Parav Pandit, author of 43bb40c5b926]
> > >
> > > On Sun, Nov 13, 2022 at 03:46:06AM -0500, Michael S. Tsirkin wrote:
> > > > On Fri, Nov 11, 2022 at 05:42:19PM -0600, Bjorn Helgaas wrote:
> > > > > On Thu, Nov 10, 2022 at 03:15:55PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Thu, Nov 10, 2022 at 01:35:47PM -0600, Bjorn Helgaas wrote:
> > > > > > > Prior to this change pci_device_is_present(VF) returned "false"
> > > > > > > (because the VF Vendor ID is 0xffff); after the change it will
> > > > > > > return "true" (because it will look at the PF Vendor ID instead).
> > > > > > >
> > > > > > > Previously virtio_pci_remove() called virtio_break_device(). I
> > > > > > > guess that meant the virtio I/O operation will never be completed?
> > > > > > >
> > > > > > > But if we don't call virtio_break_device(), the virtio I/O
> > > > > > > operation *will* be completed?
> > > >
> > > > Just making sure - pci_device_is_present *is* the suggested way to
> > > > distinguish between graceful and surprise removal, isn't it?
> > >
> > > No, it's not. Instead of !pci_device_is_present() you really want to call
> > > pci_dev_is_disconnected() instead.
> > >
> > > While the fix Bjorn applied for v6.2 may solve the issue and may make sense
> > > on its own, it's not the solution you're looking for. You want to swap the
> > > call to !pci_device_is_present() with pci_dev_is_disconnected(), move
> > > pci_dev_is_disconnected() from drivers/pci/pci.h to include/linux/pci.h and
> > > add a Fixes tag referencing 43bb40c5b926.
> > >
> > > If you don't want to move pci_dev_is_disconnected(), you can alternatively
> > > check for "pdev->error_state == pci_channel_io_perm_failure" or call
> > > pci_channel_offline(). The latter will also return true though on transient
> > > inaccessibility of the device (e.g. if it's being reset).
> >
> > pci_device_is_present() is calling pci_dev_is_disconnected().
> > pci_dev_is_disconnected() avoids reading the vendor ID, so it looks
> > like the weaker check. I see that it can return a valid value in the
> > recoverable-error case.
> >
> > In that case, is pci_channel_offline() a more precise check that
> > covers both transient and permanent errors?
> >
> > And if that is the right check, we need to fix all the callers, mainly
> > the widely used nvme driver [1].
> >
> > [1] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/nvme/host/pci.c#L3228
> >
> > Also, we need to add API documentation on when to use this API in the
> > context of hotplug, so that all related drivers consistently use a
> > single API.
>
> Bjorn, Lukas, what's your take on this idea?

I don't really know what to add to my e-mail of Nov 16 (quoted here in
full).

Yes, pci_channel_offline() returns true on transient and permanent
failure. Whether that's what you want depends on your use case.

If you want to check for a surprise-removed device, then you only want
to check for permanent failure, so pci_channel_offline() is not correct
and you should rather check for "pdev->error_state ==
pci_channel_io_perm_failure" or move pci_dev_is_disconnected() to
include/linux/pci.h.

But again, I've already explained this in my e-mail of Nov 16, so I
don't know what's unclear.

Thanks,

Lukas

> > > The theory of operation is as follows: The PCI layer does indeed know
> > > whether the device was surprise removed or gracefully removed and that
> > > information is passed in the "presence" flag to pciehp_unconfigure_device()
> > > (in drivers/pci/hotplug/pciehp_pci.c). That function does the following:
> > >
> > > 	if (!presence)
> > > 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> > >
> > > In other words, pdev->error_state is set to pci_channel_io_perm_failure on
> > > the entire hierarchy below the hotplug port. And pci_dev_is_disconnected()
> > > simply checks whether that's the device's error_state.
> > >
> > > pci_dev_is_disconnected() makes sense if you definitely know the device is
> > > gone and want to skip certain steps or delays on device teardown.
> > > However be aware that the device may be hot-removed after graceful
> > > removal was initiated. In such a situation, pci_dev_is_disconnected() may
> > > return false and you'll try to access the device as normal, even though it
> > > was yanked from the slot after the pci_dev_is_disconnected() call was
> > > performed. Ideally you should be able to cope with such scenarios as well.
> > >
> > > For some more background info, refer to this LWN article (scroll down to
> > > the "Surprise removal" section):
> > > https://lwn.net/Articles/767885/
> > >
> > > Thanks,
> > >
> > > Lukas
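To make Lukas's distinction concrete, here are the two checks side by side. pci_channel_offline() is quoted approximately from include/linux/pci.h; the second helper's name is made up for illustration (the test itself is the same one pci_dev_is_disconnected() performs).

```c
/* True on ANY error state: transient (pci_channel_io_frozen, e.g.
 * during reset or AER recovery) as well as permanent failure. */
static inline int pci_channel_offline(struct pci_dev *pdev)
{
	return (pdev->error_state != pci_channel_io_normal);
}

/* True only once the device is known to be gone for good (surprise
 * removal); hypothetical name, same test as pci_dev_is_disconnected(). */
static inline bool dev_surprise_removed(struct pci_dev *pdev)
{
	return pdev->error_state == pci_channel_io_perm_failure;
}
```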
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2127aba3550b..899b3f52e84e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6445,8 +6445,13 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
 
 bool pci_device_is_present(struct pci_dev *pdev)
 {
+	struct pci_dev *physfn = pci_physfn(pdev);
 	u32 v;
 
+	/* Not a PF? Switch to the PF. */
+	if (physfn != pdev)
+		return pci_device_is_present(physfn);
+
 	if (pci_dev_is_disconnected(pdev))
 		return false;
 	return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);