Message ID | 336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp771016vqr; Fri, 9 Jun 2023 00:30:51 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5rI6AY1M5bxf61EcNjNMzm8Luw5pF/eW+RPp3d2icD4r9KdzkvogNbTo6bJpZWxn/dBYex X-Received: by 2002:a05:6a20:748d:b0:10e:786c:820d with SMTP id p13-20020a056a20748d00b0010e786c820dmr333926pzd.32.1686295851170; Fri, 09 Jun 2023 00:30:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686295851; cv=none; d=google.com; s=arc-20160816; b=A+HTNOz5c7UN/itbTa2nDvV0Fb2IrvdNn0mGm+8Iz71srXyra1GAC2LvFyzvn2tzrA HBVX4g7UDn3BSMl/VZV/AP8EczbQ2xJmloHKKlX6sKL5dQlXpXXWM6Wc+fCzbCZwGT8M TZ8VIqNPtU/azKkHElnfBlJSsrtFclle867aQbmgi9kTMPBb6S1AP+BPgrirOJ1F2LsI nIt0s7weHP327SNsDEJUrFO6ZnUmV1mVQDAfUEmMQz5KZ1ABSNPbv19Wxgp+zql7nqF8 VbOhXIiNIiO/JMe0P+SDb6i1Nz+3y1xR7ugzSQFXEqCerqHE4d19SvG5CRVSypqXxe3B 6fQQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-disposition:mime-version:message-id :subject:cc:to:from:date:dkim-signature; bh=k8uL9RB2TZeMZAUxmhUna0m1Gfo0R9DVnwUW3ywqZ+c=; b=dj2dHTJiOdgJI5wlR5hdujt2Xt0putyqvJklGeTSjAL9/Gv4O7xBuO55ZMkkDQemW/ s4WmMTjvRf4+x8+HJjLJtze5s39BjW/njCRk3tHlc4iZgf7LuBnB18xtx8eVGpxkOcWT vzfI27Cezhp3eH3S9Mph+dCqxP/Hl8aeLuIrhUJ/pVFB74J51iz5oI7Gf4VLt9Rqavca cUp0db0WUCWVNrL+ClVF3XVu77FcuG/jlcel3ykf0Ex9A7jysbeAk5eGJyvUZnLCAiCv sPu36/G0Gu9wkxmAuefFP6v5EhuySW6bqZY1s4p0Qi8u2vlkhTaXvmjA79uEzTWRCmwh fARA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SGkpVqx6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q22-20020aa79616000000b0065ba0464407si2071665pfg.116.2023.06.09.00.30.36; Fri, 09 Jun 2023 00:30:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SGkpVqx6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238690AbjFIH2X (ORCPT <rfc822;liningstudo@gmail.com> + 99 others); Fri, 9 Jun 2023 03:28:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230000AbjFIH2V (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 9 Jun 2023 03:28:21 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C8EF2D7C for <linux-kernel@vger.kernel.org>; Fri, 9 Jun 2023 00:27:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1686295651; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type; bh=k8uL9RB2TZeMZAUxmhUna0m1Gfo0R9DVnwUW3ywqZ+c=; b=SGkpVqx6vB4Q+F2tMmXgHhmILPkKC1bl4n3lPHov+CkAPFBDJklG6oRxNYK9FbT5lsN2Om u9k69Hv4s43Lbyi0/wyFpJjz3GMVBJtR9xenh51FRsZQRzcLew655RFoJ3xRDeo4B9h4FN nfAuxOFETkUpwuU4cA9I5C/kIHkwrik= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-480-rIo2R1KvPmeD3875NVveGw-1; Fri, 09 Jun 2023 03:27:30 -0400 X-MC-Unique: rIo2R1KvPmeD3875NVveGw-1 Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-3f7eb414fcbso6696705e9.2 for <linux-kernel@vger.kernel.org>; Fri, 09 Jun 2023 00:27:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686295648; x=1688887648; h=content-disposition:mime-version:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=k8uL9RB2TZeMZAUxmhUna0m1Gfo0R9DVnwUW3ywqZ+c=; b=NY5IbSATK0CTUyatypCwHkg6TaaK+GTC2k+q4ca5dlJ+JKzNbwq6uEjIdC1038N6cD P0w5m+AhgjSsShIYThKPYixY/aujFIjGazV+apfTt9aEypi+caNR/X4HL6DGhO/8FVEj fVT3x3+QbrvoRhpl21yra+6b6uZXciEEDbarnsOlmZjjKBVW9PTebx6OpeGoHtcWN5SH sXA1YPNSbmjHPdh8ULo/DRXzgLGg4MXa0Jh6E9URaucyuEqOZbmFNsffOIZEKN1L9ahX Fiz1Jz6XBoOE+z0AtjNBQvCjBD9jxh137hktiKWIS2Hc+BAPC4aRoGHLYbkrySw3jceb RqZw== X-Gm-Message-State: AC+VfDwcF8XQtceSR2jCC3DEff4NTjTAQtdKBhpMpaIS78IA5bJSlfoV 2eoE9lSBCWipPktx5NmUPxxNiEcGFpXAthLEbiC3NdutPL9dSNEj82EH4Euuah7/8q7tpApiGH+ tPy+4KYz1CKsIBC+5tgu8a2yx25fCQx+T0EOiFP57tX9v60ZZaPA3/kxgvjD3Nrr0Io7uIPFrmH B2Qg== X-Received: by 2002:a1c:4b09:0:b0:3f7:f2a1:482f with SMTP id y9-20020a1c4b09000000b003f7f2a1482fmr230028wma.11.1686295648269; Fri, 09 Jun 2023 00:27:28 -0700 (PDT) X-Received: by 2002:a1c:4b09:0:b0:3f7:f2a1:482f with SMTP id y9-20020a1c4b09000000b003f7f2a1482fmr230008wma.11.1686295647923; Fri, 09 Jun 2023 00:27:27 -0700 (PDT) Received: from redhat.com ([2.55.4.169]) by smtp.gmail.com with ESMTPSA id x17-20020a1c7c11000000b003f809461162sm103677wmc.16.2023.06.09.00.27.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Jun 2023 00:27:27 -0700 (PDT) Date: Fri, 9 Jun 2023 03:27:24 -0400 From: "Michael S. Tsirkin" <mst@redhat.com> To: linux-kernel@vger.kernel.org Cc: kernel test robot <lkp@intel.com>, Suwan Kim <suwan.kim027@gmail.com>, "Roberts, Martin" <martin.roberts@intel.com>, Jason Wang <jasowang@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, Xuan Zhuo <xuanzhuo@linux.alibaba.com>, Jens Axboe <axboe@kernel.dk>, virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org Subject: [PATCH v2] Revert "virtio-blk: support completion batching for the IRQ path" Message-ID: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Mailer: git-send-email 2.27.0.106.g8ac3dc51b1 X-Mutt-Fcc: =sent X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768209358602479649?= X-GMAIL-MSGID: =?utf-8?q?1768209358602479649?= |
Series |
[v2] Revert "virtio-blk: support completion batching for the IRQ path"
|
|
Commit Message
Michael S. Tsirkin
June 9, 2023, 7:27 a.m. UTC
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count - 839416590 813148916 105586179 84988123
[b]completion1_count - 839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3
[a]request_count - 823335887 812516140 104582672 75856549
[b]completion1_count - 823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
blk_mq_complete_request(req);
}
kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
virtblk_request_done(req); //force this here...
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
Perhaps you might like to fix/test/revert this change...
Martin
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Since v1:
fix build error
Still completely untested as I'm traveling.
Martin, Suwan, could you please test and report?
Suwan if you have a better revert in mind pls post and
I will be happy to drop this.
Thanks!
drivers/block/virtio_blk.c | 82 +++++++++++++++++---------------------
1 file changed, 37 insertions(+), 45 deletions(-)
Comments
I think, at some point, vbr->status, got changed to virtblk_vbr_status(vbr). If I use the version of virtblk_poll() from the patch, but substitute virtblk_vbr_status(vbr), then it (patched 6.3.3) compiles OK. Note, my setup never causes virtblk_poll() to be called, so its influence on the issue is unknown - but maybe it also shouldn't be running in batch mode. With the patch, I've not (yet) managed to hang it - I will let it run a bit longer. Martin -----Original Message----- From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, June 9, 2023 8:27 AM To: linux-kernel@vger.kernel.org Cc: lkp <lkp@intel.com>; Suwan Kim <suwan.kim027@gmail.com>; Roberts, Martin <martin.roberts@intel.com>; Jason Wang <jasowang@redhat.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Xuan Zhuo <xuanzhuo@linux.alibaba.com>; Jens Axboe <axboe@kernel.dk>; virtualization@lists.linux-foundation.org; linux-block@vger.kernel.org Subject: [PATCH v2] Revert "virtio-blk: support completion batching for the IRQ path" This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. This change appears to have broken things... We now see applications hanging during disk accesses. e.g. multi-port virtio-blk device running in h/w (FPGA) Host running a simple 'fio' test. [global] thread=1 direct=1 ioengine=libaio norandommap=1 group_reporting=1 bs=4K rw=read iodepth=128 runtime=1 numjobs=4 time_based [job0] filename=/dev/vda [job1] filename=/dev/vdb [job2] filename=/dev/vdc ... [job15] filename=/dev/vdp i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads This is repeatedly run in a loop. After a few, normally <10 seconds, fio hangs. With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging. Last message: fio-3.19 Starting 8 threads Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s] I think this means at the end of the run 1 queue was left incomplete. 'diskstats' (run while fio is hung) shows no outstanding transactions. e.g. $ cat /proc/diskstats ... 252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0 252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0 ... Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called. e.g. PF= 0 vq=0 1 2 3 [a]request_count - 839416590 813148916 105586179 84988123 [b]completion1_count - 839416590 813148916 105586179 84988123 [c]completion2_count - 0 0 0 0 PF= 1 vq=0 1 2 3 [a]request_count - 823335887 812516140 104582672 75856549 [b]completion1_count - 823335887 812516140 104582672 75856549 [c]completion2_count - 0 0 0 0 i.e. the issue is after the virtio-blk driver. This change was introduced in kernel 6.3.0. I am seeing this using 6.3.3. If I run with an earlier kernel (5.15), it does not occur. If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail. e.g. kernel 5.15 - this is OK virtio_blk.c,virtblk_done() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { blk_mq_complete_request(req); } kernel 6.3.3 - this fails virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } } If I do, kernel 6.3.3 - this is OK virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { virtblk_request_done(req); //force this here... if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } } Perhaps you might like to fix/test/revert this change... Martin Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/ Cc: Suwan Kim <suwan.kim027@gmail.com> Reported-by: "Roberts, Martin" <martin.roberts@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- Since v1: fix build error Still completely untested as I'm traveling. Martin, Suwan, could you please test and report? Suwan if you have a better revert in mind pls post and I will be happy to drop this. Thanks! drivers/block/virtio_blk.c | 82 +++++++++++++++++--------------------- 1 file changed, 37 insertions(+), 45 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 2b918e28acaa..b47358da92a2 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req) blk_mq_end_request(req, status); } -static void virtblk_complete_batch(struct io_comp_batch *iob) -{ - struct request *req; - - rq_list_for_each(&iob->req_list, req) { - virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); - virtblk_cleanup_cmd(req); - } - blk_mq_end_request_batch(iob); -} - -static int virtblk_handle_req(struct virtio_blk_vq *vq, - struct io_comp_batch *iob) -{ - struct virtblk_req *vbr; - int req_done = 0; - unsigned int len; - - while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { - struct request *req = blk_mq_rq_from_pdu(vbr); - - if (likely(!blk_should_fake_timeout(req->q)) && - !blk_mq_complete_request_remote(req) && - !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), - virtblk_complete_batch)) - virtblk_request_done(req); - req_done++; - } - - return req_done; -} - static void virtblk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq->vdev->priv; - struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index]; - int req_done = 0; + bool req_done = false; + int qid = vq->index; + struct virtblk_req *vbr; unsigned long flags; - DEFINE_IO_COMP_BATCH(iob); + unsigned int len; - spin_lock_irqsave(&vblk_vq->lock, flags); + spin_lock_irqsave(&vblk->vqs[qid].lock, flags); do { virtqueue_disable_cb(vq); - req_done += virtblk_handle_req(vblk_vq, &iob); + while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) { + struct request *req = blk_mq_rq_from_pdu(vbr); + if (likely(!blk_should_fake_timeout(req->q))) + blk_mq_complete_request(req); + req_done = true; + } if (unlikely(virtqueue_is_broken(vq))) break; } while (!virtqueue_enable_cb(vq)); - if (req_done) { - if (!rq_list_empty(iob.req_list)) - iob.complete(&iob); - - /* In case queue is stopped waiting for more buffers. */ + /* In case queue is stopped waiting for more buffers. */ + if (req_done) blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); - } - spin_unlock_irqrestore(&vblk_vq->lock, flags); + spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags); } static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx) @@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set) } } +static void virtblk_complete_batch(struct io_comp_batch *iob) +{ + struct request *req; + + rq_list_for_each(&iob->req_list, req) { + virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); + virtblk_cleanup_cmd(req); + } + blk_mq_end_request_batch(iob); +} + static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) { struct virtio_blk *vblk = hctx->queue->queuedata; struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx); + struct virtblk_req *vbr; unsigned long flags; + unsigned int len; int found = 0; spin_lock_irqsave(&vq->lock, flags); - found = virtblk_handle_req(vq, iob); + + while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { + struct request *req = blk_mq_rq_from_pdu(vbr); + + found++; + if (!blk_mq_complete_request_remote(req) && + !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), + virtblk_complete_batch)) + virtblk_request_done(req); + } if (found) blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
On Fri, Jun 09, 2023 at 09:34:40AM +0000, Roberts, Martin wrote: > I think, at some point, vbr->status, got changed to virtblk_vbr_status(vbr). > If I use the version of virtblk_poll() from the patch, but substitute virtblk_vbr_status(vbr), then it (patched 6.3.3) compiles OK. Hmm v2 does not have vbr->status anymore. Are you referring to v1? > Note, my setup never causes virtblk_poll() to be called, so its influence on the issue is unknown - but maybe it also shouldn't be running in batch mode. > > With the patch, I've not (yet) managed to hang it - I will let it run a bit longer. > Martin Want to post the patch that works for you? > -----Original Message----- > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Friday, June 9, 2023 8:27 AM > To: linux-kernel@vger.kernel.org > Cc: lkp <lkp@intel.com>; Suwan Kim <suwan.kim027@gmail.com>; Roberts, Martin <martin.roberts@intel.com>; Jason Wang <jasowang@redhat.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Xuan Zhuo <xuanzhuo@linux.alibaba.com>; Jens Axboe <axboe@kernel.dk>; virtualization@lists.linux-foundation.org; linux-block@vger.kernel.org > Subject: [PATCH v2] Revert "virtio-blk: support completion batching for the IRQ path" > > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. > > This change appears to have broken things... > We now see applications hanging during disk accesses. > e.g. > multi-port virtio-blk device running in h/w (FPGA) > Host running a simple 'fio' test. > [global] > thread=1 > direct=1 > ioengine=libaio > norandommap=1 > group_reporting=1 > bs=4K > rw=read > iodepth=128 > runtime=1 > numjobs=4 > time_based > [job0] > filename=/dev/vda > [job1] > filename=/dev/vdb > [job2] > filename=/dev/vdc > ... > [job15] > filename=/dev/vdp > > i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads > This is repeatedly run in a loop. > > After a few, normally <10 seconds, fio hangs. > With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging. > Last message: > fio-3.19 > Starting 8 threads > Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s] > I think this means at the end of the run 1 queue was left incomplete. > > 'diskstats' (run while fio is hung) shows no outstanding transactions. > e.g. > $ cat /proc/diskstats > ... > 252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0 > 252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0 > ... > > Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called. > e.g. > PF= 0 vq=0 1 2 3 > [a]request_count - 839416590 813148916 105586179 84988123 > [b]completion1_count - 839416590 813148916 105586179 84988123 > [c]completion2_count - 0 0 0 0 > > PF= 1 vq=0 1 2 3 > [a]request_count - 823335887 812516140 104582672 75856549 > [b]completion1_count - 823335887 812516140 104582672 75856549 > [c]completion2_count - 0 0 0 0 > > i.e. the issue is after the virtio-blk driver. > > This change was introduced in kernel 6.3.0. > I am seeing this using 6.3.3. > If I run with an earlier kernel (5.15), it does not occur. > If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail. > e.g. > kernel 5.15 - this is OK > virtio_blk.c,virtblk_done() [irq handler] > if (likely(!blk_should_fake_timeout(req->q))) { > blk_mq_complete_request(req); > } > > kernel 6.3.3 - this fails > virtio_blk.c,virtblk_handle_req() [irq handler] > if (likely(!blk_should_fake_timeout(req->q))) { > if (!blk_mq_complete_request_remote(req)) { > if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { > virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed > } > } > } > > If I do, kernel 6.3.3 - this is OK > virtio_blk.c,virtblk_handle_req() [irq handler] > if (likely(!blk_should_fake_timeout(req->q))) { > if (!blk_mq_complete_request_remote(req)) { > virtblk_request_done(req); //force this here... > if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { > virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed > } > } > } > > Perhaps you might like to fix/test/revert this change... > Martin > > Reported-by: kernel test robot <lkp@intel.com> > Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/ > Cc: Suwan Kim <suwan.kim027@gmail.com> > Reported-by: "Roberts, Martin" <martin.roberts@intel.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > > Since v1: > fix build error > > Still completely untested as I'm traveling. > Martin, Suwan, could you please test and report? > Suwan if you have a better revert in mind pls post and > I will be happy to drop this. > > Thanks! > > > drivers/block/virtio_blk.c | 82 +++++++++++++++++--------------------- > 1 file changed, 37 insertions(+), 45 deletions(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 2b918e28acaa..b47358da92a2 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req) > blk_mq_end_request(req, status); > } > > -static void virtblk_complete_batch(struct io_comp_batch *iob) > -{ > - struct request *req; > - > - rq_list_for_each(&iob->req_list, req) { > - virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); > - virtblk_cleanup_cmd(req); > - } > - blk_mq_end_request_batch(iob); > -} > - > -static int virtblk_handle_req(struct virtio_blk_vq *vq, > - struct io_comp_batch *iob) > -{ > - struct virtblk_req *vbr; > - int req_done = 0; > - unsigned int len; > - > - while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { > - struct request *req = blk_mq_rq_from_pdu(vbr); > - > - if (likely(!blk_should_fake_timeout(req->q)) && > - !blk_mq_complete_request_remote(req) && > - !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), > - virtblk_complete_batch)) > - virtblk_request_done(req); > - req_done++; > - } > - > - return req_done; > -} > - > static void virtblk_done(struct virtqueue *vq) > { > struct virtio_blk *vblk = vq->vdev->priv; > - struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index]; > - int req_done = 0; > + bool req_done = false; > + int qid = vq->index; > + struct virtblk_req *vbr; > unsigned long flags; > - DEFINE_IO_COMP_BATCH(iob); > + unsigned int len; > > - spin_lock_irqsave(&vblk_vq->lock, flags); > + spin_lock_irqsave(&vblk->vqs[qid].lock, flags); > do { > virtqueue_disable_cb(vq); > - req_done += virtblk_handle_req(vblk_vq, &iob); > + while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) { > + struct request *req = blk_mq_rq_from_pdu(vbr); > > + if (likely(!blk_should_fake_timeout(req->q))) > + blk_mq_complete_request(req); > + req_done = true; > + } > if (unlikely(virtqueue_is_broken(vq))) > break; > } while (!virtqueue_enable_cb(vq)); > > - if (req_done) { > - if (!rq_list_empty(iob.req_list)) > - iob.complete(&iob); > - > - /* In case queue is stopped waiting for more buffers. */ > + /* In case queue is stopped waiting for more buffers. */ > + if (req_done) > blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); > - } > - spin_unlock_irqrestore(&vblk_vq->lock, flags); > + spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags); > } > > static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx) > @@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set) > } > } > > +static void virtblk_complete_batch(struct io_comp_batch *iob) > +{ > + struct request *req; > + > + rq_list_for_each(&iob->req_list, req) { > + virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); > + virtblk_cleanup_cmd(req); > + } > + blk_mq_end_request_batch(iob); > +} > + > static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) > { > struct virtio_blk *vblk = hctx->queue->queuedata; > struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx); > + struct virtblk_req *vbr; > unsigned long flags; > + unsigned int len; > int found = 0; > > spin_lock_irqsave(&vq->lock, flags); > - found = virtblk_handle_req(vq, iob); > + > + while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { > + struct request *req = blk_mq_rq_from_pdu(vbr); > + > + found++; > + if (!blk_mq_complete_request_remote(req) && > + !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), > + virtblk_complete_batch)) > + virtblk_request_done(req); > + } > > if (found) > blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); > -- > MST
OK, I didn't realise you had updated the patch; I only looked at the first one. I think you did the same as me, just changed vbr->status to virtblk_vbr_status(vbr), in virtblk_poll(). -----Original Message----- From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, June 9, 2023 10:42 AM To: Roberts, Martin <martin.roberts@intel.com> Cc: linux-kernel@vger.kernel.org; lkp <lkp@intel.com>; Suwan Kim <suwan.kim027@gmail.com>; Jason Wang <jasowang@redhat.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Xuan Zhuo <xuanzhuo@linux.alibaba.com>; Jens Axboe <axboe@kernel.dk>; virtualization@lists.linux-foundation.org; linux-block@vger.kernel.org Subject: Re: [PATCH v2] Revert "virtio-blk: support completion batching for the IRQ path" On Fri, Jun 09, 2023 at 09:34:40AM +0000, Roberts, Martin wrote: > I think, at some point, vbr->status, got changed to virtblk_vbr_status(vbr). > If I use the version of virtblk_poll() from the patch, but substitute virtblk_vbr_status(vbr), then it (patched 6.3.3) compiles OK. Hmm v2 does not have vbr->status anymore. Are you referring to v1? > Note, my setup never causes virtblk_poll() to be called, so its influence on the issue is unknown - but maybe it also shouldn't be running in batch mode. > > With the patch, I've not (yet) managed to hang it - I will let it run a bit longer. > Martin Want to post the patch that works for you? > -----Original Message----- > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Friday, June 9, 2023 8:27 AM > To: linux-kernel@vger.kernel.org > Cc: lkp <lkp@intel.com>; Suwan Kim <suwan.kim027@gmail.com>; Roberts, Martin <martin.roberts@intel.com>; Jason Wang <jasowang@redhat.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Xuan Zhuo <xuanzhuo@linux.alibaba.com>; Jens Axboe <axboe@kernel.dk>; virtualization@lists.linux-foundation.org; linux-block@vger.kernel.org > Subject: [PATCH v2] Revert "virtio-blk: support completion batching for the IRQ path" > > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. > > This change appears to have broken things... > We now see applications hanging during disk accesses. > e.g. > multi-port virtio-blk device running in h/w (FPGA) > Host running a simple 'fio' test. > [global] > thread=1 > direct=1 > ioengine=libaio > norandommap=1 > group_reporting=1 > bs=4K > rw=read > iodepth=128 > runtime=1 > numjobs=4 > time_based > [job0] > filename=/dev/vda > [job1] > filename=/dev/vdb > [job2] > filename=/dev/vdc > ... > [job15] > filename=/dev/vdp > > i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads > This is repeatedly run in a loop. > > After a few, normally <10 seconds, fio hangs. > With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging. > Last message: > fio-3.19 > Starting 8 threads > Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s] > I think this means at the end of the run 1 queue was left incomplete. > > 'diskstats' (run while fio is hung) shows no outstanding transactions. > e.g. > $ cat /proc/diskstats > ... > 252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0 > 252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0 > ... > > Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called. > e.g. > PF= 0 vq=0 1 2 3 > [a]request_count - 839416590 813148916 105586179 84988123 > [b]completion1_count - 839416590 813148916 105586179 84988123 > [c]completion2_count - 0 0 0 0 > > PF= 1 vq=0 1 2 3 > [a]request_count - 823335887 812516140 104582672 75856549 > [b]completion1_count - 823335887 812516140 104582672 75856549 > [c]completion2_count - 0 0 0 0 > > i.e. the issue is after the virtio-blk driver. > > This change was introduced in kernel 6.3.0. > I am seeing this using 6.3.3. > If I run with an earlier kernel (5.15), it does not occur. > If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail. > e.g. > kernel 5.15 - this is OK > virtio_blk.c,virtblk_done() [irq handler] > if (likely(!blk_should_fake_timeout(req->q))) { > blk_mq_complete_request(req); > } > > kernel 6.3.3 - this fails > virtio_blk.c,virtblk_handle_req() [irq handler] > if (likely(!blk_should_fake_timeout(req->q))) { > if (!blk_mq_complete_request_remote(req)) { > if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { > virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed > } > } > } > > If I do, kernel 6.3.3 - this is OK > virtio_blk.c,virtblk_handle_req() [irq handler] > if (likely(!blk_should_fake_timeout(req->q))) { > if (!blk_mq_complete_request_remote(req)) { > virtblk_request_done(req); //force this here... > if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { > virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed > } > } > } > > Perhaps you might like to fix/test/revert this change... > Martin > > Reported-by: kernel test robot <lkp@intel.com> > Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/ > Cc: Suwan Kim <suwan.kim027@gmail.com> > Reported-by: "Roberts, Martin" <martin.roberts@intel.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > > Since v1: > fix build error > > Still completely untested as I'm traveling. > Martin, Suwan, could you please test and report? > Suwan if you have a better revert in mind pls post and > I will be happy to drop this. > > Thanks! > > > drivers/block/virtio_blk.c | 82 +++++++++++++++++--------------------- > 1 file changed, 37 insertions(+), 45 deletions(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 2b918e28acaa..b47358da92a2 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req) > blk_mq_end_request(req, status); > } > > -static void virtblk_complete_batch(struct io_comp_batch *iob) > -{ > - struct request *req; > - > - rq_list_for_each(&iob->req_list, req) { > - virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); > - virtblk_cleanup_cmd(req); > - } > - blk_mq_end_request_batch(iob); > -} > - > -static int virtblk_handle_req(struct virtio_blk_vq *vq, > - struct io_comp_batch *iob) > -{ > - struct virtblk_req *vbr; > - int req_done = 0; > - unsigned int len; > - > - while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { > - struct request *req = blk_mq_rq_from_pdu(vbr); > - > - if (likely(!blk_should_fake_timeout(req->q)) && > - !blk_mq_complete_request_remote(req) && > - !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), > - virtblk_complete_batch)) > - virtblk_request_done(req); > - req_done++; > - } > - > - return req_done; > -} > - > static void virtblk_done(struct virtqueue *vq) > { > struct virtio_blk *vblk = vq->vdev->priv; > - struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index]; > - int req_done = 0; > + bool req_done = false; > + int qid = vq->index; > + struct virtblk_req *vbr; > unsigned long flags; > - DEFINE_IO_COMP_BATCH(iob); > + unsigned int len; > > - spin_lock_irqsave(&vblk_vq->lock, flags); > + spin_lock_irqsave(&vblk->vqs[qid].lock, flags); > do { > virtqueue_disable_cb(vq); > - req_done += virtblk_handle_req(vblk_vq, &iob); > + while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) { > + struct request *req = blk_mq_rq_from_pdu(vbr); > > + if (likely(!blk_should_fake_timeout(req->q))) > + blk_mq_complete_request(req); > + req_done = true; > + } > if (unlikely(virtqueue_is_broken(vq))) > break; > } while (!virtqueue_enable_cb(vq)); > > - if (req_done) { > - if (!rq_list_empty(iob.req_list)) > - iob.complete(&iob); > - > - /* In case queue is stopped waiting for more buffers. */ > + /* In case queue is stopped waiting for more buffers. */ > + if (req_done) > blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); > - } > - spin_unlock_irqrestore(&vblk_vq->lock, flags); > + spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags); > } > > static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx) > @@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set) > } > } > > +static void virtblk_complete_batch(struct io_comp_batch *iob) > +{ > + struct request *req; > + > + rq_list_for_each(&iob->req_list, req) { > + virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); > + virtblk_cleanup_cmd(req); > + } > + blk_mq_end_request_batch(iob); > +} > + > static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) > { > struct virtio_blk *vblk = hctx->queue->queuedata; > struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx); > + struct virtblk_req *vbr; > unsigned long flags; > + unsigned int len; > int found = 0; > > spin_lock_irqsave(&vq->lock, flags); > - found = virtblk_handle_req(vq, iob); > + > + while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { > + struct request *req = blk_mq_rq_from_pdu(vbr); > + > + found++; > + if (!blk_mq_complete_request_remote(req) && > + !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), > + virtblk_complete_batch)) > + virtblk_request_done(req); > + } > > if (found) > blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); > -- > MST
On Fri, Jun 09, 2023 at 09:49:18AM +0000, Roberts, Martin wrote:
> OK, I didn't realise you had updated the patch; I only looked at the first one. I think you did the same as me, just changed vbr->status to virtblk_vbr_status(vbr), in virtblk_poll().
Yes exactly, though testing exactly what's on list is always a good
idea, just to avoid confusion.
On Fri, Jun 09, 2023 at 03:27:24AM -0400, Michael S. Tsirkin wrote: > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. This commit was also breaking kernel tests on a virtual Android device (cuttlefish). We were seeing hangups like: [ 2889.910733] INFO: task kworker/u8:2:6312 blocked for more than 720 seconds. [ 2889.910967] Tainted: G OE 6.2.0-mainline-g5c05cafa8df7-ab9969617 #1 [ 2889.911143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2889.911389] task:kworker/u8:2 state:D stack:12160 pid:6312 ppid:2 flags:0x00004000 [ 2889.911567] Workqueue: writeback wb_workfn (flush-254:57) [ 2889.911771] Call Trace: [ 2889.911831] <TASK> [ 2889.911893] __schedule+0x55f/0x880 [ 2889.912021] schedule+0x6a/0xc0 [ 2889.912110] schedule_timeout+0x58/0x1a0 [ 2889.912200] wait_for_common+0xf7/0x1b0 [ 2889.912289] wait_for_completion+0x1c/0x40 [ 2889.912377] f2fs_issue_checkpoint+0x14c/0x210 [ 2889.912504] f2fs_sync_fs+0x4c/0xb0 [ 2889.912597] f2fs_balance_fs_bg+0x2f6/0x340 [ 2889.912736] ? can_migrate_task+0x39/0x2b0 [ 2889.912872] f2fs_write_node_pages+0x77/0x240 [ 2889.912989] do_writepages+0xde/0x240 [ 2889.913094] __writeback_single_inode+0x3f/0x360 [ 2889.913231] writeback_sb_inodes+0x320/0x5f0 [ 2889.913349] ? move_expired_inodes+0x58/0x210 [ 2889.913470] __writeback_inodes_wb+0x97/0x100 [ 2889.913587] wb_writeback+0x17e/0x390 [ 2889.913682] wb_workfn+0x35f/0x500 [ 2889.913774] process_one_work+0x1d9/0x3b0 [ 2889.913870] worker_thread+0x251/0x410 [ 2889.913960] kthread+0xeb/0x110 [ 2889.914052] ? __cfi_worker_thread+0x10/0x10 [ 2889.914168] ? __cfi_kthread+0x10/0x10 [ 2889.914257] ret_from_fork+0x29/0x50 [ 2889.914364] </TASK> [ 2889.914565] INFO: task mkdir09:6425 blocked for more than 720 seconds. [ 2889.916065] Tainted: G OE 6.2.0-mainline-g5c05cafa8df7-ab9969617 #1 [ 2889.916255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2889.916450] task:mkdir09 state:D stack:13016 pid:6425 ppid:6423 flags:0x00004000 [ 2889.916656] Call Trace: [ 2889.916900] <TASK> [ 2889.917004] __schedule+0x55f/0x880 [ 2889.917129] schedule+0x6a/0xc0 [ 2889.917273] schedule_timeout+0x58/0x1a0 [ 2889.917425] wait_for_common+0xf7/0x1b0 [ 2889.917535] wait_for_completion+0x1c/0x40 [ 2889.917670] f2fs_issue_checkpoint+0x14c/0x210 [ 2889.917844] f2fs_sync_fs+0x4c/0xb0 [ 2889.917969] f2fs_do_sync_file+0x3a8/0x8c0 [ 2889.918090] ? mt_find+0xa0/0x1a0 [ 2889.918216] f2fs_sync_file+0x2f/0x60 [ 2889.918310] vfs_fsync_range+0x74/0x90 [ 2889.918567] __se_sys_msync+0x176/0x270 [ 2889.918730] __x64_sys_msync+0x1c/0x40 [ 2889.918873] do_syscall_64+0x53/0xb0 [ 2889.918996] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 2889.919178] RIP: 0033:0x7540b08bcf47 [ 2889.919297] RSP: 002b:00007fff5fcbeea8 EFLAGS: 00000206 ORIG_RAX: 000000000000001a [ 2889.919496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007540b08bcf47 [ 2889.919828] RDX: 0000000000000004 RSI: 0000000000001000 RDI: 00007540b4ae7000 [ 2889.920227] RBP: 0000000000000000 R08: 721e0000010b0016 R09: 0000000000000003 [ 2889.920435] R10: 0000000000000100 R11: 0000000000000206 R12: 00005d2f95fd2f08 [ 2889.920793] R13: 00005d2f95fbc310 R14: 00007540b08e1bb8 R15: 00005d2f95fbc310 [ 2889.921072] </TASK> in random tests in the LTP (linux test project) test suite. > --- > > Since v1: > fix build error > > Still completely untested as I'm traveling. > Martin, Suwan, could you please test and report? > Suwan if you have a better revert in mind pls post and > I will be happy to drop this. > > Thanks! > This revert appears to have resolved the test failures for me. Tested-by: edliaw@google.com
On Tue, Jun 20, 2023 at 10:54:19PM +0000, Edward Liaw wrote: > On Fri, Jun 09, 2023 at 03:27:24AM -0400, Michael S. Tsirkin wrote: > > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. > This commit was also breaking kernel tests on a virtual Android device > (cuttlefish). We were seeing hangups like: > > [ 2889.910733] INFO: task kworker/u8:2:6312 blocked for more than 720 seconds. > [ 2889.910967] Tainted: G OE 6.2.0-mainline-g5c05cafa8df7-ab9969617 #1 > [ 2889.911143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 2889.911389] task:kworker/u8:2 state:D stack:12160 pid:6312 ppid:2 flags:0x00004000 > [ 2889.911567] Workqueue: writeback wb_workfn (flush-254:57) > [ 2889.911771] Call Trace: > [ 2889.911831] <TASK> > [ 2889.911893] __schedule+0x55f/0x880 > [ 2889.912021] schedule+0x6a/0xc0 > [ 2889.912110] schedule_timeout+0x58/0x1a0 > [ 2889.912200] wait_for_common+0xf7/0x1b0 > [ 2889.912289] wait_for_completion+0x1c/0x40 > [ 2889.912377] f2fs_issue_checkpoint+0x14c/0x210 > [ 2889.912504] f2fs_sync_fs+0x4c/0xb0 > [ 2889.912597] f2fs_balance_fs_bg+0x2f6/0x340 > [ 2889.912736] ? can_migrate_task+0x39/0x2b0 > [ 2889.912872] f2fs_write_node_pages+0x77/0x240 > [ 2889.912989] do_writepages+0xde/0x240 > [ 2889.913094] __writeback_single_inode+0x3f/0x360 > [ 2889.913231] writeback_sb_inodes+0x320/0x5f0 > [ 2889.913349] ? move_expired_inodes+0x58/0x210 > [ 2889.913470] __writeback_inodes_wb+0x97/0x100 > [ 2889.913587] wb_writeback+0x17e/0x390 > [ 2889.913682] wb_workfn+0x35f/0x500 > [ 2889.913774] process_one_work+0x1d9/0x3b0 > [ 2889.913870] worker_thread+0x251/0x410 > [ 2889.913960] kthread+0xeb/0x110 > [ 2889.914052] ? __cfi_worker_thread+0x10/0x10 > [ 2889.914168] ? __cfi_kthread+0x10/0x10 > [ 2889.914257] ret_from_fork+0x29/0x50 > [ 2889.914364] </TASK> > [ 2889.914565] INFO: task mkdir09:6425 blocked for more than 720 seconds. > [ 2889.916065] Tainted: G OE 6.2.0-mainline-g5c05cafa8df7-ab9969617 #1 > [ 2889.916255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 2889.916450] task:mkdir09 state:D stack:13016 pid:6425 ppid:6423 flags:0x00004000 > [ 2889.916656] Call Trace: > [ 2889.916900] <TASK> > [ 2889.917004] __schedule+0x55f/0x880 > [ 2889.917129] schedule+0x6a/0xc0 > [ 2889.917273] schedule_timeout+0x58/0x1a0 > [ 2889.917425] wait_for_common+0xf7/0x1b0 > [ 2889.917535] wait_for_completion+0x1c/0x40 > [ 2889.917670] f2fs_issue_checkpoint+0x14c/0x210 > [ 2889.917844] f2fs_sync_fs+0x4c/0xb0 > [ 2889.917969] f2fs_do_sync_file+0x3a8/0x8c0 > [ 2889.918090] ? mt_find+0xa0/0x1a0 > [ 2889.918216] f2fs_sync_file+0x2f/0x60 > [ 2889.918310] vfs_fsync_range+0x74/0x90 > [ 2889.918567] __se_sys_msync+0x176/0x270 > [ 2889.918730] __x64_sys_msync+0x1c/0x40 > [ 2889.918873] do_syscall_64+0x53/0xb0 > [ 2889.918996] entry_SYSCALL_64_after_hwframe+0x72/0xdc > [ 2889.919178] RIP: 0033:0x7540b08bcf47 > [ 2889.919297] RSP: 002b:00007fff5fcbeea8 EFLAGS: 00000206 ORIG_RAX: 000000000000001a > [ 2889.919496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007540b08bcf47 > [ 2889.919828] RDX: 0000000000000004 RSI: 0000000000001000 RDI: 00007540b4ae7000 > [ 2889.920227] RBP: 0000000000000000 R08: 721e0000010b0016 R09: 0000000000000003 > [ 2889.920435] R10: 0000000000000100 R11: 0000000000000206 R12: 00005d2f95fd2f08 > [ 2889.920793] R13: 00005d2f95fbc310 R14: 00007540b08e1bb8 R15: 00005d2f95fbc310 > [ 2889.921072] </TASK> > > in random tests in the LTP (linux test project) test suite. > > > --- > > > > Since v1: > > fix build error > > > > Still completely untested as I'm traveling. > > Martin, Suwan, could you please test and report? > > Suwan if you have a better revert in mind pls post and > > I will be happy to drop this. > > > > Thanks! > > > This revert appears to have resolved the test failures for me. > > Tested-by: edliaw@google.com Oh interesting, can you share how to reproduce the failures? Then maybe Suwan Kim can take a look at fixing up the patch.
On Tue, Jun 20, 2023 at 10:54:19PM +0000, Edward Liaw wrote: > On Fri, Jun 09, 2023 at 03:27:24AM -0400, Michael S. Tsirkin wrote: > > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. > This commit was also breaking kernel tests on a virtual Android device > (cuttlefish). We were seeing hangups like: > > [ 2889.910733] INFO: task kworker/u8:2:6312 blocked for more than 720 seconds. > [ 2889.910967] Tainted: G OE 6.2.0-mainline-g5c05cafa8df7-ab9969617 #1 > [ 2889.911143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 2889.911389] task:kworker/u8:2 state:D stack:12160 pid:6312 ppid:2 flags:0x00004000 > [ 2889.911567] Workqueue: writeback wb_workfn (flush-254:57) > [ 2889.911771] Call Trace: > [ 2889.911831] <TASK> > [ 2889.911893] __schedule+0x55f/0x880 > [ 2889.912021] schedule+0x6a/0xc0 > [ 2889.912110] schedule_timeout+0x58/0x1a0 > [ 2889.912200] wait_for_common+0xf7/0x1b0 > [ 2889.912289] wait_for_completion+0x1c/0x40 > [ 2889.912377] f2fs_issue_checkpoint+0x14c/0x210 > [ 2889.912504] f2fs_sync_fs+0x4c/0xb0 > [ 2889.912597] f2fs_balance_fs_bg+0x2f6/0x340 > [ 2889.912736] ? can_migrate_task+0x39/0x2b0 > [ 2889.912872] f2fs_write_node_pages+0x77/0x240 > [ 2889.912989] do_writepages+0xde/0x240 > [ 2889.913094] __writeback_single_inode+0x3f/0x360 > [ 2889.913231] writeback_sb_inodes+0x320/0x5f0 > [ 2889.913349] ? move_expired_inodes+0x58/0x210 > [ 2889.913470] __writeback_inodes_wb+0x97/0x100 > [ 2889.913587] wb_writeback+0x17e/0x390 > [ 2889.913682] wb_workfn+0x35f/0x500 > [ 2889.913774] process_one_work+0x1d9/0x3b0 > [ 2889.913870] worker_thread+0x251/0x410 > [ 2889.913960] kthread+0xeb/0x110 > [ 2889.914052] ? __cfi_worker_thread+0x10/0x10 > [ 2889.914168] ? __cfi_kthread+0x10/0x10 > [ 2889.914257] ret_from_fork+0x29/0x50 > [ 2889.914364] </TASK> > [ 2889.914565] INFO: task mkdir09:6425 blocked for more than 720 seconds. > [ 2889.916065] Tainted: G OE 6.2.0-mainline-g5c05cafa8df7-ab9969617 #1 > [ 2889.916255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 2889.916450] task:mkdir09 state:D stack:13016 pid:6425 ppid:6423 flags:0x00004000 > [ 2889.916656] Call Trace: > [ 2889.916900] <TASK> > [ 2889.917004] __schedule+0x55f/0x880 > [ 2889.917129] schedule+0x6a/0xc0 > [ 2889.917273] schedule_timeout+0x58/0x1a0 > [ 2889.917425] wait_for_common+0xf7/0x1b0 > [ 2889.917535] wait_for_completion+0x1c/0x40 > [ 2889.917670] f2fs_issue_checkpoint+0x14c/0x210 > [ 2889.917844] f2fs_sync_fs+0x4c/0xb0 > [ 2889.917969] f2fs_do_sync_file+0x3a8/0x8c0 > [ 2889.918090] ? mt_find+0xa0/0x1a0 > [ 2889.918216] f2fs_sync_file+0x2f/0x60 > [ 2889.918310] vfs_fsync_range+0x74/0x90 > [ 2889.918567] __se_sys_msync+0x176/0x270 > [ 2889.918730] __x64_sys_msync+0x1c/0x40 > [ 2889.918873] do_syscall_64+0x53/0xb0 > [ 2889.918996] entry_SYSCALL_64_after_hwframe+0x72/0xdc > [ 2889.919178] RIP: 0033:0x7540b08bcf47 > [ 2889.919297] RSP: 002b:00007fff5fcbeea8 EFLAGS: 00000206 ORIG_RAX: 000000000000001a > [ 2889.919496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007540b08bcf47 > [ 2889.919828] RDX: 0000000000000004 RSI: 0000000000001000 RDI: 00007540b4ae7000 > [ 2889.920227] RBP: 0000000000000000 R08: 721e0000010b0016 R09: 0000000000000003 > [ 2889.920435] R10: 0000000000000100 R11: 0000000000000206 R12: 00005d2f95fd2f08 > [ 2889.920793] R13: 00005d2f95fbc310 R14: 00007540b08e1bb8 R15: 00005d2f95fbc310 > [ 2889.921072] </TASK> > > in random tests in the LTP (linux test project) test suite. > > > --- > > > > Since v1: > > fix build error > > > > Still completely untested as I'm traveling. > > Martin, Suwan, could you please test and report? > > Suwan if you have a better revert in mind pls post and > > I will be happy to drop this. > > > > Thanks! > > > This revert appears to have resolved the test failures for me. > > Tested-by: edliaw@google.com Does Android have VIRTIO_BLK_F_ZONED ? Could you list the features this device has please? Thanks!
On Wed, Jun 21, 2023 at 11:14 PM Michael S. Tsirkin <mst@redhat.com> wrote: > Oh interesting, can you share how to reproduce the failures? > Then maybe Suwan Kim can take a look at fixing up the patch. I did not have a deterministic way to reproduce the failure. I ran these test suites 5x consecutively and that seemed to fail often enough to let me bisect it. I checked out the kernel at d645c62fd19af9c9246150a958450ffd26ad0bfa Merge 84cc6674b76b ("Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost") into android-mainline, built it, and ran cuttlefish with the following configuration: tools/bazel run //common-modules/virtual-device:virtual_device_x86_64_dist # builds the kernel with bazel acloud create --local-instance --local-kernel-image $ANDROID_MAINLINE/out/virtual_device_x86_64/dist --branch aosp-master --build-target aosp_cf_x86_64_phone-userdebug # run cuttlefish with custom kernel And ran the following tests: atest --rerun-until-failure 5 vab_legacy_tests vts_core_liblp_test vts_libdm_test vts_libsnapshot_test vts_ltp_test_x86_64 I'm not sure if that will be helpful for you, though. On Thu, Jun 22, 2023 at 10:53 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > Does Android have VIRTIO_BLK_F_ZONED ? Could you list the > features this device has please? Thanks! It does not. $ cat /sys/block/vda/device/driver/virtio*/features 001000100100111000000000000001001000000000000000000000000000000 VIRTIO_BLK_F_SEG_MAX VIRTIO_BLK_F_BLK_SIZE VIRTIO_BLK_F_FLUSH VIRTIO_BLK_F_MQ VIRTIO_BLK_F_DISCARD VIRTIO_BLK_F_WRITE_ZEROES
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 2b918e28acaa..b47358da92a2 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req) blk_mq_end_request(req, status); } -static void virtblk_complete_batch(struct io_comp_batch *iob) -{ - struct request *req; - - rq_list_for_each(&iob->req_list, req) { - virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); - virtblk_cleanup_cmd(req); - } - blk_mq_end_request_batch(iob); -} - -static int virtblk_handle_req(struct virtio_blk_vq *vq, - struct io_comp_batch *iob) -{ - struct virtblk_req *vbr; - int req_done = 0; - unsigned int len; - - while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { - struct request *req = blk_mq_rq_from_pdu(vbr); - - if (likely(!blk_should_fake_timeout(req->q)) && - !blk_mq_complete_request_remote(req) && - !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), - virtblk_complete_batch)) - virtblk_request_done(req); - req_done++; - } - - return req_done; -} - static void virtblk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq->vdev->priv; - struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index]; - int req_done = 0; + bool req_done = false; + int qid = vq->index; + struct virtblk_req *vbr; unsigned long flags; - DEFINE_IO_COMP_BATCH(iob); + unsigned int len; - spin_lock_irqsave(&vblk_vq->lock, flags); + spin_lock_irqsave(&vblk->vqs[qid].lock, flags); do { virtqueue_disable_cb(vq); - req_done += virtblk_handle_req(vblk_vq, &iob); + while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) { + struct request *req = blk_mq_rq_from_pdu(vbr); + if (likely(!blk_should_fake_timeout(req->q))) + blk_mq_complete_request(req); + req_done = true; + } if (unlikely(virtqueue_is_broken(vq))) break; } while (!virtqueue_enable_cb(vq)); - if (req_done) { - if (!rq_list_empty(iob.req_list)) - iob.complete(&iob); - - /* In case queue is stopped waiting for more buffers. */ + /* In case queue is stopped waiting for more buffers. */ + if (req_done) blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); - } - spin_unlock_irqrestore(&vblk_vq->lock, flags); + spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags); } static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx) @@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set) } } +static void virtblk_complete_batch(struct io_comp_batch *iob) +{ + struct request *req; + + rq_list_for_each(&iob->req_list, req) { + virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); + virtblk_cleanup_cmd(req); + } + blk_mq_end_request_batch(iob); +} + static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) { struct virtio_blk *vblk = hctx->queue->queuedata; struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx); + struct virtblk_req *vbr; unsigned long flags; + unsigned int len; int found = 0; spin_lock_irqsave(&vq->lock, flags); - found = virtblk_handle_req(vq, iob); + + while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { + struct request *req = blk_mq_rq_from_pdu(vbr); + + found++; + if (!blk_mq_complete_request_remote(req) && + !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), + virtblk_complete_batch)) + virtblk_request_done(req); + } if (found) blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);