[v2] Revert "virtio-blk: support completion batching for the IRQ path"

Message ID 336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com
State New
Series [v2] Revert "virtio-blk: support completion batching for the IRQ path"

Commit Message

Michael S. Tsirkin June 9, 2023, 7:27 a.m. UTC
  This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.

This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp

i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
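e.g. a minimal sketch of that loop (the job-file name 'vblk-read.fio' is illustrative, not taken from the report):
                 # re-run the fio job file shown above; a hung iteration simply never returns
                 while true; do
                          fio vblk-read.fio
                 done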

After a short time, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~1 hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means that, at the end of the run, 1 queue was left incomplete.

'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252       0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252      16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...

Other stats (in the h/w, and counters added to the virtio-blk driver at [a] virtio_queue_rq(), [b] virtblk_handle_req(), and [c] virtblk_request_done()) all agree: every request received a completion, but virtblk_request_done() never gets called.
e.g.
PF= 0                         vq=0           1           2           3
[a]request_count     -   839416590   813148916   105586179    84988123
[b]completion1_count -   839416590   813148916   105586179    84988123
[c]completion2_count -           0           0           0           0

PF= 1                         vq=0           1           2           3
[a]request_count     -   823335887   812516140   104582672    75856549
[b]completion1_count -   823335887   812516140   104582672    75856549
[c]completion2_count -           0           0           0           0

i.e. the issue is after the virtio-blk driver.

This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver to skip the blk_mq_add_to_batch() call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
                 if (likely(!blk_should_fake_timeout(req->q))) {
                          blk_mq_complete_request(req);
                 }

kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
                 if (likely(!blk_should_fake_timeout(req->q))) {
                          if (!blk_mq_complete_request_remote(req)) {
                                  if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                                           virtblk_request_done(req);    //this never gets called... so blk_mq_add_to_batch() must always succeed
                                   }
                          }
                 }

If I make the following change, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
                 if (likely(!blk_should_fake_timeout(req->q))) {
                          if (!blk_mq_complete_request_remote(req)) {
                                   virtblk_request_done(req); //force this here...
                                  if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                                           virtblk_request_done(req);    //this never gets called... so blk_mq_add_to_batch() must always succeed
                                   }
                          }
                 }

Perhaps you might like to fix/test/revert this change...
Martin

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Since v1:
	fix build error

Still completely untested as I'm traveling.
Martin, Suwan, could you please test and report?
Suwan if you have a better revert in mind pls post and
I will be happy to drop this.

Thanks!


 drivers/block/virtio_blk.c | 82 +++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 45 deletions(-)
  

Comments

Roberts, Martin June 9, 2023, 9:34 a.m. UTC | #1
I think, at some point, vbr->status got changed to virtblk_vbr_status(vbr).
If I use the version of virtblk_poll() from the patch, but substitute virtblk_vbr_status(vbr), then it (patched 6.3.3) compiles OK.

Note, my setup never causes virtblk_poll() to be called, so its influence on the issue is unknown - but maybe it also shouldn't be running in batch mode.

With the patch, I've not (yet) managed to hang it - I will let it run a bit longer.
Martin

  
Michael S. Tsirkin June 9, 2023, 9:42 a.m. UTC | #2
On Fri, Jun 09, 2023 at 09:34:40AM +0000, Roberts, Martin wrote:
> I think, at some point, vbr->status got changed to virtblk_vbr_status(vbr).
> If I use the version of virtblk_poll() from the patch, but substitute virtblk_vbr_status(vbr), then it (patched 6.3.3) compiles OK.

Hmm v2 does not have vbr->status anymore. Are you referring to v1?

> Note, my setup never causes virtblk_poll() to be called, so its influence on the issue is unknown - but maybe it also shouldn't be running in batch mode.
> 
> With the patch, I've not (yet) managed to hang it - I will let it run a bit longer.
> Martin

Want to post the patch that works for you?

  
Roberts, Martin June 9, 2023, 9:49 a.m. UTC | #3
OK, I didn't realise you had updated the patch; I only looked at the first one. I think you did the same as me, just changed vbr->status to virtblk_vbr_status(vbr) in virtblk_poll().

  
Michael S. Tsirkin June 9, 2023, 4:06 p.m. UTC | #4
On Fri, Jun 09, 2023 at 09:49:18AM +0000, Roberts, Martin wrote:
> OK, I didn't realise you had updated the patch; I only looked at the first one. I think you did the same as me, just changed vbr->status to virtblk_vbr_status(vbr) in virtblk_poll().

Yes exactly, though testing exactly what's on list is always a good
idea, just to avoid confusion.
  
Edward Liaw June 20, 2023, 10:54 p.m. UTC | #5
On Fri, Jun 09, 2023 at 03:27:24AM -0400, Michael S. Tsirkin wrote:
> This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
This commit was also breaking kernel tests on a virtual Android device
(cuttlefish). We were seeing hangs like:

[ 2889.910733] INFO: task kworker/u8:2:6312 blocked for more than 720 seconds.
[ 2889.910967]       Tainted: G           OE      6.2.0-mainline-g5c05cafa8df7-ab9969617 #1
[ 2889.911143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2889.911389] task:kworker/u8:2    state:D stack:12160 pid:6312  ppid:2      flags:0x00004000
[ 2889.911567] Workqueue: writeback wb_workfn (flush-254:57)
[ 2889.911771] Call Trace:
[ 2889.911831]  <TASK>
[ 2889.911893]  __schedule+0x55f/0x880
[ 2889.912021]  schedule+0x6a/0xc0
[ 2889.912110]  schedule_timeout+0x58/0x1a0
[ 2889.912200]  wait_for_common+0xf7/0x1b0
[ 2889.912289]  wait_for_completion+0x1c/0x40
[ 2889.912377]  f2fs_issue_checkpoint+0x14c/0x210
[ 2889.912504]  f2fs_sync_fs+0x4c/0xb0
[ 2889.912597]  f2fs_balance_fs_bg+0x2f6/0x340
[ 2889.912736]  ? can_migrate_task+0x39/0x2b0
[ 2889.912872]  f2fs_write_node_pages+0x77/0x240
[ 2889.912989]  do_writepages+0xde/0x240
[ 2889.913094]  __writeback_single_inode+0x3f/0x360
[ 2889.913231]  writeback_sb_inodes+0x320/0x5f0
[ 2889.913349]  ? move_expired_inodes+0x58/0x210
[ 2889.913470]  __writeback_inodes_wb+0x97/0x100
[ 2889.913587]  wb_writeback+0x17e/0x390
[ 2889.913682]  wb_workfn+0x35f/0x500
[ 2889.913774]  process_one_work+0x1d9/0x3b0
[ 2889.913870]  worker_thread+0x251/0x410
[ 2889.913960]  kthread+0xeb/0x110
[ 2889.914052]  ? __cfi_worker_thread+0x10/0x10
[ 2889.914168]  ? __cfi_kthread+0x10/0x10
[ 2889.914257]  ret_from_fork+0x29/0x50
[ 2889.914364]  </TASK>
[ 2889.914565] INFO: task mkdir09:6425 blocked for more than 720 seconds.
[ 2889.916065]       Tainted: G           OE      6.2.0-mainline-g5c05cafa8df7-ab9969617 #1
[ 2889.916255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2889.916450] task:mkdir09         state:D stack:13016 pid:6425  ppid:6423   flags:0x00004000
[ 2889.916656] Call Trace:
[ 2889.916900]  <TASK>
[ 2889.917004]  __schedule+0x55f/0x880
[ 2889.917129]  schedule+0x6a/0xc0
[ 2889.917273]  schedule_timeout+0x58/0x1a0
[ 2889.917425]  wait_for_common+0xf7/0x1b0
[ 2889.917535]  wait_for_completion+0x1c/0x40
[ 2889.917670]  f2fs_issue_checkpoint+0x14c/0x210
[ 2889.917844]  f2fs_sync_fs+0x4c/0xb0
[ 2889.917969]  f2fs_do_sync_file+0x3a8/0x8c0
[ 2889.918090]  ? mt_find+0xa0/0x1a0
[ 2889.918216]  f2fs_sync_file+0x2f/0x60
[ 2889.918310]  vfs_fsync_range+0x74/0x90
[ 2889.918567]  __se_sys_msync+0x176/0x270
[ 2889.918730]  __x64_sys_msync+0x1c/0x40
[ 2889.918873]  do_syscall_64+0x53/0xb0
[ 2889.918996]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 2889.919178] RIP: 0033:0x7540b08bcf47
[ 2889.919297] RSP: 002b:00007fff5fcbeea8 EFLAGS: 00000206 ORIG_RAX: 000000000000001a
[ 2889.919496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007540b08bcf47
[ 2889.919828] RDX: 0000000000000004 RSI: 0000000000001000 RDI: 00007540b4ae7000
[ 2889.920227] RBP: 0000000000000000 R08: 721e0000010b0016 R09: 0000000000000003
[ 2889.920435] R10: 0000000000000100 R11: 0000000000000206 R12: 00005d2f95fd2f08
[ 2889.920793] R13: 00005d2f95fbc310 R14: 00007540b08e1bb8 R15: 00005d2f95fbc310
[ 2889.921072]  </TASK>

in random tests in the LTP (Linux Test Project) test suite.

This revert appears to have resolved the test failures for me.

Tested-by: edliaw@google.com
  
Michael S. Tsirkin June 22, 2023, 6:14 a.m. UTC | #6
On Tue, Jun 20, 2023 at 10:54:19PM +0000, Edward Liaw wrote:
> On Fri, Jun 09, 2023 at 03:27:24AM -0400, Michael S. Tsirkin wrote:
> > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
> This commit was also breaking kernel tests on a virtual Android device
> (cuttlefish).
> 
> This revert appears to have resolved the test failures for me.
> 
> Tested-by: edliaw@google.com

Oh interesting, can you share how to reproduce the failures?
Then maybe Suwan Kim can take a look at fixing up the patch.
  
Michael S. Tsirkin June 22, 2023, 5:52 p.m. UTC | #7
On Tue, Jun 20, 2023 at 10:54:19PM +0000, Edward Liaw wrote:
> On Fri, Jun 09, 2023 at 03:27:24AM -0400, Michael S. Tsirkin wrote:
> > This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
> This commit was also breaking kernel tests on a virtual Android device
> (cuttlefish).
> 
> This revert appears to have resolved the test failures for me.
> 
> Tested-by: edliaw@google.com


Does Android have VIRTIO_BLK_F_ZONED? Could you list the
features this device has, please? Thanks!
  
Edward Liaw June 23, 2023, 6:33 p.m. UTC | #8
On Wed, Jun 21, 2023 at 11:14 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> Oh interesting, can you share how to reproduce the failures?
> Then maybe Suwan Kim can take a look at fixing up the patch.

I did not have a deterministic way to reproduce the failure.  I ran
these test suites 5x consecutively and that seemed to fail often
enough to let me bisect it.

I checked out the kernel at d645c62fd19af9c9246150a958450ffd26ad0bfa
Merge 84cc6674b76b ("Merge tag 'for_linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost") into
android-mainline, built it, and ran cuttlefish with the following
configuration:
tools/bazel run
//common-modules/virtual-device:virtual_device_x86_64_dist  # builds
the kernel with bazel
acloud create --local-instance --local-kernel-image
$ANDROID_MAINLINE/out/virtual_device_x86_64/dist --branch aosp-master
--build-target aosp_cf_x86_64_phone-userdebug  # run cuttlefish with
custom kernel

And ran the following tests:
atest --rerun-until-failure 5 vab_legacy_tests vts_core_liblp_test
vts_libdm_test vts_libsnapshot_test vts_ltp_test_x86_64

I'm not sure if that will be helpful for you, though.

On Thu, Jun 22, 2023 at 10:53 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Does Android have VIRTIO_BLK_F_ZONED ? Could you list the
> features this device has please? Thanks!

It does not.

$ cat /sys/block/vda/device/driver/virtio*/features
001000100100111000000000000001001000000000000000000000000000000
VIRTIO_BLK_F_SEG_MAX
VIRTIO_BLK_F_BLK_SIZE
VIRTIO_BLK_F_FLUSH
VIRTIO_BLK_F_MQ
VIRTIO_BLK_F_DISCARD
VIRTIO_BLK_F_WRITE_ZEROES
  

Patch

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2b918e28acaa..b47358da92a2 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -348,63 +348,33 @@  static inline void virtblk_request_done(struct request *req)
 	blk_mq_end_request(req, status);
 }
 
-static void virtblk_complete_batch(struct io_comp_batch *iob)
-{
-	struct request *req;
-
-	rq_list_for_each(&iob->req_list, req) {
-		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
-		virtblk_cleanup_cmd(req);
-	}
-	blk_mq_end_request_batch(iob);
-}
-
-static int virtblk_handle_req(struct virtio_blk_vq *vq,
-			      struct io_comp_batch *iob)
-{
-	struct virtblk_req *vbr;
-	int req_done = 0;
-	unsigned int len;
-
-	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
-		struct request *req = blk_mq_rq_from_pdu(vbr);
-
-		if (likely(!blk_should_fake_timeout(req->q)) &&
-		    !blk_mq_complete_request_remote(req) &&
-		    !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
-					 virtblk_complete_batch))
-			virtblk_request_done(req);
-		req_done++;
-	}
-
-	return req_done;
-}
-
 static void virtblk_done(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
-	struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
-	int req_done = 0;
+	bool req_done = false;
+	int qid = vq->index;
+	struct virtblk_req *vbr;
 	unsigned long flags;
-	DEFINE_IO_COMP_BATCH(iob);
+	unsigned int len;
 
-	spin_lock_irqsave(&vblk_vq->lock, flags);
+	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
 	do {
 		virtqueue_disable_cb(vq);
-		req_done += virtblk_handle_req(vblk_vq, &iob);
+		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
+			struct request *req = blk_mq_rq_from_pdu(vbr);
 
+			if (likely(!blk_should_fake_timeout(req->q)))
+				blk_mq_complete_request(req);
+			req_done = true;
+		}
 		if (unlikely(virtqueue_is_broken(vq)))
 			break;
 	} while (!virtqueue_enable_cb(vq));
 
-	if (req_done) {
-		if (!rq_list_empty(iob.req_list))
-			iob.complete(&iob);
-
-		/* In case queue is stopped waiting for more buffers. */
+	/* In case queue is stopped waiting for more buffers. */
+	if (req_done)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
-	}
-	spin_unlock_irqrestore(&vblk_vq->lock, flags);
+	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 }
 
 static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
@@ -1283,15 +1253,37 @@  static void virtblk_map_queues(struct blk_mq_tag_set *set)
 	}
 }
 
+static void virtblk_complete_batch(struct io_comp_batch *iob)
+{
+	struct request *req;
+
+	rq_list_for_each(&iob->req_list, req) {
+		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
+		virtblk_cleanup_cmd(req);
+	}
+	blk_mq_end_request_batch(iob);
+}
+
 static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct virtio_blk *vblk = hctx->queue->queuedata;
 	struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
+	struct virtblk_req *vbr;
 	unsigned long flags;
+	unsigned int len;
 	int found = 0;
 
 	spin_lock_irqsave(&vq->lock, flags);
-	found = virtblk_handle_req(vq, iob);
+
+	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
+		struct request *req = blk_mq_rq_from_pdu(vbr);
+
+		found++;
+		if (!blk_mq_complete_request_remote(req) &&
+		    !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
+						virtblk_complete_batch))
+			virtblk_request_done(req);
+	}
 
 	if (found)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);