From patchwork Thu Jun 8 21:47:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Michael S. Tsirkin"
X-Patchwork-Id: 105138
Date: Thu, 8 Jun 2023 17:47:15 -0400
From: "Michael S. Tsirkin"
To: linux-kernel@vger.kernel.org
Cc: "Roberts, Martin", Jason Wang, Xuan Zhuo, Paolo Bonzini,
    Stefan Hajnoczi, Jens Axboe, virtualization@lists.linux-foundation.org,
    linux-block@vger.kernel.org, Suwan Kim
Subject: [PATCH] Revert "virtio-blk: support completion batching for the IRQ path"
Message-ID: <9443f7273fbba6e62f89e54cbb2a70d2c93ed8e9.1686260774.git.mst@redhat.com>
Content-Disposition: inline
X-Mailer: git-send-email 2.27.0.106.g8ac3dc51b1

This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.

This change appears to have broken things: we now see applications hanging
during disk accesses, e.g. with a multi-port virtio-blk device running in
h/w (an FPGA) and the host running a simple 'fio' test:

[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp

i.e. 16 disks, 4 queues per disk, a simple burst of 4KB reads, run
repeatedly in a loop.

After a few seconds (normally fewer than 10), fio hangs. With 64 queues
(16 disks), failure occurs within a few seconds; with 8 queues (2 disks)
it may take ~1 hour before hanging.

Last message:

fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]

I think this means that, at the end of the run, one queue was left
incomplete.

'diskstats' (run while fio is hung) shows no outstanding transactions, e.g.:

$ cat /proc/diskstats
...
252   0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252  16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...

Other stats (in the h/w, and counters added to the virtio-blk driver in
[a] virtio_queue_rq(), [b] virtblk_handle_req() and [c] virtblk_request_done())
all agree, and show that every request got a completion from the device, but
that virtblk_request_done() never gets called, e.g.:

PF=0                     vq=0        vq=1        vq=2        vq=3
[a] request_count        839416590   813148916   105586179   84988123
[b] completion1_count    839416590   813148916   105586179   84988123
[c] completion2_count    0           0           0           0

PF=1                     vq=0        vq=1        vq=2        vq=3
[a] request_count        823335887   812516140   104582672   75856549
[b] completion1_count    823335887   812516140   104582672   75856549
[c] completion2_count    0           0           0           0

i.e. the issue is after the virtio-blk driver.

This change was introduced in kernel 6.3.0; I am seeing it with 6.3.3.
If I run with an earlier kernel (5.15), it does not occur. If I make a
simple patch to the 6.3.3 virtio-blk driver to skip the
blk_mq_add_to_batch() call, it does not fail (see the snippets below).
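For reference when reading the snippets below, this is the caller-side
contract of blk_mq_add_to_batch(), sketched and simplified from the 6.3-era
inline helper in include/linux/blk-mq.h (a paraphrase, not the verbatim
kernel code; the refusal conditions in the comment are abridged):

	if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
				 virtblk_complete_batch)) {
		/* Not batched (e.g. no iob, or an I/O error): the caller
		 * must complete the request itself. */
		virtblk_request_done(req);
	} else {
		/* Batched: the request is parked on iob->req_list and is
		 * only completed later, when iob->complete(iob), i.e.
		 * virtblk_complete_batch(), is run. */
	}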
kernel 5.15 - this is OK
virtio_blk.c, virtblk_done() [irq handler]

	if (likely(!blk_should_fake_timeout(req->q))) {
		blk_mq_complete_request(req);
	}

kernel 6.3.3 - this fails
virtio_blk.c, virtblk_handle_req() [irq handler]

	if (likely(!blk_should_fake_timeout(req->q))) {
		if (!blk_mq_complete_request_remote(req)) {
			if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
						 virtblk_complete_batch)) {
				/* this never gets called... so
				 * blk_mq_add_to_batch() must always succeed */
				virtblk_request_done(req);
			}
		}
	}

If I do:

kernel 6.3.3 - this is OK
virtio_blk.c, virtblk_handle_req() [irq handler]

	if (likely(!blk_should_fake_timeout(req->q))) {
		if (!blk_mq_complete_request_remote(req)) {
			/* force this here... */
			virtblk_request_done(req);
			if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
						 virtblk_complete_batch)) {
				/* this never gets called... so
				 * blk_mq_add_to_batch() must always succeed */
				virtblk_request_done(req);
			}
		}
	}

Perhaps you might like to fix/test/revert this change...

Martin

Reported-by: "Roberts, Martin"
Signed-off-by: Michael S. Tsirkin
---
Completely untested. Martin, can you pls test and report whether this
fixes the problem? Thanks!
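For reviewers, a rough sketch of the intended end state (simplified from the
diff below, not verbatim driver code): the IRQ path goes back to completing
each request inline, while the poll path keeps batching completions into the
io_comp_batch passed in by the block layer.

	/* IRQ path (virtblk_done): complete each reaped request directly. */
	if (likely(!blk_should_fake_timeout(req->q)))
		blk_mq_complete_request(req);

	/* Poll path (virtblk_poll): keep batching into the caller-provided
	 * iob, falling back to direct completion only if batching is
	 * refused. */
	if (!blk_mq_complete_request_remote(req) &&
	    !blk_mq_add_to_batch(req, iob, vbr->status, virtblk_complete_batch))
		virtblk_request_done(req);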
 drivers/block/virtio_blk.c | 82 +++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 45 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2b918e28acaa..c1e65b4d8071 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req)
 	blk_mq_end_request(req, status);
 }
 
-static void virtblk_complete_batch(struct io_comp_batch *iob)
-{
-	struct request *req;
-
-	rq_list_for_each(&iob->req_list, req) {
-		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
-		virtblk_cleanup_cmd(req);
-	}
-	blk_mq_end_request_batch(iob);
-}
-
-static int virtblk_handle_req(struct virtio_blk_vq *vq,
-			      struct io_comp_batch *iob)
-{
-	struct virtblk_req *vbr;
-	int req_done = 0;
-	unsigned int len;
-
-	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
-		struct request *req = blk_mq_rq_from_pdu(vbr);
-
-		if (likely(!blk_should_fake_timeout(req->q)) &&
-		    !blk_mq_complete_request_remote(req) &&
-		    !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
-					 virtblk_complete_batch))
-			virtblk_request_done(req);
-		req_done++;
-	}
-
-	return req_done;
-}
-
 static void virtblk_done(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
-	struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
-	int req_done = 0;
+	bool req_done = false;
+	int qid = vq->index;
+	struct virtblk_req *vbr;
 	unsigned long flags;
-	DEFINE_IO_COMP_BATCH(iob);
+	unsigned int len;
 
-	spin_lock_irqsave(&vblk_vq->lock, flags);
+	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
 	do {
 		virtqueue_disable_cb(vq);
-		req_done += virtblk_handle_req(vblk_vq, &iob);
+		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
+			struct request *req = blk_mq_rq_from_pdu(vbr);
 
+			if (likely(!blk_should_fake_timeout(req->q)))
+				blk_mq_complete_request(req);
+			req_done = true;
+		}
 		if (unlikely(virtqueue_is_broken(vq)))
 			break;
 	} while (!virtqueue_enable_cb(vq));
 
-	if (req_done) {
-		if (!rq_list_empty(iob.req_list))
-			iob.complete(&iob);
-
-		/* In case queue is stopped waiting for more buffers. */
+	/* In case queue is stopped waiting for more buffers. */
+	if (req_done)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
-	}
-	spin_unlock_irqrestore(&vblk_vq->lock, flags);
+	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 }
 
 static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
@@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set)
 	}
 }
 
+static void virtblk_complete_batch(struct io_comp_batch *iob)
+{
+	struct request *req;
+
+	rq_list_for_each(&iob->req_list, req) {
+		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
+		virtblk_cleanup_cmd(req);
+	}
+	blk_mq_end_request_batch(iob);
+}
+
 static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct virtio_blk *vblk = hctx->queue->queuedata;
 	struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
+	struct virtblk_req *vbr;
 	unsigned long flags;
+	unsigned int len;
 	int found = 0;
 
 	spin_lock_irqsave(&vq->lock, flags);
-	found = virtblk_handle_req(vq, iob);
+
+	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
+		struct request *req = blk_mq_rq_from_pdu(vbr);
+
+		found++;
+		if (!blk_mq_complete_request_remote(req) &&
+		    !blk_mq_add_to_batch(req, iob, vbr->status,
+					 virtblk_complete_batch))
+			virtblk_request_done(req);
+	}
 
 	if (found)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);