Message ID | 20230725130102.3030032-2-chengming.zhou@linux.dev |
---|---|
State | New |
Headers |
From: chengming.zhou@linux.dev
To: axboe@kernel.dk, hch@lst.de, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com
Subject: [PATCH v2 1/4] blk-flush: flush_rq should inherit first_rq's cmd_flags
Date: Tue, 25 Jul 2023 21:00:59 +0800
Message-ID: <20230725130102.3030032-2-chengming.zhou@linux.dev>
In-Reply-To: <20230725130102.3030032-1-chengming.zhou@linux.dev>
References: <20230725130102.3030032-1-chengming.zhou@linux.dev>
Series | blk-flush: optimize non-postflush requests |
Commit Message
Chengming Zhou
July 25, 2023, 1 p.m. UTC
From: Chengming Zhou <zhouchengming@bytedance.com>

The cmd_flags in blk_kick_flush() should inherit the original request's
cmd_flags, but the current code looks buggy to me:

  flush_end_io()
    blk_flush_complete_seq()  // requests on flush running list
      blk_kick_flush()

So the request passed to blk_flush_complete_seq() may already be ended
before blk_kick_flush() runs. On the other hand, since flush_rq inherits
first_rq's tag, it should use first_rq's cmd_flags too.

This patch is just preparation for the following patches; no bugfix
intended.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 block/blk-flush.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
Comments
On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> The cmd_flags in blk_kick_flush() should inherit the original request's
> cmd_flags, but the current code looks buggy to me:

Should it? I know the code is kinda trying to do it, but does it really
make sense? Adding Hannes who originally added this inheritance and
discussing the details below:

> 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
> +	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
> +			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);

Two cases here:

 1) REQ_FAILFAST_MASK: I don't think this is actually set on flush
    requests currently, and even if it was, applying it to a flush that
    serves more than a single originating command seems wrong to me.
 2) REQ_DRV is only set by drivers that have seen a bio. For dm this is
    used as REQ_DM_POLL_LIST, which should never be set for a flush/fua
    request. For nvme-mpath it is REQ_NVME_MPATH, which is set in the
    bio-based driver and used for decision making in the I/O completion
    handler. So I guess this one actually does need to get passed
    through.
On 2023/7/31 14:09, Christoph Hellwig wrote:
> On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The cmd_flags in blk_kick_flush() should inherit the original request's
>> cmd_flags, but the current code looks buggy to me:
>
> Should it? I know the code is kinda trying to do it, but does it really
> make sense? Adding Hannes who originally added this inheritance and
> discussing the details below:

I'm not sure; actually I don't get what the current code is doing...
Hope Hannes can provide some details.

blk_flush_complete_seq(rq) -> blk_kick_flush(rq->cmd_flags)

flush_rq will use the cmd_flags of the request which just completed a
sequence. There are three cases:

 1. blk_insert_flush(rq): rq is pending, waiting for flush
 2. flush_end_io(flush_rq): rq flush seq done
 3. mq_flush_data_end_io(rq): rq data seq done

Only in the 1st case is rq the pending request that waits for flush_rq.
In the 2nd and 3rd cases, rq has nothing to do with the next flush_rq,
so isn't it more reasonable for flush_rq to use its pending first_rq's
cmd_flags?

>> 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
>> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
>> +	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
>> +			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);
>
> Two cases here:
>
> 1) REQ_FAILFAST_MASK: I don't think this is actually set on flush
>    requests currently, and even if it was, applying it to a flush that
>    serves more than a single originating command seems wrong to me.
> 2) REQ_DRV is only set by drivers that have seen a bio. For dm this
>    is used as REQ_DM_POLL_LIST, which should never be set for a
>    flush/fua request. For nvme-mpath it is REQ_NVME_MPATH, which is
>    set in the bio-based driver and used for decision making in the
>    I/O completion handler. So I guess this one actually does need to
>    get passed through.

The commit 84fca1b0c461 ("block: pass failfast and driver-specific flags
to flush requests") says:

    If flush requests are being sent to the device we need to inherit
    the failfast and driver-specific flags, too, otherwise I/O will
    fail.

 1) REQ_FAILFAST_MASK: agree, it shouldn't be set on the flush_rq I
    think?
 2) REQ_DRV: I don't get why leaving this flag unset would cause I/O to
    fail?

Thanks!
On Mon, Jul 31, 2023 at 10:02:39PM +0800, Chengming Zhou wrote:
> The commit 84fca1b0c461 ("block: pass failfast and driver-specific
> flags to flush requests") says:
>
>     If flush requests are being sent to the device we need to inherit
>     the failfast and driver-specific flags, too, otherwise I/O will
>     fail.
>
> 1) REQ_FAILFAST_MASK: agree, it shouldn't be set on the flush_rq I
>    think?
> 2) REQ_DRV: I don't get why leaving this flag unset would cause I/O to
>    fail?

I don't think it would fail I/O on its own, but it will cause the nvme
driver to not do the correct handling of an error when a flush is sent
to multipath setups.
On 7/31/23 08:09, Christoph Hellwig wrote:
> On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The cmd_flags in blk_kick_flush() should inherit the original request's
>> cmd_flags, but the current code looks buggy to me:
>
> Should it? I know the code is kinda trying to do it, but does it really
> make sense? Adding Hannes who originally added this inheritance and
> discussing the details below:

Yeah, it does. The flush machinery is sending flushes before and/or
after the original request (preflush/postflush). For blocked transports
(ie during FC RSCN handling) the transport will error out commands
depending on the FAILFAST setting. If FAILFAST is set the SCSI layer
gets an STS_TRANSPORT error (causing the I/O to be retried), but
STS_ERROR if not set (causing the I/O to be failed).

So if the FAILFAST setting is _not_ aligned between the flush_rq and the
original rq, we'll get an error on the flush rq and a retry on the
original rq, causing the entire command to fail.

I guess we need to align them.

Cheers,

Hannes
On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
> The flush machinery is sending flushes before and/or after the original
> request (preflush/postflush). For blocked transports (ie during FC RSCN
> handling) the transport will error out commands depending on the
> FAILFAST setting. If FAILFAST is set the SCSI layer gets an
> STS_TRANSPORT error (causing the I/O to be retried), but STS_ERROR if
> not set (causing the I/O to be failed).
>
> So if the FAILFAST setting is _not_ aligned between the flush_rq and
> the original rq we'll get an error on the flush rq and a retry on the
> original rq, causing the entire command to fail.
>
> I guess we need to align them.

But you can't, because multiple pre/postflushes are coalesced into a
single outstanding flush request. They can and will not match quite
commonly.
On Tue, Aug 01, 2023 at 01:04:32PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
>> The flush machinery is sending flushes before and/or after the
>> original request (preflush/postflush). For blocked transports (ie
>> during FC RSCN handling) the transport will error out commands
>> depending on the FAILFAST setting. If FAILFAST is set the SCSI layer
>> gets an STS_TRANSPORT error (causing the I/O to be retried), but
>> STS_ERROR if not set (causing the I/O to be failed).
>>
>> So if the FAILFAST setting is _not_ aligned between the flush_rq and
>> the original rq we'll get an error on the flush rq and a retry on the
>> original rq, causing the entire command to fail.
>>
>> I guess we need to align them.
>
> But you can't, because multiple pre/postflushes are coalesced into a
> single outstanding flush request. They can and will not match quite
> commonly.

And if you mean the REQ_FAILFAST_TRANSPORT added by dm - this will
never even see the flush state machine, as that is run in dm-mpath,
which then inserts the fully built flush request into the lower request
queue. At least for request based multipath, bio could hit it.
On 2023/8/1 19:06, Christoph Hellwig wrote:
> On Tue, Aug 01, 2023 at 01:04:32PM +0200, Christoph Hellwig wrote:
>> On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
>>> The flush machinery is sending flushes before and/or after the
>>> original request (preflush/postflush). For blocked transports (ie
>>> during FC RSCN handling) the transport will error out commands
>>> depending on the FAILFAST setting. If FAILFAST is set the SCSI
>>> layer gets an STS_TRANSPORT error (causing the I/O to be retried),
>>> but STS_ERROR if not set (causing the I/O to be failed).
>>>
>>> So if the FAILFAST setting is _not_ aligned between the flush_rq
>>> and the original rq we'll get an error on the flush rq and a retry
>>> on the original rq, causing the entire command to fail.
>>>
>>> I guess we need to align them.
>>
>> But you can't, because multiple pre/postflushes are coalesced into a
>> single outstanding flush request. They can and will not match quite
>> commonly.
>
> And if you mean the REQ_FAILFAST_TRANSPORT added by dm - this will
> never even see the flush state machine, as that is run in dm-mpath,
> which then inserts the fully built flush request into the lower
> request queue. At least for request based multipath, bio could hit it.

Yes, multiple pre/postflushes are coalesced into a single flush request,
so we can't figure out which request to use.

From the above explanation, can we just drop this inherit logic? It
seems strange or wrong here.

Thanks.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index e73dc22d05c1..fc25228f7bb1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -92,7 +92,7 @@ enum {
 };
 
 static void blk_kick_flush(struct request_queue *q,
-			   struct blk_flush_queue *fq, blk_opf_t flags);
+			   struct blk_flush_queue *fq);
 
 static inline struct blk_flush_queue *
 blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
@@ -166,11 +166,9 @@ static void blk_flush_complete_seq(struct request *rq,
 {
 	struct request_queue *q = rq->q;
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
-	blk_opf_t cmd_flags;
 
 	BUG_ON(rq->flush.seq & seq);
 	rq->flush.seq |= seq;
-	cmd_flags = rq->cmd_flags;
 
 	if (likely(!error))
 		seq = blk_flush_cur_seq(rq);
@@ -210,7 +208,7 @@ static void blk_flush_complete_seq(struct request *rq,
 		BUG();
 	}
 
-	blk_kick_flush(q, fq, cmd_flags);
+	blk_kick_flush(q, fq);
 }
 
 static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
@@ -277,7 +275,6 @@ bool is_flush_rq(struct request *rq)
  * blk_kick_flush - consider issuing flush request
  * @q: request_queue being kicked
  * @fq: flush queue
- * @flags: cmd_flags of the original request
  *
  * Flush related states of @q have changed, consider issuing flush request.
  * Please read the comment at the top of this file for more info.
@@ -286,8 +283,7 @@ bool is_flush_rq(struct request *rq)
  * spin_lock_irq(fq->mq_flush_lock)
  *
  */
-static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
-			   blk_opf_t flags)
+static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 {
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
@@ -336,7 +332,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	flush_rq->internal_tag = first_rq->internal_tag;
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
-	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
+	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
+			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);
 	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
 	flush_rq->end_io = flush_end_io;
 	/*