Message ID | 20221118113052.1324140-1-yukuai1@huaweicloud.com |
---|---|
State | New |
Headers | |
Series |
[RFC] scsi: core: remove unsed 'restarts' from scsi_device
Commit Message
Yu Kuai
Nov. 18, 2022, 11:30 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

During code review, I found that 'restarts' is not useful anymore after
the following commits:

1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
   is a reason to kick")
2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
   is the last request")
3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
   and completion")

Now that if get budget ever failed, block layer will make sure to
trigger new run queue for the hctx. Hence there is no need to run queue
from scsi layer in this case.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/scsi/scsi_lib.c    | 35 -----------------------------------
 include/scsi/scsi_device.h |  1 -
 2 files changed, 36 deletions(-)
Comments
On 11/18/22 03:30, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
Regarding the subject: unsed -> unused?
Hi,

On 2022/11/19 9:11, Bart Van Assche wrote:
> On 11/18/22 03:30, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>
> Regarding the subject: unsed -> unused?

Yes, sorry for the spelling mistake.
Hi,

On 2022/11/18 19:30, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> During code review, I found that 'restarts' is not useful anymore after
> the following commits:
>
> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> is a reason to kick")
> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> is the last request")
> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> and completion")
>
> Now that if get budget ever failed, block layer will make sure to
> trigger new run queue for the hctx. Hence there is no need to run queue
> from scsi layer in this case.
>

Does anyone has suggestions about this patch?

More info why I tried to remove this:

while testing megaraid with 4 nvme with none elevator, the default
queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
bw is decreased to about 0.8Gib/s, and with this patch applied,
bw can stay 4Gib/s in the later case.

Thanks,
Kuai
On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2022/11/18 19:30, Yu Kuai wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> >
> > During code review, I found that 'restarts' is not useful anymore after
> > the following commits:
> >
> > 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> > is a reason to kick")
> > 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> > is the last request")
> > 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> > and completion")
> >
> > Now that if get budget ever failed, block layer will make sure to
> > trigger new run queue for the hctx. Hence there is no need to run queue
> > from scsi layer in this case.
> >
>
> Does anyone has suggestions about this patch?
>
> More info why I tried to remove this:
>
> while testing megaraid with 4 nvme with none elevator, the default
> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> bw is decreased to about 0.8Gib/s, and with this patch applied,
> bw can stay 4Gib/s in the later case.

I will look at this patch next week.

Can you investigate a bit the reason why perf boost is from this patch?

Thanks,
Ming
On 2022/11/26 16:54, Yu Kuai wrote:
> Hi,
>
> On 2022/11/18 19:30, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> During code review, I found that 'restarts' is not useful anymore after
>> the following commits:
>>
>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>> is a reason to kick")
>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>> is the last request")
>> 3) commit 673235f91531 ("scsi: core: Fix race between handling
>> STS_RESOURCE and completion")
>>
>> Now that if get budget ever failed, block layer will make sure to
>> trigger new run queue for the hctx. Hence there is no need to run queue
>> from scsi layer in this case.
>>
>
> Does anyone has suggestions about this patch?
>
> More info why I tried to remove this:
>
> while testing megaraid with 4 nvme with none elevator, the default
> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> bw is decreased to about 0.8Gib/s, and with this patch applied,
> bw can stay 4Gib/s in the later case.
>

Hi Yu Kuai,

This information should be included in the commit message.

Thanks,
Jason
Hi, Ming

On 2022/11/27 17:45, Ming Lei wrote:
> On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2022/11/18 19:30, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> During code review, I found that 'restarts' is not useful anymore after
>>> the following commits:
>>>
>>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>>> is a reason to kick")
>>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>>> is the last request")
>>> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
>>> and completion")
>>>
>>> Now that if get budget ever failed, block layer will make sure to
>>> trigger new run queue for the hctx. Hence there is no need to run queue
>>> from scsi layer in this case.
>>>
>>
>> Does anyone has suggestions about this patch?
>>
>> More info why I tried to remove this:
>>
>> while testing megaraid with 4 nvme with none elevator, the default
>> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
>> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
>> bw is decreased to about 0.8Gib/s, and with this patch applied,
>> bw can stay 4Gib/s in the later case.
>
> I will look at this patch next week.
>
> Can you investigate a bit the reason why perf boost is from this patch?

test cmd:

Without this patch, perf will show that on cpu time is wasted on
mod_delayed_work_on:

-    3.44%  swapper        [kernel.vmlinux]  [k] mod_delayed_work_on
   - mod_delayed_work_on
      - 3.44% kblockd_mod_delayed_work_on
           __blk_mq_delay_run_hw_queue
           scsi_run_queue_async
           scsi_end_request
           scsi_io_completion
           scsi_finish_command
         + scsi_complete
+    1.05%  ksoftirqd/30   [kernel.vmlinux]  [k] mod_delayed_work_on
+    0.54%  fio            [kernel.vmlinux]  [k] mod_delayed_work_on
     0.36%  ksoftirqd/22   [kernel.vmlinux]  [k] mod_delayed_work_on
     0.35%  ksoftirqd/1    [kernel.vmlinux]  [k] mod_delayed_work_on

I also attached the flamegraph.

Thanks,
Kuai
On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2022/11/18 19:30, Yu Kuai wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> >
> > During code review, I found that 'restarts' is not useful anymore after
> > the following commits:
> >
> > 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> > is a reason to kick")
> > 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> > is the last request")
> > 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> > and completion")
> >
> > Now that if get budget ever failed, block layer will make sure to
> > trigger new run queue for the hctx. Hence there is no need to run queue
> > from scsi layer in this case.
> >

But scsi_run_queue_async() needs to run all hw queue because budget is
actually LUN/request queue wide.

>
> Does anyone has suggestions about this patch?
>
> More info why I tried to remove this:
>
> while testing megaraid with 4 nvme with none elevator, the default
> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> bw is decreased to about 0.8Gib/s, and with this patch applied,
> bw can stay 4Gib/s in the later case.

What is .can_queue and nr_hw_queues in your setting?

thanks,
Ming
On 2022/11/28 11:27, Ming Lei wrote:
> On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2022/11/18 19:30, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> During code review, I found that 'restarts' is not useful anymore after
>>> the following commits:
>>>
>>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>>> is a reason to kick")
>>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>>> is the last request")
>>> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
>>> and completion")
>>>
>>> Now that if get budget ever failed, block layer will make sure to
>>> trigger new run queue for the hctx. Hence there is no need to run queue
>>> from scsi layer in this case.
>>>
>
> But scsi_run_queue_async() needs to run all hw queue because budget is
> actually LUN/request queue wide.

Why the hw queue need to run if get budget never failed in this hw
queue?

>> Does anyone has suggestions about this patch?
>>
>> More info why I tried to remove this:
>>
>> while testing megaraid with 4 nvme with none elevator, the default
>> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
>> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
>> bw is decreased to about 0.8Gib/s, and with this patch applied,
>> bw can stay 4Gib/s in the later case.
>
> What is .can_queue and nr_hw_queues in your setting?

test cmd:
fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022
-rwmixread=70 -refill_buffers -filename=/dev/sdg -numjobs=128 -size=1TB
-runtime=60s -bs=4k -iodepth=2 -rw=randwrite

test environment:
arm64 Kunpeng-920, 128 cpu
megaraid with 4 NVMEs, 128 hctx and queue_depth is 128
On 2022/11/28 11:35, Yu Kuai wrote:
>
> Why the hw queue need to run if get budget never failed in this hw
> queue?

And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
IO schedulers with multiple HW queues", scsi_run_queue_async() can only
garantee to run hw queue for the current cpu, not all the hw queues.
On Mon, Nov 28, 2022 at 11:35:18AM +0800, Yu Kuai wrote:
> On 2022/11/28 11:27, Ming Lei wrote:
> > On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
> > > Hi,
> > >
> > > On 2022/11/18 19:30, Yu Kuai wrote:
> > > > From: Yu Kuai <yukuai3@huawei.com>
> > > >
> > > > During code review, I found that 'restarts' is not useful anymore after
> > > > the following commits:
> > > >
> > > > 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
> > > > is a reason to kick")
> > > > 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
> > > > is the last request")
> > > > 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
> > > > and completion")
> > > >
> > > > Now that if get budget ever failed, block layer will make sure to
> > > > trigger new run queue for the hctx. Hence there is no need to run queue
> > > > from scsi layer in this case.
> > > >
> >
> > But scsi_run_queue_async() needs to run all hw queue because budget is
> > actually LUN/request queue wide.
>
> Why the hw queue need to run if get budget never failed in this hw
> queue?

Because all hw queues share the queue wide budget, and once budget
is available, all hw queues are re-run, and the hw queue won't be
scheduled actually if there is nothing to run, see
blk_mq_run_hw_queue().

> > > Does anyone has suggestions about this patch?
> > >
> > > More info why I tried to remove this:
> > >
> > > while testing megaraid with 4 nvme with none elevator, the default
> > > queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
> > > bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
> > > bw is decreased to about 0.8Gib/s, and with this patch applied,
> > > bw can stay 4Gib/s in the later case.
> >
> > What is .can_queue and nr_hw_queues in your setting?
>
> test cmd:
> fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022
> -rwmixread=70 -refill_buffers -filename=/dev/sdg -numjobs=128 -size=1TB
> -runtime=60s -bs=4k -iodepth=2 -rw=randwrite
>
> test environment:
> arm64 Kunpeng-920, 128 cpu
> megaraid with 4 NVMEs, 128 hctx and queue_depth is 128

From your setting, megaraid should sets ->host_tagset, that said there
is only 128 tags for all 4 NVMEs(128 hw queue shares the all 128 tags
too).

That looks one really bad setting.

BTW, why do you drive nvme via megaraid instead nvme driver?

> And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
> IO schedulers with multiple HW queues", scsi_run_queue_async() can only
> garantee to run hw queue for the current cpu, not all the hw queues.

That isn't true, each hctx is still run in case of none & kyber scheduler.

thanks,
Ming
Hi,

On 2022/11/28 12:12, Ming Lei wrote:
> On Mon, Nov 28, 2022 at 11:35:18AM +0800, Yu Kuai wrote:
>>
>> On 2022/11/28 11:27, Ming Lei wrote:
>>> On Sat, Nov 26, 2022 at 04:54:46PM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2022/11/18 19:30, Yu Kuai wrote:
>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>
>>>>> During code review, I found that 'restarts' is not useful anymore after
>>>>> the following commits:
>>>>>
>>>>> 1) commit ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget"
>>>>> is a reason to kick")
>>>>> 2) commit d3b38596875d ("blk-mq: run queue no matter whether the request
>>>>> is the last request")
>>>>> 3) commit 673235f91531 ("scsi: core: Fix race between handling STS_RESOURCE
>>>>> and completion")
>>>>>
>>>>> Now that if get budget ever failed, block layer will make sure to
>>>>> trigger new run queue for the hctx. Hence there is no need to run queue
>>>>> from scsi layer in this case.
>>>>>
>>>
>>> But scsi_run_queue_async() needs to run all hw queue because budget is
>>> actually LUN/request queue wide.
>>
>> Why the hw queue need to run if get budget never failed in this hw
>> queue?
>
> Because all hw queues share the queue wide budget, and once budget
> is available, all hw queues are re-run, and the hw queue won't be
> scheduled actually if there is nothing to run, see
> blk_mq_run_hw_queue().

I still don't get it why all hw queues should be re-run, is this just
for performance or fixing a bug? And I'm not sure this behavior is good
for performance.

>>>> Does anyone has suggestions about this patch?
>>>>
>>>> More info why I tried to remove this:
>>>>
>>>> while testing megaraid with 4 nvme with none elevator, the default
>>>> queue_depth is 128, while I test it with fio 128 jobs and 1 iodepth,
>>>> bw is about 4Gib/s, however, if I test with 128 jobs and 2 iodepth,
>>>> bw is decreased to about 0.8Gib/s, and with this patch applied,
>>>> bw can stay 4Gib/s in the later case.
>>>
>>> What is .can_queue and nr_hw_queues in your setting?
>>
>> test cmd:
>> fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022
>> -rwmixread=70 -refill_buffers -filename=/dev/sdg -numjobs=128 -size=1TB
>> -runtime=60s -bs=4k -iodepth=2 -rw=randwrite
>>
>> test environment:
>> arm64 Kunpeng-920, 128 cpu
>> megaraid with 4 NVMEs, 128 hctx and queue_depth is 128
>
> From your setting, megaraid should sets ->host_tagset, that said there
> is only 128 tags for all 4 NVMEs(128 hw queue shares the all 128 tags
> too).
>
> That looks one really bad setting.

It's right that is bad, but the point here is to triggered get budget
failed frequently. If I set queue_depth to 2048, and I use 128 numjobs
with 32 iodpeth, performance is still bad.

> BTW, why do you drive nvme via megaraid instead nvme driver?
>
>> And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
>> IO schedulers with multiple HW queues", scsi_run_queue_async() can only
>> garantee to run hw queue for the current cpu, not all the hw queues.
>
> That isn't true, each hctx is still run in case of none & kyber scheduler.

Yes, but current default hctx shared elevator is deadline.
Hi,

> Hi,
>
> On 2022/11/28 12:12, Ming Lei wrote:
>>
>> BTW, why do you drive nvme via megaraid instead nvme driver?
>>
>>> And by the way, after Jan's patch "blk-mq: Improve performance of non-mq
>>> IO schedulers with multiple HW queues", scsi_run_queue_async() can only
>>> garantee to run hw queue for the current cpu, not all the hw queues.
>>
>> That isn't true, each hctx is still run in case of none & kyber
>> scheduler.

And I really suspect that why Jan's patch can improve performance is
because it avoid many run queues from scsi_run_queue_async().

> Yes, but current default hctx shared elevator is deadline.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 56f641ba1261..f6325a0f80fb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -507,24 +507,6 @@ static void scsi_run_queue_async(struct scsi_device *sdev)
 	if (scsi_target(sdev)->single_lun ||
 	    !list_empty(&sdev->host->starved_list)) {
 		kblockd_schedule_work(&sdev->requeue_work);
-	} else {
-		/*
-		 * smp_mb() present in sbitmap_queue_clear() or implied in
-		 * .end_io is for ordering writing .device_busy in
-		 * scsi_device_unbusy() and reading sdev->restarts.
-		 */
-		int old = atomic_read(&sdev->restarts);
-
-		/*
-		 * ->restarts has to be kept as non-zero if new budget
-		 * contention occurs.
-		 *
-		 * No need to run queue when either another re-run
-		 * queue wins in updating ->restarts or a new budget
-		 * contention occurs.
-		 */
-		if (old && atomic_cmpxchg(&sdev->restarts, old, 0) == old)
-			blk_mq_run_hw_queues(sdev->request_queue, true);
 	}
 }
 
@@ -1666,23 +1648,6 @@ static int scsi_mq_get_budget(struct request_queue *q)
 	if (token >= 0)
 		return token;
 
-	atomic_inc(&sdev->restarts);
-
-	/*
-	 * Orders atomic_inc(&sdev->restarts) and atomic_read(&sdev->device_busy).
-	 * .restarts must be incremented before .device_busy is read because the
-	 * code in scsi_run_queue_async() depends on the order of these operations.
-	 */
-	smp_mb__after_atomic();
-
-	/*
-	 * If all in-flight requests originated from this LUN are completed
-	 * before reading .device_busy, sdev->device_busy will be observed as
-	 * zero, then blk_mq_delay_run_hw_queues() will dispatch this request
-	 * soon. Otherwise, completion of one of these requests will observe
-	 * the .restarts flag, and the request queue will be run for handling
-	 * this request, see scsi_end_request().
-	 */
 	if (unlikely(scsi_device_busy(sdev) == 0 &&
 		     !scsi_device_blocked(sdev)))
 		blk_mq_delay_run_hw_queues(sdev->request_queue, SCSI_QUEUE_DELAY);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 24bdbf7999ab..66345de80897 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -115,7 +115,6 @@ struct scsi_device {
 	struct sbitmap budget_map;
 	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
-	atomic_t restarts;
 	spinlock_t list_lock;
 	struct list_head starved_entry;
 	unsigned short queue_depth;	/* How deep of a queue we want */