[RFC,v2,0/8] blk-mq: improve tag fair sharing

Message ID 20231021154806.4019417-1-yukuai1@huaweicloud.com

Message

Yu Kuai Oct. 21, 2023, 3:47 p.m. UTC
  From: Yu Kuai <yukuai3@huawei.com>

Current implementation:
 - a counter 'active_queues' records how many queues/hctxs are sharing
 tags; it is updated when new IO is issued and cleared in
 blk_mq_timeout_work();
 - if active_queues is more than 1, tags are fairly shared among the
 nodes;
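
As a rough illustration, the current per-node limit can be modelled in
userspace as below. This is a sketch modelled on the check in
hctx_may_queue(), assuming a rounded-up equal share with a floor of 4
tags; fair_share_depth() is a hypothetical name used only here, not a
kernel function.

```c
/*
 * Userspace model of the current per-node tag limit.
 * fair_share_depth() is a hypothetical helper for illustration only.
 */
static unsigned int fair_share_depth(unsigned int total_tags,
				     unsigned int active_queues)
{
	unsigned int depth;

	/* With zero or one active queue there is nothing to share. */
	if (active_queues <= 1)
		return total_tags;

	/* Equal share, rounded up, but never fewer than 4 tags (assumed floor). */
	depth = (total_tags + active_queues - 1) / active_queues;
	return depth < 4 ? 4 : depth;
}
```

With 32 tags and two active nodes, each node is limited to 16 tags no
matter how many it actually uses.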

New implementation:
 - a new field 'available_tags' is added to each node; it is calculated
 in the slow path, hence the fast path won't be affected, patch 5;
 - a new counter 'busy_queues' is added to blk_mq_tags; it is updated
 when a node fails to get a driver tag and is also cleared in
 blk_mq_timeout_work(). Tag sharing will be based on 'busy_queues'
 instead of 'active_queues', patch 6,7;
 - a new counter 'busy_count' is added to each node to record how many
 times the node failed to get a driver tag; it is used to judge whether
 a node is busy and needs more tags, patch 8;
 - a new timer is added to blk_mq_tags; it is started when any node
 fails to get a driver tag, and the timer function is used to borrow
 tags and to return borrowed tags, patch 8;
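
The split between slow path and fast path can be sketched in userspace
as follows. This is only a model under stated assumptions: struct
node_info, update_available_tags() and node_may_queue() are hypothetical
names, and the equal-share formula with a floor of 4 tags is assumed to
carry over from the current implementation.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-node sharing state. */
struct node_info {
	unsigned int active;		/* requests currently holding a tag */
	unsigned int available_tags;	/* limit precalculated in the slow path */
};

/*
 * Slow path: recompute every node's limit when the number of busy
 * queues changes (assumed: equal share, rounded up, floor of 4).
 */
static void update_available_tags(struct node_info *nodes,
				  unsigned int nr_nodes,
				  unsigned int total_tags,
				  unsigned int busy_queues)
{
	unsigned int share = total_tags;
	unsigned int i;

	if (busy_queues > 1) {
		share = (total_tags + busy_queues - 1) / busy_queues;
		if (share < 4)
			share = 4;
	}

	for (i = 0; i < nr_nodes; i++)
		nodes[i].available_tags = share;
}

/* Fast path: a single comparison against the precalculated limit. */
static bool node_may_queue(const struct node_info *node)
{
	return node->active < node->available_tags;
}
```

The point of the design is that the division happens only when sharing
state changes, so the per-IO check stays a plain comparison.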

A simple test, 32 tags with two shared nodes:
[global]
ioengine=libaio
iodepth=2
bs=4k
direct=1
rw=randrw
group_reporting

[sda]
numjobs=32
filename=/dev/sda

[sdb]
numjobs=1
filename=/dev/sdb

Test result (monitoring the new debugfs entry):

time    active          available
        sda     sdb     sda     sdb
0       0       0       32      32
1       16      2       16      16      -> start fair sharing
2       19      2       20      16
3       24      2       24      16
4       26      2       28      16      -> borrow 32/8=4 tags each round
5       28      2       28      16      -> save at least 4 tags for sdb
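
The 'available' column for sda can be reproduced with a small userspace
model. This is a sketch based only on the annotations in the table: it
assumes a borrow step of total/8 tags per round and a reserve of 4 tags
for the single idle peer; borrow_round() is a hypothetical name and not
code from the patches.

```c
/*
 * One borrow round for a busy node: take total/8 more tags, but always
 * leave 4 tags for the idle peer (both values assumed from the table).
 */
static unsigned int borrow_round(unsigned int available,
				 unsigned int total_tags)
{
	unsigned int step = total_tags / 8;
	unsigned int cap = total_tags - 4;
	unsigned int next = available + step;

	return next > cap ? cap : next;
}
```

Starting from the fair share of 16 with 32 tags in total, successive
rounds give 20, 24 and 28, after which the limit stays at 28.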

Yu Kuai (8):
  blk-mq: factor out a structure from blk_mq_tags
  blk-mq: factor out a structure to store information for tag sharing
  blk-mq: add a helper to initialize shared_tag_info
  blk-mq: support to track active queues from blk_mq_tags
  blk-mq: precalculate available tags for hctx_may_queue()
  blk-mq: add new helpers blk_mq_driver_tag_busy/idle()
  blk-mq-tag: delay tag sharing until fail to get driver tag
  blk-mq-tag: allow shared queue/hctx to get more driver tags

 block/blk-core.c       |   2 -
 block/blk-mq-debugfs.c |  30 +++++-
 block/blk-mq-tag.c     | 226 +++++++++++++++++++++++++++++++++++++++--
 block/blk-mq.c         |  12 ++-
 block/blk-mq.h         |  64 +++++++-----
 include/linux/blk-mq.h |  36 +++++--
 include/linux/blkdev.h |  11 +-
 7 files changed, 328 insertions(+), 53 deletions(-)
  

Comments

Ming Lei Oct. 23, 2023, 4:38 a.m. UTC | #1
Hello Yu Kuai,

On Sat, Oct 21, 2023 at 11:47:58PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Current implementation:
>  - a counter active_queues record how many queue/hctx is sharing tags,
>  and it's updated while issue new IO, and cleared in
>  blk_mq_timeout_work().
>  - if active_queues is more than 1, then tags is fair shared to each
>  node;

Can you explain a bit what the problem is in current tag sharing?
And what is your basic approach for this problem?

Just describing the implementation is not very helpful for an initial
review, because the problem and the approach (correctness) need to be
understood first.

Thanks, 
Ming
  
Yu Kuai Oct. 23, 2023, 7:26 a.m. UTC | #2
Hi,

On 2023/10/23 12:38, Ming Lei wrote:
> Hello Yu Kuai,
> 
> On Sat, Oct 21, 2023 at 11:47:58PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Current implementation:
>>   - a counter active_queues record how many queue/hctx is sharing tags,
>>   and it's updated while issue new IO, and cleared in
>>   blk_mq_timeout_work().
>>   - if active_queues is more than 1, then tags is fair shared to each
>>   node;
> 
> Can you explain a bit what the problem is in current tag sharing?
> And what is your basic approach for this problem?
> 
> Just mentioning the implementation is not too helpful for initial
> review, cause the problem and approach(correctness) need to be
> understood first.

Of course, I'll add the following if there is a v3:

Current problems:

If there are multiple active_queues, tags are fairly shared among the
queues. If one queue is not busy (for example, it only issues one IO
once in a while), the tags reserved for this queue are wasted and can't
be used by other queues.

Depending on the hardware, this might cause performance problems in some
use cases. For example, as reported by [1], UFS devices
have multiple logical units. One of these logical units (WLUN) is used
to submit control commands, e.g. START STOP UNIT. If any request is
submitted to the WLUN, the queue depth is reduced from 31 to 15 or
lower for data LUNs.

This patchset first delays tag sharing from the point where IO is issued
to the point where a queue fails to get a driver tag; then adds a
counter that records how many times a shared queue failed to get a
driver tag, to indicate whether the queue is busy; finally, it allows a
busy queue to borrow more tags from idle queues.
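
The first two steps can be sketched in userspace as below. This is a
model under loud assumptions, not the patch code: struct tags_model,
struct node_model, node_get_tag_failed() and node_wants_more_tags() are
hypothetical names, and treating any failure since the last check as
"busy" is an assumed threshold.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the shared tag set and one sharing node. */
struct tags_model {
	unsigned int busy_queues;	/* nodes that have hit a tag failure */
};

struct node_model {
	bool busy;			/* already counted in busy_queues */
	unsigned int busy_count;	/* failures since the last check */
};

/* Called when a node fails to get a driver tag; sharing starts only here. */
static void node_get_tag_failed(struct tags_model *tags,
				struct node_model *node)
{
	if (!node->busy) {
		node->busy = true;
		tags->busy_queues++;
	}
	node->busy_count++;
}

/*
 * Periodic check (e.g. from a timer): a node that failed at least once
 * since the last check is treated as busy (assumed threshold).
 */
static bool node_wants_more_tags(struct node_model *node)
{
	bool wants = node->busy_count > 0;

	node->busy_count = 0;
	return wants;
}
```

A node that only issues an occasional IO never fails to get a tag in
this model, so it is never counted in busy_queues and does not reduce
the share of the other nodes.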

Thanks,
Kuai
