[for-next,v3,0/3] implement pcpu bio caching for IRQ I/O

Message ID cover.1666347703.git.asml.silence@gmail.com
Series: implement pcpu bio caching for IRQ I/O

Message

Pavel Begunkov Oct. 21, 2022, 10:34 a.m. UTC
Add bio pcpu caching for normal / IRQ-driven I/O, extending REQ_ALLOC_CACHE,
which was previously limited to iopoll. t/io_uring with an Optane SSD setup
showed +7% for batches of 32 requests and +4.3% for batches of 8.

IRQ, 128/32/32, cache off
IOPS=59.08M, BW=28.84GiB/s, IOS/call=31/31
IOPS=59.30M, BW=28.96GiB/s, IOS/call=32/32
IOPS=59.97M, BW=29.28GiB/s, IOS/call=31/31
IOPS=59.92M, BW=29.26GiB/s, IOS/call=32/32
IOPS=59.81M, BW=29.20GiB/s, IOS/call=32/31

IRQ, 128/32/32, cache on
IOPS=64.05M, BW=31.27GiB/s, IOS/call=32/31
IOPS=64.22M, BW=31.36GiB/s, IOS/call=32/32
IOPS=64.04M, BW=31.27GiB/s, IOS/call=31/31
IOPS=63.16M, BW=30.84GiB/s, IOS/call=32/32

IRQ, 32/8/8, cache off
IOPS=50.60M, BW=24.71GiB/s, IOS/call=7/8
IOPS=50.22M, BW=24.52GiB/s, IOS/call=8/7
IOPS=49.54M, BW=24.19GiB/s, IOS/call=8/8
IOPS=50.07M, BW=24.45GiB/s, IOS/call=7/7
IOPS=50.46M, BW=24.64GiB/s, IOS/call=8/8

IRQ, 32/8/8, cache on
IOPS=51.39M, BW=25.09GiB/s, IOS/call=8/7
IOPS=52.52M, BW=25.64GiB/s, IOS/call=7/8
IOPS=52.57M, BW=25.67GiB/s, IOS/call=8/8
IOPS=52.58M, BW=25.67GiB/s, IOS/call=8/7
IOPS=52.61M, BW=25.69GiB/s, IOS/call=8/8
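The two headline percentages can be reproduced from the runs above, assuming they compare mean IOPS with the cache on against mean IOPS with the cache off (the cover letter does not state how they were computed). The array and helper names below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* IOPS in millions, copied from the runs listed above. */
static const double irq32_off[] = { 59.08, 59.30, 59.97, 59.92, 59.81 };
static const double irq32_on[]  = { 64.05, 64.22, 64.04, 63.16 };
static const double irq8_off[]  = { 50.60, 50.22, 49.54, 50.07, 50.46 };
static const double irq8_on[]   = { 51.39, 52.52, 52.57, 52.58, 52.61 };

#define N(a) (sizeof(a) / sizeof((a)[0]))

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative gain of the cache-on mean over the cache-off mean, in percent. */
static double gain_pct(const double *off, size_t n_off,
		       const double *on, size_t n_on)
{
	return (mean(on, n_on) / mean(off, n_off) - 1.0) * 100.0;
}
```

With these figures, gain_pct() gives about +7.1% for the 128/32/32 runs and about +4.3% for the 32/8/8 runs, consistent with the numbers quoted in the cover letter.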

The next step will be turning it on for other users, hopefully by default.
The only restriction we currently have is that allocations can't be done
from IRQ context, so existing users need auditing.
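The cover letter only states the constraint (frees may now happen from IRQ context, allocations may not). Below is a simplified, single-threaded userspace model of one way a per-CPU cache can satisfy it: IRQ-context puts go to a separate list, and the process-context allocator splices that list over with interrupts disabled. This is a sketch under stated assumptions, not the code in block/bio.c: struct obj, struct pcpu_cache, cache_alloc(), cache_put(), CACHE_MAX and SPLICE_THRESHOLD are hypothetical names, irq_disable()/irq_enable() are no-op stand-ins for local_irq_save()/local_irq_restore(), and the in_irq flag stands in for a real context check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio: only the free-list link matters. */
struct obj {
	struct obj *next;
};

/* Hypothetical per-CPU cache; a kernel would keep one instance per CPU. */
struct pcpu_cache {
	struct obj *free_list;     /* touched only from process context */
	struct obj *free_list_irq; /* filled by puts issued from IRQ context */
	unsigned int nr;           /* objects on free_list */
	unsigned int nr_irq;       /* objects on free_list_irq */
};

#define CACHE_MAX 16       /* hypothetical cap on cached objects */
#define SPLICE_THRESHOLD 4 /* hypothetical minimum worth splicing */

/* No-op stand-ins for local_irq_save()/local_irq_restore() in this
 * single-threaded model. */
static void irq_disable(void) {}
static void irq_enable(void) {}

/* Hand the IRQ list to the allocator. free_list is empty here, so the
 * whole list moves over in O(1) with "interrupts" off. */
static void cache_splice_irq(struct pcpu_cache *c)
{
	assert(c->free_list == NULL);
	irq_disable();
	c->free_list = c->free_list_irq;
	c->nr = c->nr_irq;
	c->free_list_irq = NULL;
	c->nr_irq = 0;
	irq_enable();
}

/* Allocation is only legal outside IRQ context in this model. */
static struct obj *cache_alloc(struct pcpu_cache *c, bool in_irq)
{
	struct obj *o;

	assert(!in_irq);
	if (!c->free_list && c->nr_irq >= SPLICE_THRESHOLD)
		cache_splice_irq(c);
	if (c->free_list) {
		o = c->free_list;
		c->free_list = o->next;
		c->nr--;
		return o;
	}
	return malloc(sizeof(*o)); /* cache miss: fall back to the allocator */
}

/* A put may come from process or IRQ context; each context owns one list. */
static void cache_put(struct pcpu_cache *c, struct obj *o, bool in_irq)
{
	if (c->nr + c->nr_irq >= CACHE_MAX) {
		free(o); /* cache full: release the object */
		return;
	}
	if (!in_irq) {
		o->next = c->free_list;
		c->free_list = o;
		c->nr++;
	} else {
		irq_disable();
		o->next = c->free_list_irq;
		c->free_list_irq = o;
		c->nr_irq++;
		irq_enable();
	}
}
```

Keeping IRQ-context puts on their own list means the process-context put and allocation fast paths never disable interrupts; only the occasional splice does.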

note: depends on "bio: safeguard REQ_ALLOC_CACHE bio put", which is missing from for-6.2/block

v2: fix botched splicing threshold checks
v3: remove merged patch
    limit scope of flags var in bio_put_percpu_cache (Christoph Hellwig)

Pavel Begunkov (3):
  bio: split pcpu cache part of bio_put into a helper
  block/bio: add pcpu caching for non-polling bio_put
  io_uring/rw: enable bio caches for IRQ rw

 block/bio.c   | 93 +++++++++++++++++++++++++++++++++++++++------------
 io_uring/rw.c |  3 +-
 2 files changed, 74 insertions(+), 22 deletions(-)
  

Comments

Kanchan Joshi Oct. 25, 2022, 1:25 p.m. UTC | #1
On Fri, Oct 21, 2022 at 11:34:04AM +0100, Pavel Begunkov wrote:
>Add bio pcpu caching for normal / IRQ-driven I/O extending REQ_ALLOC_CACHE,
>which was limited to iopoll. 

So the comment below (stating that process context is a MUST) can also be
removed as part of this series now?

 * If REQ_ALLOC_CACHE is set, the final put of the bio MUST be done from process
 * context, not hard/soft IRQ.
 *
 * Returns: Pointer to new bio on success, NULL on failure.
 */
struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
                             blk_opf_t opf, gfp_t gfp_mask,
                             struct bio_set *bs)
{

[...]
>The next step will be turning it on for other users, hopefully by default.
>The only restriction we currently have is that the allocations can't be
>done from non-irq context and so needs auditing.

Isn't allocation (of bio) happening in non-irq context already?

And
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
  
Pavel Begunkov Oct. 25, 2022, 2:51 p.m. UTC | #2
On 10/25/22 14:25, Kanchan Joshi wrote:
> On Fri, Oct 21, 2022 at 11:34:04AM +0100, Pavel Begunkov wrote:
>> Add bio pcpu caching for normal / IRQ-driven I/O extending REQ_ALLOC_CACHE,
>> which was limited to iopoll. 
> 
> So below comment (stating process context as MUST) can also be removed as
> part of this series now?

Right, good point


>  * If REQ_ALLOC_CACHE is set, the final put of the bio MUST be done from process
>  * context, not hard/soft IRQ.
>  *
>  * Returns: Pointer to new bio on success, NULL on failure.
>  */
> struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
>                              blk_opf_t opf, gfp_t gfp_mask,
>                              struct bio_set *bs)
> {
[...]
>> The next step will be turning it on for other users, hopefully by default.
>> The only restriction we currently have is that the allocations can't be
>> done from non-irq context and so needs auditing.
> 
> Isn't allocation (of bio) happening in non-irq context already?

That's my assumption, true for most of them, but I need to actually
check that. Will be following up after this series is merged.


> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

thanks
  
Jens Axboe Oct. 25, 2022, 7:42 p.m. UTC | #3
On Fri, 21 Oct 2022 11:34:04 +0100, Pavel Begunkov wrote:
> Add bio pcpu caching for normal / IRQ-driven I/O extending REQ_ALLOC_CACHE,
> which was limited to iopoll. t/io_uring with an Optane SSD setup showed +7%
> for batches of 32 requests and +4.3% for batches of 8.
> 
> IRQ, 128/32/32, cache off
> IOPS=59.08M, BW=28.84GiB/s, IOS/call=31/31
> IOPS=59.30M, BW=28.96GiB/s, IOS/call=32/32
> IOPS=59.97M, BW=29.28GiB/s, IOS/call=31/31
> IOPS=59.92M, BW=29.26GiB/s, IOS/call=32/32
> IOPS=59.81M, BW=29.20GiB/s, IOS/call=32/31
> 
> [...]

Applied, thanks!

[1/3] bio: split pcpu cache part of bio_put into a helper
      commit: 0b0735a8c24f006d2d9d8b2b408b8c90f3163abd
[2/3] block/bio: add pcpu caching for non-polling bio_put
      commit: 13a184e269656994180e8c64ff56db03ed737902
[3/3] io_uring/rw: enable bio caches for IRQ rw
      commit: 93dad04746ea1340dec267f0e98ac42e8bc67160

Best regards,