[v2,00/13] optimise registered buffer/file updates

Message ID: cover.1680576071.git.asml.silence@gmail.com
Series: optimise registered buffer/file updates

Message

Pavel Begunkov April 4, 2023, 12:39 p.m. UTC
The patchset optimises registered file and buffer updates / removals.
The rsrc-update-bench test shows an 11x improvement (1040K -> 11468K
updates / sec). It also improves latency by eliminating the RCU grace
period wait and the bouncing of node freeing to another worker, and
reduces the memory footprint by removing percpu refs.
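To make the refcounting change concrete, here is a minimal userspace sketch (not the kernel code) of a non-percpu node refcount protected by the ring lock. The names ring_ctx, rsrc_node, node_alloc(), node_get() and node_put() are hypothetical, and a pthread mutex stands in for uring_lock. Because every get and put runs under the lock, a plain int is enough, and the last put can free the node on the spot instead of waiting for an RCU grace period or handing the work to another worker.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical ring context; 'lock' stands in for uring_lock. */
struct ring_ctx {
	pthread_mutex_t lock;
	int nodes_freed;	/* demo bookkeeping only */
};

/* Hypothetical resource node with a plain, non-atomic refcount. */
struct rsrc_node {
	int refs;		/* only modified with ctx->lock held */
};

static struct rsrc_node *node_alloc(void)
{
	struct rsrc_node *node = malloc(sizeof(*node));

	if (node)
		node->refs = 1;	/* initial reference owned by the ctx */
	return node;
}

/* Caller holds ctx->lock, so a plain increment is race-free. */
static void node_get(struct rsrc_node *node)
{
	node->refs++;
}

/*
 * Caller holds ctx->lock. Dropping the last reference frees the node
 * immediately: no RCU grace period, no hand-off to a worker.
 */
static void node_put(struct ring_ctx *ctx, struct rsrc_node *node)
{
	assert(node->refs > 0);
	if (--node->refs == 0) {
		free(node);
		ctx->nodes_freed++;
	}
}
```

The catch of this design is that every get and put must happen with the lock held; the sketch simply requires callers to take it.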

That's quite important for apps that update files/buffers at medium or
higher frequency: updates are currently slow and expensive, and it takes
quite a number of IO requests per update to make using fixed
files/buffers worthwhile.

Another upside is that it simplifies the code: patch 9 removes the very
convoluted synchronisation via flush_delayed_work() from the quiesce
path.

v2: rebase, add patches 12 and 13 to remove the last pair of atomics from
    the path and to limit caching.

Pavel Begunkov (13):
  io_uring/rsrc: use non-pcpu refcounts for nodes
  io_uring/rsrc: keep cached refs per node
  io_uring: don't put nodes under spinlocks
  io_uring: io_free_req() via tw
  io_uring/rsrc: protect node refs with uring_lock
  io_uring/rsrc: kill rsrc_ref_lock
  io_uring/rsrc: rename rsrc_list
  io_uring/rsrc: optimise io_rsrc_put allocation
  io_uring/rsrc: don't offload node free
  io_uring/rsrc: cache struct io_rsrc_node
  io_uring/rsrc: add lockdep sanity checks
  io_uring/rsrc: optimise io_rsrc_data refcounting
  io_uring/rsrc: add custom limit for node caching
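Patches 10 and 13 cache struct io_rsrc_node objects and put a limit on that cache. The general shape of such a bounded free-list cache is sketched below. It is a simplified userspace illustration, not io_uring's alloc_cache implementation, and the names cnode, node_cache, NODE_CACHE_MAX, cache_get(), cache_put() and cache_destroy() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cap, standing in for a per-cache node limit. */
#define NODE_CACHE_MAX 4

struct cnode {
	struct cnode *next;	/* free-list linkage while cached */
	int refs;
};

struct node_cache {
	struct cnode *head;	/* singly linked free list */
	unsigned int nr;	/* number of cached nodes */
	unsigned int max;	/* never keep more than this many */
};

static struct cnode *cache_get(struct node_cache *c)
{
	struct cnode *n = c->head;

	if (n) {			/* fast path: reuse a cached node */
		c->head = n->next;
		c->nr--;
	} else {			/* slow path: fall back to malloc */
		n = malloc(sizeof(*n));
		if (!n)
			return NULL;
	}
	n->next = NULL;
	n->refs = 1;
	return n;
}

static void cache_put(struct node_cache *c, struct cnode *n)
{
	assert(n != NULL);
	if (c->nr < c->max) {		/* room left: keep it for reuse */
		n->next = c->head;
		c->head = n;
		c->nr++;
	} else {			/* cache full: really free it */
		free(n);
	}
}

static void cache_destroy(struct node_cache *c)
{
	while (c->head) {
		struct cnode *n = c->head;

		c->head = n->next;
		free(n);
	}
	c->nr = 0;
}
```

Reuse avoids an allocator round trip per update, while the cap keeps a burst of updates from leaving an unbounded number of idle nodes in the cache.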

 include/linux/io_uring_types.h |   8 +-
 io_uring/alloc_cache.h         |   6 +-
 io_uring/io_uring.c            |  54 ++++++----
 io_uring/rsrc.c                | 176 ++++++++++++---------------------
 io_uring/rsrc.h                |  58 +++++------
 5 files changed, 136 insertions(+), 166 deletions(-)
  

Comments

Jens Axboe April 4, 2023, 3:30 p.m. UTC | #1
On 4/4/23 6:39 AM, Pavel Begunkov wrote:
> The patchset optimises registered file and buffer updates / removals.
> The rsrc-update-bench test shows an 11x improvement (1040K -> 11468K
> updates / sec). It also improves latency by eliminating the RCU grace
> period wait and the bouncing of node freeing to another worker, and
> reduces the memory footprint by removing percpu refs.
> 
> That's quite important for apps that update files/buffers at medium or
> higher frequency: updates are currently slow and expensive, and it takes
> quite a number of IO requests per update to make using fixed
> files/buffers worthwhile.
> 
> Another upside is that it simplifies the code: patch 9 removes the very
> convoluted synchronisation via flush_delayed_work() from the quiesce
> path.

Ran this on the big box. Stock kernel is 6.3-rc5 + for-6.4/io_uring, and
patched is same kernel with this patchset applied.

Test                      Kernel          Ops
---------------------------------------------
CPU0 rsrc-update-bench    Stock        165670
CPU0 rsrc-update-bench    Stock        166412
rsrc-update-bench         Stock        213411
rsrc-update-bench         Stock        208995

CPU0 rsrc-update-bench    Patched    10890297
CPU0 rsrc-update-bench    Patched    10451699
rsrc-update-bench         Patched    10793148
rsrc-update-bench         Patched    10934918

which is just ridiculous. It's ~64x faster pinned, and ~51x faster not
pinned.

On top of that, it's also a nice cleanup and a reduction in complexity.
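A quick check of the speedup arithmetic, assuming the two runs per configuration are averaged (the email does not say how they were combined): pinned, 10670998 / 166041 ≈ 64.3x; unpinned, 10864033 / 211203 ≈ 51.4x, in line with the ~64x and ~51x quoted. The helper names mean() and speedup() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Speedup of patched over stock, comparing the run means. */
static double speedup(const double *stock, const double *patched, size_t n)
{
	return mean(patched, n) / mean(stock, n);
}

/* Ops figures copied from the benchmark table. */
static const double pinned_stock[]     = { 165670, 166412 };
static const double pinned_patched[]   = { 10890297, 10451699 };
static const double unpinned_stock[]   = { 213411, 208995 };
static const double unpinned_patched[] = { 10793148, 10934918 };
```

Averaging first and then dividing keeps one noisy run from dominating the ratio; comparing individual run pairs instead gives slightly different per-pair figures.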
  
Jens Axboe April 4, 2023, 3:33 p.m. UTC | #2
On Tue, 04 Apr 2023 13:39:44 +0100, Pavel Begunkov wrote:
> The patchset optimises registered file and buffer updates / removals.
> The rsrc-update-bench test shows an 11x improvement (1040K -> 11468K
> updates / sec). It also improves latency by eliminating the RCU grace
> period wait and the bouncing of node freeing to another worker, and
> reduces the memory footprint by removing percpu refs.
> 
> That's quite important for apps that update files/buffers at medium or
> higher frequency: updates are currently slow and expensive, and it takes
> quite a number of IO requests per update to make using fixed
> files/buffers worthwhile.
> 
> [...]

Applied, thanks!

[01/13] io_uring/rsrc: use non-pcpu refcounts for nodes
        commit: b8fb5b4fdd67f9d18109c5d21d44a8bd4ddb608b
[02/13] io_uring/rsrc: keep cached refs per node
        commit: 8e15c0e71b8ae64fb7163532860f8d608165281f
[03/13] io_uring: don't put nodes under spinlocks
        commit: 2ad4c6d08018e4eec130c29992028dc356ab2181
[04/13] io_uring: io_free_req() via tw
        commit: 03adabe81abb20221079b48343783b4327bd1186
[05/13] io_uring/rsrc: protect node refs with uring_lock
        commit: ef8ae64ffa9578c12e44de42604004c2cc3e9c27
[06/13] io_uring/rsrc: kill rsrc_ref_lock
        commit: 0a4813b1abdf06e44ce60cdebfd374cfd27c46bf
[07/13] io_uring/rsrc: rename rsrc_list
        commit: c824986c113f15e2ef2c00da9a226c09ecaac74c
[08/13] io_uring/rsrc: optimise io_rsrc_put allocation
        commit: ff7c75ecaa9e6b251f76c24e289d4bfe413ffe31
[09/13] io_uring/rsrc: don't offload node free
        commit: 36b9818a5a84cb7c977fb723babca1c8d74f288f
[10/13] io_uring/rsrc: cache struct io_rsrc_node
        commit: 9eae8655f9cd2eeed99fb7a0d2bb22816c17e497
[11/13] io_uring/rsrc: add lockdep sanity checks
        commit: 1f2c8f610aa6c6a3dc3523f93eaf28c25051df6f
[12/13] io_uring/rsrc: optimise io_rsrc_data refcounting
        commit: 757ef4682b6aa29fdf752ad47f0d63eb48b261cf
[13/13] io_uring/rsrc: add custom limit for node caching
        commit: 69bbc6ade9d9d4e3c556cb83e77b6f3cd9ad3d18

Best regards,