[RFC,00/11] optimise registered buffer/file updates

Message ID cover.1680187408.git.asml.silence@gmail.com

Message

Pavel Begunkov March 30, 2023, 2:53 p.m. UTC
  Updating registered files and buffers is a very slow operation, which
makes it not feasible for workloads with medium update frequencies.
Rework the underlying rsrc infra for greater performance and lesser
memory footprint.

The improvement is ~11x for a benchmark updating files in a loop
(1040K -> 11468K updates / sec).
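
The benchmark itself is not part of this posting. Purely as an illustration,
a loop of this kind could look like the sketch below. It assumes the
synchronous io_uring_register(2) path with IORING_REGISTER_FILES_UPDATE and
raw syscalls rather than liburing. The names bench_file_updates,
ring_setup and ring_register, and the choice of two pipe ends as the files
being swapped, are made up for the example; the real test may well submit
update requests through the ring instead.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers over the raw syscalls, so liburing is not required. */
static int ring_setup(unsigned entries, struct io_uring_params *p)
{
	return (int)syscall(__NR_io_uring_setup, entries, p);
}

static int ring_register(int fd, unsigned op, const void *arg, unsigned nr)
{
	return (int)syscall(__NR_io_uring_register, fd, op, arg, nr);
}

/*
 * Hypothetical benchmark: replace registered file slot 0 `iters` times,
 * alternating between the two ends of a pipe.
 * Returns updates per second, or -1.0 if io_uring is unavailable or any
 * step fails.
 */
double bench_file_updates(long iters)
{
	struct io_uring_params p;
	struct io_uring_files_update upd;
	struct timespec t0, t1;
	int pipefd[2], ring, slot;
	double secs = -1.0;
	long i;

	if (pipe(pipefd) < 0)
		return -1.0;
	memset(&p, 0, sizeof(p));
	ring = ring_setup(8, &p);
	if (ring < 0)
		goto out_pipe;

	/* Register a one-entry file table holding the read end. */
	slot = pipefd[0];
	if (ring_register(ring, IORING_REGISTER_FILES, &slot, 1) < 0)
		goto out_ring;

	memset(&upd, 0, sizeof(upd));
	upd.offset = 0;
	upd.fds = (__u64)(unsigned long)&slot;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		slot = pipefd[i & 1];
		if (ring_register(ring, IORING_REGISTER_FILES_UPDATE, &upd, 1) < 0)
			goto out_ring;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
out_ring:
	close(ring);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return secs > 0 ? iters / secs : -1.0;
}
```

Calling bench_file_updates() with a large iteration count on a kernel with
and without the series would give two updates/sec figures to compare; the
absolute numbers will not match the ones quoted above.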

The set requires a couple of patches from the 6.3 branch, for that
reason it's an RFC and will be resent after merge.

https://github.com/isilence/linux.git optimise-rsrc-update

Pavel Begunkov (11):
  io_uring/rsrc: use non-pcpu refcounts for nodes
  io_uring/rsrc: keep cached refs per node
  io_uring: don't put nodes under spinlocks
  io_uring: io_free_req() via tw
  io_uring/rsrc: protect node refs with uring_lock
  io_uring/rsrc: kill rsrc_ref_lock
  io_uring/rsrc: rename rsrc_list
  io_uring/rsrc: optimise io_rsrc_put allocation
  io_uring/rsrc: don't offload node free
  io_uring/rsrc: cache struct io_rsrc_node
  io_uring/rsrc: add lockdep sanity checks

 include/linux/io_uring_types.h |   7 +-
 io_uring/io_uring.c            |  47 ++++++----
 io_uring/rsrc.c                | 152 +++++++++++----------------------
 io_uring/rsrc.h                |  50 ++++++-----
 4 files changed, 105 insertions(+), 151 deletions(-)
  

Comments

Gabriel Krisman Bertazi March 31, 2023, 1:35 p.m. UTC | #1
Pavel,

Pavel Begunkov <asml.silence@gmail.com> writes:
> Updating registered files and buffers is a very slow operation, which
> makes it not feasible for workloads with medium update frequencies.
> Rework the underlying rsrc infra for greater performance and lesser
> memory footprint.
>
> The improvement is ~11x for a benchmark updating files in a loop
> (1040K -> 11468K updates / sec).

Nice. That's a really impressive improvement.

I've been adding io_uring test cases for automated performance
regression testing with mmtests (open source).  I'd love to take a look
at this test case and adapt it to mmtests, so we can pick it up and run
it frequently.

Is it something you can share?
  
Jens Axboe March 31, 2023, 3:18 p.m. UTC | #2
On 3/30/23 8:53 AM, Pavel Begunkov wrote:
> Updating registered files and buffers is a very slow operation, which
> makes it not feasible for workloads with medium update frequencies.
> Rework the underlying rsrc infra for greater performance and lesser
> memory footprint.
> 
> The improvement is ~11x for a benchmark updating files in a loop
> (1040K -> 11468K updates / sec).
> 
> The set requires a couple of patches from the 6.3 branch, for that
> reason it's an RFC and will be resent after merge.

Looks pretty sane to me, didn't find anything immediately wrong. I
do wonder if we should have a conditional uring_lock helper, we do
have a few of those. But not really related to this series, as it
just moves one around.
  
Pavel Begunkov March 31, 2023, 4:21 p.m. UTC | #3
On 3/31/23 14:35, Gabriel Krisman Bertazi wrote:
> Pavel,
> 
> Pavel Begunkov <asml.silence@gmail.com> writes:
>> Updating registered files and buffers is a very slow operation, which
>> makes it not feasible for workloads with medium update frequencies.
>> Rework the underlying rsrc infra for greater performance and lesser
>> memory footprint.
>>
>> The improvement is ~11x for a benchmark updating files in a loop
>> (1040K -> 11468K updates / sec).
> 
> Nice. That's a really impressive improvement.
> 
> I've been adding io_uring test cases for automated performance
> regression testing with mmtests (open source).  I'd love to take a look
> at this test case and adapt it to mmtests, so we can pick it up and run
> it frequently.
> 
> Is it something you can share?

I'll post it later.

The test is quite stupid, and with the patches less than 10% of CPU
cycles go to the update machinery (against 90+% without them). The rest
is spent on making syscalls, submitting update requests, etc., so it
almost hits the limit.
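
As a sanity check on those numbers, here is a small worked calculation. It
uses a simple Amdahl-style model, which is an assumption of this sketch and
not something stated in the thread: a fixed share of the original per-update
time is update machinery, and the best case is that this share drops to
zero. The function names are hypothetical.

```c
/* Speedup between two throughput figures given in the same units. */
double speedup(double before, double after)
{
	return after / before;
}

/*
 * Amdahl-style upper bound: if a fraction `share` of the original
 * per-update time was update machinery and that part became free,
 * the whole loop could get at most 1 / (1 - share) times faster.
 */
double max_speedup(double share)
{
	return 1.0 / (1.0 - share);
}

/* Smallest original share that is consistent with an observed speedup. */
double min_share_for(double observed)
{
	return 1.0 - 1.0 / observed;
}
```

With the figures from the cover letter, speedup(1040, 11468) comes out a
little above 11. Under this model a share of exactly 90% would cap the gain
at about 10x, so an 11x gain needs the original share to be around 91% or
more, which fits the "90+" figure.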

Another test we can do is to measure the latency between the point we ask
for a rsrc to be removed and the point it actually gets destroyed/freed;
tags will help with that. It should have improved nicely as well, since the
series removes the RCU grace period and other bouncing.
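
A rough sketch of such a latency measurement follows. It is not the test
from this thread. It assumes tagged registration via IORING_REGISTER_FILES2,
where the kernel posts a CQE carrying the tag once the resource is put, and
it uses raw syscalls instead of liburing. RSRC_TAG, uptr and
rsrc_free_latency_ns are names invented for the example, and the 1 second
wait limit is arbitrary.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <linux/time_types.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define RSRC_TAG 0xcafeULL /* arbitrary non-zero tag picked for this sketch */

/* Turn a user pointer into the u64 form the io_uring ABI expects. */
static __u64 uptr(const void *p)
{
	return (__u64)(unsigned long)p;
}

/*
 * Register one tagged file, remove it, and time how long it takes for the
 * tag's CQE to show up. The measured span includes the update syscall.
 * Returns nanoseconds, or -1 on any failure or if no CQE arrives in time.
 */
long long rsrc_free_latency_ns(void)
{
	struct io_uring_params p;
	struct io_uring_rsrc_register reg;
	struct io_uring_files_update upd;
	struct __kernel_timespec ts = { .tv_sec = 1 };
	struct io_uring_getevents_arg arg = { .ts = uptr(&ts) };
	struct io_uring_cqe *cqes;
	struct timespec t0, t1;
	unsigned char *cq;
	unsigned head, tail, mask;
	size_t cq_sz;
	__u64 tag = RSRC_TAG;
	int pipefd[2], ring, fd, none = -1;
	long long ns = -1;

	if (pipe(pipefd) < 0)
		return -1;
	memset(&p, 0, sizeof(p));
	ring = (int)syscall(__NR_io_uring_setup, 8, &p);
	if (ring < 0)
		goto out_pipe;

	/* Map the completion ring so the tag CQE can be read back. */
	cq_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
	cq = mmap(NULL, cq_sz, PROT_READ | PROT_WRITE, MAP_SHARED, ring,
		  IORING_OFF_CQ_RING);
	if (cq == MAP_FAILED)
		goto out_ring;
	cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);
	mask = *(unsigned *)(cq + p.cq_off.ring_mask);

	/* Register one file with a tag attached to it. */
	fd = pipefd[0];
	memset(&reg, 0, sizeof(reg));
	reg.nr = 1;
	reg.data = uptr(&fd);
	reg.tags = uptr(&tag);
	if (syscall(__NR_io_uring_register, ring, IORING_REGISTER_FILES2,
		    &reg, (unsigned)sizeof(reg)) < 0)
		goto out_map;

	/* Ask for removal by writing -1 into slot 0, then wait for a CQE. */
	memset(&upd, 0, sizeof(upd));
	upd.fds = uptr(&none);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (syscall(__NR_io_uring_register, ring, IORING_REGISTER_FILES_UPDATE,
		    &upd, 1) < 0)
		goto out_map;
	if (syscall(__NR_io_uring_enter, ring, 0, 1,
		    IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG,
		    &arg, sizeof(arg)) < 0)
		goto out_map;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Only report a latency if the CQE really carries our tag. */
	head = *(unsigned *)(cq + p.cq_off.head);
	tail = __atomic_load_n((unsigned *)(cq + p.cq_off.tail), __ATOMIC_ACQUIRE);
	if (head != tail && cqes[head & mask].user_data == RSRC_TAG)
		ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
		     (t1.tv_nsec - t0.tv_nsec);
out_map:
	munmap(cq, cq_sz);
out_ring:
	close(ring);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return ns;
}
```

Running rsrc_free_latency_ns() many times and comparing the distribution
before and after the series would show whether the removal-to-free latency
changed; a single sample is too noisy to mean much.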