[PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

Message ID f2396369e638284586b069dbddffb8c992afba95.1676419314.git.josh@joshtriplett.org
State New
Series [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

Commit Message

Josh Triplett Feb. 15, 2023, 12:42 a.m. UTC
  Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
of the opcode) to treat the fd as a registered index rather than a file
descriptor.

This makes it possible for a library to open an io_uring, register the
ring fd, close the ring fd, and subsequently use the ring entirely via
registered index.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---

v2: Rebase. Change io_uring_register to extract the flag from the opcode first.

 include/uapi/linux/io_uring.h |  6 +++++-
 io_uring/io_uring.c           | 34 +++++++++++++++++++++++++++-------
 2 files changed, 32 insertions(+), 8 deletions(-)
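
A minimal userspace sketch of the decode step described in the commit message, not the kernel code itself: the flag occupies bit 31 of the opcode, is extracted first, and is cleared before the range check. The names decode_register_opcode, SKETCH_USE_REGISTERED_RING and SKETCH_REGISTER_LAST are hypothetical; the value 26 assumes the enum as it stands in this patch, where IORING_REGISTER_FILE_ALLOC_RANGE is 25.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the values in this patch. */
#define SKETCH_USE_REGISTERED_RING (1U << 31)
#define SKETCH_REGISTER_LAST       26U /* assumes FILE_ALLOC_RANGE == 25 */

struct decoded_opcode {
	bool use_registered_ring; /* true if the high bit was set */
	unsigned int opcode;      /* opcode with the flag bit cleared */
};

/*
 * Split a raw opcode into the flag and the operation. Returns 0 on
 * success, or -EINVAL if the remaining opcode is out of range.
 */
static int decode_register_opcode(unsigned int raw, struct decoded_opcode *out)
{
	assert(out != NULL);

	/* Extract the flag first, then strip it before any range check. */
	out->use_registered_ring = (raw & SKETCH_USE_REGISTERED_RING) != 0;
	out->opcode = raw & ~SKETCH_USE_REGISTERED_RING;

	if (out->opcode >= SKETCH_REGISTER_LAST)
		return -EINVAL;
	return 0;
}
```

Because the flag is cleared up front, the range check and everything after it see the same opcode values whether or not the flag was set.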
  

Comments

Jens Axboe Feb. 15, 2023, 5:44 p.m. UTC | #1
On 2/14/23 5:42 PM, Josh Triplett wrote:
> Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
> of the opcode) to treat the fd as a registered index rather than a file
> descriptor.
> 
> This makes it possible for a library to open an io_uring, register the
> ring fd, close the ring fd, and subsequently use the ring entirely via
> registered index.

This looks pretty straightforward to me, only real question I had
was whether using the top bit of the register opcode for this is the
best choice. But I can't think of better ways to do it, and the space
is definitely big enough to do that, so looks fine to me.

One more comment below:

> +	if (use_registered_ring) {
> +		/*
> +		 * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
> +		 * need only dereference our task private array to find it.
> +		 */
> +		struct io_uring_task *tctx = current->io_uring;

I need to double check if it's guaranteed we always have current->io_uring
assigned here. If the ring is registered we certainly will have it, but
what if someone calls io_uring_register(2) without having a ring setup
upfront?

IOW, I think we need a NULL check here and failing the request at that
point.
  
Josh Triplett Feb. 15, 2023, 8:33 p.m. UTC | #2
On Wed, Feb 15, 2023 at 10:44:38AM -0700, Jens Axboe wrote:
> On 2/14/23 5:42 PM, Josh Triplett wrote:
> > Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
> > of the opcode) to treat the fd as a registered index rather than a file
> > descriptor.
> > 
> > This makes it possible for a library to open an io_uring, register the
> > ring fd, close the ring fd, and subsequently use the ring entirely via
> > registered index.
> 
> This looks pretty straightforward to me, only real question I had
> was whether using the top bit of the register opcode for this is the
> best choice. But I can't think of better ways to do it, and the space
> is definitely big enough to do that, so looks fine to me.

It seemed like the cleanest way available given the ABI of
io_uring_register, yeah.

> One more comment below:
> 
> > +	if (use_registered_ring) {
> > +		/*
> > +		 * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
> > +		 * need only dereference our task private array to find it.
> > +		 */
> > +		struct io_uring_task *tctx = current->io_uring;
> 
> I need to double check if it's guaranteed we always have current->io_uring
> assigned here. If the ring is registered we certainly will have it, but
> what if someone calls io_uring_register(2) without having a ring setup
> upfront?
> 
> IOW, I think we need a NULL check here and failing the request at that
> point.

The next line is:

+               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))

The first part of that condition is the NULL check you're looking for,
right?

- Josh Triplett
  
Jens Axboe Feb. 15, 2023, 9:39 p.m. UTC | #3
On 2/15/23 1:33 PM, Josh Triplett wrote:
> On Wed, Feb 15, 2023 at 10:44:38AM -0700, Jens Axboe wrote:
>> On 2/14/23 5:42 PM, Josh Triplett wrote:
>>> Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
>>> of the opcode) to treat the fd as a registered index rather than a file
>>> descriptor.
>>>
>>> This makes it possible for a library to open an io_uring, register the
>>> ring fd, close the ring fd, and subsequently use the ring entirely via
>>> registered index.
>>
>> This looks pretty straightforward to me, only real question I had
>> was whether using the top bit of the register opcode for this is the
>> best choice. But I can't think of better ways to do it, and the space
>> is definitely big enough to do that, so looks fine to me.
> 
> It seemed like the cleanest way available given the ABI of
> io_uring_register, yeah.
> 
>> One more comment below:
>>
>>> +	if (use_registered_ring) {
>>> +		/*
>>> +		 * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
>>> +		 * need only dereference our task private array to find it.
>>> +		 */
>>> +		struct io_uring_task *tctx = current->io_uring;
>>
>> I need to double check if it's guaranteed we always have current->io_uring
>> assigned here. If the ring is registered we certainly will have it, but
>> what if someone calls io_uring_register(2) without having a ring setup
>> upfront?
>>
>> IOW, I think we need a NULL check here and failing the request at that
>> point.
> 
> The next line is:
> 
> +               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
> 
> The first part of that condition is the NULL check you're looking for,
> right?

Ah yeah, I'm just blind... Looks fine!
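
For readers following the exchange above, a minimal userspace sketch of the lookup being discussed: a single condition rejects both a missing per-task context and an out-of-range index, and an empty slot is reported separately. The type sketch_task_ctx and the function lookup_registered_ring are hypothetical stand-ins for the kernel structures; the table size of 16 is an assumption about IO_RINGFD_REG_MAX; and the sketch omits the array_index_nospec() speculation hardening that the patch applies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed table size; stands in for IO_RINGFD_REG_MAX. */
#define SKETCH_RINGFD_REG_MAX 16U

/* Hypothetical stand-in for the per-task io_uring state. */
struct sketch_task_ctx {
	void *registered_rings[SKETCH_RINGFD_REG_MAX]; /* NULL = empty slot */
};

/*
 * Resolve a registered index to its ring. tctx may be NULL when the task
 * never set up a ring. Returns 0 and stores the ring in *ring, or a
 * negative errno.
 */
static int lookup_registered_ring(const struct sketch_task_ctx *tctx,
				  unsigned int index, void **ring)
{
	assert(ring != NULL);

	/* Covers both "no per-task state" and "index out of range". */
	if (tctx == NULL || index >= SKETCH_RINGFD_REG_MAX)
		return -EINVAL;

	*ring = tctx->registered_rings[index];
	if (*ring == NULL)
		return -EBADF; /* in range, but nothing registered there */
	return 0;
}
```

Checking the context pointer before indexing means a task that calls io_uring_register(2) without ever creating a ring gets -EINVAL rather than a NULL dereference.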
  
Jens Axboe Feb. 16, 2023, 3:24 a.m. UTC | #4
On Tue, 14 Feb 2023 16:42:22 -0800, Josh Triplett wrote:
> Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
> of the opcode) to treat the fd as a registered index rather than a file
> descriptor.
> 
> This makes it possible for a library to open an io_uring, register the
> ring fd, close the ring fd, and subsequently use the ring entirely via
> registered index.
> 
> [...]

Applied, thanks!

[1/1] io_uring: Support calling io_uring_register with a registered ring fd
      commit: 04eb372cac91a4f70c9b921c1b86758f5553d311

Best regards,
  
Dylan Yudaken Feb. 16, 2023, 9:35 a.m. UTC | #5
On Tue, 2023-02-14 at 16:42 -0800, Josh Triplett wrote:
> @@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
>         struct io_ring_ctx *ctx;
>         long ret = -EBADF;
>         struct fd f;
> +       bool use_registered_ring;
> +
> +       use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
> +       opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>  
>         if (opcode >= IORING_REGISTER_LAST)
>                 return -EINVAL;
>  
> -       f = fdget(fd);
> -       if (!f.file)
> -               return -EBADF;
> +       if (use_registered_ring) {
> +               /*
> +                * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
> +                * need only dereference our task private array to find it.
> +                */
> +               struct io_uring_task *tctx = current->io_uring;
>  
> -       ret = -EOPNOTSUPP;
> -       if (!io_is_uring_fops(f.file))
> -               goto out_fput;
> +               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
> +                       return -EINVAL;
> +               fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
> +               f.file = tctx->registered_rings[fd];
> +               f.flags = 0;
> +               if (unlikely(!f.file))
> +                       return -EBADF;
> +               opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;

^ this line looks duplicated at the top of the function?


Also - is there a liburing regression test for this?
  
Josh Triplett Feb. 16, 2023, 12:05 p.m. UTC | #6
On Thu, Feb 16, 2023 at 09:35:44AM +0000, Dylan Yudaken wrote:
> On Tue, 2023-02-14 at 16:42 -0800, Josh Triplett wrote:
> > @@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
> >         struct io_ring_ctx *ctx;
> >         long ret = -EBADF;
> >         struct fd f;
> > +       bool use_registered_ring;
> > +
> > +       use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
> > +       opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
> >  
> >         if (opcode >= IORING_REGISTER_LAST)
> >                 return -EINVAL;
> >  
> > -       f = fdget(fd);
> > -       if (!f.file)
> > -               return -EBADF;
> > +       if (use_registered_ring) {
> > +               /*
> > +                * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
> > +                * need only dereference our task private array to find it.
> > +                */
> > +               struct io_uring_task *tctx = current->io_uring;
> >  
> > -       ret = -EOPNOTSUPP;
> > -       if (!io_is_uring_fops(f.file))
> > -               goto out_fput;
> > +               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
> > +                       return -EINVAL;
> > +               fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
> > +               f.file = tctx->registered_rings[fd];
> > +               f.flags = 0;
> > +               if (unlikely(!f.file))
> > +                       return -EBADF;
> > +               opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
> 
> ^ this line looks duplicated at the top of the function?

Good catch!

Jens, since you've already applied this, can you remove this line or
would you like a patch doing so?

> Also - is there a liburing regression test for this?

Userspace, including test: https://github.com/axboe/liburing/pull/664
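
As a hedged illustration of the userspace side (not taken from the liburing pull request above), one way a library might decide which opcode value to pass: check the IORING_FEAT_REG_REG_RING feature bit reported at setup time, and OR in the flag only when the handle holds a registered index. The struct sketch_ring_handle and both functions are hypothetical; the macro values mirror the ones added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror this patch; the SKETCH_ names are hypothetical. */
#define SKETCH_FEAT_REG_REG_RING   (1U << 13)
#define SKETCH_USE_REGISTERED_RING (1U << 31)

/* Hypothetical library-side handle for a ring. */
struct sketch_ring_handle {
	int fd_or_index;    /* real fd, or registered index once the fd is closed */
	bool is_registered; /* true if fd_or_index is a registered index */
};

/* True if the features word from ring setup advertises the new flag. */
static bool kernel_supports_reg_reg_ring(uint32_t features)
{
	return (features & SKETCH_FEAT_REG_REG_RING) != 0;
}

/* Compute the opcode value to pass as the second syscall argument. */
static unsigned int register_opcode_for(const struct sketch_ring_handle *h,
					unsigned int opcode)
{
	assert(h != NULL);
	/* Callers pass a plain opcode; the flag is added here only. */
	assert((opcode & SKETCH_USE_REGISTERED_RING) == 0);

	return h->is_registered ? (opcode | SKETCH_USE_REGISTERED_RING) : opcode;
}
```

A library would presumably close the real ring fd only after confirming the feature bit, since without kernel support the registered index cannot be used for io_uring_register.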
  
Jens Axboe Feb. 16, 2023, 1:10 p.m. UTC | #7
On 2/16/23 5:05 AM, Josh Triplett wrote:
> On Thu, Feb 16, 2023 at 09:35:44AM +0000, Dylan Yudaken wrote:
>> On Tue, 2023-02-14 at 16:42 -0800, Josh Triplett wrote:
>>> @@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
>>>         struct io_ring_ctx *ctx;
>>>         long ret = -EBADF;
>>>         struct fd f;
>>> +       bool use_registered_ring;
>>> +
>>> +       use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
>>> +       opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>>>  
>>>         if (opcode >= IORING_REGISTER_LAST)
>>>                 return -EINVAL;
>>>  
>>> -       f = fdget(fd);
>>> -       if (!f.file)
>>> -               return -EBADF;
>>> +       if (use_registered_ring) {
>>> +               /*
>>> +                * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
>>> +                * need only dereference our task private array to find it.
>>> +                */
>>> +               struct io_uring_task *tctx = current->io_uring;
>>>  
>>> -       ret = -EOPNOTSUPP;
>>> -       if (!io_is_uring_fops(f.file))
>>> -               goto out_fput;
>>> +               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
>>> +                       return -EINVAL;
>>> +               fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
>>> +               f.file = tctx->registered_rings[fd];
>>> +               f.flags = 0;
>>> +               if (unlikely(!f.file))
>>> +                       return -EBADF;
>>> +               opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>>
>> ^ this line looks duplicated at the top of the function?
> 
> Good catch!

Indeed!

> Jens, since you've already applied this, can you remove this line or
> would you like a patch doing so?

It's still top-of-tree, I just amended it.
  

Patch

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2780bce62faf..35e6f8046b9b 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -470,6 +470,7 @@  struct io_uring_params {
 #define IORING_FEAT_RSRC_TAGS		(1U << 10)
 #define IORING_FEAT_CQE_SKIP		(1U << 11)
 #define IORING_FEAT_LINKED_FILE		(1U << 12)
+#define IORING_FEAT_REG_REG_RING	(1U << 13)
 
 /*
  * io_uring_register(2) opcodes and arguments
@@ -517,7 +518,10 @@  enum {
 	IORING_REGISTER_FILE_ALLOC_RANGE	= 25,
 
 	/* this goes last */
-	IORING_REGISTER_LAST
+	IORING_REGISTER_LAST,
+
+	/* flag added to the opcode to use a registered ring fd */
+	IORING_REGISTER_USE_REGISTERED_RING	= 1U << 31
 };
 
 /* io-wq worker categories */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..1fb743ecba5a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3663,7 +3663,7 @@  static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
 			IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
 			IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
-			IORING_FEAT_LINKED_FILE;
+			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
@@ -4177,17 +4177,37 @@  SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 	struct io_ring_ctx *ctx;
 	long ret = -EBADF;
 	struct fd f;
+	bool use_registered_ring;
+
+	use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
+	opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
 
 	if (opcode >= IORING_REGISTER_LAST)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
+	if (use_registered_ring) {
+		/*
+		 * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
+		 * need only dereference our task private array to find it.
+		 */
+		struct io_uring_task *tctx = current->io_uring;
 
-	ret = -EOPNOTSUPP;
-	if (!io_is_uring_fops(f.file))
-		goto out_fput;
+		if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
+			return -EINVAL;
+		fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
+		f.file = tctx->registered_rings[fd];
+		f.flags = 0;
+		if (unlikely(!f.file))
+			return -EBADF;
+		opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
+	} else {
+		f = fdget(fd);
+		if (unlikely(!f.file))
+			return -EBADF;
+		ret = -EOPNOTSUPP;
+		if (!io_is_uring_fops(f.file))
+			goto out_fput;
+	}
 
 	ctx = f.file->private_data;