[v5,0/8] io_uring: Initial support for {s,g}etsockopt commands

Message ID 20230911103407.1393149-1-leitao@debian.org

Message

Breno Leitao Sept. 11, 2023, 10:33 a.m. UTC
  This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT and SOCKET_URING_OP_GETSOCKOPT implement the
generic case, covering all levels and optnames (a change from the previous
version, where getsockopt was limited to level=SOL_SOCKET).

In order to keep the implementation (and tests) simple, some refactors
were done prior to the changes, as follows:

Patch 1-2:  Extract the core {s,g}etsockopt() logic from
__sys_{g,s}etsockopt(), so that the code can be reused by other callers,
such as io_uring.

Patch 3: Pass compat mode to the file/socket callbacks

Patch 4: Move io_uring helpers from io_uring_zerocopy_tx to a generic
io_uring header. This simplifies the test case (last patch)

Patch 5: Guard io_uring_cmd_sock() so that it is not called if CONFIG_NET
is disabled.
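
For readers unfamiliar with this kind of guard, here is a minimal userspace
sketch of one common idiom: a stub that returns -EOPNOTSUPP when the feature
is compiled out. It is not taken from Patch 5 and may differ from it;
DEMO_CONFIG_NET, struct demo_cmd and demo_cmd_sock() are hypothetical names
used only for illustration.

```c
#include <errno.h>

struct demo_cmd;	/* opaque, hypothetical stand-in for struct io_uring_cmd */

#if defined(DEMO_CONFIG_NET)
/* With networking enabled, the real handler would be defined elsewhere. */
int demo_cmd_sock(struct demo_cmd *cmd, unsigned int issue_flags);
#else
/* Networking compiled out: every caller gets the same error code. */
static inline int demo_cmd_sock(struct demo_cmd *cmd, unsigned int issue_flags)
{
	(void)cmd;
	(void)issue_flags;
	return -EOPNOTSUPP;
}
#endif
```

With DEMO_CONFIG_NET undefined, demo_cmd_sock() always returns -EOPNOTSUPP,
so callers need no #ifdef of their own.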

Note that userspace pointers must remain valid until the operation
completes, as with the system call.

These changes were tested with a new test[1] in liburing, the LTP sockopt*
tests, and the bpf/progs/sockopt test case, which is now adapted to
run using both system calls and io_uring commands.

[1] Link: https://github.com/leitao/liburing/blob/getsock/test/socket-getsetsock-cmd.c

RFC -> V1:
	* Copy user memory at io_uring subsystem, and call proto_ops
	  callbacks using kernel memory
	* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT

V1 -> V2:
	* Implemented the BPF part
	* Use user pointers for optval to avoid kmalloc in the io_uring part.

V2 -> V3:
	* Break down __sys_setsockopt and reuse the core code, avoiding
	  duplicated code. This removed the requirement to expose
	  sock_use_custom_sol_socket().
	* Added io_uring test to selftests/bpf/sockopt.
	* Fixed compat argument, by passing it to the issue_flags.

V3 -> V4:
	* Rebase on top of commit 1ded5e5a5931b ("net: annotate data-races around sock->ops")
	* Also broke down __sys_setsockopt() to reuse the core function
	  from io_uring.
	* Create a new patch to return -EOPNOTSUPP if CONFIG_NET is
	  disabled
	* Added two SOL_SOCKET tests in bpf/prog_tests/sockopt.

V4 -> V5:
	* Do not use sockptr anymore, by changing the optlen getsock argument
	  to be a user pointer (instead of a kernel pointer). This change also
	  drops the limitation on getsockopt from previous versions, and now
	  all levels are supported.
	* Simplified the BPF sockopt test, since there is no more limitation on
	  the io_uring commands.
	* No more changes in the BPF subsystem.
	* Moved the optlen field in the SQE struct. It is now a pointer instead
	  of u32.
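
As background on why optlen is a pointer for getsockopt: in the plain
system call, optlen is a value-result argument. The caller passes the buffer
size in, and the kernel writes the actual option length back. The sketch
below shows this with the regular getsockopt(2) syscall, not the io_uring
command; demo_query_so_type() is a hypothetical helper name, and AF_UNIX is
chosen only so the example needs no network setup.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Query SO_TYPE with an oversized buffer. On success, *type_out holds the
 * socket type and *len_out holds the length the kernel wrote back.
 * Returns 0 on success, -1 on failure.
 */
int demo_query_so_type(int *type_out, socklen_t *len_out)
{
	char buf[16] = { 0 };
	socklen_t len = sizeof(buf);	/* in: buffer size */
	int fd, ret;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	ret = getsockopt(fd, SOL_SOCKET, SO_TYPE, buf, &len);
	close(fd);
	if (ret < 0)
		return -1;

	memcpy(type_out, buf, sizeof(*type_out));
	*len_out = len;			/* out: length actually written */
	return 0;
}
```

Because the kernel has to write through optlen, both optval and optlen must
stay valid until the request completes.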

Breno Leitao (8):
  net/socket: Break down __sys_setsockopt
  net/socket: Break down __sys_getsockopt
  io_uring/cmd: Pass compat mode in issue_flags
  selftests/net: Extract uring helpers to be reusable
  io_uring/cmd: return -EOPNOTSUPP if net is disabled
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  selftests/bpf/sockopt: Add io_uring support

 include/linux/io_uring.h                      |   1 +
 include/net/sock.h                            |   5 +
 include/uapi/linux/io_uring.h                 |  10 +
 io_uring/uring_cmd.c                          |  41 +++
 net/socket.c                                  |  89 ++++--
 tools/include/io_uring/mini_liburing.h        | 292 ++++++++++++++++++
 .../selftests/bpf/prog_tests/sockopt.c        |  95 +++++-
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 268 +---------------
 9 files changed, 497 insertions(+), 305 deletions(-)
 create mode 100644 tools/include/io_uring/mini_liburing.h
  

Comments

Paolo Abeni Sept. 12, 2023, 9:37 a.m. UTC | #1
On Mon, 2023-09-11 at 03:34 -0700, Breno Leitao wrote:
> Split __sys_getsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_getsockopt()). This will avoid
> code duplication when executing the same operation in other callers, for
> instance.
> 
> do_sock_getsockopt() will be called by io_uring getsockopt() command
> operation in the following patch.
> 
> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/net/sock.h |  3 +++
>  net/socket.c       | 51 ++++++++++++++++++++++++++++------------------
>  2 files changed, 34 insertions(+), 20 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index aa8fb54ad0af..fbd568a43d28 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1863,6 +1863,9 @@ int sock_setsockopt(struct socket *sock, int level, int op,
>  		    sockptr_t optval, unsigned int optlen);
>  int do_sock_setsockopt(struct socket *sock, bool compat, int level,
>  		       int optname, char __user *user_optval, int optlen);
> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> +		       int optname, char __user *user_optval,
> +		       int __user *user_optlen);
>  
>  int sk_getsockopt(struct sock *sk, int level, int optname,
>  		  sockptr_t optval, sockptr_t optlen);
> diff --git a/net/socket.c b/net/socket.c
> index 360332e098d4..3ec779a56f79 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2333,28 +2333,17 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>  INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>  							 int optname));
>  
> -/*
> - *	Get a socket option. Because we don't know the option lengths we have
> - *	to pass a user mode parameter for the protocols to sort out.
> - */
> -int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
> -		int __user *optlen)
> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> +		       int optname, char __user *optval,
> +		       int __user *optlen)
>  {
>  	int max_optlen __maybe_unused;
>  	const struct proto_ops *ops;
> -	int err, fput_needed;
> -	struct socket *sock;
> -
> -	sock = sockfd_lookup_light(fd, &err, &fput_needed);
> -	if (!sock)
> -		return err;
> +	int err;
>  
>  	err = security_socket_getsockopt(sock, level, optname);
>  	if (err)
> -		goto out_put;
> -
> -	if (!in_compat_syscall())
> -		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> +		return err;
>  
>  	ops = READ_ONCE(sock->ops);
>  	if (level == SOL_SOCKET)
> @@ -2362,14 +2351,36 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
>  	else if (unlikely(!ops->getsockopt))
>  		err = -EOPNOTSUPP;
>  	else
> -		err = ops->getsockopt(sock, level, optname, optval,
> -					    optlen);
> +		err = ops->getsockopt(sock, level, optname, optval, optlen);
>  
> -	if (!in_compat_syscall())
> +	if (!compat) {
> +		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>  		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
>  						     optval, optlen, max_optlen,
>  						     err);
> -out_put:
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(do_sock_getsockopt);
> +
> +/*	Get a socket option. Because we don't know the option lengths we have
> + *	to pass a user mode parameter for the protocols to sort out.
> + */
> +int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
> +		     int __user *optlen)
> +{
> +	int err, fput_needed;
> +	bool compat = in_compat_syscall();
> +	struct socket *sock;

Please respect the reverse x-mas tree order, thanks!

Paolo
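
For reference, "reverse x-mas tree" order means local variable declarations
are sorted from the longest line to the shortest. Below is a minimal,
compilable sketch of the three locals from the hunk above in that order.
The initializers, the demo_in_compat_syscall() stub and the function name
are assumptions added so the example builds outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

struct socket;	/* opaque stand-in for the kernel type */

/* Hypothetical stub; the kernel's in_compat_syscall() is not available here. */
static bool demo_in_compat_syscall(void)
{
	return false;
}

int demo_reverse_xmas_tree(void)
{
	/* Longest declaration first, shortest last. */
	bool compat = demo_in_compat_syscall();
	int err = 0, fput_needed = 0;
	struct socket *sock = NULL;

	/* Reference the locals so the sketch compiles without warnings. */
	(void)fput_needed;
	(void)sock;
	return compat ? -1 : err;
}
```

Applied to the quoted hunk, this would move the `bool compat` line above
`int err, fput_needed;`.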