[RFC,06/10] net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages

Message ID 20230710223304.1174642-7-almasrymina@google.com
State New
Headers
Series Device Memory TCP |

Commit Message

Mina Almasry July 10, 2023, 10:32 p.m. UTC
  Add an interface for the user to notify the kernel that it is done reading
the NET_RX dmabuf pages returned as cmsg. The kernel will drop the
reference on the NET_RX pages to make them available for re-use.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 include/uapi/asm-generic/socket.h |  1 +
 include/uapi/linux/uio.h          |  4 +++
 net/core/sock.c                   | 41 +++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+)
  

Comments

Andy Lutomirski July 16, 2023, 11:57 p.m. UTC | #1
On 7/10/23 15:32, Mina Almasry wrote:
> Add an interface for the user to notify the kernel that it is done reading
> the NET_RX dmabuf pages returned as cmsg. The kernel will drop the
> reference on the NET_RX pages to make them available for re-use.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---

> +		for (i = 0; i < num_tokens; i++) {
> +			for (j = 0; j < tokens[i].token_count; j++) {
> +				struct page *pg = xa_erase(&sk->sk_pagepool,
> +							   tokens[i].token_start + j);
> +
> +				if (pg)
> +					put_page(pg);
> +				else
> +					/* -EINTR here notifies the userspace
> +					 * that not all tokens passed to it have
> +					 * been freed.
> +					 */
> +					ret = -EINTR;

Unless I'm missing something, this type of error reporting is 
unrecoverable -- userspace doesn't know how many tokens have been freed.

I think you should either make it explicitly unrecoverable (somehow shut 
down dmabuf handling entirely) or tell userspace how many tokens were 
successfully freed.

--Andy
  
Mina Almasry July 17, 2023, 2:06 a.m. UTC | #2
On Sun, Jul 16, 2023 at 4:57 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On 7/10/23 15:32, Mina Almasry wrote:
> > Add an interface for the user to notify the kernel that it is done reading
> > the NET_RX dmabuf pages returned as cmsg. The kernel will drop the
> > reference on the NET_RX pages to make them available for re-use.
> >
> > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > ---
>
> > +             for (i = 0; i < num_tokens; i++) {
> > +                     for (j = 0; j < tokens[i].token_count; j++) {
> > +                             struct page *pg = xa_erase(&sk->sk_pagepool,
> > +                                                        tokens[i].token_start + j);
> > +
> > +                             if (pg)
> > +                                     put_page(pg);
> > +                             else
> > +                                     /* -EINTR here notifies the userspace
> > +                                      * that not all tokens passed to it have
> > +                                      * been freed.
> > +                                      */
> > +                                     ret = -EINTR;
>
> Unless I'm missing something, this type of error reporting is
> unrecoverable -- userspace doesn't know how many tokens have been freed.
>
> I think you should either make it explicitly unrecoverable (somehow shut
> down dmabuf handling entirely) or tell userspace how many tokens were
> successfully freed.
>

Thank you, the latter suggestion sounds perfect actually. The user
can't do much if the kernel fails to free all the tokens, but at least
it can know that something wrong is happening and can log an error for
further debugging. Great suggestion! Thanks!
  

Patch

diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 88f9234f78cb..2a5a7f5da358 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -132,6 +132,7 @@ 
 
 #define SO_RCVMARK		75
 
+#define SO_DEVMEM_DONTNEED	97
 #define SO_DEVMEM_HEADER	98
 #define SCM_DEVMEM_HEADER	SO_DEVMEM_HEADER
 #define SO_DEVMEM_OFFSET	99
diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h
index 8b0be0f50838..faaa765fd5a4 100644
--- a/include/uapi/linux/uio.h
+++ b/include/uapi/linux/uio.h
@@ -26,6 +26,10 @@  struct cmsg_devmem {
 	__u32 frag_token;
 };
 
+struct devmemtoken {
+	__u32 token_start;
+	__u32 token_count;
+};
 /*
  *	UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1)
  */
diff --git a/net/core/sock.c b/net/core/sock.c
index 24f2761bdb1d..f9b9d9ec7322 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1531,7 +1531,48 @@  int sk_setsockopt(struct sock *sk, int level, int optname,
 		/* Paired with READ_ONCE() in tcp_rtx_synack() */
 		WRITE_ONCE(sk->sk_txrehash, (u8)val);
 		break;
+	case SO_DEVMEM_DONTNEED: {
+		struct devmemtoken tokens[128];
+		unsigned int num_tokens, i, j;
 
+		if (sk->sk_type != SOCK_STREAM ||
+		    sk->sk_protocol != IPPROTO_TCP) {
+			ret = -EBADF;
+			break;
+		}
+
+		if (optlen % sizeof(struct devmemtoken) ||
+		    optlen > sizeof(tokens)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		num_tokens = optlen / sizeof(struct devmemtoken);
+		if (copy_from_sockptr(tokens, optval, optlen)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = 0;
+
+		for (i = 0; i < num_tokens; i++) {
+			for (j = 0; j < tokens[i].token_count; j++) {
+				struct page *pg = xa_erase(&sk->sk_pagepool,
+							   tokens[i].token_start + j);
+
+				if (pg)
+					put_page(pg);
+				else
+					/* -EINTR here notifies the userspace
+					 * that not all tokens passed to it have
+					 * been freed.
+					 */
+					ret = -EINTR;
+			}
+		}
+
+		break;
+	}
 	default:
 		ret = -ENOPROTOOPT;
 		break;