Message ID | 20230329141354.516864-5-dhowells@redhat.com |
---|---|
State | New |
Headers | From: David Howells <dhowells@redhat.com> To: Matthew Wilcox <willy@infradead.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com> Cc: David Howells <dhowells@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@infradead.org>, Jens Axboe <axboe@kernel.dk>, Jeff Layton <jlayton@kernel.org>, Christian Brauner <brauner@kernel.org>, Chuck Lever III <chuck.lever@oracle.com>, Linus Torvalds <torvalds@linux-foundation.org>, netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn <willemdebruijn.kernel@gmail.com> Subject: [RFC PATCH v2 04/48] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag Date: Wed, 29 Mar 2023 15:13:10 +0100 Message-Id: <20230329141354.516864-5-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> List-ID: <linux-kernel.vger.kernel.org> |
Series | splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES) |
Commit Message
David Howells
March 29, 2023, 2:13 p.m. UTC
Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
network protocol that it should splice pages from the source iterator
rather than copying the data if it can. This flag is added to a list that
is cleared by sendmsg and recvmsg syscalls on entry.

This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
include/linux/socket.h | 3 +++
net/socket.c | 7 +++++++
2 files changed, 10 insertions(+)
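
The masking described in the commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the flag values are copied from the include/linux/socket.h hunk of this patch, but the DEMO_ prefix and the three helper functions are hypothetical names invented for illustration and do not exist in the kernel. It shows the intended asymmetry: flags arriving through a syscall entry point lose the internal bit, while an in-kernel caller that never goes through that entry point can still set it.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the patch; the DEMO_ prefix marks these as
 * illustrative userspace copies, not the real kernel definitions. */
#define DEMO_MSG_ZEROCOPY       0x4000000u
#define DEMO_MSG_SPLICE_PAGES   0x8000000u
#define DEMO_MSG_INTERNAL_FLAGS (DEMO_MSG_SPLICE_PAGES)

/* Masking must never strip a userspace-visible flag. */
static_assert((DEMO_MSG_INTERNAL_FLAGS & DEMO_MSG_ZEROCOPY) == 0,
	      "internal mask overlaps a user-visible flag");

/* Hypothetical stand-in for a syscall entry point such as __sys_sendmsg():
 * clear internal-only bits from flags supplied by userspace. */
static unsigned int demo_syscall_entry_flags(unsigned int user_flags)
{
	return user_flags & ~DEMO_MSG_INTERNAL_FLAGS;
}

/* Hypothetical stand-in for an in-kernel caller, which bypasses the
 * syscall entry point and may therefore set the internal flag itself. */
static unsigned int demo_kernel_caller_flags(unsigned int base_flags)
{
	return base_flags | DEMO_MSG_SPLICE_PAGES;
}

/* What a protocol's sendmsg implementation would test for. */
static bool demo_wants_splice(unsigned int flags)
{
	return (flags & DEMO_MSG_SPLICE_PAGES) != 0;
}
```

In this model, a userspace request carrying both DEMO_MSG_SPLICE_PAGES and DEMO_MSG_ZEROCOPY comes out of demo_syscall_entry_flags() with only DEMO_MSG_ZEROCOPY set, so demo_wants_splice() is false for it, whereas flags built by demo_kernel_caller_flags() keep the splice hint.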
Comments
David Howells wrote:
> Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
> network protocol that it should splice pages from the source iterator
> rather than copying the data if it can. This flag is added to a list that
> is cleared by sendmsg and recvmsg syscalls on entry.
>
> This is intended as a replacement for the ->sendpage() op, allowing a way
> to splice in several multipage folios in one go.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: netdev@vger.kernel.org
> ---
>  include/linux/socket.h | 3 +++
>  net/socket.c | 7 +++++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 13c3a237b9c9..c2fa0f800999 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -327,6 +327,7 @@ struct ucred {
>   */
>
>  #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */
> +#define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */
>  #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */
>  #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file
>  					   descriptor received through
> @@ -337,6 +338,8 @@ struct ucred {
>  #define MSG_CMSG_COMPAT 0 /* We never have 32 bit fixups */
>  #endif
>
> +/* Flags to be cleared on entry by sendmsg, recvmsg, sendmmsg and recvmmsg syscalls */
> +#define MSG_INTERNAL_FLAGS (MSG_SPLICE_PAGES)

This is fine, but there is no real need to cover both send and receive.
The sendpage internal flags only ensure that those flags cannot enter
sendpage code from any unintentional path. Indeed those "internal" flags
can end up in sendmsg, at least for UDP.
Similarly, this flag set only has to protect sendto and sendmsg. That can
simplify the patch a bit.

>  /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
>  #define SOL_IP 0
> diff --git a/net/socket.c b/net/socket.c
> index 6bae8ce7059e..dfb912bbed62 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2139,6 +2139,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
>  		msg.msg_name = (struct sockaddr *)&address;
>  		msg.msg_namelen = addr_len;
>  	}
> +	flags &= ~MSG_INTERNAL_FLAGS;
>  	if (sock->file->f_flags & O_NONBLOCK)
>  		flags |= MSG_DONTWAIT;
>  	msg.msg_flags = flags;
> @@ -2192,6 +2193,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
>  	if (!sock)
>  		goto out;
>
> +	flags &= ~MSG_INTERNAL_FLAGS;
>  	if (sock->file->f_flags & O_NONBLOCK)
>  		flags |= MSG_DONTWAIT;
>  	err = sock_recvmsg(sock, &msg, flags);
> @@ -2579,6 +2581,7 @@ long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
>
>  	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
>  		return -EINVAL;
> +	flags &= ~MSG_INTERNAL_FLAGS;
>
>  	sock = sockfd_lookup_light(fd, &err, &fput_needed);
>  	if (!sock)
> @@ -2627,6 +2630,7 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
>  	entry = mmsg;
>  	compat_entry = (struct compat_mmsghdr __user *)mmsg;
>  	err = 0;
> +	flags &= ~MSG_INTERNAL_FLAGS;
>  	flags |= MSG_BATCH;
>

No need to modify __sys_sendmmsg explicitly, as it ends up calling
__sys_sendmsg?

Also, sendpage does this flags masking in the internal sock_FUNC helpers
rather than __sys_FUNC. Might be preferable.

>  	while (datagrams < vlen) {
> @@ -2775,6 +2779,7 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
>  			struct user_msghdr __user *umsg,
>  			struct sockaddr __user *uaddr, unsigned int flags)
>  {
> +	flags &= ~MSG_INTERNAL_FLAGS;
>  	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
>  }
>
> @@ -2787,6 +2792,7 @@ long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
>
>  	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
>  		return -EINVAL;
> +	flags &= ~MSG_INTERNAL_FLAGS;
>
>  	sock = sockfd_lookup_light(fd, &err, &fput_needed);
>  	if (!sock)
> @@ -2839,6 +2845,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
>  			goto out_put;
>  		}
>  	}
> +	flags &= ~MSG_INTERNAL_FLAGS;
>
>  	entry = mmsg;
>  	compat_entry = (struct compat_mmsghdr __user *)mmsg;
>

Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> No need to modify __sys_sendmmsg explicitly, as it ends up calling
> __sys_sendmsg?
>
> Also, sendpage does this flags masking in the internal sock_FUNC
> helpers rather than __sys_FUNC. Might be preferable.

I was wondering whether other flags, such as MSG_BATCH should be added to the
list. Is it bad if userspace sets that in sendmsg()? AF_KCM, at least, looks
at it.

David

David Howells wrote:
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> > No need to modify __sys_sendmmsg explicitly, as it ends up calling
> > __sys_sendmsg?
> >
> > Also, sendpage does this flags masking in the internal sock_FUNC
> > helpers rather than __sys_FUNC. Might be preferable.
>
> I was wondering whether other flags, such as MSG_BATCH should be added to the
> list. Is it bad if userspace sets that in sendmsg()? AF_KCM, at least, looks
> at it.

That flag was added exactly for AF_KCM. A process that explicitly sets it
might experience bad behavior (increased latency), but there are no legacy
AF_KCM applications that precede the flag.

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 13c3a237b9c9..c2fa0f800999 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -327,6 +327,7 @@ struct ucred {
  */
 
 #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */
+#define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */
 #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file
 					   descriptor received through
@@ -337,6 +338,8 @@ struct ucred {
 #define MSG_CMSG_COMPAT 0 /* We never have 32 bit fixups */
 #endif
 
+/* Flags to be cleared on entry by sendmsg, recvmsg, sendmmsg and recvmmsg syscalls */
+#define MSG_INTERNAL_FLAGS (MSG_SPLICE_PAGES)
 
 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP 0
diff --git a/net/socket.c b/net/socket.c
index 6bae8ce7059e..dfb912bbed62 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2139,6 +2139,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
+	flags &= ~MSG_INTERNAL_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
@@ -2192,6 +2193,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	if (!sock)
 		goto out;
 
+	flags &= ~MSG_INTERNAL_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = sock_recvmsg(sock, &msg, flags);
@@ -2579,6 +2581,7 @@ long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
+	flags &= ~MSG_INTERNAL_FLAGS;
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (!sock)
@@ -2627,6 +2630,7 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 	entry = mmsg;
 	compat_entry = (struct compat_mmsghdr __user *)mmsg;
 	err = 0;
+	flags &= ~MSG_INTERNAL_FLAGS;
 	flags |= MSG_BATCH;
 
 	while (datagrams < vlen) {
@@ -2775,6 +2779,7 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
+	flags &= ~MSG_INTERNAL_FLAGS;
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }
 
@@ -2787,6 +2792,7 @@ long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
+	flags &= ~MSG_INTERNAL_FLAGS;
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (!sock)
@@ -2839,6 +2845,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 			goto out_put;
 		}
 	}
+	flags &= ~MSG_INTERNAL_FLAGS;
 
 	entry = mmsg;
 	compat_entry = (struct compat_mmsghdr __user *)mmsg;
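
As a sanity check on the value chosen in the socket.h hunk, the following userspace sketch verifies that the new flag is a single bit that does not collide with the neighbouring flags visible in that hunk. It is an assumption-laden illustration: the DEMO_ constants are copies of the values shown in the diff, demo_flag_bit() is a hypothetical helper, and flags defined elsewhere in the header are not covered.

```c
#include <assert.h>

/* Illustrative copies of the values visible in the socket.h hunk. */
#define DEMO_MSG_ZEROCOPY      0x4000000u
#define DEMO_MSG_SPLICE_PAGES  0x8000000u
#define DEMO_MSG_FASTOPEN      0x20000000u
#define DEMO_MSG_CMSG_CLOEXEC  0x40000000u

/* The new flag must not share a bit with the neighbours shown in the hunk. */
static_assert((DEMO_MSG_SPLICE_PAGES &
	       (DEMO_MSG_ZEROCOPY | DEMO_MSG_FASTOPEN | DEMO_MSG_CMSG_CLOEXEC)) == 0,
	      "MSG_SPLICE_PAGES collides with a neighbouring flag");

/* Hypothetical helper: return the bit index of a single-bit flag,
 * or -1 if the value does not have exactly one bit set. */
static int demo_flag_bit(unsigned int flag)
{
	int bit = 0;

	if (flag == 0 || (flag & (flag - 1)) != 0)
		return -1;
	while ((flag >>= 1) != 0)
		bit++;
	return bit;
}
```

With these values, demo_flag_bit() reports bit 26 for MSG_ZEROCOPY and bit 27 for MSG_SPLICE_PAGES, so the new flag takes the bit directly above MSG_ZEROCOPY.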