[net-next,v2,08/10] crypto: af_alg: Support MSG_SPLICE_PAGES
Commit Message
Make AF_ALG sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
---
crypto/af_alg.c | 28 ++++++++++++++++++++++++++--
crypto/algif_aead.c | 22 +++++++++++-----------
crypto/algif_skcipher.c | 8 ++++----
3 files changed, 41 insertions(+), 17 deletions(-)
Comments
On Tue, 2023-05-30 at 15:16 +0100, David Howells wrote:
> Make AF_ALG sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
> spliced from the source iterator.
>
> This allows ->sendpage() to be replaced by something that can handle
> multiple multipage folios in a single transaction.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Herbert Xu <herbert@gondor.apana.org.au>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-crypto@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
> crypto/af_alg.c | 28 ++++++++++++++++++++++++++--
> crypto/algif_aead.c | 22 +++++++++++-----------
> crypto/algif_skcipher.c | 8 ++++----
> 3 files changed, 41 insertions(+), 17 deletions(-)
>
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index fd56ccff6fed..62f4205d42e3 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -940,6 +940,10 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
> bool init = false;
> int err = 0;
>
> + if ((msg->msg_flags & MSG_SPLICE_PAGES) &&
> + !iov_iter_is_bvec(&msg->msg_iter))
> + return -EINVAL;
> +
> if (msg->msg_controllen) {
> err = af_alg_cmsg_send(msg, &con);
> if (err)
> @@ -985,7 +989,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
> while (size) {
> struct scatterlist *sg;
> size_t len = size;
> - size_t plen;
> + ssize_t plen;
>
> /* use the existing memory in an allocated page */
> if (ctx->merge) {
> @@ -1030,7 +1034,27 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
> if (sgl->cur)
> sg_unmark_end(sg + sgl->cur - 1);
>
> - if (1 /* TODO check MSG_SPLICE_PAGES */) {
> + if (msg->msg_flags & MSG_SPLICE_PAGES) {
> + struct sg_table sgtable = {
> + .sgl = sg,
> + .nents = sgl->cur,
> + .orig_nents = sgl->cur,
> + };
> +
> + plen = extract_iter_to_sg(&msg->msg_iter, len, &sgtable,
> + MAX_SGL_ENTS, 0);
It looks like the above expects/supports only ITER_BVEC iterators. What
about adding a WARN_ON_ONCE(<other iov type>)?
Also, I'm keeping this series a bit more in pw to allow Herbert or
others to have a look.
Cheers,
Paolo
Paolo Abeni <pabeni@redhat.com> wrote:
> > + if ((msg->msg_flags & MSG_SPLICE_PAGES) &&
> > + !iov_iter_is_bvec(&msg->msg_iter))
> > + return -EINVAL;
> > +
> ...
> It looks like the above expects/supports only ITER_BVEC iterators. What
> about adding a WARN_ON_ONCE(<other iov type>)?
Meh. I relaxed that requirement as I'm now using tools to extract stuff from
any iterator (extract_iter_to_sg() in this case) rather than walking the
bvec[] directly. I forgot to remove the check from af_alg. I can add an
extra patch to remove it. Also, it probably doesn't matter for AF_ALG since
that's only likely to be called from userspace, either directly (which will
not set MSG_SPLICE_PAGES) or via splice (which will pass a BVEC). Internal
kernel code will use crypto API directly.
> Also, I'm keeping this series a bit more in pw to allow Herbert or
> others to have a look.
Thanks.
David
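
To make the point under discussion concrete, here is a minimal user-space
sketch of the gate at the top of af_alg_sendmsg(). It is a model, not kernel
code: enum iter_kind, struct model_msg, MODEL_MSG_SPLICE_PAGES and both gate
functions are hypothetical names invented for illustration. It only shows
which combinations the v2 check rejects and what dropping the check, as
proposed above, would let through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's iterator types and message flag. */
enum iter_kind {
	ITER_KIND_UBUF,
	ITER_KIND_IOVEC,
	ITER_KIND_BVEC,
	ITER_KIND_KVEC,
};

#define MODEL_MSG_SPLICE_PAGES 0x1u	/* illustrative value only */

struct model_msg {
	unsigned int flags;
	enum iter_kind kind;
};

/* Gate as written in v2: a splice request must carry a BVEC iterator. */
static int gate_v2(const struct model_msg *msg)
{
	assert(msg != NULL);

	if ((msg->flags & MODEL_MSG_SPLICE_PAGES) &&
	    msg->kind != ITER_KIND_BVEC)
		return -EINVAL;
	return 0;
}

/*
 * Gate with the check dropped: the extraction helper is assumed to cope
 * with any iterator kind, so nothing is rejected here.
 */
static int gate_relaxed(const struct model_msg *msg)
{
	(void)msg;
	return 0;
}
```

In this model a direct userspace sendmsg() (flag clear, any iterator kind) and
a splice (flag set, BVEC) pass both gates. Only a caller that sets the flag
with a non-BVEC iterator sees a difference: gate_v2() returns -EINVAL and
gate_relaxed() returns 0.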
On Thu, 2023-06-01 at 12:35 +0100, David Howells wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > > + if ((msg->msg_flags & MSG_SPLICE_PAGES) &&
> > > + !iov_iter_is_bvec(&msg->msg_iter))
> > > + return -EINVAL;
> > > +
> > ...
> > It looks like the above expects/supports only ITER_BVEC iterators. What
> > about adding a WARN_ON_ONCE(<other iov type>)?
>
> Meh. I relaxed that requirement as I'm now using tools to extract stuff from
> any iterator (extract_iter_to_sg() in this case) rather than walking the
> bvec[] directly. I forgot to remove the check from af_alg. I can add an
> extra patch to remove it. Also, it probably doesn't matter for AF_ALG since
> that's only likely to be called from userspace, either directly (which will
> not set MSG_SPLICE_PAGES) or via splice (which will pass a BVEC). Internal
> kernel code will use crypto API directly.
Thank you for the clarification, I got lost a bit. The patch LGTM as
is.
>
> > Also, I'm keeping this series a bit more in pw to allow Herbert or
> > others to have a look.
@Herbert, the series LGTM, I think we should apply it. If you have any
concerns, please voice them soon!
Thanks,
Paolo
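
One detail of the patch is easy to miss: plen changes from size_t to ssize_t.
The new branch tests "plen < 0" after extract_iter_to_sg(), and an unsigned
variable can never satisfy that test. The following user-space sketch
illustrates the pitfall and the accounting that follows a successful
extraction. It is an illustration under stated assumptions, not kernel code:
fake_extract(), result_as_unsigned(), struct counters and account() are all
hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical extractor: bytes extracted on success, negative errno on failure. */
static ssize_t fake_extract(size_t want, bool fail)
{
	return fail ? (ssize_t)-EFAULT : (ssize_t)want;
}

/* What a size_t variable would hold: a negative errno wraps to a huge value. */
static size_t result_as_unsigned(size_t want, bool fail)
{
	return (size_t)fake_extract(want, fail);
}

/* Hypothetical mirror of the counters the splice branch updates. */
struct counters {
	size_t len;
	size_t used;
	size_t copied;
	size_t size;
};

/* Signed result: the error is visible and the counters are left untouched. */
static int account(struct counters *c, ssize_t plen)
{
	assert(c != NULL);

	if (plen < 0)
		return (int)plen;

	c->len -= (size_t)plen;
	c->used += (size_t)plen;
	c->copied += (size_t)plen;
	c->size -= (size_t)plen;
	return 0;
}
```

With a signed plen, a failed extraction is returned as an error before any
counter moves. Stored in a size_t, the same failure would look like an
enormous successful length and would be subtracted from len and size.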
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index fd56ccff6fed..62f4205d42e3 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -940,6 +940,10 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 	bool init = false;
 	int err = 0;
 
+	if ((msg->msg_flags & MSG_SPLICE_PAGES) &&
+	    !iov_iter_is_bvec(&msg->msg_iter))
+		return -EINVAL;
+
 	if (msg->msg_controllen) {
 		err = af_alg_cmsg_send(msg, &con);
 		if (err)
@@ -985,7 +989,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 	while (size) {
 		struct scatterlist *sg;
 		size_t len = size;
-		size_t plen;
+		ssize_t plen;
 
 		/* use the existing memory in an allocated page */
 		if (ctx->merge) {
@@ -1030,7 +1034,27 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		if (sgl->cur)
 			sg_unmark_end(sg + sgl->cur - 1);
 
-		if (1 /* TODO check MSG_SPLICE_PAGES */) {
+		if (msg->msg_flags & MSG_SPLICE_PAGES) {
+			struct sg_table sgtable = {
+				.sgl = sg,
+				.nents = sgl->cur,
+				.orig_nents = sgl->cur,
+			};
+
+			plen = extract_iter_to_sg(&msg->msg_iter, len, &sgtable,
+						  MAX_SGL_ENTS, 0);
+			if (plen < 0) {
+				err = plen;
+				goto unlock;
+			}
+
+			for (; sgl->cur < sgtable.nents; sgl->cur++)
+				get_page(sg_page(&sg[sgl->cur]));
+			len -= plen;
+			ctx->used += plen;
+			copied += plen;
+			size -= plen;
+		} else {
 			do {
 				struct page *pg;
 				unsigned int i = sgl->cur;
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -9,8 +9,8 @@
  * The following concept of the memory management is used:
  *
  * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
- * filled by user space with the data submitted via sendpage/sendmsg. Filling
- * up the TX SGL does not cause a crypto operation -- the data will only be
+ * filled by user space with the data submitted via sendpage. Filling up
+ * the TX SGL does not cause a crypto operation -- the data will only be
  * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
  * provide a buffer which is tracked with the RX SGL.
  *
@@ -113,19 +113,19 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
 	}
 
 	/*
-	 * Data length provided by caller via sendmsg/sendpage that has not
-	 * yet been processed.
+	 * Data length provided by caller via sendmsg that has not yet been
+	 * processed.
 	 */
 	used = ctx->used;
 
 	/*
-	 * Make sure sufficient data is present -- note, the same check is
-	 * also present in sendmsg/sendpage. The checks in sendpage/sendmsg
-	 * shall provide an information to the data sender that something is
-	 * wrong, but they are irrelevant to maintain the kernel integrity.
-	 * We need this check here too in case user space decides to not honor
-	 * the error message in sendmsg/sendpage and still call recvmsg. This
-	 * check here protects the kernel integrity.
+	 * Make sure sufficient data is present -- note, the same check is also
+	 * present in sendmsg. The checks in sendmsg shall provide an
+	 * information to the data sender that something is wrong, but they are
+	 * irrelevant to maintain the kernel integrity.  We need this check
+	 * here too in case user space decides to not honor the error message
+	 * in sendmsg and still call recvmsg. This check here protects the
+	 * kernel integrity.
 	 */
 	if (!aead_sufficient_data(sk))
 		return -EINVAL;
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -9,10 +9,10 @@
  * The following concept of the memory management is used:
  *
  * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
- * filled by user space with the data submitted via sendpage/sendmsg. Filling
- * up the TX SGL does not cause a crypto operation -- the data will only be
- * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
- * provide a buffer which is tracked with the RX SGL.
+ * filled by user space with the data submitted via sendmsg. Filling up the TX
+ * SGL does not cause a crypto operation -- the data will only be tracked by
+ * the kernel. Upon receipt of one recvmsg call, the caller must provide a
+ * buffer which is tracked with the RX SGL.
  *
  * During the processing of the recvmsg operation, the cipher request is
  * allocated and prepared. As part of the recvmsg operation, the processed