[v2] coredump: Use vmsplice_to_pipe() for pipes in dump_emit_page()

Message ID 20221031210349.3346-1-yepeilin.cs@gmail.com
State New
Series [v2] coredump: Use vmsplice_to_pipe() for pipes in dump_emit_page()

Commit Message

Peilin Ye Oct. 31, 2022, 9:03 p.m. UTC
  From: Peilin Ye <peilin.ye@bytedance.com>

Currently, dumping VMAs to a pipe handler with dump_emit_page() copies
every page into the pipe.  For example:

  fs/binfmt_elf.c:elf_core_dump()
      fs/coredump.c:dump_user_range()
                     :dump_emit_page()
        fs/read_write.c:__kernel_write_iter()
                fs/pipe.c:pipe_write()
             lib/iov_iter.c:copy_page_from_iter()

Use vmsplice_to_pipe() instead of __kernel_write_iter() to avoid this
copy for pipe handlers.

Tested by dumping a 40-GByte core into a simple handler that splice()s
from stdin to disk in a loop, PIPE_DEF_BUFFERS (16) pages at a time.

                              Before           After   Improved by
  Time to Completion   52.04 seconds   46.30 seconds        11.03%
  CPU Usage                   89.43%          84.90%         5.07%
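For reference, a handler of the kind described above could look roughly
like the following userspace sketch.  It is not the handler used for the
measurements; splice_all() and CHUNK_PAGES are hypothetical names, and
the only real interface relied upon is splice(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: 16 pages per call, mirroring PIPE_DEF_BUFFERS. */
#define CHUNK_PAGES 16

/*
 * Move everything readable from the pipe in_fd to out_fd with splice(2),
 * CHUNK_PAGES pages at a time.  out_fd is typically a regular file that
 * was not opened with O_APPEND.  Returns the number of bytes moved, or
 * -1 on error with errno set by splice(2).
 */
static long long splice_all(int in_fd, int out_fd)
{
	size_t chunk = (size_t)sysconf(_SC_PAGESIZE) * CHUNK_PAGES;
	long long total = 0;

	for (;;) {
		ssize_t n = splice(in_fd, NULL, out_fd, NULL, chunk,
				   SPLICE_F_MOVE);

		if (n == 0)		/* write end closed: dump is complete */
			break;
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		total += n;
	}
	return total;
}
```

A core_pattern pipe handler would call splice_all(STDIN_FILENO, out_fd)
from main(), with out_fd opened on the destination file.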

Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
Changes in v2:
  - fix warning in net/tls/tls_sw.c (kernel test robot)

 fs/coredump.c          | 7 ++++++-
 fs/splice.c            | 4 ++--
 include/linux/splice.h | 3 +++
 3 files changed, 11 insertions(+), 3 deletions(-)
  

Comments

Al Viro Nov. 19, 2022, 4:46 a.m. UTC | #1
On Mon, Oct 31, 2022 at 02:03:49PM -0700, Peilin Ye wrote:

> +	n = vmsplice_to_pipe(file, &iter, 0);
> +	if (n == -EBADF)
> +		n = __kernel_write_iter(cprm->file, &iter, &pos);

Yuck.  If anything, I would rather put a flag into coredump_params
and check it instead; this check for -EBADF is both unidiomatic and
brittle.  Suppose someday somebody looks at vmsplice(2) and
decides that it would make sense to lift the "is it a pipe" check
into e.g. vmsplice_type().  There's no obvious reason not to,
unless one happens to know that coredump relies upon that check done
in vmsplice_to_pipe().  It's asking for trouble several years down
the road.

Make it explicit and independent from details of error checking
in vmsplice(2).
  
Peilin Ye Nov. 30, 2022, 3:40 a.m. UTC | #2
On Sat, Nov 19, 2022 at 04:46:17AM +0000, Al Viro wrote:
> On Mon, Oct 31, 2022 at 02:03:49PM -0700, Peilin Ye wrote:
> 
> > +	n = vmsplice_to_pipe(file, &iter, 0);
> > +	if (n == -EBADF)
> > +		n = __kernel_write_iter(cprm->file, &iter, &pos);
> 
> Yuck.  If anything, I would rather put a flag into coredump_params
> and check it instead; this check for -EBADF is both unidiomatic and
> brittle.  Suppose someday somebody looks at vmsplice(2) and
> decides that it would make sense to lift the "is it a pipe" check
> into e.g. vmsplice_type().  There's no obvious reason not to,
> unless one happens to know that coredump relies upon that check done
> in vmsplice_to_pipe().  It's asking for trouble several years down
> the road.
> 
> Make it explicit and independent from details of error checking
> in vmsplice(2).

Thanks for the review!  I was a bit hesitant about introducing a new
field to coredump_params for this optimization.  Will do it in v3.

Peilin Ye
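
The v3 patch is not part of this thread.  As an illustration only, the
"explicit flag" idea can be modelled in userspace: decide once whether
the destination is a pipe, record that in the parameters, and branch on
the flag instead of on an error code.  struct dump_params, its to_pipe
field, and both helpers are hypothetical; only fstat(2), vmsplice(2) and
write(2) are real interfaces, and this is not kernel code.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for struct coredump_params. */
struct dump_params {
	int fd;
	bool to_pipe;	/* hypothetical flag, decided once at setup time */
};

/* Record the destination and whether it is a pipe.  Returns 0 or -1. */
static int dump_params_init(struct dump_params *p, int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	p->fd = fd;
	p->to_pipe = S_ISFIFO(st.st_mode);
	return 0;
}

/*
 * Emit one buffer.  Returns 1 if all of it was written, 0 otherwise.
 * The path is chosen by the flag, not by probing for EBADF.  On the
 * vmsplice() path the buffer must stay unmodified until the reader has
 * consumed it, since the pipe may reference the caller's pages.
 */
static int dump_emit_buf(struct dump_params *p, const void *buf, size_t len)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t n;

	if (p->to_pipe)
		n = vmsplice(p->fd, &iov, 1, 0);
	else
		n = write(p->fd, buf, len);

	return n == (ssize_t)len;
}
```

With this shape, moving the "is it a pipe" check elsewhere in the
vmsplice path would not change which branch the caller takes.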
  

Patch

diff --git a/fs/coredump.c b/fs/coredump.c
index da0e9525c4e8..c0a8713d9971 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -42,6 +42,7 @@ 
 #include <linux/timekeeping.h>
 #include <linux/sysctl.h>
 #include <linux/elf.h>
+#include <linux/splice.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -862,7 +863,11 @@  static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 		return 0;
 	pos = file->f_pos;
 	iov_iter_bvec(&iter, WRITE, &bvec, 1, PAGE_SIZE);
-	n = __kernel_write_iter(cprm->file, &iter, &pos);
+
+	n = vmsplice_to_pipe(file, &iter, 0);
+	if (n == -EBADF)
+		n = __kernel_write_iter(cprm->file, &iter, &pos);
+
 	if (n != PAGE_SIZE)
 		return 0;
 	file->f_pos = pos;
diff --git a/fs/splice.c b/fs/splice.c
index 0878b852b355..2051700cda79 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1234,8 +1234,8 @@  static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
  * as splice-from-memory, where the regular splice is splice-from-file (or
  * to file). In both cases the output is a pipe, naturally.
  */
-static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
-			     unsigned int flags)
+long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
+		      unsigned int flags)
 {
 	struct pipe_inode_info *pipe;
 	long ret = 0;
diff --git a/include/linux/splice.h b/include/linux/splice.h
index a55179fd60fc..38b3560a318b 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -10,6 +10,7 @@ 
 #define SPLICE_H
 
 #include <linux/pipe_fs_i.h>
+#include <linux/uio.h>
 
 /*
  * Flags passed in from splice/tee/vmsplice
@@ -81,6 +82,8 @@  extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 extern long do_splice(struct file *in, loff_t *off_in,
 		      struct file *out, loff_t *off_out,
 		      size_t len, unsigned int flags);
+extern long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
+			     unsigned int flags);
 
 extern long do_tee(struct file *in, struct file *out, size_t len,
 		   unsigned int flags);