Message ID | 20230519074047.1739879-24-dhowells@redhat.com |
---|---|
State | New |
Headers | From: David Howells <dhowells@redhat.com> To: Jens Axboe <axboe@kernel.dk>, Al Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@infradead.org> Cc: David Howells <dhowells@redhat.com>, Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>, Jeff Layton <jlayton@kernel.org>, David Hildenbrand <david@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>, Logan Gunthorpe <logang@deltatee.com>, Hillf Danton <hdanton@sina.com>, Christian Brauner <brauner@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>, Steven Rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org>, linux-trace-kernel@vger.kernel.org Subject: [PATCH v20 23/32] splice: Convert trace/seq to use direct_splice_read() Date: Fri, 19 May 2023 08:40:38 +0100 In-Reply-To: <20230519074047.1739879-1-dhowells@redhat.com> References: <20230519074047.1739879-1-dhowells@redhat.com> |
Series | splice, block: Use page pinning and kill ITER_PIPE |
Commit Message
David Howells
May 19, 2023, 7:40 a.m. UTC
For the splice from the trace seq buffer, just use direct_splice_read().
In the future, something better can probably be done by gifting pages from
seq->buf into the pipe, but that would require changing seq->buf into a
vmap over an array of pages.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: Masami Hiramatsu <mhiramat@kernel.org>
cc: linux-kernel@vger.kernel.org
cc: linux-trace-kernel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Fri, 19 May 2023 08:40:38 +0100
David Howells <dhowells@redhat.com> wrote:

> For the splice from the trace seq buffer, just use direct_splice_read().
>
> In the future, something better can probably be done by gifting pages from
> seq->buf into the pipe, but that would require changing seq->buf into a
> vmap over an array of pages.

If you can give me a POC of what needs to be done, I could possibly
implement it.

>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Steven Rostedt <rostedt@goodmis.org>
> cc: Masami Hiramatsu <mhiramat@kernel.org>
> cc: linux-kernel@vger.kernel.org
> cc: linux-trace-kernel@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-block@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>  kernel/trace/trace.c | 2 +-

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index ebc59781456a..b664020efcb7 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5171,7 +5171,7 @@ static const struct file_operations tracing_fops = {
>  	.open		= tracing_open,
>  	.read		= seq_read,
>  	.read_iter	= seq_read_iter,
> -	.splice_read	= generic_file_splice_read,
> +	.splice_read	= direct_splice_read,
>  	.write		= tracing_write_stub,
>  	.llseek		= tracing_lseek,
>  	.release	= tracing_release,
Steven Rostedt <rostedt@goodmis.org> wrote:

> > In the future, something better can probably be done by gifting pages from
> > seq->buf into the pipe, but that would require changing seq->buf into a
> > vmap over an array of pages.
>
> If you can give me a POC of what needs to be done, I could possibly
> implement it.

I wrote my idea up here for Masami[*]:

We could implement seq_splice_read(). What we would need to do is to change
how the seq buffer is allocated: bulk allocate a bunch of arbitrary pages
which we then vmap(). When we need to splice, we read into the buffer, do a
vunmap() and then splice the pages holding the data we used into the pipe.

If we don't manage to splice all the data, we can continue splicing from the
pages we have left next time. If a read() comes along to view partially
spliced data, we would need to copy from the individual pages.

When we use up all the data, we discard all the pages we might have spliced
from and shuffle down the other pages, call the bulk allocator to replenish
the buffer and then vmap() it again. Any pages we've spliced from must be
discarded and replaced and not rewritten.

If a read() comes without the buffer having been spliced from, it can do as
it does now.

David
---
[*] https://lore.kernel.org/linux-fsdevel/20230522-pfund-ferngeblieben-53fad9c0e527@brauner/T/#mc03959454c76cc3f29024b092c62d88c90f7c071
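The page bookkeeping described above can be modelled in userspace. The following is a minimal sketch, not kernel code: `struct seqbuf_model`, `model_init()`, `model_write()`, `model_splice()` and `model_refill()` are hypothetical names, `calloc()` stands in for the bulk page allocator, and the vmap()/vunmap() steps are not modelled at all. It only illustrates the ownership rule that pages handed to the pipe are replaced rather than rewritten.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096u
#define NPAGES  4u

/* Hypothetical userspace model of a page-backed seq buffer. */
struct seqbuf_model {
	char  *pages[NPAGES]; /* stand-in for the bulk-allocated pages */
	size_t used;          /* bytes of data written into the buffer */
	size_t spliced;       /* leading pages already handed to the pipe */
};

static int model_init(struct seqbuf_model *m)
{
	memset(m, 0, sizeof(*m));
	for (size_t i = 0; i < NPAGES; i++) {
		m->pages[i] = calloc(1, PAGE_SZ);
		if (!m->pages[i])
			return -1;
	}
	return 0;
}

/* Append data page by page; returns the number of bytes stored. */
static size_t model_write(struct seqbuf_model *m, const char *data, size_t len)
{
	size_t done = 0;

	if (m->spliced)
		return 0; /* spliced pages must not be rewritten: refill first */
	while (done < len && m->used < NPAGES * PAGE_SZ) {
		size_t off = m->used % PAGE_SZ;
		size_t chunk = PAGE_SZ - off;

		if (chunk > len - done)
			chunk = len - done;
		memcpy(m->pages[m->used / PAGE_SZ] + off, data + done, chunk);
		m->used += chunk;
		done += chunk;
	}
	return done;
}

/* Hand ownership of up to max data-bearing pages to the caller (the "pipe"). */
static size_t model_splice(struct seqbuf_model *m, char **out, size_t max)
{
	size_t data_pages = (m->used + PAGE_SZ - 1) / PAGE_SZ;
	size_t n = 0;

	while (m->spliced < data_pages && n < max) {
		out[n++] = m->pages[m->spliced];
		m->pages[m->spliced++] = NULL; /* the buffer no longer owns it */
	}
	return n;
}

/* Once all data is spliced: shuffle untouched pages down and replenish. */
static int model_refill(struct seqbuf_model *m)
{
	size_t data_pages = (m->used + PAGE_SZ - 1) / PAGE_SZ;
	size_t keep = NPAGES - m->spliced;

	if (m->spliced < data_pages)
		return -1; /* unspliced data remains; keep splicing first */
	memmove(m->pages, m->pages + m->spliced, keep * sizeof(m->pages[0]));
	for (size_t i = keep; i < NPAGES; i++) {
		m->pages[i] = calloc(1, PAGE_SZ);
		if (!m->pages[i])
			return -1;
	}
	m->used = 0;
	m->spliced = 0;
	return 0;
}
```

For example, after writing 5000 bytes, two pages hold data. Splicing them one at a time and then calling `model_refill()` leaves the two untouched pages at the front of the array, followed by two freshly allocated ones.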
On Mon, May 22, 2023 at 7:50 AM David Howells <dhowells@redhat.com> wrote:
>
> We could implement seq_splice_read(). What we would need to do is to change
> how the seq buffer is allocated: bulk allocate a bunch of arbitrary pages
> which we then vmap(). When we need to splice, we read into the buffer, do a
> vunmap() and then splice the pages holding the data we used into the pipe.

Please don't use vmap as a way to do zero-copy.

The virtual mapping games are more expensive than a small copy from
some random seq file.

Yes, yes, seq_file currently uses "kvmalloc()", which does fall back
to vmalloc too. But the keyword there is "falls back". Most of the
time it's just a regular boring kmalloc, and most of the time a
seq-file is tiny.

Linus
On Mon, 22 May 2023 10:42:12 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, May 22, 2023 at 7:50 AM David Howells <dhowells@redhat.com> wrote:
> >
> > We could implement seq_splice_read(). What we would need to do is to change
> > how the seq buffer is allocated: bulk allocate a bunch of arbitrary pages
> > which we then vmap(). When we need to splice, we read into the buffer, do a
> > vunmap() and then splice the pages holding the data we used into the pipe.
>
> Please don't use vmap as a way to do zero-copy.
>
> The virtual mapping games are more expensive than a small copy from
> some random seq file.
>
> Yes, yes, seq_file currently uses "kvmalloc()", which does fall back
> to vmalloc too. But the keyword there is "falls back". Most of the
> time it's just a regular boring kmalloc, and most of the time a
> seq-file is tiny.

I was thinking this change had to do with the splice callback for
trace_pipe_raw (which is a hot path that does zero copy of the ftrace ring
buffer into files). But looking at this further, I see that it's for just
the "trace" file, which is a textual conversion of the tracing data (slow
path, although some user space uses this and parses the text, which IMHO
is wrong).

In other words, I don't really care much about this code being "efficient".

-- Steve
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebc59781456a..b664020efcb7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5171,7 +5171,7 @@ static const struct file_operations tracing_fops = {
 	.open		= tracing_open,
 	.read		= seq_read,
 	.read_iter	= seq_read_iter,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= direct_splice_read,
 	.write		= tracing_write_stub,
 	.llseek		= tracing_lseek,
 	.release	= tracing_release,
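
From userspace, the hook changed above is reached through splice(2) with the file as the input side and a pipe as the output side. What follows is a minimal sketch, assuming Linux and using an ordinary file as a stand-in for the tracefs "trace" file (which normally needs root to read). `splice_file_to_buf()` is a hypothetical helper name, not something from the patch.

```c
#define _GNU_SOURCE /* for splice() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to cap bytes from the file at path into out,
 * going through a pipe with splice(2). Returns bytes copied, or -1 on error.
 */
static ssize_t splice_file_to_buf(const char *path, char *out, size_t cap)
{
	int pfd[2];
	ssize_t ret = -1;
	size_t total = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (pipe(pfd) < 0) {
		close(fd);
		return -1;
	}

	while (total < cap) {
		/* file -> pipe: the kernel services this via the file's ->splice_read */
		ssize_t n = splice(fd, NULL, pfd[1], NULL, cap - total, 0);

		if (n < 0)
			goto out;
		if (n == 0)
			break; /* end of file */

		/* pipe -> user buffer: drain exactly what was just spliced */
		for (ssize_t got = 0; got < n; ) {
			ssize_t r = read(pfd[0], out + total + got, (size_t)(n - got));

			if (r <= 0)
				goto out;
			got += r;
		}
		total += (size_t)n;
	}
	ret = (ssize_t)total;
out:
	close(pfd[0]);
	close(pfd[1]);
	close(fd);
	return ret;
}
```

A caller would pass a path and a buffer, for instance `splice_file_to_buf(path, buf, sizeof(buf))`. The pipe is drained after every splice() call so that a file larger than the pipe's capacity cannot stall the single-threaded loop.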