Message ID | 167391048988.2311931.1567396746365286847.stgit@warthog.procyon.org.uk |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp1446800wrn; Mon, 16 Jan 2023 15:10:49 -0800 (PST) X-Google-Smtp-Source: AMrXdXv79gP/rW+UIdcG+W2G/yhK5K9s452DpWDgBKyintSe+qL4Z01quL69Cj7PSEG/WEeClySQ X-Received: by 2002:a05:6402:5110:b0:49d:fff2:d4d7 with SMTP id m16-20020a056402511000b0049dfff2d4d7mr1122547edd.30.1673910649250; Mon, 16 Jan 2023 15:10:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1673910649; cv=none; d=google.com; s=arc-20160816; b=WvTOYyjoTf4uHvys+7myZrUgvjeWCwtPfacKAtAhdijEm8hbQcgfs4nroXcgb2bjWg pvb5oH4A8H6rdW67vFU2+OXmRZ5N7bDS6CqJgAUfofgcoSAnvYMonGxh8maSLVKpzuC8 u36XM72jROPW35yMs3MF39m1IqJSSIwAvJdky+gkzn7JjN04j2CEAES0XY5YXuaTH/tc D0mphHTCR6Xbgy5wGmKxmSolK0h/WBJ672xr5SNMzOPkN3NBLEVhErn7n8gWWrkmdr8G mha2AOJhfLUo6RpSJT85M6kPV+SCZrjYFF7w2R/4xTt3vK5lliMogQcu9Y8dRTJIwLUY BOjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:cc:to:from :subject:organization:dkim-signature; bh=U7dHEM5AmzKgzCCEEdwyvVJA92oDXUOBOpHeITrEM9k=; b=W0ZCPa/uvl0wCCIQQhdZHdBr3z5mICeKw82Qd3ubxHp13/ls27xN3tE888geedc2TY Lm1gP7dUZVoOJjKpBPQyukvRskT1I4ay92BYQQpInWPB+1n8rQq4VPJczAuse12Sr6sy PupOXyjjb37NbhHoFYYdxND3sprNMA7x8ZJ1yrOFIY81T9Vub1MtzyvWRuW6oTMDCKMZ pUc3eTQvvcaMqbpeCn2DQLqH6D353qO3nevjEXSBn0zD9JoFd2tRIa6uTeN6E71wyTun Io/0fpGez936nsQI08iiY2HOphgKA1Kmx9EVKCKWaABck3qvVCC1oEI+Pjatjn/Elbau ZysQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SebS2ZJ5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j20-20020a05640211d400b0047db5ff4936si2046833edw.490.2023.01.16.15.10.25; Mon, 16 Jan 2023 15:10:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SebS2ZJ5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235175AbjAPXJn (ORCPT <rfc822;stefanalexe802@gmail.com> + 99 others); Mon, 16 Jan 2023 18:09:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235196AbjAPXJG (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 16 Jan 2023 18:09:06 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A59C722032 for <linux-kernel@vger.kernel.org>; Mon, 16 Jan 2023 15:08:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910497; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U7dHEM5AmzKgzCCEEdwyvVJA92oDXUOBOpHeITrEM9k=; b=SebS2ZJ5nIMKTlmgSs5/X8GGIt2lkmWBsgRyl4+aqtEYTK8lFKkzfISV1WgkRF+4R0KbCh xr5/VFa+1hWgFFBiep47MVwNbTzaR6mECGJo3EL90kNE8r76eVDUPw9aV8z//2w22xXR/x cr6Nd2a+LvpgCgBU5AlhvYPSqP50W2A= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-574-m8_Zm0OtPyCt81uCEzo9Ig-1; Mon, 16 Jan 2023 18:08:12 -0500 X-MC-Unique: m8_Zm0OtPyCt81uCEzo9Ig-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CE7FB101A52E; Mon, 16 Jan 2023 23:08:11 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7D3F9492B00; Mon, 16 Jan 2023 23:08:10 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 01/34] vfs: Unconditionally set IOCB_WRITE in call_write_iter() From: David Howells <dhowells@redhat.com> To: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig <hch@infradead.org>, Matthew Wilcox <willy@infradead.org>, Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>, Jeff Layton <jlayton@kernel.org>, Logan Gunthorpe <logang@deltatee.com>, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:08:09 +0000 Message-ID: <167391048988.2311931.1567396746365286847.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755222532591891773?= X-GMAIL-MSGID: =?utf-8?q?1755222532591891773?= |
Series |
iov_iter: Improve page extraction (ref, pin or just list)
|
|
Commit Message
David Howells
Jan. 16, 2023, 11:08 p.m. UTC
IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
operation to the VFS, but it isn't set by, say, the write() system call.
Fix this by setting IOCB_WRITE unconditionally in call_write_iter().
This will allow drivers to use IOCB_WRITE instead of the iterator data
source to determine the I/O direction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
include/linux/fs.h | 1 +
1 file changed, 1 insertion(+)
Comments
On Mon, Jan 16, 2023 at 11:08:09PM +0000, David Howells wrote: > IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write > operation to the VFS, but it isn't set by, say, the write() system call. > > Fix this by setting IOCB_WRITE unconditionally in call_write_iter(). > > This will allow drivers to use IOCB_WRITE instead of the iterator data > source to determine the I/O direction. > > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Alexander Viro <viro@zeniv.linux.org.uk> > cc: Christoph Hellwig <hch@lst.de> > cc: Jens Axboe <axboe@kernel.dk> > cc: linux-block@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > --- > > include/linux/fs.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 066555ad1bf8..649ff061440e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, > static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio, > struct iov_iter *iter) > { > + kio->ki_flags |= IOCB_WRITE; > return file->f_op->write_iter(kio, iter); > } This doesn't remove the existing setting of IOCB_WRITE, and also feelds like the wrong place. I suspect the best is to: - rename init_sync_kiocb to init_kiocb - pass a new argument for the destination to it. I'm not entirely sure if flags is a good thing, or an explicit READ/WRITE might be better because it's harder to get wrong, even if a the compiler might generate worth code for it. - also use it in the async callers (io_uring, aio, overlayfs, loop, nvmet, target, cachefs, file backed swap)
Christoph Hellwig <hch@infradead.org> wrote: > I suspect the best is to: > > - rename init_sync_kiocb to init_kiocb > - pass a new argument for the destination Do you mean the direction rather than the destination? > to it. I'm not entirely > sure if flags is a good thing, or an explicit READ/WRITE might be > better because it's harder to get wrong, even if a the compiler > might generate worth code for it. > - also use it in the async callers (io_uring, aio, overlayfs, loop, > nvmet, target, cachefs, file backed swap) David
On Tue, Jan 17, 2023 at 08:28:24AM +0000, David Howells wrote: > Christoph Hellwig <hch@infradead.org> wrote: > > > I suspect the best is to: > > > > - rename init_sync_kiocb to init_kiocb > > - pass a new argument for the destination > > Do you mean the direction rather than the destination? Yes.
Christoph Hellwig <hch@infradead.org> wrote: > I suspect the best is to: > > - rename init_sync_kiocb to init_kiocb > - pass a new argument for the direction to it. I'm not entirely > sure if flags is a good thing, or an explicit READ/WRITE might be > better because it's harder to get wrong, even if a the compiler > might generate worth code for it. So something like: init_kiocb(kiocb, file, WRITE); init_kiocb(kiocb, file, READ); David
On Tue, Jan 17, 2023 at 11:11:16AM +0000, David Howells wrote: > So something like: > > init_kiocb(kiocb, file, WRITE); > init_kiocb(kiocb, file, READ); Yes.
On Mon, Jan 16, 2023 at 11:08:09PM +0000, David Howells wrote: > IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write > operation to the VFS, but it isn't set by, say, the write() system call. > > Fix this by setting IOCB_WRITE unconditionally in call_write_iter(). Which does nothing for places that do not use call_write_iter()... __kernel_write_iter() is one such; for less obvious specimen see drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() - there we have iocb coming from the caller and *not* fed to init_sync_kiocb(), so Christoph's suggestion doesn't work either. Sure, we could take care of that by adding ki_flags |= IOCB_WRITE in there, but... FWIW, call chains for ->write_iter() (as an explicit method call) are: ->write_iter() <- __kernel_write_iter() [init_sync_kiocb()] ->write_iter() <- call_write_iter() <- new_sync_write() [init_sync_kiocb()] ->write_iter() <- call_write_iter() <- do_iter_read_write() [init_sync_kiocb()] ->write_iter() <- call_write_iter() <- aio_write() [sets KIOCB_WRITE] ->write_iter() <- call_write_iter() <- io_write() [sets KIOCB_WRITE] ->write_iter() <- nvmet_file_submit_bvec() ->write_iter() <- call_write_iter() <- lo_rw_aio() ->write_iter() <- call_write_iter() <- fd_execute_rw_aio() ->write_iter() <- call_write_iter() <- vfs_iocb_iter_write() The last 4 neither set KIOCB_WRITE nor call init_sync_kiocb(). What's more, there are places that call instances (or their guts - look at btrfs_do_write_iter() callers) directly...
On Mon, Jan 16, 2023 at 11:52:43PM -0800, Christoph Hellwig wrote: > This doesn't remove the existing setting of IOCB_WRITE, and also > feelds like the wrong place. > > I suspect the best is to: > > - rename init_sync_kiocb to init_kiocb > - pass a new argument for the destination to it. I'm not entirely > sure if flags is a good thing, or an explicit READ/WRITE might be > better because it's harder to get wrong, even if a the compiler > might generate worth code for it. > - also use it in the async callers (io_uring, aio, overlayfs, loop, > nvmet, target, cachefs, file backed swap) Do you want it to mess with get_current_ioprio() for those? Looks wrong...
On Wed, Jan 18, 2023 at 10:05:38PM +0000, Al Viro wrote: > __kernel_write_iter() is one such; for less obvious specimen see > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() - there > we have iocb coming from the caller and *not* fed to init_sync_kiocb(), > so Christoph's suggestion doesn't work either. Sure, we could take > care of that by adding ki_flags |= IOCB_WRITE in there, but... None of the asyc users of iocbs currently uses init_sync_kiocb. My suggestion is to use it everywhere - we have less than a dozen of these that I all listed.
On Wed, Jan 18, 2023 at 10:11:45PM +0000, Al Viro wrote: > On Mon, Jan 16, 2023 at 11:52:43PM -0800, Christoph Hellwig wrote: > > > This doesn't remove the existing setting of IOCB_WRITE, and also > > feelds like the wrong place. > > > > I suspect the best is to: > > > > - rename init_sync_kiocb to init_kiocb > > - pass a new argument for the destination to it. I'm not entirely > > sure if flags is a good thing, or an explicit READ/WRITE might be > > better because it's harder to get wrong, even if a the compiler > > might generate worth code for it. > > - also use it in the async callers (io_uring, aio, overlayfs, loop, > > nvmet, target, cachefs, file backed swap) > > Do you want it to mess with get_current_ioprio() for those? Looks > wrong... We want to be consistent for sync vs async submission. So I think yes, we want to do the get_current_ioprio for most of them, exceptions beeing aio and io_uring - those could use a __init_iocb or init_iocb_ioprio variant that passs in the explicit priority if we want to avoid the call if it would be overriden later.
Al Viro <viro@zeniv.linux.org.uk> wrote: > Which does nothing for places that do not use call_write_iter()... > __kernel_write_iter() is one such; for less obvious specimen see > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() Should these be calling call_read/write_iter()? If not, should call_read/write_iter() be dropped? David
Al Viro <viro@zeniv.linux.org.uk> wrote: > ->write_iter() <- nvmet_file_submit_bvec() > ->write_iter() <- call_write_iter() <- lo_rw_aio() Could call init_kiocb() in lo_rw_aio() and then just overwrite ki_ioprio. > ->write_iter() <- call_write_iter() <- fd_execute_rw_aio() fd_execute_rw_aio() perhaps should call init_kiocb() since the struct is allocated with kmalloc() and not fully initialised. > ->write_iter() <- call_write_iter() <- vfs_iocb_iter_write() > > The last 4 neither set KIOCB_WRITE nor call init_sync_kiocb(). vfs_iocb_iter_write() is given an initialised kiocb. It should not be calling init_sync_kiocb() itself. It's called from two places: cachefiles, which initialises the kiocb itself and sets IOCB_WRITE, and overlayfs, which gets the kiocb from the VFS via its ->write_iter hook the caller of which should have already set IOCB_WRITE. cachefiles should be using init_kiocb() - though since it used kzalloc, init_kiocb() clearing the struct is redundant. > What's more, there are places that call instances (or their guts - look at > btrfs_do_write_iter() callers) directly... At least in the case of btrfs_ioctl_encoded_write(), that can call init_kiocb(). But as you say, there are more to be found. David
Christoph Hellwig <hch@infradead.org> wrote: > We want to be consistent for sync vs async submission. So I think yes, > we want to do the get_current_ioprio for most of them, exceptions > beeing aio and io_uring - those could use a __init_iocb or > init_iocb_ioprio variant that passs in the explicit priority if we want > to avoid the call if it would be overriden later. io_uring is a bit problematic in this regard. io_prep_rw() starts the initialisation of the kiocb, so io_read() and io_write() can't just reinitialise it. OTOH, I'm not sure io_prep_rw() has sufficient information to hand. I wonder if I should add a flag to struct io_op_def to indicate that this is going to be a write operation and maybe add a REQ_F_WRITE flag that gets set by that. David
On Thu, Jan 19, 2023 at 10:01:26AM +0000, David Howells wrote: > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > Which does nothing for places that do not use call_write_iter()... > > __kernel_write_iter() is one such; for less obvious specimen see > > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() > > Should these be calling call_read/write_iter()? If not, should > call_read/write_iter() be dropped? I wish they'd just go away, they are a bit of a distraction.
On Thu, Jan 19, 2023 at 11:34:26AM +0000, David Howells wrote: > io_uring is a bit problematic in this regard. io_prep_rw() starts the > initialisation of the kiocb, so io_read() and io_write() can't just > reinitialise it. OTOH, I'm not sure io_prep_rw() has sufficient information > to hand. It could probably be refactored. That being said, I suspect we're better off deferring the whole iov_iter direction cleanup. It's a bit ugly right now, but there is nothing urgent. The gup pin work otoh really is something we need to get down rather sooner than later. So what about deferring this whole cleanup for now?
Christoph Hellwig <hch@infradead.org> wrote:
> So what about deferring this whole cleanup for now?
So you'd rather I stick with the direction indicator in the iov_iter struct
for now?
I still want to add iov_iter_extract_pages(), netfs_extract_user_iter() and
netfs_extract_iter_to_sg(), even if it's only cifs and netfslib that use them
for the moment.
David
diff --git a/include/linux/fs.h b/include/linux/fs.h index 066555ad1bf8..649ff061440e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { + kio->ki_flags |= IOCB_WRITE; return file->f_op->write_iter(kio, iter); }