fs/read_write: Enable copy_file_range for block device.

Message ID 20230724060336.8939-1-nj.shetty@samsung.com
State New
Headers
Series fs/read_write: Enable copy_file_range for block device. |

Commit Message

Nitesh Shetty July 24, 2023, 6:03 a.m. UTC
  From: Anuj Gupta <anuj20.g@samsung.com>

Change generic_copy_file_checks to use ->f_mapping->host for both inode_in
and inode_out. Allow block device in generic_file_rw_checks.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 fs/read_write.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
  

Comments

Dave Chinner July 24, 2023, 6:39 a.m. UTC | #1
On Mon, Jul 24, 2023 at 11:33:36AM +0530, Nitesh Shetty wrote:
> From: Anuj Gupta <anuj20.g@samsung.com>
> 
> Change generic_copy_file_checks to use ->f_mapping->host for both inode_in
> and inode_out. Allow block device in generic_file_rw_checks.

Why? copy_file_range() is for copying a range of a regular file to
another regular file - why do we want to support block devices for
somethign that is clearly intended for copying data files?

Also, the copy_file_range man page states:

ERRORS
.....
    EINVAL Either fd_in or fd_out is not a regular file.
.....

If we are changing the behavioru of copy_file_range (why?), then man
page updates need to be done as well, documenting the change, which
kernel versions only support regular files, etc.

Cheers,

Dave.
  
Christoph Hellwig July 24, 2023, 4:38 p.m. UTC | #2
> > Change generic_copy_file_checks to use ->f_mapping->host for both inode_in
> > and inode_out. Allow block device in generic_file_rw_checks.
> 
> Why? copy_file_range() is for copying a range of a regular file to
> another regular file - why do we want to support block devices for
> somethign that is clearly intended for copying data files?

Nitesh has a series to add block layer copy offload, and uses that to
implement copy_file_range on block device nodes, which seems like a
sensible use case for copy_file_range on block device nodes, and that
series was hiding a change like this deep down in a "block" title
patch, so I asked for it to be split out.  It still really should
be in that series, as there's very little point in changing this
check without an actual implementation making use of it.
  
Christoph Hellwig July 24, 2023, 4:40 p.m. UTC | #3
>  {
> -	struct inode *inode_in = file_inode(file_in);
> -	struct inode *inode_out = file_inode(file_out);
> +	struct inode *inode_in = file_in->f_mapping->host;
> +	struct inode *inode_out = file_out->f_mapping->host;

This doesn't directly have anything to do with block devices, as regular
files can also have a f_mapping that's different.  None of the file
systems actually supporting copy offload right now do, but changing
the dereference here is a correctness thing totally independent of
block device support.
  
Dave Chinner July 24, 2023, 10:08 p.m. UTC | #4
On Mon, Jul 24, 2023 at 06:38:38PM +0200, Christoph Hellwig wrote:
> > > Change generic_copy_file_checks to use ->f_mapping->host for both inode_in
> > > and inode_out. Allow block device in generic_file_rw_checks.
> > 
> > Why? copy_file_range() is for copying a range of a regular file to
> > another regular file - why do we want to support block devices for
> > somethign that is clearly intended for copying data files?
> 
> Nitesh has a series to add block layer copy offload,

Yes, I know.

> and uses that to
> implement copy_file_range on block device nodes,

Yes, I know.

> which seems like a
> sensible use case for copy_file_range on block device nodes,

Except for the fact it's documented and implemented as for copying
data ranges of *regular files*. Block devices are not regular
files...

There is nothing in this patchset that explains why this syscall
feature creep is desired, why it is the best solution, what benefits
it provides, how this new feature is discoverable, etc. It also does
not mention that user facing documentation needs to change, etc

> and that
> series was hiding a change like this deep down in a "block" title
> patch,

I know.

> so I asked for it to be split out.  It still really should
> be in that series, as there's very little point in changing this
> check without an actual implementation making use of it.

And that's because it's the wrong way to be implementing block
device copy offloads.

That patchset originally added ioctls to issue block copy offloads
to block devices from userspace - that's the way block device
specific functionality is generally added and I have no problems
with that.

However, when I originally suggested that the generic
copy_file_range() fallback path that filesystems use (i.e.
generic_copy_file_range()) should try to use the block copy offload
path before falling back to using splice to copy the data through
host memory, things went off the rails.

That has turned into "copy_file_range() should support block devices
directly" and the ioctl interfaces were removed. The block copy
offload patchset still doesn't have a generic path for filesystems
to use this block copy offload. This is *not* what I originally
suggested, and provides none of the benefit to users that would come
from what I originally suggested.  Block device copy offload is
largely useless to users if file data copies within a filesystem
don't make use of it - very few applications ever copy data directly
to/from block devices directly...

So from a system level point of view, expanding copy_file_range() to
do direct block device data copies doesn't make any sense. Expanding
the existing copy_file_range() generic fallback to attempt block
copy offload (i.e. hardware accel) makes much more sense, and will
make copy_file_range() much more useful to users on filesystem like
ext4 that don't have reflink support...

So, yeah, this patch, regardless of how it is presented, needs to a
whole lot more justification that "we want to do this" in the commit
message...

-Dave.
  
Nitesh Shetty July 25, 2023, 11:48 a.m. UTC | #5
Hi Dave,

On 23/07/25 08:08AM, Dave Chinner wrote:
>On Mon, Jul 24, 2023 at 06:38:38PM +0200, Christoph Hellwig wrote:
>> > > Change generic_copy_file_checks to use ->f_mapping->host for both inode_in
>> > > and inode_out. Allow block device in generic_file_rw_checks.
>> >
>> > Why? copy_file_range() is for copying a range of a regular file to
>> > another regular file - why do we want to support block devices for
>> > somethign that is clearly intended for copying data files?
>>
>> Nitesh has a series to add block layer copy offload,
>
>Yes, I know.
>
>> and uses that to
>> implement copy_file_range on block device nodes,
>
>Yes, I know.
>
>> which seems like a
>> sensible use case for copy_file_range on block device nodes,
>
>Except for the fact it's documented and implemented as for copying
>data ranges of *regular files*. Block devices are not regular
>files...
>
>There is nothing in this patchset that explains why this syscall
>feature creep is desired, why it is the best solution, what benefits
>it provides, how this new feature is discoverable, etc. It also does
>not mention that user facing documentation needs to change, etc
>

Agreed. I missed adding context in description.

>> and that
>> series was hiding a change like this deep down in a "block" title
>> patch,
>
>I know.
>
>> so I asked for it to be split out.  It still really should
>> be in that series, as there's very little point in changing this
>> check without an actual implementation making use of it.
>
>And that's because it's the wrong way to be implementing block
>device copy offloads.
>
>That patchset originally added ioctls to issue block copy offloads
>to block devices from userspace - that's the way block device
>specific functionality is generally added and I have no problems
>with that.
>

We moved to copy_file_range, so that we can reuse the existing infra
instead of adding another ioctl[1].

>However, when I originally suggested that the generic
>copy_file_range() fallback path that filesystems use (i.e.
>generic_copy_file_range()) should try to use the block copy offload
>path before falling back to using splice to copy the data through
>host memory, things went off the rails.
>
>That has turned into "copy_file_range() should support block devices
>directly" and the ioctl interfaces were removed. The block copy
>offload patchset still doesn't have a generic path for filesystems
>to use this block copy offload. This is *not* what I originally
>suggested, and provides none of the benefit to users that would come
>from what I originally suggested.  Block device copy offload is
>largely useless to users if file data copies within a filesystem
>don't make use of it - very few applications ever copy data directly
>to/from block devices directly...
>
>So from a system level point of view, expanding copy_file_range() to
>do direct block device data copies doesn't make any sense. Expanding
>the existing copy_file_range() generic fallback to attempt block
>copy offload (i.e. hardware accel) makes much more sense, and will
>make copy_file_range() much more useful to users on filesystem like
>ext4 that don't have reflink support...
>

Agreed. But adding all the cases is making the series heavier and harder to
iterate. So we trimmed some of the patches and feature. 
Hopefully we can get to filesystems, if the current series gets in.

>So, yeah, this patch, regardless of how it is presented, needs to a
>whole lot more justification that "we want to do this" in the commit
>message...
>

Agreed, the commit description was not conveying the things we wanted to
do. It makes sense to send block related relaxation as part of copy offload
series instead of doing it here.
However, the change to get inode pointer using mapping_host is still
independent and can go as a separate fix patch.


Thank you,
Nitesh Shetty


[1] https://lore.kernel.org/all/Y3607N6lDRK6WU7%2F@T590/
  

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index b07de77ef126..eaeb481477f4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1405,8 +1405,8 @@  static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    size_t *req_count, unsigned int flags)
 {
-	struct inode *inode_in = file_inode(file_in);
-	struct inode *inode_out = file_inode(file_out);
+	struct inode *inode_in = file_in->f_mapping->host;
+	struct inode *inode_out = file_out->f_mapping->host;
 	uint64_t count = *req_count;
 	loff_t size_in;
 	int ret;
@@ -1708,7 +1708,9 @@  int generic_file_rw_checks(struct file *file_in, struct file *file_out)
 	/* Don't copy dirs, pipes, sockets... */
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
 		return -EISDIR;
-	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
+	if (!S_ISREG(inode_in->i_mode) && !S_ISBLK(inode_in->i_mode))
+		return -EINVAL;
+	if ((inode_in->i_mode & S_IFMT) != (inode_out->i_mode & S_IFMT))
 		return -EINVAL;
 
 	if (!(file_in->f_mode & FMODE_READ) ||