zonefs: do not use append if device does not support it

Message ID 20230626164752.1098394-1-nmi@metaspace.dk
State New
Series zonefs: do not use append if device does not support it

Commit Message

Andreas Hindborg June 26, 2023, 4:47 p.m. UTC
  From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>

Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
the device's `max_zone_append_sectors` is zero. This causes the IO to fail, as the
IO vector is truncated to zero length. It also causes a call to
`invalidate_inode_pages2_range()` with `end` set to UINT_MAX, which is probably
not intentional. Thus, do not use zone append when the device does not support it.

Signed-off-by: Andreas Hindborg (Samsung) <nmi@metaspace.dk>
---
 fs/zonefs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


base-commit: 45a3e24f65e90a047bef86f927ebdc4c710edaa1
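The following is a minimal userspace sketch of the arithmetic behind the first failure mode. It assumes that the append path caps each write at the device's zone append limit, converted from 512-byte sectors to bytes and aligned down to the filesystem block size. The sketch is modeled on `zonefs_file_dio_append()` but not copied from it, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block-layer sector counts are in 512-byte units. */
#define SECTOR_SHIFT 9

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t align_down_u64(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Hypothetical helper: the largest number of bytes one zone append may
 * carry, given the device limit in sectors and the fs block size.
 */
static uint64_t append_cap_bytes(unsigned int max_append_sectors,
				 unsigned int blocksize)
{
	return align_down_u64((uint64_t)max_append_sectors << SECTOR_SHIFT,
			      blocksize);
}

/* Models truncating the I/O vector: the write shrinks to at most the cap. */
static size_t truncated_count(size_t count, unsigned int max_append_sectors,
			      unsigned int blocksize)
{
	uint64_t cap = append_cap_bytes(max_append_sectors, blocksize);

	return count < cap ? count : (size_t)cap;
}
```

When `max_zone_append_sectors == 0`, the cap is 0 bytes, so `truncated_count(65536, 0, 4096)` returns 0 and nothing is left to write. Under the same assumption, a non-zero limit smaller than one block (for example, 7 sectors with 4096-byte blocks) would also round down to zero.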
  

Comments

Johannes Thumshirn June 26, 2023, 5:54 p.m. UTC | #1
On 26.06.23 18:47, Andreas Hindborg wrote:
> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
> 
> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
> io vector is truncated to zero. It also causes a call to
> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
> not intentional. Thus, do not use append when device does not support it.
> 

I'm sorry, but I think it has been stated often enough that, for Linux, Zone Append
is a mandatory feature for a Zoned Block Device. Therefore, this path is essentially
dead code, as max_zone_append_sectors will always be greater than zero.

So this is a clear NAK from my side.
  
Andreas Hindborg June 26, 2023, 6:23 p.m. UTC | #2
Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes:

> I'm sorry but I think it has been stated often enough that for Linux Zone Append
> is a mandatory feature for a Zoned Block Device. Therefore this path is essentially
> dead code as max_zone_append_sectors will always be greater than zero.
>
> So this is a clear NAK from my side.

OK, thanks for clarifying 👍 I came across this bugging out while
playing around with zone append for ublk. The code makes sense if the
stack expects append to always be present.

I didn't follow the discussion, could you reiterate why the policy is
that zoned devices _must_ support append?

Best regards,
Andreas
  
Damien Le Moal June 27, 2023, 12:21 a.m. UTC | #3
On 6/27/23 03:23, Andreas Hindborg (Samsung) wrote:
> OK, thanks for clarifying 👍 I came across this bugging out while
> playing around with zone append for ublk. The code makes sense if the
> stack expects append to always be present.
> 
> I didn't follow the discussion, could you reiterate why the policy is
> that zoned devices _must_ support append?

To avoid support fragmentation, and for performance. btrfs zoned block device
support requires zone append, and using that command makes writes much faster, as
we do not have to go through zone locking.
Note that for zonefs, I plan to add async zone append support as well, linked
with O_APPEND use, to further improve write performance with ZNS drives.
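The following toy model shows why zone append avoids zone locking. It is single-threaded and covers one sequential zone. All names are hypothetical, and this is not how the block layer or a ZNS device is actually implemented.

A regular write must name the exact write pointer, so concurrent writers need per-zone serialization. With zone append, the writer names only the zone, and the device reports where the data landed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sequential-write-required zone, in 512-byte sectors. */
struct toy_zone {
	uint64_t start;    /* first sector of the zone */
	uint64_t capacity; /* writable sectors in the zone */
	uint64_t wp;       /* current write pointer (absolute sector) */
};

/*
 * Regular write: the caller must target the exact write pointer. A writer
 * holding a stale wp fails, which is why regular writes to one zone have
 * to be serialized.
 */
static bool toy_write(struct toy_zone *z, uint64_t sector, uint64_t nr)
{
	assert(z != NULL);
	if (sector != z->wp || z->wp + nr > z->start + z->capacity)
		return false;
	z->wp += nr;
	return true;
}

/*
 * Zone append: the caller names only the zone; the "device" places the
 * data at the current wp and reports that location back.
 * Returns the sector written, or UINT64_MAX if the zone is full.
 */
static uint64_t toy_append(struct toy_zone *z, uint64_t nr)
{
	assert(z != NULL);
	if (z->wp + nr > z->start + z->capacity)
		return UINT64_MAX;
	uint64_t where = z->wp;
	z->wp += nr;
	return where;
}
```

Consider a zone starting from `wp == 1004`. If two writers both issue `toy_write(&z, 1004, 4)`, only the first succeeds. By contrast, two `toy_append(&z, 4)` calls both succeed and return 1004 and 1008 respectively, without either writer needing to know the write pointer in advance.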

  
Christoph Hellwig June 27, 2023, 3:45 a.m. UTC | #4
On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote:
> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
> 
> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
> io vector is truncated to zero. It also causes a call to
> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
> not intentional. Thus, do not use append when device does not support it.

How do you even manage to hit this code?  Zone Append is a mandatory
feature and drivers need to check that it is available.
  
Damien Le Moal June 27, 2023, 4:45 a.m. UTC | #5
On 6/27/23 12:45, Christoph Hellwig wrote:
> How do you even manage to hit this code?  Zone Append is a mandatory
> feature and driver need to check it is available.

The ublk driver is probably missing that check? I have not looked at the code for
zone support.

But thinking of it, we probably would be better off having a generic check for
"q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().
  
Christoph Hellwig June 27, 2023, 4:48 a.m. UTC | #6
On Tue, Jun 27, 2023 at 01:45:38PM +0900, Damien Le Moal wrote:
> But thinking of it, we probably would be better off having a generic check for
> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().

Agreed.
  
Damien Le Moal June 27, 2023, 4:50 a.m. UTC | #7
On 6/27/23 13:48, Christoph Hellwig wrote:
> On Tue, Jun 27, 2023 at 01:45:38PM +0900, Damien Le Moal wrote:
>> But thinking of it, we probably would be better off having a generic check for
>> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().
> 
> Agreed.

I'll send something.
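The following minimal sketch shows the kind of generic check discussed in this thread: a non-zero `max_zone_append_sectors` enforced once in `blk_revalidate_disk_zones()`. It is written as plain userspace C. `struct toy_queue_limits` and `toy_validate_zoned_limits()` are hypothetical stand-ins for the block layer's queue limits and revalidation path, not the actual kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the queue limits relevant to the proposed check. */
struct toy_queue_limits {
	bool zoned;                           /* device uses a zoned model */
	unsigned int max_zone_append_sectors; /* 0 means no zone append */
};

/*
 * Sketch of the proposed validation: refuse a zoned device that does not
 * advertise zone append, so filesystems never see a zero limit.
 * Returns 0 if the limits are acceptable, -EINVAL otherwise.
 */
static int toy_validate_zoned_limits(const struct toy_queue_limits *lim)
{
	assert(lim != NULL);
	if (!lim->zoned)
		return 0; /* regular devices need no zone append */
	if (lim->max_zone_append_sectors == 0)
		return -EINVAL;
	return 0;
}
```

The appeal of a single check at revalidation time is that a driver which leaves the limit at zero is rejected when the disk is set up. Without it, such a driver only fails later, inside each filesystem (zonefs, btrfs) that relies on zone append.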
  
Andreas Hindborg June 27, 2023, 5:14 a.m. UTC | #8
Damien Le Moal <dlemoal@kernel.org> writes:

> On 6/27/23 12:45, Christoph Hellwig wrote:
>> On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote:
>>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
>>>
>>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
>>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
>>> io vector is truncated to zero. It also causes a call to
>>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
>>> not intentional. Thus, do not use append when device does not support it.
>> 
>> How do you even manage to hit this code?  Zone Append is a mandatory
>> feature and driver need to check it is available.
>
> ublk driver probably is missing that check ? I have not looked at the code for
> zone support.
>
> But thinking of it, we probably would be better off having a generic check for
> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().

I was playing with ublk zone support. It seems I made it buggy by
allowing the zone append size to go to zero.

Adding the check would be a nice help to people like me who will
implement whatever in their driver :)

Best regards
Andreas
  
Andreas Hindborg June 27, 2023, 5:45 a.m. UTC | #9
Damien Le Moal <dlemoal@kernel.org> writes:

> On 6/27/23 03:23, Andreas Hindborg (Samsung) wrote:
>> I didn't follow the discussion, could you reiterate why the policy is
>> that zoned devices _must_ support append?
>
> To avoid support fragmentation and for performance. btrfs zoned block device
> support requires zone append and using that command makes writes much faster as
> we do not have to go through zone locking.
> Note that for zonefs, I plan to add async zone append support as well, linked
> with O_APPEND use to further improve write performance with ZNS drives.
>

Thanks for clarifying, Damien 👍

BR Andreas
  

Patch

diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f..c97fe2aa20b0 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -536,9 +536,11 @@ static ssize_t zonefs_write_checks(struct kiocb *iocb, struct iov_iter *from)
 static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct block_device *bdev = inode->i_sb->s_bdev;
 	struct zonefs_inode_info *zi = ZONEFS_I(inode);
 	struct zonefs_zone *z = zonefs_inode_zone(inode);
 	struct super_block *sb = inode->i_sb;
+	unsigned int max_append = bdev_max_zone_append_sectors(bdev);
 	bool sync = is_sync_kiocb(iocb);
 	bool append = false;
 	ssize_t ret, count;
@@ -581,7 +583,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 		append = sync;
 	}
 
-	if (append) {
+	if (append && max_append) {
 		ret = zonefs_file_dio_append(iocb, from);
 	} else {
 		/*