Message ID | 20230626164752.1098394-1-nmi@metaspace.dk |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp7633224vqr; Mon, 26 Jun 2023 10:16:29 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7zP5jUrAvLkHgtDWnw1IYIJoZ20odqGtvPVwFDt3YEzlcr9NzR/O/8GZLjN98h6smJCTbl X-Received: by 2002:aa7:df96:0:b0:51b:f421:2c29 with SMTP id b22-20020aa7df96000000b0051bf4212c29mr7828689edy.16.1687799789048; Mon, 26 Jun 2023 10:16:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687799789; cv=none; d=google.com; s=arc-20160816; b=wR5tjQQxgmlI4uLVWINpQkwwedI3Y51/yi1uSH7ShsswmJYzcjyPa9wDnDsxBh+xgb qdA1von7j/tuVSGQeLpP2VLXdyWw+1BP0jYLrTIzs/vSGqKRNtycZMwjnmflzFPcwfAV CbAq+Ks2/LIrboN642zQeVoyQ2iyKfOLk5PFaTr4uuw+QJxnjsjqHXpHwPEcEn8wU5Vk qIeX6rSN6OrlRZIKOlARDsm0i3tgjvsXnntAf5zNpTGcbegafyTxEztq4PfQeJAPZsIq IAH52YcBTBgCPgMl5wwNuN90jwW5m3n4K3FdRqMDoIEW28ruMdV8gPZNT87MdokYTeQx HymQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=A/nUDlqt951lKsKkZgydvbd14iydf4nHrE0dUZUBvRw=; fh=Lxfn14xTut8YyV6vFGSR3NOGjG/FAHf9WJhctUCmG60=; b=IqeGfwwdZiMzKsJyb5MKA4rhkDJeerC2U43gO1xyVRg7HkPkjJwUS1iGa/Lv1aW/+s XfoMRlA2DvqH2d5ESVOaynfbAYfYCK2LrNcTrDVE+ZrFpQI+NJ+WCoYy3XLWiVVwKtAf D4zOL/CmEGpA9Py+tu0jlLDQfraZ5ieBkvhH7iplnvo2+5YtmYpkqq/huTbUxznTnCDd quPBs2QyM3Gn2q1+x6SP0FSsibHLPmPozL/BZIzOPtcZ8suZ4qS2lti4hMcIjFvCP29s DwrV60/5+LP8I0M2o/Krgw6CBb1NFxc50YaxNOE+Cjaa+jHJOhTgfazQ/EDhoLJnbZDT GZVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@metaspace-dk.20221208.gappssmtp.com header.s=20221208 header.b=YWsvvtIB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r4-20020aa7cfc4000000b0051d9195b3e1si2225851edy.232.2023.06.26.10.16.04; Mon, 26 Jun 2023 10:16:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@metaspace-dk.20221208.gappssmtp.com header.s=20221208 header.b=YWsvvtIB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229471AbjFZQsA (ORCPT <rfc822;filip.gregor98@gmail.com> + 99 others); Mon, 26 Jun 2023 12:48:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229597AbjFZQr6 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 26 Jun 2023 12:47:58 -0400 Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1559918E for <linux-kernel@vger.kernel.org>; Mon, 26 Jun 2023 09:47:56 -0700 (PDT) Received: by mail-ed1-x532.google.com with SMTP id 4fb4d7f45d1cf-51d9865b7bdso1357710a12.1 for <linux-kernel@vger.kernel.org>; Mon, 26 Jun 2023 09:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=metaspace-dk.20221208.gappssmtp.com; s=20221208; t=1687798074; x=1690390074; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=A/nUDlqt951lKsKkZgydvbd14iydf4nHrE0dUZUBvRw=; b=YWsvvtIBa6fa44eN1VmuC/broBdRFbDtLYl8CO7rqAcvHYrR7tUmJUQUrS4j/39x2d CUoZCznWtf3qjSbO9plXJDIzeomk2BI0jmGA99MF8TMs9KPCrc5uSXK0vrczxoNhOPHd d0hMhkGlXMZkArgsPLPIMjo+90/Ugiv/uz4mqaf6TpkBxpq2reFn/mYIBfmrxMwq4v3M cUlbEPG5gCe7GxewogMCSRh/b11LmDFmC/6FZyd47PIl8PvKE8LhDIOYG4v8npbFuDS4 GUtN15Iu5szeLpekMDPx/ELJlrtcXDlIxcNMgfhqhL7GPs939gXzW2beKymd/Mh3t7lv /8uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687798074; x=1690390074; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=A/nUDlqt951lKsKkZgydvbd14iydf4nHrE0dUZUBvRw=; b=R4b/MaOlaBKnj2K4z5dZNp9Y5ZADi2xaaMo7uuurAAOl6sDY63m1RY6VL7upuYKMC4 rxwyp4ZWnpoBebHg/PNQnZvaZWZBqCFY64DjWld4vxAW00iCfZIV1EWN0g1btL/jUtQK cCaS9OxFvbWosW7yiU/vRvyv9rFWZ48CfAtWiBl9imXIqT44aMNFXiF5DTWITUx0haps 210kPv7ETjMTbB/ZrMIkXIjgv0bpO2MOm+5FXmKx8E22ESXePXLOMCD0xM565ePih/g8 5b+qN7u9DGJMP5fLU5NLTcAFoOnUVKjeZqZGEP9BgViniTkboUKH9kbo4+bBsw3P2qJr c2pw== X-Gm-Message-State: AC+VfDwlLk3kR2AbkJCUlBV0/D5HMMpImMW6Mrt+GJwqGGVNl4x8byVx KWJ/NAopYEcEo3H69xcm3uDSoaVcoUqBO1S05us= X-Received: by 2002:a05:6402:358:b0:51d:9db4:8201 with SMTP id r24-20020a056402035800b0051d9db48201mr1902333edw.7.1687798074413; Mon, 26 Jun 2023 09:47:54 -0700 (PDT) Received: from localhost ([79.142.230.34]) by smtp.gmail.com with ESMTPSA id s25-20020aa7c559000000b0051d914a9f49sm2305579edr.65.2023.06.26.09.47.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 09:47:53 -0700 (PDT) From: Andreas Hindborg <nmi@metaspace.dk> To: Damien Le Moal <dlemoal@kernel.org> Cc: linux-fsdevel@vger.kernel.org (open list:ZONEFS FILESYSTEM), gost.dev@samsung.com, Andreas Hindborg <a.hindborg@samsung.com>, Naohiro Aota <naohiro.aota@wdc.com>, Johannes Thumshirn <jth@kernel.org>, linux-kernel@vger.kernel.org (open list), "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> Subject: [PATCH] zonefs: do not use append if device does not support it Date: Mon, 26 Jun 2023 18:47:52 +0200 Message-ID: <20230626164752.1098394-1-nmi@metaspace.dk> X-Mailer: git-send-email 2.41.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769786351411525258?= X-GMAIL-MSGID: =?utf-8?q?1769786351411525258?= |
Series |
zonefs: do not use append if device does not support it
|
|
Commit Message
Andreas Hindborg
June 26, 2023, 4:47 p.m. UTC
From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if device `max_zone_append_sectors` is zero. This will cause the IO to fail as the io vector is truncated to zero. It also causes a call to `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably not intentional. Thus, do not use append when device does not support it. Signed-off-by: Andreas Hindborg (Samsung) <nmi@metaspace.dk> --- fs/zonefs/file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) base-commit: 45a3e24f65e90a047bef86f927ebdc4c710edaa1
Comments
On 26.06.23 18:47, Andreas Hindborg wrote: > From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> > > Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if > device `max_zone_append_sectors` is zero. This will cause the IO to fail as the > io vector is truncated to zero. It also causes a call to > `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably > not intentional. Thus, do not use append when device does not support it. > I'm sorry but I think it has been stated often enough that for Linux Zone Append is a mandatory feature for a Zoned Block Device. Therefore this path is essentially dead code as max_zone_append_sectors will always be greater than zero. So this is a clear NAK from my side.
Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes: > On 26.06.23 18:47, Andreas Hindborg wrote: >> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> >> >> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if >> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the >> io vector is truncated to zero. It also causes a call to >> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably >> not intentional. Thus, do not use append when device does not support it. >> > > I'm sorry but I think it has been stated often enough that for Linux Zone Append > is a mandatory feature for a Zoned Block Device. Therefore this path is essentially > dead code as max_zone_append_sectors will always be greater than zero. > > So this is a clear NAK from my side. OK, thanks for clarifying 👍 I came across this bugging out while playing around with zone append for ublk. The code makes sense if the stack expects append to always be present. I didn't follow the discussion, could you reiterate why the policy is that zoned devices _must_ support append? Best regards, Andreas
On 6/27/23 03:23, Andreas Hindborg (Samsung) wrote: > > Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes: > >> On 26.06.23 18:47, Andreas Hindborg wrote: >>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> >>> >>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if >>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the >>> io vector is truncated to zero. It also causes a call to >>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably >>> not intentional. Thus, do not use append when device does not support it. >>> >> >> I'm sorry but I think it has been stated often enough that for Linux Zone Append >> is a mandatory feature for a Zoned Block Device. Therefore this path is essentially >> dead code as max_zone_append_sectors will always be greater than zero. >> >> So this is a clear NAK from my side. > > OK, thanks for clarifying 👍 I came across this bugging out while > playing around with zone append for ublk. The code makes sense if the > stack expects append to always be present. > > I didn't follow the discussion, could you reiterate why the policy is > that zoned devices _must_ support append? To avoid support fragmentation and for performance. btrfs zoned block device support requires zone append and using that command makes writes much faster as we do not have to go through zone locking. Note that for zonefs, I plan to add async zone append support as well, linked with O_APPEND use to further improve write performance with ZNS drives. > > Best regards, > Andreas >
On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote: > From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> > > Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if > device `max_zone_append_sectors` is zero. This will cause the IO to fail as the > io vector is truncated to zero. It also causes a call to > `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably > not intentional. Thus, do not use append when device does not support it. How do you even manage to hit this code? Zone Append is a mandatory feature and driver need to check it is available.
On 6/27/23 12:45, Christoph Hellwig wrote: > On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote: >> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> >> >> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if >> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the >> io vector is truncated to zero. It also causes a call to >> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably >> not intentional. Thus, do not use append when device does not support it. > > How do you even manage to hit this code? Zone Append is a mandatory > feature and driver need to check it is available. ublk driver probably is missing that check ? I have not looked at the code for zone support. But thinking of it, we probably would be better off having a generic check for "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().
On Tue, Jun 27, 2023 at 01:45:38PM +0900, Damien Le Moal wrote: > But thinking of it, we probably would be better off having a generic check for > "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones(). Agreed.
On 6/27/23 13:48, Christoph Hellwig wrote: > On Tue, Jun 27, 2023 at 01:45:38PM +0900, Damien Le Moal wrote: >> But thinking of it, we probably would be better off having a generic check for >> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones(). > > Agreed. I'll send something.
Damien Le Moal <dlemoal@kernel.org> writes: > On 6/27/23 12:45, Christoph Hellwig wrote: >> On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote: >>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> >>> >>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if >>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the >>> io vector is truncated to zero. It also causes a call to >>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably >>> not intentional. Thus, do not use append when device does not support it. >> >> How do you even manage to hit this code? Zone Append is a mandatory >> feature and driver need to check it is available. > > ublk driver probably is missing that check ? I have not looked at the code for > zone support. > > But thinking of it, we probably would be better off having a generic check for > "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones(). I was playing with ublk zone support. It seems I made it buggy by allowing zone append size to go to zero. Adding the check would be a nice help to people like me that will implement whatever in their driver :) Best regards Andreas
Damien Le Moal <dlemoal@kernel.org> writes: > On 6/27/23 03:23, Andreas Hindborg (Samsung) wrote: >> >> Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes: >> >>> On 26.06.23 18:47, Andreas Hindborg wrote: >>>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> >>>> >>>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if >>>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the >>>> io vector is truncated to zero. It also causes a call to >>>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably >>>> not intentional. Thus, do not use append when device does not support it. >>>> >>> >>> I'm sorry but I think it has been stated often enough that for Linux Zone Append >>> is a mandatory feature for a Zoned Block Device. Therefore this path is essentially >>> dead code as max_zone_append_sectors will always be greater than zero. >>> >>> So this is a clear NAK from my side. >> >> OK, thanks for clarifying 👍 I came across this bugging out while >> playing around with zone append for ublk. The code makes sense if the >> stack expects append to always be present. >> >> I didn't follow the discussion, could you reiterate why the policy is >> that zoned devices _must_ support append? > > To avoid support fragmentation and for performance. btrfs zoned block device > support requires zone append and using that command makes writes much faster as > we do not have to go through zone locking. > Note that for zonefs, I plan to add async zone append support as well, linked > with O_APPEND use to further improve write performance with ZNS drives. > Thanks for clarifying, Damien 👍 BR Andreas
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c index 132f01d3461f..c97fe2aa20b0 100644 --- a/fs/zonefs/file.c +++ b/fs/zonefs/file.c @@ -536,9 +536,11 @@ static ssize_t zonefs_write_checks(struct kiocb *iocb, struct iov_iter *from) static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); + struct block_device *bdev = inode->i_sb->s_bdev; struct zonefs_inode_info *zi = ZONEFS_I(inode); struct zonefs_zone *z = zonefs_inode_zone(inode); struct super_block *sb = inode->i_sb; + unsigned int max_append = bdev_max_zone_append_sectors(bdev); bool sync = is_sync_kiocb(iocb); bool append = false; ssize_t ret, count; @@ -581,7 +583,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) append = sync; } - if (append) { + if (append && max_append) { ret = zonefs_file_dio_append(iocb, from); } else { /*