Message ID | 20230504105624.9789-1-idryomov@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp220056vqo; Thu, 4 May 2023 04:02:56 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4kPIUu2D5IDfw2hdDKkkAsHHNZBlEXRSlabBj/iUubeRtmq8+PyRJvS64iU/r0yAJMOSpo X-Received: by 2002:a17:90a:4106:b0:23f:9fac:6b35 with SMTP id u6-20020a17090a410600b0023f9fac6b35mr1696130pjf.39.1683198175955; Thu, 04 May 2023 04:02:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683198175; cv=none; d=google.com; s=arc-20160816; b=zjEXiP8ZPgBi7aYASaxOlpXxjtE0yBQxkD4rv0ABDEnkLw4B8lndihrIOa/6hDDWJF c9SY6Ld8v74mZEt+fMHP/yLcYV8nKKe0/cDtMFlX0yHKVeGXY9Enlo6Ky0p8YOm5WXyW /6VMxX1fqWKX2/qOLoa/qmL6WeRM1aKwjYz7075pHjJ7QDQAQ9QRwemTEozvWv5Vq5ip NqwJpBYc/oGFCWJ885ITU5fipL0RUYtoZi74XHljXzva10dMnp832khWufpRL0d4fiKk glGiyVTh6o84aunIMt4WX0+Y3j8s8QehOsawebhQpgEW5T2O6hcaLdtdpf2o5mEuTE00 xMlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=ifOfkEUyxxO0eazLLKLawS21srpdOWmLX2BHDyl3i/E=; b=lJ6GyTyqY7kC/5XADVv8FHOPcy1KfLHBCnw2NxIAmNEhUHpbyLl4KGw51ZHh36FZ3f +dxBcA9xgxbA524nqrsisy34/YGHD8AboAn6Qt3un8zWU16pu/12I+naQ0ZWH1D7JO61 bKi9CZ3kZydBrWqFCRRVc+eeAO4qmsbRFnYZrlgm7JNA587WRT7WRJE2A8GoYgwzwAv3 sxJsEfzJ04SgsLi1cQk5NLC1Ja5lw99NbQUpdjV/a6X1rklpM76jerWZ4SYvt907uVaE wnyEGWimsZU1iFDToqpI4fVnxVVI7lvFBuHjdL81pprWnkYDVd1CwgaG1SgPYDA+Mhis N+Yg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b="r6/ULPkE"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g6-20020a636b06000000b0052c4026d6a9si6894013pgc.585.2023.05.04.04.02.42; Thu, 04 May 2023 04:02:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b="r6/ULPkE"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230119AbjEDK4z (ORCPT <rfc822;jaysivo@gmail.com> + 99 others); Thu, 4 May 2023 06:56:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230194AbjEDK4h (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 4 May 2023 06:56:37 -0400 Received: from mail-ej1-x62d.google.com (mail-ej1-x62d.google.com [IPv6:2a00:1450:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A20D44C01; Thu, 4 May 2023 03:56:36 -0700 (PDT) Received: by mail-ej1-x62d.google.com with SMTP id a640c23a62f3a-965ab8ed1c0so49909266b.2; Thu, 04 May 2023 03:56:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683197795; x=1685789795; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=ifOfkEUyxxO0eazLLKLawS21srpdOWmLX2BHDyl3i/E=; b=r6/ULPkETTOQMH/robFPgDlQCst1bREYJAsFn1dwhp1MLmY5wcl5twzUKHdKZPf3Ze EsTq0uWPk86oeTO9wQ/eYqzHXRD+koT4Uf2UClwaMh7lkTA6VTMO7nuZOKmTLTdr+Pd5 yU00n4P009PDG13BAWmuQXGz5t705gggUNyiK7Ro7snRgKBmcWGitgDPMmbxSu2IaNzq WL4esf1Oek2AYFza9Y74mHcHzccfEuf/XoB/JYsVoGbaR2dKVKe3/4W1YFh4r2/T710o 86pPPfnJrjmyuIOHyluem8JkDDqd2gxZku1IPJ3+pMbf9eYlLiQWGdZRhbCJhiDMMQU/ /bvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683197795; x=1685789795; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ifOfkEUyxxO0eazLLKLawS21srpdOWmLX2BHDyl3i/E=; b=FZ6sgJMGajQNR/mADVXiKAipZDsf5JbYIpjbC++ERD/EcMOD+kV2hlk48x7pu1HuW9 g25wATPgsqagXm216uPQx1LslHaOR1f9hOy4SoAioVjPGB/i8lzP94wrijeOUfk3ryER i/SZt0ejUI/oGob6uujjp6WGh4bo7rjZFPu8xpFDFulQxBEoy1TM0kPIrEFrTLPwqFw5 eAESuf+347X9+0To41YcZud35pD7RFLZlzzJACE1/lwovvvH0NYYQks918eBZlj5yF9O wYvPuu7Cbmw3t//xD160GD3W/NqA9pPQHf+FtC1jE9DIoHBLKmnPpJzozDKwE9F1avNn wpqA== X-Gm-Message-State: AC+VfDxypbNXvQ+s+yclXc8xznh3TdsfH/pqvO5glCQBHBP1UPFN3mOs 2hZLtxIEmjthEu2yfAuoJUI= X-Received: by 2002:a17:907:8a02:b0:93e:908d:cfe2 with SMTP id sc2-20020a1709078a0200b0093e908dcfe2mr6022858ejc.0.1683197794976; Thu, 04 May 2023 03:56:34 -0700 (PDT) Received: from zambezi.local (ip-94-112-104-28.bb.vodafone.cz. [94.112.104.28]) by smtp.gmail.com with ESMTPSA id hf15-20020a1709072c4f00b009659fa6eeddsm988030ejc.196.2023.05.04.03.56.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 May 2023 03:56:34 -0700 (PDT) From: Ilya Dryomov <idryomov@gmail.com> To: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz>, Johannes Thumshirn <johannes.thumshirn@wdc.com>, Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH] mm: always respect QUEUE_FLAG_STABLE_WRITES on the block device Date: Thu, 4 May 2023 12:56:24 +0200 Message-Id: <20230504105624.9789-1-idryomov@gmail.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1764961210790933401?= X-GMAIL-MSGID: =?utf-8?q?1764961210790933401?= |
Series |
mm: always respect QUEUE_FLAG_STABLE_WRITES on the block device
|
|
Commit Message
Ilya Dryomov
May 4, 2023, 10:56 a.m. UTC
Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue
and a sb flag") introduced a regression for the raw block device use
case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has
the effect of respecting it only when there is a filesystem mounted on
top of the block device. If a filesystem is not mounted, block devices
that do integrity checking return sporadic checksum errors.
Additionally, this commit made the corresponding sysfs knob writeable
for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag
is captured when the filesystem is mounted and isn't consulted after
that anywhere outside of swap code, changing it doesn't take immediate
effect even though dumping the knob shows the new value. With no way
to dump SB_I_STABLE_WRITES flag, this is needlessly confusing.
Resurrect the original stable writes behavior by changing
folio_wait_stable() to account for the case of a raw block device and
also:
- for the case of a filesystem, test QUEUE_FLAG_STABLE_WRITES flag
each time instead of capturing it in the superblock so that changes
are reflected immediately (thus aligning with the case of a raw block
device)
- retain SB_I_STABLE_WRITES flag for filesystems that need stable
writes independent of the underlying block device (currently just
NFS)
Cc: stable@vger.kernel.org
Fixes: 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
fs/super.c | 2 --
mm/page-writeback.c | 12 +++++++++++-
2 files changed, 11 insertions(+), 3 deletions(-)
Comments
On Thu, May 04, 2023 at 12:56:24PM +0200, Ilya Dryomov wrote: > Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue > and a sb flag") introduced a regression for the raw block device use > case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has > the effect of respecting it only when there is a filesystem mounted on > top of the block device. If a filesystem is not mounted, block devices > that do integrity checking return sporadic checksum errors. With "If a file system is not mounted" you want to say "when accessing a block device directly" here, right? The two are not exclusive.. > Additionally, this commit made the corresponding sysfs knob writeable > for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag > is captured when the filesystem is mounted and isn't consulted after > that anywhere outside of swap code, changing it doesn't take immediate > effect even though dumping the knob shows the new value. With no way > to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. But very much intentional. s_bdev often is not the only device in a file system, and we should never reference if from core helpers. So I think we should go with something like this: diff --git a/mm/page-writeback.c b/mm/page-writeback.c index db794399900734..aa36cc2a4530c1 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -3129,7 +3129,11 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); */ void folio_wait_stable(struct folio *folio) { - if (folio_inode(folio)->i_sb->s_iflags & SB_I_STABLE_WRITES) + struct inode *inode = folio_inode(folio); + struct super_block *sb = inode->i_sb; + + if ((sb->s_iflags & SB_I_STABLE_WRITES) || + (sb_is_blkdev_sb(sb) && bdev_stable_writes(I_BDEV(inode)))) folio_wait_writeback(folio); } EXPORT_SYMBOL_GPL(folio_wait_stable);
On Thu, May 04, 2023 at 03:55:15PM +0200, Christoph Hellwig wrote: > On Thu, May 04, 2023 at 12:56:24PM +0200, Ilya Dryomov wrote: > > Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue > > and a sb flag") introduced a regression for the raw block device use > > case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has > > the effect of respecting it only when there is a filesystem mounted on > > top of the block device. If a filesystem is not mounted, block devices > > that do integrity checking return sporadic checksum errors. > > With "If a file system is not mounted" you want to say "when accessing > a block device directly" here, right? The two are not exclusive.. > > > Additionally, this commit made the corresponding sysfs knob writeable > > for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag > > is captured when the filesystem is mounted and isn't consulted after > > that anywhere outside of swap code, changing it doesn't take immediate > > effect even though dumping the knob shows the new value. With no way > > to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. > > But very much intentional. s_bdev often is not the only device > in a file system, and we should never reference if from core > helpers. > > So I think we should go with something like this: > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index db794399900734..aa36cc2a4530c1 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -3129,7 +3129,11 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); > */ > void folio_wait_stable(struct folio *folio) > { > - if (folio_inode(folio)->i_sb->s_iflags & SB_I_STABLE_WRITES) > + struct inode *inode = folio_inode(folio); > + struct super_block *sb = inode->i_sb; > + > + if ((sb->s_iflags & SB_I_STABLE_WRITES) || > + (sb_is_blkdev_sb(sb) && bdev_stable_writes(I_BDEV(inode)))) > folio_wait_writeback(folio); > } > EXPORT_SYMBOL_GPL(folio_wait_stable); I hate both of these patches ;-) What we should do is add AS_STABLE_WRITES, have the appropriate places call mapping_set_stable_writes() and then folio_wait_stable() becomes if (mapping_test_stable_writes(folio->mapping)) folio_wait_writeback(folio); and we remove all the dereferences (mapping->host->i_sb->s_iflags, plus whatever else is going on there)
On Thu, May 4, 2023 at 3:55 PM Christoph Hellwig <hch@lst.de> wrote: > > On Thu, May 04, 2023 at 12:56:24PM +0200, Ilya Dryomov wrote: > > Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue > > and a sb flag") introduced a regression for the raw block device use > > case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has > > the effect of respecting it only when there is a filesystem mounted on > > top of the block device. If a filesystem is not mounted, block devices > > that do integrity checking return sporadic checksum errors. > > With "If a file system is not mounted" you want to say "when accessing > a block device directly" here, right? The two are not exclusive.. Hi Christoph, Right, I meant to say "when accessing a block device directly". > > > Additionally, this commit made the corresponding sysfs knob writeable > > for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag > > is captured when the filesystem is mounted and isn't consulted after > > that anywhere outside of swap code, changing it doesn't take immediate > > effect even though dumping the knob shows the new value. With no way > > to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. > > But very much intentional. s_bdev often is not the only device > in a file system, and we should never reference if from core > helpers. > > So I think we should go with something like this: > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index db794399900734..aa36cc2a4530c1 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -3129,7 +3129,11 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); > */ > void folio_wait_stable(struct folio *folio) > { > - if (folio_inode(folio)->i_sb->s_iflags & SB_I_STABLE_WRITES) > + struct inode *inode = folio_inode(folio); > + struct super_block *sb = inode->i_sb; > + > + if ((sb->s_iflags & SB_I_STABLE_WRITES) || > + (sb_is_blkdev_sb(sb) && bdev_stable_writes(I_BDEV(inode)))) > folio_wait_writeback(folio); Heh, this is almost exactly what I came up with initially (|| arms were swapped in that version), but decided to improve upon after noticing the writeable thing. Thanks, Ilya
On Thu, May 4, 2023 at 4:16 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Thu, May 04, 2023 at 03:55:15PM +0200, Christoph Hellwig wrote: > > On Thu, May 04, 2023 at 12:56:24PM +0200, Ilya Dryomov wrote: > > > Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue > > > and a sb flag") introduced a regression for the raw block device use > > > case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has > > > the effect of respecting it only when there is a filesystem mounted on > > > top of the block device. If a filesystem is not mounted, block devices > > > that do integrity checking return sporadic checksum errors. > > > > With "If a file system is not mounted" you want to say "when accessing > > a block device directly" here, right? The two are not exclusive.. > > > > > Additionally, this commit made the corresponding sysfs knob writeable > > > for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag > > > is captured when the filesystem is mounted and isn't consulted after > > > that anywhere outside of swap code, changing it doesn't take immediate > > > effect even though dumping the knob shows the new value. With no way > > > to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. > > > > But very much intentional. s_bdev often is not the only device > > in a file system, and we should never reference if from core > > helpers. > > > > So I think we should go with something like this: > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index db794399900734..aa36cc2a4530c1 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -3129,7 +3129,11 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); > > */ > > void folio_wait_stable(struct folio *folio) > > { > > - if (folio_inode(folio)->i_sb->s_iflags & SB_I_STABLE_WRITES) > > + struct inode *inode = folio_inode(folio); > > + struct super_block *sb = inode->i_sb; > > + > > + if ((sb->s_iflags & SB_I_STABLE_WRITES) || > > + (sb_is_blkdev_sb(sb) && bdev_stable_writes(I_BDEV(inode)))) > > folio_wait_writeback(folio); > > } > > EXPORT_SYMBOL_GPL(folio_wait_stable); > > I hate both of these patches ;-) What we should do is add > AS_STABLE_WRITES, have the appropriate places call > mapping_set_stable_writes() and then folio_wait_stable() becomes > > if (mapping_test_stable_writes(folio->mapping)) > folio_wait_writeback(folio); > > and we remove all the dereferences (mapping->host->i_sb->s_iflags, plus > whatever else is going on there) Hi Matthew, We would still need something resembling Christoph's suggestion for 5.10 and 5.15 (at least). Since this fixes a regression, would you support merging the "ugly" version to facilitate backports or would you rather see the AS/mapping-based refactor first? Thanks, Ilya
On Thu, May 04, 2023 at 05:07:10PM +0200, Ilya Dryomov wrote: > On Thu, May 4, 2023 at 4:16 PM Matthew Wilcox <willy@infradead.org> wrote: > > I hate both of these patches ;-) What we should do is add > > AS_STABLE_WRITES, have the appropriate places call > > mapping_set_stable_writes() and then folio_wait_stable() becomes > > > > if (mapping_test_stable_writes(folio->mapping)) > > folio_wait_writeback(folio); > > > > and we remove all the dereferences (mapping->host->i_sb->s_iflags, plus > > whatever else is going on there) > > Hi Matthew, > > We would still need something resembling Christoph's suggestion for > 5.10 and 5.15 (at least). Since this fixes a regression, would you > support merging the "ugly" version to facilitate backports or would > you rather see the AS/mapping-based refactor first? That's a terrible way of developing for Linux. First, do it the right way for mainline. Then, see how easy the patch is to backport to relevant kernel versions; if it's too ugly to cherry-pick, do something equivalent. But never start out with the premise "This must be backported, so do it as simply as possible first".
On Thu 04-05-23 15:16:39, Matthew Wilcox wrote: > On Thu, May 04, 2023 at 03:55:15PM +0200, Christoph Hellwig wrote: > > On Thu, May 04, 2023 at 12:56:24PM +0200, Ilya Dryomov wrote: > > > Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue > > > and a sb flag") introduced a regression for the raw block device use > > > case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has > > > the effect of respecting it only when there is a filesystem mounted on > > > top of the block device. If a filesystem is not mounted, block devices > > > that do integrity checking return sporadic checksum errors. > > > > With "If a file system is not mounted" you want to say "when accessing > > a block device directly" here, right? The two are not exclusive.. > > > > > Additionally, this commit made the corresponding sysfs knob writeable > > > for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag > > > is captured when the filesystem is mounted and isn't consulted after > > > that anywhere outside of swap code, changing it doesn't take immediate > > > effect even though dumping the knob shows the new value. With no way > > > to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. > > > > But very much intentional. s_bdev often is not the only device > > in a file system, and we should never reference if from core > > helpers. > > > > So I think we should go with something like this: > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index db794399900734..aa36cc2a4530c1 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -3129,7 +3129,11 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); > > */ > > void folio_wait_stable(struct folio *folio) > > { > > - if (folio_inode(folio)->i_sb->s_iflags & SB_I_STABLE_WRITES) > > + struct inode *inode = folio_inode(folio); > > + struct super_block *sb = inode->i_sb; > > + > > + if ((sb->s_iflags & SB_I_STABLE_WRITES) || > > + (sb_is_blkdev_sb(sb) && bdev_stable_writes(I_BDEV(inode)))) > > folio_wait_writeback(folio); > > } > > EXPORT_SYMBOL_GPL(folio_wait_stable); > > I hate both of these patches ;-) What we should do is add > AS_STABLE_WRITES, have the appropriate places call > mapping_set_stable_writes() and then folio_wait_stable() becomes > > if (mapping_test_stable_writes(folio->mapping)) > folio_wait_writeback(folio); > > and we remove all the dereferences (mapping->host->i_sb->s_iflags, plus > whatever else is going on there) For bdev address_space that's easy but what Ilya also mentioned is a problem when 'stable_write' flag gets toggled on the device and in that case having to propagate the flag update to all the address_space structures is a nightmare... Honza
On Thu, May 04, 2023 at 05:55:56PM +0200, Jan Kara wrote: > For bdev address_space that's easy but what Ilya also mentioned is a > problem when 'stable_write' flag gets toggled on the device and in that > case having to propagate the flag update to all the address_space > structures is a nightmare... We have a number of flags which don't take effect when modified on a block device with a mounted filesystem on it. For example, modifying the readahead settings do not change existing files, only new ones. Since this flag is only modifiable for debugging purposes, I think I'm OK with it not affecting already-mounted filesystems. It feels like a decision that reasonable people could disagree on, though.
On Fri 05-05-23 09:07:36, Dave Chinner wrote: > On Thu, May 04, 2023 at 05:16:48PM +0100, Matthew Wilcox wrote: > > On Thu, May 04, 2023 at 05:55:56PM +0200, Jan Kara wrote: > > > For bdev address_space that's easy but what Ilya also mentioned is a > > > problem when 'stable_write' flag gets toggled on the device and in that > > > case having to propagate the flag update to all the address_space > > > structures is a nightmare... > > > > We have a number of flags which don't take effect when modified on a > > block device with a mounted filesystem on it. For example, modifying > > the readahead settings do not change existing files, only new ones. > > Since this flag is only modifiable for debugging purposes, I think I'm > > OK with it not affecting already-mounted filesystems. It feels like a > > decision that reasonable people could disagree on, though. > > I think an address space flag makes sense, because then we don't > even have to care about the special bdev sb/inode thing - > folio->mapping will already point at the bdev mapping and so do the > right thing. > > That is, if the bdev changes stable_write state, it can toggle the > AS_STABLE_WRITE flag on it's inode->i_mapping straight away and all > the folios and files pointing to the bdev mapping will change > behaviour immediately. Everything else retains the same behaviour > we have now - the stable_write state is persistent on the superblock > until the filesystem mount is cycled. Yeah, I'm fine with this behavior. I just wasn't sure whether Ilya didn't need the sysfs change to be visible in the filesystem so that was why I pointed that out. But apparently he doesn't need it. Honza
diff --git a/fs/super.c b/fs/super.c index 04bc62ab7dfe..6705b3506ae8 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1213,8 +1213,6 @@ static int set_bdev_super(struct super_block *s, void *data) s->s_dev = s->s_bdev->bd_dev; s->s_bdi = bdi_get(s->s_bdev->bd_disk->bdi); - if (bdev_stable_writes(s->s_bdev)) - s->s_iflags |= SB_I_STABLE_WRITES; return 0; } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 516b1aa247e8..469bc57add8c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -3169,7 +3169,17 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); */ void folio_wait_stable(struct folio *folio) { - if (folio_inode(folio)->i_sb->s_iflags & SB_I_STABLE_WRITES) + struct inode *inode = folio_inode(folio); + struct super_block *sb = inode->i_sb; + bool stable_writes; + + if (sb_is_blkdev_sb(sb)) + stable_writes = bdev_stable_writes(I_BDEV(inode)); + else + stable_writes = bdev_stable_writes(sb->s_bdev) || + (sb->s_iflags & SB_I_STABLE_WRITES); + + if (stable_writes) folio_wait_writeback(folio); } EXPORT_SYMBOL_GPL(folio_wait_stable);