Message ID | 20230124193025.185781-3-jlayton@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2335421wrn; Tue, 24 Jan 2023 11:31:36 -0800 (PST) X-Google-Smtp-Source: AMrXdXulPPvshy3uEpoDXqjFDlga2ijNk6nPyRfdh6OFLSbytj+D51Vv4o0sN03Jsw2e40rdu9tn X-Received: by 2002:a05:6a20:c213:b0:b8:5c35:b75 with SMTP id bt19-20020a056a20c21300b000b85c350b75mr27783124pzb.26.1674588696498; Tue, 24 Jan 2023 11:31:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674588696; cv=none; d=google.com; s=arc-20160816; b=Pj12HvJbI2Xr0hbi1WcazbOgSvS4QaIGYsT/X4AKrKOB/vfsCkhqGCHO76M87M7caj domOYDt/JUayKrmr0Pg794B8hVDkTIEFLxm9YmFuwamdhahwEZRD67dAEr+vS/cRynH1 qhzL2SrUyr9ioKtr7WuOFex2NizbqM1ZaOfn7fVPUKrrM94JKYBcLvGlE1OvKyknNVCD VQ4nRdauVrWW/XCjLKyhVUcEB2zxUHXSFoG5o0J9zPNW3LJFX3fDaddxSEUlizSip/mP HbDqN6U+hZm/UHh5iUT4bSmD+b/APC3YBzukCJyDHr6Ys19BZgNnNyLYYndaI64McF52 nOUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=j48smMvRMETMPm7bKXa5AlIl76+F3lPBKsZ77LpKfOc=; b=diJZ+YjHJS1/XAo/k63YHA3G8QxUwrFqcrQy4FLTAcMu7Ke8iKNnhvtCiRSEFpU8NT Y8qI0pIQxoP0kcz4Ld2GzG/QhJ7ncHy2siYIgwoig9kkxR1LZFocNjImee8nKqEcVWgX mcjuBb9lPUN091Ucaaej50h/nEOb9BTmRsaPSQMD/dmmM+RPkQOJW4AxIvDhe6wH5zOc tN2CvrDop1wGoxiGEHC8oye2FStIg6JVCdERxTudnDPOeJKE0dX0L/aajVvkBffNcU5p l+DN3YRxkDXU5UdSAkpM2KVIIfesZJ3vXMOLNrii0IwWHvxsl1JxYP2PUqdDVYFGDo72 gkRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=vFjMvDbV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q30-20020a63751e000000b004b484eb665bsi3151640pgc.16.2023.01.24.11.31.24; Tue, 24 Jan 2023 11:31:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=vFjMvDbV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234264AbjAXTak (ORCPT <rfc822;rust.linux@gmail.com> + 99 others); Tue, 24 Jan 2023 14:30:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230461AbjAXTaf (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 24 Jan 2023 14:30:35 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF5804DBDC; Tue, 24 Jan 2023 11:30:34 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6ACD761323; Tue, 24 Jan 2023 19:30:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BFC9EC433EF; Tue, 24 Jan 2023 19:30:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674588633; bh=7icNKyCzEc+AGNwxXSz8RhAIx0cDMmt7igP9nFBKIrE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vFjMvDbVbNFKgJn4lxivMvgzp/tDIEw9JKcAQ6WEIDS6lN25HTVXKxuCvlXN9sRb2 1mCRdMwbOuTgi8m2BQfJmgF7cU28fMVYC0s0DSDWsN/u5iSE2OWxi32MZ87S548WMx WCnMBODaCGiEeRIfQKtkmpa/r5GLDVSTL+mgoX+CxbwUo4oUcfUqAkBvhH5ngInf1n YbnzgUmOTrSJuEMlqDSNPUVyIxIu115IsFuT6Xf+ve6HXp1xipXGt9zlxmfkl4vQq+ M+BW1XA8rKK24ZfWZBsbIyPAsBqQ9K/NzmAt1wMreoWOB0aCl/FwdfUqA2NUgOTXcz zKgv5NZznoZmg== From: Jeff Layton <jlayton@kernel.org> To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org, Colin Walters <walters@verbum.org> Subject: [PATCH v8 RESEND 2/8] fs: clarify when the i_version counter must be updated Date: Tue, 24 Jan 2023 14:30:19 -0500 Message-Id: <20230124193025.185781-3-jlayton@kernel.org> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230124193025.185781-1-jlayton@kernel.org> References: <20230124193025.185781-1-jlayton@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755933516990356062?= X-GMAIL-MSGID: =?utf-8?q?1755933516990356062?= |
Series |
fs: clean up internal i_version handling
|
|
Commit Message
Jeff Layton
Jan. 24, 2023, 7:30 p.m. UTC
The i_version field in the kernel has had different semantics over
the decades, but NFSv4 has certain expectations. Update the comments
in iversion.h to describe when the i_version must change.
Cc: Colin Walters <walters@verbum.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
include/linux/iversion.h | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
Comments
On Tue 24-01-23 14:30:19, Jeff Layton wrote: > The i_version field in the kernel has had different semantics over > the decades, but NFSv4 has certain expectations. Update the comments > in iversion.h to describe when the i_version must change. > > Cc: Colin Walters <walters@verbum.org> > Cc: NeilBrown <neilb@suse.de> > Cc: Trond Myklebust <trondmy@hammerspace.com> > Cc: Dave Chinner <david@fromorbit.com> > Signed-off-by: Jeff Layton <jlayton@kernel.org> Looks good to me. But one note below: > diff --git a/include/linux/iversion.h b/include/linux/iversion.h > index 6755d8b4f20b..fced8115a5f4 100644 > --- a/include/linux/iversion.h > +++ b/include/linux/iversion.h > @@ -9,8 +9,25 @@ > * --------------------------- > * The change attribute (i_version) is mandated by NFSv4 and is mostly for > * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > - * appear different to observers if there was a change to the inode's data or > - * metadata since it was last queried. > + * appear larger to observers if there was an explicit change to the inode's > + * data or metadata since it was last queried. > + * > + * An explicit change is one that would ordinarily result in a change to the > + * inode status change time (aka ctime). i_version must appear to change, even > + * if the ctime does not (since the whole point is to avoid missing updates due > + * to timestamp granularity). If POSIX or other relevant spec mandates that the > + * ctime must change due to an operation, then the i_version counter must be > + * incremented as well. > + * > + * Making the i_version update completely atomic with the operation itself would > + * be prohibitively expensive. Traditionally the kernel has updated the times on > + * directories after an operation that changes its contents. For regular files, > + * the ctime is usually updated before the data is copied into the cache for a > + * write. This means that there is a window of time when an observer can > + * associate a new timestamp with old file contents. Since the purpose of the > + * i_version is to allow for better cache coherency, the i_version must always > + * be updated after the results of the operation are visible. Updating it before > + * and after a change is also permitted. This sounds good but it is not the case for any of the current filesystems, is it? Perhaps the documentation should mention this so that people are not confused? Honza
On Tue, Jan 24, 2023 at 02:30:19PM -0500, Jeff Layton wrote: > The i_version field in the kernel has had different semantics over > the decades, but NFSv4 has certain expectations. Update the comments > in iversion.h to describe when the i_version must change. > > Cc: Colin Walters <walters@verbum.org> > Cc: NeilBrown <neilb@suse.de> > Cc: Trond Myklebust <trondmy@hammerspace.com> > Cc: Dave Chinner <david@fromorbit.com> > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- Looks good to me, Reviewed-by: Christian Brauner <brauner@kernel.org>
On Wed, 2023-01-25 at 17:06 +0100, Jan Kara wrote: > On Tue 24-01-23 14:30:19, Jeff Layton wrote: > > The i_version field in the kernel has had different semantics over > > the decades, but NFSv4 has certain expectations. Update the comments > > in iversion.h to describe when the i_version must change. > > > > Cc: Colin Walters <walters@verbum.org> > > Cc: NeilBrown <neilb@suse.de> > > Cc: Trond Myklebust <trondmy@hammerspace.com> > > Cc: Dave Chinner <david@fromorbit.com> > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > Looks good to me. But one note below: > > > diff --git a/include/linux/iversion.h b/include/linux/iversion.h > > index 6755d8b4f20b..fced8115a5f4 100644 > > --- a/include/linux/iversion.h > > +++ b/include/linux/iversion.h > > @@ -9,8 +9,25 @@ > > * --------------------------- > > * The change attribute (i_version) is mandated by NFSv4 and is mostly for > > * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > > - * appear different to observers if there was a change to the inode's data or > > - * metadata since it was last queried. > > + * appear larger to observers if there was an explicit change to the inode's > > + * data or metadata since it was last queried. > > + * > > + * An explicit change is one that would ordinarily result in a change to the > > + * inode status change time (aka ctime). i_version must appear to change, even > > + * if the ctime does not (since the whole point is to avoid missing updates due > > + * to timestamp granularity). If POSIX or other relevant spec mandates that the > > + * ctime must change due to an operation, then the i_version counter must be > > + * incremented as well. > > + * > > + * Making the i_version update completely atomic with the operation itself would > > + * be prohibitively expensive. Traditionally the kernel has updated the times on > > + * directories after an operation that changes its contents. For regular files, > > + * the ctime is usually updated before the data is copied into the cache for a > > + * write. This means that there is a window of time when an observer can > > + * associate a new timestamp with old file contents. Since the purpose of the > > + * i_version is to allow for better cache coherency, the i_version must always > > + * be updated after the results of the operation are visible. Updating it before > > + * and after a change is also permitted. > > This sounds good but it is not the case for any of the current filesystems, is > it? Perhaps the documentation should mention this so that people are not > confused? > > Honza Correct. Currently, all filesystems change the times and version before a write instead of after. I'm hoping that situation will change soon though, as I've been working on a patchset to fix this for tmpfs, ext4 and btrfs. If you still want to see something for this though, what would you suggest for verbiage? Thanks,
On Thu 26-01-23 05:54:16, Jeff Layton wrote: > On Wed, 2023-01-25 at 17:06 +0100, Jan Kara wrote: > > On Tue 24-01-23 14:30:19, Jeff Layton wrote: > > > The i_version field in the kernel has had different semantics over > > > the decades, but NFSv4 has certain expectations. Update the comments > > > in iversion.h to describe when the i_version must change. > > > > > > Cc: Colin Walters <walters@verbum.org> > > > Cc: NeilBrown <neilb@suse.de> > > > Cc: Trond Myklebust <trondmy@hammerspace.com> > > > Cc: Dave Chinner <david@fromorbit.com> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > Looks good to me. But one note below: > > > > > diff --git a/include/linux/iversion.h b/include/linux/iversion.h > > > index 6755d8b4f20b..fced8115a5f4 100644 > > > --- a/include/linux/iversion.h > > > +++ b/include/linux/iversion.h > > > @@ -9,8 +9,25 @@ > > > * --------------------------- > > > * The change attribute (i_version) is mandated by NFSv4 and is mostly for > > > * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > > > - * appear different to observers if there was a change to the inode's data or > > > - * metadata since it was last queried. > > > + * appear larger to observers if there was an explicit change to the inode's > > > + * data or metadata since it was last queried. > > > + * > > > + * An explicit change is one that would ordinarily result in a change to the > > > + * inode status change time (aka ctime). i_version must appear to change, even > > > + * if the ctime does not (since the whole point is to avoid missing updates due > > > + * to timestamp granularity). If POSIX or other relevant spec mandates that the > > > + * ctime must change due to an operation, then the i_version counter must be > > > + * incremented as well. > > > + * > > > + * Making the i_version update completely atomic with the operation itself would > > > + * be prohibitively expensive. Traditionally the kernel has updated the times on > > > + * directories after an operation that changes its contents. For regular files, > > > + * the ctime is usually updated before the data is copied into the cache for a > > > + * write. This means that there is a window of time when an observer can > > > + * associate a new timestamp with old file contents. Since the purpose of the > > > + * i_version is to allow for better cache coherency, the i_version must always > > > + * be updated after the results of the operation are visible. Updating it before > > > + * and after a change is also permitted. > > > > This sounds good but it is not the case for any of the current filesystems, is > > it? Perhaps the documentation should mention this so that people are not > > confused? > > Correct. Currently, all filesystems change the times and version before > a write instead of after. I'm hoping that situation will change soon > though, as I've been working on a patchset to fix this for tmpfs, ext4 > and btrfs. That is good but we'll see how long it takes to get merged. AFAIR it is not a complete nobrainer ;) > If you still want to see something for this though, what would you > suggest for verbiage? Sure: ... the i_version must a be updated after the results of the operation are visible (note that none of the filesystems currently do this, it is a work in progress to fix this). And once your patches are merged, you can also delete this note :). Honza
On Thu, 2023-01-26 at 12:36 +0100, Jan Kara wrote: > On Thu 26-01-23 05:54:16, Jeff Layton wrote: > > On Wed, 2023-01-25 at 17:06 +0100, Jan Kara wrote: > > > On Tue 24-01-23 14:30:19, Jeff Layton wrote: > > > > The i_version field in the kernel has had different semantics over > > > > the decades, but NFSv4 has certain expectations. Update the comments > > > > in iversion.h to describe when the i_version must change. > > > > > > > > Cc: Colin Walters <walters@verbum.org> > > > > Cc: NeilBrown <neilb@suse.de> > > > > Cc: Trond Myklebust <trondmy@hammerspace.com> > > > > Cc: Dave Chinner <david@fromorbit.com> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > Looks good to me. But one note below: > > > > > > > diff --git a/include/linux/iversion.h b/include/linux/iversion.h > > > > index 6755d8b4f20b..fced8115a5f4 100644 > > > > --- a/include/linux/iversion.h > > > > +++ b/include/linux/iversion.h > > > > @@ -9,8 +9,25 @@ > > > > * --------------------------- > > > > * The change attribute (i_version) is mandated by NFSv4 and is mostly for > > > > * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > > > > - * appear different to observers if there was a change to the inode's data or > > > > - * metadata since it was last queried. > > > > + * appear larger to observers if there was an explicit change to the inode's > > > > + * data or metadata since it was last queried. > > > > + * > > > > + * An explicit change is one that would ordinarily result in a change to the > > > > + * inode status change time (aka ctime). i_version must appear to change, even > > > > + * if the ctime does not (since the whole point is to avoid missing updates due > > > > + * to timestamp granularity). If POSIX or other relevant spec mandates that the > > > > + * ctime must change due to an operation, then the i_version counter must be > > > > + * incremented as well. > > > > + * > > > > + * Making the i_version update completely atomic with the operation itself would > > > > + * be prohibitively expensive. Traditionally the kernel has updated the times on > > > > + * directories after an operation that changes its contents. For regular files, > > > > + * the ctime is usually updated before the data is copied into the cache for a > > > > + * write. This means that there is a window of time when an observer can > > > > + * associate a new timestamp with old file contents. Since the purpose of the > > > > + * i_version is to allow for better cache coherency, the i_version must always > > > > + * be updated after the results of the operation are visible. Updating it before > > > > + * and after a change is also permitted. > > > > > > This sounds good but it is not the case for any of the current filesystems, is > > > it? Perhaps the documentation should mention this so that people are not > > > confused? > > > > Correct. Currently, all filesystems change the times and version before > > a write instead of after. I'm hoping that situation will change soon > > though, as I've been working on a patchset to fix this for tmpfs, ext4 > > and btrfs. > > That is good but we'll see how long it takes to get merged. AFAIR it is not > a complete nobrainer ;) > > > If you still want to see something for this though, what would you > > suggest for verbiage? > > Sure: > > ... the i_version must a be updated after the results of the operation are > visible (note that none of the filesystems currently do this, it is a work > in progress to fix this). > > And once your patches are merged, you can also delete this note :). > > Honza Sounds good, I folded something similar to that into the patch and pushed it into the branch I'm feeding into linux-next. I won't bother re-posting for just that though: diff --git a/include/linux/iversion.h b/include/linux/iversion.h index fced8115a5f4..f174ff1b59ee 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -27,7 +27,8 @@ * associate a new timestamp with old file contents. Since the purpose of the * i_version is to allow for better cache coherency, the i_version must always * be updated after the results of the operation are visible. Updating it before - * and after a change is also permitted. + * and after a change is also permitted. (Note that no filesystems currently do + * this. Fixing that is a work-in-progress). * * Observers see the i_version as a 64-bit number that never decreases. If it * remains the same since it was last checked, then nothing has changed in the Thanks!
diff --git a/include/linux/iversion.h b/include/linux/iversion.h index 6755d8b4f20b..fced8115a5f4 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -9,8 +9,25 @@ * --------------------------- * The change attribute (i_version) is mandated by NFSv4 and is mostly for * knfsd, but is also used for other purposes (e.g. IMA). The i_version must - * appear different to observers if there was a change to the inode's data or - * metadata since it was last queried. + * appear larger to observers if there was an explicit change to the inode's + * data or metadata since it was last queried. + * + * An explicit change is one that would ordinarily result in a change to the + * inode status change time (aka ctime). i_version must appear to change, even + * if the ctime does not (since the whole point is to avoid missing updates due + * to timestamp granularity). If POSIX or other relevant spec mandates that the + * ctime must change due to an operation, then the i_version counter must be + * incremented as well. + * + * Making the i_version update completely atomic with the operation itself would + * be prohibitively expensive. Traditionally the kernel has updated the times on + * directories after an operation that changes its contents. For regular files, + * the ctime is usually updated before the data is copied into the cache for a + * write. This means that there is a window of time when an observer can + * associate a new timestamp with old file contents. Since the purpose of the + * i_version is to allow for better cache coherency, the i_version must always + * be updated after the results of the operation are visible. Updating it before + * and after a change is also permitted. * * Observers see the i_version as a 64-bit number that never decreases. If it * remains the same since it was last checked, then nothing has changed in the