Message ID | 20230918-hirte-neuzugang-4c2324e7bae3@brauner |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:172:b0:3f2:4152:657d with SMTP id h50csp2604085vqi; Mon, 18 Sep 2023 05:07:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGaXzq1jamk1D/5YIZAwOXxKHaQV+uO2V6PEeld/rNf37/Hx37qYWaJY08GWo7sBmtTEoDm X-Received: by 2002:a05:6808:93:b0:3a8:5904:1a98 with SMTP id s19-20020a056808009300b003a859041a98mr9289373oic.54.1695038840902; Mon, 18 Sep 2023 05:07:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695038840; cv=none; d=google.com; s=arc-20160816; b=iCQwXbNNeSWUSVuuG/D5S0GusPyR43oY3XEL7hhPWTPZWH+dC7T/j7mFctKHEei/XZ PJoquegFYtjARaf8+DPpA+acBlrC+I84tc8JCMV1KLqA3pZRxAUag8GvU3UA6h+ZYxAf 9Y+hWlAibbuLSuCia/pY4zaWOgvSfqCZ2Xffbs5NxdZaMNGBPXiW/PAseOUjdhoBh7YY 58IsrdkKQ5G9aXiLpYTkBTrpKVhSqxLu4bFtxB9H+PFVDMStNjcbsgx/IHjDIftmy6dC 5wOugNVceUDelhKP0R5uzf+LX1nBryiis5BfFd8IfYPm/IfkQWt9FxW1/oPkFhlmn3FO qupg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=/THK81SMuKpw/81YWJ7f3dFCXvqJApUyBixBApedpqo=; fh=qDZ+YnS9HJbCVOaeWFr2FnYZt7eE+vt5IJIDIlhBdxU=; b=lsScVOsgk3Y990jXLa910EG/5hkbwBxvPJ2Alz/q7HiOG/JEXjgYcblvpds9wWZVRk e71OFaRXkJ9pSkC+A4TElEBSBKwdxxw+EYRzKBVVLEP83BE4vfiNqkHyXaD8v14RZuYh z6jQMNzgBador8sdBlpuzL0D5YvJ1NtViueuw0+1BaBbfegJD2HkN1Fv66EsftjUJ0Ke vUu/QyE4D7fdmmOMMS3KrVx4it7oqmCdBpr1a8VvO4nhNVeVkbsxDcGqsZYhdXQe81+i 68ukMKNMs9hNbzomEYs5lp6bljl5gFfTsU570PYAAjKZ9gFPaJZDyA0Zk2NFdXMoDGBt krPg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=keiGkh7A; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from fry.vger.email (fry.vger.email. [23.128.96.38]) by mx.google.com with ESMTPS id 125-20020a630083000000b005694b757228si7931436pga.688.2023.09.18.05.07.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Sep 2023 05:07:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=keiGkh7A; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 06B8F80212C3; Mon, 18 Sep 2023 04:55:53 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241766AbjIRLzR (ORCPT <rfc822;kernel.ruili@gmail.com> + 27 others); Mon, 18 Sep 2023 07:55:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241856AbjIRLy5 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 18 Sep 2023 07:54:57 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A032E19A6; Mon, 18 Sep 2023 04:54:10 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BC542C433C9; Mon, 18 Sep 2023 11:54:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1695038049; bh=v+xypjGMEIlYAlPz2FidGLaJjBWo4XUOP393ptzMwzg=; h=From:To:Cc:Subject:Date:From; b=keiGkh7A0IvMJISs7RIF0kow9jUiEFqIi/T4bpgQniYEIfVNbOGX3g4jju3qiQUwX rmIoOWV/XMm2YjAn6Lp+NpYd9icMOdTd+wXl1kKOtqbk86+jvf7FAydYgS7ICTF3wD RnosGW1WJo7ayBvpQBfRvBj0WQ3ngPHde7/c0e9ukNI0IR9sjb/N8K70vk7xxSm7DV 5D3+eZ8/N7RHuk1hEzwnt3MpshQoERKsdZd/pvw37ewBT8anXxoLaO/A38I+gaxWAj pG3grOks4WM71DX1bbl0d5XsySIeNoo5+K7FazzMrVsxsQyf6uussHM/MScg9Osrxd HyJsx58/yle5w== From: Christian Brauner <brauner@kernel.org> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [GIT PULL] timestamp fixes Date: Mon, 18 Sep 2023 13:53:25 +0200 Message-Id: <20230918-hirte-neuzugang-4c2324e7bae3@brauner> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2257; i=brauner@kernel.org; h=from:subject:message-id; bh=3jEO6GpBAHPkvujhQ7ujs4cc5WObAyG+rtu0U4VrKxU=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaRyWHHFiBgXuQhyfN29ekf7bWPFiRP8pvKIF99TlV+xtkJU 70J4RykLgxgXg6yYIotDu0m43HKeis1GmRowc1iZQIYwcHEKwEQSlRj+R6xdPG9bjsaPn5d3uc+vnt Qg4Z08+RPvp+pXbB/PeYt5rGf4K878stNgsl2V+OUJokd51htuZxVfeHUj+xJeloPe6fNl2QA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Mon, 18 Sep 2023 04:55:53 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777377048048448111 X-GMAIL-MSGID: 1777377048048448111 |
Series |
[GIT,PULL] timestamp fixes
|
|
Pull-request
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc2.vfs.ctimeMessage
Christian Brauner
Sept. 18, 2023, 11:53 a.m. UTC
Hey Linus, /* Summary */ This contains a few fixes for the multi-grain timestamps work for this cycle: * Only update the atime if "now" is later than the current value. This can happen when the atime gets updated with a fine-grained timestamp and then later gets updated using a coarse-grained timestamp. * Make sure setattr_copy() handles multi-grained timestamps. * Always initialize the __i_ctime field during allocation otherwise inode_set_ctime_current() may skip initializing __i_ctime because the random value in there might be in the future. * Fix overlayfs to always set ctime when it sets atime and mtime. /* Testing */ clang: Ubuntu clang version 15.0.7 gcc: (Ubuntu 12.2.0-3ubuntu1) 12.2.0 All patches are based on v6.6-rc1 and have been sitting in linux-next. No build failures or warnings were observed. All old and new tests in selftests, and LTP pass without regressions. /* Conflicts */ At the time of creating this PR no merge conflicts were reported from linux-next and no merge conflicts showed up doing a test-merge with current mainline. The following changes since commit 0bb80ecc33a8fb5a682236443c1e740d5c917d1d: Linux 6.6-rc1 (2023-09-10 16:28:41 -0700) are available in the Git repository at: git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc2.vfs.ctime for you to fetch changes up to f8edd33686154b9165457c95e2ed5943e916781e: overlayfs: set ctime when setting mtime and atime (2023-09-14 10:24:36 +0200) Please consider pulling these changes from the signed v6.6-rc2.vfs.ctime tag. Thanks! Christian ---------------------------------------------------------------- v6.6-rc2.vfs.ctime ---------------------------------------------------------------- Jeff Layton (4): fs: have setattr_copy handle multigrain timestamps appropriately fs: initialize inode->__i_ctime to the epoch fs: don't update the atime if existing atime is newer than "now" overlayfs: set ctime when setting mtime and atime fs/attr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++------ fs/inode.c | 6 ++++-- fs/overlayfs/copy_up.c | 2 +- 3 files changed, 51 insertions(+), 9 deletions(-)
Comments
On Mon, 18 Sept 2023 at 04:54, Christian Brauner <brauner@kernel.org> wrote: > > * Only update the atime if "now" is later than the current value. This > can happen when the atime gets updated with a fine-grained timestamp > and then later gets updated using a coarse-grained timestamp. I pulled this, and then I unpulled it again. I think this is fundamentally wrong. If somebody has set the time into the future (for whatever reason - maybe the clocks were wrong at some point), afaik accessing a file should reset it, and very much used to do that. Am I missing something? Because this really seems *horribly* broken garbage. Any "go from fine-grained back to coarse-grained" situation needs to explicitly test *that* case. Not some kind of completely broken "don't update to past value" like this. Linus
On Mon, 2023-09-18 at 11:24 -0700, Linus Torvalds wrote: > On Mon, 18 Sept 2023 at 04:54, Christian Brauner <brauner@kernel.org> wrote: > > > > * Only update the atime if "now" is later than the current value. This > > can happen when the atime gets updated with a fine-grained timestamp > > and then later gets updated using a coarse-grained timestamp. > > I pulled this, and then I unpulled it again. > > I think this is fundamentally wrong. > > If somebody has set the time into the future (for whatever reason - > maybe the clocks were wrong at some point), afaik accessing a file > should reset it, and very much used to do that. > > Am I missing something? Because this really seems *horribly* broken garbage. > > Any "go from fine-grained back to coarse-grained" situation needs to > explicitly test *that* case. > > Not some kind of completely broken "don't update to past value" like this. > Fair point. Now that I've considered it some more, I think that commit 7df48e7d99a4 (fs: don't update the atime if existing atime is newer than "now") is not necessary. What prompted this was a bug report from the kernel test robot that showed the atime going backward on a STRICTATIME LTP test, but I think the root cause of that was the missing ctime initialization after allocation that we fixed in 0a22d3ff61b7 (fs: initialize inode->__i_ctime to the epoch). In general, we always update the atime with a coarse-grained timestamp, since atime and ctime updates are never done together during normal read and write operations. As you note, things are a little more murky with utimes() updates but I think we should be safe to overwrite the atime with a coarse-grained timestamp unconditionally. We should be able to just drop that patch from the series. Whether you want to pull the rest now or wait for a bit, I'll leave up to you and Christian to decide. Thanks,
On Mon, 18 Sept 2023 at 12:39, Jeff Layton <jlayton@kernel.org> wrote: > > In general, we always update the atime with a coarse-grained timestamp, > since atime and ctime updates are never done together during normal read > and write operations. As you note, things are a little more murky with > utimes() updates but I think we should be safe to overwrite the atime > with a coarse-grained timestamp unconditionally. I do think utimes() ends up always overwriting, but that's a different code-path entirely (ie it goes through the ->setattr() logic, not this inode_update_timestamps() code). So I *think* that even with your patch, doing a "touch" would end up doing the right thing - it would update atime even if it was in the future before. But doing a plain "read()" would break, and not update atime. That said, I didn't actually ever *test* any of this, so this is purely from reading the patch, and I can easily have missed something. Anyway, I do think that the timespec64_equal() tests are a bit iffy in fs/inode.c now, since the timespecs that are being tested might be of different precision. So I do think there's a *problem* here, I just do not believe that doing that timespec64_equal() -> timespec64_compare() is at all the right thing to do. My *gut* feel is that in both cases, we have this if (timespec64_equal(&inode->i_atime, &now)) and the problem is that *sometimes* 'now' is the coarse time, but sometimes it's the fine-grained one, and so checking for equality is simply nonsensical. I get the feeling that that timespec64_equal() logic for those atime updates should be something like - if 'now' is in the future, we always considering it different, and update the time - if 'now' is further in the past than the coarse granularity, we also update the time ("clearly not equal") - but if 'now' is in the past, but within the coarse time granularity, we consider it equal and do not update anything but it's not like I have really given this a huge amount of thought. It's just that "don't update if in the past" that I am pretty sure can *not* be right. Linus
On Mon, 2023-09-18 at 13:18 -0700, Linus Torvalds wrote: > On Mon, 18 Sept 2023 at 12:39, Jeff Layton <jlayton@kernel.org> wrote: > > > > In general, we always update the atime with a coarse-grained timestamp, > > since atime and ctime updates are never done together during normal read > > and write operations. As you note, things are a little more murky with > > utimes() updates but I think we should be safe to overwrite the atime > > with a coarse-grained timestamp unconditionally. > > I do think utimes() ends up always overwriting, but that's a different > code-path entirely (ie it goes through the ->setattr() logic, not this > inode_update_timestamps() code). > > So I *think* that even with your patch, doing a "touch" would end up > doing the right thing - it would update atime even if it was in the > future before. > > But doing a plain "read()" would break, and not update atime. > > That said, I didn't actually ever *test* any of this, so this is > purely from reading the patch, and I can easily have missed something. > No, you're quite right. That's exactly what would have happened. > Anyway, I do think that the timespec64_equal() tests are a bit iffy in > fs/inode.c now, since the timespecs that are being tested might be of > different precision. > > So I do think there's a *problem* here, I just do not believe that > doing that timespec64_equal() -> timespec64_compare() is at all the > right thing to do. > > My *gut* feel is that in both cases, we have this > > if (timespec64_equal(&inode->i_atime, &now)) > > and the problem is that *sometimes* 'now' is the coarse time, but > sometimes it's the fine-grained one, and so checking for equality is > simply nonsensical. > > I get the feeling that that timespec64_equal() logic for those atime > updates should be something like > > - if 'now' is in the future, we always considering it different, and > update the time > > - if 'now' is further in the past than the coarse granularity, we > also update the time ("clearly not equal") > > - but if 'now' is in the past, but within the coarse time > granularity, we consider it equal and do not update anything > > but it's not like I have really given this a huge amount of thought. > It's just that "don't update if in the past" that I am pretty sure can > *not* be right. > > I think the atime problem is solved by just dropping the patch I mentioned before. The atime is updated for read operations and the m/ctime for write. You only get a fine-grained timestamp when updating the ctime, so for an atime update we always use a coarse timestamp. It should never roll backward. [1] We may have a problem with the ctime update though, since you pointed it out. We have this in inode_set_ctime_current(), in the codepath where the QUERIED bit isn't set: /* * If we've recently updated with a fine-grained timestamp, * then the coarse-grained one may still be earlier than the * existing ctime. Just keep the existing value if so. */ ctime.tv_sec = inode->__i_ctime.tv_sec; if (timespec64_compare(&ctime, &now) > 0) return ctime; The ctime can't be set via utimes(), so that's not an issue here, but we could get a realtime clock jump backward that causes this to not be updated like it should be. I think (like you suggest above) that this needs some bounds-checking where we make sure that the current coarse grained time isn't more than around 1-2 jiffies earlier than the existing ctime. If it is, then we'll go ahead and just update it anyway. Thoughts?
On Mon, 18 Sept 2023 at 13:56, Jeff Layton <jlayton@kernel.org> wrote: > > We may have a problem with the ctime update though, since you pointed it > out. We have this in inode_set_ctime_current(), in the codepath where > the QUERIED bit isn't set: > > /* > * If we've recently updated with a fine-grained timestamp, > * then the coarse-grained one may still be earlier than the > * existing ctime. Just keep the existing value if so. > */ > ctime.tv_sec = inode->__i_ctime.tv_sec; > if (timespec64_compare(&ctime, &now) > 0) > return ctime; > > The ctime can't be set via utimes(), so that's not an issue here, but we > could get a realtime clock jump backward that causes this to not be > updated like it should be. > > I think (like you suggest above) that this needs some bounds-checking > where we make sure that the current coarse grained time isn't more than > around 1-2 jiffies earlier than the existing ctime. If it is, then we'll > go ahead and just update it anyway. > > Thoughts? Ack, that sounds about right to me. Christian - I'm just going to assume that you'll sort this out and I'll get a new pull request at some point. Holler if you think something else is needed, ok? Linus
> Christian - I'm just going to assume that you'll sort this out and > I'll get a new pull request at some point. Holler if you think > something else is needed, ok? I'll take care of it and will send you a new pull request once everything's sorted. Thanks for looking! Christian
On Tue, Sep 19, 2023 at 10:28:11AM +0200, Christian Brauner wrote: > > Christian - I'm just going to assume that you'll sort this out and > > I'll get a new pull request at some point. Holler if you think > > something else is needed, ok? > > I'll take care of it and will send you a new pull request once > everything's sorted. Thanks for looking! In the meantime we had a report that unconditionally enabling multi-grain timestamps causes a regression for some users workloads. So, the honest thing is to revert the five commits that introduced the infrastructure on top of the cleanup we did for v6.6. Jeff and Jan agreed and we can give it another go for v6.7+. The general improvements to timestamp handling and related cleanup stand on their. I'll be putting the reverts into -next now and I'll get you a pull request by the end of the week. In case you'd rather do it right now it's already here: git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc3.vfs.ctime.revert I'd put something like: Users reported regressions due to enabling multi-grained timestamps unconditionally because timestamps appeared stale and could break compilation. As no clear consensus on a solution has come up and the discussion has gone back to the drawing board whether we should even expose this to userspace, revert the infrastructure changes for. If it isn't code that's here to stay, make it go away. Sorry for the inconvenience. Christian
On Wed, 20 Sept 2023 at 09:22, Christian Brauner <brauner@kernel.org> wrote: > > In the meantime we had a report that unconditionally enabling > multi-grain timestamps causes a regression for some users workloads. Ok. > I'll be putting the reverts into -next now and I'll get you a pull > request by the end of the week. In case you'd rather do it right now > it's already here: By the end of the week is fine, I'll wait for the proper pull request. Thanks, Linus