mbox series

[GIT,PULL] timestamp fixes

Message ID	20230918-hirte-neuzugang-4c2324e7bae3@brauner
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; From: Christian Brauner <brauner@kernel.org> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [GIT PULL] timestamp fixes Date: Mon, 18 Sep 2023 13:53:25 +0200 Message-Id: <20230918-hirte-neuzugang-4c2324e7bae3@brauner> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[GIT,PULL] timestamp fixes \| [GIT,PULL] timestamp fixes

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc2.vfs.ctime

Message

Christian Brauner Sept. 18, 2023, 11:53 a.m. UTC

  Hey Linus,

/* Summary */
This contains a few fixes for the multi-grain timestamps work for this cycle:

* Only update the atime if "now" is later than the current value. This
  can happen when the atime gets updated with a fine-grained timestamp
  and then later gets updated using a coarse-grained timestamp.
* Make sure setattr_copy() handles multi-grained timestamps.
* Always initialize the __i_ctime field during allocation otherwise
  inode_set_ctime_current() may skip initializing __i_ctime because the
  random value in there might be in the future.
* Fix overlayfs to always set ctime when it sets atime and mtime.

/* Testing */
clang: Ubuntu clang version 15.0.7
gcc: (Ubuntu 12.2.0-3ubuntu1) 12.2.0

All patches are based on v6.6-rc1 and have been sitting in linux-next.
No build failures or warnings were observed. All old and new tests in
selftests, and LTP pass without regressions.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with
current mainline.

The following changes since commit 0bb80ecc33a8fb5a682236443c1e740d5c917d1d:

  Linux 6.6-rc1 (2023-09-10 16:28:41 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc2.vfs.ctime

for you to fetch changes up to f8edd33686154b9165457c95e2ed5943e916781e:

  overlayfs: set ctime when setting mtime and atime (2023-09-14 10:24:36 +0200)

Please consider pulling these changes from the signed v6.6-rc2.vfs.ctime tag.

Thanks!
Christian

----------------------------------------------------------------
v6.6-rc2.vfs.ctime

----------------------------------------------------------------
Jeff Layton (4):
      fs: have setattr_copy handle multigrain timestamps appropriately
      fs: initialize inode->__i_ctime to the epoch
      fs: don't update the atime if existing atime is newer than "now"
      overlayfs: set ctime when setting mtime and atime

 fs/attr.c              | 52 ++++++++++++++++++++++++++++++++++++++++++++------
 fs/inode.c             |  6 ++++--
 fs/overlayfs/copy_up.c |  2 +-
 3 files changed, 51 insertions(+), 9 deletions(-)

Comments

Linus Torvalds Sept. 18, 2023, 6:24 p.m. UTC | #1

On Mon, 18 Sept 2023 at 04:54, Christian Brauner <brauner@kernel.org> wrote:
>
> * Only update the atime if "now" is later than the current value. This
>   can happen when the atime gets updated with a fine-grained timestamp
>   and then later gets updated using a coarse-grained timestamp.

I pulled this, and then I unpulled it again.

I think this is fundamentally wrong.

If somebody has set the time into the future (for whatever reason -
maybe the clocks were wrong at some point), afaik accessing a file
should reset it, and very much used to do that.

Am I missing something? Because this really seems *horribly* broken garbage.

Any "go from fine-grained back to coarse-grained" situation needs to
explicitly test *that* case.

Not some kind of completely broken "don't update to past value" like this.

               Linus

Jeff Layton Sept. 18, 2023, 7:39 p.m. UTC | #2

On Mon, 2023-09-18 at 11:24 -0700, Linus Torvalds wrote:
> On Mon, 18 Sept 2023 at 04:54, Christian Brauner <brauner@kernel.org> wrote:
> > 
> > * Only update the atime if "now" is later than the current value. This
> >   can happen when the atime gets updated with a fine-grained timestamp
> >   and then later gets updated using a coarse-grained timestamp.
> 
> I pulled this, and then I unpulled it again.
> 
> I think this is fundamentally wrong.
> 
> If somebody has set the time into the future (for whatever reason -
> maybe the clocks were wrong at some point), afaik accessing a file
> should reset it, and very much used to do that.
> 
> Am I missing something? Because this really seems *horribly* broken garbage.
> 
> Any "go from fine-grained back to coarse-grained" situation needs to
> explicitly test *that* case.
> 
> Not some kind of completely broken "don't update to past value" like this.
> 

Fair point.  Now that I've considered it some more, I think that commit
7df48e7d99a4 (fs: don't update the atime if existing atime is newer than
"now") is not necessary.

What prompted this was a bug report from the kernel test robot that
showed the atime going backward on a STRICTATIME LTP test, but I think
the root cause of that was the missing ctime initialization after
allocation that we fixed in 0a22d3ff61b7 (fs: initialize
inode->__i_ctime to the epoch).

In general, we always update the atime with a coarse-grained timestamp,
since atime and ctime updates are never done together during normal read
and write operations. As you note, things are a little more murky with
utimes() updates but I think we should be safe to overwrite the atime
with a coarse-grained timestamp unconditionally.

We should be able to just drop that patch from the series. Whether you
want to pull the rest now or wait for a bit, I'll leave up to you and
Christian to decide.

Thanks,

Linus Torvalds Sept. 18, 2023, 8:18 p.m. UTC | #3

On Mon, 18 Sept 2023 at 12:39, Jeff Layton <jlayton@kernel.org> wrote:
>
> In general, we always update the atime with a coarse-grained timestamp,
> since atime and ctime updates are never done together during normal read
> and write operations. As you note, things are a little more murky with
> utimes() updates but I think we should be safe to overwrite the atime
> with a coarse-grained timestamp unconditionally.

I do think utimes() ends up always overwriting, but that's a different
code-path entirely (ie it goes through the ->setattr() logic, not this
inode_update_timestamps() code).

So I *think* that even with your patch, doing a "touch" would end up
doing the right thing - it would update atime even if it was in the
future before.

But doing a plain "read()" would break, and not update atime.

That said, I didn't actually ever *test* any of this, so this is
purely from reading the patch, and I can easily have missed something.

Anyway, I do think that the timespec64_equal() tests are a bit iffy in
fs/inode.c now, since the timespecs that are being tested might be of
different precision.

So I do think there's a *problem* here, I just do not believe that
doing that timespec64_equal() -> timespec64_compare() is at all the
right thing to do.

My *gut* feel is that in both cases, we have this

        if (timespec64_equal(&inode->i_atime, &now))

and the problem is that *sometimes* 'now' is the coarse time, but
sometimes it's the fine-grained one, and so checking for equality is
simply nonsensical.

I get the feeling that that timespec64_equal() logic for those atime
updates should be something like

 - if 'now' is in the future, we always considering it different, and
update the time

 - if 'now' is further in the past than the coarse granularity, we
also update the time ("clearly not equal")

 - but if 'now' is in the past, but within the coarse time
granularity, we consider it equal and do not update anything

but it's not like I have really given this a huge amount of thought.
It's just that "don't update if in the past" that I am pretty sure can
*not* be right.

                  Linus

Jeff Layton Sept. 18, 2023, 8:56 p.m. UTC | #4

On Mon, 2023-09-18 at 13:18 -0700, Linus Torvalds wrote:
> On Mon, 18 Sept 2023 at 12:39, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > In general, we always update the atime with a coarse-grained timestamp,
> > since atime and ctime updates are never done together during normal read
> > and write operations. As you note, things are a little more murky with
> > utimes() updates but I think we should be safe to overwrite the atime
> > with a coarse-grained timestamp unconditionally.
> 
> I do think utimes() ends up always overwriting, but that's a different
> code-path entirely (ie it goes through the ->setattr() logic, not this
> inode_update_timestamps() code).
> 
> So I *think* that even with your patch, doing a "touch" would end up
> doing the right thing - it would update atime even if it was in the
> future before.
> 
> But doing a plain "read()" would break, and not update atime.
> 
> That said, I didn't actually ever *test* any of this, so this is
> purely from reading the patch, and I can easily have missed something.
> 

No, you're quite right. That's exactly what would have happened.

> Anyway, I do think that the timespec64_equal() tests are a bit iffy in
> fs/inode.c now, since the timespecs that are being tested might be of
> different precision.
> 
> So I do think there's a *problem* here, I just do not believe that
> doing that timespec64_equal() -> timespec64_compare() is at all the
> right thing to do.
> 
> My *gut* feel is that in both cases, we have this
> 
>         if (timespec64_equal(&inode->i_atime, &now))
> 
> and the problem is that *sometimes* 'now' is the coarse time, but
> sometimes it's the fine-grained one, and so checking for equality is
> simply nonsensical.
> 
> I get the feeling that that timespec64_equal() logic for those atime
> updates should be something like
> 
>  - if 'now' is in the future, we always considering it different, and
> update the time
> 
>  - if 'now' is further in the past than the coarse granularity, we
> also update the time ("clearly not equal")
> 
>  - but if 'now' is in the past, but within the coarse time
> granularity, we consider it equal and do not update anything
> 
> but it's not like I have really given this a huge amount of thought.
> It's just that "don't update if in the past" that I am pretty sure can
> *not* be right.
> 
> 

I think the atime problem is solved by just dropping the patch I
mentioned before. The atime is updated for read operations and the
m/ctime for write. You only get a fine-grained timestamp when updating
the ctime, so for an atime update we always use a coarse timestamp. It
should never roll backward. [1]

We may have a problem with the ctime update though, since you pointed it
out. We have this in inode_set_ctime_current(), in the codepath where
the QUERIED bit isn't set:

                /*
                 * If we've recently updated with a fine-grained timestamp,
                 * then the coarse-grained one may still be earlier than the
                 * existing ctime. Just keep the existing value if so.
                 */
                ctime.tv_sec = inode->__i_ctime.tv_sec;
                if (timespec64_compare(&ctime, &now) > 0)
                        return ctime;

The ctime can't be set via utimes(), so that's not an issue here, but we
could get a realtime clock jump backward that causes this to not be
updated like it should be.

I think (like you suggest above) that this needs some bounds-checking
where we make sure that the current coarse grained time isn't more than
around 1-2 jiffies earlier than the existing ctime. If it is, then we'll
go ahead and just update it anyway.

Thoughts?

Linus Torvalds Sept. 18, 2023, 10:10 p.m. UTC | #5

On Mon, 18 Sept 2023 at 13:56, Jeff Layton <jlayton@kernel.org> wrote:
>
> We may have a problem with the ctime update though, since you pointed it
> out. We have this in inode_set_ctime_current(), in the codepath where
> the QUERIED bit isn't set:
>
>                 /*
>                  * If we've recently updated with a fine-grained timestamp,
>                  * then the coarse-grained one may still be earlier than the
>                  * existing ctime. Just keep the existing value if so.
>                  */
>                 ctime.tv_sec = inode->__i_ctime.tv_sec;
>                 if (timespec64_compare(&ctime, &now) > 0)
>                         return ctime;
>
> The ctime can't be set via utimes(), so that's not an issue here, but we
> could get a realtime clock jump backward that causes this to not be
> updated like it should be.
>
> I think (like you suggest above) that this needs some bounds-checking
> where we make sure that the current coarse grained time isn't more than
> around 1-2 jiffies earlier than the existing ctime. If it is, then we'll
> go ahead and just update it anyway.
>
> Thoughts?

Ack, that sounds about right to me.

Christian - I'm just going to assume that you'll sort this out and
I'll get a new pull request at some point. Holler if you think
something else is needed, ok?

              Linus

Christian Brauner Sept. 19, 2023, 8:28 a.m. UTC | #6

> Christian - I'm just going to assume that you'll sort this out and
> I'll get a new pull request at some point. Holler if you think
> something else is needed, ok?

I'll take care of it and will send you a new pull request once
everything's sorted. Thanks for looking!

Christian

Christian Brauner Sept. 20, 2023, 4:22 p.m. UTC | #7

On Tue, Sep 19, 2023 at 10:28:11AM +0200, Christian Brauner wrote:
> > Christian - I'm just going to assume that you'll sort this out and
> > I'll get a new pull request at some point. Holler if you think
> > something else is needed, ok?
> 
> I'll take care of it and will send you a new pull request once
> everything's sorted. Thanks for looking!

In the meantime we had a report that unconditionally enabling
multi-grain timestamps causes a regression for some users workloads.

So, the honest thing is to revert the five commits that introduced the
infrastructure on top of the cleanup we did for v6.6. Jeff and Jan
agreed and we can give it another go for v6.7+. The general improvements
to timestamp handling and related cleanup stand on their.

I'll be putting the reverts into -next now and I'll get you a pull
request by the end of the week. In case you'd rather do it right now
it's already here:

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc3.vfs.ctime.revert

I'd put something like:

Users reported regressions due to enabling multi-grained timestamps
unconditionally because timestamps appeared stale and could break
compilation. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board whether we should even
expose this to userspace, revert the infrastructure changes for. If it
isn't code that's here to stay, make it go away.

Sorry for the inconvenience.
Christian

Linus Torvalds Sept. 20, 2023, 6 p.m. UTC | #8

On Wed, 20 Sept 2023 at 09:22, Christian Brauner <brauner@kernel.org> wrote:
>
> In the meantime we had a report that unconditionally enabling
> multi-grain timestamps causes a regression for some users workloads.

Ok.

> I'll be putting the reverts into -next now and I'll get you a pull
> request by the end of the week. In case you'd rather do it right now
> it's already here:

By the end of the week is fine, I'll wait for the proper pull request.

Thanks,

             Linus